Austin Group Defect Tracker

ID 0001182
Category [1003.1(2016)/Issue7+TC2] System Interfaces
Severity Objection
Type Enhancement Request
Date Submitted 2018-01-30 00:01
Last Update 2018-02-21 02:23
Reporter shware_systems
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Mark Ziegast
Organization SHware Systems Dev.
User Reference
Section btowc()
Page Number C165 632
Line Number 21871-2
Interp Status ---
Final Accepted Text
Summary 0001182: CX behavior wasn't changed appropriately with TC2
Description The text currently is:
In the POSIX locale, btowc( ) shall not return WEOF if c has a value in the range 0 to 255 inclusive.

However, WEOF is appropriate for more values than not, even now, for three reasons: the POSIX locale, using the ISO-646-IRV encoding, considers the values over 127 unassigned; TC2 emphasizes that all 8 bits are significant; and some of the currently optional control characters of the portable character set (especially <ESC>) are usable as single or locking shift chars. Even the range 0 to 127, rather than 255, is valid for that restriction only under ASCII-68, as it did not require that <ESC>, <SI>, and <SO> have their current definitions.
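The disagreement above is about which values of c may yield WEOF, which can be observed directly on a given implementation. The following is a minimal sketch, not part of the report; the helper name count_btowc_weof is hypothetical, and what it returns for values above 127 depends on the C library in use.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not from the standard or the report): count the
 * values c in [lo, hi] for which btowc(c) returns WEOF once LC_CTYPE is
 * set to the named locale. Returns -1 if the locale cannot be selected. */
int count_btowc_weof(const char *locale_name, int lo, int hi)
{
    /* btowc() is only meaningful for EOF or unsigned char values. */
    assert(0 <= lo && lo <= hi && hi <= 255);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int weof_count = 0;
    for (int c = lo; c <= hi; c++) {
        if (btowc(c) == WEOF)
            weof_count++;
    }
    return weof_count;
}
```

Calling count_btowc_weof("POSIX", 0, 255) should give 0 on an implementation that follows the TC2 wording quoted above; a nonzero count would indicate that some byte values are treated as unassigned, which is the behavior this report argues for.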

Notes on the Desired Action:
The phrasing is meant to leave open support for optional locales making use of existing DBCS simple shift encodings. This is more for c2x to address than POSIX, however, so I don't put it as a Future Direction. Disallowing all shifts is more in keeping with the current definition of wchar_t.

Because the current ISO-646, EBCDIC, and 8-bit Unicode encodings require them for minimal conformance, my XBD6 draft defines and requires support for extended shifts, rather than leaving them implicitly allowed as they have been, so they're part of the text.

The NATIVE locale mentioned is already required to be provided by implementations as the "" argument to setlocale() and is expected to differ from the POSIX locale. I hope it gets a more formal definition in Issue 8 as to what implementations can do to be cross-platform portable for both POSIX and historical NLS requirements, such as the standard utilities correctly handling '$' possibly having multiple encodings. Unfortunately, as no current implementation has bothered to attempt this, it amounts to invention. The "other defined locales" part includes POSIX on UTF-8.

The error case is added to distinguish the cases where mbtowc() may succeed (errno is not set) from those where it won't (errno is set to EILSEQ). I consider this obvious, but an Application Usage note may be desirable.
Desired Action Replace the section with:
In the POSIX, NATIVE, and some other standard-defined locales, btowc( ) shall not return WEOF if c is the single byte encoding for a member of the portable character set that is either of charclass <graphic> or a member of charclass <control> that is not a single shift introducer, simple or extended, or a single byte locking shift introducer.

In ERRORS, replace:
No errors are defined.

with:
The btowc( ) function may fail if:
[EILSEQ] The c argument is an unassigned single or lead encoding unit of the current locale's codeset.
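To illustrate how an application might use the proposed error case, here is a minimal sketch. It assumes the proposed "may fail" wording; the standard currently defines no errors for btowc(), so existing implementations are not required to set errno. The enum and the function classify_btowc are hypothetical names introduced only for this example.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical result codes for this sketch only. */
enum btowc_result {
    BTOWC_CONVERTED,       /* btowc() produced a wide character          */
    BTOWC_WEOF_NO_EILSEQ,  /* WEOF returned, errno not set to EILSEQ     */
    BTOWC_WEOF_EILSEQ      /* WEOF returned with errno == EILSEQ, i.e.   */
                           /* the proposed "unassigned encoding unit"    */
};

/* Call btowc(c) and report which of the three cases occurred.
 * *out is written only when the conversion succeeds. */
enum btowc_result classify_btowc(int c, wint_t *out)
{
    errno = 0;                 /* so a stale EILSEQ is not misread */
    wint_t wc = btowc(c);
    if (wc != WEOF) {
        *out = wc;
        return BTOWC_CONVERTED;
    }
    return (errno == EILSEQ) ? BTOWC_WEOF_EILSEQ : BTOWC_WEOF_NO_EILSEQ;
}
```

Under the proposal, BTOWC_WEOF_NO_EILSEQ would correspond to a byte that mbtowc() might still accept as part of a longer sequence, and BTOWC_WEOF_EILSEQ to one that it would not.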

-  Notes
(0003914)
geoffclare (manager)
2018-01-30 09:31

The statement "In the POSIX locale, btowc() shall not return WEOF if c has a value in the range 0 to 255 inclusive" is precisely what was intended by the changes made in TC2. If there is any inconsistency between this statement and other parts of the standard post-TC2, then it is those other parts that need to change.

See in particular the interpretation rationale in 0000663 "The intention was always that the POSIX locale should have an 8-bit-clean single-byte encoding. The omission of an explicit statement to that effect was an oversight."
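One way to read "8-bit-clean single-byte encoding" operationally is that MB_CUR_MAX is 1 and every byte value survives a btowc() then wctob() round trip. The sketch below checks that reading; it is an illustration under that assumption rather than a conformance test, and the function name posix_locale_roundtrip_failures is hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: in the POSIX locale, count the byte values in
 * [lo, hi] (callers should stay within 0..255) that either map to WEOF
 * or do not map back to the same byte through wctob().
 * Returns -1 if the locale cannot be selected or is not single-byte. */
int posix_locale_roundtrip_failures(int lo, int hi)
{
    if (setlocale(LC_CTYPE, "POSIX") == NULL)
        return -1;
    if (MB_CUR_MAX != 1)
        return -1;

    int failures = 0;
    for (int c = lo; c <= hi; c++) {
        wint_t wc = btowc(c);
        if (wc == WEOF || wctob(wc) != c)
            failures++;
    }
    return failures;
}
```

A result of 0 for the range 0 to 255 would be consistent with the intent described in this note; results for 128 to 255 may differ on implementations that predate TC2.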
(0003925)
shware_systems (reporter)
2018-02-21 02:23

Re: 3914
That makes things look cleaner for btowc(), but we established back in 2009 that the POSIX locale is required to function "dirty", as a multi-byte encoding, and more recently that this is due to factors that predate, and are outside the control of, the standard. Bug 663 was approved because we hadn't established then that those additional factors had to be considered binding. In other words, the standard can't change that much, so that rationale has to.

As to this report, you were the one that pointed out, in the Oct 2009 "multibyte C locale" email thread, where XBD7 conflicts with XBD6 in trying to make that assertion of intent, and you listed most of the code points of the portable charset that are not permitted to have wchar_t encodings because they function as shift codes of one type or another. You were right; earlier in that thread I was guilty of focusing too narrowly on what XBD6 alone was saying and just muddling the debate. The code points POSIX doesn't require be assigned to specific functions have to be treated the same, imo now (and then, but I forgot), because encodings like UTF-8 or 8859-1 do make use of some of them in that manner. TC2 saying the intent was that char, CHAR_MIN and CHAR_MAX be unsigned for the POSIX locale is fine; this does not mean all values between CHAR_MIN and CHAR_MAX magically become valid for btowc() to successfully convert, however. All it means is the interface shouldn't reject 128 because an implementation wants to say char is the same as signed char as an extension. This is different from what that change requires of implementations.
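The signedness point can be separated from the question of which values convert. Where char is signed, a byte such as 0xFF stored in a char becomes a negative int when passed directly to btowc(), and may compare equal to EOF. A small sketch of the usual workaround follows; the wrapper name btowc_from_char is hypothetical.

```c
#include <wchar.h>

/* Hypothetical wrapper: convert through unsigned char so that a byte
 * with the high bit set is passed as a value in 128..255 rather than
 * as a negative int, regardless of whether plain char is signed.
 * Whether btowc() then returns WEOF for that value is a separate
 * question, decided by the current locale. */
wint_t btowc_from_char(char ch)
{
    return btowc((unsigned char)ch);
}
```

With this wrapper the argument always reaches btowc() in the range 0 to 255, so any WEOF result reflects the locale's codeset and not the implementation's choice of char signedness.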

- Issue History
Date Modified Username Field Change
2018-01-30 00:01 shware_systems New Issue
2018-01-30 00:01 shware_systems Name => Mark Ziegast
2018-01-30 00:01 shware_systems Organization => SHware Systems Dev.
2018-01-30 00:01 shware_systems Section => btowc()
2018-01-30 00:01 shware_systems Page Number => C165 632
2018-01-30 00:01 shware_systems Line Number => 21871-2
2018-01-30 09:31 geoffclare Note Added: 0003914
2018-02-21 02:23 shware_systems Note Added: 0003925

