|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001182||[1003.1(2016)/Issue7+TC2] System Interfaces||Objection||Enhancement Request||2018-01-30 00:01||2018-02-21 02:23|
|Organization||SHware Systems Dev.|
|Page Number||C165 632|
|Final Accepted Text|
|Summary||0001182: CX behavior wasn't changed appropriately with TC2|
The text currently is:
In the POSIX locale, btowc( ) shall not return WEOF if c has a value in the range 0 to 255 inclusive.
However, three things hold: the POSIX locale, using the ISO-646-IRV encoding, considers the values over 127 unassigned; TC2 emphasizes that all 8 bits are significant; and some of the currently optional control characters of the portable character set (especially <ESC>) are usable as single or locking shift characters. WEOF is therefore appropriate for more values than not, even now. For that restriction, the range 0 to 127, not 255, is valid only for ASCII-68, as it did not require that <ESC>, <SI>, and <SO> have their current definitions.
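As a rough illustration of the requirement under discussion, the following C sketch counts how many byte values in the range 0 to 255 the running implementation's btowc() maps to WEOF after selecting the POSIX locale. The helper names count_weof_bytes and count_weof_bytes_posix are hypothetical and introduced only for this example; the result depends on the implementation and does not settle which behavior is correct.

```c
#include <locale.h>
#include <wchar.h>

/* Count the byte values in 0..255 for which btowc() returns WEOF
 * in the currently selected LC_CTYPE locale. */
int count_weof_bytes(void)
{
    int rejected = 0;
    for (int c = 0; c <= 255; c++) {
        if (btowc(c) == WEOF)
            rejected++;
    }
    return rejected;
}

/* Hypothetical helper: select the POSIX locale, then count.
 * Returns -1 if the locale cannot be selected. */
int count_weof_bytes_posix(void)
{
    if (setlocale(LC_CTYPE, "POSIX") == NULL)
        return -1;
    return count_weof_bytes();
}
```

A count of 0 matches the quoted TC2 text. An implementation that accepts all of 0 to 127 but treats the values over 127 as unassigned would report 128.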
Notes on the Desired Action:
The phrasing is meant to leave open support for optional locales making use of existing DBCS simple shift encodings. This is more for c2x to address than POSIX, however, so I don't put it as a Future Direction. Disallowing all shifts is more in keeping with the current definition of wchar_t.
Because the current ISO-646, EBCDIC, and 8-bit Unicode encodings require them for minimal conformance, my XBD6 draft defines and requires support for extended shifts, rather than leave them implicitly allowed as it has been, so they're part of the text.
The NATIVE locale mentioned is already required to be provided by implementations as the "" argument to setlocale() and is expected to differ from the POSIX locale. I hope it gets a more formal definition in Issue 8 of what implementations can do to be cross-platform portable for both POSIX and historical NLS requirements, such as the standard utilities correctly handling '$' possibly having multiple encodings. Unfortunately, as no current implementation has bothered to attempt this, it amounts to invention. The "other defined locales" part includes POSIX on UTF8.
The error case is added to distinguish the cases where mbtowc() may succeed (errno not set) from those where it won't (errno set to EILSEQ). I consider this obvious, but an Application Usage note may be desirable.
Replace the section with:
In the POSIX, NATIVE, and some other standard defined locales, btowc( ) shall not return WEOF if c is the single byte encoding for a member of the portable character set of charclass <graphic> and the members of charclass <control> that are not single shift introducers, simple or extended, or single byte locking shift introducers.
In ERRORS, replace:
No errors are defined.
with:
The btowc( ) function may fail if:
[EILSEQ] The c argument is an unassigned single or lead encoding unit of the current locale's codeset.
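An application could use the proposed error case roughly as in the following C sketch. It illustrates the proposal only: the current standard defines no errors for btowc(), so an implementation is not required to set errno, and the enum and the function classify_byte are hypothetical names introduced here.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical classification of a btowc() result under the
 * proposed wording; not part of any standard. */
enum btowc_result {
    BTOWC_OK,              /* c is a complete single-byte character */
    BTOWC_UNASSIGNED,      /* WEOF with errno == EILSEQ (proposed)  */
    BTOWC_NOT_SINGLE_BYTE  /* WEOF with errno left unset            */
};

/* Convert c and report which of the three cases applies.
 * On BTOWC_OK the wide character is stored through out.
 * Note that c == EOF also yields WEOF and, if errno stays unset,
 * lands in BTOWC_NOT_SINGLE_BYTE. */
enum btowc_result classify_byte(int c, wint_t *out)
{
    errno = 0;
    wint_t wc = btowc(c);
    if (wc != WEOF) {
        *out = wc;
        return BTOWC_OK;
    }
    return (errno == EILSEQ) ? BTOWC_UNASSIGNED : BTOWC_NOT_SINGLE_BYTE;
}
```

Under this reading, BTOWC_NOT_SINGLE_BYTE marks a byte such as a shift introducer or lead unit that mbtowc() might still convert as part of a longer sequence, while BTOWC_UNASSIGNED marks a byte that mbtowc() would reject as well.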
|Tags||No tags attached.|
|Note 0003914||geoffclare||2018-01-30 09:31|
The statement "In the POSIX locale, btowc() shall not return WEOF if c has a value in the range 0 to 255 inclusive" is precisely what was intended by the changes made in TC2. If there is any inconsistency between this statement and other parts of the standard post-TC2, then it is those other parts that need to change.
See in particular the interpretation rationale in 0000663 "The intention was always that the POSIX locale should have an 8-bit-clean single-byte encoding. The omission of an explicit statement to that effect was an oversight."
|Note 0003925||shware_systems||2018-02-21 02:23|
That makes things look cleaner for btowc(), but we established back in 2009 that the POSIX locale is required to function "dirty", as a multi-byte encoding, and, more recently, that this is due to factors that predate and are outside the control of the standard. Bug 663 was approved because we hadn't established then that those additional factors had to be considered binding. In other words, the standard can't change that much, so that rationale has to.
As to this report, you were the one who pointed out, in the Oct 2009 "multibyte C locale" email thread, where XBD7 conflicts with XBD6 in trying to make that assertion of intent, and who listed most of the code points of the portable charset that are not permitted to have wchar_t encodings because they function as shift codes of one type or another. You were right; earlier in that thread I was guilty of focusing too narrowly on what XBD6 alone was saying and just muddling the debate. The code points POSIX doesn't require be assigned to specific functions have to be treated the same, in my opinion now (and then, but I forgot), because encodings like UTF-8 or 8859-1 do make use of some of them in that manner. TC2 saying the intent was that char, CHAR_MIN, and CHAR_MAX be unsigned for the POSIX locale is fine; this does not mean all values between MIN and MAX magically become valid for btowc() to successfully convert, however. All it means is that the interface shouldn't reject 128 because an implementation wants to say char is the same as signed char as an extension. This is different from what that change requires of implementations.
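The signedness point can be illustrated with a short C sketch. Where plain char is signed, a byte such as 0x80 has a negative value, and 0xFF may compare equal to EOF, so a caller typically converts through unsigned char before calling btowc(). The wrapper names byte_value and btowc_from_char are hypothetical, and the sketch takes no position on which values btowc() must accept.

```c
#include <wchar.h>

/* Hypothetical helper: the value btowc() should be handed for a
 * plain char, always in the range 0..UCHAR_MAX. */
int byte_value(char ch)
{
    return (unsigned char)ch;
}

/* Hypothetical wrapper: convert one plain char with btowc() without
 * letting a negative char value collide with EOF. */
wint_t btowc_from_char(char ch)
{
    return btowc(byte_value(ch));
}
```

With this wrapper the byte 0x80 reaches btowc() as 128 whether char is signed or unsigned; whether 128 then converts or yields WEOF is the question this report is about.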
|2018-01-30 00:01||shware_systems||New Issue|
|2018-01-30 00:01||shware_systems||Name||=> Mark Ziegast|
|2018-01-30 00:01||shware_systems||Organization||=> SHware Systems Dev.|
|2018-01-30 00:01||shware_systems||Section||=> btowc()|
|2018-01-30 00:01||shware_systems||Page Number||=> C165 632|
|2018-01-30 00:01||shware_systems||Line Number||=> 21871-2|
|2018-01-30 09:31||geoffclare||Note Added: 0003914|
|2018-02-21 02:23||shware_systems||Note Added: 0003925|