0001182: CX behavior wasn't changed appropriately with TC2

ID	Project	Category	View Status	Date Submitted	Last Update

0001182	1003.1(2016/18)/Issue7+TC2	System Interfaces	public	2018-01-30 00:01	2019-02-21 16:15

Reporter	shware_systems	Assigned To
Priority	normal	Severity	Objection	Type	Enhancement Request
Status	Closed	Resolution	Rejected

Name	Mark Ziegast
Organization	SHware Systems Dev.
User Reference
Section	btowc()
Page Number	C165 632
Line Number	21871-2
Interp Status	---
Final Accepted Text


Summary	0001182: CX behavior wasn't changed appropriately with TC2
Description	The text currently is:

In the POSIX locale, btowc( ) shall not return WEOF if c has a value in the range 0 to 255 inclusive.

However, WEOF is appropriate for more values than not, even now. The POSIX locale considers the values over 127 unassigned when using the ISO-646-IRV encoding. TC2 emphasizes that all 8 bits are significant. Some of the currently optional control characters of the portable character set (especially <ESC>) are usable as single or locking shift characters. The range 0 to 127, not 255, is valid for that restriction only with ASCII-68, because ASCII-68 did not require that <ESC>, <SI>, and <SO> have their current definitions.

Notes on the Desired Action:

The phrasing is meant to leave open support for optional locales that make use of existing DBCS simple shift encodings. This is more for C2x to address than for POSIX, however, so I don't put it as a Future Direction. Disallowing all shifts is more in keeping with the current definition of wchar_t. Because the current ISO-646, EBCDIC, and 8-bit Unicode encodings require shifts for minimal conformance, my XBD6 draft defines and requires support for extended shifts, rather than leaving them implicitly allowed as they have been, so they are part of the text.

The NATIVE locale mentioned is already required to be provided by implementations as the "" argument to setlocale() and is expected to differ from the POSIX locale. I hope it gets a more formal definition in Issue 8 as to what implementations can do to be cross-platform portable for both POSIX and historical NLS requirements, such as the standard utilities correctly handling '$' possibly having multiple encodings. As no current implementation has attempted this, it amounts to invention, unfortunately.

The "other defined locales" part includes POSIX on UTF8. The error case is added to distinguish the cases where mbtowc() may succeed (errno not set) from those where it won't (errno set to EILSEQ). I consider this obvious, but an Application Usage note may be desirable.
Desired Action	Replace the section with:

In the POSIX, NATIVE, and some other standard-defined locales, btowc( ) shall not return WEOF if c is the single-byte encoding of a member of the portable character set that is either in charclass <graphic>, or in charclass <control> and is not a single-shift introducer (simple or extended) or a single-byte locking-shift introducer.

In ERRORS, replace:

No errors are defined.

with:

The btowc( ) function may fail if:

[EILSEQ]
    The c argument is an unassigned single or lead encoding unit of the current locale's codeset.
Tags	No tags attached.
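The ERRORS text proposed above would let a caller tell an unassigned byte apart from a byte that is only invalid on its own, for example a lead byte or shift code that mbrtowc() might still accept as part of a longer sequence. The following is a minimal C sketch that assumes the proposed [EILSEQ] behavior. The names classify_byte and enum byte_class are hypothetical. Current implementations are not required to set errno here, so on them the "unassigned" case may never be reported.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical classification of a single byte, following the ERRORS
 * text proposed in this report (an assumption, not current POSIX). */
enum byte_class {
    BYTE_CONVERTED,      /* btowc() returned a wide character            */
    BYTE_UNASSIGNED,     /* WEOF with errno == EILSEQ (proposed error)    */
    BYTE_NOT_STANDALONE  /* WEOF with errno untouched: possibly a lead
                            byte or shift code that mbrtowc() may still
                            handle as part of a longer sequence          */
};

enum byte_class classify_byte(int c, wint_t *wc_out)
{
    int saved_errno = errno;   /* preserve the caller's errno */
    errno = 0;
    wint_t wc = btowc(c);
    int err = errno;
    errno = saved_errno;

    if (wc != WEOF) {
        if (wc_out != NULL)
            *wc_out = wc;
        return BYTE_CONVERTED;
    }
    return err == EILSEQ ? BYTE_UNASSIGNED : BYTE_NOT_STANDALONE;
}
```

Saving and restoring errno keeps the helper from disturbing any error state the caller was already tracking.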

geoffclare 2018-01-30 09:31 manager bugnote:0003914	The statement "In the POSIX locale, btowc() shall not return WEOF if c has a value in the range 0 to 255 inclusive" is precisely what was intended by the changes made in TC2. If there is any inconsistency between this statement and other parts of the standard post-TC2, then it is those other parts that need to change. See in particular the interpretation rationale in 0000663 "The intention was always that the POSIX locale should have an 8-bit-clean single-byte encoding. The omission of an explicit statement to that effect was an oversight."
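As a rough illustration of what is in dispute, the following C sketch switches LC_CTYPE to the POSIX locale and counts how many byte values from 0 to 255 btowc() maps to WEOF. The helper name count_weof_in_posix_locale is hypothetical. Under the post-TC2 wording quoted above, a conforming implementation would report 0. Under the reporter's reading, values above 127 and shift codes could legitimately yield WEOF. The result depends on the implementation, so no particular output is claimed here.

```c
#include <locale.h>
#include <wchar.h>

/* Switch LC_CTYPE to the POSIX locale and count how many byte values
 * 0..255 btowc() maps to WEOF.  Returns -1 if the locale cannot be set
 * (in which case the current locale is left unchanged). */
int count_weof_in_posix_locale(void)
{
    if (setlocale(LC_CTYPE, "POSIX") == NULL)
        return -1;

    int count = 0;
    for (int c = 0; c <= 255; c++) {
        if (btowc(c) == WEOF)
            count++;
    }
    return count;
}
```

Printing the return value from a small main() on a given system shows which reading that implementation currently follows.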

shware_systems 2018-02-21 02:23 reporter bugnote:0003925	Re: 3914: That makes things look cleaner for btowc(), but we established back in 2009 that the POSIX locale is required to function "dirty", as a multibyte encoding, and established it again more recently, due to factors that predate, and are outside the control of, the standard. Bug 663 was approved because we had not yet established that those additional factors had to be considered binding. In other words, the standard can't change that much, so that rationale has to.

As to this report, you were the one who pointed out, in the Oct 2009 "multibyte C locale" email thread, where XBD7 conflicts with XBD6 in trying to make that assertion of intent, and you listed most of the code points of the portable character set that are not permitted to have wchar_t encodings because they function as shift codes of one type or another. You were right; earlier in that thread I was guilty of focusing too narrowly on what XBD6 alone was saying, which just muddled the debate. The code points that POSIX doesn't require to be assigned to specific functions have to be treated the same, in my opinion now (and then, though I forgot), because encodings like UTF-8 or 8859-1 do make use of some of them in that manner.

TC2 saying that the intent was for char, CHAR_MIN, and CHAR_MAX to be unsigned in the POSIX locale is fine; this does not mean, however, that all values between CHAR_MIN and CHAR_MAX magically become valid for btowc() to convert successfully. All it means is that the interface shouldn't reject 128 because an implementation wants to treat char as signed char as an extension. This is different from what that change requires of implementations.
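The point about 128 and signed char also has a practical side for callers. On implementations where plain char is signed, passing a string byte straight to btowc() turns bytes 0x80 to 0xFF into negative values, and 0xFF becomes -1, which equals EOF on common systems. A minimal C sketch that converts a single-byte string one byte at a time and casts each byte to unsigned char first (widen_bytes is a hypothetical helper, not a standard interface):

```c
#include <stddef.h>
#include <wchar.h>

/* Convert a NUL-terminated single-byte string to wide characters one
 * byte at a time with btowc().  Each byte is passed as (unsigned char)
 * so that, where plain char is signed, 0x80 is not passed as -128 and
 * 0xFF is not passed as -1 (EOF on common systems).  Returns the number
 * of wide characters written; stops early if dst is full or btowc()
 * returns WEOF.  dst is NOT NUL-terminated. */
size_t widen_bytes(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t i = 0;
    for (; i < dst_len && src[i] != '\0'; i++) {
        wint_t wc = btowc((unsigned char)src[i]);
        if (wc == WEOF)
            break;
        dst[i] = (wchar_t)wc;
    }
    return i;
}
```

A return value shorter than the input means either dst filled up or btowc() returned WEOF for the next byte, which, under the reporter's reading, may happen for bytes above 127 even in the POSIX locale.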

Date Modified	Username	Field	Change
2018-01-30 00:01	shware_systems	New Issue
2018-01-30 00:01	shware_systems	Name	=> Mark Ziegast
2018-01-30 00:01	shware_systems	Organization	=> SHware Systems Dev.
2018-01-30 00:01	shware_systems	Section	=> btowc()
2018-01-30 00:01	shware_systems	Page Number	=> C165 632
2018-01-30 00:01	shware_systems	Line Number	=> 21871-2
2018-01-30 09:31	geoffclare	Note Added: 0003914
2018-02-21 02:23	shware_systems	Note Added: 0003925
2019-02-21 16:01	nick	Relationship added	related to 0000663
2019-02-21 16:15	geoffclare	Interp Status	=> ---
2019-02-21 16:15	geoffclare	Status	New => Closed
2019-02-21 16:15	geoffclare	Resolution	Open => Rejected
