View Issue Details

ID: 0001182
Project: 1003.1(2016/18)/Issue7+TC2
Category: System Interfaces
View Status: public
Last Update: 2019-02-21 16:15
Reporter: shware_systems
Assigned To:
Priority: normal
Severity: Objection
Type: Enhancement Request
Status: Closed
Resolution: Rejected
Name: Mark Ziegast
Organization: SHware Systems Dev.
User Reference:
Section: btowc()
Page Number: C165 632
Line Number: 21871-2
Interp Status: ---
Final Accepted Text:
Summary: 0001182: CX behavior wasn't changed appropriately with TC2
Description:
The text currently is:
In the POSIX locale, btowc( ) shall not return WEOF if c has a value in the range 0 to 255 inclusive.

However, WEOF is appropriate for more values than not, even now, for three reasons: the POSIX locale, using the ISO-646-IRV encoding, considers the values over 127 unassigned; TC2 emphasizes that all 8 bits are significant; and some of the currently optional control characters of the portable character set (especially <ESC>) are usable as single or locking shift chars. Even the range 0 to 127, not 255, is valid for that restriction only for ASCII-68, as it did not require <ESC>, <SI>, and <SO> to have their current definitions.
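
To see how a given implementation treats the quoted sentence, a minimal sketch in C follows. It is an illustration, not part of the report: the helper name count_weof_bytes is hypothetical, and it assumes setlocale() accepts the locale name passed in (for example "POSIX").

```c
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper: select the locale called `name` and count how many
 * byte values in 0..255 btowc() refuses to convert (returns WEOF for).
 * Returns -1 if the locale cannot be selected. */
int count_weof_bytes(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;

    int rejected = 0;
    for (int c = 0; c <= 255; c++) {
        if (btowc(c) == WEOF)
            rejected++;
    }
    return rejected;
}
```

Under the sentence quoted above, count_weof_bytes("POSIX") would be 0. An implementation that treats the POSIX locale as 7-bit only would presumably report a nonzero count for the bytes above 127.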

Notes on the Desired Action:
The phrasing is meant to leave open support for optional locales making use of existing DBCS simple shift encodings. This is more for c2x to address than POSIX, however, so I don't put it as a Future Direction. Disallowing all shifts is more in keeping with the current definition of wchar_t.

Because the current ISO-646, EBCDIC, and 8-bit Unicode encodings require them for minimal conformance, my XBD6 draft defines and requires support for extended shifts, rather than leave them implicitly allowed as it has been, so they're part of the text.

I hope the NATIVE locale mentioned, which implementations are already required to provide as the "" argument to setlocale() and which is expected to differ from the POSIX locale, gets a more formal definition in Issue 8 of what implementations can do to be cross-platform portable for both POSIX and historical NLS requirements, such as the standard utilities correctly handling '$' possibly having multiple encodings. As no current implementation has bothered to attempt this, it amounts to invention, unfortunately. The "other defined locales" part includes POSIX on UTF8.

The error case is added to distinguish the cases where mbtowc() may succeed (errno is not set) from those where it won't (errno is set to EILSEQ). I consider this obvious, but an Application Usage note may be desirable.
Desired Action:
Replace the section with:
In the POSIX, NATIVE, and some other standard defined locales, btowc( ) shall not return WEOF if c is the single byte encoding for a member of the portable character set of charclass <graphic>, or for a member of charclass <control> that is not a single shift introducer, simple or extended, or a single byte locking shift introducer.

In ERRORS, replace:
No errors are defined.

with:
The btowc( ) function may fail if:
[EILSEQ] The c argument is an unassigned single or lead encoding unit of the current locale's codeset.
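
How a caller might use the proposed error case can be sketched as follows. This is only an illustration under the proposal's assumptions: the current standard defines no errors for btowc(), so an implementation need not set errno at all, and the enum and function names here are hypothetical.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical classification of a single byte in the current locale. */
enum byte_class {
    BYTE_CONVERTED,      /* btowc() produced a wide character */
    BYTE_NOT_STANDALONE, /* WEOF, errno untouched: mbtowc() may still succeed */
    BYTE_UNASSIGNED      /* WEOF with EILSEQ: the proposed new error case */
};

enum byte_class classify_byte(unsigned char b)
{
    errno = 0; /* so a stale EILSEQ is not mistaken for a new one */
    if (btowc(b) != WEOF)
        return BYTE_CONVERTED;
    /* Only an implementation that adopts the proposed text is expected
     * to set EILSEQ here; this is an assumption, not current behavior. */
    return errno == EILSEQ ? BYTE_UNASSIGNED : BYTE_NOT_STANDALONE;
}
```

On an implementation that never sets errno in btowc(), every rejected byte falls into BYTE_NOT_STANDALONE, which matches the existing "No errors are defined" wording.
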
Tags: No tags attached.

Relationships

related to 0000663 Closed ajosey 1003.1(2008)/Issue 7 Specification of str[n]casecmp is ambiguous

Activities

geoffclare

2018-01-30 09:31

manager   bugnote:0003914

The statement "In the POSIX locale, btowc() shall not return WEOF if c has a value in the range 0 to 255 inclusive" is precisely what was intended by the changes made in TC2. If there is any inconsistency between this statement and other parts of the standard post-TC2, then it is those other parts that need to change.

See in particular the interpretation rationale in 0000663: "The intention was always that the POSIX locale should have an 8-bit-clean single-byte encoding. The omission of an explicit statement to that effect was an oversight."

shware_systems

2018-02-21 02:23

reporter   bugnote:0003925

Re: 3914
That makes things look cleaner for btowc(), but we established back in 2009 that the POSIX locale is required to function dirty, as a multi-byte encoding, and more recently that this is due to factors that predate and are outside the control of the standard. Bug 663 was approved because we hadn't established then that those additional factors had to be considered binding. In other words, the standard can't change that much, so that rationale has to.

As to this report, according to the Oct 2009 "multibyte C locale" email thread, you were the one who pointed out where XBD7 conflicts with XBD6 in trying to make that assertion of intent, and who listed most of the code points of the portable charset that are not permitted to have wchar_t encodings because they function as shift codes of one type or another. You were right; earlier in that thread I was guilty of focusing too narrowly on what XBD6 alone was saying and just muddling the debate. The code points POSIX doesn't require to be assigned to specific functions have to be treated the same, imo now (and then, but I forgot), because encodings like UTF-8 or 8859-1 do make use of some of them in that manner. TC2 saying the intent was that char, CHAR_MIN and CHAR_MAX be unsigned for the POSIX locale is fine; this does not mean all values between MIN and MAX magically become valid for btowc() to successfully convert, however. All it means is the interface shouldn't reject 128 because an implementation wants to say char is the same as signed char as an extension. This is different from what that change requires of implementations.
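
The signedness point can be illustrated with a small sketch in C. It is not taken from the thread, and the wrapper name byte_to_wide is hypothetical. Where plain char is signed, a byte such as 0xFF stored in a char can have the value -1 and so collide with EOF when passed directly to btowc(); converting through unsigned char first avoids that.

```c
#include <wchar.h>

/* Hypothetical wrapper: convert one byte held in a plain char.
 * The cast keeps high-bit bytes in the range 128..255 even when
 * plain char is signed, so none of them is mistaken for EOF. */
wint_t byte_to_wide(char ch)
{
    return btowc((unsigned char)ch);
}
```

Whether a high-bit byte then converts or yields WEOF still depends on the locale and the implementation, which is the question this report is about.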

Issue History

Date Modified Username Field Change
2018-01-30 00:01 shware_systems New Issue
2018-01-30 00:01 shware_systems Name => Mark Ziegast
2018-01-30 00:01 shware_systems Organization => SHware Systems Dev.
2018-01-30 00:01 shware_systems Section => btowc()
2018-01-30 00:01 shware_systems Page Number => C165 632
2018-01-30 00:01 shware_systems Line Number => 21871-2
2018-01-30 09:31 geoffclare Note Added: 0003914
2018-02-21 02:23 shware_systems Note Added: 0003925
2019-02-21 16:01 nick Relationship added related to 0000663
2019-02-21 16:15 geoffclare Interp Status => ---
2019-02-21 16:15 geoffclare Status New => Closed
2019-02-21 16:15 geoffclare Resolution Open => Rejected