ID | Category | Severity | Type | Date Submitted | Last Update | ||
0001170 | [1003.1(2016/18)/Issue7+TC2] System Interfaces | Editorial | Clarification Requested | 2017-11-20 19:59 | 2019-05-02 13:03 | ||
Reporter | dalias | View Status | public | ||||
Assigned To | |||||||
Priority | normal | Resolution | Duplicate | ||||
Status | Closed | ||||||
Name | Rich Felker | ||||||
Organization | musl libc | ||||||
User Reference | |||||||
Section | fgetwc | ||||||
Page Number | unknown | ||||||
Line Number | unknown | ||||||
Interp Status | --- | ||||||
Final Accepted Text | |||||||
Summary | 0001170: Error indicator for stream on encoding errors may conflict with ISO C | ||||||
Description |
The POSIX text reads:

"If a read error occurs, the error indicator for the stream shall be set, fgetwc() shall return WEOF, [CX] [Option Start] and shall set errno to indicate the error. [Option End] If an encoding error occurs, the error indicator for the stream shall be set, fgetwc() shall return WEOF, and shall set errno to indicate the error."

whereas the ISO C11 text reads:

"If a read error occurs, the error indicator for the stream is set and the fgetwc function returns WEOF. If an encoding error occurs (including too few bytes), the value of the macro EILSEQ is stored in errno and the fgetwc function returns WEOF."

Both versions use separate text for the read error and encoding error cases, but the ISO C version explicitly states that read errors cause the error indicator to be set, and omits any similar text from the encoding error case. Modifying the eof or error indicators for a stdio stream except as specified is an observably nonconforming behavior for an implementation. POSIX seems to require behavior that does not conform to the requirements of ISO C. |
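To make "observably" concrete, here is a minimal sketch of how a program could detect which behavior an implementation has. The helper name encoding_error_sets_indicator is hypothetical, not part of either standard. It assumes the caller has already selected a locale and opened a stream whose contents may trigger an encoding error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical probe (not from POSIX or ISO C): read wide characters
 * until fgetwc() returns WEOF, then report what the implementation did.
 *
 * Returns  1 if WEOF came with errno == EILSEQ and ferror() is set,
 *          0 if WEOF came with errno == EILSEQ and ferror() is clear,
 *         -1 if no encoding error was observed (plain EOF or read error).
 */
static int encoding_error_sets_indicator(FILE *fp)
{
    assert(fp != NULL);
    clearerr(fp);               /* start from clean eof/error indicators */

    for (;;) {
        errno = 0;              /* so EILSEQ can only come from this call */
        if (fgetwc(fp) == WEOF)
            break;
    }

    if (errno != EILSEQ)
        return -1;
    return ferror(fp) ? 1 : 0;
}
```

A result of 1 matches the POSIX wording quoted above. A result of 0 is what a literal reading of the C11 wording would suggest.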
||||||
Desired Action |
Inquire with WG14 to determine if there is an intent that the C standard require the implementation not to set the error indicator on encoding errors. If so, remove the requirement to do so from POSIX, since it conflicts with ISO C. If not (i.e. if setting the error indicator is deemed to conform with ISO C, but not to be a requirement for C conformance), then the requirement to do so in POSIX should be CX-shaded. |
||||||
Notes | |
(0003882) schwarze (reporter) 2017-11-21 00:11 |
The possibility that this might be an oversight in the C standard itself should also be considered. Your suggestion on how to proceed makes a lot of sense: if that is the case, the C standard committee might decide to fix the oversight in the C standard.

Specifically, to me, not requiring the error indicator to be set does not make much sense. In the case of a state-dependent locale, the shift state is no longer known after this kind of error, so reading is no longer safe, and subsequent read operations had better return failure as well. What is worse, I'm not aware of any way to reinitialize the shift state, and I don't even think such a way could reasonably be defined, because the reader really can't know what the writer of the file intended when writing those invalid bytes. So the only safe way for the program to proceed seems to be to close the now broken file descriptor.

Note that even with a state-independent, but variable-size locale like UTF-8, continuing to read is not fully safe because we don't know where the next character may start. So with a naive UTF-8 implementation of fgetwc(3), getting back to a working state may take up to four subsequent calls to fgetwc() (if the first byte of a four-byte character was corrupted in the file), or even an arbitrarily larger number of calls, for example if somebody wrote an invalid six-byte sequence to the file that was intended to represent a single character. |
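The UTF-8 resynchronization problem described in this note can be illustrated on an in-memory buffer. This is only a sketch: utf8_resync is a hypothetical helper, and nothing here claims that any fgetwc() implementation works this way. It relies on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so a decoder that has lost its place can skip them to reach the next possible character start.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: starting at index `pos`, skip UTF-8 continuation
 * bytes (10xxxxxx) and return the index of the first byte that could
 * begin a new character, or `len` if there is none.
 */
static size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    assert(buf != NULL);
    while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

If the lead byte of a four-byte character is corrupted, three continuation bytes are left behind. A decoder that rejects one byte per call fails once on the bad lead byte and once on each of those three, which is the "up to four" count mentioned above.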
(0003883) nick (manager) 2017-11-21 16:05 |
Note also the comments in footnote 341 (C11):
This leads to the following expectations for the programmer if fgetwc returns WEOF:
+ test with feof to see if there are no more characters available. If feof returns true, errno may have been set to EILSEQ if the last character was incomplete
+ test with ferror to see if an error occurred. If ferror returns true, then errno should contain information about that error (including EILSEQ)
However, since footnotes are only informative, this footnote only clarifies that the WG14 committee intended that the error indicator is permitted to be set in this case and that POSIX should mark this as CX shaded |
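The checks listed in this note might be combined as in the following sketch. The enum and the function read_wide are hypothetical names introduced for illustration. The code deliberately consults errno before the stream indicators, so it gives the same answer whether or not the implementation sets the error indicator on an encoding error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical result codes for one fgetwc() call. */
enum wread_status {
    WREAD_OK,               /* a wide character was stored in *out       */
    WREAD_EOF,              /* end of file, no error reported            */
    WREAD_ENCODING_ERROR,   /* errno == EILSEQ (including too few bytes) */
    WREAD_READ_ERROR        /* error indicator set for another reason    */
};

static enum wread_status read_wide(FILE *fp, wint_t *out)
{
    assert(fp != NULL && out != NULL);

    errno = 0;                      /* make EILSEQ attributable to this call */
    wint_t wc = fgetwc(fp);
    int saved_errno = errno;        /* capture before any further calls */

    if (wc != WEOF) {
        *out = wc;
        return WREAD_OK;
    }
    if (saved_errno == EILSEQ)      /* holds with or without ferror()/feof() */
        return WREAD_ENCODING_ERROR;
    if (ferror(fp))
        return WREAD_READ_ERROR;
    return WREAD_EOF;
}
```

A caller would loop on read_wide() until it returns something other than WREAD_OK and then decide whether to stop, report, or close the stream.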
(0003884) geoffclare (manager) 2017-11-21 16:23 |
This is a duplicate of 0001022. |
(0003885) dalias (reporter) 2017-11-21 16:30 |
I still think some feedback from the C standard side is needed. Even if that footnote were normative, I don't read it as suggesting that encoding errors might set the error indicator. It (1) tells you how to distinguish eof from a "read error" (not an "encoding error"; these are different concepts as can be seen in the text above the footnote), and (2) tells you how to use errno to identify an encoding error. |
(0004382) nick (manager) 2019-05-02 13:03 |
WG14 have agreed to add normative text to C2X: Change 7.29.3.1p3 to require the error indicator to be set in this case:
|