0001170: Error indicator for stream on encoding errors may conflict with ISO C

Notes
(0003882) schwarze (reporter) 2017-11-21 00:11	The possibility that this might be an oversight in the C standard itself should also be considered. Your suggestion on how to proceed makes a lot of sense because, if that is the case, the C standard committee might decide to fix the oversight in the C standard. Specifically, to me, not requiring the error indicator to be set does not make much sense. In the case of a state-dependent locale, the shift state is no longer known after this kind of error, so reading is no longer safe, and subsequent read operations had better return failure as well. What is worse, I'm not aware of any way to reinitialize the shift state, and I don't even think such a way could reasonably be defined, because the reader really can't know what the writer of the file intended when writing those invalid bytes. So the only safe way for the program to proceed seems to be to close the now broken file descriptor. Note that even with a state-independent but variable-size locale like UTF-8, continuing to read is not fully safe because we don't know where the next character may start. So with a naive UTF-8 implementation of fgetwc(3), getting back to a working state may take up to four subsequent tries of fgetwc() - if the first byte of a four-byte character was corrupted in the file - or even an arbitrarily larger number of tries, for example if somebody wrote an invalid six-byte sequence to the file that was intended to represent a single character.
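The resynchronization cost described in this note can be illustrated with a small sketch. It is not how any particular fgetwc() is implemented; it only counts how many bytes a naive decoder, which discards one byte per failed attempt, would throw away before reaching a byte that could start a UTF-8 sequence. The function names are hypothetical, and the lead-byte range 0xC2..0xF4 is the RFC 3629 range, which is an assumption about what the decoder accepts.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b could start a UTF-8 sequence
   (ASCII, or a lead byte in the RFC 3629 range 0xC2..0xF4). */
int is_plausible_lead(unsigned char b)
{
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}

/* Hypothetical helper: starting at buf[pos], where decoding failed,
   count the bytes a naive decoder discards one at a time before it
   reaches a plausible lead byte or the end of the buffer. */
size_t bytes_to_resync(const unsigned char *buf, size_t len, size_t pos)
{
    size_t skipped = 0;
    while (pos + skipped < len && !is_plausible_lead(buf[pos + skipped]))
        skipped++;
    return skipped;
}
```

With a four-byte character whose first byte was corrupted, this sketch skips four bytes; with an obsolete six-byte sequence it skips six, which matches the "arbitrarily larger number of tries" concern. A plausible lead byte is only a heuristic: the bytes after it may still be invalid.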

(0003883) nick (manager) 2017-11-21 16:05	Note also the comments in footnote 341 (C11):

An end-of-file and a read error can be distinguished by use of the feof and ferror functions. Also, errno will be set to EILSEQ by input/output functions only if an encoding error occurs.

This leads to the following expectations for the programmer if fgetwc returns WEOF:
+ test with feof to see if there are no more characters available. If feof returns true, errno may have been set to EILSEQ if the last character was incomplete
+ test with ferror to see if an error occurred. If ferror returns true, then errno should contain information about that error (including EILSEQ)

However, since footnotes are only informative, this footnote only clarifies that the WG14 committee intended that the error indicator is permitted to be set in this case and that POSIX should mark this as CX shaded.
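The two tests above could be combined roughly as follows. This is a minimal sketch, not text from either standard: the enum, classify_weof() and read_wide() are hypothetical names, and the extra case for EILSEQ with neither indicator set reflects the behavior this issue is about (C11 does not require the error indicator to be set on an encoding error).

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical classification of why fgetwc returned WEOF. */
enum weof_cause {
    WEOF_CLEAN_EOF,       /* feof true, errno not EILSEQ */
    WEOF_TRUNCATED_CHAR,  /* feof true, errno == EILSEQ (incomplete last character) */
    WEOF_ENCODING_ERROR,  /* ferror true, errno == EILSEQ */
    WEOF_READ_ERROR,      /* ferror true, some other errno value */
    WEOF_ENCODING_NOFLAG, /* errno == EILSEQ but neither indicator is set */
    WEOF_UNKNOWN
};

/* Pure decision logic, separated from the stream so it is easy to check. */
enum weof_cause classify_weof(int at_eof, int has_error, int err)
{
    if (has_error)
        return err == EILSEQ ? WEOF_ENCODING_ERROR : WEOF_READ_ERROR;
    if (at_eof)
        return err == EILSEQ ? WEOF_TRUNCATED_CHAR : WEOF_CLEAN_EOF;
    return err == EILSEQ ? WEOF_ENCODING_NOFLAG : WEOF_UNKNOWN;
}

/* Read one wide character; on WEOF, store the cause in *cause. */
wint_t read_wide(FILE *fp, enum weof_cause *cause)
{
    errno = 0;                 /* so a stale EILSEQ is not misread */
    wint_t wc = fgetwc(fp);
    if (wc == WEOF) {
        int err = errno;       /* save before calling feof/ferror */
        *cause = classify_weof(feof(fp), ferror(fp), err);
    }
    return wc;
}
```

Clearing errno before the call matters because fgetwc is not required to reset it on success or on a plain end-of-file.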

(0003884) geoffclare (manager) 2017-11-21 16:23	This is a duplicate of 0001022.

(0003885) dalias (reporter) 2017-11-21 16:30	I still think some feedback from the C standard side is needed. Even if that footnote were normative, I don't read it as suggesting that encoding errors might set the error indicator. It (1) tells you how to distinguish eof from a "read error" (not an "encoding error"; these are different concepts as can be seen in the text above the footnote), and (2) tells you how to use errno to identify an encoding error.

(0004382) nick (manager) 2019-05-02 13:03	WG14 have agreed to add normative text to C2X. Change 7.29.3.1p3 to require the error indicator to be set in this case:

If a read error occurs, the error indicator for the stream is set and the fgetwc function returns WEOF. If an encoding error occurs (including too few bytes), the error indicator for the stream is set and the value of the macro EILSEQ is stored in errno and the fgetwc function returns WEOF.
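If an implementation follows the wording quoted above, a read loop can rely on ferror alone to detect both read errors and encoding errors. The sketch below assumes exactly that; count_wide_chars() is a hypothetical helper, and on implementations that predate this change an encoding error might leave the error indicator clear, in which case this function would report a clean end-of-file.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts wide characters until fgetwc returns WEOF.
   Returns 0 on a clean end-of-file, -1 if the error indicator is set
   (read error, or encoding error under the quoted C2X wording). */
int count_wide_chars(FILE *fp, size_t *count)
{
    size_t n = 0;
    while (fgetwc(fp) != WEOF)
        n++;
    *count = n;
    return ferror(fp) ? -1 : 0;
}
```

A caller that needs to tell the two error kinds apart can still inspect errno for EILSEQ after a -1 return.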

Issue History
Date Modified	Username	Field	Change
2017-11-20 19:59	dalias	New Issue
2017-11-20 19:59	dalias	Name	=> Rich Felker
2017-11-20 19:59	dalias	Organization	=> musl libc
2017-11-20 19:59	dalias	Section	=> fgetwc
2017-11-20 19:59	dalias	Page Number	=> unknown
2017-11-20 19:59	dalias	Line Number	=> unknown
2017-11-21 00:11	schwarze	Note Added: 0003882
2017-11-21 16:05	nick	Note Added: 0003883
2017-11-21 16:23	geoffclare	Note Added: 0003884
2017-11-21 16:23	geoffclare	Relationship added	duplicate of 0001022
2017-11-21 16:30	dalias	Note Added: 0003885
2019-02-04 16:32	Don Cragun	Interp Status	=> ---
2019-02-04 16:32	Don Cragun	Status	New => Closed
2019-02-04 16:32	Don Cragun	Resolution	Open => Duplicate
2019-05-02 13:03	nick	Note Added: 0004382
