Austin Group Defect Tracker

Aardvark Mark III

ID 0001170
Category [1003.1(2016)/Issue7+TC2] System Interfaces
Severity Editorial
Type Clarification Requested
Date Submitted 2017-11-20 19:59
Last Update 2017-11-21 16:30
Reporter dalias
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Rich Felker
Organization musl libc
User Reference
Section fgetwc
Page Number unknown
Line Number unknown
Interp Status ---
Final Accepted Text
Summary 0001170: Error indicator for stream on encoding errors may conflict with ISO C
Description The POSIX text reads:

"If a read error occurs, the error indicator for the stream shall be set, fgetwc() shall return WEOF, [CX] [Option Start] and shall set errno to indicate the error. [Option End] If an encoding error occurs, the error indicator for the stream shall be set, fgetwc() shall return WEOF, and shall set errno to indicate the error."

whereas the ISO C11 text reads:

"If a read error occurs, the error indicator for the stream is set and the fgetwc function returns WEOF. If an encoding error occurs (including too few bytes), the value of the macro EILSEQ is stored in errno and the fgetwc function returns WEOF."

Both versions use separate text for the read error and encoding error cases, but the ISO C version explicitly states that read errors cause the error indicator to be set, and omits any similar text from the encoding error case.

Modifying the eof or error indicators for a stdio stream except as specified is an observably nonconforming behavior for an implementation. POSIX seems to require behavior that does not conform to the requirements of ISO C.
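
Because the difference is observable, a small probe can report what a given implementation does. The following is only a sketch under stated assumptions: the helper name probe_error_indicator and the temporary file path are hypothetical, the byte 0xFF is assumed to be invalid in the chosen locale's encoding, and the result depends entirely on the C library under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <wchar.h>

/*
 * Hypothetical probe (not from POSIX or ISO C).
 * Returns  1 if the error indicator is set after an encoding error,
 *          0 if it is not set,
 *         -1 if the probe could not be run or no encoding error occurred.
 * Side effect: changes the process-wide LC_CTYPE locale.
 */
static int probe_error_indicator(const char *locale_name)
{
    char path[] = "/tmp/fgetwc-probe-XXXXXX"; /* assumed writable location */
    const unsigned char bad = 0xFF;           /* assumed invalid in the locale */
    int fd, result = -1;
    FILE *fp;

    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (write(fd, &bad, 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    /* Reopen so the stream has no orientation before fgetwc is called. */
    fp = fopen(path, "r");
    if (fp != NULL) {
        errno = 0;
        if (fgetwc(fp) == WEOF && errno == EILSEQ)
            result = ferror(fp) ? 1 : 0;
        fclose(fp);
    }
    unlink(path);
    return result;
}
```

A caller might try probe_error_indicator("C.UTF-8") where that locale name exists; a return of -1 only means the probe was inconclusive on that system.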
Desired Action Inquire with WG14 to determine if there is an intent that the C standard require the implementation not to set the error indicator on encoding errors. If so, remove the requirement to do so from POSIX, since it conflicts with ISO C. If not (i.e. if setting the error indicator is deemed to conform with ISO C, but not to be a requirement for C conformance), then the requirement to do so in POSIX should be CX-shaded.
Tags No tags attached.
Attached Files

- Relationships
duplicate of 0001022 New 1003.1(2013)/Issue7+TC1 error indicator for encoding errors in fgetwc(3)

-  Notes
schwarze (reporter)
2017-11-21 00:11

The possibility that this might be an oversight in the C standard itself should also be considered. Your suggestion on how to proceed makes a lot of sense because if that is the case, the C standard committee might come to the conclusion to fix the oversight in the C standard.

Specifically, to me, not requiring the error indicator to be set does not make much sense. In the case of a state-dependent locale, the shift state is no longer known after this kind of error, so reading is no longer safe, and subsequent read operations had better return failure as well. What is worse, I'm not aware of any way to reinitialize the shift state, and I don't even think such a way could reasonably be defined, because the reader really can't know what the writer of that file intended when writing those invalid bytes. So the only safe way for the program to proceed seems to be to close the now broken file descriptor.

Note that even with a state-independent, but variable-size locale like UTF-8, continuing to read is not fully safe because we don't know where the next character may start. So with a naive UTF-8 implementation of fgetwc(3), getting back to a working state may take up to four subsequent tries of fgetwc() - if the first byte of a four-byte character was corrupted in the file - or even an arbitrarily larger number of tries, for example if somebody wrote an invalid six-byte sequence to the file that was intended to represent a single character.
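
The retry counts mentioned above can be illustrated with a small simulation. This is a sketch, not any real fgetwc(3) implementation: the names utf8_seq_len and failed_reads_before_resync are hypothetical, the validator is deliberately simplified (it does not reject overlong forms or surrogates), and it assumes a naive reader that discards exactly one byte per failed attempt.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified UTF-8 check: returns the length (1..4) of a structurally
 * valid sequence at p, or 0 if the bytes are invalid or incomplete.
 */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    size_t need, i;

    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;
    else if (p[0] >= 0xC2 && p[0] <= 0xDF)
        need = 2;
    else if (p[0] >= 0xE0 && p[0] <= 0xEF)
        need = 3;
    else if (p[0] >= 0xF0 && p[0] <= 0xF4)
        need = 4;
    else
        return 0; /* stray continuation byte or invalid lead byte */

    if (avail < need)
        return 0;
    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;
    return need;
}

/*
 * Counts how many read attempts fail before the naive reader finds a
 * decodable character. Returns len if it never resynchronizes.
 */
static size_t failed_reads_before_resync(const unsigned char *buf, size_t len)
{
    size_t pos = 0, failures = 0;

    assert(buf != NULL);
    while (pos < len && utf8_seq_len(buf + pos, len - pos) == 0) {
        failures++;
        pos++; /* assumption: one byte is dropped per failed attempt */
    }
    return failures;
}
```

Under these assumptions, a four-byte character whose lead byte was corrupted to 0xFF costs four failed attempts before the following ASCII character is read, and an obsolete six-byte sequence starting with 0xFC costs six.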
nick (manager)
2017-11-21 16:05

Note also the comments in footnote 341 (C11):

"An end-of-file and a read error can be distinguished by use of the feof and ferror functions. Also, errno will be set to EILSEQ by input/output functions only if an encoding error occurs."

This leads to the following expectations for the programmer if fgetwc returns WEOF:

+ test with feof to see if there are no more characters available. If feof returns true, errno may have been set to EILSEQ if the last character was incomplete

+ test with ferror to see if an error occurred. If ferror returns true, then errno should contain information about that error (including EILSEQ)

However, since footnotes are only informative, this footnote only clarifies that the WG14 committee intended that the error indicator is permitted to be set in this case, and that POSIX should mark this as CX-shaded.
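
The checks listed above could be combined roughly as follows. This is a minimal sketch, not text from either standard: the enum and the names classify_weof and read_wide are hypothetical. It tests errno for EILSEQ first, so that it gives the same answer whether or not the implementation also sets the error indicator on an encoding error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

enum read_status {
    READ_OK,
    READ_EOF,
    READ_ENCODING_ERROR,
    READ_IO_ERROR,
    READ_UNKNOWN
};

/*
 * Decide why fgetwc returned WEOF from the saved errno value and the
 * results of feof() and ferror(). EILSEQ takes priority, so an
 * incomplete final character counts as an encoding error even when the
 * end-of-file indicator is also set.
 */
static enum read_status classify_weof(int saved_errno, int at_eof, int has_error)
{
    if (saved_errno == EILSEQ)
        return READ_ENCODING_ERROR;
    if (has_error)
        return READ_IO_ERROR;
    if (at_eof)
        return READ_EOF;
    return READ_UNKNOWN;
}

/* Read one wide character and report the outcome. */
static enum read_status read_wide(FILE *fp, wint_t *out)
{
    int saved_errno;

    assert(fp != NULL && out != NULL);
    errno = 0; /* clear first: errno is only meaningful after a failure */
    *out = fgetwc(fp);
    if (*out != WEOF)
        return READ_OK;
    saved_errno = errno; /* save before feof/ferror are called */
    return classify_weof(saved_errno, feof(fp), ferror(fp));
}
```

Keeping the decision in classify_weof, separate from the stream access, means the ordering of the checks can be changed in one place if WG14 or POSIX settles the question differently.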
geoffclare (manager)
2017-11-21 16:23

This is a duplicate of 0001022.
dalias (reporter)
2017-11-21 16:30

I still think some feedback from the C standard side is needed. Even if that footnote were normative, I don't read it as suggesting that encoding errors might set the error indicator. It (1) tells you how to distinguish eof from a "read error" (not an "encoding error"; these are different concepts as can be seen in the text above the footnote), and (2) tells you how to use errno to identify an encoding error.

- Issue History
Date Modified Username Field Change
2017-11-20 19:59 dalias New Issue
2017-11-20 19:59 dalias Name => Rich Felker
2017-11-20 19:59 dalias Organization => musl libc
2017-11-20 19:59 dalias Section => fgetwc
2017-11-20 19:59 dalias Page Number => unknown
2017-11-20 19:59 dalias Line Number => unknown
2017-11-21 00:11 schwarze Note Added: 0003882
2017-11-21 16:05 nick Note Added: 0003883
2017-11-21 16:23 geoffclare Note Added: 0003884
2017-11-21 16:23 geoffclare Relationship added duplicate of 0001022
2017-11-21 16:30 dalias Note Added: 0003885
