View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000182 | 1003.1(2008)/Issue 7 | System Interfaces | public | 2009-11-12 19:23 | 2013-04-16 13:06 |
Reporter | Don Cragun | Assigned To | ajosey | ||
Priority | normal | Severity | Comment | Type | Error |
Status | Closed | Resolution | Accepted | ||
Name | Don Cragun | ||||
Organization | Self | ||||
User Reference | fgets() LINE_MAX | ||||
Section | fgets() EXAMPLES | ||||
Page Number | 852 | ||||
Line Number | 28298-28308 | ||||
Interp Status | --- | ||||
Final Accepted Text | |||||
Summary | 0000182: Unsafe use of LINE_MAX in fgets() example | ||||
Description | Several issues have been discussed on the Austin Group's e-mail list related to LINE_MAX. The discussion started with a discussion about the use of LINE_MAX in the example on the description of the fgets() function, and branched into a discussion about the differences between the definition of LINE_MAX in <limits.h> and the legal return values for a call to sysconf(_SC_LINE_MAX); This bug report intends to resolve all of the issues raised except for the use of LINE_MAX in an example on the description of the strtok() function which has already been addressed by the resolution of bug #177 (See 0000177.) The Utility Limits rationale (subclause C.1.3 on P3639, L123638-123646) already warns application writers that it is not a good idea to "blindly" allocate an array of size [LINE_MAX] and assume that it will be large enough to hold a full input line. Further guidance later in that section (P3640, L123685-123701) warns about creating lines longer than LINE_MAX and needing to be careful if those lines will later become input to a utility that is specified to process text files. I believe that it is safe to assume that applications using fgets() already assume that any stream being read with fgets() is connected to a text file. If the application does not have reason to believe that the input is from a text file, it should be using functions like read() or fread() rather than functions like fgets() and getline(). | ||||
Desired Action | In XBD, <limits.h>, Minimum Values section, P274, L9019-9020 change: A conforming implementation shall provide values at least this large. to: For each of these limits, a conforming implementation shall provide a value at least this large or shall have no limit. In XSH, fgets(), EXAMPLES section, P852, L28298-28308 change: The following example uses fgets() to read each line of input. {LINE_MAX}, which defines the maximum size of the input line, is defined in the <limits.h> header. #include <stdio.h> ... char line[LINE_MAX]; ... while (fgets(line, LINE_MAX, fp) != NULL) { ... } ... to: The following example uses fgets() to read lines of input. It assumes that the file it is reading is a text file and that lines in this text file are no longer than 16384 (or {LINE_MAX} if it is less than 16384 on the implementation where it is running) bytes long. (Note that the standard utilities have no line length limit if sysconf(_SC_LINE_MAX) returns -1 without setting errno. This example assumes that sysconf(_SC_LINE_MAX) will not fail.) #include <limits.h> #include <stdio.h> #include <unistd.h> #define MYLIMIT 16384 char *line; int line_max; if (LINE_MAX >= MYLIMIT) { // Use maximum line size of MYLIMIT. If LINE_MAX is // bigger than our limit, sysconf() can't report a // smaller limit. line_max = MYLIMIT; } else { long limit = sysconf(_SC_LINE_MAX); line_max = (limit < 0 || limit > MYLIMIT) ? MYLIMIT : (int)limit; } // line_max + 1 leaves room for the nul byte added by fgets(). line = malloc(line_max + 1); if (line == NULL) { // out of space ... return error; } while (fgets(line, line_max + 1, fp) != NULL) { // Verify that a full line has been read... // If not report an error or prepare to treat the // next time through the loop as a read of a // continuation of the current line. ... // Process line... ... } free(line); ... | ||||
Tags | tc1-2008 |
|
It may be safe to assume fgets expects a stream of text as if it were reading a text file, but it is not safe to assume fgets won't be reading from stdin, i.e., a pipe or the keyboard. The wording should reflect that. Although fgets is defined strictly as working with bytes, someday that may change. As a result I think the example better not assume that one char equals one byte. I would thus change: line = malloc(line_max + 1); to: line = malloc(sizeof(char)*(line_max + 1)); |
|
sizeof(char) == 1, can't change, or would break all existing application. The WG14 will never change this statement of the Standard. best practice is char * line = malloc((line_max + 1) * sizeof(*line)); so one could easily change type of line: wchar_t * line = malloc((line_max + 1) * sizeof(*line)); |
|
> it is not safe to assume fgets won't be reading from stdin, i.e., a > pipe or the keyboard. The wording should reflect that. The wording does already reflect that. You seem to be misreading "file" as "regular file". In the standard a "file" is "an object that can be written to, or read from, or both" (see XBD7 3.164). |
|
The example above deals with the POSIX-specified requirement that text files could have infinite line length (i.e., sysconf (_SC_LINE_MAX) returns a negative number). It does so by imposing an arbitrary, hard coded limit on the maximum line length it is willing to process, and deals with lines larger than that by commenting that longer lines are either treated as an error or as a continuation line. When processing lines of unbounded length, fgetc() should be used in preference to fgets(): faced with the requirement to deal with lines piecemeal, an application draws no benefit from the buffer which is the hallmark of fgets() -- inappropriate to an example demonstrating fgets(). Thus, too long lines should be treated as errors in the example: the final, non-NULL character in the buffer must be a newline, and the code could/should show this. Faced with the choice between an application "guessing" and hard coding a limit, and using a system-supplied value, I think the better solution is the latter. Thus, I would use the macro (LINE_MAX) defined in <limits.h> as the fallback value. In fact, a conforming application may rely on LINE_MAX exclusively, omitting the query to sysconf() all together. For simplicity, such an approach should be considered -- it happens to be the original example (except that an extra position needs to be reserved for the trailing NULL). If using _SC_LINE_MAX is wished, I would use the following example: The following example uses fgets() to read lines of input. It assumes that the file it is reading is a text file. On a system that does not bound line lengths, it can process only lines that are no longer than LINE_MAX. (Note that the standard utilities have no line length limit if sysconf(_SC_LINE_MAX) returns -1 without setting errno. This example assumes that sysconf(_SC_LINE_MAX) will not fail.) #include <limits.h> #include <stdio.h> #include <unistd.h> #include <string.h> char *line; int line_max; line_max = sysconf(_SC_LINE_MAX); if (0 > line_max) line_max = LINE_MAX; // add room for the nul byte added by fgets(). ++line_max; line = malloc(line_max); if (line == NULL) { // out of space ... return error; } while (fgets(line, line_max, fp) != NULL) { // Verify that a full line has been read... if ('\n' != line [strlen (line) - 1]) { ... return error; } // Process line... ... } free(line); ... |
|
There was a lengthy discussion about how limits are handled in the standard in the email discussion of this defect report on the austin-group-l alias. Despite the desires of some, the decision made during today's conference call is that the current text in the standard accurately describes the intent of the balloting group when the standard was written and as it has been revised over the years since then. The full discussion of this issue can be found in the mail archives and in the some of the comments in a related bug (see bugID:177) concerning an example in strtok(). The email sequence numbers in the archives are: 12789, 13018-10320, 13023, 13025, 13026, 13028, 13031-13033, 13036, 13047, 13048, 13052, 13065, 13083-13092, 13100, 13116, 13117, and 13121-13124. |
|
While I'm sorry I wasn't able to make the teleconference, and have no basic objection to the example change, after further thought on the matter the change to <limits.h> I feel must be considered illegal. There is NO PRIOR OR CURRENT ART, as the standard REQUIRES, of any actual implementation that does not have implicit, due to hardware constraints, or arbitrarily set explicit limits. As even Mr. Cragun admitted in the mailing list discussion this is a 'holy grail' of operating system design, not something he's aware of having been implemented either, the use of a symbolic 'indefinite' to represent 'shall have no limit' is not permitted. While many functions in C are specified in terms of an indefinite ideal, and I certainly understanf the desire to have the standard match such ideals as near as practical, the underlying implementations of those functions ALL deal with specific limits in one form or another, whether heuristically or analytically derived from hardware constraints, and it is the purpose of sysconf() and the <limits.h> and other headers to expose those values so applications don't need to duplicate the logic to discover them on their own. It is the text of sysconf() that must be changed so the blanket allowance of 'indefinite' be excised. Desires for future hardware or not as the original intent, it shouldn't legally have been there to begin with. If a valid reason for 'indefinite' is established as mandated by the hardware hosting an implemenation that always needs to be accounted for by any application, for a specific _SC_* index value, the text describing that index can indicate this and the Rationale should explain in detail the circumstances which lead to it being mandated. No current or past hardware I'm aware of has this requirement; I include the caveat only on grounds that future computers based on theoretical technologies like quantum effect transistors and hyperspatial holographic memory systems could need them. |
Date Modified | Username | Field | Change |
---|---|---|---|
2009-11-12 19:23 | Don Cragun | New Issue | |
2009-11-12 19:23 | Don Cragun | Status | New => Under Review |
2009-11-12 19:23 | Don Cragun | Assigned To | => ajosey |
2009-11-12 19:23 | Don Cragun | Name | => Don Cragun |
2009-11-12 19:23 | Don Cragun | Organization | => Self |
2009-11-12 19:23 | Don Cragun | User Reference | => fgets() LINE_MAX |
2009-11-12 19:23 | Don Cragun | Section | => fgets() EXAMPLES |
2009-11-12 19:23 | Don Cragun | Page Number | => 852 |
2009-11-12 19:23 | Don Cragun | Line Number | => 28298-28308 |
2009-11-12 19:23 | Don Cragun | Interp Status | => --- |
2009-11-12 20:22 | wpollock | Note Added: 0000302 | |
2009-11-12 20:31 | nsitbon | Note Added: 0000303 | |
2009-11-13 10:07 | geoffclare | Note Added: 0000304 | |
2009-11-16 10:14 | Konrad_Schwarz | Note Added: 0000307 | |
2009-12-03 16:25 | Don Cragun | Resolution | Open => Accepted |
2009-12-03 16:25 | Don Cragun | Desired Action Updated | |
2009-12-03 16:26 | Don Cragun | Status | Under Review => Resolved |
2009-12-03 18:53 | Don Cragun | Note Added: 0000321 | |
2009-12-05 21:05 | SHwareSystems | Note Added: 0000323 | |
2010-09-09 15:30 | Don Cragun | Tag Attached: tc1-2008 | |
2013-04-16 13:06 | ajosey | Status | Resolved => Closed |