Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details Jump to Notes ] Issue History ] Print ]
ID Category Severity Type Date Submitted Last Update
0000182 [1003.1(2008)/Issue 7] System Interfaces Comment Error 2009-11-12 19:23 2013-04-16 13:06
Reporter Don Cragun View Status public  
Assigned To ajosey
Priority normal Resolution Accepted  
Status Closed  
Name Don Cragun
Organization Self
User Reference fgets() LINE_MAX
Section fgets() EXAMPLES
Page Number 852
Line Number 28298-28308
Interp Status ---
Final Accepted Text
Summary 0000182: Unsafe use of LINE_MAX in fgets() example
Description Several issues have been discussed on the Austin Group's e-mail
list related to LINE_MAX. The discussion started with a discussion
about the use of LINE_MAX in the example on the description of the
fgets() function, and branched into a discussion about the differences
between the definition of LINE_MAX in <limits.h> and the legal
return values for a call to sysconf(_SC_LINE_MAX); This bug report
intends to resolve all of the issues raised except for the use of
LINE_MAX in an example on the description of the strtok() function
which has already been addressed by the resolution of bug #177 (See
0000177.)

The Utility Limits rationale (subclause C.1.3 on P3639, L123638-123646)
already warns application writers that it is not a good idea to
"blindly" allocate an array of size [LINE_MAX] and assume that it
will be large enough to hold a full input line. Further guidance
later in that section (P3640, L123685-123701) warns about creating
lines longer than LINE_MAX and needing to be careful if those lines
will later become input to a utility that is specified to process
text files.

I believe that it is safe to assume that applications using fgets()
already assume that any stream being read with fgets() is connected
to a text file. If the application does not have reason to believe
that the input is from a text file, it should be using functions
like read() or fread() rather than functions like fgets() and
getline().
Desired Action
In XBD, <limits.h>, Minimum Values section, P274, L9019-9020 change:
    A conforming implementation shall provide values at least this
    large.
to:
    For each of these limits, a conforming implementation shall
    provide a value at least this large or shall have no limit.

In XSH, fgets(), EXAMPLES section, P852, L28298-28308 change:
    The following example uses fgets() to read each line of
    input.  {LINE_MAX}, which defines the maximum size of the
    input line, is defined in the <limits.h> header.

    #include <stdio.h>
    ...
    char line[LINE_MAX];
    ...
    while (fgets(line, LINE_MAX, fp) != NULL) {
        ...
    }
    ...
to:
    The following example uses fgets() to read lines of input.
    It assumes that the file it is reading is a text file and
    that lines in this text file are no longer than 16384 (or
    {LINE_MAX} if it is less than 16384 on the implementation
    where it is running) bytes long.  (Note that the standard
    utilities have no line length limit if sysconf(_SC_LINE_MAX)
    returns -1 without setting errno.  This example assumes that
    sysconf(_SC_LINE_MAX) will not fail.)

    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MYLIMIT 16384

    char    *line;
    int     line_max;

    if (LINE_MAX >= MYLIMIT) {
        // Use maximum line size of MYLIMIT.  If LINE_MAX is
        // bigger than our limit, sysconf() can't report a
        // smaller limit.
        line_max = MYLIMIT;
    } else {
        long limit = sysconf(_SC_LINE_MAX);
        line_max = (limit < 0 || limit > MYLIMIT) ? MYLIMIT : (int)limit;
    }
    // line_max + 1 leaves room for the nul byte added by fgets().
    line = malloc(line_max + 1);
    if (line == NULL) {
        // out of space
        ...
        return error;
    }
    while (fgets(line, line_max + 1, fp) != NULL) {
        // Verify that a full line has been read...
        //    If not report an error or prepare to treat the
        //    next time through the loop as a read of a
        //    continuation of the current line.
        ...
        // Process line...
        ...
    }
    free(line);
    ...

Tags tc1-2008
Attached Files

- Relationships

-  Notes
(0000302)
wpollock (reporter)
2009-11-12 20:22

It may be safe to assume fgets expects a stream of text as if it were reading a text file, but it is not safe to assume fgets won't be reading from stdin, i.e., a pipe or the keyboard. The wording should reflect that.

Although fgets is defined strictly as working with bytes, someday that may change. As a result I think the example better not assume that one char equals one byte. I would thus change:

line = malloc(line_max + 1);

to:

line = malloc(sizeof(char)*(line_max + 1));
(0000303)
nsitbon (reporter)
2009-11-12 20:31

sizeof(char) == 1, can't change, or would break all existing application. The WG14 will never change this statement of the Standard. best practice is
char * line = malloc((line_max + 1) * sizeof(*line));
so one could easily change type of line:
wchar_t * line = malloc((line_max + 1) * sizeof(*line));
(0000304)
geoffclare (manager)
2009-11-13 10:07

> it is not safe to assume fgets won't be reading from stdin, i.e., a
> pipe or the keyboard. The wording should reflect that.

The wording does already reflect that. You seem to be misreading
"file" as "regular file". In the standard a "file" is "an object that
can be written to, or read from, or both" (see XBD7 3.164).
(0000307)
Konrad_Schwarz (reporter)
2009-11-16 10:14

The example above deals with the POSIX-specified requirement that text files could have infinite line length (i.e., sysconf (_SC_LINE_MAX) returns a negative number).

It does so by imposing an arbitrary, hard coded limit on the maximum line length it is willing to process, and deals with lines larger than that by commenting that longer lines are either treated as an error or as a continuation line.

When processing lines of unbounded length, fgetc() should be used in preference to fgets(): faced with the requirement to deal with lines piecemeal, an application draws no benefit from the buffer which is the hallmark of fgets() -- inappropriate to an example demonstrating fgets().

Thus, too long lines should be treated as errors in the example: the final, non-NULL character in the buffer must be a newline, and the code could/should show this.

Faced with the choice between an application "guessing" and hard coding a limit, and using a system-supplied value, I think the better solution is the latter. Thus, I would use the macro (LINE_MAX) defined in <limits.h> as the fallback value.

In fact, a conforming application may rely on LINE_MAX exclusively, omitting the query to sysconf() all together. For simplicity, such an approach should be considered -- it happens to be the original example (except that an extra position needs to be reserved for the trailing NULL).

If using _SC_LINE_MAX is wished, I would use the following example:

    The following example uses fgets() to read lines of input.
    It assumes that the file it is reading is a text file. On
    a system that does not bound line lengths, it can process
    only lines that are no longer than LINE_MAX. (Note that the
    standard utilities have no line length limit if sysconf(_SC_LINE_MAX)
    returns -1 without setting errno. This example assumes that
    sysconf(_SC_LINE_MAX) will not fail.)

    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <string.h>

    char *line;
    int line_max;

    line_max = sysconf(_SC_LINE_MAX);
    if (0 > line_max)
         line_max = LINE_MAX;

    // add room for the nul byte added by fgets().
    ++line_max;

    line = malloc(line_max);
    if (line == NULL) {
        // out of space
        ...
        return error;
    }

    while (fgets(line, line_max, fp) != NULL) {
        // Verify that a full line has been read...
        if ('\n' != line [strlen (line) - 1]) {
            ...
            return error;
        }
        
        // Process line...
        ...
    }
    free(line);
    ...
(0000321)
Don Cragun (manager)
2009-12-03 18:53

There was a lengthy discussion about how limits are handled in the standard in the email
discussion of this defect report on the austin-group-l alias. Despite the desires of some,
the decision made during today's conference call is that the current text in the standard
accurately describes the intent of the balloting group when the standard was written and
as it has been revised over the years since then. The full discussion of this issue can be
found in the mail archives and in the some of the comments in a related bug (see
bugID:177) concerning an example in strtok().

The email sequence numbers in the archives are: 12789, 13018-10320, 13023, 13025,
13026, 13028, 13031-13033, 13036, 13047, 13048, 13052, 13065, 13083-13092,
13100, 13116, 13117, and 13121-13124.
(0000323)
SHwareSystems (reporter)
2009-12-05 21:05

While I'm sorry I wasn't able to make the teleconference, and have no basic objection to the example change, after further thought on the matter the change to <limits.h> I feel must be considered illegal. There is NO PRIOR OR CURRENT ART, as the standard REQUIRES, of any actual implementation that does not have implicit, due to hardware constraints, or arbitrarily set explicit limits. As even Mr. Cragun admitted in the mailing list discussion this is a 'holy grail' of operating system design, not something he's aware of having been implemented either, the use of a symbolic 'indefinite' to represent 'shall have no limit' is not permitted.

While many functions in C are specified in terms of an indefinite ideal, and I certainly understanf the desire to have the standard match such ideals as near as practical, the underlying implementations of those functions ALL deal with specific limits in one form or another, whether heuristically or analytically derived from hardware constraints, and it is the purpose of sysconf() and the <limits.h> and other headers to expose those values so applications don't need to duplicate the logic to discover them on their own. It is the text of sysconf() that must be changed so the blanket allowance of 'indefinite' be excised. Desires for future hardware or not as the original intent, it shouldn't legally have been there to begin with.

If a valid reason for 'indefinite' is established as mandated by the hardware hosting an implemenation that always needs to be accounted for by any application, for a specific _SC_* index value, the text describing that index can indicate this and the Rationale should explain in detail the circumstances which lead to it being mandated. No current or past hardware I'm aware of has this requirement; I include the caveat only on grounds that future computers based on theoretical technologies like quantum effect transistors and hyperspatial holographic memory systems could need them.

- Issue History
Date Modified Username Field Change
2009-11-12 19:23 Don Cragun New Issue
2009-11-12 19:23 Don Cragun Status New => Under Review
2009-11-12 19:23 Don Cragun Assigned To => ajosey
2009-11-12 19:23 Don Cragun Name => Don Cragun
2009-11-12 19:23 Don Cragun Organization => Self
2009-11-12 19:23 Don Cragun User Reference => fgets() LINE_MAX
2009-11-12 19:23 Don Cragun Section => fgets() EXAMPLES
2009-11-12 19:23 Don Cragun Page Number => 852
2009-11-12 19:23 Don Cragun Line Number => 28298-28308
2009-11-12 19:23 Don Cragun Interp Status => ---
2009-11-12 20:22 wpollock Note Added: 0000302
2009-11-12 20:31 nsitbon Note Added: 0000303
2009-11-13 10:07 geoffclare Note Added: 0000304
2009-11-16 10:14 Konrad_Schwarz Note Added: 0000307
2009-12-03 16:25 Don Cragun Resolution Open => Accepted
2009-12-03 16:25 Don Cragun Desired Action Updated
2009-12-03 16:26 Don Cragun Status Under Review => Resolved
2009-12-03 18:53 Don Cragun Note Added: 0000321
2009-12-05 21:05 SHwareSystems Note Added: 0000323
2010-09-09 15:30 Don Cragun Tag Attached: tc1-2008
2013-04-16 13:06 ajosey Status Resolved => Closed


Mantis 1.1.6[^]
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker