Austin Group Defect Tracker

Aardvark Mark IV


ID: 0000167
Category: [1003.1(2008)/Issue 7] System Interfaces
Severity: Objection
Type: Error
Date Submitted: 2009-10-12 15:32
Last Update: 2013-04-16 13:06
Reporter: geoffclare
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted As Marked
Status: Closed
Name: Geoff Clare
Organization: The Open Group
User Reference:
Section: getenv
Page Number: 1008
Line Number: 33856
Interp Status: Approved
Final Accepted Text: Note: 0000281
Summary: 0000167: getenv() and modifying environ directly
Description:
The description of getenv() states:

    If the application modifies environ or the pointers to which it
    points, the behavior of getenv() is undefined.

According to the getenv() rationale (line 33880 onwards), this
statement is there so that implementations can make use of additional
data structures internally to speed up environment searches.

However, several functions effectively use getenv() internally,
e.g. tzset(), localtime() and mktime() obtain the value of TZ,
setlocale() obtains the LANG and LC_* variable values, etc. None
of these functions have an equivalent statement about undefined
behaviour if the application modifies environ or the pointers to
which it points. So, currently it appears that these functions
are required to cope with direct changes made to environ (or its
pointers) whereas getenv() is not. In fact, any function can
make use of environment variables (e.g. the variables listed by
confstr() with _CS_V7_ENV), so this problem is not just restricted
to functions where the use of environment variables is specified in
the standard.
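
The dependency described above can be observed from an application. The following is a minimal sketch, not part of the defect report: the helper name epoch_hour_with_tz() is hypothetical, and it assumes the implementation accepts POSIX-format TZ strings such as "XYZ3". It changes TZ only through setenv(), so it does not touch environ directly.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: returns the local hour (0-23) at the Epoch
 * after setting TZ to tz_value with setenv() and calling tzset().
 * Returns -1 if the environment could not be updated.
 */
int epoch_hour_with_tz(const char *tz_value)
{
    assert(tz_value != NULL);

    /* Change TZ through the standard interface, not via environ. */
    if (setenv("TZ", tz_value, 1) != 0)
        return -1;

    /* tzset() obtains TZ from the environment, much as getenv() would. */
    tzset();

    time_t epoch = 0;
    struct tm local;
    if (localtime_r(&epoch, &local) == NULL)
        return -1;
    return local.tm_hour;
}
```

Under these assumptions, epoch_hour_with_tz("XYZ3") would be expected to return 21, because the POSIX TZ string XYZ3 names a zone three hours west of UTC. The result changes with TZ even though the application never calls getenv() itself.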

I can see three possible solutions:

1. Remove the undefined behaviour statement from the getenv() page.
   This would mean that if implementations want to use additional data
   structures internally to speed up searches, they have to do so in a
   way that can cope with direct changes to environ or its pointers.

2. Extend the getenv() undefined behaviour statement to apply to all
   functions in the standard. This would enforce, via normative
   text, the informal statement in the getenv() rationale from
   POSIX.1a that "Conforming applications are required not to modify
   environ directly, but to use only the functions described here to
   manipulate the process environment as an abstract object."
   (Currently this is not reflected in normative text. Applications
   can modify environ directly as long as they do not subsequently
   call getenv(), setenv() or unsetenv().)

3. Extend the getenv() undefined behaviour statement to apply to the
   standard functions that are required to obtain environment variable
   values. If other functions obtain environment variable values,
   they must do so in a "safe" way. Applications would have a list
   of functions that they must avoid calling if they have modified
   environ or its pointers.

In the suggested changes I have plumped for option 1. This
effectively restores the status quo from before POSIX.1a (except
that setenv() and unsetenv() are now standard whereas before they
were extensions.)

A side issue is that some of the rationale explaining the reason
for this undefined behaviour in getenv() is out of date. For
example:

    Thus, the implementation of the environment access functions has
    complete control over the data structure used to represent the
    environment (subject to the requirement that environ be maintained
    as a list of strings with embedded <equals-sign> characters for
    applications that wish to scan the environment). This constraint
    allows the implementation to properly manage the memory it
    allocates, either by using allocated storage for all variables
    (copying them on the first invocation of setenv() or unsetenv()),
    or keeping track of which strings are currently in allocated space
    and which are not, via a separate table or some other means. This
    enables the implementation to free any allocated space used by
    strings (and perhaps the pointers to them) stored in environ when
    unsetenv() is called.

This rationale came in from POSIX.1a. It was fine to make such a
statement in that standard because it only defined getenv(), setenv()
and unsetenv(). However, with the merger into SUSv3 some of it is no
longer true because of the XSI putenv() function. This function
places its argument directly into environ[]. Thus the memory for
variables added using putenv() is under the application's control,
not the implementation's.
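
The ownership point can be illustrated with a short sketch. It is not taken from the standard; the variable name DEMO_PUTENV and both helper names are hypothetical. It relies only on the putenv() requirement that the string passed in becomes part of the environment, so that altering the string alters the environment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Application-owned storage; the name DEMO_PUTENV is hypothetical. */
static char demo_var[] = "DEMO_PUTENV=first";

/* Returns 1 if getenv() yields a pointer into demo_var itself. */
int putenv_shares_storage(void)
{
    if (putenv(demo_var) != 0)
        return 0;
    const char *value = getenv("DEMO_PUTENV");
    assert(value != NULL);
    return value == demo_var + strlen("DEMO_PUTENV=");
}

/*
 * Overwrites the value in place without calling any environment
 * function, then returns what getenv() reports. Call this only
 * after putenv_shares_storage() has added demo_var.
 */
const char *overwrite_in_place(void)
{
    memcpy(demo_var + strlen("DEMO_PUTENV="), "other", strlen("other"));
    return getenv("DEMO_PUTENV");
}
```

After putenv_shares_storage() succeeds, overwrite_in_place() changes what getenv() returns without the implementation being told, which is why the implementation cannot assume it manages that memory.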

This side issue needs to be taken into account when updating the
rationale for whichever of the 3 options to solve the main issue
the group decides on.

Another related issue is that setenv() and unsetenv() have undefined
behaviour if environ or its pointers have been modified directly,
but putenv() does not. Since option 1 abandons the POSIX.1a idea of
allowing "unsafe" data structures to speed up searches, the suggested
changes also remove these statements for setenv() and unsetenv().
If the group decides to go with option 2 or 3, the statements should
be kept and there may be a case for adding an undefined behaviour
statement for putenv().

The suggested changes also delete the statements that follow the
undefined behaviour statements for setenv() and unsetenv(). They say
that the functions "update the list of pointers to which environ
points". This seems redundant, as the functions have to update those
pointers in order to make the changes to the environment that they
are required to make.
Desired Action:
Delete the paragraph:

    If the application modifies environ or the pointers to which it
    points, the behavior of getenv() is undefined.

Replace the third through last paragraphs of RATIONALE (lines 33880 to
33906) with:

    Some earlier versions of this standard stated that the behavior of
    getenv() is undefined if the application modifies environ or the
    pointers to which it points. The idea was that if the environment
    was only modified using standard functions, then implementations
    could make use of additional data structures internally to speed
    up environment searches. However, there were no equivalent
    statements about undefined behavior for other functions that
    obtain the value of environment variables (effectively using
    getenv() internally). This meant that implementations could only
    perform "unsafe" faster environment searches in getenv() itself,
    not in other functions such as tzset() or setlocale().

    In this version of the standard, the restrictions on updating
    environ have been removed, so that the requirements for getenv()
    are the same as for other functions that obtain the value of
    environment variables. Implementations can still use additional
    data structures internally to speed up searches, but if they do
    then they must do it in a way that can cope with direct changes
    to environ or the pointers to which it points.

At page 1857 line 59347 section setenv, delete the paragraph:

    If the application modifies environ or the pointers to which it
    points, the behavior of setenv() is undefined. The setenv()
    function shall update the list of pointers to which environ points.

At page 2161 line 68256 section unsetenv, delete the paragraph:

    If the application modifies environ or the pointers to which it
    points, the behavior of unsetenv() is undefined. The unsetenv()
    function shall update the list of pointers to which environ points.

Tags: tc1-2008
Attached Files:

Relationships
related to 0000386 (Closed, ajosey): 1003.1(2008)/Issue 7: environ should be declared in <unistd.h>
related to 0000273 (Closed, ajosey): 1003.1(2004)/Issue 6: putenv() missing requirement
related to 0000438 (Closed, ajosey): 2008-TC1: XSH/TC1/D1/0235 partially redundant

Notes
(0000259)
nick (manager)
2009-10-15 15:59
edited on: 2009-10-15 16:25

Consensus on 2009-10-15 telecon is to go for option 2.

Interpretation response
------------------------
The standard states the requirements for environ, and conforming
implementations must conform to this. However, concerns have been
raised about this which are being referred to the sponsor.

Rationale:
-------------
None.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Remove at line 5476:
    "manipulating the environ variable,"
Add new paragraph at line 5478:

    If the application modifies environ or the pointers to which it
    points, the behavior of all interfaces described in the System
    Interfaces volume of POSIX.1-2008 is undefined.

Also remove the undefined behavior notes in getenv (line 33856),
setenv (line 59347) and unsetenv (line 68256).

Modify the rationale for getenv on page 1009, line 33885 from:

    This constraint allows the implementation to properly manage the
    memory it allocates, either by using allocated storage for all
    variables (copying them on the first invocation of setenv() or
    unsetenv()), or keeping track of which strings are currently in
    allocated space and which are not, via a separate table or some
    other means. This enables the implementation to free any
    allocated space used by strings (and perhaps the pointers to
    them) stored in environ when unsetenv() is called.

to:

    This constraint allows the implementation to properly manage the
    memory it allocates. This enables the implementation to free any
    space it has allocated to strings (and perhaps the pointers to
    them) stored in environ when unsetenv() is called.

Add putenv() to the list of functions that modify environ on line 33897.

Add to the rationale for getenv (page 1009, line 33897):

    For these reasons, any application that directly modifies the
    environ variable, or the pointers to which it points, has
    undefined behavior.

Also add to exec (page 772, line 25713):

    Any application that directly modifies the environ variable, or
    the pointers to which it points, has undefined behavior.

(0000281)
ajosey (manager)
2009-11-05 16:28

Interpretation response
------------------------
The standard states the requirements for environ, and conforming
implementations must conform to this. However, concerns have been
raised about this which are being referred to the sponsor.

Rationale:
-------------
None.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

Changes to XBD...

At page 173 line 5476 section 8.1, change:

    manipulating the environ variable

to:

    assigning a new value to the environ variable

Add a new paragraph after line 5478:

    If the application modifies the pointers to which environ
    points, the behavior of all interfaces described in the System
    Interfaces volume of POSIX.1-2008 is undefined.

Changes to XSH...

Remove the undefined behavior note in getenv (line 33856), setenv
(line 59347) and unsetenv (line 68256). The change history should
note that these are no longer needed because XBD 8.1 says the
behaviour of all interfaces is undefined.

Modify the rationale for getenv on page 1009, line 33885 from:

    Conforming applications are required not to modify environ
    directly, but to use only the functions described here to
    manipulate the process environment as an abstract object. Thus,
    the implementation of the environment access functions has
    complete control over the data structure used to represent the
    environment (subject to the requirement that environ be
    maintained as a list of strings with embedded <equals-sign>
    characters for applications that wish to scan the environment).
    This constraint allows the implementation to properly manage
    the memory it allocates, either by using allocated storage for all
    variables (copying them on the first invocation of setenv() or
    unsetenv()), or keeping track of which strings are currently in
    allocated space and which are not, via a separate table or some
    other means. This enables the implementation to free any allocated
    space used by strings (and perhaps the pointers to them) stored in
    environ when unsetenv() is called. A C runtime start-up procedure
    (that which invokes main() and perhaps initializes environ) can
    also initialize a flag indicating that none of the environment has
    yet been copied to allocated storage, or that the separate table
    has not yet been initialized.

    In fact, for higher performance of getenv(), the implementation
    could also maintain a separate copy of the environment in a data
    structure that could be searched much more quickly (such as an
    indexed hash table, or a binary tree), and update both it and the
    linear list at environ when setenv() or unsetenv() is invoked.

to:

    Conforming applications are required not to directly modify the
    pointers to which environ points, but to use only the setenv(),
    unsetenv() and putenv() functions, or assignment to environ
    itself, to manipulate the process environment. This constraint
    allows the implementation to properly manage the memory it
    allocates. This enables the implementation to free any space it
    has allocated to strings (and perhaps the pointers to them)
    stored in environ when unsetenv() is called. A C runtime start-up
    procedure (that which invokes main() and perhaps initializes
    environ) can also initialize a flag indicating that none of the
    environment has yet been copied to allocated storage, or that the
    separate table has not yet been initialized. If the application
    switches to a complete new environment by assigning a new value
    to environ, this can be detected by getenv(), setenv(), unsetenv()
    or putenv() and the implementation can at that point reinitialize
    based on the new environment. (This may include copying the
    environment strings into a new array and assigning environ to
    point to it.)

    In fact, for higher performance of getenv(), implementations
    that do not provide putenv() could also maintain a separate copy
    of the environment in a data structure that could be searched
    much more quickly (such as an indexed hash table, or a binary
    tree), and update both it and the linear list at environ when
    setenv() or unsetenv() is invoked. On implementations that do
    provide putenv(), such a copy might still be worthwhile but
    would need to allow for the fact that applications can directly
    modify the content of environment strings added with putenv().
    For example, if an environment string found by searching the
    copy is one that was added using putenv(), the implementation
    would need to check that the string in environ still has the
    same name (and value, if the copy includes values), and whenever
    searching the copy produces no match the implementation would
    then need to search each environment string in environ that
    was added using putenv() in case any of them have changed their
    names and now match. Thus each use of putenv() to add to the
    environment would reduce the speed advantage of having the copy.
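
The revalidation described in the rationale above can be sketched as follows. This is an illustration only and does not describe any real implementation: the types env_cache and cache_entry and the functions cache_add() and cache_lookup() are hypothetical, and a small linear array stands in for the faster data structure (hash table or tree) that the rationale has in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 8
#define CACHE_NAME_LEN 32

/* One cached name plus the live "NAME=value" string it was taken from. */
struct cache_entry {
    char name[CACHE_NAME_LEN];
    char *live;
    int from_putenv;    /* nonzero: the application may edit *live */
};

struct env_cache {
    struct cache_entry entries[CACHE_MAX];
    size_t count;
};

/* Returns the value part if entry is "name=...", otherwise NULL. */
static char *value_if_named(char *entry, const char *name)
{
    size_t len = strlen(name);
    if (strncmp(entry, name, len) == 0 && entry[len] == '=')
        return entry + len + 1;
    return NULL;
}

/* Records a live string in the cache; returns 0 on success, -1 on failure. */
int cache_add(struct env_cache *cache, char *live, int from_putenv)
{
    assert(cache != NULL && live != NULL);
    const char *eq = strchr(live, '=');
    if (cache->count == CACHE_MAX || eq == NULL
        || (size_t)(eq - live) >= CACHE_NAME_LEN)
        return -1;
    struct cache_entry *e = &cache->entries[cache->count++];
    memcpy(e->name, live, (size_t)(eq - live));
    e->name[eq - live] = '\0';
    e->live = live;
    e->from_putenv = from_putenv;
    return 0;
}

/* Looks up name, revalidating entries the application could have edited. */
char *cache_lookup(struct env_cache *cache, const char *name)
{
    /* Pass 1: search the cached names. */
    for (size_t i = 0; i < cache->count; i++) {
        struct cache_entry *e = &cache->entries[i];
        if (strcmp(e->name, name) != 0)
            continue;
        /* Confirms that the live string still carries the cached name. */
        char *value = value_if_named(e->live, name);
        if (value != NULL)
            return value;
    }
    /* Pass 2: no match, so rescan putenv() strings in case one was renamed. */
    for (size_t i = 0; i < cache->count; i++) {
        struct cache_entry *e = &cache->entries[i];
        if (e->from_putenv) {
            char *value = value_if_named(e->live, name);
            if (value != NULL)
                return value;
        }
    }
    return NULL;
}
```

The second pass visits every string added with putenv() on each failed lookup, which is the cost the rationale refers to when it says each use of putenv() reduces the speed advantage of keeping a copy.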

After page 772 line 25712 section exec, add two new paragraphs:

    Applications can change the entire environment in a single
    operation by assigning the environ variable to point to an array
    of character pointers to the new environment strings.
    After assigning a new value to environ, applications should
    not rely on the new environment strings remaining part of the
    environment, as a call to getenv(), [XSI]putenv(),[/XSI]
    setenv(), unsetenv() or any function that is dependent on an
    environment variable may, on noticing that environ has changed,
    copy the environment strings to a new array and assign environ
    to point to it.

    Any application that directly modifies the pointers to which the
    environ variable points has undefined behavior.
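
A minimal sketch of the permitted form of replacement, assigning environ itself rather than modifying the pointers in the existing array, might look like this. The array fresh_env, its contents, and the helper names are hypothetical. A real replacement environment would also need the variables discussed in the exec change below.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical replacement environment; names and values are examples. */
static char path_entry[] = "PATH=/usr/bin:/bin";
static char mode_entry[] = "DEMO_MODE=fresh";
static char *fresh_env[] = { path_entry, mode_entry, NULL };

/*
 * Switches to the replacement environment by assigning environ.
 * The pointers in the previous array are left untouched.
 */
void switch_environment(void)
{
    environ = fresh_env;
}

/* Returns 1 if getenv(name) is set and equals expected, else 0. */
int env_equals(const char *name, const char *expected)
{
    assert(name != NULL && expected != NULL);
    const char *value = getenv(name);
    return value != NULL && strcmp(value, expected) == 0;
}
```

After switch_environment(), the application should not assume that environ still equals fresh_env once it has called getenv(), putenv(), setenv() or unsetenv(), since the implementation may copy the strings into a new array at that point.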

At page 779 line 25989 section exec, change:

    The environ array should not be accessed directly by the
    application.

    The new process might be invoked in a non-conforming environment
    if the envp array does not contain implementation-defined
    variables required by the implementation to provide a
    conforming environment. See the _CS_V7_ENV entry in <unistd.h>
    and confstr() for details.

to:

    When assigning a new value to the environ variable, applications
    should ensure that the environment to which it will point contains
    at least the following:

        a. Any implementation-defined variables required by the
           implementation to provide a conforming environment. See the
           _CS_V7_ENV entry in <unistd.h> and confstr() for details.

        b. A value for PATH which finds conforming versions of all
           standard utilities before any other versions.

    The same constraint applies to the envp array passed to execle()
    or execve(), in order to ensure that the new process image is
    invoked in a conforming environment.
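
An application building such an environment needs the values that confstr() reports. The following is a sketch of the usual two-call pattern. The helper name conf_string() is hypothetical, and what the returned strings contain is implementation-defined.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns a malloc()ed copy of the confstr() value for name, or NULL
 * if the value is unavailable. The caller frees the result.
 */
char *conf_string(int name)
{
    /* A NULL buffer with length 0 asks only for the required size. */
    size_t needed = confstr(name, NULL, 0);
    if (needed == 0)
        return NULL;

    char *buffer = malloc(needed);
    if (buffer == NULL)
        return NULL;

    /* The second call fills the buffer. */
    size_t written = confstr(name, buffer, needed);
    if (written == 0 || written > needed) {
        free(buffer);
        return NULL;
    }

    /* The reported size includes the terminating null byte. */
    assert(buffer[written - 1] == '\0');
    return buffer;
}
```

Calling conf_string(_CS_PATH) yields a PATH value that finds the standard utilities, and conf_string(_CS_V7_ENV) yields the list of variable settings referred to in item a, on implementations that define that name.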

At page 1717 line 54880 section putenv, after:

    The setenv() function is preferred over this function.

add (as part of the same paragraph):

    One reason is that putenv() is optional and therefore less
    portable. Another is that using putenv() can slow down
    environment searches, as explained in the RATIONALE for getenv().

At page 1717 line 54882 section putenv, change:

    The standard developers noted that putenv() is the only function
    available to add to the environment without permitting memory
    leaks.

to:

    Refer to the RATIONALE section in [xref to setenv()].

Add to the end of the setenv() rationale (P1858 L59381):

    See also the RATIONALE section in [xref to getenv()].

Issue History
Date Modified Username Field Change
2009-10-12 15:32 geoffclare New Issue
2009-10-12 15:32 geoffclare Status New => Under Review
2009-10-12 15:32 geoffclare Assigned To => ajosey
2009-10-12 15:32 geoffclare Name => Geoff Clare
2009-10-12 15:32 geoffclare Organization => The Open Group
2009-10-12 15:32 geoffclare Section => getenv
2009-10-12 15:32 geoffclare Page Number => 1008
2009-10-12 15:32 geoffclare Line Number => 33856
2009-10-12 15:32 geoffclare Interp Status => ---
2009-10-15 15:59 nick Note Added: 0000259
2009-10-15 16:00 nick Note Edited: 0000259
2009-10-15 16:02 nick Note Edited: 0000259
2009-10-15 16:07 nick Note Edited: 0000259
2009-10-15 16:16 nick Note Edited: 0000259
2009-10-15 16:20 nick Note Edited: 0000259
2009-10-15 16:22 nick Note Edited: 0000259
2009-10-15 16:25 nick Note Edited: 0000259
2009-10-15 16:26 nick Interp Status --- => Pending
2009-10-15 16:26 nick Final Accepted Text => Note: 0000259
2009-10-15 16:26 nick Status Under Review => Interpretation Required
2009-10-15 16:26 nick Resolution Open => Accepted As Marked
2009-10-15 16:26 nick Desired Action Updated
2009-11-03 17:12 geoffclare Resolution Accepted As Marked => Open
2009-11-05 16:28 ajosey Note Added: 0000281
2009-11-05 16:28 ajosey Resolution Open => Accepted As Marked
2009-11-05 16:29 ajosey Final Accepted Text Note: 0000259 => Note: 0000281
2009-11-06 06:55 ajosey Interp Status Pending => Proposed
2009-12-07 16:55 ajosey Interp Status Proposed => Approved
2010-09-21 11:28 geoffclare Tag Attached: tc1-2008
2011-02-23 20:25 eblake Relationship added related to 0000386
2011-04-15 10:15 ajosey Relationship added related to 0000273
2011-05-13 15:32 eblake Relationship added related to 0000438
2013-04-16 13:06 ajosey Status Interpretation Required => Closed

