View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000167 | 1003.1(2008)/Issue 7 | System Interfaces | public | 2009-10-12 15:32 | 2013-04-16 13:06 |
Reporter | geoffclare | Assigned To | ajosey | ||
Priority | normal | Severity | Objection | Type | Error |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Geoff Clare | ||||
Organization | The Open Group | ||||
User Reference | |||||
Section | getenv | ||||
Page Number | 1008 | ||||
Line Number | 33856 | ||||
Interp Status | Approved | ||||
Final Accepted Text | 0000167:0000281 | ||||
Summary | 0000167: getenv() and modifying environ directly | ||||
Description | The description of getenv() states: If the application modifies environ or the pointers to which it points, the behavior of getenv() is undefined. According to the getenv() rationale (line 33880 onwards), this statement is there so that implementations can make use of additional data structures internally to speed up environment searches. However, several functions effectively use getenv() internally, e.g. tzset(), localtime() and mktime() obtain the value of TZ, setlocale() obtains the LANG and LC_* variable values, etc. None of these functions have an equivalent statement about undefined behaviour if the application modifies environ or the pointers to which it points. So, currently it appears that these functions are required to cope with direct changes made to environ (or its pointers) whereas getenv() is not. In fact, any function can make use of environment variables (e.g. the variables listed by confstr() with _CS_V7_ENV), so this problem is not just restricted to functions where the use of environment variables is specified in the standard. I can see three possible solutions: 1. Remove the undefined behaviour statement from the getenv() page. This would mean that if implementations want to use additional data structures internally to speed up searches, they have to do so in a way that can cope with direct changes to environ or its pointers. 2. Extend the getenv() undefined behaviour statement to apply to all functions in the standard. This would enforce, via normative text, the informal statement in the getenv() rationale from POSIX.1a that "Conforming applications are required not to modify environ directly, but to use only the functions described here to manipulate the process environment as an abstract object." (Currently this is not reflected in normative text. Applications can modify environ directly as long as they do not subsequently call getenv(), setenv() or unsetenv().) 3. 
Extend the getenv() undefined behaviour statement to apply to the standard functions that are required to obtain environment variable values. If other functions obtain environment variable values, they must do so in a "safe" way. Applications would have a list of functions that they must avoid calling if they have modified environ or its pointers. In the suggested changes I have plumped for option 1. This effectively restores the status quo from before POSIX.1a (except that setenv() and unsetenv() are now standard whereas before they were extensions.) A side issue is that some of the rationale explaining the reason for this undefined behaviour in getenv() is out of date. For example: Thus, the implementation of the environment access functions has complete control over the data structure used to represent the environment (subject to the requirement that environ be maintained as a list of strings with embedded <equals-sign> characters for applications that wish to scan the environment). This constraint allows the implementation to properly manage the memory it allocates, either by using allocated storage for all variables (copying them on the first invocation of setenv() or unsetenv()), or keeping track of which strings are currently in allocated space and which are not, via a separate table or some other means. This enables the implementation to free any allocated space used by strings (and perhaps the pointers to them) stored in environ when unsetenv() is called. This rationale came in from POSIX.1a. It was fine to make such a statement in that standard because it only defined getenv(), setenv() and unsetenv(). However, with the merger into SUSv3 some of it is no longer true because of the XSI putenv() function. This function places its argument directly into environ[]. Thus the memory for variables added using putenv() is under the application's control, not the implementation's. 
This side issue needs to be taken into account when updating the rationale for whichever of the 3 options to solve the main issue the group decides on. Another related issue is that setenv() and unsetenv() have undefined behaviour if environ or its pointers have been modified directly, but putenv() does not. Since option 1 abandons the POSIX.1a idea of allowing "unsafe" data structures to speed up searches, the suggested changes also remove these statements for setenv() and unsetenv(). If the group decides to go with option 2 or 3, the statements should be kept and there may be a case for adding an undefined behaviour statement for putenv(). The suggested changes also delete the statements that follow the undefined behaviour statements for setenv() and unsetenv(). They say that the functions "update the list of pointers to which environ points". This seems redundant, as the functions have to update those pointers in order to make the changes to the environment that they are required to make. | ||||
Desired Action | Delete the paragraph: If the application modifies environ or the pointers to which it points, the behavior of getenv() is undefined. Replace the third to last paragraphs of RATIONALE (lines 33880 to 33906) with: Some earlier versions of this standard stated that the behavior of getenv() is undefined if the application modifies environ or the pointers to which it points. The idea was that if the environment was only modified using standard functions, then implementations could make use of additional data structures internally to speed up environment searches. However, there were no equivalent statements about undefined behavior for other functions that obtain the value of environment variables (effectively using getenv() internally). This meant that implementations could only perform "unsafe" faster environment searches in getenv() itself, not in other functions such as tzset() or setlocale(). In this version of the standard, the restrictions on updating environ have been removed, so that the requirements for getenv() are the same as for other functions that obtain the value of environment variables. Implementations can still use additional data structures internally to speed up searches, but if they do then they must do it in a way that can cope with direct changes to environ or the pointers to which it points. At page 1857 line 59347 section setenv, delete the paragraph: If the application modifies environ or the pointers to which it points, the behavior of setenv() is undefined. The setenv() function shall update the list of pointers to which environ points. At page 2161 line 68256 section unsetenv, delete the paragraph: If the application modifies environ or the pointers to which it points, the behavior of unsetenv() is undefined. The unsetenv() function shall update the list of pointers to which environ points. | ||||
Tags | tc1-2008 |
|
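The side issue raised in the Description, that putenv() places its argument directly into the environment so the memory stays under the application's control, can be illustrated with a minimal sketch. The variable name `GR_DEMO`, the buffer `demo_entry` and the function name are hypothetical and chosen only for this illustration; the sketch assumes a system that provides the XSI putenv() function.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request XSI interfaces such as putenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry owned by the application, not by the implementation.
 * It has static storage so that it outlives its use in the environment. */
static char demo_entry[] = "GR_DEMO=one";

/* Returns 1 if editing the application's own buffer changes what
 * getenv() reports, 0 if it does not, and -1 if putenv() fails. */
int putenv_aliases_caller_buffer(void)
{
    if (putenv(demo_entry) != 0)
        return -1;

    const char *before = getenv("GR_DEMO");
    if (before == NULL || strcmp(before, "one") != 0)
        return 0;

    /* Overwrite the value in place; no environment function is called
     * and neither environ nor its pointers are touched. */
    memcpy(demo_entry + strlen("GR_DEMO="), "two", 3);

    const char *after = getenv("GR_DEMO");
    return after != NULL && strcmp(after, "two") == 0;
}
```

Because the string handed to putenv() is the one that sits in the environment, an implementation that keeps a separate fast-lookup copy cannot assume such an entry is unchanged between calls.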
Consensus on 2009-10-15 telecon is to go for option 2. Interpretation response ------------------------ The standard states the requirements for environ, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor. Rationale: ------------- None. Notes to the Editor (not part of this interpretation): ------------------------------------------------------- Remove at 5476 "manipulating the environ variable," Add new paragraph at 5478: If the application modifies environ or the pointers to which it points, the behavior of all interfaces described in the System Interfaces volume of POSIX.1-2008 is undefined. Also need to remove undefined behavior note in getenv (line 33856), setenv (line 59347) and unsetenv (line 68256) Modify the rationale for getenv on page 1009, line 33885 from: This constraint allows the implementation to properly manage the memory it allocates, either by using allocated storage for all variables (copying them on the first invocation of setenv( ) or unsetenv( )), or keeping track of which strings are currently in allocated space and which are not, via a separate table or some other means. This enables the implementation to free any allocated space used by strings (and perhaps the pointers to them) stored in environ when unsetenv( ) is called. to This constraint allows the implementation to properly manage the memory it allocates. This enables the implementation to free any space it has allocated to strings (and perhaps the pointers to them) stored in environ when unsetenv() is called. Add putenv to the list of functions that modify environ on line 33897. Add to Rationale for getenv (page 1009, line 33897): For these reasons, any application that directly modifies the environ variable, or the pointers to which it points, has undefined behavior. 
Also exec page 772 line 25713: Any application that directly modifies the environ variable, or the pointers to which it points, has undefined behavior. |
|
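As an illustration of why the note above extends the restriction to all interfaces, the following sketch shows a function other than getenv() depending on an environment variable: the local time conversion reads TZ. The environment is changed only through setenv(), which remains a conforming way to do so. The function name `epoch_hour_in_zone` is hypothetical, and the sketch assumes the implementation accepts POSIX-format TZ strings such as `UTC0` and `EST5`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request setenv(), tzset(), localtime_r() */
#endif
#include <stdlib.h>
#include <time.h>

/* Returns the local hour at the Epoch (time_t value 0) under the given
 * TZ setting, or -1 if the environment could not be updated or the
 * conversion failed. */
int epoch_hour_in_zone(const char *tz)
{
    /* Update the environment through the standard interface rather than
     * by editing environ or the pointers to which it points. */
    if (setenv("TZ", tz, 1) != 0)
        return -1;
    tzset();   /* re-read TZ from the environment */

    time_t epoch = 0;
    struct tm result;
    if (localtime_r(&epoch, &result) == NULL)
        return -1;
    return result.tm_hour;
}
```

With `UTC0` the call should return 0, and with `EST5` (five hours west of UTC) it should return 19, showing that the time functions pick up the value of TZ without the application ever calling getenv() itself.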
Interpretation response ------------------------ The standard states the requirements for environ, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor. Rationale: ------------- None. Notes to the Editor (not part of this interpretation): ------------------------------------------------------- Changes to XBD... At page 173 line 5476 section 8.1, change: manipulating the environ variable to: assigning a new value to the environ variable Add a new paragraph after line 5478: If the application modifies the pointers to which environ points, the behavior of all interfaces described in the System Interfaces volume of POSIX.1-2008 is undefined. Changes to XSH... Remove the undefined behavior note in getenv (line 33856), setenv (line 59347) and unsetenv (line 68256). The change history should note that these are no longer needed because XBD 8.1 says the behaviour of all interfaces is undefined. Modify the rationale for getenv on page 1009, line 33885 from: Conforming applications are required not to modify environ directly, but to use only the functions described here to manipulate the process environment as an abstract object. Thus, the implementation of the environment access functions has complete control over the data structure used to represent the environment (subject to the requirement that environ be maintained as a list of strings with embedded <equals-sign> characters for applications that wish to scan the environment). This constraint allows the implementation to properly manage the memory it allocates, either by using allocated storage for all variables (copying them on the first invocation of setenv() or unsetenv()), or keeping track of which strings are currently in allocated space and which are not, via a separate table or some other means. 
This enables the implementation to free any allocated space used by strings (and perhaps the pointers to them) stored in environ when unsetenv() is called. A C runtime start-up procedure (that which invokes main() and perhaps initializes environ) can also initialize a flag indicating that none of the environment has yet been copied to allocated storage, or that the separate table has not yet been initialized. In fact, for higher performance of getenv(), the implementation could also maintain a separate copy of the environment in a data structure that could be searched much more quickly (such as an indexed hash table, or a binary tree), and update both it and the linear list at environ when setenv() or unsetenv() is invoked. to: Conforming applications are required not to directly modify the pointers to which environ points, but to use only the setenv(), unsetenv() and putenv() functions, or assignment to environ itself, to manipulate the process environment. This constraint allows the implementation to properly manage the memory it allocates. This enables the implementation to free any space it has allocated to strings (and perhaps the pointers to them) stored in environ when unsetenv() is called. A C runtime start-up procedure (that which invokes main() and perhaps initializes environ) can also initialize a flag indicating that none of the environment has yet been copied to allocated storage, or that the separate table has not yet been initialized. If the application switches to a complete new environment by assigning a new value to environ, this can be detected by getenv(), setenv(), unsetenv() or putenv() and the implementation can at that point reinitialize based on the new environment. (This may include copying the environment strings into a new array and assigning environ to point to it.) 
In fact, for higher performance of getenv(), implementations that do not provide putenv() could also maintain a separate copy of the environment in a data structure that could be searched much more quickly (such as an indexed hash table, or a binary tree), and update both it and the linear list at environ when setenv() or unsetenv() is invoked. On implementations that do provide putenv(), such a copy might still be worthwhile but would need to allow for the fact that applications can directly modify the content of environment strings added with putenv(). For example, if an environment string found by searching the copy is one that was added using putenv(), the implementation would need to check that the string in environ still has the same name (and value, if the copy includes values), and whenever searching the copy produces no match the implementation would then need to search each environment string in environ that was added using putenv() in case any of them have changed their names and now match. Thus each use of putenv() to add to the environment would reduce the speed advantage of having the copy. After page 772 line 25712 section exec, add two new paragraphs: Applications can change the entire environment in a single operation by assigning the environ variable to point to an array of character pointers to the new environment strings. After assigning a new value to environ, applications should not rely on the new environment strings remaining part of the environment, as a call to getenv(), [XSI]putenv(),[/XSI] setenv(), unsetenv() or any function that is dependent on an environment variable may, on noticing that environ has changed, copy the environment strings to a new array and assign environ to point to it. Any application that directly modifies the pointers to which the environ variable points has undefined behavior. At page 779 line 25989 section exec, change: The environ array should not be accessed directly by the application. 
The new process might be invoked in a non-conforming environment if the envp array does not contain implementation-defined variables required by the implementation to provide a conforming environment. See the _CS_V7_ENV entry in <unistd.h> and confstr() for details. to: When assigning a new value to the environ variable, applications should ensure that the environment to which it will point contains at least the following: a. Any implementation-defined variables required by the implementation to provide a conforming environment. See the _CS_V7_ENV entry in <unistd.h> and confstr() for details. b. A value for PATH which finds conforming versions of all standard utilities before any other versions. The same constraint applies to the envp array passed to execle() or execve(), in order to ensure that the new process image is invoked in a conforming environment. At page 1717 line 54880 section putenv, after: The setenv() function is preferred over this function. add (as part of the same paragraph): One reason is that putenv() is optional and therefore less portable. Another is that using putenv() can slow down environment searches, as explained in the RATIONALE for getenv(). At page 1717 line 54882 section putenv, change: The standard developers noted that putenv() is the only function available to add to the environment without permitting memory leaks. to: Refer to the RATIONALE section in [xref to setenv()]. Add to the end of the setenv() rationale (P1858 L59381): See also the RATIONALE section in [xref to getenv()]. |
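The new exec text in the note above allows an application to switch to a complete new environment by assigning a new value to environ. A minimal sketch of that usage follows. The array `fresh_env`, its entries and the function name are hypothetical; the PATH value is a placeholder, and a real application would also need to include any implementation-defined variables reported by confstr() with _CS_V7_ENV, as the note requires.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;   /* declared by the application, as POSIX expects */

/* Hypothetical replacement environment with static storage. */
static char entry_path[] = "PATH=/usr/bin:/bin";
static char entry_demo[] = "GR_DEMO=fresh";
static char *fresh_env[] = { entry_path, entry_demo, NULL };

/* Switches to the replacement environment in a single assignment and
 * returns 1 if getenv() then reports `expected` for `name`, else 0. */
int fresh_environment_has(const char *name, const char *expected)
{
    /* Assign environ itself; the pointers in the previous array are
     * not modified. */
    environ = fresh_env;

    /* The value is compared immediately rather than kept, since the
     * implementation may copy the strings to a new array and make
     * environ point to it. */
    const char *value = getenv(name);
    return value != NULL && strcmp(value, expected) == 0;
}
```

For example, `fresh_environment_has("GR_DEMO", "fresh")` is expected to return 1, while a lookup of a name that is absent from `fresh_env` returns 0 because the previous environment is no longer consulted.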
Date Modified | Username | Field | Change |
---|---|---|---|
2009-10-12 15:32 | geoffclare | New Issue | |
2009-10-12 15:32 | geoffclare | Status | New => Under Review |
2009-10-12 15:32 | geoffclare | Assigned To | => ajosey |
2009-10-12 15:32 | geoffclare | Name | => Geoff Clare |
2009-10-12 15:32 | geoffclare | Organization | => The Open Group |
2009-10-12 15:32 | geoffclare | Section | => getenv |
2009-10-12 15:32 | geoffclare | Page Number | => 1008 |
2009-10-12 15:32 | geoffclare | Line Number | => 33856 |
2009-10-12 15:32 | geoffclare | Interp Status | => --- |
2009-10-15 15:59 | nick | Note Added: 0000259 | |
2009-10-15 16:00 | nick | Note Edited: 0000259 | |
2009-10-15 16:02 | nick | Note Edited: 0000259 | |
2009-10-15 16:07 | nick | Note Edited: 0000259 | |
2009-10-15 16:16 | nick | Note Edited: 0000259 | |
2009-10-15 16:20 | nick | Note Edited: 0000259 | |
2009-10-15 16:22 | nick | Note Edited: 0000259 | |
2009-10-15 16:25 | nick | Note Edited: 0000259 | |
2009-10-15 16:26 | nick | Interp Status | --- => Pending |
2009-10-15 16:26 | nick | Final Accepted Text | => 0000167:0000259 |
2009-10-15 16:26 | nick | Status | Under Review => Interpretation Required |
2009-10-15 16:26 | nick | Resolution | Open => Accepted As Marked |
2009-10-15 16:26 | nick | Desired Action Updated | |
2009-11-03 17:12 | geoffclare | Resolution | Accepted As Marked => Open |
2009-11-05 16:28 | ajosey | Note Added: 0000281 | |
2009-11-05 16:28 | ajosey | Resolution | Open => Accepted As Marked |
2009-11-05 16:29 | ajosey | Final Accepted Text | 0000167:0000259 => 0000167:0000281 |
2009-11-06 06:55 | ajosey | Interp Status | Pending => Proposed |
2009-12-07 16:55 | ajosey | Interp Status | Proposed => Approved |
2010-09-21 11:28 | geoffclare | Tag Attached: tc1-2008 | |
2011-02-23 20:25 | eblake | Relationship added | related to 0000386 |
2011-04-15 10:15 | ajosey | Relationship added | related to 0000273 |
2011-05-13 15:32 | eblake | Relationship added | related to 0000438 |
2013-04-16 13:06 | ajosey | Status | Interpretation Required => Closed |