Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details Jump to Notes ] Issue History ] Print ]
ID Category Severity Type Date Submitted Last Update
0001064 [1003.1(2008)/Issue 7] Base Definitions and Headers Editorial Clarification Requested 2016-07-31 14:11 2020-04-16 09:13
Reporter EdSchouten View Status public  
Assigned To ajosey
Priority normal Resolution Accepted As Marked  
Status Applied  
Name Ed Schouten
Organization Nuxi, the Netherlands
User Reference
Section libgen.h
Page Number -
Line Number -
Interp Status ---
Final Accepted Text Make the changes specified in Note: 0003898.
Summary 0001064: basename() and dirname(): Specification is not complete enough to allow existing thread-unsafe implementations
Description The basename() and dirname() functions are a bit hairy: some operating systems implement them as functions modifying their input (Solaris), others copy the result into internal storage (the BSDs). POSIX seems to allow both these versions. For example, from the basename() article:

"The basename() function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to basename(). The basename() function need not be thread-safe."

What is problematic is that POSIX doesn't specify any error conditions. Existing implementations that use internal storage may fail for at least the following two reasons:

1. They use malloc() to allocate a sufficiently large buffer, but this call fails.
2. They use a fixed-size buffer, but the result is too large to be stored in such a buffer.

Instead, both the dirname() and basename() articles say "No errors are defined". Does this mean that failing is never allowed? If so, that would mean thread-unsafe implementations actually don't comply.
Desired Action There are two ways to solve this:

------ Solution 1: Accept the status quo ------
In the basename() article, change this sentence:
"The basename() function shall return a pointer to the final component of path."
To:
"Upon successful completion, the basename() function shall return a pointer to the final component of path. Otherwise, it shall return a null pointer and set errno to indicate the error."

In the dirname() article, change this sentence:
"The dirname() function shall return a pointer to a string that is the parent directory of path. If path is a null pointer or points to an empty string, a pointer to a string "." is returned."
To:
"Upon successful completion, the dirname() function shall return a pointer to a string that is the parent directory of path. If path is a null pointer or points to an empty string, a pointer to a string "." is returned. Otherwise, it shall return a null pointer and set errno to indicate the error."

In both the basename() and dirname() articles, replace:
"No errors are defined."
With:
"The dirname() function may fail if:
[ENOMEM] The space for storing the result cannot be allocated.
[ENAMETOOLONG] The result is longer than {PATH_MAX}."

------ Solution 2: Raising the bar ------
As existing implementations that copy the result into internal storage already break the standard according to the current phrasing, maybe we could use this opportunity to just strengthen the thread-safety requirements?

From the basename() and dirname() articles, remove the following two sentences:
"The basename() function need not be thread-safe."
"The dirname() function need not be thread-safe."

In the basename() article, change this sentence:
"The basename() function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to basename()."
To:
The basename() function may modify the string pointed to by path."

In the dirname() article, change this sentence:
"The dirname() function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to dirname()."
To:
"The dirname() function may modify the string pointed to by path."
Tags issue8
Attached Files

- Relationships
related to 0001073Appliedajosey 1003.1(2008)/Issue 7 dirname utility: algorithm for computing pathname string is stricter than the corresponding dirname() function 
related to 0001358Applied Issue 8 drafts basename/dirname difference for thread termination 
related to 0000721Closed 1003.1(2013)/Issue7+TC1 Internal storage vs static storage 

-  Notes
(0003315)
shware_systems (reporter)
2016-07-31 16:52

basename() may be implemented so no errors are plausible; it can return either a pointer to somewhere in the input without modifying it or to the static string ".".

I agree dirname() is more problematic, and there should probably be a dirname_r(path, out, outlen) version, but an alternative might be if creating a copy would fail, to a process wide static buffer or dynamically allocated one, then it shall ensure the char after the last '/' in the input is a NUL char. A static char value that holds a copy of any non-NUL char that is overwritten should be accessible also, as a refinement, and thread local similar to errno.

Then the only failure would be if the input is in write locked memory, which could be indicated by returning a null pointer. That is also a possibility for implementations that always modify the input so should be a caveat somewhere.
(0003316)
EdSchouten (updater)
2016-07-31 16:58

basename() also needs to strip trailing slashes, so there are inputs for which it needs to modify.
(0003317)
shware_systems (reporter)
2016-07-31 17:39

Yes, I skipped that thinking basename("foo//") as "/" or ".", not "foo". It can be implemented like dirname() via copy or overwrite, but is harder to check an overwrite occurs that indicates a char has been copied to the save buffer, if that's supported.
(0003318)
EdSchouten (updater)
2016-07-31 18:02
edited on: 2016-07-31 18:03

I'd argue that standardizing dirname_r()/basename_r() is also going to be out of the question. One of the problems already is that FreeBSD provides a basename_r() that is pretty badly designed:

char *basename_r(const char *in, char *out);

Notice the lack of an output buffer size. The function requires the output buffer to be of size PATH_MAX, which wouldn't work for us, as PATH_MAX may be left undefined. It's basically the same problem as what we had with readdir_r() all over again.

Even if a function like basename_r() would have a size argument, people will end up writing code like this:

char buf[PATH_MAX];
const char *output = basename_r(input, buf, sizeof(buf));

Meaning that, again, you'll end up with code that has arbitrary restrictions and doesn't work on systems that don't have PATH_MAX.

It would be a lot easier if a function like this would exist:

void extract_dirname_basename(const char *input, const char **dirname, size_t *dirnamelen, const char **basename, size_t *basenamelen);

As in, you pass in a pathname and it returns two substrings with lengths pointing to the dirname and the basename parts of the input. If you need one of those null terminated, just call strndup(dirname, dirnamelen) or strndup(basename, basenamelen) afterwards.

But then again, we're not in the business of designing new interfaces. Also, coming up with new interfaces doesn't solve the problem described in this bug report, namely that basename() and dirname() are underspecified.

(0003319)
shware_systems (reporter)
2016-07-31 23:20

The main advantage of those varieties is thread safety, as the out buffer can be thread local easily enough, not process wide. That's why I mention it as a 'should', anyways. Whether someone whose business it is actually implements them with the length parameter so they could be proposed as additions is the doubtful part, imo, not that they're not useful enough to consider adding. I believe the FreeBSD version always expects applications to use code like:

if (in != NULL && size_t il=strlen(in) > 0 ) {
char *out=malloc(il);
if (out == NULL) {handle_ENOMEM()};
char *base=basename_r(in, out); // or (in, out, il);
if (base != out) {handle_special_in();};
use_out();
free(out);
}

which avoids any dependance on PATH_MAX being defined, ENAMETOOLONG occuring, and doesn't waste a lot of space. Adding the length parameter permits safer use of buffers that can't be dynamically allocated due to hardware constraints or for speed considerations, as in many malloc()/free()s much slower than handling an occasional ENAMETOOLONG. That they might get abused as you suggest by lazy developers is not the standard's lookout :-).

As it is, I agree the current versions are underspecified, I simply don't see that modifying errno is actually needed as the cases cited can be avoided, or the overhead of thread local buffering option 2 might entail to avoid modifying the input as often as is practical.
(0003320)
EdSchouten (updater)
2016-08-01 07:03

No, that's not how basename_r() is supposed to be used. A buffer of (at least) MAXPATHLEN/PATH_MAX bytes must be provided. ENAMETOOLONG will still occur for long inputs, regardless of the size of the provided output buffer. Please check out the man page:

https://www.freebsd.org/cgi/man.cgi?query=basename&sektion=3 [^]

Also, if I interpret your response correctly, you are saying that implementations that don't modify their input can still be made to never fail. Namely, by modifying their input.... but only in the degenerate cases. That sounds like an incredibly bad idea.
(0003321)
geoffclare (manager)
2016-08-01 11:31

The problem description asks whether an ERRORS section saying "No errors are defined" means that failing is never allowed. The answer is no. The general rules in XSH 2.3 Error Numbers say "Implementations [...] may generate additional errors unless explicitly disallowed for a particular function." As an example of a function for which failing is never allowed, see getuid().

So the only potential problem I see here is that the standard does not specify how to check for errors. This doesn't mean the functions can't report an error - it just means if application writers want to check for errors they need to consult the implementation documentation for each implementation they are targeting.

If existing practice on all implementations where the functions can fail is for them to return a null pointer and set errno, then it may be worth making changes along the lines suggested under "solution 1" in the desired action.
(0003322)
EdSchouten (updater)
2016-08-01 12:14

Hi Geoff,

Thanks for clarifying. That indeed answers the question that I raised in my original post.

Right now the operating system that I 'represent' (FreeBSD) is in a transition, going from using an internal buffer to one that modifies its input unconditionally, for the sake of thread-safety. This means that the exact phrasing of solution 1 would not affect the project going forward.

It looks like OpenBSD, FreeBSD<=11 and Mac OS X can all return NULL and set errno to ENOMEM and ENAMETOOLONG under the conditions that I gave above. NetBSD, however, ends up truncating the result. In other words, there doesn't seem to be a uniform behaviour.
(0003887)
eblake (manager)
2017-11-30 16:12

The specification for basename() and dirname() is so loose that there is no effective safe paradigm for using them in a multi-threaded environment that works across all permitted implementations. GNU code using gnulib (such as coreutils) has opted instead to use different functions base_name() and dir_name() that malloc() results rather than modifying the input buffer, precisely because the standard functions are so difficult to use safely.
(0003888)
Don Cragun (manager)
2017-11-30 19:20

Make the changes noted in Solution #2 in the Desired Action...

Change P625, L21600 AND P737, L25113 basename() and dirname() RATIONALE sections from:
None.

to:
Earlier versions of this standard seemed to allow thread-safe and non-thread-safe implementations of basename() and dirname(), but did not allow implementations to return a NULL pointer and require that errno be set when that happened. The standard now requires thread-safe behavior for both of these functions.
(0003889)
geoffclare (manager)
2017-12-05 09:09

Reopening because:

1. We forgot to remove basename() and dirname() from the list in 2.9.1 of functions that need not be thread-safe.

2. A suggestion from Robert Elz on the mailing list (Dec 4, 2017) that the text should explicitly say the return value may point into the input string and warn that it may be invalidated.

3. Possible issues about the requirement to return "." : can the return value point to a string literal given that the return type is (char *) not (const char *), and if the "." is in a static buffer what happens if the application modifies it? (i.e. can basename() just return a statically-initialized buffer: static char dotbuf[] = "."; or does it have to ensure the buffer contains "." each time it returns dotbuf?)
(0003890)
geoffclare (manager)
2017-12-05 09:21

Another change we might consider is to add a statement to RETURN VALUE like the one on the getpid() page: "The getpid() function shall always be successful and no return value is reserved to indicate an error."
(0003891)
EdSchouten (updater)
2017-12-05 09:41

Hi Geoff,

Regarding the third bullet point, FreeBSD's implementation does this:

- When the provided string is NULL or empty, it returns a constant "." literal.
- When the provided string is non-empty, it stores "." in the input buffer and returns that.

In other words, it only returns a pointer to an external buffer if the provided buffer is too small.

The advantage of this approach is that in non-degenerate cases, it's safe for callers to make subsequent changes to the returned string (overwrite characters, successive basename()/dirname() calls, etc).

In the degenerate case where the constant "." literal is returned, this could cause crashes. Even though this is annoying, it won't introduce any race conditions on the buffer's contents.

Considering that many other implementations also return constant string literals, I suspect we should document that, right?
(0003898)
geoffclare (manager)
2017-12-14 16:32

(All page and line numbers are for the 2016 edition)

On page 513 lline 17908,17921 XSH section 2.9.1, remove basename() and dirname() from the list of functions that need not be thread-safe.

From the basename() and dirname() pages, remove the following two sentences:
The basename() function need not be thread-safe.
from P624, L21565, and:
The dirname() function need not be thread-safe.
from P736, L25075.

In basename(), page 624 line 21557-21558, change:
the string "/". If the string pointed to by path is exactly "//", it is implementation-defined whether '/' or "//" is returned.
to:
the string "/", except that if the string pointed to by path is exactly "//", it is implementation-defined whether "/" or "//" is returned.


In the basename() article on P624, L21561-21564, change this sentence:
The basename() function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to basename().
to:
The basename() function may modify the string pointed to by path, and may return a pointer into the input string. The returned pointer might be invalidated if the input string is subsequently modified or freed. If path is a null pointer or points to an empty string, or if the string pointed to by path consists entirely of the '/' character, the returned pointer may point to constant data that cannot be modified.


On page 624 line 21567 section basename() add a new paragraph to the RETURN VALUE section:
The basename() function shall always be successful and no return value is reserved to indicate an error.


In APPLICATION USAGE on page 625 line 21598 change:
None.
to:
Note that in some circumstances (in particular, when the returned string is required to be "/" or "."), the returned pointer might point into constant data. Therefore, if the application needs to modify the returned data, it should be copied first.


On P736 L25069-25071 change:
Trailing <tt>'/'</tt> characters in the path that are not also leading <tt>'/'</tt> characters shall not be counted as part of the path.

If path does not contain
to:
Trailing <tt>'/'</tt> characters in the pathname that are not also leading <tt>'/'</tt> characters shall not be counted as part of the pathname.

If the pathname does not contain


In the dirname() article on P736, L25073-25074, change this sentence:
The dirname() function may modify the string pointed to by path, and may return a pointer to static storage that may then be overwritten by a subsequent call to dirname().
to:
The dirname() function may modify the string pointed to by path, and may return a pointer into the input string. The returned pointer might be invalidated if the input string is subsequently modified or freed. If path does not contain a '/', is a null pointer, or points to an empty string the returned pointer may point to constant data that cannot be modified.


On page 736 line 25078 section dirname(), in RETURN VALUE change:
The dirname( ) function may modify the string pointed to by path, and may return a pointer to internal storage. The returned pointer might be invalidated or the storage might be overwritten by a subsequent call to dirname( ). The returned pointer might also be invalidated if the calling thread is terminated.
to:
The dirname() function shall always be successful and no return value is reserved to indicate an error.


In dirname() APPLICATION USAGE, P737, L25110-25111 change:
Since the meaning of the leading "//" is implementation-defined, dirname("//foo) may return either "//" or '/' (but nothing else).
to:
Since the meaning of the leading "//" is implementation-defined, dirname("//foo") may return either "//" or "/" (but nothing else).

Note that in some circumstances, the returned pointer might point into constant data. Therefore, if the application needs to modify the returned data, it should be copied first.


(note two quoting fixes: the closing double quotes character after //foo, and change from single quote marks to double quote marks around the returned single /)

Change P625, L21600 AND P737, L25113 basename() and dirname() RATIONALE sections from:
None.
to:
Earlier versions of this standard seemed to allow thread-safe and non-thread-safe implementations of basename() and dirname(), but did not allow implementations to return a NULL pointer and require that errno be set when that happened. The standard now requires thread-safe behavior for both of these functions and clearly states that they are always successful.

- Issue History
Date Modified Username Field Change
2016-07-31 14:11 EdSchouten New Issue
2016-07-31 14:11 EdSchouten Status New => Under Review
2016-07-31 14:11 EdSchouten Assigned To => ajosey
2016-07-31 14:11 EdSchouten Name => Ed Schouten
2016-07-31 14:11 EdSchouten Organization => Nuxi, the Netherlands
2016-07-31 14:11 EdSchouten Section => libgen.h
2016-07-31 14:11 EdSchouten Page Number => -
2016-07-31 14:11 EdSchouten Line Number => -
2016-07-31 16:52 shware_systems Note Added: 0003315
2016-07-31 16:58 EdSchouten Note Added: 0003316
2016-07-31 17:39 shware_systems Note Added: 0003317
2016-07-31 18:02 EdSchouten Note Added: 0003318
2016-07-31 18:02 EdSchouten Note Edited: 0003318
2016-07-31 18:03 EdSchouten Note Edited: 0003318
2016-07-31 23:20 shware_systems Note Added: 0003319
2016-08-01 07:03 EdSchouten Note Added: 0003320
2016-08-01 11:31 geoffclare Note Added: 0003321
2016-08-01 12:14 EdSchouten Note Added: 0003322
2016-09-14 14:14 emaste Issue Monitored: emaste
2017-11-30 16:12 eblake Note Added: 0003887
2017-11-30 19:20 Don Cragun Note Added: 0003888
2017-11-30 19:22 Don Cragun Interp Status => ---
2017-11-30 19:22 Don Cragun Final Accepted Text => Make the changes specified in Note: 0003888.
2017-11-30 19:22 Don Cragun Status Under Review => Resolved
2017-11-30 19:22 Don Cragun Resolution Open => Accepted As Marked
2017-11-30 19:22 Don Cragun Tag Attached: issue8
2017-12-05 09:09 geoffclare Note Added: 0003889
2017-12-05 09:09 geoffclare Status Resolved => Under Review
2017-12-05 09:09 geoffclare Resolution Accepted As Marked => Reopened
2017-12-05 09:21 geoffclare Note Added: 0003890
2017-12-05 09:41 EdSchouten Note Added: 0003891
2017-12-14 16:32 geoffclare Note Added: 0003898
2017-12-14 16:34 geoffclare Final Accepted Text Make the changes specified in Note: 0003888. => Make the changes specified in Note: 0003898.
2017-12-14 16:34 geoffclare Status Under Review => Resolved
2017-12-14 16:34 geoffclare Resolution Reopened => Accepted As Marked
2018-02-15 16:17 eblake Relationship added related to 0001073
2020-04-16 09:13 geoffclare Status Resolved => Applied
2020-06-30 14:43 geoffclare Relationship added related to 0001358
2022-08-11 15:37 Don Cragun Relationship added related to 0000721


Mantis 1.1.6[^]
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker