Anonymous | Login | 2024-12-03 18:42 UTC |
Main | My View | View Issues | Change Log | Docs |
Viewing Issue Simple Details [ Jump to Notes ] | [ Issue History ] [ Print ] | ||||||
ID | Category | Severity | Type | Date Submitted | Last Update | ||
0001273 | [1003.1(2016/18)/Issue7+TC2] System Interfaces | Objection | Error | 2019-07-27 10:49 | 2024-06-11 09:08 | ||
Reporter | stephane | View Status | public | ||||
Assigned To | |||||||
Priority | normal | Resolution | Accepted As Marked | ||||
Status | Closed | ||||||
Name | Stephane Chazelas | ||||||
Organization | |||||||
User Reference | |||||||
Section | glob() | ||||||
Page Number | 1109, 1110 (in 2018 edition) | ||||||
Line Number | 35742, 35768 | ||||||
Interp Status | Approved | ||||||
Final Accepted Text | Note: 0006426 | ||||||
Summary | 0001273: glob()'s GLOB_ERR/errfunc and non-directory files | ||||||
Description |
In the XSH glob() specification, For GLOB_ERR, the spec says: > Cause glob() to return when it encounters a directory that it > cannot open or read. Ordinarily, glob() continues to find > matches. (Note: it's not clear what "Ordinarily" means here. When errfunc is set and returns non-zero, glob() doesn't continue, is it ordinary?). For errfunc: > If, during the search, a directory is encountered that cannot > be opened or read and errfunc is not a null pointer, glob() > calls (*errfunc()) with two arguments. [...] > 2. The eerrno argument is the value of errno from the > failure, as set by opendir(), readdir(), or stat(). > (Other values may be used to report other errors not > explicitly documented for those functions.) (Note: does that mean glob() has to call those 3 functions (as opposed to open(O_DIRECTORY)/getdents() or any other API)? Why stat(), shouldn't that be lstat()?) First (and that's still not the case I'm making here), it's not obvious what /directories/ glob() will try to open. It can be somewhat inferred from the spec, as the pathname expansion specification refers to directories that must be readable (which implies they are going to be read) and some that only need to be searchable (implying they're not going to be read). But maybe the spec should be more explicit, as it's not obvious for instance that in */*.c the current directory and all the subdirs are going to be read, while in */foo.c, only the current directory is read (and all subdirs/foo.c lstat()ed), so if there's a non-readable subdir, only the former will fail (or cause errfunc to be invoked). Now, to get to the point, the spec refers to "directories" that can't be opened. What about a /etc/passwd/*.c glob. /etc/passwd is not a directory, opendir("/etc/passwd") if called would fail with ENOTDIR, does that mean glob() should not call opendir() here or that it should ignore opendir()'s error when errno is ENOTDIR? What about */*.c where there's at least one non-directory non-hidden file in the current directory? What if there's a broken symlink or a symlink to a file that is not accessible (and so for which we can't tell whether the symlink is a directory or not)? I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc 2.27 implementations of glob() and they all differ significantly, the Solaris one being the least compliant to what I can infer the spec to require, and FreeBSD's the most. On Solaris /etc/passwd/*.c glob(GLOB_ERR) fails (and calls errfunc with /etc/passwd, ENOTDIR), same for */*.c in a directory that contains a non-hidden regular file. Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c or */*.c in a directory that contains a broken symlink. The other two call errfunc with ENOENT. For */*.c in a directory that contains a symlink to a non-accessible area, they all fail (call errfunc with EACCESS). Same with */*/*.c if the current directory contains a subdir that is readable but not searchable (note that whether glob() could tell whether entries of that directory are directories or not depends on whether readdir() returns that information or not; either way, we can't tell for symlinks). |
||||||
Desired Action |
At this point, I just want to start the discussion as to how best fix it. - The "ordinarily" should probably be changed to "if errfunc is NULL" - I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though. - we should probably clarify which directories glob() is meant to try opening, or which files glob() is meant to invoke opendir() or equivalent on. - and then what to do for non-directories or files which we can't tell whether they're directories or not. Either require the FreeBSD or GNU behaviour or allow both. The Solaris behaviour is not useful IMO, but it's more flexible in that the caller can use a errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD behaviour. |
||||||
Tags | applied_after_i8d3, tc3-2008 | ||||||
Attached Files | |||||||
|
Relationships | |||||||||||
|
Notes | |
(0004493) shware_systems (reporter) 2019-07-28 01:42 |
Re: - I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though. Those are examples of interfaces that may return error codes errfunc is expected to process, that I see, not a requirement glob() implementations have to use them and only them. So, use of lstat() is allowed, as is directly accessing a host through syscalls that affect errno, bypassing use of the listed interfaces entirely. All that is missing is "e.g." after "failure," and ", or other standard interfaces," after "those interfaces" in the parenthetical part, to emphasize they are examples. What may be helpful is a table of standard errno values that are to be passed to errfunc, whichever interface or implementation private code generates them, so applications don't need to guess what case labels errfunc's switch statement may have to process. |
(0004494) stephane (reporter) 2019-07-28 06:44 |
Re: Note: 0004493 Yes, it's actually not clear how stat() is meant to be used here. I had assumed, lstat() was meant instead as in the ./*/file cases where implementations don't open the subdirs of ., but instead try lstat(./subdir/file) on each of them. But GLOB_ERR/errfunc being meant to report errors upon opening/reading *directories*, it can't report errors of lstat(). Maybe the spec wants implementations to call stat() on directories to check if they are searchable? If we step back from the implementation detail to look at what the intention of the interface should be: AFAICT a glob(*/*.c) should return the matching files and GLOB_ERR/errfunc should identify the problems that prevent us from doing so. /etc/passwd/*.c or non-existing/*.c doesn't match any file. The ENOTDIR/ENOENT failure upon trying to opening those non-directories is not an error preventing us from expanding the glob, it's on the contrary confirmation that the glob can't match. Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned (where the files may exist using a shortened path). FreeBSD's glob() does return errors in those cases which IMO sounds like the best thing to do. The real problem with the interface is that it doesn't allow reporting the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path and errno arguments, calling it with a subdir/foo/bar/baz and EACCESS for instance could cause confusion and imply subdir/foo/bar/baz is a directory that cannot be read, while actually it's probably either subdir, subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want to force implementations to stat() those 3 directories just to report an error. Maybe we don't want to over-specify here and just say GLOB_ERR/errfunc should report the errors upon accessing directories (and directories or files assumed to be directories only) that prevent it from expanding the glob pattern without going into details of the implementation. And an application usage section clarifying that non-existing/*.c should not be reported as an error since the ENOENT failure of accessing the non-existing assumed-to-be-directory doesn't prevent us from expanding the glob, quite the contrary. |
(0004495) stephane (reporter) 2019-07-28 07:03 |
Re: Note: 0004494 > The real problem with the interface is that it doesn't allow reporting the > lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path > and errno arguments, calling it with a subdir/foo/bar/baz and EACCESS for > instance could cause confusion and imply subdir/foo/bar/baz is a directory > that cannot be read, while actually it's probably either subdir, > subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want > to force implementations to stat() those 3 directories just to report an > error. Anyway, stat() would not be the right tool, more access(X_OK) in that case. If subdir is not searchable then a */foo/bar/ba[z] would call errfunc(subdir/foo/bar, EACCESS), so it would be acceptable for an implementation to just do access(subdir/foo/bar, X_OK) if they wanted to (that would not cover the other lstat() error cases though). |
(0005885) geoffclare (manager) 2022-07-13 15:34 |
During the mailing list discussion of this bug in July 2019, I said in mail item 29229:Unless one of the implementations changes to do something better before we get too far into work on Issue 8, I think the only choices we have are the Solaris behaviour, the GNU/BSD behaviour, the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all treated the same), or allow some or all of these behaviours. Does anyone know if any implementation has made changes to glob() in the last three years? |
(0006426) geoffclare (manager) 2023-08-10 15:30 |
Interpretation response ------------------------ The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor. Rationale: ------------- None. Notes to the Editor (not part of this interpretation): ------------------------------------------------------- Page and line numbers are for Issue 8 draft 3. On page 1201 line 41070 section glob() (GLOB_ERR), change: Cause glob() to return when an attempt to open, read, or search a directory fails because of an error condition that is related to file system contents. If this flag is not set, glob() shall not treat such conditions as an error, and shall continue to look for matches. Other error conditions may also be treated the same way as error conditions that are related to file system contents.to: Cause glob() to return when an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that is related to file system contents and prevents glob() from expanding the pattern. If this flag is not set, glob() shall not treat such conditions as an error, and shall continue to look for matches. Other error conditions may also be treated the same way as error conditions that are related to file system contents. On Issue 8 draft 3 page 1202 line 41114 section glob(), change: If, during the search, an attempt to open, read, or search a directory fails and errfunc is not a null pointer, ...to: If errfunc is not a null pointer and, during the search, an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that prevents glob() from expanding the pattern, ... After page 1203 line 41124 section glob(), add a paragraph: The set of error conditions that are considered to prevent glob() from expanding the pattern shall include [EACCES], [ENAMETOOLONG], and [ELOOP]. It is implementation-defined what other error conditions are included in the set. After page 1204 line 41202 section glob() (RATIONALE), add: Implementations differ as to which error conditions they consider prevent glob() from expanding the pattern. The standard requires that [EACCES], [ENAMETOOLONG], and [ELOOP] are included because in these cases the expansion could succeed if performed with a different effective user or group ID, or with an alternative pathname (shorter than {PATH_MAX}, or traversing fewer symbolic links). On page 1205 line 41217 section glob(), change FUTURE DIRECTIONS from "None" to: A future version of this standard may require that (*errfunc()) is called for all error conditions that are related to file system contents which occur when attempting to open or search a pathname as a directory or attempting to read an opened directory. On page 2508 line 81856 section 2.14.3, change: If these permissions are denied, or if an attempt to open or search a directory fails because of an error condition that is related to file system contents, this shall not be considered an error and pathname expansion shall continue as if the directory had existed and had been successfully opened or searched, and no matching directory entries had been found in it.to: If these permissions are denied, or if an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that is related to file system contents, this shall not be considered an error and pathname expansion shall continue as if the pathname had named an existing directory which had been successfully opened and read, or searched, and no matching directory entries had been found in it. |
(0006431) agadmin (administrator) 2023-08-14 16:05 |
Interpretation proposed: 14 August 2023 |
(0006505) agadmin (administrator) 2023-10-02 14:31 |
Interpretation approved: 2 October 2023 |
Issue History | |||
Date Modified | Username | Field | Change |
2019-07-27 10:49 | stephane | New Issue | |
2019-07-27 10:49 | stephane | Name | => Stephane Chazelas |
2019-07-27 10:49 | stephane | Section | => glob() |
2019-07-27 10:49 | stephane | Page Number | => 1109, 1110 (in 2018 edition) |
2019-07-27 10:49 | stephane | Line Number | => 35742, 35768 |
2019-07-28 00:48 | Don Cragun | Interp Status | => --- |
2019-07-28 00:48 | Don Cragun | Category | Shell and Utilities => System Interfaces |
2019-07-28 01:42 | shware_systems | Note Added: 0004493 | |
2019-07-28 06:44 | stephane | Note Added: 0004494 | |
2019-07-28 07:03 | stephane | Note Added: 0004495 | |
2019-07-30 10:19 | geoffclare | Relationship added | related to 0001275 |
2021-12-21 22:12 | dennisw | Issue Monitored: dennisw | |
2022-07-13 15:34 | geoffclare | Note Added: 0005885 | |
2023-08-10 15:11 | eblake | Relationship added | related to 0001228 |
2023-08-10 15:30 | geoffclare | Note Added: 0006426 | |
2023-08-10 15:31 | geoffclare | Interp Status | --- => Pending |
2023-08-10 15:31 | geoffclare | Final Accepted Text | => Note: 0006426 |
2023-08-10 15:31 | geoffclare | Status | New => Interpretation Required |
2023-08-10 15:31 | geoffclare | Resolution | Open => Accepted As Marked |
2023-08-10 15:31 | geoffclare | Tag Attached: tc3-2008 | |
2023-08-14 16:05 | agadmin | Interp Status | Pending => Proposed |
2023-08-14 16:05 | agadmin | Note Added: 0006431 | |
2023-10-02 14:31 | agadmin | Interp Status | Proposed => Approved |
2023-10-02 14:31 | agadmin | Note Added: 0006505 | |
2023-10-10 09:10 | geoffclare | Status | Interpretation Required => Applied |
2023-10-10 09:11 | geoffclare | Tag Attached: applied_after_i8d3 | |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |
Mantis 1.1.6[^] Copyright © 2000 - 2008 Mantis Group |