ID | 0001273
Category | [1003.1(2016/18)/Issue7+TC2] System Interfaces
Severity | Objection
Type | Error
Date Submitted | 2019-07-27 10:49
Last Update | 2022-07-13 15:34
Reporter | stephane
View Status | public
Assigned To |
Priority | normal
Resolution | Open
Status | New
Name | Stephane Chazelas
Organization |
User Reference |
Section | glob()
Page Number | 1109, 1110 (in 2018 edition)
Line Number | 35742, 35768
Interp Status | ---
Final Accepted Text |
Summary | 0001273: glob()'s GLOB_ERR/errfunc and non-directory files
Description |
In the XSH glob() specification, for GLOB_ERR, the spec says:

> Cause glob() to return when it encounters a directory that it
> cannot open or read. Ordinarily, glob() continues to find
> matches.

(Note: it's not clear what "Ordinarily" means here. When errfunc is set and returns non-zero, glob() doesn't continue; is that "ordinary"?)

For errfunc:

> If, during the search, a directory is encountered that cannot
> be opened or read and errfunc is not a null pointer, glob()
> calls (*errfunc()) with two arguments. [...]
> 2. The eerrno argument is the value of errno from the
> failure, as set by opendir(), readdir(), or stat().
> (Other values may be used to report other errors not
> explicitly documented for those functions.)

(Note: does that mean glob() has to call those 3 functions, as opposed to open(O_DIRECTORY)/getdents() or any other API? And why stat(); shouldn't that be lstat()?)

First (and this is not yet the case I'm making here), it's not obvious which *directories* glob() will try to open. It can be somewhat inferred from the spec, as the pathname expansion specification refers to directories that must be readable (which implies they are going to be read) and some that only need to be searchable (implying they're not going to be read). But maybe the spec should be more explicit: it's not obvious, for instance, that in */*.c the current directory and all the subdirs are going to be read, while in */foo.c only the current directory is read (and each subdir/foo.c is lstat()ed). So if there's a non-readable subdir, only the former will fail (or cause errfunc to be invoked).

Now, to get to the point: the spec refers to "directories" that can't be opened. What about a /etc/passwd/*.c glob? /etc/passwd is not a directory, so opendir("/etc/passwd"), if called, would fail with ENOTDIR. Does that mean glob() should not call opendir() here, or that it should ignore opendir()'s error when errno is ENOTDIR?
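To make the */*.c versus */foo.c distinction concrete, here is a minimal sketch of how an implementation might expand a pattern whose last component has no wildcard: only the starting directory is opened and read, and each candidate is merely lstat()ed. The function expand_literal_leaf and its error counter are hypothetical, and real implementations (GNU libc, FreeBSD, Solaris) differ in the details; the sketch only shows why a non-readable subdirectory matters for */*.c but not for */foo.c, and where the lstat() errors that errfunc cannot describe well come from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fnmatch.h>
#include <stdio.h>
#include <sys/stat.h>

// Sketch: expand "<dirpat>/<name>" under base, where dirpat may contain
// wildcards and name is a literal component ("*" and "foo.c" in "*/foo.c").
// Only base is opened and read; each candidate base/entry/name is just
// lstat()ed, so the subdirectories themselves are never read.
// Returns the number of matches, or -1 if base cannot be opened.
// *lstat_errors counts lstat() failures other than ENOENT/ENOTDIR
// (for example EACCES when a subdirectory is not searchable).
static int expand_literal_leaf(const char *base, const char *dirpat,
                               const char *name, int *lstat_errors)
{
    assert(base != NULL && dirpat != NULL && name != NULL && lstat_errors != NULL);
    *lstat_errors = 0;

    DIR *dir = opendir(base);  // the only directory that is read
    if (dir == NULL)
        return -1;

    int matches = 0;
    char path[4096];
    struct dirent *entry;
    // For brevity, a readdir() error is treated like end-of-directory.
    while ((entry = readdir(dir)) != NULL) {
        // FNM_PERIOD: a leading '.' must be matched explicitly, so "*"
        // skips ".", ".." and hidden entries.
        if (fnmatch(dirpat, entry->d_name, FNM_PERIOD) != 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s/%s", base, entry->d_name, name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;  // path too long for this sketch's buffer
        struct stat st;
        if (lstat(path, &st) == 0)
            matches++;
        else if (errno != ENOENT && errno != ENOTDIR)
            (*lstat_errors)++;  // e.g. EACCES: entry exists but is not searchable
    }
    closedir(dir);
    return matches;
}
```

With a regular file "plain" next to the subdirectories, plain/foo.c fails with ENOTDIR and is simply not a match here; whether that should ever reach errfunc is exactly the question raised above.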
What about */*.c where there's at least one non-directory, non-hidden file in the current directory? What if there's a broken symlink, or a symlink to a file that is not accessible (so that we can't tell whether the symlink points to a directory or not)?

I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc 2.27 implementations of glob(), and they all differ significantly, the Solaris one being the least compliant with what I can infer the spec to require, and FreeBSD's the most:

- On Solaris, glob(GLOB_ERR) on /etc/passwd/*.c fails (and calls errfunc with /etc/passwd, ENOTDIR); the same goes for */*.c in a directory that contains a non-hidden regular file.
- Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c, or on */*.c in a directory that contains a broken symlink. The other two call errfunc with ENOENT.
- For */*.c in a directory that contains a symlink to a non-accessible area, they all fail (calling errfunc with EACCES). The same goes for */*/*.c if the current directory contains a subdir that is readable but not searchable. (Note that whether glob() can tell whether the entries of that directory are directories depends on whether readdir() returns that information; either way, it can't tell for symlinks.)
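A small test program along the following lines can be used to compare implementations: it runs glob() with or without GLOB_ERR and records every errfunc call. Because errfunc receives only a path and an errno value, with no user-data pointer, the sketch keeps the recorded state in file-scope variables. The names expand, record_error, error_calls, last_errno and last_epath are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

// errfunc gets only a path and an errno value (no user-data pointer),
// so the calls are recorded in file-scope variables.
static int error_calls;
static int last_errno;
static char last_epath[4096];

static int record_error(const char *epath, int eerrno)
{
    error_calls++;
    last_errno = eerrno;
    snprintf(last_epath, sizeof last_epath, "%s", epath);
    fprintf(stderr, "errfunc(%s, %s)\n", epath, strerror(eerrno));
    return 0;  // 0: continue, unless GLOB_ERR is set in flags
}

// Runs glob() with the recording errfunc. On return, g holds whatever
// matches were found (also after GLOB_ABORTED); call globfree(g) once
// done with the results.
static int expand(const char *pattern, int flags, glob_t *g)
{
    assert(pattern != NULL && g != NULL);
    error_calls = 0;
    last_errno = 0;
    last_epath[0] = '\0';
    return glob(pattern, flags, record_error, g);
}
```

For example, calling expand("*/*.c", GLOB_ERR, &g) in a directory containing a broken symlink and printing the return value, error_calls and last_errno should show the divergence reported above: per those tests, GNU libc 2.27 and Solaris 10 call errfunc with ENOENT (so glob() returns GLOB_ABORTED), while FreeBSD 12.0 does not abort.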
Desired Action |
At this point, I just want to start the discussion as to how best to fix it.

- The "Ordinarily" should probably be changed to "if errfunc is NULL".
- I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though.
- We should probably clarify which directories glob() is meant to try opening, or which files glob() is meant to invoke opendir() or equivalent on.
- Then, what to do for non-directories, or for files for which we can't tell whether they're directories or not. Either require the FreeBSD or the GNU behaviour, or allow both. The Solaris behaviour is not useful IMO, but it's more flexible in that the caller can use an errfunc that ignores ENOENT/ENOTDIR to emulate the GNU/FreeBSD behaviour.
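The errfunc-based emulation mentioned in the last point might look like the sketch below. It only works when GLOB_ERR is not set: per the specification, when GLOB_ERR is set glob() stops on any error whatever errfunc returns, so the decision has to be left to errfunc alone. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

// ENOENT and ENOTDIR on a path that glob() only assumed to be a directory
// (non-existent in non-existent/*.c, /etc/passwd in /etc/passwd/*.c, or a
// broken symlink) just confirm that nothing can match there: keep scanning.
// Any other error (EACCES, ELOOP, ENAMETOOLONG, ...) stops the scan.
static int ignore_no_such_dir(const char *epath, int eerrno)
{
    if (eerrno == ENOENT || eerrno == ENOTDIR)
        return 0;
    fprintf(stderr, "glob: %s: %s\n", epath, strerror(eerrno));
    return 1;  // non-zero: glob() stops and returns GLOB_ABORTED
}

// GLOB_ERR is cleared on purpose: when it is set, glob() stops on the
// first error regardless of what errfunc returns.
static int glob_ignoring_missing_dirs(const char *pattern, int flags, glob_t *g)
{
    assert(pattern != NULL && g != NULL);
    return glob(pattern, flags & ~GLOB_ERR, ignore_no_such_dir, g);
}
```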
Tags | No tags attached.
Attached Files |
(0004493) shware_systems (reporter) 2019-07-28 01:42 |
Re:
> - I don't think we want to force implementations to literally call opendir()/readdir()/lstat() (in any case, that "stat()" is wrong). Not sure how to phrase it though.

As I see it, those are examples of interfaces that may return error codes errfunc is expected to process, not a requirement that glob() implementations use them and only them. So use of lstat() is allowed, as is accessing the host directly through syscalls that set errno, bypassing the listed interfaces entirely. All that is missing is "e.g." after "failure," and ", or other standard interfaces," after "those functions" in the parenthetical part, to emphasize that they are examples.

What may be helpful is a table of the standard errno values that are to be passed to errfunc, whichever interface or implementation-private code generates them, so applications don't need to guess which case labels errfunc's switch statement may have to process.
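As an illustration of the kind of switch statement such a table would feed, here is a sketch whose case labels are the errors POSIX documents for opendir(), readdir() and lstat(). The grouping into classes and all names (enum errclass, classify_glob_errno, errclass_name) are hypothetical, not something the standard defines; note also that the same errno value can mean different things depending on which call produced it, which errfunc cannot tell.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>

// Hypothetical grouping of the eerrno values an errfunc may receive. The
// case labels are the errors POSIX documents for opendir(), readdir() and
// lstat(); the specification allows implementations to pass other values.
enum errclass {
    ERRCLASS_NO_SUCH_DIR,  // not found, or not a directory
    ERRCLASS_PATH,         // path could not be resolved as given
    ERRCLASS_ACCESS,       // permission denied
    ERRCLASS_SYSTEM,       // descriptor limits, bad stream, I/O errors, overflow
    ERRCLASS_OTHER         // anything not documented for those functions
};

static enum errclass classify_glob_errno(int eerrno)
{
    switch (eerrno) {
    case ENOENT:
    case ENOTDIR:
        return ERRCLASS_NO_SUCH_DIR;
    case ELOOP:
    case ENAMETOOLONG:
        return ERRCLASS_PATH;
    case EACCES:
        return ERRCLASS_ACCESS;
    case EMFILE:
    case ENFILE:
    case EBADF:
    case EIO:
    case EOVERFLOW:
        return ERRCLASS_SYSTEM;
    default:
        return ERRCLASS_OTHER;
    }
}

// Short label for diagnostics; the array order must follow enum errclass.
static const char *errclass_name(enum errclass c)
{
    static const char *const names[] = {
        "no-such-dir", "path", "access", "system", "other"
    };
    assert((size_t)c < sizeof names / sizeof names[0]);
    return names[c];
}
```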
(0004494) stephane (reporter) 2019-07-28 06:44 |
Re: Note: 0004493

Yes, it's actually not clear how stat() is meant to be used here. I had assumed lstat() was meant instead, as in the ./*/file cases, where implementations don't open the subdirs of . but instead try lstat(./subdir/file) on each of them. But since GLOB_ERR/errfunc are meant to report errors upon opening/reading *directories*, they can't report errors from lstat(). Maybe the spec wants implementations to call stat() on directories to check whether they are searchable?

If we step back from the implementation details to look at what the intention of the interface should be: AFAICT a glob(*/*.c) should return the matching files, and GLOB_ERR/errfunc should identify the problems that prevent us from doing so. /etc/passwd/*.c or non-existing/*.c doesn't match any file. The ENOTDIR/ENOENT failure upon trying to open those non-directories is not an error preventing us from expanding the glob; on the contrary, it's confirmation that the glob can't match. Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned (where the files may exist and be reachable via a shortened path). FreeBSD's glob() does return errors in those cases, which IMO sounds like the best thing to do.

The real problem with the interface is that it doesn't allow reporting the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path and an errno argument, calling it with subdir/foo/bar/baz and EACCES, for instance, could cause confusion and imply that subdir/foo/bar/baz is a directory that cannot be read, while actually it's probably subdir, subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want to force implementations to stat() those 3 directories just to report an error.

Maybe we don't want to over-specify here, and should just say that GLOB_ERR/errfunc report the errors upon accessing directories (only directories, or files assumed to be directories) that prevent glob() from expanding the pattern, without going into details of the implementation. We could then add an application usage section clarifying that non-existing/*.c should not be reported as an error, since the ENOENT failure of accessing the non-existing assumed-to-be-directory doesn't prevent us from expanding the glob, quite the contrary.
(0004495) stephane (reporter) 2019-07-28 07:03 |
Re: Note: 0004494

> The real problem with the interface is that it doesn't allow reporting the
> lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path
> and errno arguments, calling it with a subdir/foo/bar/baz and EACCES for
> instance could cause confusion and imply subdir/foo/bar/baz is a directory
> that cannot be read, while actually it's probably either subdir,
> subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want
> to force implementations to stat() those 3 directories just to report an
> error.

Anyway, stat() would not be the right tool; access(X_OK) would be more appropriate in that case. If subdir is not searchable, then */foo/bar/ba[z] would call errfunc(subdir/foo/bar, EACCES), so it would be acceptable for an implementation to just do access(subdir/foo/bar, X_OK) if it wanted to (that would not cover the other lstat() error cases, though).
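Following that suggestion, here is a minimal sketch of the access(X_OK) check: when lstat() of subdir/foo/bar/baz fails, an implementation could test only the parent directory and, if that test fails, report the parent together with access()'s errno, which matches what */foo/bar/ba[z] would report. The helper name check_parent_searchable is hypothetical. One caveat: access() checks permissions against the real user and group IDs, whereas glob()'s own lookups use the effective IDs; faccessat() with AT_EACCESS checks the effective IDs instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: after lstat(path) has failed, check whether the
// parent directory of path is searchable with access(..., X_OK).
// The parent is stored in parent_out. Returns 0 if it is searchable,
// otherwise the errno value from access(), so that a caller could report
// errfunc(parent_out, that value), as opendir() would for */foo/bar/ba[z].
// Assumes path has no trailing slash.
static int check_parent_searchable(const char *path, char *parent_out, size_t size)
{
    assert(path != NULL && parent_out != NULL && size >= 2);
    const char *slash = strrchr(path, '/');
    size_t len;
    if (slash == NULL) {
        parent_out[0] = '.';      // "baz": parent is the current directory
        len = 1;
    } else {
        len = (size_t)(slash - path);
        if (len == 0)
            len = 1;              // "/baz": parent is the root directory
        if (len >= size)
            return ENAMETOOLONG;  // parent does not fit in parent_out
        memcpy(parent_out, path, len);
    }
    parent_out[len] = '\0';
    return access(parent_out, X_OK) == 0 ? 0 : errno;
}
```

As noted above, this covers only the EACCES-style case; other lstat() failures would still need their own handling.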
(0005885) geoffclare (manager) 2022-07-13 15:34 |
During the mailing list discussion of this bug in July 2019, I said in mail item 29229:

> Unless one of the implementations changes to do something better before we get too far into work on Issue 8, I think the only choices we have are the Solaris behaviour, the GNU/BSD behaviour, the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all treated the same), or allow some or all of these behaviours.

Does anyone know if any implementation has made changes to glob() in the last three years?