Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details Jump to Notes ] Issue History ] Print ]
ID Category Severity Type Date Submitted Last Update
0001273 [1003.1(2016/18)/Issue7+TC2] System Interfaces Objection Error 2019-07-27 10:49 2023-10-10 09:10
Reporter stephane View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Applied  
Name Stephane Chazelas
Organization
User Reference
Section glob()
Page Number 1109, 1110 (in 2018 edition)
Line Number 35742, 35768
Interp Status Approved
Final Accepted Text Note: 0006426
Summary 0001273: glob()'s GLOB_ERR/errfunc and non-directory files
Description In the XSH glob() specification,

For GLOB_ERR, the spec says:

> Cause glob() to return when it encounters a directory that it
> cannot open or read. Ordinarily, glob() continues to find
> matches.

(Note: it's not clear what "Ordinarily" means here. When errfunc
is set and returns non-zero, glob() doesn't continue, is it
ordinary?).

For errfunc:

> If, during the search, a directory is encountered that cannot
> be opened or read and errfunc is not a null pointer, glob()
> calls (*errfunc()) with two arguments.
[...]
>  2. The eerrno argument is the value of errno from the
> failure, as set by opendir(), readdir(), or stat().
> (Other values may be used to report other errors not
> explicitly documented for those functions.)

(Note: does that mean glob() has to call those 3 functions (as
opposed to open(O_DIRECTORY)/getdents() or any other API)? Why
stat(), shouldn't that be lstat()?)

First (and that's still not the case I'm making here), it's not
obvious what /directories/ glob() will try to open.

It can be somewhat inferred from the spec, as the pathname
expansion specification refers to directories that must be
readable (which implies they are going to be read) and some that
only need to be searchable (implying they're not going to be
read).

But maybe the spec should be more explicit, as it's not obvious
for instance that in */*.c the current directory and all the
subdirs are going to be read, while in */foo.c, only the
current directory is read (and all subdirs/foo.c lstat()ed), so
if there's a non-readable subdir, only the former will fail (or
cause errfunc to be invoked).

Now, to get to the point, the spec refers to "directories" that
can't be opened.

What about a /etc/passwd/*.c glob. /etc/passwd is not a
directory, opendir("/etc/passwd") if called would fail with
ENOTDIR, does that mean glob() should not call opendir() here or
that it should ignore opendir()'s error when errno is ENOTDIR?

What about */*.c where there's at least one non-directory
non-hidden file in the current directory? What if there's a
broken symlink or a symlink to a file that is not accessible
(and so for which we can't tell whether the symlink is a
directory or not)?

I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc
2.27 implementations of glob() and they all differ
significantly, the Solaris one being the least compliant to what
I can infer the spec to require, and FreeBSD's the most.

On Solaris /etc/passwd/*.c glob(GLOB_ERR) fails (and calls
errfunc with /etc/passwd, ENOTDIR), same for */*.c in a
directory that contains a non-hidden regular file.

Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c
or */*.c in a directory that contains a broken symlink. The
other two call errfunc with ENOENT.

For */*.c in a directory that contains a symlink to a
non-accessible area, they all fail (call errfunc with EACCESS).
Same with */*/*.c if the current directory contains a subdir
that is readable but not searchable (note that whether glob()
could tell whether entries of that directory are directories or
not depends on whether readdir() returns that information or
not; either way, we can't tell for symlinks).
Desired Action At this point, I just want to start the discussion as to how
best fix it.

- The "ordinarily" should probably be changed to "if errfunc is
  NULL"

- I don't think we want to force implementations to literally
  call opendir()/readdir()/lstat() (in any case, that "stat()"
  is wrong). Not sure how to phrase it though.

- we should probably clarify which directories glob() is meant
  to try opening, or which files glob() is meant to invoke
  opendir() or equivalent on.

- and then what to do for non-directories or files which we
  can't tell whether they're directories or not. Either require
  the FreeBSD or GNU behaviour or allow both. The Solaris
  behaviour is not useful IMO, but it's more flexible in that
  the caller can use a errfunc that ignores ENOENT/ENOTDIR to
  emulate the GNU/FreeBSD behaviour.
Tags applied_after_i8d3, tc3-2008
Attached Files

- Relationships
related to 0001228Applied allow shells to exclude "." and ".." from pathname expansions 
related to 0001275Applied pathname expansion errors 

-  Notes
(0004493)
shware_systems (reporter)
2019-07-28 01:42

Re:
- I don't think we want to force implementations to literally
  call opendir()/readdir()/lstat() (in any case, that "stat()"
  is wrong). Not sure how to phrase it though.


Those are examples of interfaces that may return error codes errfunc is expected to process, that I see, not a requirement glob() implementations have to use them and only them. So, use of lstat() is allowed, as is directly accessing a host through syscalls that affect errno, bypassing use of the listed interfaces entirely. All that is missing is "e.g." after "failure," and ", or other standard interfaces," after "those interfaces" in the parenthetical part, to emphasize they are examples.

What may be helpful is a table of standard errno values that are to be passed to errfunc, whichever interface or implementation private code generates them, so applications don't need to guess what case labels errfunc's switch statement may have to process.
(0004494)
stephane (reporter)
2019-07-28 06:44

Re: Note: 0004493

Yes, it's actually not clear how stat() is meant to be used here. I had assumed, lstat() was meant instead as in the ./*/file cases where implementations don't open the subdirs of ., but instead try lstat(./subdir/file) on each of them.

But GLOB_ERR/errfunc being meant to report errors upon opening/reading *directories*, it can't report errors of lstat(). Maybe the spec wants implementations to call stat() on directories to check if they are searchable?

If we step back from the implementation detail to look at what the intention of the interface should be: AFAICT a glob(*/*.c) should return the matching files and GLOB_ERR/errfunc should identify the problems that prevent us from doing so.

/etc/passwd/*.c or non-existing/*.c doesn't match any file. The ENOTDIR/ENOENT failure upon trying to opening those non-directories is not an error preventing us from expanding the glob, it's on the contrary confirmation that the glob can't match.

Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned (where the files may exist using a shortened path). FreeBSD's glob() does return errors in those cases which IMO sounds like the best thing to do.

The real problem with the interface is that it doesn't allow reporting the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path and errno arguments, calling it with a subdir/foo/bar/baz and EACCESS for instance could cause confusion and imply subdir/foo/bar/baz is a directory that cannot be read, while actually it's probably either subdir, subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want to force implementations to stat() those 3 directories just to report an error.

Maybe we don't want to over-specify here and just say GLOB_ERR/errfunc should report the errors upon accessing directories (and directories or files assumed to be directories only) that prevent it from expanding the glob pattern without going into details of the implementation. And an application usage section clarifying that non-existing/*.c should not be reported as an error since the ENOENT failure of accessing the non-existing assumed-to-be-directory doesn't prevent us from expanding the glob, quite the contrary.
(0004495)
stephane (reporter)
2019-07-28 07:03

Re: Note: 0004494
> The real problem with the interface is that it doesn't allow reporting the
> lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path
> and errno arguments, calling it with a subdir/foo/bar/baz and EACCESS for
> instance could cause confusion and imply subdir/foo/bar/baz is a directory
> that cannot be read, while actually it's probably either subdir,
> subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want
> to force implementations to stat() those 3 directories just to report an
> error.

Anyway, stat() would not be the right tool, more access(X_OK) in that case. If subdir is not searchable then a */foo/bar/ba[z] would call errfunc(subdir/foo/bar, EACCESS), so it would be acceptable for an implementation to just do access(subdir/foo/bar, X_OK) if they wanted to (that would not cover the other lstat() error cases though).
(0005885)
geoffclare (manager)
2022-07-13 15:34

During the mailing list discussion of this bug in July 2019, I said in mail item 29229:
Unless one of the implementations changes to do something better before we get too far into work on Issue 8, I think the only choices we have are the Solaris behaviour, the GNU/BSD behaviour, the GNU/BSD "done right" (ELOOP/ENAMETOOLONG/ENOENT/ENOTDIR all treated the same), or allow some or all of these behaviours.

Does anyone know if any implementation has made changes to glob() in the last three years?
(0006426)
geoffclare (manager)
2023-08-10 15:30

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
None.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Page and line numbers are for Issue 8 draft 3.

On page 1201 line 41070 section glob() (GLOB_ERR), change:
Cause glob() to return when an attempt to open, read, or search a directory fails because of an error condition that is related to file system contents. If this flag is not set, glob() shall not treat such conditions as an error, and shall continue to look for matches. Other error conditions may also be treated the same way as error conditions that are related to file system contents.
to:
Cause glob() to return when an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that is related to file system contents and prevents glob() from expanding the pattern. If this flag is not set, glob() shall not treat such conditions as an error, and shall continue to look for matches. Other error conditions may also be treated the same way as error conditions that are related to file system contents.


On Issue 8 draft 3 page 1202 line 41114 section glob(), change:
If, during the search, an attempt to open, read, or search a directory fails and errfunc is not a null pointer, ...
to:
If errfunc is not a null pointer and, during the search, an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that prevents glob() from expanding the pattern, ...


After page 1203 line 41124 section glob(), add a paragraph:
The set of error conditions that are considered to prevent glob() from expanding the pattern shall include [EACCES], [ENAMETOOLONG], and [ELOOP]. It is implementation-defined what other error conditions are included in the set.


After page 1204 line 41202 section glob() (RATIONALE), add:
Implementations differ as to which error conditions they consider prevent glob() from expanding the pattern. The standard requires that [EACCES], [ENAMETOOLONG], and [ELOOP] are included because in these cases the expansion could succeed if performed with a different effective user or group ID, or with an alternative pathname (shorter than {PATH_MAX}, or traversing fewer symbolic links).
    
Implementations are encouraged to call (*errfunc()) for all error conditions that are related to file system contents which occur when attempting to open or search a pathname as a directory or attempting to read an opened directory. The appropriate way to handle such errors varies according to the provenance of the pattern and what the application will do with the resulting pathnames, and should therefore be for the application to decide. For example, given the pattern "non-existing/*", some applications may want glob() to succeed and return an empty list because there are no existing files that match the pattern, but for others that would not be appropriate, such as if an application asks the user to name a directory containing files to be processed and the user makes a typing mistake when responding; the application will want to alert the user to the mistake instead of behaving as if the user had named an empty directory. If (*errfunc()) is called for [ENOENT] errors, the first application can ignore them in that function, but if (*errfunc()) is not called, the second application cannot achieve what it wants using glob(). Similar reasoning applies for the pattern "regular_file/*" and [ENOTDIR] errors.


On page 1205 line 41217 section glob(), change FUTURE DIRECTIONS from "None" to:
A future version of this standard may require that (*errfunc()) is called for all error conditions that are related to file system contents which occur when attempting to open or search a pathname as a directory or attempting to read an opened directory.


On page 2508 line 81856 section 2.14.3, change:
If these permissions are denied, or if an attempt to open or search a directory fails because of an error condition that is related to file system contents, this shall not be considered an error and pathname expansion shall continue as if the directory had existed and had been successfully opened or searched, and no matching directory entries had been found in it.
to:
If these permissions are denied, or if an attempt to open or search a pathname as a directory, or an attempt to read an opened directory, fails because of an error condition that is related to file system contents, this shall not be considered an error and pathname expansion shall continue as if the pathname had named an existing directory which had been successfully opened and read, or searched, and no matching directory entries had been found in it.
(0006431)
agadmin (administrator)
2023-08-14 16:05

Interpretation proposed: 14 August 2023
(0006505)
agadmin (administrator)
2023-10-02 14:31

Interpretation approved: 2 October 2023

- Issue History
Date Modified Username Field Change
2019-07-27 10:49 stephane New Issue
2019-07-27 10:49 stephane Name => Stephane Chazelas
2019-07-27 10:49 stephane Section => glob()
2019-07-27 10:49 stephane Page Number => 1109, 1110 (in 2018 edition)
2019-07-27 10:49 stephane Line Number => 35742, 35768
2019-07-28 00:48 Don Cragun Interp Status => ---
2019-07-28 00:48 Don Cragun Category Shell and Utilities => System Interfaces
2019-07-28 01:42 shware_systems Note Added: 0004493
2019-07-28 06:44 stephane Note Added: 0004494
2019-07-28 07:03 stephane Note Added: 0004495
2019-07-30 10:19 geoffclare Relationship added related to 0001275
2021-12-21 22:12 dennisw Issue Monitored: dennisw
2022-07-13 15:34 geoffclare Note Added: 0005885
2023-08-10 15:11 eblake Relationship added related to 0001228
2023-08-10 15:30 geoffclare Note Added: 0006426
2023-08-10 15:31 geoffclare Interp Status --- => Pending
2023-08-10 15:31 geoffclare Final Accepted Text => Note: 0006426
2023-08-10 15:31 geoffclare Status New => Interpretation Required
2023-08-10 15:31 geoffclare Resolution Open => Accepted As Marked
2023-08-10 15:31 geoffclare Tag Attached: tc3-2008
2023-08-14 16:05 agadmin Interp Status Pending => Proposed
2023-08-14 16:05 agadmin Note Added: 0006431
2023-10-02 14:31 agadmin Interp Status Proposed => Approved
2023-10-02 14:31 agadmin Note Added: 0006505
2023-10-10 09:10 geoffclare Status Interpretation Required => Applied
2023-10-10 09:11 geoffclare Tag Attached: applied_after_i8d3


Mantis 1.1.6[^]
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker