Austin Group Defect Tracker



ID 0001273
Category [1003.1(2016)/Issue7+TC2] System Interfaces
Severity Objection
Type Error
Date Submitted 2019-07-27 10:49
Last Update 2019-07-28 07:03
Reporter stephane
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Stephane Chazelas
Organization
User Reference
Section glob()
Page Number 1109, 1110 (in 2018 edition)
Line Number 35742, 35768
Interp Status ---
Final Accepted Text
Summary 0001273: glob()'s GLOB_ERR/errfunc and non-directory files
Description In the XSH glob() specification,

For GLOB_ERR, the spec says:

> Cause glob() to return when it encounters a directory that it
> cannot open or read. Ordinarily, glob() continues to find
> matches.

(Note: it's not clear what "Ordinarily" means here. When errfunc
is set and returns non-zero, glob() doesn't continue, is it
ordinary?).

For errfunc:

> If, during the search, a directory is encountered that cannot
> be opened or read and errfunc is not a null pointer, glob()
> calls (*errfunc()) with two arguments.
[...]
>  2. The eerrno argument is the value of errno from the
> failure, as set by opendir(), readdir(), or stat().
> (Other values may be used to report other errors not
> explicitly documented for those functions.)

(Note: does that mean glob() has to call those 3 functions (as
opposed to open(O_DIRECTORY)/getdents() or any other API)? Why
stat(), shouldn't that be lstat()?)
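
For readers less familiar with the interface, here is a minimal sketch of how a caller wires up errfunc and GLOB_ERR. The names report_glob_error and count_matches are hypothetical and only serve as illustration; which paths and errno values actually reach the callback is exactly what this report is asking about, so nothing here should be read as the required behaviour.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: log the path and errno value that glob() reports. */
static int report_glob_error(const char *epath, int eerrno)
{
    fprintf(stderr, "glob: %s: %s\n", epath, strerror(eerrno));
    /* Returning 0 asks glob() to keep scanning; if GLOB_ERR is set in
     * flags, glob() stops anyway, whatever this function returns. */
    return 0;
}

/* Hypothetical helper: number of matches for pattern, or -1 if glob()
 * gave up (GLOB_ABORTED or GLOB_NOSPACE). */
static long count_matches(const char *pattern, int flags)
{
    glob_t g;
    long n;
    int rc;

    memset(&g, 0, sizeof g);
    rc = glob(pattern, flags, report_glob_error, &g);
    if (rc == 0)
        n = (long)g.gl_pathc;
    else if (rc == GLOB_NOMATCH)
        n = 0;
    else
        n = -1;
    globfree(&g);
    return n;
}
```

Calling count_matches("*/*.c", GLOB_ERR) and count_matches("*/*.c", 0) in a directory containing an unreadable subdirectory is one way to compare how a given implementation treats the two modes.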

First (and that's still not the case I'm making here), it's not
obvious what /directories/ glob() will try to open.

It can be somewhat inferred from the spec, as the pathname
expansion specification refers to directories that must be
readable (which implies they are going to be read) and some that
only need to be searchable (implying they're not going to be
read).

But maybe the spec should be more explicit, as it's not obvious
for instance that in */*.c the current directory and all the
subdirs are going to be read, while in */foo.c, only the
current directory is read (and all subdirs/foo.c lstat()ed), so
if there's a non-readable subdir, only the former will fail (or
cause errfunc to be invoked).

Now, to get to the point, the spec refers to "directories" that
can't be opened.

What about a /etc/passwd/*.c glob. /etc/passwd is not a
directory, opendir("/etc/passwd") if called would fail with
ENOTDIR, does that mean glob() should not call opendir() here or
that it should ignore opendir()'s error when errno is ENOTDIR?

What about */*.c where there's at least one non-directory
non-hidden file in the current directory? What if there's a
broken symlink or a symlink to a file that is not accessible
(and so for which we can't tell whether the symlink is a
directory or not)?

I've done tests with the FreeBSD 12.0, Solaris 10 and GNU libc
2.27 implementations of glob() and they all differ
significantly, the Solaris one being the least compliant with what
I can infer the spec to require, and FreeBSD's the most.

On Solaris /etc/passwd/*.c glob(GLOB_ERR) fails (and calls
errfunc with /etc/passwd, ENOTDIR), same for */*.c in a
directory that contains a non-hidden regular file.

Only FreeBSD's glob(GLOB_ERR) doesn't fail on non-existent/*.c
or */*.c in a directory that contains a broken symlink. The
other two call errfunc with ENOENT.

For */*.c in a directory that contains a symlink to a
non-accessible area, they all fail (call errfunc with EACCES).
Same with */*/*.c if the current directory contains a subdir
that is readable but not searchable (note that whether glob()
could tell whether entries of that directory are directories or
not depends on whether readdir() returns that information or
not; either way, we can't tell for symlinks).
Desired Action At this point, I just want to start the discussion on how
best to fix it.

- The "ordinarily" should probably be changed to "if errfunc is
  NULL"

- I don't think we want to force implementations to literally
  call opendir()/readdir()/lstat() (in any case, that "stat()"
  is wrong). Not sure how to phrase it though.

- we should probably clarify which directories glob() is meant
  to try opening, or which files glob() is meant to invoke
  opendir() or equivalent on.

- and then what to do for non-directories or files which we
  can't tell whether they're directories or not. Either require
  the FreeBSD or GNU behaviour or allow both. The Solaris
  behaviour is not useful IMO, but it's more flexible in that
  the caller can use an errfunc that ignores ENOENT/ENOTDIR to
  emulate the GNU/FreeBSD behaviour.
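
The emulation mentioned in the last item could look roughly like the sketch below. It assumes an implementation that, like the Solaris one described above, passes ENOENT/ENOTDIR to errfunc. The names abort_unless_not_a_directory and glob_tolerant are hypothetical. GLOB_ERR is deliberately cleared, because with GLOB_ERR set glob() stops on the first reported error regardless of what errfunc returns.

```c
#include <errno.h>
#include <glob.h>

/* Hypothetical errfunc: ENOENT and ENOTDIR only confirm that nothing can
 * match under epath, so keep scanning; any other error aborts the scan. */
static int abort_unless_not_a_directory(const char *epath, int eerrno)
{
    (void)epath;
    if (eerrno == ENOENT || eerrno == ENOTDIR)
        return 0;   /* continue */
    return 1;       /* non-zero: glob() returns GLOB_ABORTED */
}

/* Hypothetical wrapper: drop GLOB_ERR so that the callback's return value,
 * not the flag, decides whether the scan stops. */
static int glob_tolerant(const char *pattern, int flags, glob_t *pglob)
{
    return glob(pattern, flags & ~GLOB_ERR,
                abort_unless_not_a_directory, pglob);
}
```

On an implementation that never reports ENOENT/ENOTDIR in the first place, the wrapper simply behaves as if GLOB_ERR had been set.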
Tags No tags attached.
Attached Files

- Relationships
related to 0001275 New pathname expansion errors

-  Notes
(0004493)
shware_systems (reporter)
2019-07-28 01:42

Re:
- I don't think we want to force implementations to literally
  call opendir()/readdir()/lstat() (in any case, that "stat()"
  is wrong). Not sure how to phrase it though.


As I read it, those are examples of interfaces that may return error codes errfunc is expected to process, not a requirement that glob() implementations use them and only them. So use of lstat() is allowed, as is accessing the host directly through syscalls that affect errno, bypassing the listed interfaces entirely. All that is missing is "e.g." after "failure," and ", or other standard interfaces," after "those functions" in the parenthetical part, to emphasize that they are examples.

What may be helpful is a table of standard errno values that are to be passed to errfunc, whichever interface or implementation-private code generates them, so applications don't need to guess what case labels errfunc's switch statement may have to process.
(0004494)
stephane (reporter)
2019-07-28 06:44

Re: Note: 0004493

Yes, it's actually not clear how stat() is meant to be used here. I had assumed lstat() was meant instead, as in the ./*/file cases, where implementations don't open the subdirs of . but instead try lstat(./subdir/file) on each of them.

But GLOB_ERR/errfunc being meant to report errors upon opening/reading *directories*, it can't report errors of lstat(). Maybe the spec wants implementations to call stat() on directories to check if they are searchable?

If we step back from the implementation detail to look at what the intention of the interface should be: AFAICT a glob(*/*.c) should return the matching files and GLOB_ERR/errfunc should identify the problems that prevent us from doing so.

/etc/passwd/*.c or non-existing/*.c don't match any file. The ENOTDIR/ENOENT failure upon trying to open those non-directories is not an error preventing us from expanding the glob; on the contrary, it's confirmation that the glob can't match.

Where it becomes more ambiguous is when ELOOP/ENAMETOOLONG is returned (where the files may exist using a shortened path). FreeBSD's glob() does return errors in those cases which IMO sounds like the best thing to do.

The real problem with the interface is that it doesn't allow reporting the lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path and an errno argument, calling it with subdir/foo/bar/baz and EACCES, for instance, could cause confusion and imply that subdir/foo/bar/baz is a directory that cannot be read, while actually it's probably either subdir, subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want to force implementations to stat() those 3 directories just to report an error.

Maybe we don't want to over-specify here and just say GLOB_ERR/errfunc should report the errors upon accessing directories (and directories or files assumed to be directories only) that prevent it from expanding the glob pattern without going into details of the implementation. And an application usage section clarifying that non-existing/*.c should not be reported as an error since the ENOENT failure of accessing the non-existing assumed-to-be-directory doesn't prevent us from expanding the glob, quite the contrary.
(0004495)
stephane (reporter)
2019-07-28 07:03

Re: Note: 0004494
> The real problem with the interface is that it doesn't allow reporting the
> lstat() errors in the */foo/bar/baz cases. Since errfunc only takes a path
> and errno arguments, calling it with a subdir/foo/bar/baz and EACCESS for
> instance could cause confusion and imply subdir/foo/bar/baz is a directory
> that cannot be read, while actually it's probably either subdir,
> subdir/foo or subdir/foo/bar that is not searchable. I'm not sure we want
> to force implementations to stat() those 3 directories just to report an
> error.

Anyway, stat() would not be the right tool in that case; access(X_OK) would be closer. If subdir is not searchable then a */foo/bar/ba[z] would call errfunc(subdir/foo/bar, EACCES), so it would be acceptable for an implementation to just do access(subdir/foo/bar, X_OK) if they wanted to (that would not cover the other lstat() error cases though).
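
To make the cost of the more thorough alternative concrete, here is a rough sketch of what probing every leading directory with access(X_OK) would involve. The helper first_unsearchable_prefix is hypothetical and not taken from any of the implementations tested. It also inherits the usual caveats of access(): it checks against the real rather than effective user ID, it is racy, and X_OK also succeeds on an executable non-directory.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe each leading directory of path with
 * access(X_OK), shortest first. The final component is not probed, and
 * neither is the root directory of an absolute path.
 * Returns 0 if every probed prefix is searchable (prefix is set to ""),
 * the errno value of the first failing access() (prefix holds the
 * offending directory), or -1 if the buffer is too small.
 */
static int first_unsearchable_prefix(const char *path, char *prefix, size_t size)
{
    size_t len = strlen(path);
    size_t i;

    if (size == 0 || len >= size)
        return -1;
    for (i = 1; i < len; i++) {
        if (path[i] != '/' || path[i - 1] == '/')
            continue;               /* only stop at the end of a component */
        memcpy(prefix, path, i);
        prefix[i] = '\0';
        if (access(prefix, X_OK) != 0)
            return errno;
    }
    prefix[0] = '\0';
    return 0;
}
```

For subdir/foo/bar/baz this probes subdir, subdir/foo and subdir/foo/bar in turn, which is the extra work per candidate path that the previous note is reluctant to mandate.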

- Issue History
Date Modified Username Field Change
2019-07-27 10:49 stephane New Issue
2019-07-27 10:49 stephane Name => Stephane Chazelas
2019-07-27 10:49 stephane Section => glob()
2019-07-27 10:49 stephane Page Number => 1109, 1110 (in 2018 edition)
2019-07-27 10:49 stephane Line Number => 35742, 35768
2019-07-28 00:48 Don Cragun Interp Status => ---
2019-07-28 00:48 Don Cragun Category Shell and Utilities => System Interfaces
2019-07-28 01:42 shware_systems Note Added: 0004493
2019-07-28 06:44 stephane Note Added: 0004494
2019-07-28 07:03 stephane Note Added: 0004495
2019-07-30 10:19 geoffclare Relationship added related to 0001275

