Austin Group Defect Tracker

Aardvark Mark III


ID 0001228
Category [1003.1(2016)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2019-02-08 22:38
Last Update 2019-06-08 14:07
Reporter stephane View Status public  
Assigned To
Priority normal Resolution Open  
Status New  
Name Stephane Chazelas
Organization
User Reference https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html
Section Pathname Expansion
Page Number 2384
Line Number 76271
Interp Status ---
Final Accepted Text
Summary 0001228: allow shells to exclude "." and ".." from pathname expansions
Description This is a follow-up to, for instance:
https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html
https://www.mail-archive.com/austin-group-l@opengroup.org/msg02319.html

"." and ".." are tools used in relative paths to navigate the
file system, every system call that takes a file path as
argument understands them to mean the current and parent
directory respectively.

While initially, they were implemented as directory entries
created upon mkdir (first the utility, then the system call),
many modern (and ancient) file systems store the child to parent
directory relationship in a different way.

The readdir() specification at the moment doesn't mandate dot
and dot-dot being returned, though, as discussed in the first
mailing list thread linked above, it has some obscure text:

> The readdir() function shall not return directory entries
> containing empty names. If entries for dot or dot-dot exist,
> one entry shall be returned for dot and one entry shall be
> returned for dot-dot; otherwise, they shall not be returned.

Which I interpret as meaning that if either "." or ".." exists
(and "exists" would have to be defined in that context, for
instance many systems synthesise those when they are not
present as directory entries in the structure of the file
system), then *both* should be returned. In other words, POSIX
forbids implementations from returning "." without ".." or ".."
without "." (in readdir() at least, the "ls -a" specification
has no similar wording and is not defined in relation to
readdir()).

Another interpretation (by kre) was that *only one* entry for
each of "." and ".." was to be returned (but then I'd expect
that to be the case for any file name, not just "." and "..").

readdir() returning "." and ".." is not particularly useful: that
doesn't tell us anything; we know "." and ".." are always valid
path components in file paths, and in practice we can't tell
whether the structure of the file system has them from the
result of readdir() as some systems synthesise them, not to
mention the fact that on most systems, the d_ino of ".." in
mount points is incorrect. Most applications need to take
special care to exclude those (that's the case of at least all
the ones that do a recursive directory traversal like find,
chown -R). According to legend, files whose name starts with "."
are hidden by accident, because of a coding error in some ls or
glob code trying to exclude "." and ".." but ending up excluding
all file names starting with ".".

Now (and it's the point of this bug, sorry for the longish
context), while readdir() returning "." and ".." is not useful,
shell glob expansions including them is *harmful* and many
shells still do it and the POSIX specification could be
interpreted as requiring it when readdir() returns them.

It is the reason why "rm -rf .." stopped working in Unix V3
(1973). From the man page:

> It is forbidden to remove the file .. merely to avoid the
> antisocial consequences of inadvertently doing something like
> rm -r .*.


The same was later done for "." (and "/").

Many systems, however, including POSIX-certified ones, still
don't do it for "dir/.." or "../", which means "rm -rf dir/.*"
and "rm -rf .*/" are still a problem.

Of course that's not limited to "rm". "rm" has that workaround,
but the problem is almost as bad for "chmod -R a+r .*" and most
other utilities that take file paths as arguments.

The workaround is particularly ugly:

   chmod -R a+r .[!.]* ..?*

And causes chmod (here) to return with an error if either of
those two globs doesn't match (that code also happens not to
work in most non-POSIX shells like csh, rc, zsh, fish...).
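
For scripts that have to run today, one way to avoid both the
pair of globs and the error on a non-matching glob is to filter
the expansion of .* in a loop. What follows is only a sketch:
with_hidden is a hypothetical helper name (it comes from neither
the spec nor any shell), and a POSIX sh is assumed.

```shell
# with_hidden CMD [ARG...]
# Run CMD with the hidden entries of the current directory appended
# as extra arguments, leaving out "." and ".." whether or not the
# shell's expansion of ".*" returns them.
# Hypothetical helper; uses the global variables n and f because
# POSIX sh has no "local".
with_hidden() {
  n=$#                        # number of words in the command itself
  for f in .*; do
    case $f in
      . | ..) continue ;;     # navigation entries: never pass them on
    esac
    # An unmatched pattern is left as the literal ".*": skip it
    # unless a file of that name really exists.
    [ -e "$f" ] || [ -L "$f" ] || continue
    set -- "$@" "$f"
  done
  [ "$#" -gt "$n" ] || return 0   # nothing hidden: do not run CMD
  "$@"
}
```

It could be used as "with_hidden chmod -R a+r --". The command is
not run at all when there is no hidden file, which is a design
choice of this sketch rather than anything the spec suggests.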

I can't think of any case where one would want a glob to include
"." and "..". Again, those are navigating tools.

Back in 1973, wildcards were expanded by the /etc/glob helper
invoked by the shell and that utility did include "." and ".."
in the expansion of ".*". In the Bourne and C shells, filename
generation was moved into the shell, but behaved the same.

That problem was fixed in the 80s by the Forsyth shell (a
reimplementation of the SysV shell) and, by extension, by pdksh
(based on the Forsyth shell) and in turn by mksh (the shell of
MirOS) and the shell of OpenBSD, as well as by zsh and fish,
whose glob expansions never include "." or "..".
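
Which behaviour a given sh has can be probed from a script. The
following is a minimal sketch, with glob_has_dots being a
hypothetical name. Note that a "no" answer does not tell the two
cases apart: the shell may filter "." and ".." itself, or the
underlying directory listing may simply not return them.

```shell
# glob_has_dots
# Succeed (status 0) if, in the current directory, this shell's
# expansion of ".*" contains "." or ".."; fail (status 1) otherwise.
# Hypothetical helper; uses the global variable f.
glob_has_dots() {
  for f in .*; do
    case $f in
      . | ..) return 0 ;;
    esac
  done
  return 1
}

if glob_has_dots; then
  echo 'this shell expands .* to include "." or ".."'
else
  echo 'this shell does not generate "." or ".." from .*'
fi
```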

Most shells that have an option or mechanism to stop hiding the
files whose name starts with "." (the dotglob and globdot
options, ksh93's FIGNORE (though see the regression at
https://github.com/att/ast/issues/11), or bash's GLOBIGNORE)
still do exclude "." and ".." (though generally not for the ".*"
globs), yash being a notable exception which makes its dotglob
option unusable.

About that, the POSIX spec says:

>  3. Specified patterns shall be matched against existing
> filenames and pathnames, as appropriate. Each component that
> contains a pattern character shall require read permission in
> the directory containing that component. Any component,
> except the last, that does not contain a pattern character
> shall require search permission.

Again, "existing" is not clearly defined here. There is no
reference to readdir(). Do "." and ".." /exist/ as filenames?
Even if they are synthesized by readdir()? If they're not
returned by readdir(), one could argue they still exist because
lstat() on them succeeds.
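
The two possible meanings of "exist" can be compared from the
shell. This is a rough sketch only: dot_status is a hypothetical
helper, and it uses "ls -a" as a stand-in for readdir(), which
the ls specification does not actually promise.

```shell
# dot_status DIR
# For each of "." and "..", report whether the name resolves through
# path lookup in DIR and whether it shows up in the "ls -a" listing
# of DIR. Hypothetical helper; uses the global variables name,
# resolves and listed.
dot_status() {
  for name in . ..; do
    # Path lookup: the name is valid as a path component.
    if [ -d "$1/$name" ]; then resolves=yes; else resolves=no; fi
    # Listing: the name is reported as an entry of DIR.
    if ls -a -- "$1" | grep -qxF -e "$name"; then
      listed=yes
    else
      listed=no
    fi
    printf '%s resolves=%s listed=%s\n' "$name" "$resolves" "$listed"
  done
}

dot_status .
```

On any searchable directory, "resolves" should come out as yes
for both names; the "listed" column is the part that depends on
the file system and on the ls implementation.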

Desired Action First allow (and then, as a potential future direction, require)
shells to exclude "." and ".." from generated file names.

Change:

>  3. Specified patterns shall be matched against existing
> filenames and pathnames, as appropriate. Each component that
> contains a pattern character shall require read permission in
> the directory containing that component. Any component,
> except the last, that does not contain a pattern character
> shall require search permission.

to something like:

>  3. Specified patterns shall be matched against existing
> filenames and pathnames, as reported by readdir() as
> appropriate. Implementations may choose to exclude "." and
> ".." if present in the output of readdir() when matching
> against pattern components. Each component that contains a
> pattern character shall require read permission in the
> directory containing that component. Any component, except the
> last, that does not contain a pattern character shall require
> search permission.

Add a future direction section along the lines of:

> A future version of this standard may require implementations
> to exclude "." and ".." from the list of generated filenames.

It will probably take decades until we can safely use .* in sh
scripts, but at least we would not be preventing implementations
from doing "the right thing".

Note that most of the time, the spec refers to globbing as
"pathname expansion", but there are a few cases (including
the name of the section quoted above) where it's "filename
expansion" instead. It would make sense to use "pathname
expansion" everywhere.

You may want to address the clarification of readdir() in this
same bug. Maybe change:

> If entries for dot or dot-dot exist, one entry shall be
> returned for dot and one entry shall be returned for dot-dot;
> otherwise, they shall not be returned.

to something like:


> It is unspecified whether entries for "." or ".." are
> returned by readdir(). But implementations shall make sure
> that if one entry is returned for ".", one shall be returned
> for ".." as well (and vice versa).

Maybe also add a:

> Only one entry for a given name shall be given.

Though in both cases, I suppose a corrupted file system could
exhibit both of those pathological behaviours ("." without ".."
or several entries for the same name), and I don't expect
existing readdir() implementations to guard against that, so it
could be dangerous to tell applications they can rely on it.

So maybe the best thing is to remove that confusing text
altogether.
Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0004245)
stephane (reporter)
2019-02-09 08:13

Sorry, about:

> It is the reason why "rm -rf .." stopped working in Unix V3
> (1973). From the man page:
>
> > It is forbidden to remove the file .. merely to avoid the
> > antisocial consequences of inadvertently doing something like
> > rm -r .*.

That should have been Unix V7 (1979), the one that also brought the Bourne shell, not V3. V3 is the one where the -r option was added to rm, but it did not remove directories, it only emptied their contents (recursively). In V7, directories were removed as well, but ".." was rejected.

See
https://unix.stackexchange.com/questions/90073/does-rm-ever-delete-the-parent-directory/90075#90075 for details.
(0004246)
stephane (reporter)
2019-02-09 08:19

The links to mail-archive.com are being mangled on the mantis web interface.

https://www.mail-archive.com/austin-group-l%40opengroup.org/msg01176.html
https://www.mail-archive.com/austin-group-l%40opengroup.org/msg02319.html

may work better.
(0004412)
stephane (reporter)
2019-06-08 14:07

As mentioned in the bug description but not in the proposed resolution, POSIX should clarify what file list pathname expansion should match patterns against.

At the moment, it mentions "existing" files, which is not desirable. For instance, some file systems have hidden ".zfs" or ".snapshot" directories that do exist but are hidden in that they are not returned in a readdir().

Those systems may have separate dedicated APIs to list those, but it's important that none of readdir(), ls -a, find, chmod/chown -R, glob(), shell pathname expansion, etc. include them.

IMO, "ls -a" should be specified to output the same list of files as returned by readdir(), and shell pathname expansion should match on the same list as returned by either "ls -a" or "ls -A" (ls -a with "." and ".." excluded). The word "exist" should be avoided: "." and ".." always "exist" in that stat()/chdir() will work on them, but they are not always returned by readdir()/ls -a.
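
To illustrate matching against a listing rather than against what "exists", here is a rough sketch in POSIX sh. match_listing is a hypothetical helper, it assumes file names without newlines, and it is not real pathname expansion: "case" matching lets "*" match a leading ".", which pathname expansion does not.

```shell
# match_listing PATTERN [DIR]
# Print the names from the "ls -A" listing of DIR (default: the
# current directory) that match PATTERN. Because the list comes
# from "ls -A", "." and ".." are never candidates, and names that
# the listing does not report cannot match either.
# Hypothetical helper; assumes no newline in file names.
match_listing() {
  ls -A -- "${2:-.}" | while IFS= read -r name; do
    case $name in
      $1) printf '%s\n' "$name" ;;   # unquoted: $1 is used as a pattern
    esac
  done
}
```

For instance, "match_listing '.*' dir" prints the hidden entries of dir and can never print "." or "..", whatever the shell does for the .* glob itself.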

- Issue History
Date Modified Username Field Change
2019-02-08 22:38 stephane New Issue
2019-02-08 22:38 stephane Name => Stephane Chazelas
2019-02-08 22:38 stephane User Reference => https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html
2019-02-08 22:38 stephane Section => Pathname Expansion
2019-02-08 22:38 stephane Page Number => 2384
2019-02-08 22:38 stephane Line Number => 76271
2019-02-09 08:13 stephane Note Added: 0004245
2019-02-09 08:19 stephane Note Added: 0004246
2019-06-08 14:07 stephane Note Added: 0004412

