ID | Category | Severity | Type | Date Submitted | Last Update
0001228 | [1003.1(2016/18)/Issue7+TC2] Shell and Utilities | Editorial | Enhancement Request | 2019-02-08 22:38 | 2024-06-11 09:08
Reporter | stephane
View Status | public
Assigned To |
Priority | normal
Resolution | Accepted As Marked
Status | Closed
Name | Stephane Chazelas
Organization |
User Reference | https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html
Section | Pathname Expansion
Page Number | 2384
Line Number | 76271
Interp Status | ---
Final Accepted Text | Note: 0005889
Summary | 0001228: allow shells to exclude "." and ".." from pathname expansions
Description |
That's a follow-up to, for instance:

https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html
https://www.mail-archive.com/austin-group-l@opengroup.org/msg02319.html

"." and ".." are tools used in relative paths to navigate the file system; every system call that takes a file path as argument understands them to mean the current and parent directory respectively. While initially they were implemented as directory entries created upon mkdir (first the utility, then the system call), many modern (and ancient) file systems store the child-to-parent directory relationship in a different way.

The readdir() specification at the moment doesn't mandate dot and dot-dot being returned, though, as discussed in the first mailing list thread linked above, it has some obscure text:

> The readdir() function shall not return directory entries
> containing empty names. If entries for dot or dot-dot exist,
> one entry shall be returned for dot and one entry shall be
> returned for dot-dot; otherwise, they shall not be returned.

Which I interpret as meaning that if either "." or ".." exists (and "exists" would have to be defined in that context; for instance, many systems synthesise those when they are not present as directory entries in the structure of the file system), then *both* should be returned. In other words, POSIX forbids implementations from returning "." without ".." or ".." without "." (in readdir() at least; the "ls -a" specification has no similar wording and is not defined in relation to readdir()).

Another interpretation (by kre) was that *only one* entry for each of "." and ".." was to be returned (but then I'd expect that to be the case for any file name, not just "." and "..").

readdir() returning "." and ".." is not particularly useful: that doesn't tell us anything. We know "." and ".." are always valid path components in file paths, and in practice we can't tell from the result of readdir() whether the structure of the file system has them, as some systems synthesise them, not to mention the fact that on most systems the d_ino of ".." in mount points is incorrect.

Most applications need to take special care to exclude those (that's the case for at least all the ones that do a recursive directory traversal, like find or chown -R). According to legend, files whose name starts with "." are hidden by accident, because of a coding error in some ls or glob code trying to exclude "." and ".." but ending up excluding all file names starting with ".".

Now (and it's the point of this bug, sorry for the longish context), while readdir() returning "." and ".." is not useful, shell glob expansions including them is *harmful*. Many shells still do it, and the POSIX specification could be interpreted as requiring it when readdir() returns them.

It is the reason why "rm -rf .." stopped working in Unix V3 (1973). From the man page:

> It is forbidden to remove the file .. merely to avoid the
> antisocial consequences of inadvertently doing something like
> rm -r .*.

The same was done for "." later (and "/"). Many systems, however, including POSIX-certified ones, still don't do it for "dir/.." or "../", which means "rm -rf dir/.*" and "rm -rf .*/" are still a problem.

Of course that's not limited to "rm". "rm" has that workaround, but the problem is almost as bad for "chmod -R a+r .*" and most other utilities that take file paths as arguments.

The workaround is particularly ugly:

    chmod -R a+r .[!.]* ..?*

And it causes chmod (here) to return with an error if either of those two globs doesn't match (that code also happens not to work in most non-POSIX shells like csh, rc, zsh, fish...).

I can't think of any case where one would want a glob to include "." and "..". Again, those are navigating tools.

Back in 1973, wildcards were expanded by the /etc/glob helper invoked by the shell, and that utility did include "." and ".." in the expansion of ".*". In the Bourne and C shells, filename generation was moved into the shell, but behaved the same.

That problem was fixed in the 80s by the Forsyth shell (a reimplementation of the SysV shell) and, by extension, pdksh (based on the Forsyth shell), mksh (the shell of MirOS) and the shell of OpenBSD, as well as by zsh and fish, whose glob expansions never include "." nor "..".

Most shells that have an option or mechanism to stop hiding the files whose name starts with "." (dotglob, globdot options, ksh93's FIGNORE (though see the regression at https://github.com/att/ast/issues/11), bash's GLOBIGNORE) still do exclude "." and ".." (though generally not for the ".*" globs), yash being a notable exception, which makes its dotglob option unusable.

About that, the POSIX spec says:

> 3. Specified patterns shall be matched against existing
> filenames and pathnames, as appropriate. Each component that
> contains a pattern character shall require read permission in
> the directory containing that component. Any component,
> except the last, that does not contain a pattern character
> shall require search permission.

Again, "existing" is not clearly defined here. There is no reference to readdir(). Do "." and ".." /exist/ as filenames? Even if they are synthesized by readdir()? If they're not returned by readdir(), one could argue they still exist because lstat() on them succeeds.
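For illustration, the two workaround patterns quoted in the description can be wrapped so that an unmatched pattern no longer triggers an error. This is only a sketch under assumptions: hidden_names is a hypothetical helper name, the shell is assumed to be a POSIX sh with default options, and an unmatched pattern is assumed to be left as literal text.

```sh
#!/bin/sh
# hidden_names DIR (hypothetical helper): print the hidden entries of DIR,
# one per line, without ever printing "." or "..".
hidden_names() (
    CDPATH= cd -- "$1" || exit 1
    # .[!.]* matches names like ".x"; ..?* matches names like "..x".
    # Neither pattern can match "." or "..".
    for name in .[!.]* ..?*; do
        # An unmatched pattern stays as literal text; skip it.
        [ -e "$name" ] || [ -L "$name" ] || continue
        printf '%s\n' "$name"
    done
)
```

The function body runs in a subshell, so the cd does not affect the caller. The existence check is what avoids the error case mentioned for chmod when one of the two globs matches nothing.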
Desired Action |
First allow (and then require, as a potential future direction) shells to exclude "." and ".." from generated file names.

Change:

> 3. Specified patterns shall be matched against existing
> filenames and pathnames, as appropriate. Each component that
> contains a pattern character shall require read permission in
> the directory containing that component. Any component,
> except the last, that does not contain a pattern character
> shall require search permission.

to something like:

> 3. Specified patterns shall be matched against existing
> filenames and pathnames, as reported by readdir() as
> appropriate. Implementations may choose to exclude "." and
> ".." if present in the output of readdir() when matching
> against pattern components. Each component that contains a
> pattern character shall require read permission in the
> directory containing that component. Any component, except the
> last, that does not contain a pattern character shall require
> search permission.

Add a future direction section along the lines of:

> A future version of this standard may require implementations
> to exclude "." and ".." from the list of generated filenames.

It will probably take decades until we can safely use .* in sh scripts, but at least we would not be preventing implementations from doing "the right thing".

Note that most of the time, the spec refers to globbing as "pathname expansion", but there are a few cases (including the name of the section quoted above) where it's "filename expansion" instead. It would make sense to use "pathname expansion" everywhere.

You may want to address the clarification of readdir() in this same bug. Maybe change:

> If entries for dot or dot-dot exist, one entry shall be
> returned for dot and one entry shall be returned for dot-dot;
> otherwise, they shall not be returned.

to something like:

> It is unspecified whether entries for "." or ".." are
> returned by readdir(). But implementations shall make sure
> that if one entry is returned for ".", one shall be returned
> for ".." as well (and vice versa).

Maybe also add a:

> Only one entry for a given name shall be given.

Though in both cases, I suppose a corrupted file system could exhibit both of those pathological behaviours ("." without ".." or several entries for the same name), and I don't expect existing readdir() implementations to guard against that, so it could be dangerous to tell applications they can rely on it. So maybe the best thing is to remove that confusing text altogether.
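Since the proposed wording would leave both behaviours allowed, a script may want to find out which one the running shell implements. A minimal sketch, assuming a POSIX sh; glob_includes_dot is a hypothetical name, not something defined by the standard or by any shell:

```sh
#!/bin/sh
# glob_includes_dot (hypothetical helper): print "yes" if this shell's
# expansion of ".*" in the current directory includes ".", else "no".
glob_includes_dot() {
    for name in .*; do
        if [ "$name" = . ]; then
            echo yes
            return 0
        fi
    done
    echo no
}
```

Calling glob_includes_dot from any directory the shell can read is enough for the probe. The answer depends on the shell and its options, so no particular output is claimed here.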
Tags | issue8
Attached Files |
(0004245) stephane (reporter) 2019-02-09 08:13 |
Sorry, about:

> It is the reason why "rm -rf .." stopped working in Unix V3
> (1973). From the man page:
>
> > It is forbidden to remove the file .. merely to avoid the
> > antisocial consequences of inadvertently doing something like
> > rm -r .*.

That should have been Unix V7 (1979), the one that also brought the Bourne shell, not V3. V3 is the one where the -r option was added to rm, but it was not removing directories, just emptying their content (recursively). In V7, directories were removed as well, but ".." was rejected.

See https://unix.stackexchange.com/questions/90073/does-rm-ever-delete-the-parent-directory/90075#90075 for details.
(0004246) stephane (reporter) 2019-02-09 08:19 |
The links to mail-archive.com are being mangled on the Mantis web interface. These may work better:

https://www.mail-archive.com/austin-group-l%40opengroup.org/msg01176.html
https://www.mail-archive.com/austin-group-l%40opengroup.org/msg02319.html
(0004412) stephane (reporter) 2019-06-08 14:07 |
As mentioned in the bug description but not in the proposed resolution, POSIX should clarify what file list pathname expansion should match patterns against.

At the moment, it mentions "existing" files, which is not desirable. For instance, some file systems have hidden ".zfs" or ".snapshot" directories that do exist but are hidden in that they are not returned by readdir(). Those systems may have separate dedicated APIs to list those, but it's important that none of readdir(), ls -a, find, chmod/chown -R, glob(), shell pathname expansion, etc. include them.

IMO, "ls -a" should be specified to output the same list of files as returned by readdir(), and shell pathname expansion should match on the same list as returned by either "ls -a" or "ls -A" (ls -a with "." and ".." excluded), and the word "exist" should be avoided ("." and ".." always "exist" in that stat()/chdir() will work on them, but they are not always returned by readdir()/ls -a).
(0004898) McDutchie (reporter) 2020-07-21 01:06 |
As the maintainer of the only ksh93 version that is currently being developed in the open[*], I would like to endorse this change, for what that is worth.

In addition to all the things already pointed out by Stéphane, the current POSIX behaviour also produces 'interesting' recursion when you give the obvious command to recursively copy all files in the current directory to another directory (without copying the directory itself):

    cp -pr .* * /some/other/dir/

On shells other than mksh and zsh, you currently need:

    cp -pr .[!.]* ..?* * /some/other/dir

which is completely ridiculous to ask of normal everyday shell users. This is basic functionality, and it does not currently work properly on shells other than mksh and zsh.

I'm strongly inclined to make ksh93 follow mksh and zsh.

_____
[*] https://github.com/ksh93/ksh
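A copy that behaves the same whether or not the shell expands ".*" to include "." and ".." can be sketched by filtering the expansion by hand. This is an illustration under assumptions, not a recommended implementation: copy_all is a hypothetical helper name, DEST is assumed to be an existing directory, and cp -pR is used as the POSIX spelling of the recursive copy.

```sh
#!/bin/sh
# copy_all SRC DEST (hypothetical helper): copy every entry of SRC,
# hidden ones included, into the existing directory DEST.
copy_all() (
    # Resolve DEST to an absolute path before changing directory.
    dest=$(CDPATH= cd -- "$2" && pwd) || exit 1
    CDPATH= cd -- "$1" || exit 1
    set --                      # start with an empty argument list
    for name in .* *; do
        case $name in
            .|..) continue ;;   # drop "." and ".." if the shell expanded them
        esac
        # Skip patterns left as literal text because nothing matched.
        [ -e "$name" ] || [ -L "$name" ] || continue
        # The "./" prefix protects names that start with "-".
        set -- "$@" "./$name"
    done
    [ "$#" -gt 0 ] || exit 0    # nothing to copy
    cp -pR "$@" "$dest"
)
```

A call such as copy_all . /some/other/dir covers the case described above. The price is a loop and a case statement in place of a single glob, which is the burden this report argues ordinary users should not have to carry.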
(0005889) geoffclare (manager) 2022-07-14 16:27 |
On draft 2.1 page 2353 line 76183 section 2.13.3, after:

> it shall be matched against existing filenames and pathnames, as appropriate.

add:

> If directory entries for dot and dot-dot exist, they may be ignored.

On draft 2.1 page 2353 line 76205 section 2.13.3, add (at the end of item 3):

> Note: A future version of this standard may require that directory entries for dot and dot-dot are ignored (if they exist) when matching patterns against existing filenames. For example, when expanding .* the result would not include dot and dot-dot.
Date Modified | Username | Field | Change |
2019-02-08 22:38 | stephane | New Issue | |
2019-02-08 22:38 | stephane | Name | => Stephane Chazelas |
2019-02-08 22:38 | stephane | User Reference | => https://www.mail-archive.com/austin-group-l@opengroup.org/msg01176.html |
2019-02-08 22:38 | stephane | Section | => Pathname Expansion |
2019-02-08 22:38 | stephane | Page Number | => 2384 |
2019-02-08 22:38 | stephane | Line Number | => 76271 |
2019-02-09 08:13 | stephane | Note Added: 0004245 | |
2019-02-09 08:19 | stephane | Note Added: 0004246 | |
2019-06-08 14:07 | stephane | Note Added: 0004412 | |
2020-07-21 01:06 | McDutchie | Note Added: 0004898 | |
2022-07-14 16:27 | geoffclare | Note Added: 0005889 | |
2022-07-14 16:27 | geoffclare | Interp Status | => --- |
2022-07-14 16:27 | geoffclare | Final Accepted Text | => Note: 0005889 |
2022-07-14 16:27 | geoffclare | Status | New => Resolved |
2022-07-14 16:27 | geoffclare | Resolution | Open => Accepted As Marked |
2022-07-14 16:28 | geoffclare | Tag Attached: issue8 | |
2022-08-05 09:34 | geoffclare | Status | Resolved => Applied |
2023-08-10 15:11 | eblake | Relationship added | related to 0001275 |
2023-08-10 15:11 | eblake | Relationship added | related to 0001273 |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |