Austin Group Defect Tracker

Aardvark Mark III


ID 0000247
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2010-04-29 22:50
Last Update 2010-05-27 15:42
Reporter dwheeler View Status public  
Assigned To ajosey
Priority normal Resolution Open  
Status Under Review  
Name David A. Wheeler
Organization
User Reference
Section set,glob
Page Number 256,1088,2333-2334,2357-2359
Line Number 8395,36273,73812-73813,74488-74605
Interp Status ---
Final Accepted Text
Summary 0000247: Add nullglob (null globbing) support to shell's "set" and glob()
Description Even though filename processing is a very common operation, it is surprisingly difficult to do correctly, as described here:
 http://www.dwheeler.com/essays/filenames-in-shell.html
 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

One annoyance is that if a glob pattern is not matched, the failed pattern is returned. This is explicitly required in section 2.13.3 ("Patterns Used for Filename Expansion"), starting at line 73812: "If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged".

This means that many common shell constructs are incorrect, because they fail when the pattern does not match anything. For example, this is usually wrong, because there is no guarantee that a directory will have a .txt file:
 for file in ./*.txt ; do
   COMMAND "$file"  # This may try to process a file named "./*.txt" if there is no match
 done
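
The failure can be reproduced in an empty directory. The following is a minimal sketch, assuming a POSIX-style shell with its default globbing settings and a `mktemp -d` command (widely available, though not specified by POSIX); the variable names are illustrative only.

```shell
# Sketch: count how often a glob loop body runs when nothing matches.
tmpdir=$(mktemp -d)            # assumed scratch directory for the demo
iterations=0
seen=""
for file in "$tmpdir"/*.txt ; do
  iterations=$((iterations + 1))
  seen=$file                   # holds the unexpanded pattern when nothing matched
done
rmdir "$tmpdir"
printf '%s iteration(s); last word seen: %s\n' "$iterations" "$seen"
```

With default settings the loop body runs once, and the word it receives is the pattern itself rather than a filename.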


One solution is to include a check inside the loop, but this is complicated and inefficient, and is thus often not done. There are also pathological cases where the pattern fails to match but a file whose name is the literal pattern text exists (for example, the pattern "[x]" when a file named "[x]" exists but no file named "x" does), which causes the wrong thing to be done. In short, while it's possible to do this, people don't do it:
 for file in ./* ; do        # Use "./*", NOT bare "*", to avoid "-filenames".
   if [ -e "$file" ] ; then  # Make sure it isn't an empty match
     COMMAND ... "$file" ...
   fi
 done
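
The guard can be wrapped in a small function to show its effect. This is only a sketch: `count_txt` and `demo_dir` are hypothetical names, `mktemp -d` is assumed to be available, and the extra `-L` test is an addition to the check above so that symbolic links with missing targets are not skipped.

```shell
# Count the .txt entries in directory $1, ignoring an unmatched pattern.
count_txt() {
  count=0
  for file in "$1"/*.txt ; do
    # Skip the literal pattern left behind when nothing matched;
    # -L keeps symbolic links whose targets are missing.
    if [ -e "$file" ] || [ -L "$file" ] ; then
      count=$((count + 1))
    fi
  done
  printf '%s\n' "$count"
}

demo_dir=$(mktemp -d)
count_txt "$demo_dir"                        # empty directory: prints 0
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
count_txt "$demo_dir"                        # two matching files: prints 2
rm -r "$demo_dir"
```

The check costs one extra file test per iteration, which is the inefficiency the description refers to.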


It would be far better if the shell could automatically do the "right" thing, that is, return an empty set if a metacharacter is included *and* there is no matching result. The "glob.h" header file (page 256) includes an option GLOB_NOCHECK that is close to what is desired, though not quite.
Shell null globbing returns an empty result if there is no match *and* there was at least one metacharacter; it returns the word unchanged if there is NO metacharacter. With GLOB_NOCHECK, the glob() routine returns the pattern itself whenever nothing matches, even if the pattern contains metacharacters; without it, glob() returns no pathnames even if there were NO metacharacters. Both behaviors are thus subtly different from null globbing. The current glob() routine is sufficient to implement the shell proposal, but it would be useful to add an additional option to glob() so that its callers can also do null globbing, so that is added as well.

One possible objection is that this does not always handle failed matches when the command does something different without any files. For example, "cat" will read from a list of filenames, but will read from stdin when no filenames are listed, so "cat ./*.txt" will still do the wrong thing when no filenames are present. This is true, but "cat ./*.txt" already does the wrong thing (it will try to open a nonexistent file "./*.txt" if the match fails), so using this option doesn't make it fundamentally any *worse*, while making "for" loops far more useful. Commands that don't switch to "read from stdin when no files" are also better off. Finally, since it is an option, people can enable it or disable it as they choose.
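
For commands such as "cat", a script can guard the call itself, with or without null globbing. The sketch below is one possible approach, not part of the proposal; `cat_txt` is a hypothetical helper name, and the guard is the same existence test used in the loop example.

```shell
# Concatenate the .txt files in directory $1; do nothing if there are none.
cat_txt() {
  set -- "$1"/*.txt              # replace the function's arguments with the expansion
  if [ -e "$1" ] || [ -L "$1" ] ; then
    cat "$@"
  fi                             # otherwise cat is never run, so stdin is not read
}
```

Calling `cat_txt .` in a directory without .txt files then returns immediately instead of waiting on stdin or reporting a missing "./*.txt".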

Many shells have a way of doing this, but there is no *standard* way to do it. Doing this in a shell is often called "null globbing".
Null globbing fixes this by replacing an unmatched pattern with nothing at all. In bash you can enable nullglob with "shopt -s nullglob". In zsh, you can use "setopt NULL_GLOB" for the same result. Then, "for" loops on glob patterns will work correctly if nothing matches the glob pattern.
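
As an illustration of the bash form mentioned above, the following sketch enables nullglob only when the script is running under bash and otherwise falls back to an explicit existence check. The `BASH_VERSION` test, the `mktemp -d` call, and the variable names are assumptions made for this example; the zsh form is not exercised here.

```shell
tmpdir=$(mktemp -d)              # empty scratch directory for the demo
runs=0
if [ -n "${BASH_VERSION:-}" ] ; then
  shopt -s nullglob              # bash: an unmatched pattern expands to nothing
  for file in "$tmpdir"/*.txt ; do
    runs=$((runs + 1))
  done
  shopt -u nullglob              # switch the option off again
else
  # Other shells: keep the explicit existence check.
  for file in "$tmpdir"/*.txt ; do
    [ -e "$file" ] || continue
    runs=$((runs + 1))
  done
fi
rmdir "$tmpdir"
printf 'loop body ran %s time(s)\n' "$runs"
```

Either branch leaves `runs` at 0 for an empty directory, but only the bash branch does so without a per-iteration test, and the need for such branching is what a standard option name would remove.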

There are many possible short and long option names; the very problem right now is that there is no standardization of the name! This proposal suggests set -N for the short option to "set", and "nullglob" as the long name in "set -o". I searched and found that these did not interfere with existing options in bash 4.0.23, dash version 0.5.5.1, ksh version 93t+, or zsh 4.3.9. Obviously, other option names are possible; the key is to standardize it.

It would be nice to use "set -G" as the option for nullglob, since zsh already supports null globbing with this very name, and bash does not have an interfering use for it. Unfortunately, ksh uses "set -G" to expand "**" into a recursive descent of files, so "set -G" should *not* be used as it would impede adoption elsewhere.

It might be nice to modify wordexp() (e.g., page 461), too. This proposal doesn't do that, but that would be an obvious next step.

This proposal suggests one approach to modifying glob() to support this as well: a new option GLOB_NULLGLOB that only has effect if GLOB_NOCHECK is enabled, and that slightly modifies how GLOB_NOCHECK works. There are other ways to do this, of course.
Desired Action Document this new shell option ("nullglob") as follows:

On page 2333, lines 73812-73813, replace:
"If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged".
with:
"If the pattern does not match any existing filenames or pathnames, and contains at least one metacharacter, the result depends on the nullglob option. If the nullglob option is enabled, a null string results. If the nullglob option is not enabled, the pattern string shall be left unchanged".

On Page 2357, in the synopsis lines 74489-74490, add a new "-N" short option name for set.

Under line 74562, add:
-N
When this option is on, a filename expansion pattern which matches no files, yet includes at least one character with a special meaning (see 2.13.1), expands to a null string rather than itself.

Under line 74583, add:
nullglob
Equivalent to -N.


Document the new lower-level glob option (GLOB_NULLGLOB) as follows:


Page 256, under line 8395 (glob), add:
GLOB_NULLGLOB
If the pattern contains special characters and does not match any pathname, then the result is empty instead of the pattern. Only has effect if GLOB_NOCHECK is also enabled.
On line 8393, append "(if GLOB_NULLGLOB is enabled, the pattern will only be returned if there are no special characters in the pattern)".

Page 1088, under line 36273:
GLOB_NULLGLOB
If pattern contains wildcards and does not match any pathname, then the result is empty instead of the pattern. Only has effect if GLOB_NOCHECK is also enabled.

Line 36269, append "(if GLOB_NULLGLOB is enabled, the pattern will only be returned if there are no special characters in the pattern)".

Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0000411)
dwheeler (reporter)
2010-04-29 23:00

Quick fix - in my proposal, change:
"If the pattern contains special characters"
to:
"If the pattern contains at least one special character"

And change:
"If pattern contains wildcard"
to:
"If the pattern contains at least one special character"
(0000420)
nick (manager)
2010-05-27 15:31

From David Korn (email seq 13735):

Subject: Re: [1003.1(2008)/Issue 7 0000247]: Add nullglob (null globbing) support to shell's "set" and glob()
--------

> Many shells have a way of doing this, but there is no *standard* way to do
> it. Doing this in a shell is often called "null globbing"
> Null globbing fixes this by replacing an unmatched pattern with nothing at
> all. In bash you can enable nullglob with "shopt -s nullglob". In zsh, you
> can use "setopt NULL_GLOB" for the same result. Then, "for" loops on glob
> patterns will work correctly if nothing matches the glob pattern.
>

In ksh93, you can do this on a per pattern basis with ~(N) in front of the
pattern, for example
        for i in ~(N)*.c
        do xxx
        done
which will skip the loop if there are no files ending with .c.


David Korn
dgk@research.att.com
(0000421)
nick (manager)
2010-05-27 15:42

During the May 27 2010 conf call, the general consensus was that ksh93 filename generation appears to have many useful extensions, and we should move in that direction. See http://www2.research.att.com/sw/download/man/man1/ksh.html for man page details. New wording invited.

- Issue History
Date Modified Username Field Change
2010-04-29 22:50 dwheeler New Issue
2010-04-29 22:50 dwheeler Status New => Under Review
2010-04-29 22:50 dwheeler Assigned To => ajosey
2010-04-29 22:50 dwheeler Name => David A. Wheeler
2010-04-29 22:50 dwheeler Section => set,glob
2010-04-29 22:50 dwheeler Page Number => 256,1088,2333-2334,2357-2359
2010-04-29 22:50 dwheeler Line Number => 8395,36273,73812-73813,74488-74605
2010-04-29 23:00 dwheeler Note Added: 0000411
2010-05-27 15:31 nick Note Added: 0000420
2010-05-27 15:42 nick Note Added: 0000421

