Austin Group Defect Tracker

Aardvark Mark IV


ID 0000245
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2010-04-29 20:25
Last Update 2023-01-09 16:20
Reporter dwheeler
View Status public
Assigned To ajosey
Priority normal
Resolution Duplicate
Status Closed  
Name David A. Wheeler
Organization
User Reference
Section read
Page Number 3128
Line Number 103903-104006
Interp Status ---
Final Accepted Text
Summary 0000245: Add -0 option to shell's "read"
Description As documented in 0000243, the POSIX specification and common implementations permit nearly all bytes to be in pathnames, and yet it is surprisingly difficult to portably and correctly process such pathnames. This is one of the more common reasons for security vulnerabilities (see CERT’s "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see:
 http://www.dwheeler.com/essays/filenames-in-shell.html
 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

The current situation is that it is too hard to *correctly* process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer and easy-to-use set of constructs for this common case.
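To make the failure concrete, here is a minimal sketch of the usual newline-delimited idiom meeting a filename that contains a newline. It assumes a scratch directory created with mktemp -d (not part of the proposal), and the file name is made up. It runs in any POSIX-style shell.

```sh
# Hypothetical scratch directory holding ONE file whose name has a newline.
dir=$(mktemp -d) || exit 1
touch "$dir/$(printf 'evil\nname')"

# find prints one pathname per line, so read sees two bogus names
# ("$dir/evil" and "name") instead of the one real file.
seen=0
while IFS= read -r path; do
  seen=$((seen + 1))
  printf 'got: %s\n' "$path"
done <<EOF
$(find "$dir" -type f)
EOF

printf 'files on disk: 1, names seen: %s\n' "$seen"
rm -rf "$dir"
```

The here-document is used instead of a pipe so that the counter survives the loop in every shell.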

Instead, add a new "-0" option to the shell's "read" utility. This way, lists of filenames can be easily processed because they can be terminated by a null byte (which cannot occur in pathnames or filenames).

The current "find ... -exec COMMAND" approach is grossly inadequate. If COMMAND is nontrivial, it quickly becomes too hard to read and understand, and it encourages shell programs to be divided up unnaturally whenever files are processed.
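For comparison, here is a minimal sketch of the portable find -exec form. It assumes a scratch directory from mktemp -d and uses made-up file names. It handles every name correctly, but the per-file work has to be written inside a quoted sh -c string, which is the readability cost described above.

```sh
dir=$(mktemp -d) || exit 1
touch "$dir/plain" "$dir/$(printf 'with\nnewline')" "$dir/ leading space"

# "{} +" makes find pass each pathname to sh as a separate argument, so
# no delimiter is involved and every name arrives intact. The loop body,
# however, must live inside the quoted sh -c script.
intact=$(find "$dir" -type f -exec sh -c '
  for f do
    [ -e "$f" ] && echo ok    # stand-in for the real per-file work
  done
' sh {} + | grep -c ok)

printf 'files handled intact: %s\n' "$intact"
rm -rf "$dir"
```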

Some shells already include nonstandard extensions to 'read' to make this possible; for example, bash can use -d "" to make null the terminator. This is harder to use than desired, however, because you must also set IFS (a task many forget) *AND* you must still use read's "-r" option. Thus, to do this correctly with bash extensions and find -print0, you must currently use this overly complicated construct:
find . -print0 | while IFS="" read -d "" -r ; do
 ...
done
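The two extra pieces in that construct, IFS="" and -r, are not bash-specific. They guard against the same damage in any read. The minimal sketch below uses ordinary newline-delimited input, so it runs in any POSIX-style shell. The sample line is made up.

```sh
line='  back\slash  '

# Default IFS and no -r: surrounding blanks are trimmed, and the
# backslash is treated as an escape character and removed.
read plain <<EOF
$line
EOF

# IFS= and -r together: the line comes back byte-for-byte.
IFS= read -r raw <<EOF
$line
EOF

printf '[%s]\n[%s]\n' "$plain" "$raw"
```

With default settings the value becomes "backslash". With IFS= and -r it keeps both the blanks and the backslash.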


Since filename processing is a *common* task, it should be extremely easy to do it correctly (handling all permitted cases).

With this extension and 0000243, the following far-simpler construct becomes possible:
find . -print0 | while read -0 file ; do
 ...
done

Desired Action On line 103903, change:
  read [-r] var...
to:
  read [-0] [-r] var...


On line 103907, change:
 "By default, unless the -r option is specified,"
to:
 "By default, unless the -0 or -r option is specified,"

In line 103915, change "-r option is specified" to
"-0 or -r options are specified".

In line 103916, prepend the text with:
"Unless the -0 option is specified, the following rules apply:"

Replace line 103935 with:
 The following options are supported:
 -0 Read each character until a null byte, then place that value (except the null byte) into the next variable. Continue until each of the variables has received a value or until end-of-file is reached. If end-of-file is reached prematurely, set the variable to an empty string. If there are no variables, read until a null byte or end of file is reached. Do not treat backslash or the characters in the IFS environment variable in any special way; consider them as part of the input line.



Relationships
duplicate of 0000243 (Applied, ajosey): Add -print0 to "find"
related to 0000244 (Closed, ajosey): Add -0 to xargs
related to 0000251 (Applied, ajosey): Forbid newline, or even bytes 1 through 31 (inclusive), in filenames

Notes
(0000884)
Don Cragun (manager)
2011-07-06 23:56

The current plan is to use 0000251 as the bug under which to add a set of byte values (based on single-byte characters in the C locale) that will not be allowed in newly created filenames. If consensus is reached on a resolution for bug 251, the plan is to reject and close bugs 243, 244, and 245. These three bugs will remain open until bug 251 is resolved.
(0001389)
dwheeler (reporter)
2012-10-05 19:12

I'm glad that the plan is to forbid the creation of filenames containing certain bytes (e.g., newlines). That would help, and I encourage that.

But that would NOT help interoperation with the huge number of devices that TODAY allow such filenames to be created. How would you access their data to do an easy transition?!? The POSIX specification needs to make it easy to TRANSITION systems, by providing simple mechanisms to list and open filenames that include newline, tab, escape, and other horrors.

As I noted in the other bug reports, the "usual way" of doing this is to support lists of \0-terminated filenames; this is ALREADY widely supported, has essentially no overhead, and solves the problem. I understand that there's concern that "every utility would grow a \0 option handler", but that's overblown. In practice, adding such options to a very short list of utilities makes it easy to build the rest. By far the most important is "find", since this is the usual tool for walking a directory tree. Judiciously adding options in just a few other places makes it possible to process filename lists in "usual cases", and greatly simplifies building tools for other cases. In this case, adding an option for "read" means that a simple shell "while" loop can be used to process a list of filenames (e.g., output from find).
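This minimal sketch shows how one \0-aware utility lets ordinary tools consume such a list. It assumes an xargs that supports -0 (the subject of 0000244; GNU and BSD xargs provide it). The three names are made up. One contains a blank and one a newline, yet each still arrives as a single argument.

```sh
# Build a NUL-terminated list and let xargs -0 split it only at NULs.
# Plain xargs would instead split at the blank and at the newline.
args=$(printf '%s\0' 'a b' "$(printf 'c\nd')" e |
  xargs -0 sh -c 'echo "$#"' sh)

printf 'arguments received: %s\n' "$args"
```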
(0001390)
eblake (manager)
2012-10-05 19:43

Note that at least bash already provides 'read -d $delim', where if $delim is the empty string, then read uses NUL instead of newline as the delimiter; but does not provide 'read -0'. If we argue that existing practice is important, then perhaps standardizing 'read -d' including the special empty string delimiter is the better approach to take than inventing 'read -0'.
(0001392)
dwheeler (reporter)
2012-10-06 21:00

Yes, accepting "read -d DELIMITER" with an empty delimiter meaning \0 would also work just fine. I proposed "-0" because I thought that'd be easier to use for this case, as described above. But if the POSIX community would rather accept "read -d DELIMITER" with those semantics, that'd still be great.

Would "read -d DELIMITER" be more likely to be accepted?
(0001393)
dwheeler (reporter)
2012-10-08 16:33

Please change summary to:
Add -d option to shell's "read"


Here is an updated description:

It is often useful to "read" values in shell, but to stop on some character or byte other than newline. For example, setting IFS and then using "read" will consume other characters through to a newline, which in many cases is not the desired outcome.
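Here is a minimal sketch of that difference. It assumes bash (or another shell whose read accepts -d), and the input is made up. With IFS=: the whole first line is consumed. With -d : each read stops at the next colon, even when that colon lies past a newline.

```sh
input='alpha:beta
gamma:delta'

# IFS=: with the default newline delimiter: read consumes the entire
# first line and splits it into fields.
IFS=: read -r first rest <<EOF
$input
EOF

# -d : stops at each colon and leaves the rest of the input unread.
{
  IFS= read -r -d : a
  IFS= read -r -d : b
} <<EOF
$input
EOF

printf 'IFS=:  first=%s rest=%s\n' "$first" "$rest"
printf 'with -d :  a=%s b=%s\n' "$a" "$b"
```

Here b holds "beta", a newline, and "gamma", because the second read runs past the end of the first line.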

An important example, as documented in 0000243, is that the POSIX specification and common implementations permit nearly all bytes to be in pathnames, and yet it is surprisingly difficult to portably and correctly process such pathnames because newline is a legal character in a filename. This is one of the more common reasons for security vulnerabilities (see CERT’s "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see: http://www.dwheeler.com/essays/filenames-in-shell.html and http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html.

The current situation is that it is too hard to *correctly* process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer and easy-to-use set of constructs for this common case. A read that allows a \0 delimiter makes it much easier to write a simple "while" loop over filenames; you can then write:
... while IFS="" read -r -d "" filename ...

This proposed additional read parameter is already implemented in the "bash" shell.
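Expanded into a complete loop, the construct above might look like the following minimal sketch. It assumes bash for read -d "", a find that supports -print0 (GNU and BSD find do), and a scratch directory with made-up names.

```sh
dir=$(mktemp -d) || exit 1
touch "$dir/plain" "$dir/$(printf 'with\nnewline')" "$dir/ leading space"

# The loop runs in a pipeline subshell, so the count is echoed from
# inside the braces and captured with command substitution.
count=$(find "$dir" -type f -print0 | {
  n=0
  while IFS= read -r -d '' filename; do
    [ -e "$filename" ] && n=$((n + 1))   # each name is a real path
  done
  echo "$n"
})

printf 'files found intact: %s\n' "$count"
rm -rf "$dir"
```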


Here is an updated proposed action:

Desired Action On line 103903, change:
  read [-r] var...
to:
  read [-r] [-d DELIMITER] var...

Replace line 103935 with:
 The following options are supported:
 -d DELIMITER The first character of DELIMITER is used to terminate the input line, instead of <newline>. If DELIMITER is the empty string, the null byte is the delimiter, and sequences of <carriage-return> <newline> MUST NOT be converted to <newline>.
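Here is a minimal sketch of the "first character" rule as bash already implements it. It assumes bash, and the input is made up. With -d ',;' only the comma terminates the read, and the semicolon is ordinary data.

```sh
# Only the first character of the -d argument is the delimiter.
IFS= read -r -d ',;' field <<'EOF'
one;two,three
EOF

printf '%s\n' "$field"   # the text before the first comma
```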

Append to Line 104006 (rationale):
The "-d" option enables reading up to an arbitrary delimiter, without consuming a whole line. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>.
(0001397)
dwheeler (reporter)
2012-10-10 16:50

David Korn noted an error in the updated rationale (thank you!). Namely, changing the delimiter can read MORE than a line. So please change the "Append to Line 104006 (rationale)" text as follows:
===
The "-d" option enables reading up to an arbitrary delimiter. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>. Note that "-d" and the empty string must always be specified as two separate arguments.
===
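The last sentence of that rationale matters because of quote removal: -d'' is a single word that becomes plain -d. Here is a minimal sketch, assuming bash, of what goes wrong when the empty string is glued to the option. The variable name v is arbitrary.

```sh
# Correct: "-d" and the empty string are two separate arguments.
good=$(printf 'first\0second\0' | { IFS= read -r -d '' v; printf '%s' "$v"; })

# Mistake: -d'' is one word, so read takes the next word "v" as the
# delimiter string, and with no variable name left it fills REPLY.
bad=$(printf 'first\0second\0' | { IFS= read -r -d'' v; printf '%s' "${v-unset}"; })

printf 'separate: %s\nglued: %s\n' "$good" "$bad"
```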

The proposal will need to be clarified further; see the discussion of "\" and "character vs. byte" in the posting by Wilhelm Müller dated Wed, 10 Oct 2012 16:58:41 +0200. I believe these issues can be resolved, but I want to give the mailing list a chance to discuss them before updating.
(0002680)
mirabilos (reporter)
2015-05-27 21:52

As mksh maintainer, I would also like to consider -d <delim> over -0, because it is more generally useful, and more widespread.

I however disagree slightly with the wording: the first *byte* (C “char”) of the <delim> string should be used as the delimiter, not the first (multibyte) character. Data read is ultimately just that, data, and one should not make the mistake of assuming it to be in any specific encoding; scanning for bytes is perfectly fine.

+1 on accepting 243, 244, 245 independently of whether “problematic” filenames are forbidden, though.
(0006099)
Don Cragun (manager)
2023-01-09 16:20

The changes for this proposal are included in the resolution of 0000243.

Issue History
Date Modified | Username | Field | Change
2010-04-29 20:25 | dwheeler | New Issue |
2010-04-29 20:25 | dwheeler | Status | New => Under Review
2010-04-29 20:25 | dwheeler | Assigned To | => ajosey
2010-04-29 20:25 | dwheeler | Name | => David A. Wheeler
2010-04-29 20:25 | dwheeler | Section | => read
2010-04-29 20:25 | dwheeler | Page Number | => 3128
2010-04-29 20:25 | dwheeler | Line Number | => 103903-104006
2011-07-06 23:42 | Don Cragun | Relationship added | related to 0000243
2011-07-06 23:43 | Don Cragun | Relationship added | related to 0000244
2011-07-06 23:56 | Don Cragun | Note Added: 0000884 |
2011-11-27 22:09 | dwheeler | Issue Monitored: dwheeler |
2012-10-05 19:12 | dwheeler | Note Added: 0001389 |
2012-10-05 19:43 | eblake | Note Added: 0001390 |
2012-10-06 21:00 | dwheeler | Note Added: 0001392 |
2012-10-08 16:33 | dwheeler | Note Added: 0001393 |
2012-10-10 16:50 | dwheeler | Note Added: 0001397 |
2015-05-27 21:52 | mirabilos | Note Added: 0002680 |
2023-01-09 16:17 | Don Cragun | Relationship replaced | duplicate of 0000243
2023-01-09 16:18 | Don Cragun | Interp Status | => ---
2023-01-09 16:18 | Don Cragun | Status | Under Review => Closed
2023-01-09 16:18 | Don Cragun | Resolution | Open => Duplicate
2023-01-09 16:20 | Don Cragun | Note Added: 0006099 |
2023-08-22 06:30 | Don Cragun | Relationship added | related to 0000251

