0000245: Add -0 option to shell's "read" - Austin Group Issue Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0000245	1003.1(2008)/Issue 7	Shell and Utilities	public	2010-04-29 20:25	2023-01-09 16:20

Reporter	dwheeler	Assigned To	ajosey
Priority	normal	Severity	Objection	Type	Enhancement Request
Status	Closed	Resolution	Duplicate

Name	David A. Wheeler
Organization
User Reference
Section	read
Page Number	3128
Line Number	103903-104006
Interp Status	---
Final Accepted Text


Summary	0000245: Add -0 option to shell's "read"
Description	As documented in 0000243, the POSIX specification and common implementations permit nearly all bytes to be in pathnames, and yet it is surprisingly difficult to portably and correctly process such pathnames. This is one of the more common reasons for security vulnerabilities (see CERT’s "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see: http://www.dwheeler.com/essays/filenames-in-shell.html http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
The current situation is that it is too hard to correctly process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer and easy-to-use set of constructs for this common case.
Instead, add a new "-0" option to the shell's "read" utility. This way, lists of filenames can be easily processed because they can be terminated by a null byte (which cannot occur in pathnames or filenames).
The current "find ... -exec COMMAND " is grossly inadequate. If COMMAND is nontrivial, it quickly becomes too hard to read and understand, and it encourages shell programs to be divided up unnaturally whenever files are processed.
Some shells already include nonstandard extensions to 'read' to make this possible, for example, bash can use -d "" to enable null as a terminator. This is harder to use than desired, however, because you must also set IFS (a task many forget) AND you must still use read's "-r" option. Thus, to do this correctly with bash extensions and find -print0, you must currently use this overly-complicated construct: find . -print0 | while IFS="" read -d "" -r ; do ... done
Since filename processing is a common task, it should be extremely easy to do it correctly (handling all permitted cases).
With this extension and 0000243, the following far-simpler construct becomes possible: find . -print0 | while read -0 file ; do ... done
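The bash construct quoted in the description can be tried against a filename that contains a newline. The following is only a minimal sketch, assuming bash (for `read -d ''`) and a `find` that supports `-print0`; the temporary directory and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes bash (read -d '') and a find that supports -print0.
# The directory and file names below are illustrative only.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with space" "$tmpdir/with"$'\n'"newline"

count=0
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and -d '' makes the null byte the terminator instead of <newline>.
while IFS= read -r -d '' file; do
    count=$((count + 1))
    printf 'found: %q\n' "$file"
done < <(find "$tmpdir" -type f -print0)

printf 'files seen: %d\n' "$count"
rm -rf "$tmpdir"
```

Process substitution is used here instead of the pipe shown in the report so that the loop runs in the current shell and `count` is still set afterwards; with a pipe, bash would normally run the loop in a subshell.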
Desired Action	On line 103903, change: read [−r] var... to: read [-0] [-r] var...
On line 103907, change: "By default, unless the −r option is specified," to: "By default, unless the -0 or −r option is specified,"
In line 103915, change "-r option is specified" to "-0 or -r options are specified".
In line 103916, prepend the text with: "Unless the -0 option is specified, the following rules apply:"
Replace line 103935 with: The following options are supported:
-0 Read each character until a null byte, then place that value (except the null byte) into the next variable. Continue until each of the variables has received a value or until end-of-file is reached. If end-of-file is reached prematurely, set the variable to an empty string. If there are no variables, read until a null byte or end of file is reached. Do not treat backslash or the characters in the IFS environment variable in any special way; consider them as part of the input line.
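The proposed "-0" semantics can be approximated in bash today. The sketch below is one possible reading of the wording above, not an implementation of any standard: `read0` is a hypothetical helper name, it relies on bash's `read -d ''` and `printf -v`, and it does not handle the "no variables" case.

```shell
#!/usr/bin/env bash
# read0 is a hypothetical helper that approximates the proposed "read -0";
# it relies on bash's existing "read -d ''" extension.
read0() {
    local r0_name r0_value r0_status=0
    for r0_name in "$@"; do
        # -r: backslash is not special; IFS=: no field splitting or trimming.
        if ! IFS= read -r -d '' r0_value; then
            # End-of-file before a null byte: following the proposal's
            # wording, the variable is set to an empty string.
            r0_value=
            r0_status=1
        fi
        # Assign the value to the caller's variable by name.
        printf -v "$r0_name" '%s' "$r0_value"
    done
    return "$r0_status"
}

# Two null-terminated values for three variables: the third ends up empty.
rc=0
read0 first second third < <(printf 'a b\0c\\d\0') || rc=$?
printf 'first=%s second=%s third=%s rc=%d\n' "$first" "$second" "$third" "$rc"
```

Each variable receives exactly one null-terminated value, with spaces and backslashes preserved, which is the behavior the proposal asks for without requiring callers to remember `IFS=` and `-r`.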
Tags	No tags attached.

~~Don Cragun~~ 2011-07-06 23:56 manager bugnote:0000884	The current plan is to add a set of byte values (based on single-byte characters in the C Locale) that will not be allowed in newly created filenames using 0000251 as the bug to make the changes. If consensus is reached on a resolution for bug 251, the plan is to reject and close bugs 243, 244, and 245. These three bugs will remain open until bug 251 is resolved.

dwheeler 2012-10-05 19:12 reporter bugnote:0001389	I'm glad that the plan is to forbid the creation of filenames containing certain bytes (e.g., newlines). That would help, and I encourage that. But that would NOT help interoperation with the huge number of devices that TODAY allow such filenames to be created. How would you access their data to do an easy transition?!? The POSIX specification needs to make it easy to TRANSITION systems, by providing simple mechanisms to list and open filenames that include newline, tab, escape, and other horrors. As I noted in the other bug reports, the "usual way" of doing this is to support lists of \0-terminated filenames; this is ALREADY widely supported, has essentially no overhead, and solves the problem. I understand that there's concern that "every utility would grow a \0 option handler", but that's overblown. In practice, adding such options to a very short list of utilities makes it easy to build the rest. By far the most important is "find", since this is the usual tool for walking a directory tree. Judiciously adding options just in a few other places makes it possible to process filename lists in "usual cases", and greatly simplifies building tools for other cases. In this case, adding an option for "read" means that a simple shell "while" loop can be used to process a list of filenames (e.g., output from find).

eblake 2012-10-05 19:43 manager bugnote:0001390	Note that at least bash already provides 'read -d $delim', where if $delim is the empty string, then read uses NUL instead of newline as the delimiter; but does not provide 'read -0'. If we argue that existing practice is important, then perhaps standardizing 'read -d' including the special empty string delimiter is the better approach to take than inventing 'read -0'.

dwheeler 2012-10-06 21:00 reporter bugnote:0001392	Yes, accepting "read -d DELIMITER" with empty-delimiter meaning \0, would also work just fine. I proposed "-0" because I thought that'd be easier to use for this case, as described above. But if the POSIX community would rather accept "read -d DELIMITER" with those semantics, that'd still be great. Would "read -d DELIMITER" be more likely to be accepted?

dwheeler 2012-10-08 16:33 reporter bugnote:0001393	Please change summary to: Add -d option to shell's "read" Here is an updated description: It is often useful to "read" values in shell, but to stop on some character or byte other than newline. For example, setting IFS and then using "read" will consume other characters through to a newline, which in many cases is not the desired outcome. An important example, as documented in 0000243, is that the POSIX specification and common implementations permit nearly all bytes to be in pathnames, and yet it is surprisingly difficult to portably and correctly process such pathnames because newline is a legal character in a filename. This is one of the more common reasons for security vulnerabilities (see CERT’s "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see: http://www.dwheeler.com/essays/filenames-in-shell.html and http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html . The current situation is that it is too hard to correctly process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer and easy-to-use set of constructs for this common case. A read that allows a \0 delimiter makes it much easier to write a simple "while" loop over filenames; you can then write: ... while IFS="" read -r -d "" filename ... This proposed additional read parameter is already implemented in the "bash" shell. Here is an updated proposed action: Desired Action On line 103903, change: read [−r] var... to: read [-r] [-d DELIMITER] var... Replace line 103935 with: The following options are supported: -d DELIMITER The first character of DELIMITER is used to terminate the input line, instead of <newline>. 
If DELIMITER is the empty string, the null byte is the delimiter, and sequences of <carriage-return> <newline> MUST NOT be converted to <newline>. Append to Line 104006 (rationale): The "-d" option enables reading up to an arbitrary delimiter, without consuming a whole line. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>.
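The general "-d DELIMITER" behavior can be seen with a delimiter other than the null byte. This is a minimal sketch, assuming bash, whose `read` builtin already implements `-d`; the sample input and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes bash, whose read builtin already implements -d.
input='alpha:beta
gamma:delta'

fields=()
# Stop at ':' instead of <newline>; the embedded newline stays in the data.
while IFS= read -r -d ':' field; do
    fields+=("$field")
done < <(printf '%s' "$input")
# At end-of-file read returns nonzero, but it still stores the text that
# followed the last delimiter, so that text is kept as a final field.
fields+=("$field")

printf 'number of fields: %d\n' "${#fields[@]}"
```

The second field spans two lines, which shows that a non-newline delimiter lets a single `read` consume more than one line of input.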

dwheeler 2012-10-10 16:50 reporter bugnote:0001397	David Korn noted an error in the updated rationale (thank you!). Namely, changing the delimiter can read MORE than a line. So please change the "Append to Line 104006 (rationale)" text to the following: === The "-d" option enables reading up to an arbitrary delimiter. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>. Note that "-d" and the empty string must always be specified as two separate arguments. === The proposal will need to be clarified further; see the posting about "\" and "character vs. byte" in the Wed, 10 Oct 2012 16:58:41 +0200 posting by Wilhelm Müller. I believe these issues can be resolved, but I want to give the mailing list a chance to discuss them before updating.

mirabilos 2015-05-27 21:52 reporter bugnote:0002680	As mksh maintainer, I would also like to consider -d <delim> over -0, because it is more generally useful, and more widespread. I however disagree slightly with the wording: the first byte (C “char”) of the <delim> string should be used as delimiter, not the first (multibyte) character. Data read is ultimately just that, data, and one should not make the mistake of assuming it to be in any specific encoding; scanning for bytes is perfectly fine. +1 on accepting 243, 244, 245 independent of whether “problematic” filenames are forbidden, though.

~~Don Cragun~~ 2023-01-09 16:20 manager bugnote:0006099	The changes for this proposal are included in the resolution of 0000243.

Date Modified	Username	Field	Change
2010-04-29 20:25	dwheeler	New Issue
2010-04-29 20:25	dwheeler	Status	New => Under Review
2010-04-29 20:25	dwheeler	Assigned To	=> ajosey
2010-04-29 20:25	dwheeler	Name	=> David A. Wheeler
2010-04-29 20:25	dwheeler	Section	=> read
2010-04-29 20:25	dwheeler	Page Number	=> 3128
2010-04-29 20:25	dwheeler	Line Number	=> 103903-104006
2011-07-06 23:42	~~Don Cragun~~	Relationship added	related to 0000243
2011-07-06 23:43	~~Don Cragun~~	Relationship added	related to 0000244
2011-07-06 23:56	~~Don Cragun~~	Note Added: 0000884
2012-10-05 19:12	dwheeler	Note Added: 0001389
2012-10-05 19:43	eblake	Note Added: 0001390
2012-10-06 21:00	dwheeler	Note Added: 0001392
2012-10-08 16:33	dwheeler	Note Added: 0001393
2012-10-10 16:50	dwheeler	Note Added: 0001397
2015-05-27 21:52	mirabilos	Note Added: 0002680
2023-01-09 16:17	~~Don Cragun~~	Relationship replaced	duplicate of 0000243
2023-01-09 16:18	~~Don Cragun~~	Interp Status	=> ---
2023-01-09 16:18	~~Don Cragun~~	Status	Under Review => Closed
2023-01-09 16:18	~~Don Cragun~~	Resolution	Open => Duplicate
2023-01-09 16:20	~~Don Cragun~~	Note Added: 0006099
2023-08-22 06:30	~~Don Cragun~~	Relationship added	related to 0000251

Relationships

duplicate of	0000243	Closed	ajosey	Add -print0 to "find"
related to	0000244	Closed	ajosey	Add -0 to xargs
related to	0000251	Closed	ajosey	Forbid newline, or even bytes 1 through 31 (inclusive), in filenames