0000245: Add -0 option to shell's "read" - Austin Group Defect Tracker

Notes
(0000884) Don Cragun (manager) 2011-07-06 23:56	The current plan is to add a set of byte values (based on single-byte characters in the C Locale) that will not be allowed in newly created filenames, using 0000251 as the bug to make the changes. If consensus is reached on a resolution for bug 251, the plan is to reject and close bugs 243, 244, and 245. These three bugs will remain open until bug 251 is resolved.

(0001389) dwheeler (reporter) 2012-10-05 19:12	I'm glad that the plan is to forbid the creation of filenames containing certain bytes (e.g., newlines). That would help, and I encourage that. But that would NOT help interoperation with the huge number of devices that TODAY allow such filenames to be created. How would you access their data to do an easy transition?!? The POSIX specification needs to make it easy to TRANSITION systems, by providing simple mechanisms to list and open filenames that include newline, tab, escape, and other horrors. As I noted in the other bug reports, the "usual way" of doing this is to support lists of \0-terminated filenames; this is ALREADY widely supported, has essentially no overhead, and solves the problem. I understand that there's concern that "every utility would grow a \0 option handler", but that's overblown. In practice, adding such options to a very short list of utilities makes it easy to build the rest. By far the most important is "find", since this is the usual tool for walking a directory tree. Judiciously adding options in just a few other places makes it possible to process filename lists in "usual cases", and greatly simplifies building tools for other cases. In this case, adding an option for "read" means that a simple shell "while" loop can be used to process a list of filenames (e.g., output from find).

(0001390) eblake (manager) 2012-10-05 19:43	Note that at least bash already provides 'read -d $delim', where if $delim is the empty string, then read uses NUL instead of newline as the delimiter; but does not provide 'read -0'. If we argue that existing practice is important, then perhaps standardizing 'read -d' including the special empty string delimiter is the better approach to take than inventing 'read -0'.

(0001392) dwheeler (reporter) 2012-10-06 21:00	Yes, accepting "read -d DELIMITER", with an empty delimiter meaning \0, would also work just fine. I proposed "-0" because I thought that'd be easier to use for this case, as described above. But if the POSIX community would rather accept "read -d DELIMITER" with those semantics, that'd still be great. Would "read -d DELIMITER" be more likely to be accepted?

(0001393) dwheeler (reporter) 2012-10-08 16:33	Please change summary to: Add -d option to shell's "read"

Here is an updated description:

It is often useful to "read" values in shell, but to stop on some character or byte other than newline. For example, setting IFS and then using "read" will consume other characters through to a newline, which in many cases is not the desired outcome. An important example, as documented in 0000243, is that the POSIX specification and common implementations permit nearly all bytes to be in pathnames, and yet it is surprisingly difficult to portably and correctly process such pathnames because newline is a legal character in a filename. This is one of the more common reasons for security vulnerabilities (see CERT’s "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see: http://www.dwheeler.com/essays/filenames-in-shell.html and http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html .

The current situation is that it is too hard to correctly process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer and easy-to-use set of constructs for this common case. A read that allows a \0 delimiter makes it much easier to write a simple "while" loop over filenames; you can then write: ... while IFS="" read -r -d "" filename ... This proposed additional read parameter is already implemented in the "bash" shell.

Here is an updated proposed action:

Desired Action

On line 103903, change:
    read [-r] var...
to:
    read [-r] [-d DELIMITER] var...

Replace line 103935 with: The following options are supported: -d DELIMITER The first character of DELIMITER is used to terminate the input line, instead of <newline>. If DELIMITER is the empty string, the null byte is the delimiter, and sequences of <carriage-return> <newline> MUST NOT be converted to <newline>.

Append to Line 104006 (rationale): The "-d" option enables reading up to an arbitrary delimiter, without consuming a whole line. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>.
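
To make the loop in this note concrete, here is a minimal sketch. It assumes bash (which, as noted above, already implements "read -d") and a find that supports -print0, which was a widely available extension rather than a POSIX requirement when this note was written. The temporary directory and the file names are invented for the illustration.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory holding one ordinary file and one file
# whose name contains an embedded newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with"$'\n'"newline"

# NUL-delimited loop: each iteration receives exactly one pathname,
# even when the pathname itself contains a newline.
count=0
while IFS= read -r -d '' filename; do
    count=$((count + 1))
    printf 'found: %q\n' "$filename"
done < <(find "$tmpdir" -type f -print0)

# Newline-delimited listing for comparison: the file with the embedded
# newline is reported as two separate lines.
naive=$(( $(find "$tmpdir" -type f | wc -l) ))

rm -rf "$tmpdir"
printf 'NUL-delimited records: %d, newline-delimited lines: %d\n' "$count" "$naive"
```

The NUL-delimited loop sees two pathnames, while the line-based count sees three "names" for the same two files, which is the kind of mismatch the proposal is meant to avoid.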

(0001397) dwheeler (reporter) 2012-10-10 16:50	David Korn noted an error in the updated rationale (thank you!). Namely, changing the delimiter can read MORE than a line. So please change the "Append to Line 104006 (rationale)" text to the following: === The "-d" option enables reading up to an arbitrary delimiter. Using an empty string as the delimiter enables easier reading of <NUL>-terminated lists of filenames, e.g., to prevent security vulnerabilities that might occur with filenames that contain <newline>. Note that "-d" and the empty string must always be specified as two separate arguments. === The proposal will need to be clarified further; see the posting about "\" and "character vs. byte" in the Wed, 10 Oct 2012 16:58:41 +0200 posting by Wilhelm Müller. I believe these issues can be resolved, but I want to give the mailing list a chance to discuss them before updating.
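
Both points in this note can be seen in a short sketch, again assuming bash's existing "read -d" rather than any standardized behavior. The input strings and the variable names record, name, and pitfall are made up for the example.

```bash
#!/usr/bin/env bash
# 1. A non-newline delimiter can read MORE than a line: with -d ';' the
#    record ends at the first ';', so it spans the embedded newline.
IFS= read -r -d ';' record < <(printf 'first line\nsecond line;rest')

# 2. "-d" and the empty string must be two separate arguments. Written as
#    -d'' the empty string vanishes during quote removal, so read receives
#    "-d name": the delimiter becomes the letter "n" and no variable name
#    is left, which in bash means the data lands in REPLY.
IFS= read -r -d'' name < <(printf 'abncd')
pitfall=$REPLY

printf 'record: %q\n' "$record"
printf 'pitfall: %q\n' "$pitfall"
```

In the second call the variable name is never set, and the input is cut at the first "n", which is why the rationale asks for the two-argument spelling.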

(0002680) mirabilos (reporter) 2015-05-27 21:52	As mksh maintainer, I would also prefer -d <delim> over -0, because it is more generally useful and more widespread. However, I disagree slightly with the wording: the first byte (C “char”) of the <delim> string should be used as the delimiter, not the first (multibyte) character. Data read is ultimately just that, data, and one should not make the mistake of assuming it to be in any specific encoding; scanning for bytes is perfectly fine. +1 on accepting 243, 244, 245 independent of whether “problematic” filenames are forbidden, though.

(0006099) Don Cragun (manager) 2023-01-09 16:20	The changes for this proposal are included in the resolution of 0000243.

Issue History
Date Modified	Username	Field	Change
2010-04-29 20:25	dwheeler	New Issue
2010-04-29 20:25	dwheeler	Status	New => Under Review
2010-04-29 20:25	dwheeler	Assigned To	=> ajosey
2010-04-29 20:25	dwheeler	Name	=> David A. Wheeler
2010-04-29 20:25	dwheeler	Section	=> read
2010-04-29 20:25	dwheeler	Page Number	=> 3128
2010-04-29 20:25	dwheeler	Line Number	=> 103903-104006
2011-07-06 23:42	Don Cragun	Relationship added	related to 0000243
2011-07-06 23:43	Don Cragun	Relationship added	related to 0000244
2011-07-06 23:56	Don Cragun	Note Added: 0000884
2011-11-27 22:09	dwheeler	Issue Monitored: dwheeler
2012-10-05 19:12	dwheeler	Note Added: 0001389
2012-10-05 19:43	eblake	Note Added: 0001390
2012-10-06 21:00	dwheeler	Note Added: 0001392
2012-10-08 16:33	dwheeler	Note Added: 0001393
2012-10-10 16:50	dwheeler	Note Added: 0001397
2015-05-27 21:52	mirabilos	Note Added: 0002680
2023-01-09 16:17	Don Cragun	Relationship replaced	duplicate of 0000243
2023-01-09 16:18	Don Cragun	Interp Status	=> ---
2023-01-09 16:18	Don Cragun	Status	Under Review => Closed
2023-01-09 16:18	Don Cragun	Resolution	Open => Duplicate
2023-01-09 16:20	Don Cragun	Note Added: 0006099
2023-08-22 06:30	Don Cragun	Relationship added	related to 0000251
