0000406: add dd iflags=fullblock, for better pipe interaction

Notes
(0000742) jessegordon (reporter) 2011-04-15 17:43 edited on: 2011-04-15 21:45	This solution (iflags=fullblock) is a good one, but a basic question should be answered first: should dd, by default, be usable for reading from pipes?
By way of brief background, POSIX says that if dd is reading and read() returns a short read, dd should count that as one of the count= blocks. When reading from regular files and block devices, read() generally seems to return a short read only for reasons such as EOF or a hardware error, that is, something that actually prevents the data from being read, so this policy seems harmless enough for those file types.
Some have said that when reading from a tape drive, the tape driver returns variable-size blocks to match the variable-size blocks on the tape, which is a handy way to signal dd to read in variable-size blocks. This is good because it allows one to read a given number of blocks regardless of how many bytes are transferred.
Pipes, however, raise a problem. When reading from a pipe, read() frequently returns a short read simply because it reached the end of whatever data was in the pipe at that moment, and the kernel then has to switch back to the process filling the pipe so it can write more. As a result, with dd's current default behavior (unless ibs=1), the user has no assurance that dd will actually read ibs*count bytes from a pipe or FIFO, making it virtually unusable for reading from pipes. To test, try the following command:
yes | dd bs=1000 count=1000 | wc -c
And yet dd is supposed to accept any file type as input, which I assume includes stdin (since it's the default) and FIFOs.
Thus, the contradiction is that dd is supposed to work with pipes, and is also supposed to not retry when read() returns a short read, which makes it useless for reading from pipes.
So either dd should be documented as "Cannot read from pipes unless bs=1", or the specification should say "When reading from pipes, read() should be called repeatedly until ibs bytes have been read, or EOF is reached, or the program is terminated."
To me, the solution is to change dd so that it always reads ibs*count bytes from pipes. But some (perhaps validly) argue that there are 40 years of scripts which depend on dd giving short reads on pipes (no example cited).
(But I did just think of this: what if one was reading from a tape via a pipe, as in cat /dev/mt0 | dd? Would cat and the pipe maintain the tape's original variable-size block boundaries and pass those on to dd? I have no idea. I've never used a tape backup. But I'm trying to think of all the possibilities.)
I can't see how the current dd behavior could possibly be depended on, since it makes reading from pipes useless for most cases: the user (whether they know it or not) cannot depend on dd reading ibs*count bytes for any acceptable range of ibs and count.
It should be a serious concern whenever a utility causes data loss even though the user is using the utility according to its documentation. (For example, the average user who reads the --help output and man page of dd would have no way of knowing that there are complications with read() that cause data loss when reading from pipes. I would never have guessed that reading from pipes routinely causes short reads, but then I, like most dd users, am not a kernel programmer.)
Thus, the problem could be solved in two ways:
o Clarify that dd by default is not safe for reading from pipes, and then include Mr. Blake's suggested iflags=fullblock; or
o (my favorite) Clarify that dd should, by default, read ibs*count bytes from pipes and FIFOs, by calling read() again for each input block until that block has read ibs bytes. (While maintaining, of course, the current short-read behavior for all other file types.)
Thank you very much, Jesse Gordon
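For readers who want to see the short-read effect and the proposed re-read loop concretely, here is a minimal Python sketch. It is not dd's implementation (dd is written in C); the names read_full_block and demo, and the slow 100-byte writer, are hypothetical and chosen only for illustration. It assumes a POSIX-like system where os.read() on a pipe returns whatever is currently available. For reference, GNU coreutils spells the corresponding option iflag=fullblock.

```python
import os
import threading
import time


def read_full_block(fd, ibs):
    """Call read() repeatedly until ibs bytes have arrived or EOF is hit."""
    chunks = []
    remaining = ibs
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: return whatever was collected so far
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    """Return (bytes from one plain read, total after re-reading)."""
    r, w = os.pipe()

    def writer():
        # Hypothetical slow producer: ten writes of 100 bytes each.
        for _ in range(10):
            os.write(w, b"y" * 100)
            time.sleep(0.01)
        os.close(w)

    t = threading.Thread(target=writer)
    t.start()
    first = os.read(r, 1000)  # a single read(): may well be short
    rest = read_full_block(r, 1000 - len(first))  # keep going until full
    t.join()
    os.close(r)
    return len(first), len(first) + len(rest)


if __name__ == "__main__":
    first, total = demo()
    print("single read():", first, "bytes; after re-reading:", total, "bytes")
```

The single read() usually returns far fewer than the 1000 bytes requested because the writer has not produced them yet, while the loop always ends with the full 1000 bytes (or stops early only at EOF).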

(0000775) jessegordon (reporter) 2011-04-28 18:51 edited on: 2011-04-28 22:00	It seems to me that since, by default, count= counts read()s rather than blocks, the description of functionality should be reoriented along these lines:
count=n: Limit to n reads, where a read is a variable number of bytes, defined by the operating system and context, no greater than ibs. (See iflags=fullblock to cause count to mean blocks instead of reads.)
ibs: Maximum number of bytes for each read, where a read is a variable number of bytes defined by the operating system and context. (See iflags=fullblock to make ibs a fixed size.)
That way, folks who aren't kernel programmers won't be caught by surprise even after reading "count=BLOCKS" in the man page.
As to "Behavior is unspecified if iflags=fullblock is requested alongside the sync, block, or unblock conversions": this doesn't make sense to me. block and unblock should work fine with iflags=fullblock, and maybe even sync.
From the man page:
block pad newline-terminated records with spaces to cbs-size
So what if I want to read a newline-terminated data file from stdin with ibs=1000 count=1000 iflags=fullblock conv=block? I will end up reading fewer than my desired number of bytes (or get some other, worse, unspecified result). I don't particularly see why fullblock should not work with block. However, with variable-length line records, someone is more likely to limit their read by line count than by byte count. (But see unblock below.)
From the man page:
unblock replace trailing spaces in cbs-size records with newline
Here we are reading a file with fixed-length records, so it is even more likely that the user will wish to read exactly ibs*count bytes, and it is even more important that iflags=fullblock work with unblock than with block.
From the man page:
sync pad every input block with NULs to ibs-size;....
What if I want to read a variable number of bytes via stdin, but pad the last read to produce a full count*ibs output? Remember, even with iflag=fullblock there is still one reason that less than a full ibs will be read, and that is EOF, and someone may well want to pad that last block out with NULs.
Thank you very much, Jesse Gordon
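To make the difference between "count means reads" and "count means blocks" concrete, and to show why padding the final block can still be useful with fullblock, here is a small Python model. It is only a sketch under stated assumptions: ShortReadSource is a hypothetical stand-in for a pipe that never returns more than burst bytes per read, and copy_blocks is a simplified model of the copy loop, not dd's actual code. The fullblock and sync parameters mirror the option names discussed in this issue.

```python
class ShortReadSource:
    """Hypothetical stand-in for a pipe: each read() returns at most `burst` bytes."""

    def __init__(self, data, burst):
        self.data = data
        self.burst = burst
        self.pos = 0

    def read(self, n):
        end = self.pos + min(n, self.burst)
        chunk = self.data[self.pos:end]
        self.pos += len(chunk)
        return chunk


def copy_blocks(src, ibs, count, fullblock=False, sync=False):
    """Simplified model of dd's input loop (an assumption, not dd's real code)."""
    out = bytearray()
    for _ in range(count):
        block = src.read(ibs)
        # With fullblock, keep reading until the block is full or EOF is hit.
        while fullblock and 0 < len(block) < ibs:
            more = src.read(ibs - len(block))
            if not more:
                break
            block += more
        if not block:
            break  # EOF before any data for this block
        if sync and len(block) < ibs:
            block += b"\0" * (ibs - len(block))  # pad a short block with NULs
        out += block
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(250)) * 10  # 2500 bytes of input
    for fullblock, sync in [(False, False), (True, False), (True, True)]:
        result = copy_blocks(ShortReadSource(data, 300), 1000, 3, fullblock, sync)
        print(f"fullblock={fullblock} sync={sync}: {len(result)} bytes out")
```

In this model, with 2500 bytes of input arriving in bursts of at most 300 bytes, ibs=1000 and count=3 copy only 900 bytes when each short read counts as a block. With fullblock all 2500 bytes are copied, and adding sync pads the final 500-byte block to a full 1000, which is the case described above.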

Issue History
Date Modified	Username	Field	Change
2011-04-14 23:00	eblake	New Issue
2011-04-14 23:00	eblake	Status	New => Under Review
2011-04-14 23:00	eblake	Assigned To	=> ajosey
2011-04-14 23:00	eblake	Name	=> Eric Blake
2011-04-14 23:00	eblake	Organization	=> Red Hat
2011-04-14 23:00	eblake	User Reference	=> ebb.dd
2011-04-14 23:00	eblake	Section	=> dd
2011-04-14 23:00	eblake	Page Number	=> 2582
2011-04-14 23:00	eblake	Line Number	=> 83165
2011-04-14 23:00	eblake	Interp Status	=> ---
2011-04-15 17:43	jessegordon	Note Added: 0000742
2011-04-15 21:45	jessegordon	Note Edited: 0000742
2011-04-28 15:48	msbrown	Tag Attached: issue8
2011-04-28 15:48	msbrown	Status	Under Review => Resolved
2011-04-28 15:48	msbrown	Resolution	Open => Accepted
2011-04-28 16:52	eblake	Relationship added	related to 0000407
2011-04-28 18:51	jessegordon	Note Added: 0000775
2011-04-28 22:00	jessegordon	Note Edited: 0000775
2020-03-03 14:39	geoffclare	Status	Resolved => Applied
2024-06-11 08:53	agadmin	Status	Applied => Closed
