View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000406 | 1003.1(2008)/Issue 7 | Shell and Utilities | public | 2011-04-14 23:00 | 2024-06-11 08:53 |
Reporter | eblake | Assigned To | ajosey | ||
Priority | normal | Severity | Objection | Type | Enhancement Request |
Status | Closed | Resolution | Accepted | ||
Name | Eric Blake | ||||
Organization | Red Hat | ||||
User Reference | ebb.dd | ||||
Section | dd | ||||
Page Number | 2582 | ||||
Line Number | 83165 | ||||
Interp Status | --- | ||||
Final Accepted Text | |||||
Summary | 0000406: add dd iflag=fullblock, for better pipe interaction | ||||
Description | The rationale for head states (XCU line 91101):

    There is no -c option (as there is in tail) because it is not historical practice and because other utilities in this volume of POSIX.1-2008 provide similar functionality.

but does not name what those other utilities might be. The only one I can think of is dd. However, dd does _not_ provide the ability to efficiently read a particular number of bytes. It is limited to reading count= blocks of input of ibs= bytes each, but if any of the input reads are short, then fewer than count*ibs bytes will be output. The only way to get an exact byte count is thus to use dd ibs=1 count=$n (since ibs=1 guarantees no short reads), but that causes $n reads. The problem of short reads is most noticeable with pipes and FIFOs, where short reads are common, but is also possible with regular files where the implementation encounters signals that interrupt the middle of reading.

The GNU version of dd offers an extension iflag=fullblock which is used to enable dd to do multiple reads until the complete ibs= size has been read in, before proceeding with the rest of the algorithm. Standardizing this extension would make dd more useful with input from pipes, while allowing more efficient blocking sizes. If this extension is not standardized, then it would be wise to modify the non-normative text to warn users that dd does not work well with pipes for anything larger than ibs=1.

This proposal attempts to codify existing practice of one implementation but requires introducing a new operand iflag (GNU dd also uses iflag for other purposes, such as iflag=noctty to specify the use of O_NOCTTY when opening if=, but that is not included in this proposal). However, if [io]flag is ever considered for standardization for use in specifying various O_* flags to pass to open(), it may make more sense to instead invent a new conversion specification conv=fullblock, rather than the GNU spelling of iflag=fullblock.

This proposal helps with efficient read sizes from pipes when reading to end of file or when reading an exact multiple of the block size, but still lacks the ability to read an arbitrary number of bytes without resorting to two dd processes, as in this method of truncating input at exactly 10000 bytes while still writing an exact number of blocks for an output file with a 4k block size:

    process | { dd bs=4096 count=2 iflag=fullblock; \
      dd ibs=1808 count=1 iflag=fullblock conv=sync obs=4096; } > output

Therefore, it may also be worth standardizing a means of limiting input to a fixed byte count which is not a multiple of the requested input block size, although I don't know of any existing practice for that in dd; for that usage, the non-standard 'head -c' makes more sense. | ||||
Desired Action | At line 83245 [XCU dd OPERANDS], add a new paragraph:

    iflag=fullblock
        Perform as many reads as required to reach the full input block size or end of file, rather than acting on partial reads. If this operand is in effect, then the count= operand refers to the number of full input blocks rather than reads. Behavior is unspecified if iflag=fullblock is requested alongside the sync, block, or unblock conversions.

At line 83144 [dd DESCRIPTION], add a sentence to step 1:

    If the iflag=fullblock operand is specified, this may entail multiple reads; otherwise, the input block is used even if the read was shorter than the specified block size.

At line 83321 [dd APPLICATION USAGE], add a new paragraph:

    Using the count= operand of dd with a pipe or FIFO as the input can lead to surprising results, since these file types are prone to encountering short reads for any input block size other than 1. Unless the iflag=fullblock operand is in effect, dd will stop after the specified number of reads, rather than input blocks, and therefore can often result in fewer bytes being output than the product of the count and input block size. | ||||
Tags | issue8 |
|
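The two-process truncation from the description can be tried out roughly as follows. This is only a sketch: it assumes GNU dd (which spells the operand iflag=fullblock), and slow_source and truncate_10000 are hypothetical helper names that do not appear in the proposal. The producer writes 10500 bytes in 700-byte chunks, so the reader is likely to see short reads on the pipe.

```shell
# Hypothetical producer: 15 separate 700-byte writes (10500 bytes in total).
slow_source() {
    i=0
    while [ "$i" -lt 15 ]; do
        dd if=/dev/zero bs=700 count=1 2>/dev/null
        i=$((i + 1))
    done
}

# Keep exactly 10000 bytes of standard input: two full 4096-byte blocks,
# then one 1808-byte block. Assumes GNU dd for iflag=fullblock.
truncate_10000() {
    { dd bs=4096 count=2 iflag=fullblock
      dd ibs=1808 count=1 iflag=fullblock conv=sync obs=4096; } 2>/dev/null
}

# Report how many bytes made it through.
slow_source | truncate_10000 | wc -c
```

With iflag=fullblock in effect, the two dd processes together should pass 2*4096 + 1808 = 10000 bytes no matter how the writes on the pipe are chunked.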
This solution (iflag=fullblock) is a good one, but a basic question should be answered first: Should dd, by default, be usable for reading from pipes?

By way of brief background, POSIX says that if dd is reading and read() returns a short read, it should count that as one of the count= blocks. When reading from normal files and block devices, it seems read() generally never returns a short read except for reasons like EOF or hardware error or something that actually prevents the data from being read, so this policy seems harmless enough for those file types.

Some have said that when reading from a tape drive, the tape driver code will return variable size blocks to match the variable size blocks on the tape, which is a handy way to signal dd to read in variable size blocks. This is good because it allows one to read x number of blocks, regardless of how many bytes are transferred.

Pipes, however, raise a problem: When reading from a pipe, read() frequently returns a short read simply because it read to the end of whatever data was in the pipe, and then the kernel has to task switch back to whatever process is filling the pipe so it can fill it some more. As a result, the current default behavior of dd when reading pipes is that (unless ibs=1) the user has no assurance that dd will actually read in ibs*count bytes from a pipe or FIFO, making it virtually unusable for reading from pipes.

To test, try the following command:

    yes|dd bs=1000 count=1000|wc -c

And yet dd is supposed to be able to have any file type as input -- which I'm assuming includes stdin (since it's the default) and FIFOs.

~~~

Thus, the contradiction is that dd is both supposed to work with pipes, and is also supposed to not retry when read() gets a short read, which makes pipes useless for reading from.

So either dd should be listed as "Cannot read from pipes unless bs=1" or it should say that "When reading from pipes, read() should be called repeatedly until ibs bytes have been read, or EOF or program is terminated".

~~~

To me, the solution is to change it so that dd just always reads ibs*count bytes from pipes. But some (perhaps validly) argue that there are 40 years of scripts which depend on dd giving short reads on pipes (no example cited).

(But I did just think of this -- what if one was reading from a tape via a pipe ( cat /dev/mt0 | dd ): would cat and the pipe maintain the tape's original variable size block boundaries and pass those on to dd? I have no idea. I've never used a tape backup. But I'm trying to think of all the possibilities.)

I can't see how the current dd behavior could possibly be depended on since it basically makes reading from pipes useless for most cases, because the user (whether they know it or not) cannot depend on dd reading ibs*count bytes for any acceptable range of ibs and count.

It should be a serious concern whenever there is a situation where a utility causes data loss even though the user is using the utility according to the user documentation. (For example, the average user who reads the --help and man page of dd would have no way of knowing that there are some complications with read() that would cause data loss when reading from pipes. I would have never guessed that reading from pipes routinely caused short reads -- but then I, like most dd users, am not a kernel programmer.)

Thus, the problem could be solved 2 ways:

o Clarify that dd by default is not safe for reading from pipes, and then include Mr. Blake's suggested iflag=fullblock; or
o (my favorite) Clarify that dd should, by default, read ibs*count bytes from pipes and FIFOs, by re-read()'ing for each input block until that block has read ibs bytes. (While maintaining, of course, the current short-read behavior for all other file types.)
Thank you very much, Jesse Gordon |
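The test command in this note can be extended to compare the two behaviors side by side. This is a minimal sketch, assuming GNU dd and its iflag=fullblock spelling; the variable names plain and full are illustrative only. The first count depends on how the kernel schedules the two ends of the pipe, so no particular value should be expected for it.

```shell
# Default behavior: every read(), short or not, counts toward count=,
# so the total may fall below 1000 * 1000 bytes.
plain=$(yes | dd bs=1000 count=1000 2>/dev/null | wc -c)

# GNU extension: keep reading until each 1000-byte input block is full,
# so count= refers to whole blocks.
full=$(yes | dd bs=1000 count=1000 iflag=fullblock 2>/dev/null | wc -c)

echo "default:   $((plain)) bytes"
echo "fullblock: $((full)) bytes"
```

On a system where short reads occur, the default figure comes out lower than the fullblock figure; the fullblock figure should be 1000000 because yes never reaches end of file.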
|
It seems to me that since by default, count= is for read()'s rather than blocks, the description of functionality should be reoriented along the lines of:

count=n: Limit to n reads, where a read is an operating system and context defined variable size number of bytes no greater than ibs. (See iflag=fullblock to cause count to mean blocks instead of reads.)

ibs: Maximum number of bytes for each read, where a read is an operating system and context defined variable size number of bytes. See iflag=fullblock to make ibs a fixed size.

That way, folks who aren't kernel programmers won't be caught by surprise even after reading "count=BLOCKS" in the man page.

~~~~~

As to "Behavior is unspecified if iflag=fullblock is requested alongside the sync, block, or unblock conversions.": this doesn't seem to make sense to me. block and unblock should work fine with iflag=fullblock, and maybe even sync.

From the man page:

    block pad newline-terminated records with spaces to cbs-size

So what if I want to read from stdin a newline-terminated data file with ibs=1000 count=1000 iflag=fullblock conv=block? I will end up reading less than my desired number of bytes (or some other worse unspecified result). I don't particularly see why fullblock should not work with block. However, with variable length line records, it's not too likely that someone would limit their read by byte count but rather line count. (But see unblock below.)

~~~~~~~

From the man page:

    unblock replace trailing spaces in cbs-size records with newline

Again, and even more so, here we are reading a file with fixed length records and it is even more likely that the user will wish to read exactly ibs*count bytes, and it is even more important that iflag=fullblock work with unblock than it is with block.

~~~~~~~~~~~~~

From the man page:

    sync pad every input block with NULs to ibs-size;....

What if I want to read via stdin a variable number of bytes, but pad the last read to produce a full count*ibs output?

Remember, even with iflag=fullblock there is still one reason that less than a full ibs will be read -- and that is EOF -- and someone may well want to pad that last block out with NULs. Thank you very much, Jesse Gordon |
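The padding case raised at the end of this note can be illustrated with a short sketch. It assumes GNU dd, where iflag=fullblock and conv=sync can be combined; the proposal itself leaves that combination unspecified, so other implementations may behave differently. chunked_source is a hypothetical producer that delivers five bytes in two separate writes and then reaches EOF.

```shell
# Hypothetical producer: 3 bytes, a pause, 2 more bytes, then end of file.
chunked_source() {
    printf 'abc'
    sleep 1
    printf 'de'
}

# One 8-byte input block: fullblock gathers both writes before EOF,
# and conv=sync then pads the short final block with NULs up to ibs.
padded=$(chunked_source | dd ibs=8 count=1 iflag=fullblock conv=sync 2>/dev/null | wc -c)

# Same pipeline, with the NUL padding stripped to show the payload.
payload=$(chunked_source | dd ibs=8 count=1 iflag=fullblock conv=sync 2>/dev/null | tr -d '\000')

echo "block size: $((padded)) bytes, payload: $payload"
```

Without iflag=fullblock, the first short read would typically be padded and counted as the only block, and the bytes written after the pause would be left unread.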
Date Modified | Username | Field | Change |
---|---|---|---|
2011-04-14 23:00 | eblake | New Issue | |
2011-04-14 23:00 | eblake | Status | New => Under Review |
2011-04-14 23:00 | eblake | Assigned To | => ajosey |
2011-04-14 23:00 | eblake | Name | => Eric Blake |
2011-04-14 23:00 | eblake | Organization | => Red Hat |
2011-04-14 23:00 | eblake | User Reference | => ebb.dd |
2011-04-14 23:00 | eblake | Section | => dd |
2011-04-14 23:00 | eblake | Page Number | => 2582 |
2011-04-14 23:00 | eblake | Line Number | => 83165 |
2011-04-14 23:00 | eblake | Interp Status | => --- |
2011-04-15 17:43 | jessegordon | Note Added: 0000742 | |
2011-04-15 21:45 | jessegordon | Note Edited: 0000742 | |
2011-04-28 15:48 | msbrown | Tag Attached: issue8 | |
2011-04-28 15:48 | msbrown | Status | Under Review => Resolved |
2011-04-28 15:48 | msbrown | Resolution | Open => Accepted |
2011-04-28 16:52 | eblake | Relationship added | related to 0000407 |
2011-04-28 18:51 | jessegordon | Note Added: 0000775 | |
2011-04-28 22:00 | jessegordon | Note Edited: 0000775 | |
2020-03-03 14:39 | geoffclare | Status | Resolved => Applied |
2024-06-11 08:53 | agadmin | Status | Applied => Closed |
2024-09-05 15:25 | nick | Relationship added | related to 0001854 |