Austin Group Defect Tracker

Aardvark Mark IV


ID 0000406
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2011-04-14 23:00
Last Update 2020-03-03 14:39
Reporter eblake
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted
Status Applied
Name Eric Blake
Organization Red Hat
User Reference ebb.dd
Section dd
Page Number 2582
Line Number 83165
Interp Status ---
Final Accepted Text
Summary 0000406: add dd iflags=fullblock, for better pipe interaction
Description The rationale for head states (XCU line 91101):

There is no −c option (as there is in tail) because it is not historical
practice and because other utilities in this volume of POSIX.1-2008
provide similar functionality.

but does not name what those other utilities might be. The only one I
can think of is dd. However, dd does _not_ provide the ability to
efficiently read a particular number of bytes. It is limited to reading
count= blocks of input of ibs= bytes each, but if any of the input reads
are short, then fewer than count*ibs bytes will be output. The only way
to get an exact byte count is thus to use dd ibs=1 count=$n (since ibs=1
guarantees no short reads), but that causes $n reads. The problem of
short reads is most noticeable with pipes and FIFOs, where short reads
are common, but is also possible with regular files where the
implementation encounters signals that interrupt the middle of reading.
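
The portable workaround described above can be wrapped in a small shell function. This is only a sketch: the name take_bytes is made up for illustration and is not part of any standard. It uses nothing beyond the POSIX ibs= and count= operands, and it costs one read per byte.

```shell
# take_bytes N: copy the first N bytes of stdin to stdout (fewer only if
# EOF arrives first). With ibs=1 a read can never come up short, so
# count= really does count bytes here, at the price of N separate reads.
take_bytes() {
    dd ibs=1 count="$1" 2>/dev/null
}

# The writer delivers its data in two bursts; the byte count is still exact.
{ printf 'abc'; sleep 1; printf 'def'; } | take_bytes 5    # prints "abcde"
echo
```

For large values of N the per-byte reads make this slow, which is the inefficiency the proposal is trying to remove.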

The GNU version of dd offers an extension iflags=fullblock which is used
to enable dd to do multiple reads until the complete ibs= size has been
read in, before proceeding on with the rest of the algorithm.
Standardizing this extension would make dd more useful with input from
pipes, while allowing more efficient blocking sizes.
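
A minimal sketch of the extension in use, assuming a dd that implements it (GNU coreutils dd does; note that GNU spells the operand iflag=fullblock, singular). The helper name first_block is hypothetical.

```shell
# first_block SIZE: read exactly one SIZE-byte block from stdin, retrying
# short reads until the block is full or EOF is reached.
# Assumption: dd accepts the GNU iflag=fullblock extension.
first_block() {
    dd bs="$1" count=1 iflag=fullblock 2>/dev/null
}

# The data arrives in two 3-byte bursts, yet one full 6-byte block comes out.
{ printf 'abc'; sleep 1; printf 'def'; } | first_block 6    # prints "abcdef"
echo
```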

If this extension is not standardized, then it would be wise to modify
the non-normative text to warn users that dd does not work well with
pipes for anything larger than ibs=1.

This proposal attempts to codify existing practice of one implementation
but requires introducing a new option iflags (GNU dd also uses iflags for
other purposes, such as iflags=noctty to specify the use of O_NOCTTY when
opening if=, but that is not included in this proposal). However, if
[io]flags is ever considered for standardization for use in specifying
various O_* flags to pass to open(), it may make more sense to instead
invent a new conversion specification conv=fullblock, rather than the
GNU spelling of iflags=fullblock.

This proposal helps with efficient read sizes from pipes when reading
to end of file or when reading an exact multiple of the block size, but
still lacks the ability to read an arbitrary number of bytes without
resorting to two dd processes, as in this method of truncating input at
exactly 10000 bytes while still writing an exact number of blocks for
an output file with a 4k block size:

process | { dd bs=4096 count=2 iflags=fullblock; \
      dd ibs=1808 count=1 iflags=fullblock conv=sync obs=4096; } > output
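
The numbers in that command come from splitting the byte limit into whole blocks plus a remainder: 10000 = 2 * 4096 + 1808. A small sketch of the arithmetic, with a hypothetical helper name:

```shell
# split_byte_count TOTAL BLOCKSIZE
# Print "<whole blocks> <remaining bytes>" for a transfer of TOTAL bytes.
split_byte_count() {
    total=$1
    blocksize=$2
    echo "$((total / blocksize)) $((total % blocksize))"
}

split_byte_count 10000 4096    # prints "2 1808"
```

The first number is the count= for the first dd, and the second is the ibs= for the second dd.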

Therefore, it may also be worth standardizing a means of limiting input
to a fixed byte count which is not a multiple of the requested input
block size, although I don't know of any existing practice for that
in dd; for that usage, the non-standard 'head -c' makes more sense.
Desired Action At line 83245 [XCU dd OPERANDS], add a new paragraph:

iflags=fullblock Perform as many reads as required to reach the full
input block size or end of file, rather than acting on partial reads. If
this operand is in effect, then the count= operand refers to the number
of full input blocks rather than reads. Behavior is unspecified if
iflags=fullblock is requested alongside the sync, block, or unblock
conversions.

At line 83144 [dd DESCRIPTION], add a sentence to step 1:

If the iflags=fullblock operand is specified, this may entail multiple
reads; otherwise, the input block is used even if the read was shorter
than the specified block size.

At line 83321 [dd APPLICATION USAGE], add a new paragraph:

Using the count= operand of dd with a pipe or FIFO as the input can lead
to surprising results, since these file types are prone to encountering
short reads for any input block size other than 1. Unless the
iflags=fullblock operand is in effect, dd will stop after the specified
number of reads, rather than input blocks, and therefore can often
result in fewer bytes being output than the product of the count and
input block size.
Tags issue8
Attached Files

- Relationships
related to 0000407 (Applied, ajosey): add head -c

-  Notes
(0000742)
jessegordon (reporter)
2011-04-15 17:43
edited on: 2011-04-15 21:45

This solution (iflags=fullblock) is a good one, but a basic question should be answered first:

Should dd, by default, be usable for reading from pipes?

By way of brief background, POSIX says that if dd is reading and read() returns a short read, it should count that as one of the count=blocks.

When reading from normal files and block devices, it seems read() generally never returns a short read except for reasons like EOF or hardware error or something that actually prevents the data from being read, so this policy seems harmless enough for those file types.

Some have said that, when reading from a tape drive, the tape driver will return variable-size blocks to match the variable-size blocks on the tape, which is a handy way to signal dd to read in variable-size blocks. This is good because it allows one to read x blocks, regardless of how many bytes are transferred.

Pipes, however, raise a problem: When reading from a pipe, read() frequently returns a short read simply because it read to the end of whatever data was in the pipe, and then the kernel has to task switch back to whatever process is filling the pipe so it can fill it some more.

As a result, the current default behavior of dd when reading pipes is that (unless ibs=1) the user has no assurance that dd will actually read in ibs*count bytes from a pipe or fifo, making it virtually unusable for reading from pipes.

To test, try the following command:

yes|dd bs=1000 count=1000|wc -c
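
The result of that command varies from run to run. A more repeatable sketch of the same effect uses a writer that pauses between two bursts (slow_writer is a made-up helper, and the one-second sleep is an assumption that dd gets scheduled in the meantime):

```shell
# A writer that delivers 6 bytes in two bursts, one second apart.
slow_writer() {
    printf 'abc'
    sleep 1
    printf 'def'
}

# Three 2-byte blocks "should" be 6 bytes. The reads return "ab", then "c"
# (all that is left of the first burst), then "de". The short read counts
# as a block, so dd stops one byte early.
slow_writer | dd bs=2 count=3 2>/dev/null | wc -c    # typically prints 5
```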

And yet dd is supposed to be able to have any file type as input -- which I'm assuming includes stdin (since it's the default) and fifos.

~~~

Thus, the contradiction is that dd is both supposed to work with pipes, and is also supposed to not retry when read() gets a short read, which makes pipes useless for reading from.

So either dd should be listed as "Cannot read from pipes unless bs=1", or it should say that "When reading from pipes, read() should be called repeatedly until ibs bytes have been read, EOF is reached, or the program is terminated."
~~~

To me, the solution is to change it so that dd just always reads ibs*count bytes from pipes. But some (perhaps validly) argue that there are 40 years of scripts which depend on dd giving short reads on pipes (No example cited).

(But I did just think of this -- what if one was reading from a tape via a pipe ( cat /dev/mt0 | dd ) would cat and the pipe maintain the tape's original variable size block boundaries and pass those onto dd? I have no idea. I've never used a tape backup. But I'm trying to think of all the possibilities.)


I can't see how the current dd behavior could possibly be depended on since it basically makes reading from pipes useless for most cases, because the user (whether they know it or not) cannot depend on dd reading ibs*count bytes for any acceptable range of ibs and count.

It should be a serious concern whenever a utility causes data loss even though the user is using the utility according to the user documentation.
(For example, the average user who reads the --help and man page of dd would have no way of knowing that there are some complications with read() that would cause data loss when reading from pipes. I would have never guessed that reading from pipes routinely caused short reads -- but then I, like most dd users, am not a kernel programmer.)

Thus, the problem could be solved 2 ways:

o Clarify that dd by default is not safe for reading from pipes, and then include Mr. Blake's suggested iflags=fullblock; or

o (my favorite) Clarify that dd should, by default, read ibs*count bytes from pipes and fifos, by re-read()'ing for each input block until that block has read ibs bytes. (While maintaining, of course, the current short-read behavior for all other file types.)

Thank you very much,

Jesse Gordon

(0000775)
jessegordon (reporter)
2011-04-28 18:51
edited on: 2011-04-28 22:00

It seems to me that since by default, count= is for read()'s rather than blocks, the description of functionality should be reoriented along the lines of:

count=n: Limit to n reads, where a read is an operating-system- and context-defined, variable-size number of bytes no greater than ibs. (See iflags=fullblock to cause count to mean blocks instead of reads.)

ibs: Maximum number of bytes for each read, where a read is an operating-system- and context-defined, variable-size number of bytes. See iflags=fullblock to make ibs a fixed size.

That way, folks who aren't kernel programmers won't be caught by surprise even after reading "count=BLOCKS" in the man page.
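
A script can at least detect when count= has been spent on short reads, because dd reports whole and partial input blocks separately on standard error as "f+p records in". A rough sketch follows; the helper name partial_input_blocks is hypothetical, and LC_ALL=C is set on the assumption that some implementations translate the message.

```shell
# partial_input_blocks ARGS...: run dd with ARGS on stdin, discard the
# copied data, and print the number of partial input blocks (the "p" in
# the "f+p records in" line that dd writes to standard error).
partial_input_blocks() {
    LC_ALL=C dd "$@" 2>&1 >/dev/null |
        sed -n 's/^[0-9]*[+]\([0-9]*\) records in$/\1/p'
}

# Both reads return 3 bytes for a 4-byte block, so both are partial.
{ printf 'abc'; sleep 1; printf 'def'; } | partial_input_blocks bs=4 count=2    # prints 2
```

A nonzero result means fewer than ibs*count bytes were read.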

~~~~~

As to "Behavior is unspecified if
iflags=fullblock is requested alongside the sync, block, or unblock
conversions."

This doesn't seem to make sense to me. block and unblock should work fine with iflags=fullblock and maybe even sync.

From man page:
block pad newline-terminated records with spaces to cbs-size

So what if I want to read from stdin a new-line terminated data file with ibs=1000 count=1000 iflags=fullblock,block?
I will end up reading less than my desired number of bytes (or some other, worse, unspecified result). I don't particularly see why fullblock should not work with block.

However, with variable length line records, it's not too likely that someone would limit their read by byte count but rather line count. (But see unblock below.)

~~~~~~~
From man page:
unblock replace trailing spaces in cbs-size records with newline

Again, and even more so, here we are reading a file with fixed-length records, so it is even more likely that the user will wish to read exactly ibs*count bytes, and it is even more important that iflags=fullblock work with unblock than it is with block.
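
For readers unfamiliar with the two conversions, here is a small sketch of what block and unblock do to record data, using only POSIX operands. The helper names to_records and to_lines are made up for illustration.

```shell
# to_records CBS: newline-terminated lines -> fixed CBS-byte records,
# padded with spaces (conv=block).
to_records() {
    dd cbs="$1" conv=block 2>/dev/null
}

# to_lines CBS: fixed CBS-byte records -> lines, with trailing spaces
# stripped and a newline appended (conv=unblock).
to_lines() {
    dd cbs="$1" conv=unblock 2>/dev/null
}

printf 'ab\ncd\n' | to_records 4; echo    # prints "ab  cd  "
printf 'ab  cd  ' | to_lines 4            # prints "ab" and "cd" on separate lines
```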

~~~~~~~~~~~~~

From the man page:
sync pad every input block with NULs to ibs-size;....

What if I want to read via stdin a variable number of bytes, but pad the last read to produce a full count*ibs output?

Remember, even with iflag=fullblock there is still one reason that less than a full ibs will be read -- and that is EOF -- and someone may well want to pad that last block out with NULs.
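
A sketch of the padding behavior in question, using plain POSIX conv=sync without any fullblock flag. The helper name pad_blocks is hypothetical, and NULs are shown as dots so the output is visible.

```shell
# pad_blocks SIZE: copy stdin to stdout, NUL-padding every short input
# block up to SIZE bytes (conv=sync).
pad_blocks() {
    dd ibs="$1" conv=sync 2>/dev/null
}

# One short read at EOF: 3 data bytes become one 8-byte block.
printf 'abc' | pad_blocks 8 | tr '\0' '.'; echo    # prints "abc....."

# Two short reads from a slow writer: padding also lands mid-stream.
{ printf 'abc'; sleep 1; printf 'de'; } | pad_blocks 4 | tr '\0' '.'; echo    # prints "abc.de.."
```

The second case is the one that matters here: without full-block reads, sync pads every short read rather than only the final block at EOF.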

Thank you very much,

Jesse Gordon


- Issue History
Date Modified Username Field Change
2011-04-14 23:00 eblake New Issue
2011-04-14 23:00 eblake Status New => Under Review
2011-04-14 23:00 eblake Assigned To => ajosey
2011-04-14 23:00 eblake Name => Eric Blake
2011-04-14 23:00 eblake Organization => Red Hat
2011-04-14 23:00 eblake User Reference => ebb.dd
2011-04-14 23:00 eblake Section => dd
2011-04-14 23:00 eblake Page Number => 2582
2011-04-14 23:00 eblake Line Number => 83165
2011-04-14 23:00 eblake Interp Status => ---
2011-04-15 17:43 jessegordon Note Added: 0000742
2011-04-15 21:45 jessegordon Note Edited: 0000742
2011-04-28 15:48 msbrown Tag Attached: issue8
2011-04-28 15:48 msbrown Status Under Review => Resolved
2011-04-28 15:48 msbrown Resolution Open => Accepted
2011-04-28 16:52 eblake Relationship added related to 0000407
2011-04-28 18:51 jessegordon Note Added: 0000775
2011-04-28 22:00 jessegordon Note Edited: 0000775
2020-03-03 14:39 geoffclare Status Resolved => Applied

