Austin Group Defect Tracker

Aardvark Mark IV


ID 0000876
Category [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity Objection
Type Omission
Date Submitted 2014-09-11 14:55
Last Update 2019-06-10 08:54
Reporter eblake
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Closed
Name Eric Blake
Organization Red Hat
User Reference eblake.cat
Section cat
Page Number 2526
Line Number 81474
Interp Status Approved
Final Accepted Text See Note: 0002424.
Summary 0000876: allow implementations to fail on 'cat a >> a'
Description Several existing implementations of cat explicitly refuse to write their output to the same file as any of their inputs, to avoid filling up the disk when that file is non-empty:

Solaris:
$ cd /tmp/
$ touch a
$ /usr/bin/cat a >> a
cat: input/output files 'a' identical

GNU:
$ cd /tmp/
$ touch a
$ cat a >> a
cat: a: input file is output file

The standard does not appear to permit this behavior, even though it is useful. This proposal fixes only cat, although the group may decide to give the allowance for rejecting identical input and output a wider scope.
Desired Action At line 81474 [XCU cat STDOUT], add a sentence:
If the standard output is a regular file, and is the same file as any of the input file operands, the implementation may treat this as an error without writing anything to the output file.
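The proposed wording can be modeled in shell. This is a minimal sketch, not how any cat is actually implemented: append_files is a hypothetical helper that takes the output file name explicitly (a shell function has no easy portable way to name its own standard output), and it assumes a test utility that supports -ef (same device and inode number). bash, ksh and dash provide -ef, but it was not specified in the 2008 edition discussed here. Because the check runs before the >> redirection opens the output file, nothing is written when the error is reported.

```shell
# append_files OUT FILE...: hypothetical helper, not a standard utility.
# Appends each FILE to OUT, but if OUT is a regular file that is the same
# file as any FILE operand, reports an error and writes nothing.
# ASSUMPTION: test supports -ef (same device and inode number).
append_files() {
    out=$1
    shift
    if [ -f "$out" ]; then
        for f in "$@"; do
            if [ "$f" -ef "$out" ]; then
                printf '%s: input file is output file\n' "$f" >&2
                return 1
            fi
        done
    fi
    # Only now is OUT opened, so the error path above never touches it.
    cat -- "$@" >> "$out"
}
```

For example, append_files a a b fails and leaves a unchanged, while append_files a b behaves like cat b >> a.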
Tags tc2-2008
Attached Files


Notes
(0002377)
eblake (manager)
2014-09-11 16:23

Paul Eggert pointed out to me that the example at line 81508 may also need updating:

Because of the shell language mechanism used to perform output redirection, a command such as this:
 cat doc doc.end > doc
causes the original data in doc to be lost.

On my test system, data was indeed lost:
$ echo 1 > a
$ echo 2 > b
$ cat a b > a
cat: a: input file is output file
$ cat a
2

but it also demonstrates that cat merely skipped the processing of 'a', rather than failing up front. So it may be better to document that an input file is skipped if it is the same as the output (causing an overall error), rather than my original proposed wording tied to the output file.
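The skip-and-continue behavior described here can be sketched in shell. cat_skipping is a hypothetical helper, not any implementation's code: like cat a b c > b, it truncates the output file first, then reports and skips any operand that is the same file as the output, copies the rest, and returns a non-zero status if anything was skipped. It assumes a test utility that supports -ef (same device and inode number), which bash, ksh and dash provide.

```shell
# cat_skipping OUT FILE...: hypothetical sketch of "skip the operand that is
# the output file, keep going, fail overall".
# ASSUMPTION: test supports -ef (same device and inode number).
cat_skipping() {
    out=$1
    shift
    status=0
    for f in "$@"; do
        if [ "$f" -ef "$out" ]; then
            printf '%s: input file is output file\n' "$f" >&2
            status=1
        else
            cat -- "$f" || status=1
        fi
    done > "$out"   # truncates OUT once, before any operand is examined
    return "$status"
}
```

With a, b and c holding 1, 2 and 3, cat_skipping b a b c leaves b containing 1 and 3 and returns 1, mirroring the later transcript in Note: 0002382.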
(0002378)
eblake (manager)
2014-09-11 16:27

Arguably, since cat'ting a non-empty regular file to itself will eventually fail when the disk fills up, POSIX already allows implementations to fail (and does not forbid failing early rather than first exhausting the disk). After all, we justified allowing 'rm -rf /' to fail early rather than late on the grounds that it will eventually fail when rm itself is removed, and failing earlier is nicer if we can prove that eventual failure would have happened. But this raises the question of whether cat'ting an empty file to itself is allowed to fail: since that would not fill the disk, we cannot prove that it would fail late, so it is harder to justify failing early.
(0002379)
shware_systems (reporter)
2014-09-11 16:54

Would it be better to make explicit that, for those file types whose size is known when the operand is encountered, cat shall copy only that many bytes, regardless of the target? Then

cat a >> a

would only double a in size, rather than entering an infinite loop that exhausts the disk. I believe this better matches the intent of the utility: copy this chunk of data, as it is now, to standard output.
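With current implementations, the doubling this note describes can be obtained portably by appending a snapshot of the file rather than the live file. double_file is a hypothetical helper; the temporary file name built from $TMPDIR and $$ is an illustrative assumption and is predictable, so it is not suitable for shared, untrusted directories.

```shell
# double_file FILE: hypothetical helper. Appends a snapshot of FILE to FILE,
# so FILE ends up exactly twice its original size instead of growing until
# the disk fills.
# ASSUMPTION: ${TMPDIR:-/tmp}/double.$$ is a safe, unused temporary name.
double_file() {
    snap=${TMPDIR:-/tmp}/double.$$
    if ! cat -- "$1" > "$snap"; then
        rm -f -- "$snap"
        return 1
    fi
    cat -- "$snap" >> "$1"   # reads the frozen copy, never the growing file
    status=$?
    rm -f -- "$snap"
    return "$status"
}
```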
(0002380)
eblake (manager)
2014-09-11 17:02

In response to Note: 0002379:
If we were designing from scratch, maybe. But it would render existing implementations non-compliant; my argument here is to relax the standard to allow existing behavior, not to mandate new behavior of only doubling in size.
(0002381)
Don Cragun (manager)
2014-09-11 17:33
edited on: 2014-09-11 17:34

Concerning Note: 0002377: the command

cat a b > a

does not indicate that cat skipped the contents of a; the contents of a were destroyed by the shell when it performed the requested redirection, before cat entered main().
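A minimal demonstration of this point in POSIX shell: the > redirection truncates the file even when the command never writes to its output, so the loss cannot be blamed on cat.

```shell
# The shell opens (and truncates) the target of '>' before the command runs.
echo 1 > a
true > a                  # 'true' writes nothing, yet a is now empty
[ -s a ] || echo "a is empty"   # prints: a is empty
```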

(0002382)
eblake (manager)
2014-09-11 17:42

Ah, but:

$ echo 1 > a
$ echo 2 > b
$ echo 3 > c
$ cat a b c > b
cat: b: input file is output file
$ cat b
1
3

This time, the contents of 'b' were NOT empty when the error message about b was produced.
(0002384)
geoffclare (manager)
2014-09-12 09:40
edited on: 2014-10-17 09:22

I checked a bunch of other utilities on Solaris, HP-UX and Linux, and the only other one that produced an error was (GNU) grep on Linux. So we should definitely make this change for grep as well, but I agree we should consider allowing it for some other utilities.

The additional utilities I checked were: awk grep cut dd fold head m4 more nl paste od pr sed tail uniq. The file I used was not empty, in case that would make a difference.

Update: In the Oct 16 teleconference it was decided to allow this only for cat. For cat, redirecting output to an input file is invariably a mistake, but for other utilities there are legitimate cases where this might be done, and we do not want to prevent power users from making use of them. For example, a one-to-one substitution can be performed in place (thus not needing space for a second copy of the file) by using a read-write redirection of standard output, e.g.:

sed s/a/A/ file 1<> file
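The read-write redirection trick can be tried directly. This sketch uses a throwaway file named file, as in the example; 1<> opens it for reading and writing on standard output without truncating it, and sed writes each byte only after it has read the corresponding input.

```shell
# In-place, same-length substitution via a read-write redirection of stdout.
printf 'banana\n' > file
sed 's/a/A/' file 1<> file   # replaces the first 'a' on each line
cat file                     # prints: bAnana
```

This is safe only because the substitution preserves length: a longer replacement could overwrite input that sed has not yet read, and a shorter one would leave the old tail in place, since nothing truncates the file.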

In the case of grep, a legitimate use case is:

grep -l 'regexp' * > filelist

which is a perfectly reasonable thing to do if it is known that none of the filenames match the regexp (filelist, whose contents are filenames, then never matches). In shells that support extended globbing, the * could be replaced with !(filelist), but a portable method of avoiding filelist being an input file (assuming it has to be in the current directory) would be complicated and hard to get right.
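One possible portable approach is sketched below; it is illustrative rather than definitive. list_matches is a hypothetical helper that builds the operand list from * while skipping the output file by name, so the output file is never an input. It assumes the output file is given as a plain name in the current directory, and it also skips anything that is not a regular file (including the literal * left when the pattern matches nothing), which differs slightly from the original command.

```shell
# list_matches REGEXP OUTFILE: hypothetical helper. Writes to OUTFILE the
# names of regular files matched by '*' whose contents match REGEXP,
# without ever passing OUTFILE itself to grep.
# ASSUMPTION: OUTFILE is a plain name in the current directory.
list_matches() {
    re=$1
    out=$2
    set --                   # reuse the positional parameters as a safe list
    for f in *; do
        if [ "$f" != "$out" ] && [ -f "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        : > "$out"           # no candidates: produce an empty list
        return 1
    fi
    grep -l -e "$re" -- "$@" > "$out"
}
```

For example, list_matches 'regexp' filelist plays the role of the grep command above.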

(0002424)
Don Cragun (manager)
2014-10-16 17:11
edited on: 2014-10-23 15:41

Interpretation response:
------------------------
The standard states that the cat utility is required to copy the files named as operands to standard output even if standard output is redirected to one of those input files, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
A cat command which redirects its standard output to a file that is also named as a file operand is likely to run until the output file reaches the maximum output file size allowed for that process or the underlying filesystem runs out of space. This is a common application error that accidentally consumes a lot of space needed by other users on the system. Therefore, many implementations of the cat utility check for this condition and, when it is found, print a diagnostic message and exit with a non-zero exit status. This behavior is not currently allowed by the standard, but should be.

Notes to the Editor (not part of this interpretation):
At page 2526 line 81474 (XCU cat STDOUT), add a sentence:
If the standard output is a regular file, and is the same file as any of the input file operands, the implementation may treat this as an error.

At page 2526 lines 81506-81509 (XCU cat EXAMPLES), change:
Because of the shell language mechanism used to perform output redirection, a command such as this:
cat doc doc.end > doc

causes the original data in doc to be lost.

to:
Because of the shell language mechanism used to perform output redirection, a command such as this:
cat doc doc.end > doc

causes the original data in doc to be lost before cat even begins execution. This is true whether the cat command fails with an error or silently succeeds (the specification allows both behaviors). In order to append the contents of doc.end without losing the original contents of doc, this command should be used instead:
cat doc.end >> doc


(0002452)
ajosey (manager)
2014-11-27 10:33

Interpretation Proposed: 27 November 2014
(0002515)
ajosey (manager)
2015-01-05 14:13

Interpretation approved: 5 Jan 2015

Issue History
Date Modified Username Field Change
2014-09-11 14:55 eblake New Issue
2014-09-11 14:55 eblake Name => Eric Blake
2014-09-11 14:55 eblake Organization => Red Hat
2014-09-11 14:55 eblake User Reference => eblake.cat
2014-09-11 14:55 eblake Section => cat
2014-09-11 14:55 eblake Page Number => 2526
2014-09-11 14:55 eblake Line Number => 81474
2014-09-11 14:55 eblake Interp Status => ---
2014-09-11 16:23 eblake Note Added: 0002377
2014-09-11 16:27 eblake Note Added: 0002378
2014-09-11 16:54 shware_systems Note Added: 0002379
2014-09-11 17:02 eblake Note Added: 0002380
2014-09-11 17:33 Don Cragun Note Added: 0002381
2014-09-11 17:34 Don Cragun Note Edited: 0002381
2014-09-11 17:42 eblake Note Added: 0002382
2014-09-12 09:40 geoffclare Note Added: 0002384
2014-10-16 17:11 Don Cragun Note Added: 0002424
2014-10-16 17:12 Don Cragun Note Edited: 0002424
2014-10-16 17:13 Don Cragun Interp Status --- => Pending
2014-10-16 17:13 Don Cragun Final Accepted Text => See Note: 0002424.
2014-10-16 17:13 Don Cragun Status New => Interpretation Required
2014-10-16 17:13 Don Cragun Resolution Open => Accepted
2014-10-17 08:59 geoffclare Note Edited: 0002384
2014-10-17 09:03 geoffclare Note Edited: 0002384
2014-10-17 09:22 geoffclare Note Edited: 0002384
2014-10-23 15:19 Don Cragun Note Edited: 0002424
2014-10-23 15:19 Don Cragun Note Edited: 0002424
2014-10-23 15:36 Don Cragun Note Edited: 0002424
2014-10-23 15:38 Don Cragun Note Edited: 0002424
2014-10-23 15:39 Don Cragun Note Edited: 0002424
2014-10-23 15:39 Don Cragun Resolution Accepted => Accepted As Marked
2014-10-23 15:40 geoffclare Tag Attached: tc2-2008
2014-10-23 15:41 Don Cragun Note Edited: 0002424
2014-11-27 10:33 ajosey Interp Status Pending => Proposed
2014-11-27 10:33 ajosey Note Added: 0002452
2015-01-05 14:13 ajosey Interp Status Proposed => Approved
2015-01-05 14:13 ajosey Note Added: 0002515
2019-06-10 08:54 agadmin Status Interpretation Required => Closed

