Austin Group Defect Tracker

Aardvark Mark IV

ID 0000530
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2011-12-21 14:53
Last Update 2012-09-10 16:06
Reporter dwheeler
View Status public
Assigned To ajosey
Priority normal
Resolution Rejected
Status Closed
Name David A. Wheeler
Organization
User Reference
Section sed
Page Number 3153-3161
Line Number 104759-105140
Interp Status ---
Final Accepted Text
Summary 0000530: Support in-place editing in sed (-iEXTENSION)
Description I propose standardizing option(s) so sed users can perform in-place editing of files in an easier, more efficient, and less error-prone way. This capability is already supported in several widely-used sed implementations using (at least) the “-i” option; I know it’s in at least GNU sed, FreeBSD sed, and MacOS sed. Here are the details on why I think this is worth standardizing.

It is extremely common to use sed to change already-existing files. Currently, the obvious portable way to do this is to combine sed with mv or cp. That works, but it is overly complicated for such a common, simple task, especially when there are multiple files (e.g., find … -exec … {}.bak isn’t portable, and using while loops portably and correctly requires read -r, IFS setting, quoting everywhere, and assuming filenames won’t include newlines). It is also inefficient: it starts at least two processes for each file, and sed must be re-initialized separately for each file (again, this matters when there are many files).
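For illustration, the portable workaround described above might look like the following sketch. The function name sed_inplace and the temporary-file naming are hypothetical, not part of any standard or of this proposal; the sketch assumes the name "$f.tmp.$$" is unused and that losing the original inode (and possibly its permissions) through mv is acceptable.

```shell
# Hypothetical helper: emulate "sed -i.bak" using only POSIX sed, cp, and mv.
# Usage: sed_inplace SCRIPT FILE...
sed_inplace() {
    script=$1
    shift
    for f in "$@"; do
        tmp="$f.tmp.$$"                     # assumed-unused temporary name
        cp -p "$f" "$f.bak" &&              # keep a backup copy of the original
        sed -e "$script" "$f" >"$tmp" &&    # a fresh sed process for every file
        mv "$tmp" "$f" || { rm -f "$tmp"; return 1; }
    done
}
```

A call such as sed_inplace 's/foo/bar/g' *.c starts a cp, a sed, and an mv for every file, which is the overhead the description objects to.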

Search-and-replace becomes really easy to do efficiently with in-place editing. E.g.:
find . -name "*.c" -exec sed -i.bak -e 's/foo/bar/g' {} +
or if it’s just the current directory:
sed -i.bak -e 's/foo/bar/g' *.c

A variety of sed implementations support in-place editing:
* GNU sed: http://www.gnu.org/software/sed/manual/html_node/Invoking-sed.html#Invoking-sed (note that extensions containing “*” are treated specially)
* FreeBSD sed: http://www.freebsd.org/cgi/man.cgi?query=sed&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html (it has a -I as well as -i)
* MacOS sed: http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/sed.1.html (their documentation recommends using an extension to keep a backup, but it does document a no-backup-extension capability).

The Wikipedia article on sed (as of this moment in time) highlights this extension to sed as important, saying: “GNU sed added several new features. The best-known is in-place editing of files (i.e., replace the original file with the result of applying the sed program), which was later included in BSD sed too. This feature is nowadays often used instead of ed scripts....”.

Let me try to quickly counter some objections:
1. Won’t this imply adding a similar option to every filter? No. The “sed” utility is usually used for relatively simple text manipulation cases, so making it especially easy to use for its common case makes sense.
2. Why should it manipulate files when it deals with streams? The sed utility already reads data from files; after all, that’s where the data often is. This option just makes it easier to put the data back where it belongs once stream processing is done.

The good news is that there is widespread support for in-place editing, and the syntax even has wide agreement *if* the original file contents are backed up with an extension. The syntax, supported by GNU sed, FreeBSD sed, and MacOS sed (at least), is “-iEXTENSION”. With this, each original “file” is kept as “fileEXTENSION” and the new contents are stored in “file”. (Perl even copied that syntax; I’m not proposing to standardize perl, just noting that this capability is so common that other tools are copying it.) These implementations differ subtly in semantics if “*” is part of the extension, because in GNU sed, every instance of “*” is replaced with the filename. I don’t think this GNU sed capability is used often, so it doesn’t need to be standardized; it’d be easy to just declare that the semantics are undefined if “*” is part of EXTENSION, so that GNU sed can keep doing that but no one else has to copy it. In all cases, line numbering restarts at 1 for each new file, and “$” matches the end of each file.
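A small sketch of the agreed-on “-iEXTENSION” form, assuming a sed that accepts a cuddled extension (the description names GNU, FreeBSD, and MacOS sed). The directory and file names are made up for the demonstration.

```shell
# Sketch only: -i is not in POSIX; this assumes a sed that accepts -iEXTENSION.
demo_dir=${TMPDIR:-/tmp}/sed-i-demo.$$      # hypothetical scratch directory
mkdir -p "$demo_dir"
printf 'one\ntwo\n' >"$demo_dir/a.txt"
printf 'three\nfour\n' >"$demo_dir/b.txt"

# Address 1 applies to the first line of EACH file and $ to the last line
# of each file, because line numbering restarts for every file.
sed -i.bak -e '1s/^/first: /' -e '$s/^/last: /' \
    "$demo_dir/a.txt" "$demo_dir/b.txt"

# The originals are kept as a.txt.bak and b.txt.bak next to the edited files.
ls "$demo_dir"
```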

Sadly, GNU sed and FreeBSD/MacOS differ on the syntax if no extension is cuddled right after the “-i”. In GNU sed, the extension is optional; if it’s not immediately cuddled after the “-i” then there is no extension, and no backup is kept (perl uses this syntax too). In FreeBSD and MacOS, the extension is required; if no extension is cuddled after the “-i” then the NEXT ARGUMENT is used as the extension, and if that next argument is the 0-length "", no backup is kept. So everyone supports backups if you just cuddle the extension right after “-i”. In addition, everyone supports omitting the backup, but they have slightly different incompatible syntaxes for doing so, which leads to an interesting question in that case.
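Scripts that want no backup today typically have to branch on the dialect. The following is one possible sketch; the function name sed_nobackup is hypothetical, and the test "sed --version succeeds" is only a heuristic assumption for detecting GNU-style option parsing, not a guaranteed interface.

```shell
# Hypothetical wrapper: in-place edit with no backup on either dialect.
# Usage: sed_nobackup SCRIPT FILE...
sed_nobackup() {
    script=$1
    shift
    if sed --version >/dev/null 2>&1; then
        # Assumed GNU-style parsing: a bare -i means "no extension, no backup".
        sed -i -e "$script" "$@"
    else
        # Assumed FreeBSD/MacOS-style parsing: the next argument is the
        # extension, and a zero-length one means "no backup".
        sed -i '' -e "$script" "$@"
    fi
}
```

Neither branch is portable on its own, which is the gap that options 2 and 3 below try to close.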

I see these options for standardizing in-place editing without backups:
1. Don’t support it in the standard; require that backups be kept when using -i, and require users to do the mv by hand if they don’t want backups. I think that’s a bad idea. Not needing backups is common (particularly when it’s part of a script or makefile command), so people will probably keep using nonstandard constructs for common cases. If someone uses -i on “all files” and forgets to erase the backups after each run, they can end up with file.bak, file.bak.bak, file.bak.bak.bak, and so on. It’s also a problem when you have many big files; the option syntax shouldn’t force people to keep backups if they are processing 10,000 files that are a gigabyte each! The standard can do better.
2. Require everyone to support -i "" (a separate null-length argument) as a special case meaning “don’t keep a backup”. FreeBSD and MacOS do this already. This would require a patch to GNU sed, but it’s hard to imagine that this weird extension would interfere with any scripts; a null-length string is not a valid option, and it’s not a valid filename. GNU sed could, when it confronts “-i” with no extension, look at the next parameter to see if it’s zero-length and consume it if it is. It is odd, however.
3. Create a new option flag to mean “-i, but do not keep backup files”. I don’t care so much which flag is used, but in the spirit of putting forth a concrete proposal, I would propose “-N” (for iN-place). This would be a standards committee invention, but the invention is not the ability (many sed implementations have this). The invention is the name of a common option flag, created because there are incompatible interfaces in current practice and the point of a standard is to create some common way to invoke functionality.

I’m currently proposing #3, but option #2 would work too.

If we wish to stay compatible with current practice, this is a minor violation of XBD 12.1 point 2a. Basically, users MUST cuddle the extension right after “-i” (as this is the agreed-on intersection between GNU and FreeBSD/Apple). But lots of commands already violate XBD 12.1 in various small ways. This particular kind of minor violation occurs in several other places in the standard. In addition, it doesn’t create a new violation; it just documents a minor violation already in existing practice.
Desired Action Lines 104762-104764: After each “[-n]” add: [-iEXTENSION | -N ]

Line 104772: Append, “and that after -i (if any) there shall be an extension without intervening whitespace.”

After line 104777:
-iEXTENSION Edit files in-place, that is, record a backup copy of each file in fileEXTENSION, and record the results of editing in the original filename.
-N Record the results of the editing in the original filename; once sed has completed, no backup copy is kept.

After line 104841, add:
If the “-i” or “-N” (in-place) options are used, the files are edited “in-place”: Once sed has completed, each file contains the result of processing that file through sed (instead of being sent to standard output). If -i is used, then the original file contents for each given file shall be backed up in a file with the name of the original file with EXTENSION appended. The behavior is unspecified for “-i” if EXTENSION contains “*” or if there is no EXTENSION. When editing “in-place” using -i or -N, opening each file shall restart line numbering at 1, the “$” address shall match the last line of the current file, and address ranges are limited to one file. The utility shall halt with an error if “-i” or “-N” is used but no file name is provided.
Tags No tags attached.

-  Notes
(0001170)
eblake (manager)
2012-03-08 17:22

This was discussed on the 8 Mar 2012 conference. The idea has merit for the next version of POSIX, but the proposal needs to be enhanced before it can be acceptable. For ease of tracking, we are marking this bug closed; but we would be happy to consider a fully fleshed out proposal that addresses the following points:

The use of -i (lowercase), although common, is not a candidate for standardization because implementations differ on its behavior, and because the standard discourages optional option-arguments. However, it would be appropriate to standardize -I for the same purpose, if that option letter is not already in use, with a common semantic for all implementations to follow (similar to how xargs only standardized the use of -I and leaves -i for implementation extensions).

Also the proposal should consider whether multiple in-place edits are done in concatenated mode (the current standard when multiple input files are present) or in a per-file mode (where range expressions do not straddle input files, and where the line count resets at 1 for each file visited); an example of switching between modes is the GNU sed -s option.
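The difference between the two modes can be seen with the “$” address. This sketch uses made-up file names; the first command is standard behavior, while the second relies on the GNU sed -s option mentioned above and falls back to a placeholder string where -s is not supported.

```shell
demo_dir=${TMPDIR:-/tmp}/sed-modes-demo.$$   # hypothetical scratch directory
mkdir -p "$demo_dir"
printf 'a1\na2\n' >"$demo_dir/a.txt"
printf 'b1\nb2\n' >"$demo_dir/b.txt"

# Concatenated mode (current standard): the inputs form one stream, so $
# matches only the last line of the last file.
concat_out=$(sed -n '$p' "$demo_dir/a.txt" "$demo_dir/b.txt")
echo "$concat_out"

# Per-file mode (GNU extension): with -s, $ matches the last line of each file.
per_file_out=$(sed -s -n '$p' "$demo_dir/a.txt" "$demo_dir/b.txt" 2>/dev/null) ||
    per_file_out='(sed -s not supported here)'
echo "$per_file_out"
```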
(0001362)
eblake (manager)
2012-09-10 16:06

If we do standardize 'sed -I' as a means of doing in-place editing, it would probably be best to require the operation to fail up front if the file cannot be written by the current user. That is, the existing behavior of GNU 'sed -i' of doing the edits into a temporary file, and then rename()ing the temporary file onto the original file, has the unfortunate side effect of replacing a read-only file with new contents backed by a new inode. The GNU sed authors are unwilling to change this behavior of -i to avoid backward incompatibility, but do not mind refusing to overwrite a read-only file via a new option -I.
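One way to approximate the behavior suggested in this note with today's tools is sketched below. The function name sed_inplace_strict and the temporary-file location are hypothetical. The sketch checks writability up front and copies the result back over the original instead of renaming, so the inode is preserved; the copy-back is not atomic, and the [ -w ] check is only advisory (for example, it generally succeeds for a privileged user).

```shell
# Hypothetical helper: fail up front on unwritable files, keep the same inode.
# Usage: sed_inplace_strict SCRIPT FILE...
sed_inplace_strict() {
    script=$1
    shift
    # Refuse to start if any target cannot be written by the current user.
    for f in "$@"; do
        [ -w "$f" ] || { echo "sed_inplace_strict: $f: not writable" >&2; return 1; }
    done
    tmp=${TMPDIR:-/tmp}/sed_inplace_strict.$$   # assumed-unused temporary name
    for f in "$@"; do
        # Write the edited text to a temporary file, then copy it back into
        # the existing file rather than rename()ing over it.
        if sed -e "$script" "$f" >"$tmp" && cat "$tmp" >"$f"; then
            rm -f "$tmp"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}
```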

- Issue History
Date Modified Username Field Change
2011-12-21 14:53 dwheeler New Issue
2011-12-21 14:53 dwheeler Status New => Under Review
2011-12-21 14:53 dwheeler Assigned To => ajosey
2011-12-21 14:53 dwheeler Name => David A. Wheeler
2011-12-21 14:53 dwheeler Section => sed
2011-12-21 14:53 dwheeler Page Number => 3153-3161
2011-12-21 14:53 dwheeler Line Number => 104759-105140
2012-03-08 17:22 eblake Interp Status => ---
2012-03-08 17:22 eblake Note Added: 0001170
2012-03-08 17:22 eblake Status Under Review => Closed
2012-03-08 17:22 eblake Resolution Open => Rejected
2012-09-10 16:06 eblake Note Added: 0001362

