View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001927 | 1003.1(2008)/Issue 7 | Shell and Utilities | public | 2025-06-01 01:18 | 2025-07-04 02:11 |
Reporter | dwheeler | Assigned To | ajosey | ||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Under Review | Resolution | Open | ||
Name | David A. Wheeler | ||||
Organization | The Linux Foundation | ||||
User Reference | Utilities | ||||
Section | Utilities | ||||
Page Number | NA | ||||
Line Number | NA | ||||
Interp Status | |||||
Final Accepted Text | |||||
Summary | 0001927: Add sponge utility | ||||
Description | When using only POSIX utilities, it's unnecessarily difficult to read a file, process it, and replace that file with those processed results. Novices who try to do the following will experience sadness: grep -E '^X' < foo > foo You can *sort-of* do this by redirecting the results to a temporary file, and then moving the temporary file onto the old one. However, this sequence fails to maintain the permissions of the original file. Doing that requires MORE steps. This is a common need that should have an easy solution. Real-world implementations of "sed" include a flag -i (--in-place) specifically to do this. This only works when using sed (obviously). Unfortunately, the flag has incompatible interfaces on different systems, as noted in https://austingroupbugs.net/view.php?id=530 . It'd be worth implementing a common interface for sed everywhere so this widely-used in-place functionality had a standard way to invoke it, but that would take a while to get implemented widely and, again, it would *only* work with sed. The "sponge" utility has been used for years to do this in practice. It's simple to describe, simple to implement, widely available, and provides general functionality that works in any pipeline (not just for sed). I believe that standards should always strongly consider codifying existing practice. That's what I'm proposing here. | ||||
Desired Action | NAME sponge — soak up standard input and write it to a file SYNOPSIS sponge [-a] target DESCRIPTION sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file. sponge preserves the permissions of the output file if it already exists. When possible, sponge creates or updates the output file atomically by creating a separate temp file in TMPDIR and then renaming that temp file into place. This cannot be done if TMPDIR is not in the same filesystem. If the output file is a special file or symlink, the data will be written to it, non-atomically. If no file is specified, sponge outputs to stdout. OPTIONS -a Replace the file with a new file that contains the file's original content, with the standard input appended to it. This is done atomically when possible. OPERANDS The following operands shall be supported: target file A pathname of the file where the results will be placed. STDIN The standard input shall be used to read inputs. ENVIRONMENT VARIABLES The following environment variables shall affect the execution of sponge: TMPDIR Temporary directory for use in storing results while sponge is reading data from standard input. ASYNCHRONOUS EVENTS Default. STDOUT Not used. STDERR Diagnostic messages such as an inability to duplicate permissions or an inability to modify the target file. OUTPUT FILES The output files may be of any type. EXTENDED DESCRIPTION None. EXIT STATUS The following exit values shall be returned: 0 Target file was successfully updated. >0 An error occurred. CONSEQUENCES OF ERRORS If sponge terminates before it reads the end of standard input, there may be a leftover temporary file in TMPDIR. The following sections are informative. 
APPLICATION USAGE EXAMPLES # Modify file foo so it only includes lines beginning with "X": grep -E '^X' < foo | sponge foo RATIONALE This utility makes it much easier to create pipelines that use some file as an input and write their results to that file, all while preserving the permissions of that file. It's possible to write to a temporary file, then mv it to the original name, but this common approach fails to preserve the original file's permissions. The goal of this utility is to make it easy to do a common action. The following example doesn't work as a novice might expect, because opening the file "foo" for writing eliminates the data in "foo" for reading: grep -E '^X' < foo > foo FUTURE DIRECTIONS A future option might be added to truncate and copy contents into the target file so that the target's inode does not change. A future option might be added to create a new file with the same permissions as an existing file (since this is the first step of implementing this utility). SEE ALSO cp, mv, sed XBD 4.7 File Access Permissions, 8. Environment Variables, 12.2 Utility Syntax Guidelines XSH open(), unlink() End of informative text. | ||||
Tags | No tags attached. |
|
Note: I'm sure that the proposed text above for the new utility could be improved. However, the first step is to decide whether or not to include the sponge utility at all, and the second step is to refine its definition to be appropriate for POSIX. I tried to do both, but I've focused on step 1. There's no point in refining the proposal if it will be rejected no matter what :-). However, I think "use a pipeline to update in place" is such a common use case that it's good to make it *easy*. There have been proposals for "in-place" commands that run a string passed in, but I don't know of any that are fully implemented or widely used. Part of the problem is that they require complex escape mechanisms for widely-used characters. As a result, they require complex escaping that is hard to reason about, the mixing of data and command easily leads to vulnerabilities, and it's hard to create tools to do things like provide proper syntactic highlighting. In contrast, "sponge" has been used for a long time by many, it's easy to reason about, and it doesn't require complex reasoning. Instead, just write your pipeline as usual, and pipe to sponge as the final command. No separate mv command is needed, and you know that the permissions are the same (if it's possible to do that). Many places highlight sponge as a useful utility. Here are some: https://www.reddit.com/r/linux/comments/18gifb8/usefull_cli_tools/ https://www.putorius.net/linux-sponge-soak-up-standard-input-and-write-to-a-file.html https://www.youtube.com/watch?v=9RXkZpmBDj0 (at time 15:10) https://rentes.github.io/unix/utilities/2015/07/27/moreutils-package/ |
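To make the "soak up everything, then write" idea concrete, here is a minimal POSIX shell sketch. The function name sponge_sketch is hypothetical, and this is not the moreutils implementation: it assumes a mktemp utility is available, it always overwrites the target in place (so the update is not atomic), and it does not implement -a.

```shell
#!/bin/sh
# sponge_sketch TARGET: hypothetical helper, not the real sponge.
# Soaks up all of standard input, then overwrites TARGET in place.
sponge_sketch() {
    target=$1
    # Buffer the whole input in a temporary file before touching the target.
    tmp=$(mktemp "${TMPDIR:-/tmp}/sponge.XXXXXX") || return 1
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Overwrite rather than rename: an existing target keeps its inode and
    # permission bits, at the cost of a non-atomic update.
    cat "$tmp" > "$target"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d) || exit 1
printf 'Xa\nYb\nXc\n' > "$demo_dir/foo"
chmod 640 "$demo_dir/foo"
grep -E '^X' < "$demo_dir/foo" | sponge_sketch "$demo_dir/foo"
cat "$demo_dir/foo"
```

Because the target is only opened for writing after end-of-file on standard input, grep has finished reading foo by the time foo is truncated.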
|
Sponge is part of a collection of utilities that started out as mostly a bunch of useful perl scripts. Initially, sponge file was in essence: perl -0777 -pe 'open STDOUT, ">", file' Soaking up the input in memory, and writing it to "file" upon eof on stdin, so behaving more like POSIX ed, ex, sort -o. At some point, it was rewritten in C and changed its modus operandi radically. It behaved more like perl -i / gsed -i, which replaces the file with an edited copy, except that: - it uses a tempfile in ${TMPDIR-/tmp} instead of the same directory as the file, which makes it much less likely for rename() to succeed as /tmp is often on a separate FS - Like perl, unlike gsed, it only tries to preserve basic Unix permissions, not extended ACLs, ownership or other attributes such as security context. - If the rename fails, it's back to ex/ed behaviour where it copies the temp file to the original file à la ed/ex, while perl/gsed -i would fail (unlikely in practice there when the tempfile is in the same directory as the original) Both approaches (overwrite existing file vs replace it with a new one) have their advantages and drawbacks: Overwriting means: - inode number, permissions, ownership and most metadata are preserved. - but is not atomic - can't be done for files that are in use (executables, for instance) and can cause havoc for scripts that are currently being run. Replacing means: - it's atomic and generally avoids the problems above. - does not preserve inode number, cannot always preserve metadata (such as ownership or some security attributes that require privileges to set) - breaks hard links - breaks symlinks (replaces them with a regular file, losing metadata (the target of the symlink) in the process). Newer sponge's hybrid approach means it's still able to change a file in a directory the user doesn't have write access to as long as the user has write access to the file, but then that means non-atomic replace and different behaviour for links. 
IOW, sponge can end up doing very different things, not based on what the user decides but based on external factors, which IMO is not a clean design. If sponge was to be specified, I'd rather it was the original behaviour (which reflects the "sponge" name better) or the gsed -i one. I don't know of a system that has that utility installed by default and personally haven't felt I couldn't do without it. I'm not fully convinced it's worth specifying. It's nowhere near the bulletproof way to edit files in place that one may hope it to be. One problem is that in: cmd < file | sponge file file is overwritten and lost even if cmd fails or file can't be opened for reading. { cmd | sponge file; } < file Addresses one of the problems. I'd say process substitution would be a feature more worth specifying than a sponge utility. In particular zsh's =(...) form that uses a temp file can replace most usages of sponge (though comes with its own limitations) as in: cp =(cmd file) file To replace: cmd file | sponge file { rm -f file && cmd > file; } < file Can also already replace many usages of sponge (and to some extent is safer than < file cmd | sponge file). See also ksh93's: grep foo < file 1<>; file To overwrite the file in place (valid for things like "grep" that don't produce more output than they consume) and truncates in the end. |
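The "{ rm -f file && cmd > file; } < file" idiom mentioned in this note can be wrapped in a small function. This is a sketch only: the name filter_in_place is hypothetical, and the approach creates a brand-new file, so it does not preserve the original's permissions or hard links.

```shell
#!/bin/sh
# filter_in_place FILE CMD [ARGS...]: hypothetical wrapper around the idiom
#   { rm -f file && cmd > file; } < file
filter_in_place() {
    file=$1
    shift
    # The outer redirection opens the old file for reading first. If that
    # open fails, the braces are never run and nothing is removed.
    {
        # The old contents stay readable on stdin after the name is removed;
        # the command then writes to a freshly created file of the same name.
        rm -f "$file" && "$@" > "$file"
    } < "$file"
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d) || exit 1
printf 'Xa\nYb\nXc\n' > "$demo_dir/foo"
filter_in_place "$demo_dir/foo" grep -E '^X'
cat "$demo_dir/foo"
```

If the command itself fails, the new file is still left behind (possibly empty or partial), so the idiom only guards against the "cannot open input" case.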
|
The GNU Coreutils maintainers pointed out several ways of updating files, some of which are better than others (many of the poor solutions lack ACID semantics, for those familiar with the database terminology of Atomic, Consistent, Isolated, and Durable): https://www.pixelbeat.org/docs/unix_file_replacement.html https://lists.gnu.org/r/coreutils/2025-06/msg00018.html Pádraig Brady: "If we were to standardize a new tool, we should aim for ACID file replacement functionality." ... "The atomicity issue in method 6 was due to the use of truncate and write (cp and rm in my example). Perhaps sponge has been updated in the meantime, but reading the man page now indicates it uses rename() for atomicity if supported, and so is now more like method 7 discussed at the above. So as long as sponge adequately handles file attributes, it would be a useful addition yes." My take on the thread: The original proposal here does NOT guarantee ACID semantics. Although it is possible to write a sponge implementation that has atomic replacement semantics, and then ensure that sponsorship for adding the utility to POSIX requires atomic replacement, we then risk users running non-compliant versions that break compared to what the standard says. But if we standardize the functionality under a new name, we have the same situation as tar vs. pax where the standard utility may not ever gain popularity because people aren't aware of its addition. I interpret this feedback from coreutils as not necessarily being opposed to sponsoring the addition, but to at least be very careful about decisions on what requirements to place on the utility and whether to reuse the name sponge. |
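For comparison, a rename-based replacement that keeps the temporary file in the target's own directory might look like the sketch below. The function name replace_atomically is hypothetical, mktemp is assumed to be available, only what "cp -p" can copy (mode bits, and ownership and timestamps where permitted) is carried over, and durability (the D in ACID) is not addressed at all.

```shell
#!/bin/sh
# replace_atomically TARGET: hypothetical helper.
# Writes stdin to a temp file next to TARGET, then renames it into place.
replace_atomically() {
    target=$1
    dir=$(dirname "$target") || return 1
    # Same directory as the target, so mv can use rename() on one filesystem.
    tmp=$(mktemp "$dir/.replace.XXXXXX") || return 1
    # Seed the temp file from the existing target with cp -p so that it
    # inherits the mode bits; its contents are overwritten just below.
    # (If the target does not exist, the temp file keeps mktemp's mode.)
    if [ -e "$target" ] && ! cp -p "$target" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # The target is untouched until the final rename.
    if cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d) || exit 1
printf 'Xa\nYb\nXc\n' > "$demo_dir/foo"
chmod 640 "$demo_dir/foo"
grep -E '^X' < "$demo_dir/foo" | replace_atomically "$demo_dir/foo"
cat "$demo_dir/foo"
```

Readers that already have the old file open keep seeing the old contents, and new opens see either the old or the new file, never a partially written one. ACLs, security contexts and hard links are not preserved by this sketch.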
|
The busybox maintainers chimed in on a thread starting here: https://lists.busybox.net/pipermail/busybox/2025-June/091540.html Among the quotes that stand out to me: Steffen Nurpmeso: "And what the *standard* would need, much more than that, and in my opinion, is flock(1) (after we gained timeout(1))." Harald van Dijk: "I am seeing a difference between the moreutils implementation and that removed FreeBSD implementation. The proposed wording, taken from moreutils states: > sponge preserves the permissions of the output file if it already > exists. It also states: > When possible, sponge creates or updates the output file atomically by > creating a separate temp file in TMPDIR and then renaming that temp > file into place. But, in order to achieve that second point, the moreutils implementation fails to achieve the first. It does not preserve the permissions of the output file, it preserves the permission bits. On systems that support ACLs, the two are not equivalent. The use of a temporary file to be renamed to replace the original also breaks hardlinks. Personally, I have never used this utility but if it were to be added to POSIX and/or busybox, I would prefer that FreeBSD version over the moreutils version." David Leonard: "My first thought is that sponge may benefit from a flock option. This would allow file users to avoid accessing the file in an intermediate state. For example sponge -l would obtain a read lock first, then collects input, then upgrade to a write lock, then replace the file's content. flock -s myfile -c cat myfile | sed -e s/foo/bar/g | sponge -l myfile " |
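The hard-link point raised in the quotes above is easy to reproduce with standard utilities alone. The following sketch (file names are arbitrary) contrasts overwriting a file with replacing it by rename:

```shell
#!/bin/sh
# Contrast overwrite-in-place with replace-by-rename for a hard-linked file.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

printf 'old\n' > a
ln a b                      # a and b now name the same inode

# Overwrite: the data is written through the existing inode,
# so the other link sees the change.
printf 'new\n' > a
overwrite_view=$(cat b)     # "new"

# Replace: a new file is renamed over a, so a now names a new inode
# while b still names the old one.
printf 'newer\n' > a.tmp
mv -f a.tmp a
replace_view=$(cat b)       # still "new"

echo "after overwrite, b contains: $overwrite_view"
echo "after replace,   b contains: $replace_view"
echo "a contains: $(cat a)"
```

After the rename, a and b are two unrelated files, which is the sense in which a rename-based update "breaks" hard links.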
|
fwiw, http://lists.landley.net/pipermail/toybox-landley.net/2025-June/030739.html is the head of the corresponding toybox discussion thread. |
|
Obviously there are various ways to implement a "sponge-like" functionality. As noted above, the sponge developers originally overwrote the file. However, years ago they changed to a hybrid "do the best you can" approach that prefers an ACID replacement where it can manage it, and an overwrite when it can't. The fact that they *changed* the code to do this, even though it takes more code, suggests that this is the most widely expected default ("do the best you can"). "Doing what most people want by default" is a *good* design. The fact that there *are* trade-offs suggests that perhaps options could be added to control what approach is used, in cases where users care. (E.g., only overwrite, only replace, and prefer-replace-then-overwrite). It seems like this is exactly what options are good for :-). Regarding "it uses a tempfile in ${TMPDIR-/tmp} instead of the same directory of the file which makes it much less likely for rename() to succeed as /tmp is often on a separate FS", you could set its TMPDIR to be the same directory as the destination file. If that's a common case, an option could be added to make that even easier to do. "Like perl, unlike gsed, it only tries to preserve basic Unix permissions, not extended ACLs, ownership or other attributes such as security context." It's sometimes not even *possible* to retain security context (even in an overwrite!). However, I can see the specification encouraging implementations to maximally preserve permissions, at *least* basic Unix permission bits. Then it's a quality-of-implementation issue. "[In] cmd < file | sponge file file is overwritten and lost even if cmd fails or file can't be opened for reading.... { cmd | sponge file; } < file" I don't see that as a serious problem. In-place replacement is typically *NEVER* used on precious, irreplaceable initial inputs. It's typically used as part of a larger sequence of pipelines with potentially large files. 
I'll agree that if that *is* a concern of a user, it's challenging to handle. I guess it could grow an option to only replace a file if it got at least 1 byte of input, and return an error code if it didn't. I don't have any other ideas if that's critical, but again, normally you only do in-place replacements when the input is replaceable. Process substitution is different from sponge. Constructs like "cp =(cmd file) file" don't preserve ANY permissions, for example. I wouldn't oppose process substitution in a different proposal, but that seems different. David Leonard: "My first thought is that sponge may benefit from a flock option...." I'm quite amenable to that! I don't know if the developers of the most widely-used implementation of sponge would be willing to add a few new options, but I think it's worth asking. All of these additions (even working harder to copy the permission information) aren't really *that* much work on top of what's already there. Even with all those options, it's a narrowly-scoped utility that isn't hard to use or implement. |
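As an illustration of the "only replace a file if it got at least 1 byte of input" idea floated in this note, here is a sketch in POSIX shell. No such option exists in any sponge implementation as far as this note states; the function name sponge_nonempty is hypothetical, mktemp is assumed to be available, and the sketch overwrites the target in place rather than renaming.

```shell
#!/bin/sh
# sponge_nonempty TARGET: hypothetical helper.
# Soaks up stdin, and only overwrites TARGET if at least one byte arrived.
sponge_nonempty() {
    target=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/sponge.XXXXXX") || return 1
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    if [ -s "$tmp" ]; then
        # Some input was received: overwrite the target in place.
        cat "$tmp" > "$target"
        status=$?
    else
        # No input at all: leave the target untouched and report an error.
        echo "sponge_nonempty: no input; $target left unchanged" >&2
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d) || exit 1
printf 'precious\n' > "$demo_dir/foo"
# A producer that writes nothing leaves foo as it was.
true | sponge_nonempty "$demo_dir/foo" 2>/dev/null || echo "kept original"
cat "$demo_dir/foo"
```

The trade-off is that a legitimately empty result (for example, a grep that matches nothing) is also refused, so callers have to decide whether an empty output is an error for their pipeline.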
Date Modified | Username | Field | Change |
---|---|---|---|
2025-06-01 01:18 | dwheeler | New Issue | |
2025-06-01 01:18 | dwheeler | Status | New => Under Review |
2025-06-01 01:18 | dwheeler | Assigned To | => ajosey |
2025-06-11 16:49 | dwheeler | Note Added: 0007198 | |
2025-06-22 08:43 | stephane | Note Added: 0007207 | |
2025-06-26 15:28 | eblake | Note Added: 0007211 | |
2025-06-26 15:41 | eblake | Note Edited: 0007211 | |
2025-06-26 15:45 | eblake | Note Added: 0007212 | |
2025-06-26 17:39 | enh | Note Added: 0007214 | |
2025-07-04 02:11 | dwheeler | Note Added: 0007219 |