0001927: Add sponge utility - Austin Group Issue Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0001927	1003.1(2008)/Issue 7	Shell and Utilities	public	2025-06-01 01:18	2025-09-15 01:33

Reporter	dwheeler	Assigned To	ajosey
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	Under Review	Resolution	Open

Name	David A. Wheeler
Organization	The Linux Foundation
User Reference	Utilities
Section	Utilities
Page Number	NA
Line Number	NA
Interp Status
Final Accepted Text


Summary	0001927: Add sponge utility
Description	When using only POSIX utilities, it's unnecessarily difficult to read a file, process it, and replace that file with those processed results. Novices who try to do the following will experience sadness:

grep -E '^X' < foo > foo

You can sort-of do this by redirecting the results to a temporary file, and then moving the temporary file onto the old one. However, this sequence fails to maintain the permissions of the original file. Doing that requires MORE steps. This is a common need that should have an easy solution.

Real-world implementations of "sed" include a flag -i (--in-place) specifically to do this. This only works when using sed (obviously). Unfortunately, the flag has incompatible interfaces on different systems, as noted in https://austingroupbugs.net/view.php?id=530 . It'd be worth implementing a common interface for sed everywhere so this widely-used in-place functionality had a standard way to invoke it, but that would take a while to get implemented widely and, again, it would only work with sed.

The "sponge" utility has been used for years to do this in practice. It's simple to describe, simple to implement, widely available, and provides general functionality that works in any pipeline (not just for sed).

I believe that standards should always strongly consider codifying existing practice. That's what I'm proposing here.
Desired Action	NAME
sponge - soak up standard input and write it to a file

SYNOPSIS
sponge [-a] target

DESCRIPTION
sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

sponge preserves the permissions of the output file if it already exists.

When possible, sponge creates or updates the output file atomically by creating a separate temp file in TMPDIR and then renaming that temp file into place. This cannot be done if TMPDIR is not in the same filesystem.

If the output file is a special file or symlink, the data will be written to it, non-atomically.

If no file is specified, sponge outputs to stdout.

OPTIONS
-a Replace the file with a new file that contains the file's original content, with the standard input appended to it. This is done atomically when possible.

OPERANDS
The following operands shall be supported:
target file A pathname of the file where the results will be placed.

STDIN
The standard input shall be used to read inputs.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of sponge:
TMPDIR Temporary directory for use in storing results while sponge is reading data from standard input.

ASYNCHRONOUS EVENTS
Default.

STDOUT
Not used.

STDERR
Diagnostic messages such as an inability to duplicate permissions or an inability to modify the target file.

OUTPUT FILES
The output files may be of any type.

EXTENDED DESCRIPTION
None.

EXIT STATUS
The following exit values shall be returned:
0 Target file was successfully updated.
>0 An error occurred.

CONSEQUENCES OF ERRORS
If sponge terminates before it reads the end of standard input, there may be a leftover temporary file in TMPDIR.

The following sections are informative.

APPLICATION USAGE

EXAMPLES
# Modify file foo so it only includes lines beginning with "X":
grep -E '^X' < foo | sponge foo

RATIONALE
This utility makes it much easier to create pipelines that use some file as an input and write their results to that file, all while preserving the permissions of that file. It's possible to write to a temporary file, then mv it to the original name, but this common approach fails to preserve the original file's permissions. The goal of this utility is to make it easy to do a common action.

The following example doesn't work as a novice might expect, because opening the file "foo" for writing eliminates the data in "foo" for reading:

grep -E '^X' < foo > foo

FUTURE DIRECTIONS
A future option might be added to truncate and copy contents into the target file so that the target's inode does not change.

A future option might be added to create a new file with the same permissions as an existing file (since this is the first step of implementing this utility).

SEE ALSO
cp, mv, sed
XBD 4.7 File Access Permissions, 8. Environment Variables, 12.2 Utility Syntax Guidelines
XSH open(), unlink()

End of informative text.
Tags	No tags attached.
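The behaviour described under DESCRIPTION can be approximated with existing utilities. The following is a minimal sketch under stated assumptions, not the moreutils implementation: `sponge_sketch` is a hypothetical helper name, the temporary file is placed beside the target rather than in TMPDIR (assumed here so that mv is a same-filesystem rename), and `cp -p` is used only as a convenient way to create a file that carries the target's permission bits. It does not handle symlinks, special files, ACLs, or the -a option.

```shell
# Hypothetical sketch of a sponge-like helper; not the moreutils implementation.
sponge_sketch() {
    target=$1
    # Assumption: the temp file sits beside the target so that mv is a rename().
    tmp="$target.sponge.$$"
    if [ -e "$target" ]; then
        # Copy first so the temp file carries the target's permission bits.
        cp -p "$target" "$tmp" || return 1
    fi
    # Soak up all of standard input; the target is not touched until EOF.
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    # Rename the finished temp file over the target.
    mv -f "$tmp" "$target"
}

# Usage: keep only the lines of foo that begin with "X".
dir=$(mktemp -d)
printf 'Xa\nb\nXc\n' > "$dir/foo"
chmod 600 "$dir/foo"
grep -E '^X' < "$dir/foo" | sponge_sketch "$dir/foo"
result=$(cat "$dir/foo")
mode=$(ls -l "$dir/foo" | cut -c1-10)
```

Because the target is replaced by rename, this sketch shares the trade-offs of any replace-style approach: the inode changes and hard links to the old file are not updated.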

dwheeler 2025-06-11 16:49 reporter bugnote:0007198	Note: I'm sure that the proposed text above for the new utility could be improved. However, the first step is to decide whether or not to include the sponge utility at all, and the second step is to refine its definition to be appropriate for POSIX. I tried to do both, but I've focused on step 1. There's no point in refining the proposal if it will be rejected no matter what :-). However, I think "use a pipeline to update in place" is such a common use case that it's good to make it easy.

There have been proposals for "in-place" commands that run a string passed in, but I don't know of any that are fully implemented or widely used. Part of the problem is that they require complex escape mechanisms for widely-used characters. As a result, they are hard to reason about, the mixing of data and command easily leads to vulnerabilities, and it's hard to create tools to do things like provide proper syntactic highlighting.

In contrast, "sponge" has been used for a long time by many, it's easy to reason about, and it doesn't require complex reasoning. Instead, just write your pipeline as usual, and pipe to sponge as the final command. No separate mv command needed, and you know that the permissions are the same (if it's possible to do that).

Many places highlight sponge as a useful utility. Here are some:
https://www.reddit.com/r/linux/comments/18gifb8/usefull_cli_tools/
https://www.putorius.net/linux-sponge-soak-up-standard-input-and-write-to-a-file.html
https://www.youtube.com/watch?v=9RXkZpmBDj0 (at time 15:10)
https://rentes.github.io/unix/utilities/2015/07/27/moreutils-package/

stephane 2025-06-22 08:43 reporter bugnote:0007207	Sponge is part of a collection of utilities that started out as mostly a bunch of useful perl scripts. Initially, sponge file was in essence:

perl -0777 -pe 'open STDOUT, ">", file'

Soaking up the input in memory, and writing it to "file" upon eof on stdin, so behaving more like POSIX ed, ex, sort -o.

At some point, it was rewritten in C and changed its modus operandi radically. It behaved more like perl -i / gsed -i, which replaces the file with an edited copy, except that:
- it uses a tempfile in ${TMPDIR-/tmp} instead of the same directory of the file, which makes it much less likely for rename() to succeed as /tmp is often on a separate FS
- Like perl, unlike gsed, it only tries to preserve basic Unix permissions, not extended ACLs, ownership or other attributes such as security context.
- If the rename fails, it's back to ex/ed behaviour where it copies the temp file to the original file à la ed/ex, while perl/gsed -i would fail (unlikely in practice there when the tempfile is in the same directory as the original)

Both approaches (overwrite existing file vs replace it with a new one) each have their advantage / drawbacks:

Overwriting means:
- inode number, permissions, ownership and most metadata is preserved.
- but is not atomic
- can't be done for files that are in use (executables for instance) and can cause havoc for scripts that are currently being run.

Replacing means:
- it's atomic and generally avoids the problems above.
- does not preserve inode number, cannot always preserve metadata (such as ownership or some security attributes that require privileges to set)
- breaks hard links
- breaks symlinks (replaces them with a regular file, losing metadata (the target of the symlink) in the process).

Newer sponge's hybrid approach means it's still able to change a file in a directory the user doesn't have write access to as long as the user has write access to the file, but then that means non-atomic replace and different behaviour for links.

IOW, sponge can end up doing very different things, not based on what the user decides but based on external factors, which IMO is not a clean design.

If sponge was to be specified, I'd rather it was the original behaviour (which reflects the "sponge" name better) or the gsed -i one.

I don't know of a system that has that utility installed by default and personally haven't felt I couldn't do without it. I'm not fully convinced it's worth specifying. It's nowhere near the bulletproof way to edit files in place that one may hope it to be.

One problem is that in:

cmd < file | sponge file

file is overwritten and lost even if cmd fails or file can't be opened for reading.

{ cmd | sponge file; } < file

Addresses one of the problems.

I'd say process substitution would be a feature more worth specifying than a sponge utility. In particular zsh's =(...) form that uses a temp file can replace most usages of sponge (though comes with its own limitations) as in:

cp =(cmd file) file

To replace:

cmd file | sponge file

{ rm -f file && cmd > file; } < file

Can also already replace many usages of sponge (and to some extent is safer than < file cmd | sponge file).

See also ksh93's:

grep foo < file 1<>; file

To overwrite the file in place (valid for things like "grep" that don't produce more output than they consume) and truncates in the end.
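The `{ rm -f file && cmd > file; } < file` idiom mentioned above needs only standard shell features. A small illustration follows, assuming grep as the command and a scratch directory from mktemp (both chosen here only for the example). The outer redirection opens the old file for reading before rm unlinks its name, so the command keeps reading the old contents while it writes a new file under the same name.

```shell
dir=$(mktemp -d)
file="$dir/file"
printf 'Xa\nb\nXc\n' > "$file"

# The outer "< $file" is opened first; if that open fails, the group is not run
# and the file is left untouched.
# rm then unlinks the name, and grep writes a brand-new file while still
# reading the old, now nameless, one through standard input.
{ rm -f "$file" && grep -E '^X' > "$file"; } < "$file"

result=$(cat "$file")
```

Note that the new file is created with default permissions, so this idiom does not carry over the original file's mode, and the old contents are gone if the command fails partway through.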

eblake 2025-06-26 15:28 manager bugnote:0007211 Last edited: 2025-06-26 15:41	The GNU Coreutils maintainers pointed out several ways of updating files, some of which are better than others (many of the poor solutions lack ACID semantics, for those familiar with the database terminology of Atomic, Consistent, Isolated, and Durable):
https://www.pixelbeat.org/docs/unix_file_replacement.html
https://lists.gnu.org/r/coreutils/2025-06/msg00018.html

Pádraig Brady: "If we were to standardize a new tool, we should aim for ACID file replacement functionality." ... "The atomicity issue in method 6 was due to the use of truncate and write (cp and rm in my example). Perhaps sponge has been updated in the meantime, but reading the man page now indicates it uses rename() for atomicity if supported, and so is now more like method 7 discussed at the above. So as long as sponge adequately handles file attributes, it would be a useful addition yes."

My take on the thread: The original proposal here does NOT guarantee ACID semantics. Although it is possible to write a sponge implementation that has atomic replacement semantics, and then ensure that sponsorship for adding the utility to POSIX requires atomic replacement, we then risk users running non-compliant versions that break compared to what the standard says. But if we standardize the functionality under a new name, we have the same situation as tar vs. pax where the standard utility may not ever gain popularity because people aren't aware of its addition.

I interpret this feedback from coreutils as not necessarily being opposed to sponsoring the addition, but to at least be very careful about decisions on what requirements to place on the utility and whether to reuse the name sponge.

eblake 2025-06-26 15:45 manager bugnote:0007212	The busybox maintainers chimed in on a thread starting here:
https://lists.busybox.net/pipermail/busybox/2025-June/091540.html

Among the quotes that stand out to me:

Steffen Nurpmeso: "And what the standard would need, much more than that, and in my opinion, is flock(1) (after we gained timeout(1))."

Harald van Dijk: "I am seeing a difference between the moreutils implementation and that removed FreeBSD implementation. The proposed wording, taken from moreutils states:

> sponge preserves the permissions of the output file if it already
> exists.

It also states:

> When possible, sponge creates or updates the output file atomically by
> creating a separate temp file in TMPDIR and then renaming that temp
> file into place.

But, in order to achieve that second point, the moreutils implementation fails to achieve the first. It does not preserve the permissions of the output file, it preserves the permission bits. On systems that support ACLs, the two are not equivalent. The use of a temporary file to be renamed to replace the original also breaks hardlinks.

Personally, I have never used this utility but if it were to be added to POSIX and/or busybox, I would prefer that FreeBSD version over the moreutils version."

David Leonard: "My first thought is that sponge may benefit from a flock option. This would allow file users to avoid accessing the file in an intermediate state. For example sponge -l would obtain a read lock first, then collects input, then upgrade to a write lock, then replace the file's content.

flock -s myfile -c cat myfile | sed -e s/foo/bar/g | sponge -l myfile
"
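The hard-link difference between overwriting a file and replacing it by rename can be seen with standard utilities alone. The following is an illustrative sketch; the file names and the mktemp scratch directory are assumptions made for the example, and it does not invoke any sponge implementation.

```shell
dir=$(mktemp -d)
printf 'old\n' > "$dir/a"
ln "$dir/a" "$dir/a.link"            # second name for the same inode

# Overwrite: write the new contents into the existing inode.
printf 'overwritten\n' > "$dir/a.tmp"
cat "$dir/a.tmp" > "$dir/a"
link_after_overwrite=$(cat "$dir/a.link")   # the link sees the new contents

# Replace: rename a temp file over the name, giving "a" a new inode.
printf 'replaced\n' > "$dir/a.tmp"
mv -f "$dir/a.tmp" "$dir/a"
file_after_replace=$(cat "$dir/a")
link_after_replace=$(cat "$dir/a.link")     # the link still has the older contents
```

After the rename, `a` and `a.link` no longer refer to the same file, which is the hard-link breakage described in the quote above.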

enh 2025-06-26 17:39 reporter bugnote:0007214	fwiw, http://lists.landley.net/pipermail/toybox-landley.net/2025-June/030739.html is the head of the corresponding toybox discussion thread.

dwheeler 2025-07-04 02:11 reporter bugnote:0007219	Obviously there are various ways to implement a "sponge-like" functionality.

As noted above, the original sponge developers originally overwrote the file. However, years ago they changed to a hybrid "do the best you can" approach that prefers an ACID replacement where it can manage it, and an overwrite when it can't. The fact that they changed the code to do this, even though it takes more code, suggests that this is the most widely expected default ("do the best you can"). "Doing what most people want by default" is a good design.

The fact that there are trade-offs suggests that perhaps options could be added to control what approach is used, in cases where users care. (E.g., only overwrite, only replace, and prefer-replace-then-overwrite). It seems like this is exactly what options are good for :-).

Regarding "it uses a tempfile in ${TMPDIR-/tmp} instead of the same directory of the file which makes it much less likely for rename() to succeed as /tmp is often on a separate FS", you could set its TMPDIR to be the same directory as the destination file. If that's a common case, an option could be added to make that even easier to do.

"Like perl, unlike gsed, it only tries to preserve basic Unix permissions, not extended ACLs, ownership or other attributes such as security context."

It's sometimes not even possible to retain security context (even in an overwrite!). However, I can see the specification encouraging implementations to maximally preserve permissions, at least basic Unix permission bits. Then it's a quality-of-implementation issue.

"[In] cmd < file | sponge file file is overwritten and lost even if cmd fails or file can't be opened for reading.... { cmd | sponge file; } < file"

I don't see that as a serious problem. In-place replacement is typically NEVER used on precious irreplaceable initial inputs. It's typically used as part of a larger sequence of pipelines with potentially large files.

I'll agree that if that is a concern of a user, it's challenging to handle. I guess it could grow an option to only replace a file if it got at least 1 byte of input, and return an error code if it didn't. I don't have any other ideas if that's critical, but again, normally you only do in-place replacements when it's replaceable.

Process substitution is different from sponge. Constructs like "cp =(cmd file) file" don't preserve ANY permissions, for example. I wouldn't oppose process substitution in a different proposal, but that seems different.

David Leonard: "My first thought is that sponge may benefit from a flock option...."

I'm quite amenable to that!

I don't know if the developers of the most widely-used implementation of sponge would be willing to add a few new options, but I think it's worth asking. All of these additions (even working harder to copy the permission information) aren't really that much work on top of what's already there. Even with all those options, it's a narrowly-scoped utility that isn't hard to use or implement.
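The idea of replacing a file only if at least 1 byte of input arrived could look roughly like the sketch below. This is an assumption-laden illustration, not an existing sponge option: `replace_if_nonempty` is a hypothetical helper name, the temp file is placed beside the target, and the final step overwrites the target in place (so the mode and inode are kept, but the update is not atomic).

```shell
# Hypothetical helper: overwrite the target only if standard input was non-empty.
replace_if_nonempty() {
    target=$1
    tmp="$target.tmp.$$"
    # Soak up all input first; the target is untouched until EOF.
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    if [ -s "$tmp" ]; then
        cat "$tmp" > "$target"   # overwrite in place, keeping mode and inode
        rc=$?
    else
        rc=1                     # no input: leave the target alone and report failure
    fi
    rm -f "$tmp"
    return "$rc"
}

dir=$(mktemp -d)
printf 'Xa\nb\n' > "$dir/foo"

# No line matches, so grep emits nothing and foo is left alone.
if grep -E '^Z' < "$dir/foo" | replace_if_nonempty "$dir/foo"; then
    empty_ok=yes
else
    empty_ok=no
fi
kept=$(cat "$dir/foo")

# One line matches, so foo is rewritten.
grep -E '^X' < "$dir/foo" | replace_if_nonempty "$dir/foo"
updated=$(cat "$dir/foo")
```

This guards only against completely empty output; a command that fails after producing partial output would still cause the target to be overwritten.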

dwheeler 2025-09-11 14:45 reporter bugnote:0007252	If necessary, this functionality could be defined with a different name like "soak" or "absorb". Sometimes that's worked (I think printf(1) is an example), sometimes that's been less successful (pax), but obviously that is an option.

hvd 2025-09-11 17:16 reporter bugnote:0007255	Re 0001927:0007219:

> It's typically used as part of a larger sequence of pipelines with potentially large files.

But for that case, it doesn't really matter whether sponge replaces the file's contents, or the file, and both implementations would work fine. An option would be to add the utility with the existing name, but leaving that detail unspecified.

> The fact that they changed the code to do this, even though it takes more code, suggests that this is the most widely expected default ("do the best you can").

That isn't why they changed the code to do this. The reason is mentioned in the commit message: "make sponge use a temp file if the input is large". The previous implementation kept the full input in memory, which could fail.

I don't think the utility is common enough to make comments about the most widely expected default either way, personally.

Date Modified	Username	Field	Change
2025-06-01 01:18	dwheeler	New Issue
2025-06-01 01:18	dwheeler	Status	New => Under Review
2025-06-01 01:18	dwheeler	Assigned To	=> ajosey
2025-06-11 16:49	dwheeler	Note Added: 0007198
2025-06-22 08:43	stephane	Note Added: 0007207
2025-06-26 15:28	eblake	Note Added: 0007211
2025-06-26 15:41	eblake	Note Edited: 0007211
2025-06-26 15:45	eblake	Note Added: 0007212
2025-06-26 17:39	enh	Note Added: 0007214
2025-07-04 02:11	dwheeler	Note Added: 0007219
2025-09-11 14:45	dwheeler	Note Added: 0007252
2025-09-11 17:16	hvd	Note Added: 0007255
