View Issue Details

ID: 0001927
Project: 1003.1(2008)/Issue 7
Category: Shell and Utilities
View Status: public
Last Update: 2025-07-04 02:11
Reporter: dwheeler
Assigned To: ajosey
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: Under Review
Resolution: Open
Name: David A. Wheeler
Organization: The Linux Foundation
User Reference: Utilities
Section: Utilities
Page Number: NA
Line Number: NA
Interp Status:
Final Accepted Text:
Summary: 0001927: Add sponge utility
Description:

When using only POSIX utilities, it's unnecessarily difficult to read a file, process it, and replace that file with those processed results.

Novices who try to do the following will experience sadness:

grep -E '^X' < foo > foo

You can *sort-of* do this by redirecting the results to a temporary file, and then moving the temporary file onto the old one. However, this sequence fails to maintain the permissions of the original file. Doing that requires MORE steps. This is a common need that should have an easy solution.
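
For illustration, here is one way the "MORE steps" version might look. This is only a sketch, not part of the proposal: the temp name `foo.tmp.$$` is made up, `mktemp -d` is assumed to be available for the scratch directory, and `cp -p` is used purely so that the temp file starts out with foo's permission bits (ACLs and other attributes are not handled).

```sh
# Scratch directory for the demo (mktemp -d is assumed to be available).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Xa\nb\nXc\n' > foo
chmod 640 foo

tmp=foo.tmp.$$    # hypothetical temp name, in the same directory as foo

# cp -p gives the temp file foo's mode bits; the ">" redirection then
# truncates the temp file but leaves its mode alone.
if cp -p foo "$tmp" && grep -E '^X' < foo > "$tmp"; then
    mv -f "$tmp" foo     # rename the result over the original
else
    rm -f "$tmp"         # note: grep also exits 1 when no line matches
fi

mode=$(ls -l foo | cut -c1-10)
printf '%s\n' "$mode"    # permission string of the replaced file
cat foo
```

Even this short version needs a temp name, a cleanup path, and a decision about what a non-zero exit from the filter means.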

Real-world implementations of "sed" include a flag -i (--in-place) specifically to do this. This only works when using sed (obviously). Unfortunately, the flag has incompatible interfaces on different systems, as noted in https://austingroupbugs.net/view.php?id=530 . It'd be worth implementing a common interface for sed everywhere so this widely-used in-place functionality had a standard way to invoke it, but that would take a while to get implemented widely and again, it would *only* work with sed.

The "sponge" utility has been used for years to do this in practice. It's simple to describe, simple to implement, widely available, and provides general functionality that works in any pipeline (not just for sed). I believe that standards should always strongly consider codifying existing practice. That's what I'm proposing here.
Desired Action:

NAME

sponge — soak up standard input and write it to a file

SYNOPSIS

sponge [-a] target

DESCRIPTION

sponge reads standard input and writes it out to the specified file.
Unlike a shell redirect, sponge soaks up all its input before opening
the output file. This allows constructing pipelines that read from
and write to the same file.

sponge preserves the permissions of the output file if it already
exists.

When possible, sponge creates or updates the output file atomically by
creating a separate temp file in TMPDIR and then
renaming that temp file into place. This cannot be done if TMPDIR is not
in the same filesystem.

If the output file is a special file or symlink, the data will be
written to it, non-atomically.

If no file is specified, sponge outputs to stdout.

OPTIONS

-a

Replace the file with a new file that contains the file's original
content, with the standard input appended to it. This is done
atomically when possible.

OPERANDS

The following operands shall be supported:

target file

A pathname of the file where the results will be placed.

STDIN

The standard input shall be used to read inputs.

ENVIRONMENT VARIABLES

The following environment variables shall affect the execution of sponge:

TMPDIR

Temporary directory for use in storing results while sponge is
reading data from standard input.

ASYNCHRONOUS EVENTS

Default.

STDOUT

Not used.

STDERR

Diagnostic messages such as an inability to duplicate permissions or an inability to modify the target file.

OUTPUT FILES

The output files may be of any type.

EXTENDED DESCRIPTION

None.

EXIT STATUS

The following exit values shall be returned:

0
Target file was successfully updated.

>0
An error occurred.

CONSEQUENCES OF ERRORS

If sponge terminates before it reads the end of standard input,
there may be a leftover temporary file in TMPDIR.

The following sections are informative.

APPLICATION USAGE

EXAMPLES

# Modify file foo so it only includes lines beginning with "X":
grep -E '^X' < foo | sponge foo

RATIONALE

This utility makes it much easier to create pipelines that use
some file as an input and write their results to that file, all
while preserving the permissions of that file.

It's possible to write to a temporary file, then mv it to the original
name, but this common approach fails to preserve the original
file's permissions. The goal of this
utility is to make it easy to do a common action.

The following example doesn't work as a novice might expect, because
opening the file "foo" for writing eliminates the data in "foo" for reading:

grep -E '^X' < foo > foo

FUTURE DIRECTIONS

A future option might be added to truncate and copy contents into the
target file so that the target's inode does not change.

A future option might be added to create a new file with the same
permissions as an existing file (since this is the first step of implementing
this utility).

SEE ALSO

cp, mv, sed

XBD 4.7 File Access Permissions, 8. Environment Variables, 12.2 Utility Syntax Guidelines

XSH open(), unlink()

End of informative text.
Tags: No tags attached.

Activities

dwheeler

2025-06-11 16:49

reporter   bugnote:0007198

Note: I'm sure that the proposed text above for the new utility could be improved. However, the first step is to decide whether or not to include the sponge utility at all, and the second step is to refine its definition to be appropriate for POSIX. I tried to do both, but I've focused on step 1. There's no point in refining the proposal if it will be rejected no matter what :-). However, I think "use a pipeline to update in place" is such a common use case that it's good to make it *easy*.

There have been proposals for "in-place" commands that run a string passed in, but I don't know of any that are fully implemented or widely used. Part of the problem is that they require complex escape mechanisms for widely-used characters. As a result, they are hard to reason about, the mixing of data and command easily leads to vulnerabilities, and it's hard to create tools to do things like provide proper syntax highlighting. In contrast, "sponge" has been used for a long time by many, it's easy to reason about, and it doesn't require complex escaping. Instead, just write your pipeline as usual, and pipe to sponge as the final command. No separate mv command is needed, and you know that the permissions are the same (if it's possible to do that).

Many places highlight sponge as a useful utility. Here are some:
https://www.reddit.com/r/linux/comments/18gifb8/usefull_cli_tools/
https://www.putorius.net/linux-sponge-soak-up-standard-input-and-write-to-a-file.html
https://www.youtube.com/watch?v=9RXkZpmBDj0 (at time 15:10)
https://rentes.github.io/unix/utilities/2015/07/27/moreutils-package/

stephane

2025-06-22 08:43

reporter   bugnote:0007207

Sponge is part of a collection of utilities that started out as
mostly a bunch of useful perl scripts.

Initially, sponge file was in essence:

perl -0777 -pe 'open STDOUT, ">", file'

Soaking up the input in memory, and writing it to "file" upon
eof on stdin, so behaving more like POSIX ed, ex, sort -o.
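
A rough shell equivalent of that original behaviour, for illustration only: `soak_overwrite` is a made-up name, `mktemp -d` is assumed to be available for the demo directory, and because the data is held in a shell variable this sketch cannot handle NUL bytes.

```sh
# soak_overwrite FILE: hypothetical helper, not the real sponge.
soak_overwrite() {
    # Read all of stdin into a variable. The trailing "x" protects trailing
    # newlines, which command substitution would otherwise strip.
    data=$(cat; printf x) || return 1
    data=${data%x}
    # Only now open (and truncate) the target; its inode and mode are kept.
    printf '%s' "$data" > "$1"
}

cd "$(mktemp -d)" || exit 1
printf 'Xa\nb\nXc\n' > foo
grep -E '^X' < foo | soak_overwrite foo
cat foo
```

Nothing is written until end of input, so the reader at the head of the pipeline has finished with foo before it is truncated.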

At some point, it was rewritten in C and changed its modus
operandi radically. It behaved more like perl -i / gsed -i,
which replaces the file with an edited copy, except that:

- it uses a tempfile in ${TMPDIR-/tmp} instead of the same
  directory of the file which makes it much less likely for
  rename() to succeed as /tmp is often on a separate FS
- Like perl, unlike gsed, it only tries to preserve basic Unix
  permissions, not extended ACLs, ownership or other attributes
  such as security context.
- If the rename fails, it's back to ex/ed behaviour where it
  copies the temp file to the original file à la ed/ex while
  perl/gsed -i would fail (unlikely in practice there when the
  tempfile is in the same directory as the original)
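
The hybrid behaviour listed above can be approximated in POSIX shell. This is a rough sketch and not the moreutils code: `sponge_sketch` is a made-up name, `mktemp` is assumed to be installed, `mv` stands in for rename() (unlike rename(), mv falls back to copying across filesystems), and `cp -p` is used only to carry the permission bits over to the temp file.

```sh
# sponge_sketch TARGET: hypothetical approximation of the hybrid behaviour.
sponge_sketch() {
    target=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/sponge.XXXXXX") || return 1
    if [ -f "$target" ] && [ ! -L "$target" ]; then
        # Give the temp file the target's mode bits; the data copied here
        # is discarded by the truncating redirection below.
        cp -p "$target" "$tmp" || { rm -f "$tmp"; return 1; }
        replace=yes
    elif [ ! -e "$target" ] && [ ! -L "$target" ]; then
        replace=yes   # new file: keeps mktemp's restrictive mode in this sketch
    else
        replace=no    # symlink or special file: write through it instead
    fi
    cat > "$tmp" || { rm -f "$tmp"; return 1; }   # soak up all of stdin

    if [ "$replace" = yes ] && mv -f "$tmp" "$target" 2>/dev/null; then
        return 0      # replaced by rename
    fi
    # Fallback: overwrite in place (not atomic), then clean up.
    cat "$tmp" > "$target"
    status=$?
    rm -f "$tmp"
    return "$status"
}

cd "$(mktemp -d)" || exit 1
printf 'Xa\nb\nXc\n' > foo
chmod 640 foo
grep -E '^X' < foo | sponge_sketch foo
ls -l foo
cat foo
```

Which branch runs depends on the file type, the directory permissions and where TMPDIR lives, none of which the caller chooses on the command line.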

Both approaches (overwrite existing file vs replace it with a
new one) have their advantages and drawbacks:

Overwriting means:
- inode number, permissions, ownership and most metadata are
  preserved.
- but is not atomic
- can't be done for files that are in use (executables, for
  instance) and can cause havoc for scripts that are currently
  being run.

Replacing means:
- it's atomic and generally avoids the problems above.
- does not preserve inode number, cannot always preserve
  metadata (such as ownership or some security attributes that
  require privileges to set)
- breaks hard links
- breaks symlinks (replaces them with a regular file, losing
  metadata (the target of the symlink) in the process).
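
The hard-link difference between the two approaches is easy to see in a scratch directory. A small demo, assuming `mktemp -d` is available and the filesystem supports hard links; the file names are arbitrary:

```sh
cd "$(mktemp -d)" || exit 1

# Overwrite: the inode is reused, so the hard link sees the new data.
printf 'old\n' > a
ln a a.link
printf 'new\n' > a.tmp
cat a.tmp > a
rm -f a.tmp
overwrite_link=$(cat a.link)

# Replace: the rename installs a new inode, so the hard link keeps the old data.
printf 'old\n' > b
ln b b.link
printf 'new\n' > b.tmp
mv -f b.tmp b
replace_link=$(cat b.link)

printf 'overwrite: link has "%s"\n' "$overwrite_link"   # new
printf 'replace:   link has "%s"\n' "$replace_link"     # old
ls -i a a.link b b.link    # a and a.link share an inode; b and b.link do not
```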

Newer sponge's hybrid approach means it's still able to change a
file in a directory the user doesn't have write access to as
long as the user has write access to the file, but then that
means non-atomic replace and different behaviour for links.

IOW, sponge can end up doing very different things, not based on
what the user decides but based on external factors, which IMO
is not a clean design.

If sponge was to be specified, I'd rather it was the original
behaviour (which reflects the "sponge" name better) or the gsed
-i one.

I don't know of a system that has that utility installed by
default and personally haven't felt I couldn't do without it.
I'm not fully convinced it's worth specifying. It's nowhere
near the bulletproof way to edit files in place that one
may hope it to be.

One problem is that in:

cmd < file | sponge file

file is overwritten and lost even if cmd fails or file can't be
opened for reading.

{ cmd | sponge file; } < file

Addresses one of the problems.
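
The difference shows up when the input cannot be opened. A small demo follows, with a made-up `soak` function standing in for sponge (it overwrites rather than renames) and `mktemp` assumed to be available. The grouped form is wrapped in a subshell here only because a non-interactive shell is allowed to exit on a redirection error for a compound command.

```sh
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for sponge: soak stdin, then overwrite the target.
soak() { t=$(mktemp) && cat > "$t" && cat "$t" > "$1"; rm -f "$t"; }

# Form 1: the redirection fails for the first pipeline stage only; the
# stand-in still runs, sees empty input, and creates an empty "missing1".
grep X < missing1 | soak missing1

# Form 2: the redirection applies to the whole group, so when it fails
# nothing in the group runs and "missing2" is never created.
( { grep X | soak missing2; } < missing2 )

ls    # lists missing1 only
```

This covers the "can't be opened for reading" case; a failing cmd still ends up overwriting the file in both forms.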

I'd say process substitution would be a feature more worth
specifying than a sponge utility. In particular zsh's =(...)
form that uses a temp file can replace most usages of sponge
(though comes with its own limitations) as in:

cp =(cmd file) file

To replace:

cmd file | sponge file

{ rm -f file && cmd > file; } < file

Can also already replace many usages of sponge (and to some
extent is safer than < file cmd | sponge file).
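
A self-contained run of that idiom, assuming `mktemp -d` is available for the scratch directory:

```sh
cd "$(mktemp -d)" || exit 1
printf 'Xa\nb\nXc\n' > file

# The group's stdin is opened first. rm then unlinks the name, but the open
# descriptor keeps the old data readable while "> file" creates a new file.
{ rm -f file && grep -E '^X' > file; } < file

cat file
```

The new file is created with default permissions and a new inode, and the name is briefly absent between the rm and the redirection, so this shares the drawbacks listed under "Replacing" without being atomic.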

See also ksh93's:

grep foo < file 1<>; file

To overwrite the file in place (valid for things like "grep"
that don't produce more output than they consume) and truncate
it at the end.

eblake

2025-06-26 15:28

manager   bugnote:0007211

Last edited: 2025-06-26 15:41

The GNU Coreutils maintainers pointed out several ways of updating files, some of which are better than others (many of the poor solutions lack ACID semantics, for those familiar with the database terminology of Atomic, Consistent, Isolated, and Durable): https://www.pixelbeat.org/docs/unix_file_replacement.html
 https://lists.gnu.org/r/coreutils/2025-06/msg00018.html

Pádraig Brady: "If we were to standardize a new tool, we should aim for ACID file replacement functionality." ...
"The atomicity issue in method 6
was due to the use of truncate and write (cp and rm in my example).
Perhaps sponge has been updated in the meantime,
but reading the man page now indicates it uses
rename() for atomicity if supported,
and so is now more like method 7 discussed at the above.

So as long as sponge adequately handles file attributes,
it would be a useful addition yes."

My take on the thread: The original proposal here does NOT guarantee ACID semantics. Although it is possible to write a sponge implementation that has atomic replacement semantics, and then ensure that sponsorship for adding the utility to POSIX requires atomic replacement, we then risk users running non-compliant versions that break compared to what the standard says. But if we standardize the functionality under a new name, we have the same situation as tar vs. pax where the standard utility may not ever gain popularity because people aren't aware of its addition.

I interpret this feedback from coreutils as not necessarily being opposed to sponsoring the addition, but to at least be very careful about decisions on what requirements to place on the utility and whether to reuse the name sponge.

eblake

2025-06-26 15:45

manager   bugnote:0007212

The busybox maintainers chimed in on a thread starting here: https://lists.busybox.net/pipermail/busybox/2025-June/091540.html

Among the quotes that stand out to me:

Steffen Nurpmeso: "And what the *standard* would need, much more than that, and in my opinion, is flock(1) (after we gained timeout(1))."

Harald van Dijk:
"I am seeing a difference between the moreutils implementation and that
removed FreeBSD implementation. The proposed wording, taken from
moreutils states:

> sponge preserves the permissions of the output file if it already
> exists.

It also states:

> When possible, sponge creates or updates the output file atomically by
> creating a separate temp file in TMPDIR and then renaming that temp
> file into place.

But, in order to achieve that second point, the moreutils implementation
fails to achieve the first. It does not preserve the permissions of the
output file, it preserves the permission bits. On systems that support
ACLs, the two are not equivalent.

The use of a temporary file to be renamed to replace the original also
breaks hardlinks.

Personally, I have never used this utility but if it were to be added to
POSIX and/or busybox, I would prefer that FreeBSD version over the
moreutils version."

David Leonard:
"My first thought is that sponge may benefit from a flock option. This
would allow file users to avoid accessing the file in an intermediate
state. For example sponge -l would obtain a read lock first, then collects
input, then upgrade to a write lock, then replace the file's content.

     flock -s myfile -c cat myfile | sed -e s/foo/bar/g | sponge -l myfile
"

enh

2025-06-26 17:39

reporter   bugnote:0007214

fwiw, http://lists.landley.net/pipermail/toybox-landley.net/2025-June/030739.html is the head of the corresponding toybox discussion thread.

dwheeler

2025-07-04 02:11

reporter   bugnote:0007219

Obviously there are various ways to implement a "sponge-like" functionality.
As noted above, the sponge developers originally overwrote the file.
However, years ago they changed to a hybrid "do the best you can" approach that
prefers an ACID replacement where it can manage it, and an overwrite when it can't.
The fact that they *changed* the code to do this, even though it takes more code,
suggests that this is the most widely expected default ("do the best you can").
"Doing what most people want by default" is a *good* design.

The fact that there *are* trade-offs suggests that perhaps options could be
added to control which approach is used, in cases where users care.
(E.g., only overwrite, only replace, and prefer-replace-then-overwrite).
It seems like this is exactly what options are good for :-).

Regarding "it uses a tempfile in ${TMPDIR-/tmp} instead of the same
  directory of the file which makes it much less likely for
  rename() to succeed as /tmp is often on a separate FS",
you could set its TMPDIR to be the same directory as the
destination file. If that's a common case, an option could be added
to make that even easier to do.
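
For example, a small wrapper could do this today. It is only a sketch: `sponge_near` is a made-up name, and it assumes an installed sponge that honours TMPDIR as described above.

```sh
# sponge_near TARGET: hypothetical wrapper around an existing sponge.
sponge_near() {
    # Run in a subshell so the caller's TMPDIR is left untouched.
    (
        TMPDIR=$(dirname -- "$1") || exit 1
        export TMPDIR
        sponge "$1"
    )
}

# Usage (requires sponge to be installed):
#   grep -E '^X' < foo | sponge_near foo
```

The directory must be writable by the user for the temp file to be created there, which is the same condition rename() needs anyway.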

"Like perl, unlike gsed, it only tries to preserve basic Unix
 permissions, not extended ACLs, ownership or other attributes
 such as security context."

It's sometimes not even *possible* to retain security context
(even in an overwrite!). However, I can see the specification
encouraging implementations to maximally preserve permissions,
at *least* basic Unix permission bits. Then it's a quality-of-implementation issue.


"[In] cmd < file | sponge file
file is overwritten and lost even if cmd fails or file can't be
opened for reading....
{ cmd | sponge file; } < file"

I don't see that as a serious problem.
In-place replacement is typically *NEVER* used on
precious irreplaceable initial inputs. It's typically used as part of
a larger sequence of pipelines with potentially large files.

I'll agree that if that *is* a concern of a user, it's challenging to handle.
I guess it could grow an option to only replace a file if it got at least 1 byte
of input, and return an error code if it didn't. I don't have any other ideas
if that's critical, but again, normally you only do in-place replacements
when it's replaceable.
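
To make the "at least 1 byte" idea concrete, here is a sketch of how such an option might behave. `replace_if_nonempty` is a made-up name, `mktemp` is assumed to be available, and for brevity the sketch overwrites the target instead of renaming a temp file into place.

```sh
# replace_if_nonempty TARGET: hypothetical helper sketching the suggested option.
replace_if_nonempty() {
    tmp=$(mktemp) || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    if [ -s "$tmp" ]; then
        cat "$tmp" > "$1"    # got at least one byte: overwrite the target
        status=$?
    else
        status=1             # empty input: leave the target untouched
    fi
    rm -f "$tmp"
    return "$status"
}

cd "$(mktemp -d)" || exit 1
printf 'precious\n' > file
false < file | replace_if_nonempty file    # "false" produces no output
rc=$?
content=$(cat file)
echo "status=$rc content=$content"         # status=1 content=precious
```

The trade-off is that a filter whose correct result is empty (a grep with no matches, say) would be reported as an error.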

Process substitution is different from sponge. Constructs like
"cp =(cmd file) file" don't preserve ANY permissions, for example.
I wouldn't oppose process substitution in a different proposal, but that
seems different.

David Leonard:
"My first thought is that sponge may benefit from a flock option...."

I'm quite amenable to that!

I don't know if the developers of the most widely-used implementation
of sponge would be willing to add a few new options, but I think it's
worth asking. All of these additions (even working harder to copy the
permission information) aren't really *that* much work on top of what's
already there. Even with all those options, it's a narrowly-scoped
utility that isn't hard to use or implement.

Issue History

Date Modified     Username  Field                Change
2025-06-01 01:18  dwheeler  New Issue
2025-06-01 01:18  dwheeler  Status               New => Under Review
2025-06-01 01:18  dwheeler  Assigned To          => ajosey
2025-06-11 16:49  dwheeler  Note Added: 0007198
2025-06-22 08:43  stephane  Note Added: 0007207
2025-06-26 15:28  eblake    Note Added: 0007211
2025-06-26 15:41  eblake    Note Edited: 0007211
2025-06-26 15:45  eblake    Note Added: 0007212
2025-06-26 17:39  enh       Note Added: 0007214
2025-07-04 02:11  dwheeler  Note Added: 0007219