Austin Group Defect Tracker


ID: 0001618
Category: [Issue 8 drafts] Shell and Utilities
Severity: Editorial
Type: Enhancement Request
Date Submitted: 2022-11-15 02:32
Last Update: 2024-06-11 09:12
Reporter: illiliti
View Status: public
Assigned To:
Priority: normal
Resolution: Accepted As Marked
Status: Closed
Product Version: Draft 2.1
Name: Mark Lundblad
Organization:
User Reference:
Section: pax
Page Number: 3047
Line Number: 102685-102698
Final Accepted Text: Note: 0006063
Summary: 0001618: pax: extend substitution flags
Description: It is very useful to have the ability to skip substitutions for specific file types. For example, one possible use case is replicating the GNU tar '--strip-components=1' option. We can use pax -rs '|[^/]*|.|' for that, and it works fine except in one case: if the archive contains a symlink that points to e.g. ../foo.h, pax will transform it into foo.h, which is undesirable. To fix this incorrect behavior, pax should have additional substitution flags to control which file types the substitution is applied to.

Since both the 's' and 'S' characters ('s' is documented in the man page, 'S' is not) are reserved by NetBSD pax to disable symlink substitutions, we use 'l'/'L' instead so as not to introduce a potentially breaking change. See: https://nxr.netbsd.org/xref/src/bin/pax/pat_rep.c?r=1.31#199

See also: https://www.mail-archive.com/austin-group-l@opengroup.org/msg08130.html
Desired Action: On page 3047 line 102689, change:

-s /old/new/[gp]

to:

-s /old/new/[gphrlHRL]


On page 3047 line 102693-102698, change:

Any non-null character can be used as a delimiter ( '/' shown here). Multiple -s
expressions can be specified; the expressions shall be applied in the order
specified, terminating with the first successful substitution. The optional
trailing 'g' is as defined in the ed utility. The optional trailing 'p' shall
cause successful substitutions to be written to standard error. File or archive
member names that substitute to the empty string shall be ignored when reading
and writing archives.

to:

Any non-null character can be used as a delimiter ( '/' shown here). Multiple -s
expressions can be specified; the expressions shall be applied in the order
specified, terminating with the first successful substitution. The optional
trailing 'g' is as defined in the ed utility. The optional trailing 'p' shall
cause successful substitutions to be written to standard error. Optional trailing
'h', 'r', 'l' characters shall enable substitutions for hardlink targets, regular
filenames, symlink targets. Whereas 'H', 'R', 'L' characters shall disable them,
respectively. By default, if no hrlHRL flags were specified, substitutions shall
be applied to all types. File or archive member names that substitute to the
empty string shall be ignored when reading and writing archives.
Tags: issue8
Attached Files

Relationships

Notes
(0006055)
geoffclare (manager)
2022-11-17 12:36

Since NetBSD only documents the s flag, they would probably be okay with us specifying S as having the opposite behaviour. I think that would be preferable to using l and L instead.

The r and R flags are invention and I don't see the point of them as there is no "target" to apply the substitution to (or not) for regular files.

If neither h nor H is used it should be unspecified which is the default. Likewise for s and S. (Otherwise only one of each pair would be needed.)
(0006056)
kre (reporter)
2022-11-17 15:22

I'm not sure that Note: 0006055 quite captures the intent of the -s
option, which (note: I am not particularly knowledgeable about pax)
I believe is intended to alter the files that are in the archive - used,
I think, most commonly to remove/alter leading prefixes.

The 's' flag in NetBSD causes the substitution to apply only to the
archive member name, and not to its contents (the symlink target) in
the case that the archive member is a symlink.

The desired action seems confused to me, perhaps using "target" when
member name is meant (which would explain the r/R intent) - but then it
isn't clear just which substitutions are being allowed/suppressed, the
original archive member name, or the target when the member is a link or
symlink.

I am also not sure that all the variations make a lot of sense. There's
no point adding options simply with the intent of covering every possible
case, if there is no practical use for some of them. And that is what
it looks like here.

Generally the idea of -s seems to be to alter all the paths in the archive,
which generally will include hard link targets, as those are almost always
to other files in the archive. If the path of the target is altered when it
is extracted, then the link target needs to be altered as well. Being able
to suppress that seems useless. On the other hand, symlink targets are
sometimes intended to be to files that are not in the archive, or even
to "files" that are not intended to exist at all (the symlink is used for
readlink() only). Suppressing altering the names of the files in the
archive, based upon file type, rather than matching the file name, seems
like a particularly useless endeavour - how would that be used?


I can ask the NetBSD community their opinion of all of this, but doing that doesn't make a lot of sense without a clear idea of just what is being
proposed.
(0006062)
illiliti (reporter)
2022-11-18 00:02

Thanks everyone. Initially, I tried to be semantically compatible with GNU tar and bsdtar since they provide similar functionality. But you're right, that's redundant. Here's the updated proposal, which only reflects the symlink flags:

On page 3047 line 102689, change:

-s /old/new/[gp]

to:

-s /old/new/[gpsS]


On page 3047 line 102693-102698, change:

Any non-null character can be used as a delimiter ( '/' shown here). Multiple -s
expressions can be specified; the expressions shall be applied in the order
specified, terminating with the first successful substitution. The optional
trailing 'g' is as defined in the ed utility. The optional trailing 'p' shall
cause successful substitutions to be written to standard error. File or archive
member names that substitute to the empty string shall be ignored when reading
and writing archives.

to:

Any non-null character can be used as a delimiter ( '/' shown here). Multiple -s
expressions can be specified; the expressions shall be applied in the order
specified, terminating with the first successful substitution. The optional
trailing 'g' is as defined in the ed utility. The optional trailing 'p' shall
cause successful substitutions to be written to standard error. The optional
trailing 's' shall disable substitutions for symlink destinations, whereas 'S'
shall enable them. The application shall ensure that optional trailing 's' and 'S'
are mutually exclusive for each expression. If neither 's' nor 'S' is defined, the
behavior whether substitutions for symlink destinations shall be applied or not is
unspecified. File or archive member names that substitute to the empty string shall
be ignored when reading and writing archives.
(0006063)
geoffclare (manager)
2022-11-18 15:34
edited on: 2022-11-21 16:35

The following is a modified version of Note: 0006062 with wording that I think is more in keeping with other text in the standard, and the addition of a clarification about empty symbolic links. (I also removed some unchanged text.)

On page 3047 line 102689, change:
-s /old/new/[gp]
to:
-s /old/new/[gpsS]

On page 3047 line 102693-102698, change:
The optional trailing 'p' shall cause successful substitutions to be written to standard error. File or archive member names that substitute to the empty string shall be ignored when reading and writing archives.
to:
The optional trailing 'p' shall cause successful substitutions to be written to standard error. The optional trailing 's' and 'S' control whether the substitutions are applied to symbolic link contents: 's' shall cause them not to be applied; 'S' shall cause them to be applied. If neither is present, it is unspecified which is the default. If both are present, the behavior is unspecified. File or archive member names that substitute to the empty string shall be ignored when reading and writing archives. Symbolic link contents that substitute to the empty string shall not be treated specially.
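
A minimal Python sketch of one way the wording above might be read. It is not taken from any pax implementation: the names Subst and apply_substs and the default_applies_to_links parameter are hypothetical, Python's re syntax stands in for the basic regular expressions pax uses, and skipping an 's'-flagged expression for symbolic link contents (rather than stopping at it) is an assumption that the text does not settle.

```python
import re
from dataclasses import dataclass


@dataclass
class Subst:
    """One -s /old/new/[gpsS] expression (hypothetical model)."""
    old: str         # pattern; Python re syntax stands in for a BRE
    new: str         # replacement text, treated as literal here
    flags: str = ""  # any combination of "g", "s", "S" ("p" is not modelled)


def apply_substs(text, substs, is_symlink_contents=False,
                 default_applies_to_links=True):
    """Return text after the first successful substitution, else unchanged.

    default_applies_to_links models the implementation's choice when
    neither 's' nor 'S' is given, which the wording leaves unspecified.
    """
    for s in substs:
        if is_symlink_contents:
            if "s" in s.flags and "S" in s.flags:
                raise ValueError("both 's' and 'S' given: unspecified")
            if "s" in s.flags:
                continue  # assumption: skip this expression, try the next
            if "S" not in s.flags and not default_applies_to_links:
                continue
        count = 0 if "g" in s.flags else 1  # count=0 replaces every match
        result, n = re.subn(s.old, lambda m: s.new, text, count=count)
        if n:
            return result  # first successful substitution terminates
    return text


strip = [Subst(r"[^/]*", ".", "s")]
print(apply_substs("pkg-1.0/include/foo.h", strip))            # ./include/foo.h
print(apply_substs("../foo.h", strip, is_symlink_contents=True))  # ../foo.h
```

With 'S' in place of 's', the same call on the symbolic link contents yields ./foo.h, which is the breakage described in the issue.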


(0006064)
kre (reporter)
2022-11-18 17:15

Is there a need for the 'S' option and the "unspecified which is the default" ?

That is, is there any version of pax which (without the 's' flag or some
equivalent) fails to substitute in symbolic link contents? (Or on the
target of a hard link?)

The problem description suggests perhaps not - that is, the problem is
that when the archive member names are being altered, symbolic link
values (the name they reference) are being altered as well, and sometimes
should not be.

If that is true of all versions of pax (or almost all), absent some
additional flag to prevent that happening, then we can simply make
the substitutions apply to all link (hard and symbolic) contents, and
just add the 's' flag to suppress that for symbolic links.

That would be simpler, and better for applications I would have thought,
as there won't be a need to add 's' or 'S' flags on every -s arg, when
what exists now works fine.

Further, it would be better for NetBSD (which has the 's') - but where
all those flags ('p' 'g' and 's') are treated in a case independent way
('p'=='P' 'g'=='G' 's'=='S'), even though that's not documented. Making
that apply for p and g, but do the exact opposite for s would be annoying.

Note that in many (but perhaps not all) cases the problem described can
also be worked around by supplying a more specific pattern than just
-s '|[^/]*|.|' -- which might be needed in any case as a symlink whose
contents are essentially the same as what would be used as the target
for a hard link (the path name relative to the starting origin) should
probably get substituted, whereas one more like the ../foo.h example
should not.

A simple flag appended to the option cannot distinguish those cases.
(0006066)
illiliti (reporter)
2022-11-19 03:39

Yes, "unspecified which is the default" is needed. I know nothing about pax implementations in commercial unix systems, but I know at least one open-source implementation that does not do substitutions on symlinks. It is pax from schilytools.

And no, there's no such substitution pattern that can work in generic way and not break symlinks. Therefore 's' flag to ignore symlinks is needed.

Also, since we discarded the r/h/R/H flags, should we make it clearer that the 'S' flag should exclude regular and all other file types except for symlinks? I find it useful since the user could pass two (or more) -s expressions to pax: one with the 'S' flag would work only with symlinks and one with the 's' flag with other file types. Thoughts?
(0006067)
kre (reporter)
2022-11-19 18:03

OK, then I guess we need to stay with it as Note: 0006063 has it
specified, that or invent some other character instead of 'S' (which
is an invention anyway right, so there is no particular reason
except for symmetry with 's' for it to be that character).

However I am not sure that schilytools is, of itself, quite enough of
a blocker, if something different was desired.

But from Note: 0006066 I don't know what you mean by:
   should we make it more clear that 'S' flag should exclude regular
   and all other file types except for symlinks?

The -s expressions always apply to all archive member names. What file
type it is makes no difference. The issue is when the contents is also a
path name - which means just for links (hard and soft). Those are
normally (without 's', and temporarily ignoring schilytools) also subject
to substitution. For hard links that is exactly what is wanted: the
link contents is the path name of another file already added to the archive,
and since that one's name would have been altered by a matching -s, the same
matching -s needs to be applied to the link's contents (the file to which to
link), or the link will either not be able to be made, or perhaps
might become a link to some other file entirely. So the 's'/'S' flags cannot
mean anything in that case either. That leaves only symlink contents, which
one might not want to apply the substitutions, so the 's'/'S' flags apply only
to that case - the content of a symbolic link archive member.

If you mean that 'S' should be defined to substitute only on symlink contents,
and not on other pathnames in the archive, then that looks to be pure
invention, and would then require another option (or lack of either the 's'
or 'S' flags) to mean substitute on both; choosing "no flag" for that would
be back to removing "if neither is given result is unspecified".

Lastly re:
    there's no such substitution pattern that can work in generic way
    and not break symlinks

I know, but I doubt there's a generic substitution pattern that can apply
to an arbitrary pax archive and expect to work in all cases (even when there
are no symlinks, or links of any kind) in the archive. The substitutions
need to be tailored to the needs of the archive. Even then it might be
hard to get matching patterns which deal with all cases (though the "match
once and stop" rule means that -s '/thing/thing/' is an easy way to leave
some entries unaffected, eg: for the symlink example, using -s '|^\.\./|../|'
before the -s '|[^/]*|.|' would avoid the issue shown there, but not
other possible problems). (Use '[.]' instead of '\.' if \ escaping isn't
expected to work.)
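
The ordering trick above can be sketched in Python as follows. This is only an illustration under stated assumptions: first_match_subst is a hypothetical helper, not part of any pax, and Python's re syntax stands in for pax's basic regular expressions. The third path is an example of a case the guard expression does not cover.

```python
import re


def first_match_subst(name, exprs):
    """Apply (old, new) pairs in order; stop at the first one that matches."""
    for old, new in exprs:
        result, n = re.subn(old, lambda m: new, name, count=1)
        if n:
            return result
    return name


# Rough equivalent of: -s '|^\.\./|../|' -s '|[^/]*|.|'
exprs = [(r"^\.\./", "../"), (r"[^/]*", ".")]

for path in ("pkg-1.0/src/main.c", "../foo.h", "foo.h"):
    print(path, "->", first_match_subst(path, exprs))
# pkg-1.0/src/main.c -> ./src/main.c
# ../foo.h -> ../foo.h
# foo.h -> .
```

The first expression replaces a leading ../ with itself, which counts as a successful substitution and so stops the second expression from running. A symbolic link whose contents are just foo.h still collapses to ".".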
(0006069)
illiliti (reporter)
2022-11-20 01:25
edited on: 2022-11-21 11:57

I'm fine with Note: 0006063 too.

> If you mean that 'S' should be defined to substitute only on symlink contents,
and not on other pathnames in the archive

Yes, that's what I meant. And I agree with you why it isn't worth it.

> I know, but I doubt there's a generic substitution pattern that can apply
to an arbitrary pax archive and expect to work in all cases (even when there
are no symlinks, or links of any kind) in the archive.

Practice shows that -s '|[^/]*|.|' with the 's' flag is enough to cover common cases (e.g. software release tarballs).

(0006070)
philip-guenther (reporter)
2022-11-20 02:37

I agree a new flag ('s') should be specified to disable substitution on symbolic link contents.

IMHO, if we have consensus on that, then I'm with kre in feeling that schilytools shouldn't be seen as a blocker for specifying the default behavior: existing implementations that don't already have the new flag will be non-compliant until fixed, regardless of what their default behavior currently is. So, are there any implementations that already have an 's' flag *and* that default to that behavior? Alternatively, is there push back from an implementation that is going to add the new flag but doesn't want to change its default?

If not, then we can avoid specifying a new implementation-defined behavior, which would be a demerit to portability.
(0006071)
illiliti (reporter)
2022-11-20 06:06

Just checked. pax from heirloom project and original pax by Mark H. Colburn also don't apply substitutions to symlink targets. So we have at least three pax implementations that differ from BSD pax: schilytools pax, heirloom pax and original pax.

Could someone check what z/OS, AIX, HP-UX, and other commercial unix systems do? I'm pretty sure they don't mangle symlinks either, since they (seemingly) use the original pax as a base, but checking that wouldn't hurt anyway. Just in case.
(0006072)
geoffclare (manager)
2022-11-21 09:28

Solaris and HP-UX do not apply substitutions to symlink targets. It's probably a safe bet that the same is true for AIX and z/OS.
(0006074)
illiliti (reporter)
2022-11-21 12:31

Thanks for checking! Now we have a clear picture that we can't simply redefine implementation-defined behavior. Users probably rely on the behavior of their pax, and redefining it might cause issues for them.

Given that, I think Note: 0006063 should be accepted as the best compromise. I know, I know, users will have to specify the s/S flag for each expression, but personally I don't see any other option that would be compatible with current implementations.
(0006086)
stephane (reporter)
2022-11-28 20:15

I'd welcome a way to be able to tell pax what the -s applies to: path, symlink target, hardlink target as the --xform option of GNU tar does. See https://unix.stackexchange.com/questions/726489/untar-file-if-exists-in-archive/726494#726494 as an example of why not being able to do so can be undesirable.

But the --xform of GNU tar differs from POSIX pax's -s in that POSIX pax requires that once a -s matches, all other -s's are disregarded, so if we implement GNU tar's rhsRHS, we likely need to add something like a q/Q flag to tell whether or not to stop processing more "s"ubstitutions if this one succeeds, so that one can apply different substitutions to paths and link targets.

Currently, it's not completely clear what should happen if an archive member has its symlink or hardlink target substituted with the empty string, but its path not (either not substituted at all or the substitution results in a non-empty string).

Maybe with the "p" flag, the output should also make it clear what is being substituted. ATM, with the pax found on Debian (from MirBSD), I see no output there for substitutions made on symlink targets (even when that results in the symlink target becoming empty and the member being discarded as a result).
(0006087)
stephane (reporter)
2022-11-28 20:37

What it implies for pax -rw and pax -rwl should likely be reviewed as well. From vague memory of looking into it years ago, behaviours vary widely there.

Not sure if it's covered in the spec, but there's a whole range of vulnerabilities involving processing of malicious archives containing symlink/hardlink members followed by members with the same path that have affected most tar implementations in the past (I remember raising that one to Joerg about his "star" implementation of tar) that will get in the picture here. i.e. we'll likely need something like "if upon extraction, a member's path after "-s"ubstitutions have been applied is identical to the link target (after "-s"ubstitution) of an earlier member, behaviour unspecified" with a recommendation that the member should be discarded and an error message printed.
(0006088)
illiliti (reporter)
2022-11-28 23:10

stephane, please consider submitting a proposal of what exactly should be added, fixed, reviewed or improved. Thanks.
(0006089)
geoffclare (manager)
2022-11-29 09:13
edited on: 2022-11-29 09:22

When this was discussed in the Nov 21 teleconference, we agreed that the s and S flags were the only additions that are suitable for inclusion at this time because s is the only one that has been implemented in an existing pax utility (and S is just the opposite of s, and is needed because the default differs between implementations). All of the other suggestions would be invention.

If you want to propose further additions you will need to persuade an implementation to add them first.


Issue History
Date Modified Username Field Change
2022-11-15 02:32 illiliti New Issue
2022-11-15 02:32 illiliti Name => Mark Lundblad
2022-11-15 02:32 illiliti Section => pax
2022-11-15 02:32 illiliti Page Number => 3047
2022-11-15 02:32 illiliti Line Number => 102685-102698
2022-11-17 12:36 geoffclare Note Added: 0006055
2022-11-17 15:22 kre Note Added: 0006056
2022-11-18 00:02 illiliti Note Added: 0006062
2022-11-18 15:34 geoffclare Note Added: 0006063
2022-11-18 17:15 kre Note Added: 0006064
2022-11-19 03:39 illiliti Note Added: 0006066
2022-11-19 18:03 kre Note Added: 0006067
2022-11-20 01:25 illiliti Note Added: 0006069
2022-11-20 02:37 philip-guenther Note Added: 0006070
2022-11-20 06:06 illiliti Note Added: 0006071
2022-11-21 09:28 geoffclare Note Added: 0006072
2022-11-21 11:57 illiliti Note Edited: 0006069
2022-11-21 12:31 illiliti Note Added: 0006074
2022-11-21 16:35 geoffclare Note Edited: 0006063
2022-11-21 16:37 geoffclare Final Accepted Text => Note: 0006063
2022-11-21 16:37 geoffclare Status New => Resolved
2022-11-21 16:37 geoffclare Resolution Open => Accepted As Marked
2022-11-21 16:37 geoffclare Tag Attached: issue8
2022-11-28 20:15 stephane Note Added: 0006086
2022-11-28 20:37 stephane Note Added: 0006087
2022-11-28 23:10 illiliti Note Added: 0006088
2022-11-29 09:13 geoffclare Note Added: 0006089
2022-11-29 09:15 geoffclare Note Edited: 0006089
2022-11-29 09:16 geoffclare Note Edited: 0006089
2022-11-29 09:18 geoffclare Note Edited: 0006089
2022-11-29 09:22 geoffclare Note Edited: 0006089
2023-01-17 11:33 geoffclare Status Resolved => Applied
2024-06-11 09:12 agadmin Status Applied => Closed

