Anonymous | Login | 2024-12-04 05:55 UTC |
Main | My View | View Issues | Change Log | Docs |
Viewing Issue Simple Details [ Jump to Notes ] | [ Issue History ] [ Print ] | ||||||||||||
ID | Category | Severity | Type | Date Submitted | Last Update | ||||||||
0001872 | [1003.1(2024)/Issue8] Shell and Utilities | Editorial | Clarification Requested | 2024-11-07 21:34 | 2024-11-18 09:34 | ||||||||
Reporter | steffen | View Status | public | ||||||||||
Assigned To | |||||||||||||
Priority | normal | Resolution | Accepted As Marked | ||||||||||
Status | Resolved | ||||||||||||
Name | steffen | ||||||||||||
Organization | |||||||||||||
User Reference | |||||||||||||
Section | find | ||||||||||||
Page Number | 2946 | ||||||||||||
Line Number | 98444 ff. | ||||||||||||
Interp Status | --- | ||||||||||||
Final Accepted Text | Note: 0006956 | ||||||||||||
Summary | 0001872: find: clarify "less safe" statement | ||||||||||||
Description |
On the oss-security list there is a thread "shell wildcard expansion (un)safety". A snippet of a thread i am involved in: <code> |> To add that the POSIX core developers mention (APPLICATION USAGE): |> |> It should be noted that using find with -print0 to pipe input to |> xargs -r0 is less safe than using find with -exec because if |> find -print0 is terminated after it has written a partial |> pathname, the partial pathname may be processed as if it was |> a complete pathname. | |Shouldn't that behavior be treated as an xargs implementation bug or at |least shortcoming, and fixed as such? I hope POSIX doesn't require it? Now, i am not a POSIX core developer. POSIX.1-2024 was developed for over a decade (even almost one and a half) with many hundreds of issues fixed through discussions in regular meetings. A first thought is that the now really included (four decades too late!) sh(1)ell's "pipefail" option was agreed upon long after the text above appeared for the -print0/-r0 addition. If that is true the above text is anyway a correct statement less the partial pathname because the undesired "termination" will not be reflected in the exit status of the pipe. |In other words, if the input stream to "xargs -0" doesn't end in a NUL, |xargs must not process the last maybe-partial string. I've just checked Other than that i would agree. |GNU findutils xargs (not the latest version, though) and it does have |this problem - something we'd want to fix? From a glance "git show master:findutils/xargs.c::process0_stdin()" of busybox also does int c = getchar(); if (c == EOF) { if (p == s) goto ret; c = '\0'; } *p++ = c; if (c == '\0') { /* NUL or EOF detected */ ... So then the above paragraph even reflects code reality. </code> |
||||||||||||
Desired Action | "Do something about it" :-) | ||||||||||||
Tags | tc1-2024 | ||||||||||||
Attached Files | |||||||||||||
|
Notes | |
(0006951) steffen (reporter) 2024-11-07 21:38 |
P.S.: for xargs POSIX says <code> 123174 If the −0 option is specified, the application shall ensure that arguments in the standard input are 123175 delimited by null bytes. If multiple adjacent null bytes occur in the input, each null byte shall be 123176 treated as a delimiter. If the standard input is not empty and does not end with a null byte, xargs 123177 should ignore the trailing non-null bytes (as this can signal incomplete data) but may use them 123178 as the last argument passed to utility. </code> |
(0006952) steffen (reporter) 2024-11-08 01:40 edited on: 2024-11-13 18:19 |
P.S.: the thread continued and i am about to post (subset): <code> |So it looks like we can fix/enhance xargs in this way in both GNU |findutils and Busybox findutils and perhaps elsewhere. It would also be |interesting to know if any implementations exist that already "ignore |the trailing non-null bytes". It seems to me the xargs(1) of the BSDs have a common root with identical comments, variables (zflag == -0) etc, but slightly diverged code bases "thereafter"; .. not going to dig that stuff now, .. but running f-1400, n-1000 and o-0705 (i do not have OpenBSD 7.6) yet) all interpret the trailer it seems. #|f-1400:~$ printf 'a\0b\0c' | xargs -0 printf '[%s]\n' [a] [b] [c] On OpenIndiana "2024" i see #?0|oi-2024:steffen$ printf 'a\0b\0c' | xargs -0 printf '[%s]\n' [a] [b] [c] #?0|oi-2024:steffen$ command -v printf xargs printf /bin/xargs (xargs also via /usr/gnu/bin/xargs as you say) |Another reason for this safer behavior is that it's also more consistent |with respect to empty strings. If "trailing non-null bytes" are passed |"as the last argument", then this only occurs if the last argument is |non-empty. Yet xargs otherwise does support empty arguments, except for |the last non-null-terminated one. We'd be removing this inconsistency. Seems to require changing any xargs(1) i have around. Which makes the standard *very much* requiring changes for the future ... So i take back the "unfortunate". </code> Maybe i open an issue for xargs tomorrow, so that the future directions are pointed to in normal text flow? Is that do- or desireable? |
(0006953) stephane (reporter) 2024-11-08 08:31 |
For the record, I was the one bringing that issue up to POSIX in Note: 0006093 which resulted in that text, I also mention it at https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice/321757#321757 [^] (Interrupted output section) with a real life example. See also https://unix.stackexchange.com/questions/730873/find-print0-xargs-0-cmd-vs-find-exec-cmd/730874#730874 [^] for some historical background on -exec {} + vs xargs -0. I do vaguely remember mentioning it to the GNU findutils maintainers, but I may have imagined it and in any case don't remember the outcome. Now I think it's too late for POSIX to mandate implementations discard non-delimited records as that behaviour is being relied upon. A new option would have to be introduced. Could be a -D pendant to -d where -D '\0' requires the delimiter when -d '\0' doesn't (-d currently badly missing from POSIX xargs for it's ability to deal with lines with -d '\n'). Or a -F extra flag to require Full records. Generally, GNU utilities do process non-delimited records. For instance all their text utilities allow non-delimited lines on input (and some add the delimiter back on output, some don't). Same applies for those that take a -z or -0 to deal with NUL-delimited records instead of lines. In POSIX, behaviour is unspecified for text utilities if the input doesn't end in newline with the exception of awk, so in any case, regardless of the behaviour of xargs in this instance, if one does something like: find ... -print0 | awk -v RS='\0' -v ORS='\0' '{print; print $0".back"}' | xargs -r0n2 cp -p -- (not that POSIX allows NUL field separator for awk yet), the fact that find was interrupted if it was is lost when that reaches xargs. The NUL-delimited mode of GNU text utilities is often used to process text files as a whole as a poor man's "slurp mode". As in: sed -z 's/.../.../' file.txt To have substitution possibly spanning several lines. xargs -r0a file.txt printf %b To expand echo-style escape sequences in the contents of file.txt. Mandating the delimiter would break those. In any case, that's not something for POSIX to address. For now, it allows implementations not to discard non-delimited records and warns about the safety implication of doing so. It's up to implementations to decide what they want to do now: ignore the problem which in practice rarely happens and falls in the category of the rare pathological cases, like memory/fd exhaustion or random bit flip on solar flares where all bets are off anyway or address it either by breaking backward compatibility or add extra API. If different implementors agree on that new API, then that can be specified in POSIX. |
(0006954) geoffclare (manager) 2024-11-12 09:48 |
> Maybe i open an issue for xargs tomorrow, so that the future directions are pointed to in normal text flow? Is that do- or desireable? It is too early to make a decision on that future direction. The time to do it is during work on Issue 9 - the earliest being when the editors start working on draft 1, so that the change goes in draft 1, but it may be better to wait and submit a request against draft 1 (or 2). At that point we would be able to assess whether enough implementations have acted on the recommendation. |
(0006956) geoffclare (manager) 2024-11-14 17:03 edited on: 2024-11-14 17:08 |
This was discussed when those words were added to Issue 8. There is a danger in allowing partial input records to be treated as complete. As an example, if find is used to generate a list of directories to be recursively removed and a partial pathname is accepted by xargs, it could result in the accidental removal of a much larger subtree in a filesystem than was intended. The current standard allows this behavior due to existing practice, but we hope to be able to disallow processing of partial input in a future revision of the standard. At page 3600 line 123176 section xargs, change: If the standard input is not empty and does not end with a null byte, xargs should ignore the trailing non-null bytes (as this can signal incomplete data) but may use them as the last argument passed to utility.to: If the standard input is not empty and does not end with a null byte, xargs should treat the trailing non-null bytes (which can signal incomplete data) as an error but may use them as the last argument passed to utility. Add to RATIONALE after page 3605 line 123412: When the -0 option is not specified, if the standard input is not empty and does not end with a <newline>, then the input is not a text file, and therefore the behavior is undefined. However, it is recommended that xargs diagnoses the trailing non-<newline> characters (for consistency with the recommendation for -0 and trailing non-null bytes). On page 3606 line 123415 section xargs, change: xargs ignores the trailing non-null bytes.to: xargs treats the trailing non-null bytes as an error. |
(0006957) stephane (reporter) 2024-11-16 08:54 |
re: Note: 0006956 With the "may ignore" removed replaced with an error and "may use them as the last argument" added, it's not totally clear what should happen if it's not used as last argument or what the "error" entails. Should xargs exit with a 1-125 exit status as soon as it comes to the realisation that the last input record is undelimited? Or abort the last run only? Or do the last run with or without that undelimited record, but report the error before or after and exit with a 1-125 exit status after? What if the utility does a exit(255) not on the last run but after xargs came to the realisation that the last record was undelimited? Should it still output an error message? You'd think that one should not happen as if EOF is reached before a run that is not the last (for which the list of arguments is full), you'd hope xargs would not treat it as EOF and try reading again after in case some data has been added in the interval (possibly even by the previous run), but in practice that's not what happens with GNU xargs at least: $ printf '%s\0' {1..9999} > a $ xargs -n1000 -r0 sh -c '[ "$1" -eq 8001 ] && printf "end\0" >> a; eval "echo \"\$1\" .. \"\${$#}\""' sh < a 1 1000 1001 .. 2000 2001 .. 3000 3001 .. 4000 4001 .. 5000 5001 .. 6000 6001 .. 7000 7001 .. 8000 8001 .. 9000 9001 .. 9999 That (delimited) "end" added by the second-last run was not included in the last run. It is included with busybox xargs Fun fact: echo 1 | xargs -n1 sh -c 'echo "$(( $1 + 1 ))" >> /dev/stdin; echo "$1"' sh On Linux where /dev/stdin when fd 0 is a pipe acts like a named pipe, does give you an infinite number sequence with busybox or ast-open xargs but not GNU nor toybox. |
(0006958) steffen (reporter) 2024-11-16 19:42 |
(I had posted the solution of #0006956 to the oss-security thread.) |
(0006959) philip-guenther (reporter) 2024-11-17 03:03 |
Two comments: first, if a trailing NUL is required, it's a terminator, not a delimiter, so I would suggest using 't' instead of 'd' when considering options. Second, and more seriously, does this group think this sequence of events suggests any adjustments to the processes *inside* the group that would improve the outcomes? "Hey, let's do this thing [NUL delimited filenames] that solves a problem" "Sure" "btw, there's a bug in this idea" <standardization continues> "Hey, this thing [which has been standardized] has a problem!" <works starts on new thing, despite still requiring everyone to do the thing with the bug> This seems...not optimal? It feels like there should have been a frank discussion about the issue when it was reported and a decision made *then* whether the standard would say "nope, ya'll are wrong to think this is a problem and here's why" and it would be handled now by pointing at what was discussed/agreed then OR it would have been agreed "yeah, this is a problem that will need to be fixed" and the addition of -print0 / xargs -0 / etc would have been undone/withdrawn until implementations had settled on how to solve this problem and a revised proposal was made. Personally, I think NUL as terminator instead of delimiter makes sense. It feels like a missed opportunity for broader review to catch a subtle glitch in an interface. |
(0006960) stephane (reporter) 2024-11-17 08:17 edited on: 2024-11-17 08:21 |
Re: Note: 0006959 AFAIK, in POSIX terminology, "delimiter" is closer to "terminator" than "separator". IFS is the Internal Field Delimiter which with IFS=: splits "a::b:" into "a", "" and "b" (and not an extra empty element), newline is the line delimiter, a -d was added to "read" (not an invention, that comes from ksh93 and supported by several other shells; and read returns failure if the record is not delimited) in the same resolution than introduced xargs -r -0 (again options already supported for decades by several implementations). Note that I'm as much a POSIX developer as you are and not part of the "group" in charge of the specification. I'm just giving my opinion as a long time shell user and contributor to news://comp.unix.shell, [^] now https://unix.stackexchange.com. [^] I was mentioning xargs -d as it is already implemented in GNU xargs, and -D as a potential avenue implementations (not POSIX at this point) could fix that issue without breaking backward compatibility if they deemed it was an issue worth fixing. IMO, it's a bad idea to change the behaviour of xargs -0 now as it's been working the way it does now (accept and process a trailing non-delimited record) for over 30 years and been relied upon as already mentioned. It's also futile to think that would fix the issue. To fully fix that "issue", we'd need all tools that process NL or NUL delimited records to discard/reject non-delimited records. For instance, it's common to use tr '\0\n' '\n\0' to convert NUL-delimited records into newline delimited ones so that tools that work on lines (and can cope with NULs in those lines) can be used to process it. In: find /src -name '*.d' -type d -print0 | tr '\0\n' '\n\0' | rev | cut -d/ -f1 | rev | sort -u | tr '\0\n' '\n\0' | (cd /dst && xargs -r0 mkdir -p --) No point in xargs ignoring/discarding a non-delimited record if none of rev, cut, sort discard/reject non-delimited lines. And the whole point of xargs as opposed to find -exec ... {} + is that you can insert those processing tools in the middle. |
(0006961) geoffclare (manager) 2024-11-18 09:34 |
> it's not totally clear what should happen if it's not used as last argument or what the "error" entails. The "should ... but may ..." wording allows two behaviours with one recommended over the other. So if it's not used as last argument, the error is required. What happens when errors are detected is described in CONSEQUENCES OF ERRORS. Since there is no explicit mention of this particular error in that section on the xargs page, the default description in 1.4 Utility Description Defaults applies. This allows all of the behaviours you describe, making this a quality-of-implementation issue. If/when we make the error mandatory, that would be the time to consider tightening the allowed behaviour for this error, based on implementation behaviour(s) at the time. |
Issue History | |||
Date Modified | Username | Field | Change |
2024-11-07 21:34 | steffen | New Issue | |
2024-11-07 21:34 | steffen | Name | => steffen |
2024-11-07 21:34 | steffen | Section | => find |
2024-11-07 21:34 | steffen | Page Number | => 2946 |
2024-11-07 21:34 | steffen | Line Number | => 98444 ff. |
2024-11-07 21:38 | steffen | Note Added: 0006951 | |
2024-11-08 01:40 | steffen | Note Added: 0006952 | |
2024-11-08 01:42 | steffen | Note Edited: 0006952 | |
2024-11-08 08:31 | stephane | Note Added: 0006953 | |
2024-11-12 09:48 | geoffclare | Note Added: 0006954 | |
2024-11-13 18:19 | steffen | Note Edited: 0006952 | |
2024-11-14 17:03 | geoffclare | Note Added: 0006956 | |
2024-11-14 17:04 | geoffclare | Interp Status | => --- |
2024-11-14 17:04 | geoffclare | Final Accepted Text | => Note: 0006956 |
2024-11-14 17:04 | geoffclare | Status | New => Resolved |
2024-11-14 17:04 | geoffclare | Resolution | Open => Accepted As Marked |
2024-11-14 17:05 | geoffclare | Tag Attached: tc1-2024 | |
2024-11-14 17:08 | geoffclare | Note Edited: 0006956 | |
2024-11-16 08:54 | stephane | Note Added: 0006957 | |
2024-11-16 19:42 | steffen | Note Added: 0006958 | |
2024-11-17 03:03 | philip-guenther | Note Added: 0006959 | |
2024-11-17 08:17 | stephane | Note Added: 0006960 | |
2024-11-17 08:21 | stephane | Note Edited: 0006960 | |
2024-11-18 09:34 | geoffclare | Note Added: 0006961 | |
2024-11-26 02:52 | calestyo | Issue Monitored: calestyo |
Mantis 1.1.6[^] Copyright © 2000 - 2008 Mantis Group |