Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0001801
Category [Issue 8 drafts] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2024-01-25 21:39
Last Update 2024-02-21 16:49
Reporter mohd_akram
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Resolved
Product Version Draft 4
Name Mohamed Akram
Organization
User Reference
Section xargs
Page Number 3600-3601
Line Number 123162, 123252
Final Accepted Text See Note: 0006657.
Summary 0001801: xargs: add -P option
Description The `-P maxprocs` option is widely supported in xargs implementations to allow running commands in parallel. It is available in GNU, FreeBSD, NetBSD, OpenBSD, macOS, and possibly other xargs implementations.
Desired Action Change line 123162 from:

[-s size] [utility [argument...]]

to:

[-s size] [-P maxprocs] [utility [argument...]]

Add at line 123252:

-P maxprocs Parallel mode: run at most maxprocs invocations of utility at once. If the value of maxprocs is non-positive, the behavior is unspecified.
Tags issue9
Attached Files

- Relationships
related to 0001811 (Resolved) xargs: add -P option to FUTURE DIRECTIONS section

-  Notes
(0006657)
Don Cragun (manager)
2024-02-15 16:52
edited on: 2024-02-15 16:59

Change line 123162 from:
[-s size] [utility [argument...]]
to:
[-P maxprocs] [-s size] [utility [argument...]]


Add at line 123252:
-P maxprocs Parallel mode: execute at most maxprocs invocations of utility concurrently. If the value of maxprocs is non-positive, the behavior is unspecified.


Remove the FUTURE DIRECTIONS entry added by 0001811.

(0006660)
kre (reporter)
2024-02-16 11:31

Please, I thought the bad old days of just copying man page descriptions
were beyond us - things really need to be better than this.

There's not nearly enough description of what is supposed to be happening
with this option to expect an implementation to do what is expected (whatever
that is).

For example, how are the args read from stdin supposed to be apportioned
amongst the parallel invocations of the utility? One might expect that
perhaps the idea is to read enough args until the command line limit is
reached, and then start reading more for the next parallel invocation.
But that means that if there happen to be not all that many args, then
perhaps no parallelism would be achieved, when if less args had been
allocated to each invocation, more parallel instances could have been
invoked.

Or, we could allocate the args to intended utility invocations round-robin,
first arg read to the first invocation, 2nd to the second, ... until there
have been maxprocs args read (assuming there are that many on stdin)
after which we add a second arg to the first invocation, etc. This avoids
the issue mentioned for the previous style, but does mean that xargs doesn't
start invoking anything for longer - if utility is fast enough, it might be
able to process its args faster than xargs is able to read from stdin and
prepare the args for the subsequent process - resulting in an overall slowdown
from running in parallel, rather than a speedup.

Beyond that, "execute ... concurrently" might be read as meaning that all the
(up to maxprocs) invocations should be made to run at the same time. To
achieve that, the implementation would need to start them all at the same time,
otherwise the first started might finish before the last is ready to commence,
so they wouldn't be running concurrently. I doubt that's what is intended, but
perhaps.

And even more, how are these parallel invocations expected to interact with
the CONSEQUENCES OF ERRORS section of the spec (page 3603 in D4 of I8)?
That is, if an invocation of utility does exit 255, or is killed by a
signal, how is xargs supposed to terminate without processing any further
input, when it has already processed more, and started more invocations of
the utility? And what is to be done with those other invocations still
running when one exits in one of those ways - are they to be killed,
orphaned, or is xargs to wait for them to finish before terminating?
In that latter case, should more diagnostic messages be written if more
of the invocations also exit 255, or via a signal?

There may be more issues I haven't yet realised.

This needs to go back to the drawing board and start all over again, with
much more comprehensive and standards-worthy wording added.

But there's no hurry: as this is to be an Issue 9 change, there is perhaps
a decade or two before an actual resolution is needed.
(0006670)
gabravier (reporter)
2024-02-21 00:20

> For example, how are the args read from stdin supposed to be apportioned amongst the parallel invocations of the utility? One might expect that perhaps the idea is to read enough args until the command line limit is reached, and then start reading more for the next parallel invocation. But that means that if there happen to be not all that many args, then perhaps no parallelism would be achieved, when if less args had been allocated to each invocation, more parallel instances could have been invoked.

All of GNU findutils, FreeBSD, OpenBSD, illumos, BusyBox and Toybox fill each invocation with arguments until the command line limit is reached before starting the next one. For example, `seq 1 100 | xargs -P10 -n70` runs only 2 processes (the first with 70 arguments and the second with 30) and thus produces two lines of output, rather than the 10 lines that would be expected if the arguments were split between 10 processes.
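
To make the two apportioning strategies discussed in this thread concrete, here is a minimal Python sketch. It is only a model and not code from any xargs implementation: the names `greedy_batches` and `round_robin_batches` are hypothetical, `max_args` stands in for the -n limit only (real implementations also apply a size limit), and the round-robin variant assumes the whole input fits in a single round of maxprocs invocations.

```python
from itertools import islice


def greedy_batches(args, max_args):
    """Fill each invocation up to max_args before starting the next one."""
    it = iter(args)
    batches = []
    while True:
        batch = list(islice(it, max_args))
        if not batch:
            return batches
        batches.append(batch)


def round_robin_batches(args, maxprocs):
    """Deal arguments one at a time across up to maxprocs invocations.

    Assumption: all input is read first and fits in one round.
    """
    args = list(args)
    batches = [args[i::maxprocs] for i in range(maxprocs)]
    return [b for b in batches if b]  # drop invocations that got no arguments


# Same input as `seq 1 100`, with the limits from the example above.
args = [str(i) for i in range(1, 101)]
print([len(b) for b in greedy_batches(args, 70)])       # [70, 30]
print([len(b) for b in round_robin_batches(args, 10)])  # ten batches of 10
```

In this model the greedy strategy yields two invocations for 100 arguments with a limit of 70, while the round-robin strategy yields ten, but only after reading all of the input first.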

> Beyond that, "execute ... concurrently" might be read as meaning that all the (up to maxprocs) invvocations should be made to run at the same time. To achieve that, the implementation would need to start them all at the same time, otherwise the first started might finish before the last is ready to commence, so they wouldn't be running concurrently. I doubt that's what is intended, but perhaps.

I don't exactly get what you mean by "run at the same time". Are you implying this would require an xargs implementation to use some kind of implementation-specific system call that would execve an array of processes all at the same time? (I am not aware of any system call that would accomplish this.) I don't really understand what this objection is about, exactly, or how any such thing would be observable to any program. None of the implementations I am looking at do anything particularly unique in this regard, from what I can see.

> And even more, how are these parallel invocations expected to interact with the CONSEQUENCES OF ERRORS section of the spec (page 3603 in D4 of I8). That is, if an invocation of utility does exit 255, or is killed by a signal, how is xargs supposed to terminate without processing any further input, when it has already processed more, and started more invocations of the utility?

For xargs to "undo processing" of further input if it has already processed it would require xargs to implement time-travel. I have not yet been able to find an implementation with this capability, so I would recommend against trying to impose such a requirement, although of course xargs must stop processing any input it has not processed yet.

> And what is to be done with those other invocations still running when one exits in one of those ways - are they to be killed, orphaned, or is xargs to wait for them to finish before terminating? In that latter case, should more diagnostic messages be written if more of the invocations also exit 255, or via a signal?

Pretty much every implementation differs in this regard:
- GNU findutils writes a diagnostic and waits for other invocations to finish before terminating. If another invocation also exits with 255, it writes another diagnostic and then proceeds to immediately invoke undefined behavior by calling exit within an atexit handler. On my machine that results in it exiting after printing that second diagnostic (leaving the remaining invocations orphaned)
- FreeBSD writes a diagnostic and waits for other invocations to finish before terminating, and prints more diagnostics if other invocations also exit with 255
- OpenBSD and illumos write a diagnostic and immediately exit, leaving the other invocations orphaned
- BusyBox has the same behavior as GNU findutils has on my machine (prints diagnostic, waits, prints another diagnostic if another invocation also exits with 255 and exits then) but manages to do so without invoking undefined behavior, at least
- Toybox writes a diagnostic and then proceeds to continue execution as though everything is fine (it does this without -P too so that seems clearly just non-conforming...)

I do agree the wording is not enough as-is, though. Personally I would recommend rewriting parts of the description to describe xargs as waiting until fewer than N invocations are currently running (where N is the value of -P, with a default of 1) before it starts another process. I have no idea what exactly should be specified for cases where xargs would normally exit without processing any further input, given that practically every implementation behaves differently, except that I would exclude the behaviors of the GNU, BusyBox and Toybox implementations, which seem clearly broken to me.
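
The suggested description (start another invocation only once fewer than N are running) can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not how any particular xargs is written: `run_with_limit` is a hypothetical name, each command is a pre-built argv list, a non-positive maxprocs is rejected because the proposed text leaves it unspecified, and after an exit status of 255 or a death by signal the sketch stops starting new invocations but waits for those already running, which is the FreeBSD behavior described in the list above. Other choices are equally possible.

```python
import os


def run_with_limit(commands, maxprocs=1):
    """Run each argv in commands with at most maxprocs children alive.

    Returns {index: exit code}; a negative code means killed by a signal.
    """
    if maxprocs < 1:
        # Assumption: reject values the proposed wording leaves unspecified.
        raise ValueError("maxprocs must be positive")

    running = {}   # pid -> index into commands
    results = {}   # index -> exit code
    stop = False   # set once an invocation exits 255 or dies by a signal

    def reap_one():
        nonlocal stop
        pid, status = os.wait()            # block until any child terminates
        index = running.pop(pid, None)
        if index is None:
            return                         # not one of ours; ignore it
        code = os.waitstatus_to_exitcode(status)
        results[index] = code
        if code == 255 or code < 0:
            stop = True

    for index, argv in enumerate(commands):
        while len(running) >= maxprocs:    # wait until fewer than N are running
            reap_one()
        if stop:
            break                          # process no further input
        running[os.posix_spawnp(argv[0], argv, os.environ)] = index

    while running:                         # assumed policy: wait for the rest
        reap_one()
    return results


if __name__ == "__main__":
    demo = [["sh", "-c", "exit 0"], ["sh", "-c", "exit 255"], ["sh", "-c", "exit 0"]]
    print(run_with_limit(demo, maxprocs=1))  # {0: 0, 1: 255}
```

With maxprocs=1 this degenerates to the sequential behavior, so the third command in the demo is never started. With a larger maxprocs, invocations started before the failure is noticed still run to completion under the assumed policy.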
(0006672)
gabravier (reporter)
2024-02-21 16:49

I'll add that with regard to the behavior when a process exits with exit status 255 or is killed by a signal, Toybox has fixed its non-conforming behavior and now behaves like FreeBSD.

- Issue History
Date Modified Username Field Change
2024-01-25 21:39 mohd_akram New Issue
2024-01-25 21:39 mohd_akram Name => Mohamed Akram
2024-01-25 21:39 mohd_akram Section => xargs
2024-01-25 21:39 mohd_akram Page Number => 3600-3601
2024-01-25 21:39 mohd_akram Line Number => 123162, 123252
2024-02-15 16:47 geoffclare Relationship added related to 0001811
2024-02-15 16:52 Don Cragun Note Added: 0006657
2024-02-15 16:53 Don Cragun Status New => Resolved
2024-02-15 16:53 Don Cragun Resolution Open => Accepted As Marked
2024-02-15 16:55 Don Cragun Note Edited: 0006657
2024-02-15 16:55 Don Cragun Tag Attached: issue9
2024-02-15 16:59 Don Cragun Note Edited: 0006657
2024-02-15 17:01 Don Cragun Final Accepted Text => See Note: 0006657.
2024-02-16 11:31 kre Note Added: 0006660
2024-02-21 00:20 gabravier Note Added: 0006670
2024-02-21 16:49 gabravier Note Added: 0006672

