Austin Group Defect Tracker

Aardvark Mark IV

ID Category Severity Type Date Submitted Last Update
0001784 [Issue 8 drafts] Shell and Utilities Objection Error 2023-10-22 06:14 2023-11-13 20:16
Reporter kre View Status public  
Assigned To
Priority normal Resolution Open  
Status New   Product Version Draft 3
Name Robert Elz
User Reference
Section XCU 3 / getopts
Page Number 2955 - 2959
Line Number 98803 - 98966
Final Accepted Text
Summary 0001784: getopts specification needs fixing (multiple issues)
Description First:

Line 98807
        and the index of the next argument to be processed in the
        shell variable OPTIND.

Much the same is in the ENVIRONMENT VARIABLES section, lines 98888-9
        OPTIND This variable shall be used by the getopts utility as
                the index of the next argument to be processed.

Which is the "next argument to be processed" - the argument after the
one that supplied the option written into the name arg, or the
argument that will be processed by the next call to getopts ?

It makes a difference when the argument in question has two (or more)
options in it, and anything but the last of them is being processed now.

Eg: (given an optstring with "xy" in it (no colons))
        script -xy -d
if getopts is used in script to process those options, then
when name is set to 'x', this same arg will be processed again
next time to return 'y', but the "next argument" is the one
containing -d in many people's interpretation (and different shells
interpret it each way: in some OPTIND is 1 for 'x' and 2 for 'y',
in others it is 2 for both 'x' and 'y'). yash is different, its
(intermediate) OPTIND settings contain the index of the arg being
processed, a colon, and the index of the option char within that arg
(so would be 1:2 and 1:3 in this case).

The standard is unclear what is intended here, it would be better to
simply say that the value of OPTIND at this point is unspecified, as
in practice there isn't anything much a script can do with it anyway,
even if we did pick one of the plausible interpretations. Pretending
that a simple integer is useful to the implementation (which the
definition at line 98888 does) is not helpful to anyone - to keep
track of what it is up to, the implementation either needs to use
some other mechanism (ie: not use OPTIND for anything except when
the application does OPTIND=1) or it needs (as yash does) to encode
more than just an integer into OPTIND.
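
The divergence is easy to observe. The following is only a sketch (the
function name trace_optind is made up for illustration): it prints OPTIND
after every getopts call for the "-xy -d" case above. The intermediate
values are expected to differ from shell to shell; only the value printed
once getopts has returned non-zero should be relied upon.

```shell
# Hypothetical helper: show the option found and the value of OPTIND
# after each getopts call. Intermediate OPTIND values vary between
# shells; the final one (after getopts returns non-zero) is an integer.
trace_optind() {
    OPTIND=1
    while getopts xyd opt "$@"; do
        printf 'opt=%s OPTIND=%s\n' "$opt" "$OPTIND"
    done
    printf 'done OPTIND=%s\n' "$OPTIND"
}

trace_optind -xy -d
```

Running the same file under several shells and comparing the "opt=" lines
is a quick way to see which interpretation each one has chosen.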

Beyond that, is the term "index of" defined anywhere? (It isn't in XBD 3)
If it is, there should be an xref, otherwise there should be a definition
given here. What is its format? For the usage when getopts returns
an exit status of 1, it is clearly intended to contain an integer, as
the EXAMPLES section shows at line 98951

        shift $(($OPTIND - 1))

which wouldn't work if OPTIND were not an integer. But is that
also actually required of the OPTIND returned upon other invocations?

If the intent here was to rely upon the standard English use of
the term, then that fails, as there really isn't one. To be useful,
an index has to be relative to some base: is the first option
index 0 or index 1 (or something else)?
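
For reference, the one use of OPTIND that clearly does need an integer
(and a base of 1) is the shift after the loop has finished. A minimal
sketch, where the function name parse and the single option -v are
assumptions made up for this note:

```shell
# Hypothetical example: OPTIND is only read after getopts has reported
# that the options are finished, at which point it must be an integer
# counting arguments from 1 for the shift to work.
parse() {
    OPTIND=1
    verbose=0
    bad=0
    while getopts v opt "$@"; do
        case $opt in
            v)  verbose=1 ;;
            \?) bad=1 ;;      # unknown option; getopts has already complained
        esac
    done
    shift $((OPTIND - 1))     # drop the options, leaving only the operands
    printf 'verbose=%s operands=%s\n' "$verbose" "$*"
}

parse -v file1 file2
```

Here the loop is left to terminate on its own, so the OPTIND used in the
shift is the one set by the final (non-zero) getopts return.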

On line 98836 it is stated:
        The shell variables OPTIND and OPTARG shall be local to the
        caller of getopts

WTF? What is that supposed to mean, that is, what does it mean
to be local to something, and what exactly is the "caller of getopts" ??

This is particularly absurd, as in the immediately following paragraph
(lines 98840-1) it says:

        The shell variable specified by the name operand, OPTIND, and OPTARG
        shall affect the current shell execution environment;

which makes sense, and is what implementations actually do. If that
shell environment is "the caller" then what does it mean to be "local",
that it isn't allowed to be exported? That it doesn't survive the
termination of that shell environment? If this last one, then why does
it need stating, what variables do survive the termination of the shell
environment? Or was something else fanciful intended there ?

Next, at lines 98862-3

   the value in OPTARG shall be stripped of the option character and the '-'.

So, if we have an optstring of "abc:d" and the invocation of
getopts is

        getopts abc:d var -abcfoo -d

then when 'var' is set to 'c' OPTARG is supposed to be "abfoo" ? (that
is we remove the 'c' and the '-' as instructed).

No, that can't be right, the option-argument is (at least implied by)
XBD 12.1 (which isn't referenced anywhere in XCU 3/getopts - directly
or indirectly, only XBD 12.2) the string which follows the option when
it is included in the same argument as the option, so the 'ab' should not
be included, just "foo" - but the '-' does not follow the option there
either, so why is the standard saying that the '-' must be removed?

Why not just say that OPTARG is the option-argument (properly
defined by an xref) and leave it at that?
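
What shells are expected to do for this case can be checked with a few
lines. This is a sketch only (show_optarg is a made-up name); it prints
each option found and, for 'c', the value getopts placed in OPTARG:

```shell
# Hypothetical helper: list the options found, and show OPTARG for the
# one option ('c') that takes an option-argument.
show_optarg() {
    OPTIND=1
    while getopts abc:d opt "$@"; do
        case $opt in
            c) printf 'c OPTARG=%s\n' "$OPTARG" ;;
            *) printf '%s\n' "$opt" ;;
        esac
    done
}

show_optarg -abcfoo -d
```

The 'c' line shows just the text following the option character, with
nothing from the preceding 'ab' and no '-'.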

Incidentally, XBD 3.244 is not very helpful here, all it says is an
Option-Argument is:
    A parameter that follows certain options. In some cases an option-argument
    is included within the same argument string as the option--in most cases
    it is the next argument.

The "follows" is suggestive, but "included within the same argument string"
leaves more possibilities open. And why does that say "certain options" ?
If it means options that require one, those aren't "certain". Just
"some options" would be better there.

In the RATIONALE, at lines: 98964-6 :

    Although a leading <plus-sign> in optstring is required to have no
    effect on the behavior of getopt(), this standard intentionally allows
    implementations of the getopts utility to use a leading
    <plus-sign> as an extension that alters behavior.

First, I am not sure just where it intentionally does that, the RATIONALE
isn't a normative part of the standard, so that paragraph can't be it,
did I miss something? But ignoring that...

Implementations are to be allowed to support a leading '+' in optstring.
But how does that affect (at line 98821, and I think other places, like
line 98895, there might be more):

        If the first character of optstring is a <colon> ...

In XSH/getopt it is clear that the optional '+' precedes the optional ':'
in optstring, but if that is followed here, how can that ':' be the
first character of optstring? Must the application use only one or
the other, or is getopts doing the reverse of getopt() and requiring the
order be ":+..." (and if so, where does it say so) or should the wording here
be fixed so it works like the getopt() function ?
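
For comparison, this is the behaviour that the leading <colon> selects
when it really is the first character of optstring. The sketch below uses
a made-up function name (classify) and deliberately avoids a leading '+',
since that is an extension whose interaction with ':' is exactly what is
in question:

```shell
# Hypothetical example of a leading ':' in optstring: getopts writes no
# diagnostics itself; it sets the name variable to '?' for an unknown
# option and to ':' for a missing option-argument, with the offending
# option character in OPTARG.
classify() {
    OPTIND=1
    while getopts :ab: opt "$@"; do
        case $opt in
            a|b) printf 'ok %s\n' "$opt" ;;
            :)   printf 'missing argument for %s\n' "$OPTARG" ;;
            \?)  printf 'unknown option %s\n' "$OPTARG" ;;
        esac
    done
}

classify -a -z -b
```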

And while we're here, the first mention of options (line 98803)
should contain an xref to XBD 3.243, the first mention of option-arguments
(also on line 98803) should have an xref to XBD 3.244 and the first mention of
operand (I think on line 98831) should have an xref to XBD 3.241.
These xrefs then each refer to XBD 12.1 which shows better than the
definitions how those things are formed (particularly in bullet point 1) - but
referencing the definitions is better I think (XBD 12.1 does not refer back
to XBD 3).

Desired Action Fix it all...

Maybe some wording, for some of it, may follow sometime later, in a note.
Tags No tags attached.
Attached Files

- Relationships

-  Notes
kre (reporter)
2023-10-28 05:19

I have just realised there is yet another problem with the spec of getopts
beyond those above...

On page 2955 (lines 98843...) - right at the bottom of that page (which is
the first page of the getopts spec) it says:

   Any other attempt to invoke getopts multiple times in a single shell
   execution environment with parameters (positional parameters or arg
   operands) that are not the same in all invocations, or with an OPTIND
   value modified to be a value other than 1, produces unspecified results.

The problem is that final "or with an OPTIND value modified..." as the
spec actually requires that getopts modify OPTIND each time it is invoked,
and some of those modifications will be to values other than 1 (and the
application cannot know, in advance, when that will happen). In effect
that sentence (the "produces unspecified results") means that every
invocation of getopts, other than the first after OPTIND has been
initialised to 1, is potentially unspecified.

I suspect what this sentence meant to say was "or with an OPTIND value
modified by the application to be a value other than 1," - but that isn't
what it currently says.
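
The usage that is presumably meant to remain well defined looks something
like the following sketch (count_opts is a made-up name): the application
itself only ever assigns 1 to OPTIND, before parsing a new argument list,
and all other changes to OPTIND are made by getopts.

```shell
# Hypothetical example: two separate parses of different argument lists
# in the same shell execution environment. The only assignment the
# application makes to OPTIND is the reset to 1 before each parse.
count_opts() {
    OPTIND=1
    n=0
    while getopts ab opt "$@"; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

count_opts -a -b      # first argument list
count_opts -ab -a     # a different list, parsed after the reset
```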
kre (reporter)
2023-10-28 05:36
edited on: 2023-10-28 06:34

This note deleted ... it just wondered about some relationships with
other issues that were inadvertently applied (as noted in Note: 0006558)

Since that has been fixed, there is no need for a note asking about it.
Nor was there any need for any apologies - mistakes happen, it just
surprised me at first - when I added Note: 0006557 (to the correct 0001785)
I had worked out what probably happened.

Thanks for fixing it so quickly.

Don Cragun (manager)
2023-10-28 06:25

Re Note: 0006556:
I apologize; I should know better than to try to update bug reports this late at night. I intended to note the relationships between 0001785 (instead of this bug) and 0001535, 0001393, and 0000351. I will correct the relationships now.
shware_systems (reporter)
2023-11-13 18:09

I think originally the getopts utility interface assumed a user would specify voluntarily all options be preceded by a <dash>, or <plus>, as separate arguments, e.g. "-a -b" and not "-ab", and having multiple options was more a syntax line documentation convenience only. There may have been thoughts too on making it the shell's responsibility to split apart multiple options to this format before processing lines of a script so getopt wouldn't need to be bothered, but it doesn't look like any shells ever implemented this.

Then OPTIND as documented would specify which argument that had a leading option <dash> was next to be referenced unambiguously. Without such munging it is probably better to make OPTIND an opaque variable of unspecified format, not numeric, that only getopt may reliably reference.
kre (reporter)
2023-11-13 20:16

Re: Note: 0006568

The first paragraph cannot possibly be correct, unix programs have been
using multiple flag options after a single '-' since about when (perhaps
exactly when, 'twas before my time) they were invented. "ls -al" is a
simple example that has been with us forever. There is no way that anyone,
anywhere, ever, would have even considered requiring that to be "ls -a -l".
Further, it is getopts' role to parse the option args (and was getopt's
before that, as much as it was able). Expecting the shell to parse them
(which it would need to do to distinguish between a: and al as the optstring,
which varies how ls -al would need to be treated) and then invoke getopts
to parse them again would be absurd.

The second paragraph (2nd sentence in particular, we can't do the first, as
there is no existing standard to document) I almost agree with - except that
we write "unspecified value" not "opaque" (the meaning is almost the same),
and that we must require OPTIND to contain a string representing an integer
after getopts has returned "no more" (ie: exit status 1), as we must be able
to do "shift $(( OPTIND - 1 ))"

In general, the only time a script should reference OPTIND is after getopts
has indicated the options are done (and with that in mind, it might be worth
adding a note in the application usage section advising against a "break" out
of a while getopts ... ; do ; done loop, the loop should be allowed to
terminate naturally) and it can be set to 1 (OPTIND=1) before the getopts
loop starts to reinit things.

- Issue History
Date Modified Username Field Change
2023-10-22 06:14 kre New Issue
2023-10-22 06:14 kre Name => Robert Elz
2023-10-22 06:14 kre Section => XCU 3 / getopts
2023-10-22 06:14 kre Page Number => 2955 - 2959
2023-10-22 06:14 kre Line Number => 98803 - 98966
2023-10-22 06:40 kre Tag Attached: issue8
2023-10-28 05:08 Don Cragun Relationship added related to 0001535
2023-10-28 05:10 Don Cragun Relationship added related to 0001393
2023-10-28 05:10 Don Cragun Relationship added parent of 0000351
2023-10-28 05:19 kre Note Added: 0006555
2023-10-28 05:36 kre Note Added: 0006556
2023-10-28 06:25 Don Cragun Note Added: 0006558
2023-10-28 06:27 Don Cragun Relationship deleted related to 0001535
2023-10-28 06:28 Don Cragun Relationship deleted related to 0001393
2023-10-28 06:34 Don Cragun Relationship deleted parent of 0000351
2023-10-28 06:34 kre Note Edited: 0006556
2023-11-13 18:09 shware_systems Note Added: 0006568
2023-11-13 20:16 kre Note Added: 0006569
2023-11-14 09:44 geoffclare Tag Detached: issue8
2023-11-15 22:23 salewski Issue Monitored: salewski
