ID | Category | Severity | Type | Date Submitted | Last Update | |||||||
0001784 | [Issue 8 drafts] Shell and Utilities | Objection | Error | 2023-10-22 06:14 | 2023-11-13 20:16 | |||||||
Reporter | kre | View Status | public | |||||||||
Assigned To | ||||||||||||
Priority | normal | Resolution | Open | |||||||||
Status | New | Product Version | Draft 3 | |||||||||
Name | Robert Elz | |||||||||||
Organization | ||||||||||||
User Reference | ||||||||||||
Section | XCU 3 / getopts | |||||||||||
Page Number | 2955 - 2959 | |||||||||||
Line Number | 98803 - 98966 | |||||||||||
Final Accepted Text | ||||||||||||
Summary | 0001784: getopts specification needs fixing (multiple issues) | |||||||||||
Description |
First: Line 98807

    and the index of the next argument to be processed in the shell variable OPTIND.

Much the same is in the ENVIRONMENT VARIABLES section, where lines 98888-9 say:

    OPTIND  This variable shall be used by the getopts utility as the index of the next argument to be processed.

Which is the "next argument to be processed": the argument after the one that supplied the option written into the name arg, or the argument that will be processed by the next call to getopts? It makes a difference when the argument in question has two (or more) options in it, and anything but the last of them is being processed now. E.g., given an optstring with "xy" in it (no colons):

    script -xy -d

If getopts is used in script to process those options, then when name is set to 'x', this same arg will be processed again next time to return 'y', but the "next argument" is the one containing -d in many people's interpretation. Different shells interpret it each way: in some OPTIND is 1 for 'x' and 2 for 'y', in others it is 2 for both 'x' and 'y'. yash is different: its (intermediate) OPTIND settings contain the index of the arg being processed, a colon, and the index of the option char within that arg (so would be 1:2 and 1:3 in this case).

The standard is unclear about what is intended here. It would be better to simply say that the value of OPTIND at this point is unspecified, as in practice there isn't anything much a script can do with it anyway, even if we did pick one of the plausible interpretations. Pretending that a simple integer is useful to the implementation (which the definition at line 98888 does) is not helpful to anyone: to keep track of what it is up to, the implementation either needs to use some other mechanism (i.e. not use OPTIND for anything except when the application does OPTIND=1) or it needs (as yash does) to encode more than just an integer into OPTIND.

Beyond that, is the term "index of" defined anywhere?
(It isn't in XBD 3.) If it is, there should be an xref; otherwise there should be a definition given here. What is its format? For the usage when getopts returns an exit status of 1, it is clearly intended to contain an integer, as the EXAMPLES section shows at line 98951:

    shift $(($OPTIND - 1))

which wouldn't work if OPTIND were not an integer. But is that also actually required of the OPTIND returned upon other invocations? If the intent here was to rely upon the standard English use of the term, then that fails, as there really isn't one of those: to be useful, an index has to be relative to some base. Is the first option index 0 or index 1 (or something else)?

On line 98836 it is stated:

    The shell variables OPTIND and OPTARG shall be local to the caller of getopts

WTF? What is that supposed to mean? That is, what does it mean to be local to something, and what exactly is the "caller of getopts"? Really! This is particularly absurd, as in the immediately following paragraph (lines 98840-1) it says:

    The shell variable specified by the name operand, OPTIND, and OPTARG shall affect the current shell execution environment;

which makes sense, and is what implementations actually do. If that shell environment is "the caller" then what does it mean to be "local"? That it isn't allowed to be exported? That it doesn't survive the termination of that shell environment? If this last one, then why does it need stating: what variables do survive the termination of the shell environment? Or was something else fanciful intended there?

Next, at lines 98862-3:

    the value in OPTARG shall be stripped of the option character and the '-'.

So, if we have an optstring of "abc:d" and the invocation of getopts is

    getopts abc:d var -abcfoo -d

then when 'var' is set to 'c', OPTARG is supposed to be "abfoo"? (That is, we remove the 'c' and the '-' as instructed.)
No, that can't be right. The option-argument is (at least implied by) XBD 12.1 (which isn't referenced anywhere in XCU 3/getopts, directly or indirectly; only XBD 12.2 is) the string which follows the option when it is included in the same argument as the option, so the 'ab' should not be included, just "foo". But the '-' does not follow the option there either, so why is the standard saying that the '-' must be removed? Why not just say that OPTARG is the option-argument (properly defined by an xref) and leave it at that?

Incidentally, XBD 3.244 is not very helpful here; all it says is that an Option-Argument is:

    A parameter that follows certain options. In some cases an option-argument is included within the same argument string as the option--in most cases it is the next argument.

The "follows" is suggestive, but "included within the same argument string" leaves more possibilities open. And why does that say "certain options"? If it means options that require one, those aren't "certain". Just "some options" would be better there.

In the RATIONALE, at lines 98964-6:

    Although a leading <plus-sign> in optstring is required to have no effect on the behavior of getopt(), this standard intentionally allows implementations of the getopts utility to use a leading <plus-sign> as an extension that alters behavior.

First, I am not sure just where it intentionally does that; the RATIONALE isn't a normative part of the standard, so that paragraph can't be it. Did I miss something? But ignoring that... Implementations are to be allowed to support a leading '+' in optstring. But how does that affect (at line 98821, and I think other places, like line 98895; there might be more):

    If the first character of optstring is a <colon> ...

In XSH/getopt it is clear that the optional '+' precedes the optional ':' in optstring, but if that is followed here, how can that ':' be the first character of optstring?
Must the application use only one or the other, or is getopts doing the reverse of getopt() and requiring the order be ":+..." (and if so, where does it say so), or should the wording here be fixed so it works like the getopt() function?

And while we're here: the first mention of options (line 98803) should contain an xref to XBD 3.243, the first mention of option-arguments (also on line 98803) should have an xref to XBD 3.244, and the first mention of operand (I think on line 98831) should have an xref to XBD 3.241. These xrefs then each refer to XBD 12.1, which shows better than the definitions how those things are formed (particularly in bullet point 1), but referencing the definitions is better I think (XBD 12.1 does not refer back to XBD 3). |
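The two examples in the description (-xy -d with no colons, and -abcfoo -d with optstring abc:d) can be observed directly with a short script. This is only a sketch: trace_opts is a hypothetical helper name, and the intermediate OPTIND values it prints are exactly the ones the report argues are unclear, so they will differ between shells. Only the OPTARG for 'c' and the final OPTIND are expected to agree everywhere.

```shell
#!/bin/sh
# trace_opts is a hypothetical helper, not part of any standard.
# Usage: trace_opts OPTSTRING [ARG...]
trace_opts() {
    optstring=$1
    shift
    OPTIND=1    # reset the parser before each scan
    while getopts "$optstring" opt "$@"; do
        # OPTIND here is the intermediate value whose meaning is disputed;
        # its value (and even its format) varies between shells.
        printf 'opt=%s OPTIND=%s OPTARG=%s\n' "$opt" "$OPTIND" "${OPTARG-}"
    done
    # Once getopts has returned non-zero, OPTIND indexes the first operand.
    printf 'done OPTIND=%s\n' "$OPTIND"
}

# Two options packed into one argument, then a separate one.
trace_opts xyd -xy -d

# Option-argument in the same argument as its option: OPTARG for 'c'
# is "foo", not "abfoo".
trace_opts abc:d -abcfoo -d
```

Running the same file under several shells and comparing the opt= lines shows the divergence; the done lines should match.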
|||||||||||
Desired Action |
Fix it all... Maybe some wording, for some of it, may follow sometime later, in a note. |
|||||||||||
Tags | No tags attached. | |||||||||||
Attached Files | ||||||||||||
|
|
(0006555) kre (reporter) 2023-10-28 05:19 |
I have just realised there is yet another problem with the spec of getopts beyond those above...

On page 2955 (lines 98843...), right at the bottom of that page (which is the first page of the getopts spec), it says:

    Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results.

The problem is that final "or with an OPTIND value modified...", as the spec actually requires that getopts modify OPTIND each time it is invoked, and some of those modifications will be to values other than 1 (and the application cannot know, in advance, when that will happen). In effect that sentence (the "produces unspecified results") means that every invocation of getopts, other than the first after OPTIND has been initialised to 1, is potentially unspecified.

I suspect what this sentence meant to say was "or with an OPTIND value modified by the application to be a value other than 1," but that isn't what it currently says. |
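To make the distinction concrete, here is a minimal sketch in which the application only ever assigns 1 to OPTIND, while getopts itself moves OPTIND away from 1 during each scan. count_opts is a hypothetical helper name, and the option letters are made up for illustration.

```shell
#!/bin/sh
# count_opts is a hypothetical helper. It leaves its result in $count.
# Usage: count_opts OPTSTRING [ARG...]
count_opts() {
    optstring=$1
    shift
    count=0
    OPTIND=1    # the only assignment the application makes to OPTIND
    while getopts "$optstring" opt "$@"; do
        # getopts updates OPTIND itself on each call; the script never
        # writes any value other than 1 to it.
        count=$((count + 1))
    done
}

# Two different argument lists parsed in the same shell execution
# environment, separated by the OPTIND=1 reset inside count_opts.
count_opts ab -a -b
first=$count
count_opts ab -ab operand
second=$count
printf 'first=%s second=%s\n' "$first" "$second"
```

Under the literal wording quoted above, the second and later getopts calls in each loop already see an OPTIND "modified to be a value other than 1", which is why the note suggests adding "by the application".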
(0006556) kre (reporter) 2023-10-28 05:36 edited on: 2023-10-28 06:34 |
This note deleted ... it just wondered about some relationships with other issues that were inadvertently applied (as noted in Note: 0006558) Since that has been fixed, there is no need for a note asking about it. Nor was there any need for any apologies - mistakes happen, it just surprised me at first - when I added Note: 0006557 (to the correct 0001785) I had worked out what probably happened. Thanks for fixing it so quickly. |
(0006558) Don Cragun (manager) 2023-10-28 06:25 |
Re Note: 0006556: I apologize; I should know better than to try to update bug reports this late at night. I intended to note the relationships between 0001785 (instead of this bug) and 0001535, 0001393, and 0000351. I will correct the relationships now. |
(0006568) shware_systems (reporter) 2023-11-13 18:09 |
I think originally the getopts utility interface assumed a user would specify voluntarily all options be preceded by a <dash>, or <plus>, as separate arguments, e.g. "-a -b" and not "-ab", and having multiple options was more a syntax line documentation convenience only. There may have been thoughts too on making it the shell's responsibility to split apart multiple options to this format before processing lines of a script so getopt wouldn't need to be bothered, but it doesn't look like any shells ever implemented this.

Then OPTIND as documented would specify which argument that had a leading option <dash> was next to be referenced unambiguously. Without such munging it is probably better to make OPTIND an opaque variable of unspecified format, not numeric, that only getopt may reliably reference. |
(0006569) kre (reporter) 2023-11-13 20:16 |
Re: Note: 0006568

The first paragraph cannot possibly be correct. Unix programs have been using multiple flag options after a single '-' since about when (perhaps exactly when, 'twas before my time) they were invented. "ls -al" is a simple example that has been with us forever. There is no way that anyone, anywhere, ever, would have even considered requiring that to be "ls -a -l". Further, it is getopts' role to parse the option args (and was getopt's before that, as much as it was able). Expecting the shell to parse them (which it would need to do to distinguish between a: and al as the optstring, which varies how ls -al would need to be treated) and then invoke getopts to parse them again would be absurd.

The second paragraph (2nd sentence in particular; we can't do the first, as there is no existing standard to document) I almost agree with, except that we write "unspecified value" not "opaque" (the meaning is almost the same), and that we must require OPTIND to contain a string representing an integer after getopts has returned "no more" (i.e. exit status 1), as we must be able to do "shift $(( OPTIND - 1 ))".

In general, the only time a script should reference OPTIND is after getopts has indicated the options are done (and with that in mind, it might be worth adding a note in the application usage section advising against a "break" out of a while getopts ... ; do ; done loop; the loop should be allowed to terminate naturally), and it can be set to 1 (OPTIND=1) before the getopts loop starts to reinit things. |
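The usage pattern recommended in this note might look like the following sketch. parse_args and the options -v and -o FILE are hypothetical, chosen only for illustration. The points being shown are the OPTIND=1 before the loop, the loop running to its natural end even after an error, and OPTIND being read only once getopts has returned non-zero.

```shell
#!/bin/sh
# parse_args is a hypothetical function; -v and -o FILE are made-up options.
# Results are left in $verbose, $outfile and $operands.
parse_args() {
    verbose=0
    outfile=
    bad=0
    OPTIND=1                      # reinitialise before the loop
    while getopts vo: opt "$@"; do
        case $opt in
            v) verbose=1 ;;
            o) outfile=$OPTARG ;;
            *) bad=1 ;;           # note the error, but do not "break"
        esac
    done
    [ "$bad" -eq 0 ] || return 2
    # OPTIND is referenced only here, after getopts reported end of options.
    shift $((OPTIND - 1))
    operands=$*
}

parse_args -v -o out.txt file1 file2 &&
    printf 'verbose=%s outfile=%s operands=%s\n' "$verbose" "$outfile" "$operands"
```

Recording the error in a flag rather than leaving the loop early keeps the script from ever looking at an intermediate OPTIND value.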