ID | Category | Severity | Type | Date Submitted | Last Update | ||
0001128 | [1003.1(2013)/Issue7+TC1] Shell and Utilities | Objection | Omission | 2017-03-17 01:08 | 2024-06-11 08:54 | ||
Reporter | kre | View Status | public | ||||
Assigned To | |||||||
Priority | normal | Resolution | Accepted As Marked | ||||
Status | Closed | ||||||
Name | Robert Elz | ||||||
Organization | |||||||
User Reference | |||||||
Section | 1.1.2.1, 2.6.4 | ||||||
Page Number | 2331-3, 2358-9 | ||||||
Line Number | 74118-74162, 75225-75250 | ||||||
Interp Status | --- | ||||||
Final Accepted Text | Note: 0004088 | ||||||
Summary | 0001128: Where is the ',' (comma) operator ? | ||||||
Description |
First, apologies if this has been raised before, I looked, but did not see anything (and as it is not fixed, I would tend to guess it has not been.) Table 1-2, and section 2.6.4 make no mention at all of C's ',' operator. Some shells appear to implement that (perhaps most shells) but some do not. Is it intended to be supported, or not? |
Desired Action |
Add a row to Table 1-2 listing ',' Add text to 2.6.4 indicating whether or not implementation of ',' in arithmetic expressions is required/forbidden/optional. |
Tags | tc3-2008 | ||||||
Attached Files | |||||||
Notes | |
(0003623) stephane (reporter) 2017-03-17 10:32 edited on: 2017-03-17 10:32 |
AFAICT, http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04 and http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tagtcjh_15 make it very clear what operators are to be supported. We don't want to list all the operators that may or may not be supported as extensions unless that affects the standard syntax. For instance, POSIX would have to make it explicit if it allowed implementations to implement a NOT (literally) operator, because that would mean people can't use $((NOT -1)) (which currently is 1 taken away from the NOT variable). That's for instance why the spec says it's unspecified (or implementation defined, I don't remember) whether ++ and -- are supported or not, to tell applications not to assume $((--a)) is the same as $((- - a)) (as they could if they had left that bit out). With the current text, it is very clear that $((1,2)) is not specified, and neither are $((2**3)), $((sqrt(4))), or $((2 ~ 3)), even though some of them are supported by some implementations. So an application must not use them (and incidentally shell implementers are free to do whatever they want with them, like 1,2 meaning either the same as 1.2 or the "," C operator). We don't need the spec to tell us explicitly that they're not. Note that in the case of $((1,2)), in ksh93, that conflicts with floating point expressions in locales where "," is the decimal separator, so is better left out IMO (also for consistency with awk). See yash for a better approach at handling localisation with regards to the decimal separator. |
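To make the point about the specified operator set concrete, here is a minimal sketch of arithmetic that stays within it. The variable names and values (a, NOT, double_neg, not_minus_one) are illustrative assumptions and do not come from the standard or from this note.

```shell
# Illustrative values only; neither variable comes from the standard.
a=5
NOT=10

# Two unary minus operators, kept apart by spaces so they cannot be
# read as a single "--" token, whose support is unspecified.
double_neg=$(( - - a ))

# NOT is an ordinary variable name here, so this is "NOT minus 1".
not_minus_one=$(( NOT -1 ))

echo "$double_neg $not_minus_one"
```

Writing the two minus signs with a space between them avoids relying on how a given shell tokenizes "--".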
(0003626) joerg (reporter) 2017-03-17 10:59 |
Given that most locales use the comma as the separator for decimal fractions, this is generally a problem, and this is why ksh93 requires a space before the comma to make it the comma operator if you are in a typical (non-English) based locale. |
(0003627) stephane (reporter) 2017-03-17 11:30 edited on: 2017-03-17 11:40 |
Re: note:3626 [off-topic here]. It was a bad decision for ksh93 to honour the locale's decimal point *in the syntax of the shell*. That means most ksh93 scripts written by English-speaking people that use floating point stop working when run in a locale where "," is the radix character. yash also supports floating point arithmetic and also handles the locale's radix character, but does it in a much better way. "." is always the radix in the shell arithmetic language (like in C, like in awk, like in bc), but yash honours the locale's radix for I/O (upon expansion and upon taking variables in). In a locale where comma is the radix character, pi=$((3.14159265)); echo "$pi" outputs 3,14159265 (and you need to write $((pi / 2)), not $(($pi / 2)), and not pi=3.14159265, and be careful when you change the locale midway through the script). See also https://unix.stackexchange.com/questions/89712/bash-float-to-integer/89748#89748 |
(0003628) kre (reporter) 2017-03-17 11:31 |
Sorry, but ... "With the current text, it is very clear that $((1,2)) is not specified": I disagree. Table 1-2 lists just about everything that is valid C (even the statements, that are not C expressions by any definition - it even includes goto) and, aside from ',', lists every plausible C expression operator that exists - sure, unary '*' is missing, and, if it is considered an operator, binary '.' (personally I wouldn't consider it as such), and '[' (with its ']' companion, which is just a derivation of '*'), but none of those are really applicable in any obvious way to sh (at least for shells that don't implement arrays, which are definitely an extension). The only conclusion that I can draw from that is that ',' was just forgotten. Since it is not mentioned at all, it is hard to believe that it was deliberately omitted; were that the case, there would be an explanation in the rationale. Note that I did not ask that support for ',' be mandated, nor that it be forbidden (whether for locale issues or anything else) - specifying that, like ++ and --, support for it is unspecified would be just fine. But that statement should be explicit, as it is for ++ and -- (and for sizeof, which would be absurd). Note that ** and a binary ~ operator, and various other possible extensions, are quite different - none of those are standard C operators. Certainly it is not possible to list every possible newly invented character sequence which someone might implement as an operator, and state that support for that is undefined, but the standard should really list all the standard C expression operators (it could just omit the C statements though; they aren't expressions in (normal) C, and there's no reason to assume anyone would assume that sh $(( )), which defines itself as using C expressions, would for some bizarre reason permit "while..." or goto (where labels are not in the list...) inside a $(( ))).
I don't think this is really the place for any extensive discussion of locale issues, so I will just avoid that here, though I will note as an aside that by "typical non-english based locale" Joerg really means "non-English European locale". kre |
(0003631) stephane (reporter) 2017-03-17 12:13 |
I agree there's scope for improvement/clarification (there was a discussion on that on the ML wrt var vs $var and the fact that surely a POSIX shell can't be expected to implement a full C-interpreter, that echoed several of your points). But, to reword my point above slightly differently though with the same intention, the spec should make it clear what operators are supported (what's the syntax of the language), not give a list of could-be operators that are not (and though there are issues in how the spec is worded, "," is not one of them IMO). |
(0003632) shware_systems (reporter) 2017-03-17 12:21 edited on: 2017-03-17 12:42 |
I don't see the comma operator as intended to be included. It is a statement grouping operator to disambiguate sequence points without having to use extra goto statements in C blocks so a ';' indicates an explicit sequence point. It is an operator, but not an arithmetic one that returns a result. Any result is a consequence of the expression following the ',', not of the operator itself. As goto's are excluded, it follows ',' is also intended to be excluded; unless a future shell becomes sequence point sensitive. That it is described with the arithmetic operators rather than by itself is similar to how typedef is listed as a storage class specifier, for (histo)hysterical reasons. While parentheses and the question mark and colon combo can be viewed as a similar type of operator, their effect in grouping the other arithmetic operators is limited to a single sequence point. |
(0003633) kre (reporter) 2017-03-17 12:39 edited on: 2017-03-17 12:51 |
Re: 3632... ',' certainly is an arithmetic operator that returns a result, just as much as ?: does (you can describe them just the same way) - and for that matter '=' (assignment - particularly in the sh context where there are no data type conversions to be concerned with). Further, while ',' does imply sequencing of its operands (as does && and ||) it has nothing whatever to do with control flow, or goto (even less than ?: which is certainly an included operator in sh). And [added later in an edit] - () can apply around ',' operators, in fact, a common usage of ',' (in C, and could be, if it could be relied upon, in sh) is its usage in expressions like .. some_boolean_expression ? ( x=1, y=2, z=3) : ( x=3, y=1, z=7 ); Re: 3631, "the spec should make it clear what operators are supported" - if it did it that way, and only listed the supported operations, and simply said that all other C operators are unspecified, that would be fine. But excluding explicitly all the ones that we could think of to exclude, but not the ones that were forgotten, makes it unclear what the status would have been had it been remembered, would ',' have been listed as supported or not (after all, ksh - all variants I could find to test it - supports it, and as we keep being told, ksh88 forms the basis of the posix sh spec.) |
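For scripts that cannot rely on ',', the conditional multi-assignment above can be written with one arithmetic expansion per assignment. This is only a sketch: flag is a hypothetical stand-in for some_boolean_expression, and the ':' null utility is used just to evaluate the expansions in order.

```shell
# flag is a hypothetical stand-in for some_boolean_expression.
flag=1

# Equivalent of: flag ? (x=1, y=2, z=3) : (x=3, y=1, z=7)
# written without the comma operator. Each $(( )) holds a single
# assignment expression; the words are expanded left to right.
if [ "$(( flag != 0 ))" -eq 1 ]; then
    : "$(( x = 1 ))" "$(( y = 2 ))" "$(( z = 3 ))"
else
    : "$(( x = 3 ))" "$(( y = 1 ))" "$(( z = 7 ))"
fi

echo "$x $y $z"
```

Plain shell assignments (x=1 y=2 z=3) would do the same job here; the arithmetic form is shown only because it mirrors the C expression more closely.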
(0003635) shware_systems (reporter) 2017-03-17 12:53 |
The way to emulate comma is by a sequence similar to: left-hand-side; goto auto-comma-label; auto-comma-label: result = right-hand-side; From c99 6.5.17: Then the right operand is evaluated; the result has its type and value. This is why I don't see the ',' as having a result. It is a syntactic macro that generates hidden labels more than actual operator. |
(0003637) kre (reporter) 2017-03-17 14:07 |
Re 3635, you can interpret '&&' the same way if you like, it doesn't do anything with the values of its operands, it just evaluates the one on the left, then possibly evaluates the one on the right, and returns as its value whichever one it evaluated last. ',' evaluates the operand on the left, then unconditionally evaluates the operand on the right, and returns as its result the operand that it evaluated last. What real difference is there? You can also model the && operator with if statements and gotos if that floats your boat, but that doesn't make it anything less an operator. ',' is an operator too, and it has a result, just as the text you quoted from c99 says it does. |
(0003639) shware_systems (reporter) 2017-03-20 01:43 |
C99 6.5.13 to 17 can all be implemented as macros with arguments, but then their names would have to be longer to be mnemonic and couldn't use infix notation. The standard characterizes all of them as operators instead, and since the behavior description is adequate enough to be usable it doesn't really matter for comma that other characterizations may be more precise. The difference I see is that && and || involve a conditional based on the operand values, or a similar one with the left hand side of the ?: triplet; and evaluation of parts of the expression isn't guaranteed. A processor's ALU is involved in evaluating the condition, if only to set flags or select a branch target as side effects. They qualify as arithmetic operators because they involve the ALU somehow. Functionally, the comma operator is essentially a NOP that only ensures there aren't pending side effects of an ALU doing speculative or asynchronous executions when the right hand side begins being evaluated, especially if components of the expression are volatile qualified. All parts of the expression get evaluated, however. Its side effects relate to a CPU's instruction fetch subsystem, not the ALU. For a simple case, it frequently maps to a WAIT instruction on x86s, which has neither operands nor a result that affect the ALU or FPU registers, as Intel documents it. Parentheses around those operators and operands can affect evaluation precedence also. When used with comma they simply disambiguate it syntactically from being considered a premature end of a function's argument or initialization element rvalue. They function more like curly braces around entire statement blocks than for arithmetic grouping. This is a significant enough difference that I consider comma to be of a different operator class than arithmetic, as token pasting operators in the preprocessor are a different class, but also listed as generic 'operators' in the syntax descriptions. |
(0003640) kre (reporter) 2017-03-20 03:05 |
Mark, you can believe what you want, but the ',' (as a boolean operator, distinct from the ',' which is a syntax element used to separate items in lists of initialisers, declarations, parameters, ...) is an operator in C and has been for a very very long time (I don't (quite) go back far enough to know if it was there in C on day 1, but it certainly is part of ancient K&R C, from way before it was made portable or anything like that.) PDP-11's (and 7's) had no need of any fancy ALU/CPU side effect maintenance, you did something, and it just happened... So ',' was certainly not added for that ... on the other hand, expressions like x++, x++ are perfectly well defined (unlike x++ + x++ for example), the ',' provides that kind of sequencing. Comma operators are useful in all kinds of places where the syntax permits just one expression, but more than one is needed (or desired), as for example in for (count = 0, ptr = list_head; ptr != NULL; ptr = ptr->next, count++) /* whatever code */ The same would be useful in $(( )) in the shell - it allows just a single expression, but sometimes we need more than that, which I assume (apart from just completeness) is why most shells implement it. I still cannot help but believe it was just forgotten - and until someone who really knows says different, I will continue to believe that. |
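As a rough shell counterpart to the C for loop above, a loop that needs two update expressions per iteration can place each one in its own arithmetic expansion. The sketch below assumes two hypothetical cursors, lo and hi, and a counter named steps; none of these names appear in the note.

```shell
# lo, hi and steps are hypothetical names used only for illustration.
lo=0
hi=10
steps=0

while [ "$lo" -lt "$hi" ]; do
    steps=$((steps + 1))
    # C could write "lo++, hi--" as a single expression; here each
    # update is a separate arithmetic expansion.
    lo=$((lo + 1))
    hi=$((hi - 1))
done

echo "$steps $lo $hi"
```

This costs an extra line per update but avoids both ',' and the ++/-- operators, whose support is unspecified.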
(0003641) geoffclare (manager) 2017-03-20 10:20 |
I can't claim to "really know", but I am inclined to believe that comma was deliberately not included in the operators table because at the time (1992) it was not supported by any of the then-current implementations of the utilities to which that table applied, namely awk, bc and (the ksh88 implementation of) sh. Note that although Note: 0003633 says "after all, ksh - all variants I could find to test it - supports it, and as we keep being told, ksh88 forms the basis of the posix sh spec", I have been unable to find any ksh88 variant that supports it. I tried Solaris 10 (/bin/ksh and /usr/xpg4/bin/sh), Solaris 11 (/usr/xpg4/bin/sh) and HP-UX 11.23 (/usr/bin/ksh), and they all reported a syntax error for $((1,2)). |
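The kind of check described above can be automated with a small probe. This is a sketch under stated assumptions: supports_comma and comma_status are hypothetical names. The subshell is there because some shells abort on an arithmetic syntax error. The spaces around the comma follow the advice elsewhere in this thread for locales where "," is the radix character.

```shell
# supports_comma is a hypothetical helper name.
supports_comma() {
    # Run in a subshell so that a shell which aborts on an arithmetic
    # syntax error does not take the calling script down with it.
    ( eval ': "$(( 1 , 2 ))"' ) >/dev/null 2>&1
}

if supports_comma; then
    comma_status=supported
else
    comma_status=unsupported
fi

echo "comma operator: $comma_status"
```

The probe only reports whether the expression is accepted; it does not verify that the shell gives ',' the C meaning.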
(0003642) kre (reporter) 2017-03-20 10:54 |
Re note 3641 .. I don't have access to anything with ksh88, so that was one that was not part of "all variants I could find to test" - though I did assume it might have been there, as pdksh (the ancient bug ridden thing) is supposed to be (or was intended to have been) a ksh88 clone - or so I have read, and it supports ','. But if ',' was deliberately omitted (from the std), why was no mention made of it, sizeof() is deliberately omitted (for excellent, and obvious, reasons) and it is mentioned (one can see that ++ and -- need to be explicitly unspecified as otherwise they may be interpreted as two unary or unary & binary operators in some cases). Control statements (which aren't expressions at all) are also explicitly omitted. But ',' is just ignored??? It is the one plausible operator that is not there anywhere. kre |
(0003643) joerg (reporter) 2017-03-20 11:42 |
Re Note: 0003641 If you are in a non-English based locale, you should use $((1 ,2)) but I can confirm that ksh88 (even on Solaris) does not support it. |
(0003644) geoffclare (manager) 2017-03-20 11:45 |
It looks to me like the inclusions and omissions of arithmetic operators in the table were done by entire C standard sub-sections. Thus sizeof is in the table because other unary operators in 6.5.3 are needed. However, cast operators are not in the table because nothing in that sub-section (6.5.4) is needed. The omission of the comma operator fits that pattern because it is in its own sub-section (6.5.17). Control statements are included in the table because they are needed for utilities other than the shell. |
(0003645) kre (reporter) 2017-03-20 11:56 |
Re note 3644 ... Thanks Geoff - that makes some sense at last (I don't have a copy of the C standard to refer to, so couldn't notice that.) Nevertheless, given that ',' is the one missing operator that might make sense, when this issue eventually works its way to the head of the queue, I would still like some mention of the comma operator added to the text. It doesn't matter to me whether it becomes "shall support" (based upon most current shells doing that, and its usefulness), "support is unspecified", or "shall not support" (because of locale reasons, or anything else rational.) Just don't leave it looking abandoned and forgotten... (And with that, that's all from me on this issue.) |
(0003647) Vincent Lefevre (reporter) 2017-03-23 13:01 edited on: 2017-03-23 13:02 |
Re Note: 0003643 ,2 means 0,2 when the comma is the decimal separator. So, $((1 ,2)) won't work in ksh93. You need $((1 , 2)) instead (i.e. with a space before and after the comma). |
(0003687) kre (reporter) 2017-05-06 03:02 |
For no particular good reason, I have just been pondering the locale and ',' issue, and the problems that are caused by having an operator that is the same as a character that can appear as part of a number in some locales. Then I wondered why only ',' is special for that, I have it on good authority the ants of neptune (they live "underground", and are quite small, so we have not discovered them yet...) use a numeric form in which the '+' character is used to separate units from decimal fractions. (so 1+3 is their equivalent of what in the US/Aust/UK would be 1.3 or 1,3 in much of the non-UK Europe). Perhaps we either need to require that operators in $(( )) expressions all be space separated (as has been stated is required in ksh93 for ',') at least from numeric literals (we control the syntax of var names so there is less of a problem, unless some shell perhaps allows "scientific" notation and we have to deal with that "e" in numbers), or perhaps it might be better to simply require that all sh arithmetic be evaluated in the C locale. Neither solution is suitable for standardisation until implemented of course, so implementers out there might want to consider how they would handle any random locale specification, where any character might be the "decimal point" (whatever its proper name is.) |
(0004088) geoffclare (manager) 2018-08-23 16:15 |
On 2016 edition page 2332 line 74119 section 1.1.2.1 add a small-font note: "Note: the comma operator (section 6.5.17 of the ISO C standard) is intentionally not included in the table. It need not be supported by implementations." |
Issue History | |||
Date Modified | Username | Field | Change |
2017-03-17 01:08 | kre | New Issue | |
2017-03-17 01:08 | kre | Name | => Robert Elz |
2017-03-17 01:08 | kre | Section | => 1.1.2.1, 2.6.4 |
2017-03-17 01:08 | kre | Page Number | => 2331-3, 2358-9 |
2017-03-17 01:08 | kre | Line Number | => 74118-74162, 75225-75250 |
2017-03-17 10:32 | stephane | Note Added: 0003623 | |
2017-03-17 10:32 | stephane | Note Edited: 0003623 | |
2017-03-17 10:59 | joerg | Note Added: 0003626 | |
2017-03-17 11:30 | stephane | Note Added: 0003627 | |
2017-03-17 11:31 | kre | Note Added: 0003628 | |
2017-03-17 11:40 | stephane | Note Edited: 0003627 | |
2017-03-17 12:13 | stephane | Note Added: 0003631 | |
2017-03-17 12:21 | shware_systems | Note Added: 0003632 | |
2017-03-17 12:36 | shware_systems | Note Edited: 0003632 | |
2017-03-17 12:39 | kre | Note Added: 0003633 | |
2017-03-17 12:41 | kre | Note Edited: 0003633 | |
2017-03-17 12:42 | shware_systems | Note Edited: 0003632 | |
2017-03-17 12:49 | kre | Note Edited: 0003633 | |
2017-03-17 12:50 | kre | Note Edited: 0003633 | |
2017-03-17 12:51 | kre | Note Edited: 0003633 | |
2017-03-17 12:53 | shware_systems | Note Added: 0003635 | |
2017-03-17 14:07 | kre | Note Added: 0003637 | |
2017-03-20 01:43 | shware_systems | Note Added: 0003639 | |
2017-03-20 03:05 | kre | Note Added: 0003640 | |
2017-03-20 10:20 | geoffclare | Note Added: 0003641 | |
2017-03-20 10:54 | kre | Note Added: 0003642 | |
2017-03-20 11:42 | joerg | Note Added: 0003643 | |
2017-03-20 11:45 | geoffclare | Note Added: 0003644 | |
2017-03-20 11:56 | kre | Note Added: 0003645 | |
2017-03-23 13:01 | Vincent Lefevre | Note Added: 0003647 | |
2017-03-23 13:02 | Vincent Lefevre | Note Edited: 0003647 | |
2017-03-23 13:02 | Vincent Lefevre | Note Edited: 0003647 | |
2017-05-06 03:02 | kre | Note Added: 0003687 | |
2018-08-23 16:15 | geoffclare | Note Added: 0004088 | |
2018-08-23 16:15 | geoffclare | Interp Status | => --- |
2018-08-23 16:15 | geoffclare | Final Accepted Text | => Note: 0004088 |
2018-08-23 16:15 | geoffclare | Status | New => Resolved |
2018-08-23 16:15 | geoffclare | Resolution | Open => Accepted As Marked |
2018-08-23 16:16 | geoffclare | Tag Attached: tc3-2008 | |
2019-10-31 11:33 | geoffclare | Status | Resolved => Applied |
2024-06-11 08:54 | agadmin | Status | Applied => Closed |