Austin Group Defect Tracker

Aardvark Mark III


Viewing Issue Simple Details
ID 0001128
Category [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity Objection
Type Omission
Date Submitted 2017-03-17 01:08
Last Update 2017-05-06 03:02
Reporter kre
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Robert Elz
Organization
User Reference
Section 1.1.2.1, 2.6.4
Page Number 2331-3, 2358-9
Line Number 74118-74162, 75225-75250
Interp Status ---
Final Accepted Text
Summary 0001128: Where is the ',' (comma) operator ?
Description First, apologies if this has been raised before, I looked,
but did not see anything (and as it is not fixed, I would
tend to guess it has not been.)

Table 1-2, and section 2.6.4 make no mention at all of C's
',' operator.

Some shells appear to implement that (perhaps most shells) but
some do not.

Is it intended to be supported, or not?
Desired Action Add a row to Table 1-2 listing ','
Add text to 2.6.4 indicating whether or not implementation of ',' in
arithmetic expressions is required/forbidden/optional.
Tags No tags attached.

-  Notes
(0003623)
stephane (reporter)
2017-03-17 10:32
edited on: 2017-03-17 10:32

AFAICT, http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_04 and http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tagtcjh_15 make it very clear what operators are to be supported.

We don't want to list all the operators that may or may not be supported as extensions unless that affects the standard syntax.

For instance, POSIX would have to make it explicit if it allowed implementations to implement a NOT (literally) operator, because that would mean people can't use $((NOT -1)) (which currently is 1 taken away from the NOT variable).

That's for instance why the spec says it's unspecified (or implementation defined, I don't remember) whether ++ and -- are supported or not, to tell applications not to assume $((--a)) is the same as $((- - a)) (as they could if they had left that bit out).
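
In C, from which the $(( )) syntax is borrowed, the two readings give different results. A minimal sketch (the function names are hypothetical, chosen only for illustration):

```c
/* Hypothetical helper names, for illustration only. */

/* Two unary minus operators: the tokens are '-', '-', 'a'. */
int negate_twice(int a)
{
    return - -a;   /* same value as a */
}

/* One pre-decrement operator: the tokens are '--', 'a'. */
int pre_decrement(int a)
{
    return --a;    /* the value of a minus 1 */
}
```

Whether a given shell reads $((--a)) the first way or the second is exactly what the spec leaves unspecified.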

With the current text, it is very clear that $((1,2)) is not specified, and neither are $((2**3)), $((sqrt(4))), or $((2 ~ 3)), even though some of them are supported by some implementations. So an application must not use them (and incidentally shell implementers are free to do whatever they want with them, like 1,2 meaning either the same as 1.2 or the "," C operator).

We don't need the spec to tell us explicitly that they're not.

Note that in the case of $((1,2)), in ksh93, that conflicts with floating point expressions in locales where "," is the decimal separator, so is better left out IMO (also for consistency with awk). See yash for a better approach to handling localisation with regard to the decimal separator.

(0003626)
joerg (reporter)
2017-03-17 10:59

Given that most locales use the comma as the separator for decimal fractions,
this is generally a problem, and this is why ksh93 requires a space before
the comma to make it the comma operator if you are in a typical (non-English)
locale.
(0003627)
stephane (reporter)
2017-03-17 11:30
edited on: 2017-03-17 11:40

Re: note:3626

[off-topic here]. It's a bad decision of ksh93 to honour the locale's decimal point *in the syntax of the shell*. That means most ksh93 scripts written by English-speaking people that use floating point stop working when run in a locale where "," is the radix character.

yash also supports floating point arithmetic and also handles the locale's radix character, but does it in a much better way. "." is always the radix in the shell arithmetic language (like in C, like in awk, like in bc), but the locale's radix is honoured for I/O (upon expansion and upon taking variables in).

In a locale where comma is the radix character

pi=$((3.14159265))
echo "$pi"

outputs 3,14159265

(and you need to write $((pi / 2)), not $(($pi / 2)), and not pi=3.14159265, and be careful when you change the locale midway through the script)

See also https://unix.stackexchange.com/questions/89712/bash-float-to-integer/89748#89748

(0003628)
kre (reporter)
2017-03-17 11:31

Sorry, but ...

    With the current text, it is very clear that $((1,2)) is not specified,

I disagree. Table 1-2 lists just about everything that is valid C (even
the statements, that are not C expressions by any definition - it even
includes goto) and, aside from ',', lists every plausible C expression
operator that exists - sure, unary '*' is missing, and, if it is considered
an operator, binary '.' (personally I wouldn't consider it as such), and
'[' (with its ']' companion, which is just a derivation of '*'), but none
of those are really applicable in any obvious way to sh (at least for
shells that don't implement arrays, which are definitely an extension).

The only conclusion that I can draw from that is that ',' was just forgotten.
Since it is not mentioned at all, it is hard to believe that it was deliberately
omitted; were that the case, there would be an explanation in the rationale.

Note that I did not ask that support for ',' be mandated, nor that it be
forbidden (whether for locale issues or anything else) - specifying that,
like ++ and --, support for it is unspecified would be just fine. But that
statement should be explicit, as it is for ++ -- (and the, would be absurd,
sizeof).

Note that ** and a binary ~ operator, and various other possible extensions
are quite different - none of those are standard C operators. Certainly it
is not possible to list every possible newly invented character sequence
which someone might implement as an operator, and state that support for that
is undefined, but the standard should really list all the standard C
expression operators (it could just omit the C statements though, they aren't
expressions in (normal) C, and there's no reason to assume anyone would
assume that sh $(( )) which defines itself as using C expressions, would
for some bizarre reason permit "while..." or goto (where, labels are not
in the list...) inside a $(( )).

I don't think this is really the place for any extensive discussion of locale
issues, so I will just avoid that here, though I will note as an aside that
by "typical non-english based locale" Joerg really means "non-English
European locale".

kre
(0003631)
stephane (reporter)
2017-03-17 12:13

I agree there's scope for improvement/clarification (there was a discussion on that on the ML wrt var vs $var and the fact that surely a POSIX shell can't be expected to implement a full C-interpreter, that echoed several of your points).

But, to reword my point above slightly differently though with the same intention, the spec should make it clear what operators are supported (what's the syntax of the language), not give a list of could-be operators that are not (and though there are issues in how the spec is worded, "," is not one of them IMO).
(0003632)
shware_systems (reporter)
2017-03-17 12:21
edited on: 2017-03-17 12:42

I don't see the comma operator as intended to be included. It is a statement grouping operator to disambiguate sequence points without having to use extra goto statements in C blocks so a ';' indicates an explicit sequence point. It is an operator, but not an arithmetic one that returns a result. Any result is a consequence of the expression following the ',', not of the operator itself. As goto's are excluded, it follows ',' is also intended to be excluded; unless a future shell becomes sequence point sensitive. That it is described with the arithmetic operators rather than by itself is similar to how typedef is listed as a storage class specifier, for (histo)hysterical reasons. While parentheses and the question mark and colon combo can be viewed as a similar type of operator, their effect in grouping the other arithmetic operators is limited to a single sequence point.

(0003633)
kre (reporter)
2017-03-17 12:39
edited on: 2017-03-17 12:51

Re: 3632... ',' certainly is an arithmetic operator that returns a result, just
as much as ?: does (you can describe them just the same way) - and for that
matter '=' (assignment - particularly in the sh context where there are no
data type conversions to be concerned with).

Further, while ',' does imply sequencing of its operands (as does && and ||)
it has nothing whatever to do with control flow, or goto (even less than ?:
which is certainly an included operator in sh).

And [added later in an edit] - () can apply around ',' operators, in fact,
a common usage of ',' (in C, and could be, if it could be relied upon, in
sh) is its usage in expressions like ..

    some_boolean_expression ? ( x=1, y=2, z=3) : ( x=3, y=1, z=7 );
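
A compilable C rendering of that idiom might look like the sketch below. The function name set_triple and the pointer parameters are assumptions made only so the expression has something to assign to; the value of each parenthesised group is that of its last assignment.

```c
/* Hypothetical helper: assigns one of two sets of values through the
   pointers, using ',' to pack three assignments into each arm of ?:. */
int set_triple(int flag, int *x, int *y, int *z)
{
    /* Each arm is a single expression; its value is the last assignment. */
    return flag ? (*x = 1, *y = 2, *z = 3) : (*x = 3, *y = 1, *z = 7);
}
```

Calling set_triple with a non-zero flag stores 1, 2, 3 and returns 3; with a zero flag it stores 3, 1, 7 and returns 7.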


Re: 3631, "the spec should make it clear what operators are supported" - if
it did it that way, and only listed the supported operations, and simply said
that all other C operators are unspecified, that would be fine. But
excluding explicitly all the ones that we could think of to exclude, but
not the ones that were forgotten, makes it unclear what the status would
have been had it been remembered: would ',' have been listed as supported
or not? (After all, ksh - all variants I could find to test it - supports it,
and as we keep being told, ksh88 forms the basis of the posix sh spec.)

(0003635)
shware_systems (reporter)
2017-03-17 12:53

The way to emulate comma is by a sequence similar to:
left-hand-side;
goto auto-comma-label;
auto-comma-label:
result = right-hand-side;

From c99 6.5.17:
Then the right operand is evaluated; the result has its
type and value.

This is why I don't see the ',' as having a result. It is a syntactic macro that generates hidden labels more than an actual operator.
(0003637)
kre (reporter)
2017-03-17 14:07

Re 3635, you can interpret '&&' the same way if you like, it doesn't
do anything with the values of its operands, it just evaluates the one
on the left, then possibly evaluates the one on the right, and returns
as its value whichever one it evaluated last.

',' evaluates the operand on the left, then unconditionally evaluates
the operand on the right, and returns as its result the operand that
it evaluated last.

What real difference is there?

You can also model the && operator with if statements and gotos if that
floats your boat, but that doesn't make it anything less an operator.
',' is an operator too, and it has a result, just as the text you quoted
from c99 says it does.
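
The comparison can be made concrete with a small C sketch. The names operand, comma_result, and_result and operand_calls are hypothetical, and the call counter exists only to show which operands were evaluated. One detail differs from the description above: in C the result of && is always 0 or 1, not the value of the operand evaluated last.

```c
/* Counts how many operands have been evaluated since the last reset. */
static int calls;

/* Hypothetical operand with a visible side effect. */
static int operand(int value)
{
    calls++;
    return value;
}

/* ',' evaluates both operands, in order, and yields the right one. */
int comma_result(int left, int right)
{
    return (operand(left), operand(right));
}

/* '&&' evaluates the right operand only if the left one is non-zero,
   and yields 0 or 1. */
int and_result(int left, int right)
{
    return operand(left) && operand(right);
}

/* Returns the number of evaluations so far and resets the counter. */
int operand_calls(void)
{
    int n = calls;
    calls = 0;
    return n;
}
```

With these definitions, comma_result(0, 7) evaluates two operands and returns 7, while and_result(0, 7) evaluates only one and returns 0.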
(0003639)
shware_systems (reporter)
2017-03-20 01:43

C99 6.5.13 to 17 can all be implemented as macros with arguments, but then their names would have to be longer to be mnemonic and couldn't use infix notation. The standard characterizes all of them as operators instead, and since the behavior description is adequate enough to be usable it doesn't really matter for comma that other characterizations may be more precise.

The difference I see is &&, and ||, involve a conditional based on the operand values, or a similar one with the left hand side of the ?: triplet; and evaluation of parts of the expression isn't guaranteed. A processor's ALU is involved evaluating the condition, if only to set flags or select a branch target as side effects. They qualify as arithmetic operators because they involve the ALU somehow.

Functionally, the comma operator is essentially a NOP that only ensures there aren't pending side effects of an ALU doing speculative or asynchronous executions when the right hand side begins being evaluated, especially if components of the expression are volatile qualified. All parts of the expression get evaluated, however. Its side effects relate to a CPU's instruction fetch subsystem, not the ALU. For a simple case, it frequently maps to a WAIT instruction on x86s, which has neither operands nor a result that affects the ALU or FPU registers, as Intel documents it.

Parentheses around those operators and operands can affect evaluation precedence also. When used with comma they simply disambiguate it syntactically from being considered a premature end of a function's argument or initialization element rvalue. They function more like curly braces around entire statement blocks than for arithmetic grouping.

This is a significant enough difference I consider comma of a different operator class than arithmetic, as token pasting operators in the preprocessor are a different class, but also listed as generic 'operators' in the syntax descriptions.
(0003640)
kre (reporter)
2017-03-20 03:05

Mark, you can believe what you want, but the ',' (as a binary operator,
distinct from the ',' which is a syntax element used to separate items in
lists of initialisers, declarations, parameters, ...) is an operator in
C and has been for a very very long time (I don't (quite) go back far enough
to know if it was there in C on day 1, but it certainly is part of ancient
K&R C, from way before it was made portable or anything like that.)

PDP-11's (and 7's) had no need of any fancy ALU/CPU side effect maintenance,
you did something, and it just happened... So ',' was certainly not
added for that ... on the other hand, expressions like
        x++, x++
are perfectly well defined (unlike x++ + x++ for example), the ',' provides
that kind of sequencing.
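
As a minimal C sketch of that sequencing (increment_twice is a hypothetical name, and the pointer parameter is only there so the caller can observe the final value):

```c
/* Hypothetical helper: applies x++ twice, separated by the comma operator. */
int increment_twice(int *x)
{
    /* The comma is a sequence point: the first increment, including its
       side effect, is complete before the second one is evaluated. */
    return ((*x)++, (*x)++);
}
```

Starting from 5, the first increment leaves 6 in the variable, the second yields 6 as its value and leaves 7 behind, so the function returns 6.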

Comma operators are useful in all kinds of places where the syntax permits
just one expression, but more than one is needed (or desired), as for
example in
        for (count = 0, ptr = list_head; ptr != NULL; ptr = ptr->next, count++)
                /* whatever code */

The same would be useful in $(( )) in the shell - it allows just a single
expression, but sometimes we need more than that, which I assume (apart
from just completeness) is why most shells implement it.
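
For reference, the for loop above can be filled out into a complete C function. This is only a sketch: struct node and list_length are assumed names, since the original fragment does not define the list type.

```c
#include <stddef.h>

/* Assumed minimal list node; the original fragment does not define one. */
struct node {
    struct node *next;
};

/* Counts the nodes in a singly linked list. */
size_t list_length(const struct node *list_head)
{
    size_t count;
    const struct node *ptr;

    /* ',' lets the init and step clauses each hold two expressions. */
    for (count = 0, ptr = list_head; ptr != NULL; ptr = ptr->next, count++)
        ;  /* empty body: the clauses do all the work */
    return count;
}
```

Without ',' the second initialisation and the second step expression would have to move outside the loop header.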

I still cannot help but believe it was just forgotten - and until someone
who really knows says different, I will continue to believe that.
(0003641)
geoffclare (manager)
2017-03-20 10:20

I can't claim to "really know", but I am inclined to believe that comma was deliberately not included in the operators table because at the time (1992) it was not supported by any of the then-current implementations of the utilities to which that table applied, namely awk, bc and (the ksh88 implementation of) sh.

Note that although Note: 0003633 says "after all, ksh - all variants I could find to test it - supports it, and as we keep being told, ksh88 forms the basis of the posix sh spec", I have been unable to find any ksh88 variant that supports it. I tried Solaris 10 (/bin/ksh and /usr/xpg4/bin/sh), Solaris 11 (/usr/xpg4/bin/sh) and HP-UX 11.23 (/usr/bin/ksh), and they all reported a syntax error for $((1,2)).
(0003642)
kre (reporter)
2017-03-20 10:54

Re note 3641 ..

I don't have access to anything with ksh88, so that was one that was not
part of "all variants I could find to test" - though I did assume it might
have been there, as pdksh (the ancient bug ridden thing) is supposed to be
(or was intended to have been) a ksh88 clone - or so I have read, and it
supports ','.

But if ',' was deliberately omitted (from the std), why was no mention made
of it? sizeof() is deliberately omitted (for excellent, and obvious, reasons)
and it is mentioned (one can see that ++ and -- need to be explicitly
unspecified as otherwise they may be interpreted as two unary or
unary & binary operators in some cases). Control statements (which
aren't expressions at all) are also explicitly omitted.

But ',' is just ignored??? It is the one plausible operator that is not
there anywhere.

kre
(0003643)
joerg (reporter)
2017-03-20 11:42

Re Note: 0003641

If you are in a non-English based locale, you should use $((1 ,2))
but I can confirm that ksh88 (even on Solaris) does not support it.
(0003644)
geoffclare (manager)
2017-03-20 11:45

It looks to me like the inclusions and omissions of arithmetic operators in the table were done by entire C standard sub-sections. Thus sizeof is in the table because other unary operators in 6.5.3 are needed. However, cast operators are not in the table because nothing in that sub-section (6.5.4) is needed. The omission of the comma operator fits that pattern because it is in its own sub-section (6.5.17).

Control statements are included in the table because they are needed for utilities other than the shell.
(0003645)
kre (reporter)
2017-03-20 11:56

Re note 3644 ... Thanks Geoff - that makes some sense at last (I don't have
a copy of the C standard to refer to, so couldn't notice that.)

Nevertheless, given that ',' is the one missing operator that might make sense,
when this issue eventually works its way to the head of the queue, I would
still like some mention of the comma operator added to the text.

It doesn't matter to me whether it becomes "shall support" (based upon most
current shells doing that, and its usefulness), "support is unspecified",
or "shall not support" (because of locale reasons, or anything else rational.)

Just don't leave it looking abandoned and forgotten...

(And with that, that's all from me on this issue.)
(0003647)
Vincent Lefevre (reporter)
2017-03-23 13:01
edited on: 2017-03-23 13:02

Re Note: 0003643

,2 means 0,2 when the comma is the decimal separator. So, $((1 ,2)) won't work in ksh93. You need $((1 , 2)) instead (i.e. with a space before and after the comma).

(0003687)
kre (reporter)
2017-05-06 03:02

For no particular good reason, I have just been pondering the locale and ','
issue, and the problems that are caused by having an operator that is the
same as a character that can appear as part of a number in some locales.

Then I wondered why only ',' is special for that. I have it on good authority
that the ants of Neptune (they live "underground", and are quite small, so we
have not discovered them yet...) use a numeric form in which the '+' character
is used to separate units from decimal fractions (so 1+3 is their equivalent
of what in the US/Aust/UK would be 1.3, or 1,3 in much of non-UK Europe).

Perhaps we either need to require that operators in $(( )) expressions all
be space separated (as has been stated is required in ksh93 for ',') at
least from numeric literals (we control the syntax of var names so there is
less of a problem, unless some shell perhaps allows "scientific" notation
and we have to deal with that "e" in numbers), or perhaps it might be
better to simply require that all sh arithmetic be evaluated in the C locale.

Neither solution is suitable for standardisation until implemented of
course, so implementers out there might want to consider how they would
handle any random locale specification, where any character might be the
"decimal point" (whatever its proper name is.)

- Issue History
Date Modified Username Field Change
2017-03-17 01:08 kre New Issue
2017-03-17 01:08 kre Name => Robert Elz
2017-03-17 01:08 kre Section => 1.1.2.1, 2.6.4
2017-03-17 01:08 kre Page Number => 2331-3, 2358-9
2017-03-17 01:08 kre Line Number => 74118-74162, 75225-75250
2017-03-17 10:32 stephane Note Added: 0003623
2017-03-17 10:32 stephane Note Edited: 0003623
2017-03-17 10:59 joerg Note Added: 0003626
2017-03-17 11:30 stephane Note Added: 0003627
2017-03-17 11:31 kre Note Added: 0003628
2017-03-17 11:40 stephane Note Edited: 0003627
2017-03-17 12:13 stephane Note Added: 0003631
2017-03-17 12:21 shware_systems Note Added: 0003632
2017-03-17 12:36 shware_systems Note Edited: 0003632
2017-03-17 12:39 kre Note Added: 0003633
2017-03-17 12:41 kre Note Edited: 0003633
2017-03-17 12:42 shware_systems Note Edited: 0003632
2017-03-17 12:49 kre Note Edited: 0003633
2017-03-17 12:50 kre Note Edited: 0003633
2017-03-17 12:51 kre Note Edited: 0003633
2017-03-17 12:53 shware_systems Note Added: 0003635
2017-03-17 14:07 kre Note Added: 0003637
2017-03-20 01:43 shware_systems Note Added: 0003639
2017-03-20 03:05 kre Note Added: 0003640
2017-03-20 10:20 geoffclare Note Added: 0003641
2017-03-20 10:54 kre Note Added: 0003642
2017-03-20 11:42 joerg Note Added: 0003643
2017-03-20 11:45 geoffclare Note Added: 0003644
2017-03-20 11:56 kre Note Added: 0003645
2017-03-23 13:01 Vincent Lefevre Note Added: 0003647
2017-03-23 13:02 Vincent Lefevre Note Edited: 0003647
2017-03-23 13:02 Vincent Lefevre Note Edited: 0003647
2017-05-06 03:02 kre Note Added: 0003687

