View Issue Details

ID: 0001973
Project: 1003.1(2024)/Issue8
Category: Shell and Utilities
View Status: public
Last Update: 2026-06-06 14:42
Reporter: stephane
Assigned To:
Priority: normal
Severity: Objection
Type: Clarification Requested
Status: Interpretation Required
Resolution: Accepted As Marked
Name: Stephane Chazelas
Organization:
User Reference:
Section: awk utility
Page Number: 2610
Line Number: 85386-85394
Interp Status: Pending
Final Accepted Text: see 0001973:0007438
Summary: 0001973: awk "numeric string " origins
Description: The awk specification (https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/awk.html#tag_20_06_13_02) has:

<<<
 A string value shall be considered a numeric string if it comes from one of the following:

     1. Field variables
     2. Input from the getline() function
     3. FILENAME
     4. ARGV array elements
     5. ENVIRON array elements
     6. Array elements created by the split() function
     7. A command line variable assignment
     8. Variable assignment from another numeric string variable
>>>

It can be interpreted as meaning that

awk 'BEGIN{$1 = "10"; print ($1 > 2)}'

should print 1, for instance. But no implementation that I know of does so. Once $1 is assigned a string, it loses the special property whereby a value that looks like a number shall be considered a number.

Same applies for ARGV, FILENAME...
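
A minimal sketch of the distinction, assuming a POSIX shell and one of the common awk implementations; the shell variable names are illustrative only. The same text, 10, compares numerically when it comes from splitting an input record, and as a string when it was assigned from a string constant:

```shell
# Field obtained by splitting an input record: a numeric string,
# so the comparison is numeric (10 > 2).
from_input=$(echo 10 | LC_ALL=C awk '{ print ($1 > 2) }')

# Field assigned from a string constant: a plain string,
# so the comparison is a string comparison ("10" sorts before "2").
assigned=$(LC_ALL=C awk 'BEGIN { $1 = "10"; print ($1 > 2) }')

echo "from input: $from_input, assigned: $assigned"
```

With the implementations discussed in this report, the first result is 1 and the second is 0.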

Typo in rationale section btw:

> also shall have the numeric value of the numeric string" was removed
> from several sections of the ISO POSIX-2:1993 standard because *is*
> specifies an unnecessary implementation detail

is -> it
Desired Action: Make it clear that it's

1. the values resulting from the splitting of $0 into $1, $2... (upon first dereferencing after reading a record (including via getline) or after assigning to $0) that are candidates for numeric strings, not the field variables per se, or change to "Field variables unless subsequently assigned a string value".
3. the current input file as initially assigned to FILENAME, or "FILENAME unless subsequently assigned a string value"

And so on for ARGV and ENVIRON.

Or add some verbiage below that list along the lines of:

> And the corresponding variables have not been subsequently assigned a string value.

That still makes it ambiguous for things like:

$1 = "10"; $0 = "11 12"; print ($1 > 2)

Where $1 becomes a numeric string again after assignment to $0
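
That sequence can be sketched as a runnable check, assuming a POSIX shell; the names before, after and result are illustrative and not part of the proposal. It records the comparison once after the string assignment to $1 and once after $0 has been reassigned:

```shell
result=$(LC_ALL=C awk 'BEGIN {
    $1 = "10"            # string constant: $1 holds a plain string
    before = ($1 > 2)    # string comparison, "10" sorts before "2"
    $0 = "11 12"         # assigning $0 re-splits the record
    after = ($1 > 2)     # $1 is now "11", produced by field splitting
    print before, after
}')
echo "$result"
```

The two results differ, which is the ambiguity the simpler "unless subsequently assigned a string value" wording would leave open.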
Tags: tc1-2024

Activities

stephane

2026-03-06 08:01

reporter   bugnote:0007389

Last edited: 2026-03-06 10:25

May also be worth clarifying (in a separate ticket?) that in sub(ere, repl[, in ]) or gsub(ere, repl[, in ]), if "in" (or $0 if omitted) was a numeric string and there's been at least one substitution, then it becomes a non-numeric string even if it contains the valid representation of a number.

That is for instance:

printf '%s\n' 12 13 | awk '{gsub("2", "2")}; $0 > 2'

Should output 13 only, as 12 is successfully substituted with 12, making it a string which is not greater than "2", while 13 remains a numeric string as the substitution failed.

stephane

2026-03-06 09:59

reporter   bugnote:0007391

For context, that came up at https://unix.stackexchange.com/questions/804798/awk-comparing-to-constant-numbers

stephane

2026-03-06 10:20

reporter   bugnote:0007392

Last edited: 2026-03-06 10:21

> 1. the values resulting from the splitting of $0 into $1, $2...

Sorry, that wording is insufficient as it doesn't cover $0 itself, where it is the assignment from input (the current record or via getline) that makes the value a candidate for a numeric string.

For the case where $0 is recomputed when individual fields are modified, I find the behaviour varies between implementations.

echo 10 | LC_ALL=C awk '{$1 = $1}; $0 > 2'

outputs 10 in mawk, but not in busybox awk, GNU awk, or bwk's `awk`.

While

echo 10 | LC_ALL=C awk -v OFS=. '{$2 = 3}; $0 > 2'

outputs 10 in none of them.
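
A small probe along the same lines, assuming a POSIX shell, can show which behaviour a given awk has. It prints the result of the comparison instead of using it as a pattern; 1 means $0 was compared numerically and 0 means it was compared as a string. The variable names are illustrative, and no particular result is implied for the first command, since implementations differ:

```shell
# Does $0 stay a numeric string after being rebuilt from its fields?
rebuilt=$(echo 10 | LC_ALL=C awk '{ $1 = $1; print ($0 > 2) }')

# Same question when a new field is added, so $0 becomes "10.3".
extended=$(echo 10 | LC_ALL=C awk -v OFS=. '{ $2 = 3; print ($0 > 2) }')

echo "after \$1 = \$1: $rebuilt, after \$2 = 3: $extended"
```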

agadmin

2026-03-07 11:48

administrator   bugnote:0007393

Adjust summary as requested (seq 38910)

nick

2026-06-04 16:27

manager   bugnote:0007438

Last edited: 2026-06-04 16:30

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
Strings are enclosed in double-quotes, and the standard says this. However it is unclear what a numeric string means.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

At page 2610, line 85386, change:

        A string value shall be considered a numeric string if it comes from one of the following:

    to:

        A string value shall be considered a numeric string if it is not surrounded by <double-quote> marks (' " ') and comes from one of the following:

stephane

2026-06-05 07:08

reporter   bugnote:0007439

Re: 0001973:0007438

Thanks, but I fear the issue has been misinterpreted here as that resolution seems to be beside the point.

My point here is that currently, the POSIX wording would suggest that:

    awk 'BEGIN{ $1 = 1 2 3; $2 = 4 5; print ($1 < $2) }'

would be required to output 0 as those $1, $2 are *field variables* and contain something that looks like a number.

No awk implementation returns 0 here. They all return 1, because it's not about *field variables* but about where the value they (or any other variable) have been assigned comes from.

Here, those values come from the concatenation operator so are *strings*, not *numeric strings*.
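
For comparison, a minimal sketch (assuming a POSIX shell; the shell variable names are illustrative) that puts the same two texts, 123 and 45, into $1 and $2 once through concatenation and once through input splitting:

```shell
# Values built by concatenation are strings: "123" sorts before "45".
concat=$(LC_ALL=C awk 'BEGIN { $1 = 1 2 3; $2 = 4 5; print ($1 < $2) }')

# Values produced by splitting an input record are numeric strings:
# 123 < 45 is false.
split_in=$(echo 123 45 | LC_ALL=C awk '{ print ($1 < $2) }')

echo "concatenated: $concat, split from input: $split_in"
```

Both cases involve field variables holding identical text; only the origin of the values differs.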

Just to clarify for those not familiar with this quirk of the awk language: numeric string *values* are values that are strings in that the exact text representation is preserved. For example:

$ echo 01.000e0 | awk '{print $1}'
01.000e0

But that are treated numerically by the comparison operator:

$ echo 01.000e0 1 | awk '{print $1 == $2}'
1

Because they contain a representation of a number and come from some particular origin, here the implicit splitting of the input record.

One might argue that it's a misdesign for awk's == != < > <= >= operators to have been overloaded to do both string and number comparison (see how perl introduced separate eq ne lt gt le ge operators), but that's not the point I'm trying to make here and I'm not suggesting that should be changed in awk (too late for that).

stephane

2026-06-06 14:42

reporter   bugnote:0007440

I'm suggesting a different wording:

The following string values:

 - input records as assigned to $0 at the start of a cycle or to any variable by getline statements
 - the result of splitting strings, as assigned to field variables when $0 is assigned a new value, or as assigned to array element values by the split() function.
 - the current input file name as assigned to FILENAME
 - the command line arguments as assigned to values of elements of the ARGV array
 - the environment variable values as assigned to values of the elements of the ENVIRON array
 - the values of command line variable assignments

shall be considered numeric strings if they meet an implementation-dependent condition corresponding to either case (a) or (b) below:

It would also be worth noting that the type is attached to the *value*, not the variable it is assigned to, though the value's type is preserved upon assignment to a variable or array element value (not key, which is always a string).
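
A minimal sketch of that last point, assuming a POSIX shell; the names x, y, a, b and result are illustrative only. A plain assignment carries the numeric string value over to another variable, while concatenating with an empty string yields a new value that is a plain string:

```shell
result=$(echo 10 | LC_ALL=C awk '{
    x = $1        # plain assignment keeps the numeric string value
    y = $1 ""     # concatenation produces a new, plain string value
    a = (x > 2)   # numeric comparison: 10 > 2
    b = (y > 2)   # string comparison: "10" sorts before "2"
    print a, b
}')
echo "$result"
```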

Issue History

Date Modified Username Field Change
2026-03-06 07:22 stephane New Issue
2026-03-06 08:01 stephane Note Added: 0007389
2026-03-06 09:59 stephane Note Added: 0007391
2026-03-06 10:20 stephane Note Added: 0007392
2026-03-06 10:21 stephane Note Edited: 0007392
2026-03-06 10:25 stephane Note Edited: 0007389
2026-03-07 11:48 agadmin Summary awk "string variables" origin => awk "numeric string " origins
2026-03-07 11:48 agadmin Interp Status => ---
2026-03-07 11:48 agadmin Note Added: 0007393
2026-06-04 15:34 nick Page Number (page or range of pages) => 2610
2026-06-04 15:34 nick Line Number (Line or range of lines) => 85386-85394
2026-06-04 16:27 nick Note Added: 0007438
2026-06-04 16:28 nick Status New => Resolved
2026-06-04 16:28 nick Resolution Open => Accepted As Marked
2026-06-04 16:28 nick Final Accepted Text => see 0001973:0007438
2026-06-04 16:28 nick Tag Attached: tc1-2024
2026-06-04 16:30 nick Note Edited: 0007438
2026-06-04 16:30 nick Status Resolved => Interpretation Required
2026-06-04 16:30 nick Interp Status --- => Pending
2026-06-05 07:08 stephane Note Added: 0007439
2026-06-06 14:42 stephane Note Added: 0007440