View Issue Details

ID: 0000224
Project: 1003.1(2008)/Issue 7
Category: Shell and Utilities
View Status: public
Last Update: 2013-04-16 13:06
Reporter: geoffclare
Assigned To: ajosey
Priority: normal
Severity: Objection
Type: Error
Status: Closed
Resolution: Accepted As Marked
Name: Geoff Clare
Organization: The Open Group
User Reference:
Section: awk
Page Number: 2430
Line Number: 77098
Interp Status: Approved
Final Accepted Text: 0000224:0000396
Summary: 0000224: awk -F and FS

Description:

There are a couple of problems with the description of the awk -F option:

    -F ERE
        Define the input field separator to be the extended regular
        expression ERE, before any input is read

1. By a strict reading, the ERE option-argument is used directly as
an extended regular expression, but in practice the option-argument
is handled as if it were a string literal to be assigned to the FS
variable. This makes a difference to the treatment of backslash
characters:

    $ awk -F '\\\\' 'BEGIN { print FS }'
    \\

Here the ERE used as input field separator is \\ not \\\\. It is
the same as for:

    $ awk 'BEGIN { FS="\\\\"; print FS }'
    \\

2. The input field separator (and consequently the FS variable) should
be set before execution of any BEGIN actions, not just before any
input is read, so that applications can use FS within BEGIN actions.
E.g.:

    BEGIN { OFS=FS }
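
Both points can be checked against a local awk with a short sketch like the one below. The sample records and the variable names (record, via_option, via_assign, rebuilt) are made up for illustration and are not part of the report. The sketch assumes an awk that behaves as described above, that is, one that applies string-literal escape processing to the -F option-argument and sets FS before BEGIN actions run.

```shell
# Hypothetical sample record with two tab-separated fields.
record=$(printf 'left\tright')

# Point 1: the option-argument \t goes through string-literal escape
# processing, just like the literal "\t" assigned to FS.
via_option=$(printf '%s\n' "$record" | awk -F '\t' '{ print $2 }')
via_assign=$(printf '%s\n' "$record" | awk 'BEGIN { FS = "\t" } { print $2 }')

# Point 2: FS is already set when BEGIN runs, so OFS can copy it.
# Assigning to $2 makes awk rebuild $0 with OFS.
rebuilt=$(printf 'a:b:c\n' | awk -F : 'BEGIN { OFS = FS } { $2 = "X"; print }')

printf '%s\n%s\n%s\n' "$via_option" "$via_assign" "$rebuilt"
```

If the first two results differ, the local awk treats the -F option-argument differently from an FS string literal. If the third line is not rejoined with colons, FS was not yet set when the BEGIN action ran.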
Desired Action:

Change

    -F ERE
        Define the input field separator to be the extended regular
        expression ERE, before any input is read; see [xref to
        Regular Expressions].

to

    -F sepstring
        Define the input field separator as if by the assignment

            FS = "sepstring"

        before execution of the actions associated with any BEGIN
        patterns, or before any input is read if there are no BEGIN
        patterns; see the description of the FS built-in variable,
        and how it is used, in the EXTENDED DESCRIPTION section.


In the SYNOPSIS change

    -F ERE

on lines 77077 and 77078 to

    -F sepstring

On line 77091 change

    -F ERE

to

    the -F sepstring option

At line 77514 change

    An extended regular expression can be used to separate fields by
    using the -F ERE option or by assigning a string containing the
    expression to the built-in variable FS.

to

    An extended regular expression can be used to separate fields by
    assigning a string containing the expression to the built-in
    variable FS, either directly or as a consequence of using the
    -F sepstring option.

Tags: tc1-2008

Activities

stephane (reporter), 2010-02-25 19:22, bugnote:0000391

I suppose we could describe -F <whatever> as being the same as -v FS=<whatever>.

geoffclare (manager), 2010-02-26 09:43, bugnote:0000392

Stephane's suggestion would be a neater way to specify the behaviour.
However, the two ways are not quite the same: on further
investigation, it turns out that some implementations behave one way
and some the other.

    $ awk -F A -v FS=B 'BEGIN { print FS }'

prints B, as per Stephane's suggestion, in some versions of awk (e.g.
Solaris, GNU) but prints A, as per my original proposal, in others
(e.g. HP-UX, UnixWare).
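
A portable script can probe which of the two behaviours the local awk has. The following is only a sketch. The variable names fs_seen and order are illustrative, and the messages simply restate the two outcomes described above.

```shell
# Ask the local awk which FS value survives when both options are given.
fs_seen=$(awk -F A -v FS=B 'BEGIN { print FS }')

case $fs_seen in
    B) order='-F and -v FS processed in command-line order' ;;
    A) order='-F processed after the last -v FS assignment' ;;
    *) order="unexpected FS value: $fs_seen" ;;
esac

printf '%s\n' "$order"
```

Scripts that need a predictable result can avoid the question by using only one of -F or -v FS= in a given invocation.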

Here is an updated proposal which reflects this finding...

Change

    -F ERE
        Define the input field separator to be the extended regular
        expression ERE, before any input is read; see [xref to
        Regular Expressions].

to

    -F sepstring
        Define the input field separator. This option shall be
        equivalent to

            -v FS=sepstring

        except that if -F sepstring and -v FS=sepstring
        are both used, it is unspecified whether the FS assignment
        resulting from -F sepstring is processed in command line
        order or is processed after the last -v FS=sepstring.
        See the description of the FS built-in variable, and how it is
        used, in the EXTENDED DESCRIPTION section.


In the SYNOPSIS change

    -F ERE

on lines 77077 and 77078 to

    -F sepstring

On line 77091 change

    -F ERE

to

    the -F sepstring option

At line 77514 change

    An extended regular expression can be used to separate fields by
    using the -F ERE option or by assigning a string containing the
    expression to the built-in variable FS.

to

    An extended regular expression can be used to separate fields by
    assigning a string containing the expression to the built-in
    variable FS, either directly or as a consequence of using the
    -F sepstring option.

geoffclare (manager), 2010-03-04 17:12, bugnote:0000396

Interpretation response
------------------------
The standard states the requirements for awk -F, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.


Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

Change

    -F ERE
        Define the input field separator to be the extended regular
        expression ERE, before any input is read; see [xref to
        Regular Expressions].

to

    -F sepstring
        Define the input field separator. This option shall be
        equivalent to

            -v FS=sepstring

        except that if -F sepstring and -v FS=sepstring
        are both used, it is unspecified whether the FS assignment
        resulting from -F sepstring is processed in command line
        order or is processed after the last -v FS=sepstring.
        See the description of the FS built-in variable, and how it is
        used, in the EXTENDED DESCRIPTION section.


In the SYNOPSIS change

    -F ERE

on lines 77077 and 77078 to

    -F sepstring

On line 77091 change

    -F ERE

to

    the -F sepstring option

At line 77514 change

    An extended regular expression can be used to separate fields by
    using the -F ERE option or by assigning a string containing the
    expression to the built-in variable FS.

to

    An extended regular expression can be used to separate fields by
    assigning a string containing the expression to the built-in
    variable FS, either directly or as a consequence of using the
    -F sepstring option.
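
The equivalence between -F sepstring and -v FS=sepstring in the text above can be illustrated with a multi-character separator, which awk treats as an extended regular expression. This is a minimal sketch. The record contents and the names record, with_F and with_v are hypothetical.

```shell
# Hypothetical record: fields separated by a comma and optional spaces.
record='a,  b,c'

# The same separator string supplied through each option in turn.
with_F=$(printf '%s\n' "$record" | awk -F ', *' '{ print $2 }')
with_v=$(printf '%s\n' "$record" | awk -v FS=', *' '{ print $2 }')

printf '%s\n%s\n' "$with_F" "$with_v"
```

Only one of the two options is used per invocation here, so the unspecified ordering between -F and -v FS= does not come into play.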

Issue History

Date Modified     Username    Field                   Change
2010-02-25 17:29  geoffclare  New Issue
2010-02-25 17:29  geoffclare  Status                  New => Under Review
2010-02-25 17:29  geoffclare  Assigned To             => ajosey
2010-02-25 17:29  geoffclare  Name                    => Geoff Clare
2010-02-25 17:29  geoffclare  Organization            => The Open Group
2010-02-25 17:29  geoffclare  Section                 => awk
2010-02-25 17:29  geoffclare  Page Number             => 2430
2010-02-25 17:29  geoffclare  Line Number             => 77098
2010-02-25 17:29  geoffclare  Interp Status           => ---
2010-02-25 19:22  stephane    Note Added: 0000391
2010-02-26 09:43  geoffclare  Note Added: 0000392
2010-03-04 17:12  geoffclare  Interp Status           --- => Pending
2010-03-04 17:12  geoffclare  Note Added: 0000396
2010-03-04 17:12  geoffclare  Status                  Under Review => Interpretation Required
2010-03-04 17:12  geoffclare  Resolution              Open => Accepted As Marked
2010-03-04 17:12  geoffclare  Description Updated
2010-03-04 17:13  geoffclare  Final Accepted Text     => 0000224:0000396
2010-04-16 10:14  ajosey      Interp Status           Pending => Proposed
2010-05-28 14:04  ajosey      Interp Status           Proposed => Approved
2010-09-24 11:16  geoffclare  Tag Attached: tc1-2008
2013-04-16 13:06  ajosey      Status                  Interpretation Required => Closed