Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0000226
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Comment
Type Clarification Requested
Date Submitted 2010-03-25 15:59
Last Update 2019-06-10 08:55
Reporter nick
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Closed
Name Stephane Chazelas <stephane_chazelas@yahoo.fr>
Organization
User Reference
Section awk
Page Number 2447-2453
Line Number 77855
Interp Status Approved
Final Accepted Text See Note: 0002228.
Summary 0000226: Questions on awk grammar
Description At:

http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html:

> program              : item_list
>                      | actionless_item_list
>                      ;
>
> item_list            : newline_opt
>                      | actionless_item_list item terminator
>                      | item_list item terminator
>                      | item_list action terminator
>                      ;
>
> actionless_item_list : item_list pattern terminator
>                      | actionless_item_list pattern terminator
>                      ;
>
> item                 : pattern action
>                      | Function NAME '(' param_list_opt ')'
>                            newline_opt action
>                      | Function FUNC_NAME '(' param_list_opt ')'
>                            newline_opt action
>                      ;
[...]
> terminator           : terminator ';'
>                      | terminator NEWLINE
>                      | ';'
>                      | NEWLINE
>                      ;


In which way is that different from:

> program : newline_opt
>         | program item terminator
>         ;
>
> item    : pattern action
>         | pattern
>         | action
>         | Function NAME '(' param_list_opt ')'
>               newline_opt action
>         | Function FUNC_NAME '(' param_list_opt ')'
>               newline_opt action
>         ;

? If I understand correctly, it's to forbid: awk '/foo/; {print}'
(an actionless item cannot be followed by a patternless item).
Is there any awk implementation that doesn't support that?

Also, it seems it doesn't allow: awk '/x/{print} {print}' but I
can't find any implementation that doesn't support it (and I can
find many scripts that do use that).

On the other hand, it allows awk BEGIN and awk END which all
implementations I tried failed on. It allows: awk
'{print};;{print}' which at least mawk and gawk fail on.

Am I missing something?

--
Stephane
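
As a rough cross-check of the four observations above, the 2008 productions can be modelled at the token level. This is only a sketch: the single-letter tokens (P for an expression pattern, S for BEGIN or END, A for an action, ';' for a semicolon, N for NEWLINE) and the function name accepts_2008 are assumptions made for illustration, function definitions are ignored, and each sample ends in N because the 2008 grammar requires a terminator after every item.

```python
def accepts_2008(tokens: str) -> bool:
    """Return True if the token string matches the 2008 `program` production.

    Hypothetical token alphabet (not part of the standard):
    P = expr pattern, S = BEGIN/END, A = action, ';' = semicolon, N = NEWLINE.
    """
    i, n = 0, len(tokens)
    # item_list : newline_opt
    while i < n and tokens[i] == "N":
        i += 1
    # True while what has been parsed so far is an actionless_item_list
    actionless = False
    while i < n:
        if tokens[i] in "PS":
            i += 1
            if i < n and tokens[i] == "A":
                i += 1                 # item : pattern action
                actionless = False
            else:
                actionless = True      # bare pattern
        elif tokens[i] == "A":
            if actionless:
                # there is no "actionless_item_list action terminator" rule
                return False
            i += 1                     # item_list action terminator
        else:
            return False
        # terminator : one or more of ';' and NEWLINE, in any order
        start = i
        while i < n and tokens[i] in ";N":
            i += 1
        if i == start:
            return False
    return True


if __name__ == "__main__":
    samples = {
        "/foo/; {print}": "P;AN",
        "/x/{print} {print}": "PAAN",
        "BEGIN": "SN",
        "{print};;{print}": "A;;AN",
    }
    for program, tokens in samples.items():
        print(f"{program!r}: {accepts_2008(tokens)}")
```

Under these assumptions the model rejects the first two programs and accepts the last two, which matches the reading given in the description.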
Desired Action Clarification requested.
Tags tc2-2008
Attached Files awk.y (11,833 bytes) 2014-04-17 16:42
awk.txt (11,833 bytes) 2014-04-17 16:44

- Notes
(0002216)
rhansen (manager)
2014-04-03 16:14

The consensus is that the grammar should be changed to permit:
    awk '{print} {print}'
and:
    awk '/foo/; {print}'
To resolve this issue we need explicit text changing the grammar. Any input would be appreciated.

If desired, the grammar can be modified to reject actionless BEGIN and END; although we are also fine with the current situation of a semantic check rejecting something the grammar allows.

Regarding ';;' as a terminator, POSIX awk was based on nawk, which accepts this. The mawk and gawk implementations do not accept ';;', which we consider to be an implementation bug.
(0002217)
shware_systems (reporter)
2014-04-03 16:50
edited on: 2014-04-03 22:55

Suggested change to reject actionless BEGIN or END (looking at it, this seems the easy one):

At P2471, L79230-5 Replace:
item             : pattern action
                 | Function NAME      '(' param_list_opt ')'
                       newline_opt action
                 | Function FUNC_NAME '(' param_list_opt ')'
                       newline_opt action
                 ;

with:

item             : pattern
                 | Function NAME      '(' param_list_opt ')'
                       newline_opt action
                 | Function FUNC_NAME     param_list_opt ')'
                       newline_opt action
                 ;

At P2471, L79242-6 Replace:
pattern          : Begin
                 | End
                 | expr
                 | expr ',' newline_opt expr
                 ;

with:

pattern          : begin_end_clause
                 | expr_clause terminator
                 | expr_clause action terminator
                 ;

expr_clause      : expr
                 | expr ',' newline_opt expr
                 ;

begin_end_clause : Begin action terminator
                 | End   action terminator
                 ;

Alternate clause, to allow empty action as "To Do" placeholder but exclude actionless items still:
begin_end_clause : Begin action terminator
                 | End   action terminator
                 | Begin terminator
                 | End   terminator
                 ;
===================
Sorry about the multiple edits, but I misconstrued some of the examples and I'm not an awk aficionado.
This supports programs constructed as Begin clauses, End clauses, and expression prefixed actions as body to be executed, with allowance that some expressions can have side effects that would not require an associated action item but may affect END clauses. The productions do not make reference to the implied { print } action in that latter case. A possible change to the language suggests itself that an expr_list followed by a BEGIN or END clause, or an end-of-file/program operand, without a terminator will not invoke the implied print.

After re-reading the description this appears to be the intent of the language. Whether additional work needs to be done to account for getline modifying the behavior of BEGIN and END with respect to examination of an input stream I leave open, and productions referencing 'item' may need further edits. The change to item incorporates a bug fix in that FUNC_NAME includes a trailing '(' so having a separate '(' in that production is syntactically NAME'((', not a single '('.

(0002226)
nick (manager)
2014-04-17 15:25

> From: Aharon Robbins <arnold@skeeve.com>
> Subject: Re: [bug-gawk] use of ;; as terminator, request for grammar help
> Date: 17 April 2014 09:45:07 BST
> To: eblake@redhat.com, bug-gawk@gnu.org
> Cc: bwk@cs.princeton.edu, austin-group-l@opengroup.org
> X-Diagnostic: Not on the accept list
>
> Hi Eric and Austin Group folks,
>
> I apologize for the delay in replying. Real Life(tm) gets in the way
> of these things.
>
> I am cc'ing Brian Kernighan for his opinion on these issues as well.
>
>> Date: Thu, 03 Apr 2014 10:18:54 -0600
>> From: Eric Blake <eblake@redhat.com>
>> To: bug-gawk@gnu.org
>> Cc: Austin Group <austin-group-l@opengroup.org>
>> Subject: [bug-gawk] use of ;; as terminator, request for grammar help
>>
>> Hello GNU awk readers,
>>
>> On today's Austin Group call (the people in charge of POSIX), we visited
>> http://austingroupbugs.net/view.php?id=226.
>>
>> This is in regards to the POSIX awk specification at:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>>
>> Among other things, there were two action items pointed out that this
>> list might be able to help with:
>>
>> 1. GNU awk has a bug regarding ;; as a terminator. The POSIX grammar
>> allows for:
>> awk '{print};;{print}'
>> but gawk rejects this case. This was deemed to be a bug in gawk, since
>> POSIX was based on the nawk behavior at the time POSIX was standardized,
>> and nawk has always supported this.
>
> I'm not convinced this is a real bug. In particular, accidents of the
> Unix awk implementation should not necessarily be formally codified
> in the standard. mawk, which was written based on the 1988 awk book,
> also does not support this.
>
> If there are awk programs that use this, they should best be changed to
> have only one ';', in my humble opinion; there's no real added value
> to codifying this into the language.
>
>> 2. Based on existing implementations, there is consensus that the POSIX
>> grammar is overly restrictive, and that we should change it to permit:
>> awk '{print} {print}'
>> and:
>> awk '/foo/; {print}'
>>
>> since existing implementations all support it. But to do that, we need
>> someone with help in writing grammars to propose the changes to the one
>> appearing on the POSIX page. Any input would be appreciated.
>
> I disagree with the first desired change. The ground I'm standing on here is
> firmer. The 1988 awk book disallowed rules without any separators, on the
> grounds that rules and statements within them should be syntactically
> consistent (a semicolon is required when multiple Xs [rules or statements] appear
> on one line). And the very early released versions of nawk in fact enforced
> this rule. (I remember testing against it.)
>
> Later on, after the awk book, Brian changed his awk. If you look at his FIXES
> file, you will see:
>
> Nov 27, 1988:
> With fear and trembling, modified the grammar to permit
> multiple pattern-action statements on one line without
> an explicit separator. By definition, this capitulation
> to the ghost of ancient implementations remains undefined
> and thus subject to change without notice or apology.
> DO NOT COUNT ON IT.
>
> The sentiment here is quite clear - while it might work, it should
> not be formalized.
>
> The gawk documentation follows this example, documenting clearly that
> a semicolon is required between multiple rules on one line, and NOT
> documenting that it can be left off. I do not plan to change this, either.
>
> The second change (awk '/foo/; { print }') should be supported by the POSIX
> grammar, since that is clearly two different rules.
>
> As an aside, there are one or two other areas where gawk implements
> undocumented (= unspecified) behavior for compatibility with Unix awk,
> but those remain purposely undocumented in the gawk manual; the case
> I'm thinking about even has this comment in the code:
>
> /*
> * A simple_stmt exists to satisfy a constraint in the POSIX
> * grammar allowing them to occur as the 1st and 3rd parts
> * in a `for (...;...;...)' loop. This is a historical oddity
> * inherited from Unix awk, not at all documented in the AK&W
> * awk book. We support it, as this was reported as a bug.
> * We don't bother to document it though. So there.
> */
>
> In my humble opinion, the ';;' issue is so trivial that it's not even worth
> the effort I put in for simple statements in for loops.
>
> I hope all this helps. Further discussion is welcome.
>
> Arnold
>
(0002227)
rhansen (manager)
2014-04-17 16:23
edited on: 2014-04-17 16:33

On pages 2447-2453 lines 77825-78107 (2013 edition/TC1: pages 2470-2476 lines 79189-79471) change the awk grammar as follows:

diff --git a/awk.y b/awk.y
index b12ecd9..c63ce89 100644
--- a/awk.y
+++ b/awk.y
@@ -49,23 +49,18 @@
 
 
 program          : item_list
-                 | actionless_item_list
+                 | item_list item
                  ;
 
 
-item_list        : newline_opt
-                 | actionless_item_list item terminator
-                 | item_list            item terminator
-                 | item_list          action terminator
+item_list        : /* empty */
+                 | item_list item terminator
                  ;
 
 
-actionless_item_list : item_list            pattern terminator
-                 | actionless_item_list pattern terminator
-                 ;
-
-
-item             : pattern action
+item             : action
+                 | pattern action
+                 | normal_pattern
                  | Function NAME      '(' param_list_opt ')'
                        newline_opt action
                  | Function FUNC_NAME '(' param_list_opt ')'
@@ -83,21 +78,27 @@ param_list       : NAME
                  ;
 
 
-pattern          : Begin
-                 | End
-                 | expr
+pattern          : normal_pattern
+                 | special_pattern
+                 ;
+
+normal_pattern   : expr
                  | expr ',' newline_opt expr
                  ;
 
 
+special_pattern  : Begin
+                 | End
+                 ;
+
+
 action           : '{' newline_opt                             '}'
                  | '{' newline_opt terminated_statement_list   '}'
                  | '{' newline_opt unterminated_statement_list '}'
                  ;
 
 
-terminator       : terminator ';'
-                 | terminator NEWLINE
+terminator       : terminator NEWLINE
                  |            ';'
                  |            NEWLINE
                  ;


Add to RATIONALE as a new paragraph after P2482, L79340 (2013 edition/TC1: page 2481 line 79677):

  Earlier versions of this standard required implementations to
  support multiple adjacent <semicolon>s, lines with one or more
  <semicolon>s before a rule ("pattern {action}" pairs), and lines
  with only <semicolon>(s). These are not required by this standard
  and are considered poor programming practice, but can be
  accepted by an implementation of awk as an extension.
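
To see what the revised productions accept, they can be modelled at the token level with a regular expression. This is a sketch under stated assumptions, not part of the proposed text: the single-letter tokens (P for normal_pattern, S for special_pattern, A for action, ';' for a semicolon, N for NEWLINE) and the name accepts_revised are hypothetical, and function definitions are left out.

```python
import re

# Hypothetical token alphabet (not part of the standard):
# P = normal_pattern, S = special_pattern (BEGIN/END), A = action,
# ';' = semicolon, N = NEWLINE.

# item : action | pattern action | normal_pattern
ITEM = r"(?:[PS]?A|P)"
# terminator : terminator NEWLINE | ';' | NEWLINE
TERMINATOR = r"[;N]N*"
# program : item_list | item_list item
# item_list : /* empty */ | item_list item terminator
PROGRAM = re.compile(rf"(?:{ITEM}{TERMINATOR})*(?:{ITEM})?")


def accepts_revised(tokens: str) -> bool:
    """Return True if the token string matches the revised `program` rule."""
    return PROGRAM.fullmatch(tokens) is not None


if __name__ == "__main__":
    samples = {
        "/foo/; {print}": "P;A",
        "/x/{print} {print}": "PAA",
        "{print};;{print}": "A;;A",
        "BEGIN": "S",
        "BEGIN {print}": "SA",
    }
    for program, tokens in samples.items():
        print(f"{program!r}: {accepts_revised(tokens)}")
```

In this model '/foo/; {print}' and 'BEGIN {print}' are accepted, while adjacent rules without a separator, ';;', and an actionless BEGIN are not. An implementation remains free to accept the rejected forms as extensions.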

(0002228)
Don Cragun (manager)
2014-04-17 16:29
edited on: 2014-04-18 16:36

Interpretation response
------------------------
The standard specifies the grammar for awk, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
The standard does not require support for awk '/foo/; {print}', but this was unintentional.

The standard does not require support for awk '/x/{print} {print}' and this is intentional.

awk BEGIN (with no action) and awk END (with no action) are allowed by the grammar but forbidden by the EXTENDED DESCRIPTION section on Special Patterns (see Issue 7, P2440, L77545 and in Issue 7, 2013 Edition, P2463, L78909).

The standard requires support for awk '{print};;{print}', but this is not historic practice and is considered to be a poor programming practice.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes specified in Note: 0002227.

(0002229)
rhansen (manager)
2014-04-17 16:48
edited on: 2014-04-17 16:51

Note: 0002227 contains a diff of the grammar. For the full version, see the attached awk.y or awk.txt files (the two files are the same, but the Mantis bug tracker software uses different MIME types based on the file extension).

(0002245)
ajosey (manager)
2014-05-02 09:28

Interpretation proposed 2 May 2014
(0002280)
ajosey (manager)
2014-06-25 10:14

Interpretation approved 25 June 2014

- Issue History
Date Modified Username Field Change
2010-03-25 15:59 nick New Issue
2010-03-25 15:59 nick Status New => Under Review
2010-03-25 15:59 nick Assigned To => ajosey
2010-03-25 15:59 nick Name => Stephane Chazelas <stephane_chazelas@yahoo.fr>
2010-03-25 15:59 nick Section => awk
2010-03-25 15:59 nick Page Number => 2447-2453
2010-03-25 15:59 nick Line Number => 77855
2010-03-25 15:59 nick Interp Status => ---
2014-04-03 16:14 rhansen Note Added: 0002216
2014-04-03 16:50 shware_systems Note Added: 0002217
2014-04-03 16:51 shware_systems Note Edited: 0002217
2014-04-03 17:19 shware_systems Note Edited: 0002217
2014-04-03 18:41 shware_systems Note Edited: 0002217
2014-04-03 19:31 shware_systems Note Edited: 0002217
2014-04-03 22:55 shware_systems Note Edited: 0002217
2014-04-17 15:25 nick Note Added: 0002226
2014-04-17 16:23 rhansen Note Added: 0002227
2014-04-17 16:24 rhansen Note Edited: 0002227
2014-04-17 16:27 rhansen Note Edited: 0002227
2014-04-17 16:28 rhansen Note Edited: 0002227
2014-04-17 16:29 Don Cragun Interp Status --- => Pending
2014-04-17 16:29 Don Cragun Final Accepted Text => See bugID:22288.
2014-04-17 16:29 Don Cragun Note Added: 0002228
2014-04-17 16:29 Don Cragun Status Under Review => Interpretation Required
2014-04-17 16:29 Don Cragun Resolution Open => Accepted As Marked
2014-04-17 16:31 rhansen Note Edited: 0002227
2014-04-17 16:31 Don Cragun Final Accepted Text See bugID:22288. => See bugnote:22288.
2014-04-17 16:33 rhansen Note Edited: 0002227
2014-04-17 16:42 rhansen File Added: awk.y
2014-04-17 16:44 rhansen File Added: awk.txt
2014-04-17 16:48 rhansen Note Added: 0002229
2014-04-17 16:49 Don Cragun Note Edited: 0002228
2014-04-17 16:50 Don Cragun Tag Attached: tc2-2008
2014-04-17 16:50 Don Cragun Final Accepted Text See bugnote:22288. => See Note: 0002228.
2014-04-17 16:51 rhansen Note Edited: 0002229
2014-04-17 16:55 Don Cragun Note Edited: 0002228
2014-04-17 17:07 Don Cragun Note Edited: 0002228
2014-04-17 17:27 Don Cragun Note Edited: 0002228
2014-04-17 17:47 Don Cragun Note Edited: 0002228
2014-04-18 16:35 Don Cragun Note Edited: 0002228
2014-04-18 16:36 Don Cragun Note Edited: 0002228
2014-05-02 09:28 ajosey Interp Status Pending => Proposed
2014-05-02 09:28 ajosey Note Added: 0002245
2014-06-25 10:14 ajosey Interp Status Proposed => Approved
2014-06-25 10:14 ajosey Note Added: 0002280
2019-06-10 08:55 agadmin Status Interpretation Required => Closed

