Austin Group Defect Tracker

Aardvark Mark IV


ID 0001468
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2021-04-24 15:20
Last Update 2021-04-24 21:10
Reporter mortoneccc
View Status public
Assigned To ajosey
Priority normal
Resolution Open
Status Under Review
Name Ed Morton
Organization
User Reference
Section awk
Page Number 2493
Line Number 80182-80184
Interp Status ---
Final Accepted Text
Summary 0001468: awk FS definition not quite correct
Description (sorry, I don't see any page or line numbers in the online spec, hence the 1 and 1 used above).

In the definition of FS in the awk spec (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) it says:

-----
The following describes FS behavior:

   If FS is a null string, the behavior is unspecified.

   If FS is a single character:

       If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

       Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

       Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
-----
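
As a rough illustration of the three specified cases quoted above, here is a minimal sketch. The `show_fields` helper and the input strings are made up for this example and do not come from the spec; only the standard `-F` option and `NF` are used.

```shell
# Hypothetical helper: print each field of the record read from stdin,
# wrapped in angle brackets. Extra arguments (such as -F) go to awk.
show_fields() {
    awk "$@" '{ for (i = 1; i <= NF; i++) print i, "<" $i ">" }'
}

# Case 1: FS is <space> (the default). Leading and trailing blanks are
# skipped and runs of blanks delimit fields.
printf '  a   b \n' | show_fields

# Case 2: FS is a single other character. Each occurrence delimits a
# field, so two adjacent commas produce an empty field between them.
printf 'a,,b\n' | show_fields -F,

# Case 3: FS is longer than one character and is treated as an ERE.
# Here each run of digits is one separator.
printf 'a1b22c\n' | show_fields -F'[0-9]+'
```

None of these inputs lets the ERE match a null string, so they should behave the same way whichever reading of the final case is used.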

but that final case isn't exactly correct, because an ERE can match a null string while an FS can't. For example, try splitting a record on all non-commas:

$ echo 'x,y,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,>
3 <,>
4 <>

which makes sense since there's a null string before the first non-comma (x), one comma on each side of the 2nd non-comma (y), and a null string after the last non-comma (z). Now remove the "y" from the middle to get:

$ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,,>
3 <>

and note that the null string between the 2 commas, which the regexp `[^,]*` would match, isn't actually treated as a separator by the FS `[^,]*`.
Desired Action Change the final paragraph of the FS definition mentioned above to say something like "Otherwise, the string value of FS shall be considered to be an extended regular expression such that each occurrence of a sequence **of one or more characters** matching the extended regular expression shall delimit fields."
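
One way to read the proposed wording is that an ERE which cannot match the null string, such as `[^,]+`, should split the record the same way as the output reported above for `[^,]*`. A minimal sketch of that comparison follows; the `split_on` helper name is made up for this example.

```shell
# Hypothetical helper: $1 is the FS value; the record is read from stdin.
split_on() {
    awk -F"$1" '{ for (i = 1; i <= NF; i++) print i, "<" $i ">" }'
}

# '[^,]+' matches only sequences of one or more non-comma characters,
# so the null string between the two commas can never be a separator.
printf 'x,,z\n' | split_on '[^,]+'
# prints:
# 1 <>
# 2 <,,>
# 3 <>
```

This is the same three-field result shown in the description for `-F'[^,]*'`, which is consistent with the "one or more characters" wording suggested in the desired action.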
Tags No tags attached.
Attached Files

- Relationships

There are no notes attached to this issue.

- Issue History
Date Modified Username Field Change
2021-04-24 15:20 mortoneccc New Issue
2021-04-24 15:20 mortoneccc Status New => Under Review
2021-04-24 15:20 mortoneccc Assigned To => ajosey
2021-04-24 15:20 mortoneccc Name => Ed Morton
2021-04-24 15:20 mortoneccc Section => awk
2021-04-24 15:20 mortoneccc Page Number => 1
2021-04-24 15:20 mortoneccc Line Number => 1
2021-04-24 21:10 Don Cragun Page Number 1 => 2493
2021-04-24 21:10 Don Cragun Line Number 1 => 80182-80184
2021-04-24 21:10 Don Cragun Interp Status => ---

