View Issue Details

ID: 0001468
Project: 1003.1(2008)/Issue 7
Category: Shell and Utilities
View Status: public
Last Update: 2024-06-11 08:52
Reporter: mortoneccc
Assigned To: ajosey
Priority: normal
Severity: Editorial
Type: Enhancement Request
Status: Closed
Resolution: Accepted
Name: Ed Morton
Organization:
User Reference:
Section: awk
Page Number: 2493
Line Number: 80182-80184
Interp Status: ---
Final Accepted Text:
Summary: 0001468: awk FS definition not quite correct
Description: (Sorry, I don't see any page or line numbers in the online spec, hence the 1 and 1 used above.)

In the definition of FS in the awk spec (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) it says:

-----
The following describes FS behavior:

   If FS is a null string, the behavior is unspecified.

   If FS is a single character:

       If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

       Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

   Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
-----

but that final case isn't exactly correct, because an ERE can match a null string while an FS can't. For example, try splitting a record on all non-commas:

$ echo 'x,y,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,>
3 <,>
4 <>

which makes sense, since there's a null string before the first non-comma (x), a comma on each side of the second non-comma (y), and a null string after the last non-comma (z). Now remove the "y" from the middle to get:

$ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
1 <>
2 <,,>
3 <>

and note that the null string between the 2 commas, which the regexp `[^,]*` would match, isn't actually treated as a separator by the FS `[^,]*`.
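
To make the contrast concrete, here is a minimal sketch that applies the same ERE once through gsub() and once as FS. It assumes an awk whose gsub() replaces null-string matches the way gawk documents it; other implementations may differ on that point.

```shell
# gsub() uses the ERE as an ordinary regexp, so the null string between
# the two commas is matched and replaced along with "x" and "z".
echo 'x,,z' | awk '{ n = gsub(/[^,]*/, "-"); print n, $0 }'

# As FS, the same ERE only separates fields where it matches at least one
# character, so the record splits into three fields, not four.
echo 'x,,z' | awk -F'[^,]*' '{ print NF }'
```

With gawk the first command prints `3 -,-,-` (three replacements, one of them for the null string between the commas) and the second prints `3`.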
Desired Action: Change the final paragraph of the FS definition mentioned above to say something like "Otherwise, the string value of FS shall be considered to be an extended regular expression such that each occurrence of a sequence **of one or more characters** matching the extended regular expression shall delimit fields."
Tags: tc3-2008
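
One way to sanity-check the proposed wording is to compare the `*` form of the separator with the `+` form, which can only match one or more characters. The sketch below uses a hypothetical helper, show_fields, that is not part of the spec or of the original report. If the two forms produce identical fields, the implementation is already behaving as the proposed text describes.

```shell
# Hypothetical helper: print every field of each record in angle brackets,
# using the first argument as the field separator.
show_fields() {
    awk -F"$1" '{ for (i = 1; i <= NF; i++) print i, "<" $i ">" }'
}

# Split the same record with an ERE that may match the null string ...
star=$(echo 'x,,z' | show_fields '[^,]*')
# ... and with one that must match at least one character.
plus=$(echo 'x,,z' | show_fields '[^,]+')

# Report whether both separators produced the same fields.
if [ "$star" = "$plus" ]; then
    echo "same fields"
else
    echo "different fields"
fi
```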

Activities

There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2021-04-24 15:20 mortoneccc New Issue
2021-04-24 15:20 mortoneccc Status New => Under Review
2021-04-24 15:20 mortoneccc Assigned To => ajosey
2021-04-24 15:20 mortoneccc Name => Ed Morton
2021-04-24 15:20 mortoneccc Section => awk
2021-04-24 15:20 mortoneccc Page Number => 1
2021-04-24 15:20 mortoneccc Line Number => 1
2021-04-24 21:10 Don Cragun Page Number 1 => 2493
2021-04-24 21:10 Don Cragun Line Number 1 => 80182-80184
2021-04-24 21:10 Don Cragun Interp Status => ---
2021-11-18 16:55 Don Cragun Status Under Review => Resolved
2021-11-18 16:55 Don Cragun Resolution Open => Accepted
2021-11-18 16:56 Don Cragun Tag Attached: tc3-2008
2021-12-13 15:11 geoffclare Status Resolved => Applied
2024-06-11 08:52 agadmin Status Applied => Closed