View Issue Details

ID: 0001320
Project: 1003.1(2016/18)/Issue7+TC2
Category: Shell and Utilities
View Status: public
Last Update: 2024-06-11 09:08
Reporter: stephane
Assigned To:
Priority: normal
Severity: Editorial
Type: Error
Status: Closed
Resolution: Accepted As Marked
Name: Stephane Chazelas
Organization:
User Reference:
Section: awk utility
Page Number: 2493
Line Number: 80185-80194
Interp Status: ---
Final Accepted Text: 0001320:0005041
Summary: 0001320: /\n/ can match newline
Description: There is some very bizarre/confused text in the awk specification:

> Except for the '~' and "!~" operators, and in the gsub,
> match, split, and sub built-in functions, ERE matching
> shall be based on input records; that is, record separator
> characters (the first character of the value of the
> variable RS, <newline> by default) cannot be embedded in
> the expression, and no expression shall match the record
> separator character. If the record separator is not
> <newline>, <newline> characters embedded in the expression
> can be matched. For the '~' and "!~" operators, and in
> those four built-in functions, ERE matching shall be based
> on text strings; that is, any character (including
> <newline> and the record separator) can be embedded in the
> pattern, and an appropriate pattern shall match any
> character.

It kind of implies that:

echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'

should print

0
a
b

or possibly

0
x

because /ERE/ or FS cannot match on the record separator and should match on the input record.

That's not what awk implementations do.


RE matching in those cases is done not on input records but on $0. The fact that $0 (in statements other than BEGIN) is initialised from the value of the current input record (which *at that point* did not contain the then-current value of RS) is irrelevant to describing how RE matching is done. RE matching behaviour is entirely independent of the value of RS; RS is only used at the time a record is read.
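
For illustration, here is a minimal sketch of the command above with its output captured. It assumes an awk that behaves like gawk or mawk; the variable name `actual` is hypothetical, and FS is set with -v rather than -F purely as a portability assumption.

```shell
# $0 is reassigned to a string containing a newline, so /\n/ is matched
# against the new value of $0, not against the input record "x".
# Assumption: -v FS='\n' is used in place of -F'\n' from the report.
actual=$(echo x | awk -v FS='\n' '{ $0 = "a\nb"; print /\n/; print $1 }')

# With gawk or mawk this should print 1 (the match succeeds),
# then a (the first newline-separated field of the new $0).
printf '%s\n' "$actual"
```

Neither of the two outputs implied by the quoted specification text is produced, which is consistent with matching being done on the current value of $0 regardless of RS.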
Desired Action: Replace that whole section with something along the lines of:

If the subject is not specified (unlike with ~, !~, match()...), regexps are matched against the current value of $0.


Also, whether awk can deal with non-text data (NUL, byte values that don't form valid characters, strings longer than LINE_MAX) should probably be moved to some more generic section not specific to RE matching.
Tags: tc3-2008

Activities

geoffclare

2020-10-09 10:23

manager   bugnote:0005041

Rather than the kind of radical change suggested in the desired action, I would prefer to make a minimal fix. The purpose of the paragraph is to point out a difference between matching input records and matching text strings; the only real problem is that it incorrectly states the condition under which matching is against input records. I suggest that an appropriate fix would be the following.

Change:
Except for the '~' and "!~" operators, and in the gsub, match, split, and sub built-in functions, ERE matching shall be based on input records; that is, record separator characters ...
to:
When ERE matching is performed against input records; that is, the match is against $0 and the current value of $0 resulted from processing an input record, record separator characters ...

Change:
For the '~' and "!~" operators, and in those four built-in functions, ERE matching shall be based on text strings; that is, any character ...
to:
When ERE matching is not performed against input records, it shall be based on text strings; any character ...
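
The distinction drawn in this note can be sketched roughly as follows. This is an illustration only, not part of the accepted text; it assumes gawk-like or mawk-like behaviour, and the variable names are hypothetical.

```shell
# Matching against input records, default RS (<newline>): the records are
# "a" and "b;c", so no record contains a newline and /\n/ never matches.
default_rs=$(printf 'a\nb;c\n' | awk '{ print /\n/ }')

# Matching against input records with RS=";": the records are "a\nb" and "c",
# so the newline embedded in the first record can be matched.
custom_rs=$(printf 'a\nb;c' | awk 'BEGIN { RS = ";" } { print /\n/ }')

# Matching against text strings: ~ and match() take an explicit subject,
# and any character, including <newline>, can be matched.
text_string=$(awk 'BEGIN { s = "a\nb"; print (s ~ /\n/); print match(s, /\n/) }')

printf '%s\n' "$default_rs" "$custom_rs" "$text_string"
```

In the first two cases the absence of the record separator from $0 is a consequence of how the record was read, not a restriction on the ERE itself.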

Issue History

Date Modified Username Field Change
2020-01-26 07:50 stephane New Issue
2020-01-26 07:50 stephane Name => Stephane Chazelas
2020-01-26 07:50 stephane Section => awk utility
2020-10-08 16:40 Don Cragun Page Number => 2493
2020-10-08 16:40 Don Cragun Line Number => 80185-80194
2020-10-08 16:40 Don Cragun Interp Status => ---
2020-10-09 09:43 geoffclare Project 1003.1(2013)/Issue7+TC1 => 1003.1(2016/18)/Issue7+TC2
2020-10-09 10:23 geoffclare Note Added: 0005041
2020-10-12 15:29 geoffclare Final Accepted Text => 0001320:0005041
2020-10-12 15:29 geoffclare Status New => Resolved
2020-10-12 15:29 geoffclare Resolution Open => Accepted As Marked
2020-10-12 15:29 geoffclare Tag Attached: tc3-2008
2020-12-04 16:36 geoffclare Status Resolved => Applied
2024-06-11 09:08 agadmin Status Applied => Closed