0001320: /\n/ can match newline - Austin Group Issue Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0001320	1003.1(2016/18)/Issue7+TC2	Shell and Utilities	public	2020-01-26 07:50	2024-06-11 09:08

Reporter	stephane	Assigned To
Priority	normal	Severity	Editorial	Type	Error
Status	Closed	Resolution	Accepted As Marked

Name	Stephane Chazelas
Organization
User Reference
Section	awk utility
Page Number	2493
Line Number	80185-80194
Interp Status	---
Final Accepted Text	0001320:0005041


Summary	0001320: /\n/ can match newline
Description	There is a very bizarre/confused text in the awk specification:

> Except for the '~' and "!~" operators, and in the gsub,
> match, split, and sub built-in functions, ERE matching
> shall be based on input records; that is, record separator
> characters (the first character of the value of the
> variable RS, <newline> by default) cannot be embedded in
> the expression, and no expression shall match the record
> separator character. If the record separator is not
> <newline>, <newline> characters embedded in the expression
> can be matched. For the '~' and "!~" operators, and in
> those four built-in functions, ERE matching shall be based
> on text strings; that is, any character (including
> <newline> and the record separator) can be embedded in the
> pattern, and an appropriate pattern shall match any
> character.

It kind of implies that:

    echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'

should print

    0
    a
    b

or possibly

    0
    x

because /ERE/ or FS cannot match on the record separator and should match on the input record.

That's not what awk implementations do. RE matching in those cases is not done on input records but on $0. The fact that $0 (in statements other than BEGIN) is initialised from the value of the current input record (which at that point didn't contain the then current value of RS) is irrelevant to describe how RE matching is done.

RE matching behaviour is totally independent of the value of RS. RS is only used at the time a record is read.
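The one-liner from the description can be run as-is to check what a given implementation does. The following is a minimal sketch, assuming a POSIX shell and an awk such as gawk, mawk, or the one-true-awk on PATH; the function name `demo_regex_on_dollar0` is hypothetical and exists only for illustration.

```shell
# Hypothetical helper wrapping the one-liner from the description.
demo_regex_on_dollar0() {
    # $0 is reassigned inside the action, so when /\n/ and $1 are
    # evaluated it no longer holds the input record "x".
    echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'
}

demo_regex_on_dollar0
```

On the implementations named above this is expected to print 1 and then a, which is consistent with the claim that both /\n/ and FS are applied to the current value of $0 rather than to the input record.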
Desired Action	Replace that whole section with something along the lines of:

    If the subject is not specified (like in ~, !~, match()...), regexps are matched against the current value of $0.

Also, whether awk can deal with non-text data (NUL, byte values that don't form valid characters, strings longer than LINE_MAX) should probably be moved to some more generic section not specific to RE matching.
Tags	tc3-2008

geoffclare 2020-10-09 10:23 reporter bugnote:0005041	Rather than the kind of radical change suggested in the desired action, I would prefer to make a minimal fix. The purpose of the paragraph is to point out a difference between matching input records and matching text strings; the only real problem is that it incorrectly states the condition under which matching is against input records.

I suggest that an appropriate fix would be the following.

Change:

    Except for the '~' and "!~" operators, and in the gsub, match, split, and sub built-in functions, ERE matching shall be based on input records; that is, record separator characters ...

to:

    When ERE matching is performed against input records; that is, the match is against $0 and the current value of $0 resulted from processing an input record, record separator characters ...

Change:

    For the '~' and "!~" operators, and in those four built-in functions, ERE matching shall be based on text strings; that is, any character ...

to:

    When ERE matching is not performed against input records, it shall be based on text strings; any character ...
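The distinction drawn in this note, between a $0 that resulted from processing an input record and any other string, can be observed by counting how often /\n/ matches under different record separators. This is a minimal sketch, assuming a POSIX shell and a common awk (gawk, mawk, or the one-true-awk); the helper name `count_newline_matches` and its two arguments are hypothetical.

```shell
# Hypothetical helper: count the records in which /\n/ matches $0.
# $1 is the value handed to awk as RS, $2 is a printf format that
# generates the input.
count_newline_matches() {
    printf "$2" | awk -v RS="$1" '/\n/ { n++ } END { print n + 0 }'
}

# Default-style RS: records are split at <newline>, so no record
# read from input can contain one.
count_newline_matches '\n' 'a\nb\n'

# RS=";": the first record is "a<newline>b", so /\n/ can match it.
count_newline_matches ';' 'a\nb;c;'
```

With the awks listed above, the first call is expected to print 0 and the second 1. The pattern is the same in both runs; only the way records were read differs, which is the point of the revised wording.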

Date Modified	Username	Field	Change
2020-01-26 07:50	stephane	New Issue
2020-01-26 07:50	stephane	Name	=> Stephane Chazelas
2020-01-26 07:50	stephane	Section	=> awk utility
2020-10-08 16:40	Don Cragun	Page Number	=> 2493
2020-10-08 16:40	Don Cragun	Line Number	=> 80185-80194
2020-10-08 16:40	Don Cragun	Interp Status	=> ---
2020-10-09 09:43	geoffclare	Project	1003.1(2013)/Issue7+TC1 => 1003.1(2016/18)/Issue7+TC2
2020-10-09 10:23	geoffclare	Note Added: 0005041
2020-10-12 15:29	geoffclare	Final Accepted Text	=> 0001320:0005041
2020-10-12 15:29	geoffclare	Status	New => Resolved
2020-10-12 15:29	geoffclare	Resolution	Open => Accepted As Marked
2020-10-12 15:29	geoffclare	Tag Attached: tc3-2008
2020-12-04 16:36	geoffclare	Status	Resolved => Applied
2024-06-11 09:08	agadmin	Status	Applied => Closed
