0001791: tr: clarify encodings of non-character bytes and proper encodings of the NUL byte and

ID	Project	Category	View Status	Date Submitted	Last Update

0001791	Issue 8 drafts	Shell and Utilities	public	2023-12-07 04:34	2024-01-22 17:06

Reporter	calestyo	Assigned To
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	Closed	Resolution	Rejected
Product Version	Draft 3

Name	Christoph Anton Mitterer
Organization
User Reference
Section	Shells and Utilities
Page Number	3427, ff.
Line Number	116995, ff.
Final Accepted Text


Summary	0001791: tr: clarify encodings of non-character bytes and proper encodings of the NUL byte and
Description	Hey.

The description of tr, lines 116995, ff., says:
> \octal
> Octal sequences can be used to represent characters
> with specific coded values. An octal sequence shall consist
> of a <backslash> followed by the longest sequence of one,
> two, or three-octal-digit characters (01234567).

So here, \octal speaks only about encoding characters (and not bytes), which should in principle mean that only octal sequences that actually represent a character in the current locale are defined.

Further down, at lines 117108-117110:
> Unlike some historical implementations, this definition
> of the tr utility correctly processes NUL characters in
> its input stream. NUL characters can be stripped by using:
> tr -d '\000'

This clarifies that at least \000 for the NUL "character" is allowed. (The same appears at line 117126.)
Desired Action	It should be clarified:
1) Whether any bytes (even those that are not characters in the current locale) may be encoded via \octal.
2) Especially if NUL is a lone special case, whether \000 is the only valid representation or whether \00 and \0 are valid as well. If so, I'd suggest using the "simplest" representation (\0).

Cheers,
Chris.
Tags	No tags attached.

Don Cragun 2024-01-22 17:06 manager bugnote:0006634	NUL is not a special case; NUL is a character in every locale. The standard says that the strings specified by string1 and string2 contain characters. A single character in those strings may be represented by one or more adjacent octal sequences that represent individual bytes of a multi-byte character. The notes in the APPLICATION USAGE and in the RATIONALE specify that tr -d '\000' must remove NUL characters from the input stream. This would also work with tr -d '\0' and tr -d '\00'. But if one wanted to remove NUL characters and the character '1', one would have to use tr -d '\0001' (or put the '1' before the octal escape sequence for the NUL character, e.g. tr -d '1\0'); not tr -d '\01' or tr -d '\001'. Therefore, this bug is rejected.
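
The parsing rule behind this note (an octal sequence takes the longest run of one, two, or three octal digits) can be checked with a short shell sketch. This is only an illustration, not part of the bug report: the helper name show and the sample input are made up here. It assumes a tr that processes NUL in its input as the standard requires, and an od that supports -An -tx1. LC_ALL=C is set as an extra assumption so that every byte is treated as a single character.

```shell
#!/bin/sh
# Hypothetical helper: delete the characters named by $1 from a fixed
# sample input, then print the remaining bytes as one hex string.
# Sample input bytes: 'a' NUL 'b' '1' 'c' 0x01 'd' -> 61 00 62 31 63 01 64
show() {
    printf 'a\000b1c\001d' | LC_ALL=C tr -d "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# All three spellings denote NUL, so only the 00 byte disappears.
show '\000'    # 616231630164
show '\00'     # 616231630164
show '\0'      # 616231630164

# NUL plus the digit '1': three octal digits are consumed, then a literal '1'.
show '\0001'   # 6162630164
show '1\0'     # 6162630164

# These are the single character with value 1, so NUL and '1' both survive.
show '\01'     # 610062316364
show '\001'    # 610062316364
```

The single quotes matter: they pass the backslash through to tr unchanged, so it is tr, not the shell, that interprets the octal sequence.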

Date Modified	Username	Field	Change
2023-12-07 04:34	calestyo	New Issue
2023-12-07 04:34	calestyo	Name	=> Christoph Anton Mitterer
2023-12-07 04:34	calestyo	Section	=> Shells and Utilities
2023-12-07 04:34	calestyo	Page Number	=> 3427, ff.
2023-12-07 04:34	calestyo	Line Number	=> 116995, ff.
2024-01-22 17:06	Don Cragun	Note Added: 0006634
2024-01-22 17:06	Don Cragun	Status	New => Closed
2024-01-22 17:06	Don Cragun	Resolution	Open => Rejected
