View Issue Details

ID: 0001791
Project: Issue 8 drafts
Category: Shell and Utilities
View Status: public
Last Update: 2024-01-22 17:06
Reporter: calestyo
Assigned To:
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: Closed
Resolution: Rejected
Product Version: Draft 3
Name: Christoph Anton Mitterer
Organization:
User Reference:
Section: Shells and Utilities
Page Number: 3427, ff.
Line Number: 116995, ff.
Final Accepted Text:
Summary: 0001791: tr: clarify encodings of non-character bytes and proper encodings of the NUL byte and
Description: Hey.

The description of tr, lines 116995, ff., says:
> \octal
> Octal sequences can be used to represent characters
> with specific coded values. An octal sequence shall consist
> of a <backslash> followed by the longest sequence of one,
> two, or three-octal-digit characters (01234567).

So here, \octal is described only as encoding characters (not bytes), which in principle should mean that only octal sequences that actually represent a character in the current locale are defined.


Further down at lines 117108-117110:
> Unlike some historical implementations, this definition
> of the tr utility correctly processes NUL characters in
> its input stream. NUL characters can be stripped by using:
> tr -d '\000'

This makes clear that at least \000 is allowed for the NUL "character" (as does line 117126).
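The quoted APPLICATION USAGE example can be observed by feeding tr a byte stream that contains NULs and dumping the result in hex. This is a minimal sketch, assuming a POSIX shell with printf, tr and od available; the input bytes are arbitrary, and od is used only because command substitution and terminals do not show NUL bytes reliably.

```shell
# Input bytes: 'a' NUL 'b' NUL 'c' (hex 61 00 62 00 63).
# printf turns each \000 escape in its format string into a NUL byte.
# tr -d '\000' deletes every NUL; od -An -tx1 prints the remaining bytes
# as hex, and the final tr strips od's spaces and newlines.
printf 'a\000b\000c' | tr -d '\000' | od -An -tx1 | tr -d ' \n'
echo
# prints: 616263
```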
Desired Action: It should be clarified:

1) Whether any byte (even one that is not a character in the current locale) may be encoded via \octal.

2) Especially if NUL is a lone special case, it should further be clarified whether \000 is the only valid representation or whether \00 and \0 are valid as well.
If so, I'd suggest using the "simplest" representation (\0).

Cheers,
Chris.
Tags: No tags attached.

Activities

Don Cragun

2024-01-22 17:06

manager   bugnote:0006634

NUL is not a special case; NUL is a character in every locale. The standard says that the strings specified by string1 and string2 contain characters. A single character in those strings may be represented by one or more adjacent octal sequences that represent individual bytes of a multi-byte character. The notes in the APPLICATION USAGE and in the RATIONALE specify that tr -d '\000' must remove NUL characters from the input stream. This would also work with tr -d '\0' and tr -d '\00'. But if one wanted to remove NUL characters and the character '1', one would have to use tr -d '\0001' (or put the '1' before the octal escape sequence for the NUL character, e.g. tr -d '1\0'); not tr -d '\01' or tr -d '\001', both of which represent the single character with the coded value 1.
Therefore, this bug is rejected.
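The longest-match rule described in this note can be checked with a short script. This is a sketch, assuming a POSIX shell and a tr that parses \octal as the longest run of one to three octal digits (GNU coreutils tr does); the helper names hex and input are hypothetical and exist only for this illustration. The comments show the hex output such a tr should produce.

```shell
# Hypothetical helper: print the bytes read from stdin as one hex string.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# Hypothetical input: 'a' NUL '1' 'b' SOH 'c' (hex 61 00 31 62 01 63).
# Two printf calls keep the NUL escape from running into the digit '1'.
input() { printf 'a\000'; printf '1b\001c'; }

# \0, \00 and \000 all denote NUL, so each deletes only the NUL byte.
input | tr -d '\0'    | hex   # 6131620163
input | tr -d '\00'   | hex   # 6131620163
input | tr -d '\000'  | hex   # 6131620163

# \0001 is the three-digit sequence \000 (NUL) followed by the character '1'.
input | tr -d '\0001' | hex   # 61620163
input | tr -d '1\0'   | hex   # 61620163

# \01 and \001 both denote the byte with value 1 (SOH), not NUL then '1'.
input | tr -d '\01'   | hex   # 6100316263
input | tr -d '\001'  | hex   # 6100316263
```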

Issue History

Date Modified | Username | Field | Change
2023-12-07 04:34 | calestyo | New Issue |
2023-12-07 04:34 | calestyo | Name | => Christoph Anton Mitterer
2023-12-07 04:34 | calestyo | Section | => Shells and Utilities
2023-12-07 04:34 | calestyo | Page Number | => 3427, ff.
2023-12-07 04:34 | calestyo | Line Number | => 116995, ff.
2024-01-22 17:06 | Don Cragun | Note Added: 0006634 |
2024-01-22 17:06 | Don Cragun | Status | New => Closed
2024-01-22 17:06 | Don Cragun | Resolution | Open => Rejected