View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001791 | Issue 8 drafts | Shell and Utilities | public | 2023-12-07 04:34 | 2024-01-22 17:06 |
Reporter | calestyo | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Closed | Resolution | Rejected | ||
Product Version | Draft 3 | ||||
Name | Christoph Anton Mitterer | ||||
Organization | |||||
User Reference | |||||
Section | Shells and Utilities | ||||
Page Number | 3427, ff. | ||||
Line Number | 116995, ff. | ||||
Final Accepted Text | |||||
Summary | 0001791: tr: clarify encodings of non-character bytes and proper encodings of the NUL byte and | ||||
Description | Hey. The description of tr, lines 116995, ff., says: > \octal > Octal sequences can be used to represent characters > with specific coded values. An octal sequence shall consist > of a <backslash> followed by the longest sequence of one, > two, or three-octal-digit characters (01234567). So here, \octal speaks only about encoding characters (and not bytes), which should in principle mean that only octal sequences that actually represent a character in the current locale are defined. Further down, at lines 117108-117110: > Unlike some historical implementations, this definition > of the tr utility correctly processes NUL characters in > its input stream. NUL characters can be stripped by using: > tr -d '\000' This clarifies that at least \000 for the NUL "character" is allowed (see also line 117126). | ||||
Desired Action | It should be clarified: 1) Whether any bytes (even those that are not characters in the current locale) may be encoded via \octal. 2) Especially if NUL is a lone special case, it should further be clarified whether \000 is the only valid representation or whether \00 and \0 are valid as well. If so, I'd suggest using the "simplest" representation (\0). Cheers, Chris. | ||||
Tags | No tags attached. |
|
NUL is not a special case; NUL is a character in every locale. The standard says that the strings specified by string1 and string2 contain characters. A single character in those strings may be represented by one or more adjacent octal sequences that represent the individual bytes of a multi-byte character. The notes in APPLICATION USAGE and RATIONALE specify that `tr -d '\000'` must remove NUL characters from the input stream. This would also work with `tr -d '\0'` and `tr -d '\00'`. But if one wanted to remove NUL characters and the character '1', one would have to use `tr -d '\0001'` (or put the '1' before the octal escape sequence for the NUL character, e.g. `tr -d '1\0'`); not `tr -d '\01'` or `tr -d '\001'`. |
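
The parsing rule described in the note can be checked with a short shell sketch. It is an illustration only, not part of the standard text: the helper names `octal_bytes` and `sample` are hypothetical, and the sketch assumes a `tr` that passes NUL bytes through, as the notes quoted in the description require. The results are shown as octal byte values via `od`, because a shell variable cannot hold a NUL byte itself.

```shell
# Hypothetical helper: print stdin as octal byte values with all whitespace
# removed, so the results can be compared as plain strings.
octal_bytes() {
    od -An -t o1 | tr -d '[:space:]'
}

# Hypothetical sample input: 'a', NUL, 'b', '1', 'c', SOH (octal 001).
sample() {
    printf 'a\000b1c\001'
}

# '\0001' is parsed as \000 (longest run of up to three octal digits)
# followed by a literal '1', so both NUL and '1' are deleted.
nul_and_one=$(sample | tr -d '\0001' | octal_bytes)

# '1\0' names the same two characters in the other order.
one_then_nul=$(sample | tr -d '1\0' | octal_bytes)

# '\01' is the single character with value 1 (SOH); NUL and '1' survive.
soh_only=$(sample | tr -d '\01' | octal_bytes)

# '\0' is an equivalent spelling of '\000'; only NUL is deleted.
short_nul=$(sample | tr -d '\0' | octal_bytes)

printf '%s\n' "$nul_and_one" "$one_then_nul" "$soh_only" "$short_nul"
```

If the implementation follows the note, the first two results are identical, while the `'\01'` run leaves the NUL byte (`000`) and the digit `1` (`061`) in the output.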
Date Modified | Username | Field | Change |
---|---|---|---|
2023-12-07 04:34 | calestyo | New Issue | |
2023-12-07 04:34 | calestyo | Name | => Christoph Anton Mitterer |
2023-12-07 04:34 | calestyo | Section | => Shells and Utilities |
2023-12-07 04:34 | calestyo | Page Number | => 3427, ff. |
2023-12-07 04:34 | calestyo | Line Number | => 116995, ff. |
2024-01-22 17:06 | Don Cragun | Note Added: 0006634 | |
2024-01-22 17:06 | Don Cragun | Status | New => Closed |
2024-01-22 17:06 | Don Cragun | Resolution | Open => Rejected |