|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001140||[1003.1(2008)/Issue 7] Shell and Utilities||Editorial||Clarification Requested||2017-05-17 08:08||2018-11-12 19:51|
|Priority||normal||Resolution||Accepted As Marked|
|Final Accepted Text||See Note: 0004151.|
|Summary||0001140: must the encoded output end with a zero-length line?|
Both busybox's and GNU sharutils' uuencode end their output with a line consisting of a single "`" before the "end" line. However, if that line is missing, GNU sharutils' uudecode fails to detect the "end" line and instead misinterprets it as a line of encoded input:
$ printf 'foo' | uuencode '/dev/stdout' | grep -v '^`' | uudecode
uudecode fatal error:
standard input: Short filefoo8J
Busybox's uudecode correctly recovers the input.
begin 664 /dev/stdout
would be a conforming output from "printf '' | uuencode /dev/stdout" and/or whether this must be accepted as input by uudecode. More generally, must the "end" line be preceded by a line consisting of a single "`" character?
Looks like a bug in the GNU utilities.
With the UNIX uuencode, you get:
printf 'foo' | uuencode '/dev/stdout'
begin 644 /dev/stdout
and your example is decoded correctly.
Re: Note: 0003692
Note that the line before "end" in your output contains a single SPC character. So it's the same as the GNU output, but using the SPC character (0x20) instead of ` (0x60). SPC can cause problems for copy-pastes (at least), so it is better avoided.
Both indicate an empty line (the first character indicates the length of the line). (0x60 - 0x20) & 0x3f is the same as (0x20 - 0x20) & 0x3f, that is zero.
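The arithmetic above can be checked with a short Python sketch. The helper name `decoded_length` is hypothetical; the masking rule is the one quoted in this note.

```python
def decoded_length(length_char: str) -> int:
    """Return the octet count signalled by the first character of an encoded line."""
    # Subtract the 0x20 offset, then keep only the low six bits.
    return (ord(length_char) - 0x20) & 0x3F


print(decoded_length("`"))  # 0x60 -> 0
print(decoded_length(" "))  # 0x20 -> 0
print(decoded_length("#"))  # 0x23 -> 3
```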
I see nothing in the spec that mandates that trailing empty line, but all implementations seem to include it. On the decoding side, it sounds like a more reliable terminator than the "end" one, as an "end" could be found at the beginning of a uuencoded line (so an "end" on its own could be either the real end or indicate a truncated uuencoded file). IOW, relying on ` or SPC at the beginning of the line to mark the end of the file makes the detection of truncated files more reliable.
BTW, on an input like
begin 644 /dev/stdout
#9F]O
#9F]O
`
#9F]O
end
That is with lines of length 3, 3, 0, 3, GNU uudecode complains about the expected "end" not being found after that empty line.
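As a rough illustration of a decoder that relies on the zero-length line rather than on the "end" keyword, here is a minimal Python sketch. The function name `uudecode_strict` and its error messages are hypothetical, it is not the GNU or BSD implementation, and treating a completely empty line as zero-length is an assumption made for this example.

```python
def uudecode_strict(text: str) -> bytes:
    """Decode uuencoded text, requiring a zero-length line right before "end"."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("begin "):
        raise ValueError("missing begin line")
    out = bytearray()
    for i in range(1, len(lines)):
        line = lines[i]
        # Assumption: an empty line counts as zero-length (a lone space may
        # have been stripped in transit).
        count = (ord(line[0]) - 0x20) & 0x3F if line else 0
        if count == 0:
            # The zero-length line is the terminator; "end" must follow at once.
            if i + 1 < len(lines) and lines[i + 1] == "end":
                return bytes(out)
            raise ValueError('expected "end" after zero-length line')
        body = line[1:]
        decoded = bytearray()
        for j in range(0, len(body), 4):
            # Pad a short group with "`" (value 0) so it always has 4 sextets.
            a, b, c, d = ((ord(ch) - 0x20) & 0x3F for ch in body[j:j + 4].ljust(4, "`"))
            decoded += bytes((
                ((a << 2) | (b >> 4)) & 0xFF,
                ((b << 4) | (c >> 2)) & 0xFF,
                ((c << 6) | d) & 0xFF,
            ))
        if len(decoded) < count:
            raise ValueError("short line")
        out += decoded[:count]
    raise ValueError("truncated input: no zero-length line found")


good = "begin 644 /dev/stdout\n#9F]O\n`\nend\n"
print(uudecode_strict(good))  # b'foo'

# Lines of length 3, 3, 0, 3: rejected because "end" does not follow the empty line.
bad = "begin 644 /dev/stdout\n#9F]O\n#9F]O\n`\n#9F]O\nend\n"
try:
    uudecode_strict(bad)
except ValueError as exc:
    print("rejected:", exc)
```

In this sketch, input that lacks the zero-length line is also rejected: the "end" line is read as data whose first character signals five octets, and the line is then too short to supply them.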
In the BSDs, that extra empty line was apparently added in 1990, but without much of a comment or explanation:
edited on: 2017-05-17 10:39
Re: Note: 0003694
Sorry, that trailing empty line was always there. It's just that before that diff, that line would have been the encoding of the last fread() that returns 0 for EOF.
On the decoding side, it looks like BSD has always relied on that empty line instead of the "end" keyword (behaving like GNU). As far back as the original 1980 implementation in 4BSD
OK, if several uudecode implementations rely on a final zero-length line ("`" or a space) before "end", and most (all?) uuencode implementations in practice produce that, I guess it would make sense to make that an explicit requirement.
Then there's the issue of such lines appearing before the end, which at least GNU uudecode chokes on. I don't think any reasonable uuencode could produce that, but it shouldn't hurt to explicitly ban it - that will also make the end detection even more robust.
Assuming we want to change the standard to match existing and historical practice, here are some suggested changes:
On 2016 edition page 3362 line 113245 section uuencode, change:
These octets shall be converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It then shall be translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)
to:
These octets shall be converted to characters in the ISO/IEC 646:1991 standard encoded character set by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then optionally replacing any 0x20 octets with 0x60. If necessary, these characters shall then be translated into the corresponding character codes for the codeset in use in the current locale. For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC; the octet 0x20, representing <space>, would optionally be replaced with 0x60, representing '`', and then translated to either <space> (0x40 if EBCDIC) or '`' (0x79 if EBCDIC), respectively.
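The conversion described in this proposed wording can be sketched in Python as follows. The function name `sextet_to_char` and the `use_backquote` flag are hypothetical, and Python's `cp037` codec is used only as a stand-in for "EBCDIC" to illustrate the translation step.

```python
def sextet_to_char(value: int, use_backquote: bool = True) -> str:
    """Map a 6-bit value to its ISO/IEC 646 character, optionally using '`' for 0."""
    if not 0 <= value <= 0x3F:
        raise ValueError("value must fit in six bits")
    code = value + 0x20          # now in the range [0x20, 0x5f]
    if use_backquote and code == 0x20:
        code = 0x60              # optional replacement of <space> with '`'
    return chr(code)


# 0x21 + 0x20 = 0x41, which represents 'A'.
print(sextet_to_char(0x21))                               # A
# Zero becomes '`' or <space>, depending on the option.
print(repr(sextet_to_char(0)), repr(sextet_to_char(0, use_backquote=False)))

# Translation to the local codeset, assuming cp037 as the EBCDIC variant.
for ch in ("A", " ", "`"):
    print(repr(ch), hex(ch.encode("cp037")[0]))
```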
On 2016 edition page 3362 line 113258 section uuencode, change:
These octets then shall be translated into the local character set.
to:
before any replacement of 0x20 with 0x60 and translation into the local character set.
On 2016 edition page 3362 line 113259 section uuencode, change:
Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.
to:
Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by between 1 and 45, inclusive, encoded characters. The last encoded line, or the "begin" line if the input is empty, shall be followed by a line containing only a <space> or '`' character before the terminating <newline>.
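To make the proposed line format concrete, here is a minimal Python encoder sketch. The name `uuencode_lines` and its parameters are hypothetical, translation to a non-ASCII local codeset is not modelled, and the sketch reads the wording as allowing up to 45 input octets per line.

```python
def uuencode_lines(data: bytes, name: str, mode: int = 0o644,
                   use_backquote: bool = True) -> list:
    """Return the lines of a uuencoded file, including the zero-length line."""

    def to_char(value: int) -> str:
        # Add 0x20; optionally emit '`' (0x60) instead of <space> for zero.
        if value == 0 and use_backquote:
            return "`"
        return chr(value + 0x20)

    lines = [f"begin {mode:o} {name}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        padded = chunk + b"\0" * (-len(chunk) % 3)   # pad to a multiple of 3 octets
        body = []
        for i in range(0, len(padded), 3):
            a, b, c = padded[i:i + 3]
            body.append(to_char(a >> 2))
            body.append(to_char(((a & 0x03) << 4) | (b >> 4)))
            body.append(to_char(((b & 0x0F) << 2) | (c >> 6)))
            body.append(to_char(c & 0x3F))
        # Length character: number of octets on this line plus 0x20.
        lines.append(chr(len(chunk) + 0x20) + "".join(body))
    # Zero-length line after the last encoded line (or after "begin" if empty).
    lines.append("`" if use_backquote else " ")
    lines.append("end")
    return lines


print("\n".join(uuencode_lines(b"foo", "/dev/stdout")))
print("\n".join(uuencode_lines(b"", "/dev/stdout")))
```

For the input "foo" the data line is "#9F]O", matching the encoded lines quoted earlier in this issue; for empty input the "begin" line is followed directly by the zero-length line and "end".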
On 2016 edition page 3364 line 113309 section uuencode, add a new paragraph to RATIONALE:
Historically the encoding used only octets in the range [0x20,0x5f], and thus the encoded lines could contain trailing spaces, which were at risk of being stripped by whatever transport method was used to send the file. To avoid this problem some implementations use 0x60 instead of 0x20, resulting in '`' characters instead of spaces in the output, and implementations are encouraged to do this.
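The trailing-space risk can be seen with Python's standard `binascii.b2a_uu`, which offers a `backtick` option (Python 3.7 and later). This is only an illustration of the two output styles, not of any uuencode utility; the variable names are arbitrary.

```python
import binascii

zeros = b"\0\0\0"

with_spaces = binascii.b2a_uu(zeros)                     # historical style
with_backquotes = binascii.b2a_uu(zeros, backtick=True)  # 0x60 instead of 0x20

print(with_spaces)       # b'#    \n'
print(with_backquotes)   # b'#````\n'

# A transport that strips trailing whitespace destroys the encoded data in the
# first form but leaves the second form untouched.
print(with_spaces.rstrip())      # b'#'
print(with_backquotes.rstrip())  # b'#````'
```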
Don Cragun (manager)
The standard states the historical uuencode output format, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.
Almost all uudecode implementations require a line just before the "end" line specifying a line of zero bytes to be converted. Some historic uuencode implementations output a <back-quote> while others output a <space> to encode this zero-length final line. The changes below allow either historic behavior.
Notes to the Editor (not part of this interpretation):
Make the changes in Note: 0004150.
|Interpretation proposed: 12 November 2018|
|2017-05-17 08:08||Villemoes||New Issue|
|2017-05-17 08:08||Villemoes||Status||New => Under Review|
|2017-05-17 08:08||Villemoes||Assigned To||=> ajosey|
|2017-05-17 08:08||Villemoes||Name||=> Rasmus Villemoes|
|2017-05-17 08:08||Villemoes||Section||=> uuencode/uudecode|
|2017-05-17 09:08||joerg||Note Added: 0003692|
|2017-05-17 10:10||stephane||Note Added: 0003693|
|2017-05-17 10:20||stephane||Note Added: 0003694|
|2017-05-17 10:27||stephane||Note Added: 0003695|
|2017-05-17 10:39||stephane||Note Edited: 0003695|
|2017-05-18 07:31||Villemoes||Note Added: 0003696|
|2018-10-15 09:12||geoffclare||Note Added: 0004150|
|2018-10-18 16:10||Don Cragun||Note Added: 0004151|
|2018-10-18 16:10||Don Cragun||Tag Attached: tc3-2008|
|2018-10-18 16:11||Don Cragun||Interp Status||=> ---|
|2018-10-18 16:11||Don Cragun||Final Accepted Text||=> See bugnote-4151.|
|2018-10-18 16:11||Don Cragun||Status||Under Review => Interpretation Required|
|2018-10-18 16:11||Don Cragun||Resolution||Open => Accepted As Marked|
|2018-10-18 16:12||Don Cragun||Final Accepted Text||See bugnote-4151. => See Note: 0004151.|
|2018-10-19 16:28||geoffclare||Interp Status||--- => Pending|
|2018-11-12 19:51||ajosey||Interp Status||Pending => Proposed|
|2018-11-12 19:51||ajosey||Note Added: 0004168|