Austin Group Defect Tracker

Aardvark Mark IV


ID: 0001140
Category: [1003.1(2008)/Issue 7] Shell and Utilities
Severity: Editorial
Type: Clarification Requested
Date Submitted: 2017-05-17 08:08
Last Update: 2019-11-05 11:59
Reporter: Villemoes
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted As Marked
Status: Applied
Name: Rasmus Villemoes
Section: uuencode/uudecode
Interp Status: Approved
Final Accepted Text: See Note: 0004151.
Summary: 0001140: must the encoded output end with a zero-length line?
Description: Both BusyBox's and GNU sharutils' uuencode end their output with a line consisting of a single "`" before the "end" line. However, if that line is missing, GNU sharutils' uudecode fails to detect the "end" line and instead misinterprets it as a line of encoded input:

$ printf 'foo' | uuencode '/dev/stdout' | grep -v '^`' | uudecode
uudecode fatal error:
standard input: Short filefoo8J

Busybox's uudecode correctly recovers the input.

Desired Action: Clarify whether

begin 664 /dev/stdout
end

would be conforming output from "printf '' | uuencode /dev/stdout", and/or whether uudecode must accept it as input. More generally, must the "end" line be preceded by a line consisting of a single "`" character?
Tags: tc3-2008

Notes
(0003692)
joerg (reporter)
2017-05-17 09:08

Looks like a bug in the GNU utilities.

With the UNIX uuencode, you get:

printf 'foo' | uuencode '/dev/stdout'
begin 644 /dev/stdout
#9F]O
 
end

and your example is decoded correctly.
(0003693)
stephane (reporter)
2017-05-17 10:10

Re: Note: 0003692

Note that the line before "end" in your output contains a single SPC character, so it is the same as the GNU output except that it uses SPC (0x20) instead of ` (0x60). SPC can cause problems with copy and paste (at least), so it is better avoided.

Both indicate an empty line, because the first character of each line encodes its length: (0x60 - 0x20) & 0x3f and (0x20 - 0x20) & 0x3f are both zero.
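A minimal Python sketch of that arithmetic, for illustration only (line_length() is a hypothetical helper, not taken from any implementation): the length character minus 0x20, masked to six bits, gives the number of decoded octets on the line, so SPC and ` both come out as zero.

```python
def line_length(encoded_line: str) -> int:
    """Number of decoded octets announced by an encoded line's first character."""
    # Subtract 0x20 and keep the low 6 bits: ' ' (0x20) and '`' (0x60) both give 0.
    return (ord(encoded_line[0]) - 0x20) & 0x3F


for line in ["#9F]O", " ", "`", "M"]:
    print(repr(line), line_length(line))
```

"#" announces 3 octets and "M" the maximum of 45.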

I see nothing in the spec that mandates that trailing empty line, but all implementations seem to include it. On the decoding side, it sounds like a more reliable terminator than the "end" line, because "end" could also appear at the beginning of an encoded line (so an "end" on its own could be either the real end or a sign of a truncated uuencoded file). In other words, relying on ` or SPC at the beginning of the line to mark the end of the data makes the detection of truncated files more reliable.
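To illustrate that decoding strategy, here is a rough Python sketch (uudecode_body() is a hypothetical name, and it assumes the caller has already consumed the "begin" line and stripped newlines). It treats the zero-length line as the terminator and then insists that "end" follows immediately, so a missing "`" line, a truncated file, or data after the empty line is reported as an error rather than silently decoded:

```python
def uudecode_body(lines: list[str]) -> bytes:
    """Decode the lines after "begin", stopping at the zero-length line."""
    out = bytearray()
    it = iter(lines)
    for line in it:
        if not line:
            raise ValueError("unexpected empty input line")
        n = (ord(line[0]) - 0x20) & 0x3F
        if n == 0:
            # ' ' or '`': all data has been seen, so "end" must come next.
            if next(it, None) != "end":
                raise ValueError('expected "end" after the zero-length line')
            return bytes(out)
        need = 4 * ((n + 2) // 3)  # 4 encoded characters per 3 octets
        chars = line[1:1 + need]
        if len(chars) < need:
            raise ValueError(f"short line: {line!r}")
        vals = [(ord(ch) - 0x20) & 0x3F for ch in chars]
        decoded = bytearray()
        for i in range(0, need, 4):
            a, b, c, d = vals[i:i + 4]
            decoded += bytes([(a << 2 | b >> 4) & 0xFF,
                              (b << 4 | c >> 2) & 0xFF,
                              (c << 6 | d) & 0xFF])
        out += decoded[:n]
    raise ValueError("truncated input: no zero-length line before EOF")


print(uudecode_body(["#9F]O", "`", "end"]))  # b'foo'
```

Without the "`" line, "end" itself is read as a data line announcing 5 octets and is rejected as short, which mirrors the "Short file" error in the description. A decoder built this way also rejects the 3, 3, 0, 3 input in the next paragraph, because a data line follows the zero-length line.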

BTW, on an input like

begin 644 /dev/stdout
#9F]O
#9F]O
`
#9F]O
end


That is, with lines of length 3, 3, 0, 3, GNU uudecode complains that the expected "end" line is not found after the empty line.
(0003694)
stephane (reporter)
2017-05-17 10:20

In the BSDs, that extra empty line was apparently added in 1990, but without much of a comment or explanation:

https://github.com/weiss/original-bsd/commit/4bdd94feb49c37702585166528dd7f1061d27eb9#diff-5817be40429c877146c0008daac81672R118
(0003695)
stephane (reporter)
2017-05-17 10:27
edited on: 2017-05-17 10:39

Re: Note: 0003694

Sorry, that trailing empty line was always there. Before that diff, it was simply the encoding of the last fread(), the one that returns 0 at EOF.

On the decoding side, it looks like BSD has always relied on that empty line rather than on the "end" keyword (behaving like GNU), as far back as the original 1980 implementation in 4BSD.
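The point about the final fread() can be seen in a loop of the following shape. This is a Python sketch under assumptions, not a transcription of the BSD source: it reads up to 45 octets at a time and emits one line per read, including the final read that returns nothing at EOF, which is exactly the zero-length line before "end". It uses ` for zero values, as newer implementations do (historical versions produced SPC there). The names encode_char() and uuencode_body() are made up for illustration.

```python
import io
from typing import BinaryIO


def encode_char(v: int) -> str:
    """Map a 6-bit value to a printable character, using '`' for zero."""
    v &= 0x3F
    return chr(v + 0x20) if v else "`"


def uuencode_body(stream: BinaryIO) -> list[str]:
    """Return the encoded lines that go between "begin" and "end"."""
    lines = []
    while True:
        chunk = stream.read(45)  # b"" once EOF is reached
        line = encode_char(len(chunk))
        # Pad to a multiple of 3 octets; the length character tells the
        # decoder how many of the decoded octets are real.
        padded = chunk + b"\0" * (-len(chunk) % 3)
        for i in range(0, len(padded), 3):
            a, b, c = padded[i:i + 3]
            line += encode_char(a >> 2)
            line += encode_char(a << 4 | b >> 4)
            line += encode_char(b << 2 | c >> 6)
            line += encode_char(c)
        lines.append(line)
        if not chunk:
            # The empty read produced the zero-length line; stop here.
            break
    return lines


print(uuencode_body(io.BytesIO(b"foo")))  # ['#9F]O', '`']
```

For empty input the very first read returns nothing, so the only line produced is the zero-length one.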

(0003696)
Villemoes (reporter)
2017-05-18 07:31

OK, if several uudecode implementations rely on a final "`" line before "end", and most (all?) uuencode implementations produce it in practice, I guess it would make sense to make that an explicit requirement.

Then there's the issue of such lines appearing before the end, which at least GNU uudecode chokes on. I don't think any reasonable uuencode implementation would produce that, but it shouldn't hurt to ban it explicitly; that would also make end detection even more robust.
(0004150)
geoffclare (manager)
2018-10-15 09:12

Assuming we want to change the standard to match existing and historical practice, here are some suggested changes:

On 2016 edition page 3362 line 113245 section uuencode, change:
These octets shall be converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It then shall be translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)
to:
These octets shall be converted to characters in the ISO/IEC 646:1991 standard encoded character set by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then optionally replacing any 0x20 octets with 0x60. If necessary, these characters shall then be translated into the corresponding character codes for the codeset in use in the current locale. For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC; the octet 0x20, representing <space>, would optionally be replaced with 0x60, representing '`', and then translated to either <space> (0x40 if EBCDIC) or '`' (0x79 if EBCDIC), respectively.
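The three steps in the proposed wording (add 0x20, optionally replace 0x20 with 0x60, then translate to the local codeset) can be sketched in Python as below. encode_octet() is a hypothetical helper, and Python's cp037 codec stands in for "EBCDIC"; with it the sketch reproduces the byte values quoted in the examples (0xc1, 0x79 and 0x40).

```python
def encode_octet(value: int, use_backquote: bool = True, codec: str = "ascii") -> bytes:
    """Encode one 6-bit value as a character in the given codeset."""
    if not 0 <= value <= 0x3F:
        raise ValueError("value must fit in 6 bits")
    code = value + 0x20             # ISO/IEC 646 character in [0x20, 0x5f]
    if use_backquote and code == 0x20:
        code = 0x60                 # optional <space> -> '`' replacement
    return chr(code).encode(codec)  # translate into the target codeset


for value, flag in [(0x21, True), (0x00, True), (0x00, False)]:
    print(hex(value), flag, encode_octet(value, flag, "cp037").hex())
```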

On 2016 edition page 3362 line 113258 section uuencode, change:
These octets then shall be translated into the local character set.
to:
before any replacement of 0x20 with 0x60 and translation into the local character set.

On 2016 edition page 3362 line 113259 section uuencode, change:
Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.
to:
Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by between 1 and 45, inclusive, encoded characters. The last encoded line, or the begin line if the input is empty, shall be followed by a line containing only a <space> or '`' character before the terminating <newline>.
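A checker for the proposed line structure might look like this Python sketch (check_body() is hypothetical). It reads the 1-to-45 limit as the number of octets announced by the length character, in line with the 45-octet maximum in the current wording, and it expects the lines after "begin" with newlines removed, including the final "end".

```python
def check_body(lines: list[str]) -> None:
    """Raise ValueError unless the lines after "begin" follow the proposed rule."""
    if not lines or lines[-1] != "end":
        raise ValueError('missing "end" line')
    body = lines[:-1]
    # The line just before "end" must be a lone <space> or '`' (zero length).
    if not body or body[-1] not in (" ", "`"):
        raise ValueError('no zero-length line before "end"')
    # Every other line must announce 1 to 45 octets and carry enough characters.
    for line in body[:-1]:
        n = (ord(line[0]) - 0x20) & 0x3F if line else 0
        if not 1 <= n <= 45:
            raise ValueError(f"length {n} out of range in {line!r}")
        if len(line) - 1 < 4 * ((n + 2) // 3):
            raise ValueError(f"too few encoded characters in {line!r}")


check_body(["#9F]O", "`", "end"])  # passes
check_body(["`", "end"])           # empty input: passes
```

Because every line other than the last must announce at least one octet, a zero-length line anywhere before the final one is rejected, which also covers the ban on such lines suggested earlier.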

On 2016 edition page 3364 line 113309 section uuencode, add a new paragraph to RATIONALE:
Historically the encoding used only octets in the range [0x20,0x5f], and thus the encoded lines could contain trailing spaces, which were at risk of being stripped by whatever transport method was used to send the file. To avoid this problem some implementations use 0x60 instead of 0x20, resulting in '`' characters instead of spaces in the output, and implementations are encouraged to do this.
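The trailing-space problem mentioned in that rationale is easy to reproduce. In this Python sketch (encode_line() is a hypothetical helper), three zero octets encode to a line ending in four spaces, which a transport that strips trailing whitespace would damage; with the 0x60 substitution the same data ends in backquotes and survives intact.

```python
def encode_line(data: bytes, use_backquote: bool) -> str:
    """Encode up to 45 octets as one line: length character plus data."""
    def enc(v: int) -> str:
        v &= 0x3F
        return "`" if (v == 0 and use_backquote) else chr(v + 0x20)

    padded = data + b"\0" * (-len(data) % 3)
    out = enc(len(data))
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        out += enc(a >> 2) + enc(a << 4 | b >> 4) + enc(b << 2 | c >> 6) + enc(c)
    return out


for flag in (False, True):
    line = encode_line(b"\0\0\0", flag)
    # Simulate a transport that strips trailing whitespace.
    print(repr(line), "survives stripping:", line.rstrip() == line)
```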
(0004151)
Don Cragun (manager)
2018-10-18 16:10

Interpretation response
------------------------
The standard states the historical uuencode output format, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Almost all uudecode implementations require a line just before the "end" line specifying a line of zero bytes to be converted. Some historic uuencode implementations output a <back-quote> while others output a <space> to encode this zero-length final line. The changes below allow either historic behavior.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes in Note: 0004150.
(0004168)
ajosey (manager)
2018-11-12 19:51

Interpretation proposed: 12 November 2018
(0004189)
agadmin (administrator)
2018-12-14 15:03

Interpretation approved: 14 December 2018

Issue History
Date Modified | Username | Field | Change
2017-05-17 08:08 | Villemoes | New Issue |
2017-05-17 08:08 | Villemoes | Status | New => Under Review
2017-05-17 08:08 | Villemoes | Assigned To | => ajosey
2017-05-17 08:08 | Villemoes | Name | => Rasmus Villemoes
2017-05-17 08:08 | Villemoes | Section | => uuencode/uudecode
2017-05-17 09:08 | joerg | Note Added: 0003692 |
2017-05-17 10:10 | stephane | Note Added: 0003693 |
2017-05-17 10:20 | stephane | Note Added: 0003694 |
2017-05-17 10:27 | stephane | Note Added: 0003695 |
2017-05-17 10:39 | stephane | Note Edited: 0003695 |
2017-05-18 07:31 | Villemoes | Note Added: 0003696 |
2018-10-15 09:12 | geoffclare | Note Added: 0004150 |
2018-10-18 16:10 | Don Cragun | Note Added: 0004151 |
2018-10-18 16:10 | Don Cragun | Tag Attached: tc3-2008 |
2018-10-18 16:11 | Don Cragun | Interp Status | => ---
2018-10-18 16:11 | Don Cragun | Final Accepted Text | => See bugnote-4151.
2018-10-18 16:11 | Don Cragun | Status | Under Review => Interpretation Required
2018-10-18 16:11 | Don Cragun | Resolution | Open => Accepted As Marked
2018-10-18 16:12 | Don Cragun | Final Accepted Text | See bugnote-4151. => See Note: 0004151.
2018-10-19 16:28 | geoffclare | Interp Status | --- => Pending
2018-11-12 19:51 | ajosey | Interp Status | Pending => Proposed
2018-11-12 19:51 | ajosey | Note Added: 0004168 |
2018-12-14 15:03 | agadmin | Interp Status | Proposed => Approved
2018-12-14 15:03 | agadmin | Note Added: 0004189 |
2019-11-05 11:59 | geoffclare | Status | Interpretation Required => Applied

