View Issue Details

ID: 0001140
Project: 1003.1(2008)/Issue 7
Category: Shell and Utilities
View Status: public
Last Update: 2024-06-11 08:52
Reporter: Villemoes
Assigned To: ajosey
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: Closed
Resolution: Accepted As Marked
Name: Rasmus Villemoes
Organization:
User Reference:
Section: uuencode/uudecode
Page Number:
Line Number:
Interp Status: Approved
Final Accepted Text: See 0001140:0004151.
Summary: 0001140: must the encoded output end with a zero-length line?
Description: Both busybox's and GNU sharutils' uuencode end their output with a line consisting of a single "`" before the "end" line. However, if that line is missing, GNU sharutils' uudecode fails to detect the "end" line and instead misinterprets the "end" line as a line of encoded input:

$ printf 'foo' | uuencode '/dev/stdout' | grep -v '^`' | uudecode
uudecode fatal error:
standard input: Short filefoo8J

Busybox's uudecode correctly recovers the input.

Desired Action: Clarify whether

begin 664 /dev/stdout
end

would be a conforming output from "printf '' | uuencode /dev/stdout" and/or whether this must be accepted as input by uudecode. More generally: must the "end" line be preceded by a line consisting of a single "`" character?
Tags: tc3-2008

Activities

joerg

2017-05-17 09:08

reporter   bugnote:0003692

Looks like a bug in the GNU utilities.

With the UNIX uuencode, you get:

printf 'foo' | uuencode '/dev/stdout'
begin 644 /dev/stdout
#9F]O
 
end

and your example is decoded correctly.

stephane

2017-05-17 10:10

reporter   bugnote:0003693

Re: 0001140:0003692

Note that the line before "end" in your output contains a single SPC character. So it's the same as the GNU output, but using the SPC character (0x20) instead of ` (0x60). SPC can cause problems with copy-and-paste (at least), so it is better avoided.

Both indicate an empty line, because the first character of each line encodes that line's length: (0x60 - 0x20) & 0x3f is the same as (0x20 - 0x20) & 0x3f, that is, zero.
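To make the arithmetic above concrete, here is a minimal Python sketch of how a decoder might turn the length character into an octet count. The function name line_length is hypothetical; the only rule it applies is the one stated above (subtract 0x20, keep the low six bits), assuming ASCII input.

```python
def line_length(length_char: str) -> int:
    """Decode a uuencode length character into an octet count.

    Subtracting 0x20 and keeping the low six bits maps both
    ' ' (0x20) and '`' (0x60) to zero.
    """
    return (ord(length_char) - 0x20) & 0x3F


# '#' is the length character of joerg's "#9F]O" line; 'M' is a full 45-octet line.
for ch in (" ", "`", "#", "M"):
    print(repr(ch), line_length(ch))
```

Because of the mask, 0x60 wraps around to the same value as 0x20, which is why either character can serve as the zero-length marker.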

I see nothing in the spec that mandates that trailing empty line, but all implementations seem to include it. On the decoding side, it seems a more reliable terminator than the "end" line, as an "end" could be found at the beginning of a uuencoded line (so an "end" on its own could be either the real end or a sign of a truncated uuencoded file). IOW, relying on ` or SPC at the beginning of the line to mark the end of the file makes the detection of truncated files more reliable.
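A decoder that follows this idea stops at the first zero-length line rather than searching for "end", and treats running out of input before that line as truncation. The Python sketch below is illustrative only: decode_char and decode_body are hypothetical names, it assumes ASCII input and the historical character set, and padding short lines with spaces is an added assumption meant to tolerate trailing spaces stripped in transit.

```python
def decode_char(c: str) -> int:
    # Subtract 0x20 and keep the low six bits, so ' ' and '`' both give 0.
    return (ord(c) - 0x20) & 0x3F


def decode_body(lines) -> bytes:
    """Decode the lines that follow 'begin', stopping at the first
    zero-length line instead of looking for 'end'."""
    out = bytearray()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            raise ValueError("blank line where encoded data was expected")
        n = decode_char(line[0])
        if n == 0:
            return bytes(out)  # terminator found; 'end' should follow
        groups = (n + 2) // 3
        # Assumption: restore trailing spaces a transport may have stripped.
        body = line[1:].ljust(4 * groups)
        data = bytearray()
        for g in range(groups):
            a, b, c, d = (decode_char(ch) for ch in body[4 * g:4 * g + 4])
            data += bytes([(a << 2 | b >> 4) & 0xFF,
                           (b << 4 | c >> 2) & 0xFF,
                           (c << 6 | d) & 0xFF])
        out += data[:n]
    # Input ran out without a zero-length line: likely truncated.
    raise ValueError("no zero-length line before end of input (truncated?)")


print(decode_body(["#9F]O\n", "`\n", "end\n"]))
```

Fed the reporter's input with the "`" line removed, this sketch reads "end" as a data line (its 'e' decodes to a length of 5) and then reports truncation. That mirrors the GNU sharutils failure shown in the description.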

BTW, on an input like

begin 644 /dev/stdout
#9F]O
#9F]O
`
#9F]O
end


That is, with lines of length 3, 3, 0, and 3, GNU uudecode complains that the expected "end" line is not found after the empty line.
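The behaviour just described amounts to a strict rule: the first zero-length line must be immediately followed by "end". A small Python sketch of such a check is below. line_length and check_terminator are hypothetical names, and the check is modeled on the reported symptom, not on GNU's source.

```python
def line_length(line: str) -> int:
    # Length character: subtract 0x20 and keep six bits (' ' and '`' give 0).
    return (ord(line[0]) - 0x20) & 0x3F


def check_terminator(lines) -> None:
    """Strict check (hypothetical): the first zero-length line must be
    immediately followed by 'end'. `lines` are the lines after 'begin',
    without trailing newlines."""
    for i, line in enumerate(lines):
        if line and line_length(line) == 0:
            if i + 1 < len(lines) and lines[i + 1] == "end":
                return
            raise ValueError(f"line {i + 1}: 'end' expected after zero-length line")
    raise ValueError("no zero-length line found")


sample = ["#9F]O", "#9F]O", "`", "#9F]O", "end"]
print([line_length(l) for l in sample[:-1]])  # → [3, 3, 0, 3]
```

Calling check_terminator(sample) raises at the "`" line, while ["#9F]O", "`", "end"] passes. Explicitly banning zero-length lines before the last one would make such input non-conforming rather than ambiguous.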

stephane

2017-05-17 10:20

reporter   bugnote:0003694

In the BSDs, that extra empty line was apparently added in 1990, but without much of a comment or explanation:

https://github.com/weiss/original-bsd/commit/4bdd94feb49c37702585166528dd7f1061d27eb9#diff-5817be40429c877146c0008daac81672R118

stephane

2017-05-17 10:27

reporter   bugnote:0003695

Last edited: 2017-05-17 10:39

Re: 0001140:0003694

Sorry, that trailing empty line was always there. It's just that before that diff, that line would have been the encoding of the last fread() call, the one that returns 0 at EOF.
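To show why the empty line falls out of the old code "for free", here is a Python sketch of the kind of loop being described: read up to 45 octets, emit a line for whatever was read (even zero octets), and only then stop if the read returned nothing. It is loosely modeled on the classic BSD encode() loop as we understand it, not copied from it. enc and encode_lines are hypothetical names, and enc uses the historical mapping, so the final line is a single space.

```python
import io


def enc(c: int) -> str:
    # Historical mapping: low six bits plus 0x20, so 0 becomes ' ' (0x20).
    return chr((c & 0x3F) + 0x20)


def encode_lines(src) -> list:
    """The read that hits EOF returns 0 octets, is still encoded as a line,
    and only then does the loop stop."""
    lines = []
    while True:
        chunk = src.read(45)  # up to 45 octets, like fread(buf, 1, 45, in)
        n = len(chunk)
        line = enc(n)
        padded = chunk + b"\0" * (-n % 3)
        for i in range(0, len(padded), 3):
            a, b, c = padded[i:i + 3]
            line += enc(a >> 2) + enc(a << 4 | b >> 4) + enc(b << 2 | c >> 6) + enc(c)
        lines.append(line)
        if n == 0:  # the zero-length line was just emitted
            break
    return lines


print(encode_lines(io.BytesIO(b"foo")))  # → ['#9F]O', ' ']
```

With this structure, even empty input produces exactly one line, the zero-length one, so no separate "write the terminator" step was ever needed.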

On the decoding side, it looks like BSD has always relied on that empty line instead of the "end" keyword (behaving like GNU), as far back as the original 1980 implementation in 4BSD.

Villemoes

2017-05-18 07:31

reporter   bugnote:0003696

OK, if several uudecode implementations rely on a final "`" line before "end", and most (all?) uuencode implementations in practice produce it, I guess it would make sense to make that an explicit requirement.

Then there's the issue of such lines appearing before the end, which at least GNU uudecode chokes on. I don't think any reasonable uuencode would produce that, but it shouldn't hurt to ban it explicitly; that would also make end detection even more robust.

geoffclare

2018-10-15 09:12

manager   bugnote:0004150

Assuming we want to change the standard to match existing and historical practice, here are some suggested changes:

On 2016 edition page 3362 line 113245 section uuencode, change:
These octets shall be converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It then shall be translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)
to:
These octets shall be converted to characters in the ISO/IEC 646:1991 standard encoded character set by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then optionally replacing any 0x20 octets with 0x60. If necessary, these characters shall then be translated into the corresponding character codes for the codeset in use in the current locale. For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC; the octet 0x20, representing <space>, would optionally be replaced with 0x60, representing '`', and then translated to either <space> (0x40 if EBCDIC) or '`' (0x79 if EBCDIC), respectively.
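The proposed character mapping can be sketched in a few lines of Python. encode_sextet is a hypothetical name, and Python's cp037 codec (one EBCDIC code page) stands in for "EBCDIC" in the spec's example. The sketch shows the three steps in the proposed text: add 0x20, optionally replace 0x20 with 0x60, then translate to the local codeset.

```python
def encode_sextet(value: int, use_backquote: bool = True) -> str:
    """Map a 6-bit value to an ISO 646 character per the proposed wording:
    add 0x20 (giving 0x20..0x5f), then optionally replace 0x20 with 0x60."""
    octet = (value & 0x3F) + 0x20
    if use_backquote and octet == 0x20:
        octet = 0x60
    return chr(octet)


# Translation into a local codeset; cp037 is used here as an example EBCDIC.
for value in (0x21, 0x00):
    for bq in (False, True):
        ch = encode_sextet(value, bq)
        print(f"{value:#04x} backquote={bq}: {ch!r} -> {ch.encode('cp037').hex()}")
```

The value 0x21 becomes the octet 0x41 ('A', 0xc1 in cp037) whether or not the replacement is enabled. Only the value 0 changes, to either <space> (0x40) or '`' (0x79).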

On 2016 edition page 3362 line 113258 section uuencode, change:
These octets then shall be translated into the local character set.
to:
before any replacement of 0x20 with 0x60 and translation into the local character set.

On 2016 edition page 3362 line 113259 section uuencode, change:
Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.
to:
Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by between 1 and 45, inclusive, encoded characters. The last encoded line, or the begin line if the input is empty, shall be followed by a line containing only a <space> or '`' character before the terminating <newline>.
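Under the proposed wording, the output for empty input is exactly the "begin / ` / end" shape asked about in the Desired Action plus the mandatory empty line. The following is a minimal Python encoder sketch of that output format, assuming an ASCII locale and the '`' replacement. uuencode here is a hypothetical function name, not the utility's interface.

```python
def encode_sextet(value: int) -> str:
    # Add 0x20 and replace 0x20 with 0x60 ('`'), as the proposed text permits.
    octet = (value & 0x3F) + 0x20
    return chr(0x60 if octet == 0x20 else octet)


def uuencode(data: bytes, name: str, mode: int = 0o644) -> str:
    lines = [f"begin {mode:o} {name}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]  # 1..45 octets, never 0
        padded = chunk + b"\0" * (-len(chunk) % 3)
        line = encode_sextet(len(chunk))
        for i in range(0, len(padded), 3):
            a, b, c = padded[i:i + 3]
            line += (encode_sextet(a >> 2) + encode_sextet(a << 4 | b >> 4)
                     + encode_sextet(b << 2 | c >> 6) + encode_sextet(c))
        lines.append(line)
    lines.append(encode_sextet(0))  # required zero-length line: "`"
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uuencode(b"", "/dev/stdout"), end="")
```

Unlike the historical loop, this version writes the terminator explicitly. Data lines therefore always carry 1 to 45 octets, and the only zero-length line is the final one.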

On 2016 edition page 3364 line 113309 section uuencode, add a new paragraph to RATIONALE:
Historically the encoding used only octets in the range [0x20,0x5f], and thus the encoded lines could contain trailing spaces, which were at risk of being stripped by whatever transport method was used to send the file. To avoid this problem some implementations use 0x60 instead of 0x20, resulting in '`' characters instead of spaces in the output, and implementations are encouraged to do this.

Don Cragun

2018-10-18 16:10

manager   bugnote:0004151

Interpretation response
------------------------
The standard states the historical uuencode output format, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Almost all uudecode implementations require a line just before the "end" line specifying a line of zero bytes to be converted. Some historic uuencode implementations output a <back-quote> while others output a <space> to encode this zero-length final line. The changes below allow either historic behavior.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes in 0001140:0004150.

ajosey

2018-11-12 19:51

manager   bugnote:0004168

Interpretation proposed: 12 November 2018

agadmin

2018-12-14 15:03

administrator   bugnote:0004189

Interpretation approved: 14 December 2018

Issue History

Date Modified Username Field Change
2017-05-17 08:08 Villemoes New Issue
2017-05-17 08:08 Villemoes Status New => Under Review
2017-05-17 08:08 Villemoes Assigned To => ajosey
2017-05-17 08:08 Villemoes Name => Rasmus Villemoes
2017-05-17 08:08 Villemoes Section => uuencode/uudecode
2017-05-17 09:08 joerg Note Added: 0003692
2017-05-17 10:10 stephane Note Added: 0003693
2017-05-17 10:20 stephane Note Added: 0003694
2017-05-17 10:27 stephane Note Added: 0003695
2017-05-17 10:39 stephane Note Edited: 0003695
2017-05-18 07:31 Villemoes Note Added: 0003696
2018-10-15 09:12 geoffclare Note Added: 0004150
2018-10-18 16:10 Don Cragun Note Added: 0004151
2018-10-18 16:10 Don Cragun Tag Attached: tc3-2008
2018-10-18 16:11 Don Cragun Interp Status => ---
2018-10-18 16:11 Don Cragun Final Accepted Text => See bugnote-4151.
2018-10-18 16:11 Don Cragun Status Under Review => Interpretation Required
2018-10-18 16:11 Don Cragun Resolution Open => Accepted As Marked
2018-10-18 16:12 Don Cragun Final Accepted Text See bugnote-4151. => See 0001140:0004151.
2018-10-19 16:28 geoffclare Interp Status --- => Pending
2018-11-12 19:51 ajosey Interp Status Pending => Proposed
2018-11-12 19:51 ajosey Note Added: 0004168
2018-12-14 15:03 agadmin Interp Status Proposed => Approved
2018-12-14 15:03 agadmin Note Added: 0004189
2019-11-05 11:59 geoffclare Status Interpretation Required => Applied
2024-06-11 08:52 agadmin Status Applied => Closed