0001140: must the encoded output end with a zero-length line?

ID	Project	Category	View Status	Date Submitted	Last Update

0001140	1003.1(2008)/Issue 7	Shell and Utilities	public	2017-05-17 08:08	2024-06-11 08:52

Reporter	Villemoes	Assigned To	ajosey
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	Closed	Resolution	Accepted As Marked

Name	Rasmus Villemoes
Organization
User Reference
Section	uuencode/uudecode
Page Number
Line Number
Interp Status	Approved
Final Accepted Text	See 0001140:0004151.


Summary	0001140: must the encoded output end with a zero-length line?
Description	Both busybox's and GNU sharutils' uuencode end their output with a line consisting of a single "`" before the "end" line. However, if that line is missing, GNU sharutils' uudecode fails to detect the "end" line and instead misinterprets it as a line of input:

    $ printf 'foo' | uuencode '/dev/stdout' | grep -v '^`' | uudecode
    uudecode fatal error: standard input: Short filefoo8J

Busybox's uudecode correctly recovers the input.
Desired Action	Clarify whether

    begin 664 /dev/stdout
    end

would be a conforming output from "printf '' | uuencode /dev/stdout" and/or whether this must be accepted as input by uudecode. More generally, must the "end" line be preceded by a line consisting of a single "`" character?
Tags	tc3-2008

joerg 2017-05-17 09:08 reporter bugnote:0003692	Looks like a bug in the GNU utilities. With the UNIX uuencode, you get:

    printf 'foo' | uuencode '/dev/stdout'
    begin 644 /dev/stdout
    #9F]O
     
    end

and your example is decoded correctly.

stephane 2017-05-17 10:10 reporter bugnote:0003693	Re: 0001140:0003692 Note that the line before "end" in your output contains a single SPC character. So it's the same as the GNU output, but using the SPC character (0x20) instead of ` (0x60). SPC can cause problems for copy-pastes (at least), so it is better avoided. Both indicate an empty line (the first character indicates the length of the line): (0x60 - 0x20) & 0x3f is the same as (0x20 - 0x20) & 0x3f, that is, zero.

I see nothing in the spec that mandates that trailing empty line, but all implementations seem to include it. On the decoding side, it sounds like a more reliable terminator than the "end" one, as an "end" could be found at the beginning of a uuencoded line (so an "end" on its own could be either the real end or indicate a truncated uuencoded file). In other words, relying on ` or SPC at the beginning of the line to mark the end of the file makes the detection of truncated files more reliable.

BTW, on an input like

    begin 644 /dev/stdout
    #9F]O
    #9F]O
    `
    #9F]O
    end

that is, with lines of length 3, 3, 0, 3, GNU uudecode complains about the expected "end" not being found after that empty line.
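The length-character arithmetic in the note above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `uu_length` is hypothetical and an ASCII-compatible codeset is assumed.

```python
def uu_length(first_char: str) -> int:
    """Decode the length character that starts a uuencoded line (ASCII assumed)."""
    return (ord(first_char) - 0x20) & 0x3F


# SPC (0x20) and ` (0x60) both announce a zero-length line.
print(uu_length(" "))  # 0
print(uu_length("`"))  # 0

# '#' announces 3 octets, as in the "#9F]O" line that encodes 'foo'.
print(uu_length("#"))  # 3

# The 'e' of "end" is itself a valid length character (5 octets), which is
# why a bare "end" can be mistaken for the start of a data line.
print(uu_length("e"))  # 5
```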

stephane 2017-05-17 10:20 reporter bugnote:0003694	In the BSDs, that extra empty line was apparently added in 1990, but without much of a comment or explanation: https://github.com/weiss/original-bsd/commit/4bdd94feb49c37702585166528dd7f1061d27eb9#diff-5817be40429c877146c0008daac81672R118

stephane 2017-05-17 10:27 reporter bugnote:0003695 Last edited: 2017-05-17 10:39	Re: 0001140:0003694 Sorry, that trailing empty line was always there. It's just that before that diff, that line would have been the encoding of the last fread() that returns 0 for EOF. On the decoding side, it looks like BSD has always relied on that empty line instead of the "end" keyword (behaving like GNU), as far back as the original 1980 implementation in 4BSD.

Villemoes 2017-05-18 07:31 reporter bugnote:0003696	OK, if several uudecode implementations rely on a final ` before end, and most (all?) uuencode implementations in practice produce that, I guess it would make sense to make that an explicit requirement. Then there's the issue of such lines appearing before the end, which at least GNU uudecode chokes on. I don't think any reasonable uuencode could produce that, but it shouldn't hurt to explicitly ban it - that will also make the end detection even more robust.
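As a rough illustration of the decoding rule discussed above (a zero-length line must appear, and only immediately before "end"), here is a minimal Python sketch. It is not the code of GNU, BSD, or busybox uudecode; the names `uu_decode_strict` and `_decode_line` are hypothetical, ASCII is assumed, and treating a completely empty line as a zero-length line (a stripped SPC) is an assumption of this sketch.

```python
def _decode_line(line: str) -> bytes:
    """Decode one uuencoded data line into its octets."""
    n = (ord(line[0]) - 0x20) & 0x3F
    groups = (n + 2) // 3
    # Restore trailing spaces that a transport may have stripped.
    body = line[1:1 + 4 * groups].ljust(4 * groups)
    vals = [(ord(c) - 0x20) & 0x3F for c in body]
    out = bytearray()
    for j in range(0, 4 * groups, 4):
        a, b, c, d = vals[j:j + 4]
        out += bytes([(a << 2 | b >> 4) & 0xFF,
                      (b << 4 | c >> 2) & 0xFF,
                      (c << 6 | d) & 0xFF])
    return bytes(out[:n])


def uu_decode_strict(text: str) -> bytes:
    """Decode, requiring a zero-length line directly before "end"."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("begin "):
        raise ValueError("missing begin line")
    out = bytearray()
    for i, line in enumerate(lines[1:], start=1):
        if line and (ord(line[0]) - 0x20) & 0x3F:
            out += _decode_line(line)  # ordinary data line
            continue
        # Zero-length line: only valid as the line right before "end".
        if lines[i + 1:i + 2] != ["end"]:
            raise ValueError("zero-length line not followed by end")
        return bytes(out)
    raise ValueError("no zero-length line before end (truncated input?)")


print(uu_decode_strict("begin 644 /dev/stdout\n#9F]O\n`\nend\n"))
```

With this rule, an input whose "`" line has been filtered out is rejected rather than silently decoded, and a zero-length line in the middle of the data is rejected as well.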

geoffclare 2018-10-15 09:12 reporter bugnote:0004150	Assuming we want to change the standard to match existing and historical practice, here are some suggested changes:

On 2016 edition page 3362 line 113245 section uuencode, change:

    These octets shall be converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It then shall be translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)

to:

    These octets shall be converted to characters in the ISO/IEC 646:1991 standard encoded character set by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then optionally replacing any 0x20 octets with 0x60. If necessary, these characters shall then be translated into the corresponding character codes for the codeset in use in the current locale. For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC; the octet 0x20, representing <space>, would optionally be replaced with 0x60, representing '`', and then translated to either <space> (0x40 if EBCDIC) or '`' (0x79 if EBCDIC), respectively.

On 2016 edition page 3362 line 113258 section uuencode, change:

    These octets then shall be translated into the local character set.

to:

    before any replacement of 0x20 with 0x60 and translation into the local character set.

On 2016 edition page 3362 line 113259 section uuencode, change:

    Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.

to:

    Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by between 1 and 45, inclusive, encoded characters. The last encoded line, or the begin line if the input is empty, shall be followed by a line containing only a <space> or '`' character before the terminating <newline>.

On 2016 edition page 3364 line 113309 section uuencode, add a new paragraph to RATIONALE:

    Historically the encoding used only octets in the range [0x20,0x5f], and thus the encoded lines could contain trailing spaces, which were at risk of being stripped by whatever transport method was used to send the file. To avoid this problem some implementations use 0x60 instead of 0x20, resulting in '`' characters instead of spaces in the output, and implementations are encouraged to do this.
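To make the suggested wording concrete, the following Python sketch encodes input in the way the changes above describe: add 0x20 to each 6-bit value, optionally replace 0x20 with 0x60, and follow the last encoded line (or the begin line for empty input) with a line holding only a space or '`'. It is a sketch under stated assumptions, not the code of any real uuencode: the function name `uu_encode` and its parameters `name`, `mode`, and `backquote` are hypothetical, a short final group is assumed to be padded with zero octets, and an ASCII codeset is assumed (no EBCDIC translation step).

```python
def uu_encode(data: bytes, name: str = "/dev/stdout", mode: str = "644",
              backquote: bool = True) -> str:
    """Sketch of the encoding described in the suggested text (ASCII only)."""
    zero = "`" if backquote else " "  # character used for the value 0

    def ch(v: int) -> str:
        c = chr(0x20 + (v & 0x3F))        # range [0x20, 0x5f]
        return zero if c == " " else c    # optional 0x20 -> 0x60 replacement

    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):      # at most 45 octets per line
        chunk = data[i:i + 45]
        padded = chunk + b"\0" * (-len(chunk) % 3)  # assumption: zero padding
        body = []
        for j in range(0, len(padded), 3):
            a, b, c = padded[j:j + 3]
            body += [ch(a >> 2), ch(a << 4 | b >> 4),
                     ch(b << 2 | c >> 6), ch(c)]
        lines.append(chr(0x20 + len(chunk)) + "".join(body))
    lines.append(zero)                     # zero-length line before "end"
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uu_encode(b"foo"), end="")
```

For the input 'foo' this produces the "#9F]O" data line quoted earlier in the thread, followed by a "`" line and "end"; with `backquote=False` the terminating line holds a single space instead.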

~~Don Cragun~~ 2018-10-18 16:10 viewer bugnote:0004151	Interpretation response
------------------------
The standard states the historical uuencode output format, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Almost all uudecode implementations require a line just before the "end" line specifying a line of zero bytes to be converted. Some historic uuencode implementations output a <back-quote> while others output a <space> to encode this zero-length final line. The changes below allow either historic behavior.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes in 0001140:0004150.

ajosey 2018-11-12 19:51 manager bugnote:0004168	Interpretation proposed: 12 November 2018

agadmin 2018-12-14 15:03 administrator bugnote:0004189	Interpretation approved: 14 December 2018

Date Modified	Username	Field	Change
2017-05-17 08:08	Villemoes	New Issue
2017-05-17 08:08	Villemoes	Status	New => Under Review
2017-05-17 08:08	Villemoes	Assigned To	=> ajosey
2017-05-17 08:08	Villemoes	Name	=> Rasmus Villemoes
2017-05-17 08:08	Villemoes	Section	=> uuencode/uudecode
2017-05-17 09:08	joerg	Note Added: 0003692
2017-05-17 10:10	stephane	Note Added: 0003693
2017-05-17 10:20	stephane	Note Added: 0003694
2017-05-17 10:27	stephane	Note Added: 0003695
2017-05-17 10:39	stephane	Note Edited: 0003695
2017-05-18 07:31	Villemoes	Note Added: 0003696
2018-10-15 09:12	geoffclare	Note Added: 0004150
2018-10-18 16:10	~~Don Cragun~~	Note Added: 0004151
2018-10-18 16:10	~~Don Cragun~~	Tag Attached: tc3-2008
2018-10-18 16:11	~~Don Cragun~~	Interp Status	=> ---
2018-10-18 16:11	~~Don Cragun~~	Final Accepted Text	=> See bugnote-4151.
2018-10-18 16:11	~~Don Cragun~~	Status	Under Review => Interpretation Required
2018-10-18 16:11	~~Don Cragun~~	Resolution	Open => Accepted As Marked
2018-10-18 16:12	~~Don Cragun~~	Final Accepted Text	See bugnote-4151. => See 0001140:0004151.
2018-10-19 16:28	geoffclare	Interp Status	--- => Pending
2018-11-12 19:51	ajosey	Interp Status	Pending => Proposed
2018-11-12 19:51	ajosey	Note Added: 0004168
2018-12-14 15:03	agadmin	Interp Status	Proposed => Approved
2018-12-14 15:03	agadmin	Note Added: 0004189
2019-11-05 11:59	geoffclare	Status	Interpretation Required => Applied
2024-06-11 08:52	agadmin	Status	Applied => Closed
