0001041 [1003.1(2013)/Issue7+TC1] Shell and Utilities 2016-04-03 09:56 2024-06-11 08:56
DannyNiu/NJF
XCU.4. cksum, compress, uncompress, zcat
Page Number Too many pages
Line Number Too many lines
Make the changes specified in Note: 0003648 and Note: 0003813.
Encourage implementations to include better integrity checksum, compression and decompression utilities if possible.
Description As we know, there are the SHA family of hash algorithms and the LZMA family of compression algorithms.

The SHA family offers state-of-the-art security, while LZMA offers a significant improvement in compression ratio over DEFLATE, which in turn, I believe, offers a better compression ratio than the LZW algorithm that compress(1), uncompress(1), and zcat(1) implement.

I think that if the Single UNIX Specification is to mandate checksum and compression utilities, it might as well ask implementers to provide good ones. Therefore I think we can discuss the possibility of bringing new utilities in.
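
The compression-ratio claim can be checked informally on any input. The following is a minimal sketch, assuming Python's standard zlib module as a stand-in for DEFLATE and its lzma module as a stand-in for the LZMA family; the function name and the sample input are hypothetical, and the result depends heavily on the data being compressed.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the size of data before and after DEFLATE and LZMA compression."""
    return {
        "original": len(data),
        "deflate": len(zlib.compress(data, 9)),      # zlib stream, level 9
        "lzma": len(lzma.compress(data, preset=9)),  # .xz container, preset 9
    }


# Hypothetical sample input: highly repetitive text.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:>8}: {size} bytes")
```

Running this on representative files, rather than on the synthetic sample, gives a fairer picture of how the two families compare.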
Desired Action Add to rationale section of cksum(1) << EOF

Implementations are encouraged to provide utilities that implement hash and integrity checksum algorithms of higher security.


Add to rationale section of compress(1), uncompress(1) and zcat(1) << EOF

Implementations are encouraged to provide utilities that implement data compression algorithms of better compression ratios.

geoffclare (manager)
2017-03-23 16:02
The following are proposed changes for cksum. The changes for compress etc. will be added as a separate note.

On page 2584 line 83831 section cksum, change:
The cksum utility is typically used to quickly compare a suspect file against a trusted version of the same, such as to ensure that files transmitted over noisy media arrive intact. However, this comparison cannot be considered cryptographically secure. The chances of a damaged file producing the same CRC as the original are small; deliberate deception is difficult, but probably not impossible.
to:
The cksum utility is typically used to quickly compare a suspect file against a trusted version of the same, such as to ensure that files transmitted over noisy media arrive intact. However, this comparison cannot be considered cryptographically secure. This utility should be avoided whenever non-trivial requirements (including safety and security) have to be fulfilled.

On page 2585 line 83847 section cksum, insert a new first paragraph of RATIONALE:
The cksum utility is included in this standard for reasons of portability but is not suitable for uses where non-trivial requirements (including safety and security) have to be fulfilled. Implementations are encouraged to provide utilities that implement hash and integrity checksum algorithms of higher security and to keep up to date with developments in this area.
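
To illustrate the difference between the cksum CRC and a hash of higher security, here is a minimal sketch in Python. It assumes the CRC procedure from the cksum description (generator polynomial 0x04C11DB7, data followed by its length, result complemented) and uses SHA-256 from the standard hashlib module as one example of a stronger algorithm; the function names are hypothetical.

```python
import hashlib

CRC_POLY = 0x04C11DB7  # generator polynomial of the cksum CRC


def _crc_update(crc: int, byte: int) -> int:
    """Feed one octet into the CRC, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ CRC_POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """CRC over the data followed by its length, then complemented."""
    crc = 0
    for byte in data:
        crc = _crc_update(crc, byte)
    length = len(data)
    while length:  # the length is fed least significant octet first
        crc = _crc_update(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


message = b"123456789"
print(posix_cksum(message), len(message))
print(sha256_hex(message))
```

The CRC is a 32-bit value designed to catch accidental damage, whereas the SHA-256 digest is 256 bits and designed to resist deliberate collisions, which is the distinction the new RATIONALE text draws.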

rhansen (manager)
2017-08-03 16:18
On Page xlvi (Informative References) add the following in proper alphabetical order:
DEFLATE Compressed Data Format Specification version 1.3
P. Deutsch, May 1996 ( available at [^] )

GZIP file format specification version 4.3
P. Deutsch, May 1996 ( available at [^] )

On Page 2602 replace the entire page (lines 84476-84515) with the following:
compress, uncompress, zcat - compress and decompress data

<XSI><tt>compress [-fv] [-b value] [-g | -m algo] [file...]
compress -c [-fv] [-b value] [-g | -m algo] [file]
compress -d [-cfv] [file...]
uncompress [-cfv] [file...]
zcat [file...]</tt></XSI>

The compress utility, when the -d option is not specified, shall apply the compression algorithm identified by the -g option or the -m algo option to the named files to attempt to reduce their size without loss of information. The compress utility with the -d option shall apply the appropriate decompression algorithm to the named files to restore the data to their original state.

The uncompress utility shall be equivalent to <tt>compress -d</tt>. The zcat utility shall be equivalent to <tt>compress -c -d</tt>. If multiple file operands are specified, the decompressed data from each input file shall be concatenated to standard output.
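
As an illustration only (not part of the proposed wording), the concatenation rule for zcat can be sketched in Python. The sketch assumes GZIP-format inputs held in memory, because the standard library has a gzip module but no LZW decoder; the function name is hypothetical.

```python
import gzip


def zcat(compressed_inputs):
    """Decompress each GZIP input in order and concatenate the results."""
    return b"".join(gzip.decompress(blob) for blob in compressed_inputs)


# Two independently compressed inputs, standing in for two file operands.
parts = [gzip.compress(b"first\n"), gzip.compress(b"second\n")]
combined = zcat(parts)
print(combined.decode(), end="")
```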

When compressing data, unless the -c option is specified, after an input file other than standard input has been compressed, the compressed data from the input file shall be stored in a file with the same pathname as the input file but with an added suffix. The added suffix shall be the suffix associated with the algorithm (see the algorithms in Table 4-X (on page xxx)). If appending the suffix would make the size of the last component of the output file's pathname exceed {NAME_MAX} bytes, the command shall fail. If appending the suffix would make the size of the pathname exceed {PATH_MAX} bytes, the command may fail.
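
As an illustration only, the output naming rule for compression might be sketched as follows. The suffix mapping is taken from the proposed Table 4-X; the function name and the name_max parameter (defaulting to an assumed 255 bytes) are hypothetical, and a real implementation would obtain {NAME_MAX} for the target directory, for example with pathconf().

```python
import os

# Suffixes from the proposed Table 4-X.
SUFFIXES = {"lzw": ".Z", "deflate": ".gz", "gzip": ".gz"}


def compressed_pathname(path: str, algo: str = "lzw", name_max: int = 255) -> str:
    """Return the output pathname for compressing path.

    Raises OSError if the last pathname component would exceed name_max
    bytes, mirroring the "shall fail" requirement for {NAME_MAX}.
    """
    out = path + SUFFIXES[algo]
    last_component = os.path.basename(out)
    if len(os.fsencode(last_component)) > name_max:
        raise OSError(f"{out}: last component exceeds {name_max} bytes")
    return out


print(compressed_pathname("report.txt"))          # report.txt.Z
print(compressed_pathname("report.txt", "gzip"))  # report.txt.gz
```

The {PATH_MAX} check is omitted because the proposed text only says the command may fail in that case.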

When decompressing data, unless the -c option is specified, after an input file other than standard input has been decompressed, the decompressed data from the input file shall be stored in a file with the same pathname as the input file but with the suffix associated with the algorithm removed. [OB]If file has no suffix associated with a known compression algorithm or file does not exist and does not have a <tt>.Z</tt> suffix, file shall be used as the name of the output file, and the default suffix <tt>.Z</tt> shall be appended to file to form the input pathname.[/OB] The behavior is unspecified if the input pathname ends with a suffix other than the suffix associated with the algorithm used to compress the data. When the -c option is specified, file can have any suffix, or no suffix, and the utility shall use file as the input file and examine the file's contents to determine which algorithm to use to decompress the data (it is not an error if file does not have a suffix that matches the suffix associated with the compression algorithm).
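
As an illustration only, examining a file's contents to choose the decompression algorithm can be sketched with the leading magic bytes of the two formats. The values used here (0x1F 0x9D for traditional compress output, 0x1F 0x8B for GZIP per RFC 1952) come from those formats rather than from the proposed wording, and the function names are hypothetical.

```python
MAGIC = {
    b"\x1f\x9d": "lzw",      # traditional compress (.Z) output
    b"\x1f\x8b": "deflate",  # GZIP member header (RFC 1952)
}
SUFFIX = {"lzw": ".Z", "deflate": ".gz"}


def detect_algorithm(header: bytes):
    """Return the algorithm name for the leading bytes, or None if unknown."""
    return MAGIC.get(header[:2])


def decompressed_pathname(path: str, algo: str) -> str:
    """Strip the suffix associated with algo from path.

    The proposed text leaves a mismatched suffix unspecified; this sketch
    chooses to reject it.
    """
    suffix = SUFFIX[algo]
    if not path.endswith(suffix):
        raise ValueError(f"{path}: suffix does not match {algo}")
    return path[: -len(suffix)]


algo = detect_algorithm(b"\x1f\x8b\x08\x00")
print(algo, decompressed_pathname("archive.tar.gz", algo))
```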

When compressing or decompressing a file other than standard input and the -c option is not specified, if the invoking process has sufficient privilege, the ownership, modes, access time, and modification time of the output file shall match the ownership, modes, access time, and modification time of the input file. After the output file has been successfully created, the input file shall be removed if the invoking process has sufficient privileges. If the invoking process does not have sufficient privileges to remove the input file (for example, if the directory has the S_ISVTX bit set) the behavior depends on whether the -f option is specified: if -f is not specified, the output file shall be removed, a diagnostic message shall be written and the utility shall continue processing other files but the final exit status shall be non-zero; if -f is specified, the output file shall not be removed and it is unspecified whether the inability to remove the input file is treated as an error. If it is not treated as an error, a warning message may be written to standard error.
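
As an illustration only, the metadata and removal rules might be sketched as follows. The sketch assumes GZIP output via Python's gzip module as a stand-in for whichever algorithm is selected; the function name and the force parameter (standing in for -f) are hypothetical, and prompting, diagnostics, and the hard-link case are left out.

```python
import gzip
import os
import shutil
import tempfile


def compress_in_place(path: str, force: bool = False) -> str:
    """Compress path to path + '.gz', copy its metadata, then remove the input."""
    out = path + ".gz"
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    st = os.stat(path)
    shutil.copystat(path, out)  # modes, access time, modification time
    try:
        os.chown(out, st.st_uid, st.st_gid)  # needs sufficient privilege
    except PermissionError:
        pass
    try:
        os.remove(path)
    except OSError:
        if not force:
            os.remove(out)  # without -f: discard the output and report failure
            raise
        # with -f: keep the output; whether this is an error is unspecified
    return out


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "notes.txt")
    with open(src_path, "wb") as f:
        f.write(b"hello\n" * 100)
    os.utime(src_path, (1_000_000_000, 1_000_000_000))
    out_path = compress_in_place(src_path)
    input_removed = not os.path.exists(src_path)
    mtime_kept = int(os.stat(out_path).st_mtime) == 1_000_000_000
    with gzip.open(out_path, "rb") as f:
        round_trip_ok = f.read() == b"hello\n" * 100
```

The metadata is copied only after the output file has been closed, so that writing the compressed data does not disturb the copied timestamps.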

If no file operands are specified, standard input shall be compressed or decompressed to standard output.

[OB]If an input file that is to be removed after processing has multiple hard links, the compress and uncompress utilities may write a diagnostic message to standard error and do nothing with the file; this behavior may depend on whether the -f option is specified. If a diagnostic message is written, the final exit status shall be non-zero.[/OB]

The compress, uncompress, and zcat utilities shall conform to XBD Section 12.2 (on page 216), except that Guideline 1 does not apply to uncompress since the utility name has ten letters.

The following options shall be supported:

-b value
If the compression algorithm is LZW, value specifies the maximum number of bits to use in a code. For a conforming application, the value argument shall be:
<tt>9 <= value <= 16</tt>
The implementation may allow values greater than 16. The default shall be 14, 15, or 16.

If the compression algorithm is DEFLATE, value specifies the compression level. For a conforming application, the value argument shall be:
<tt>1 <= value <= 9</tt>
The default shall be 6.

For other algorithms, value specifies implementation-defined tuning.

-c
Write to standard output; the input files shall not be changed, and no output files shall be created.

-d
Decompress files. When invoked with the -d option, the compress utility shall restore previously compressed files to their original state.

-f
Force compression or decompression of file, even if it does not (for compression) actually reduce the size of the file, or if the corresponding output file already exists. If the -f option is not given and the standard input is a terminal, the user shall be prompted as to whether an existing output file should be overwritten. If the response is affirmative, the existing file shall be overwritten. If the standard input is not a terminal and -f is not given, compress or uncompress shall write a diagnostic message to standard error, the existing file shall not be overwritten, and the utility shall exit with a status greater than zero. If the -f option is specified and an input file other than standard input has multiple hard links, it is implementation-defined whether the input file is unlinked after the corresponding output file is successfully written, or if processing of that file is skipped and a diagnostic message is written to standard error.

-g
Equivalent to -m gzip

-m algo
Use the algorithm defined by algo to compress the files. The following algorithms shall be supported:

Table 4-X Compression algorithms, -m option-argument values, and suffixes
Algorithm            algo     Filename Suffix
adaptive LZW         lzw      .Z
RFC1951 DEFLATE      deflate  .gz
Synonym for DEFLATE  gzip     .gz

Other implementation-defined algorithms may be supported.

If neither of the -m algo and -g options is specified, <tt>lzw</tt> shall be used as a default algo value. Specifying more than one of the mutually exclusive -g and -m algo options, or multiple -m algo options, shall not be considered an error. The last option specified shall determine the behavior of the utility.

On systems not supporting the selected algorithm, the input files shall not be changed and an exit status greater than two shall be returned.
The Lempel-Ziv compression algorithm is described in the now-expired US Patent 4464650, which was issued to William Eastman, Abraham Lempel, Jacob Ziv, and Martin Cohn on August 7th, 1984 and assigned to Sperry Corporation.

The Lempel-Ziv-Welch compression algorithm is described in the now-expired US Patent 4558302, which was issued to Terry A. Welch on December 10th, 1985 and assigned to Sperry Corporation.

-v
For compress, write the percentage reduction of each file to standard error. For uncompress, write messages to standard error concerning the expansion of each file.
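
As an illustration only, the interaction of the -b, -g, and -m options can be sketched with Python's getopt module. The function name is hypothetical, the exit status 3 is an assumed value satisfying "greater than two", and the sketch enforces the ranges required of conforming applications even though an implementation may accept larger LZW values.

```python
import getopt

# Ranges for -b value required of conforming applications, per algorithm.
B_RANGE = {"lzw": (9, 16), "deflate": (1, 9), "gzip": (1, 9)}


def parse_compress_args(argv):
    """Return (algo, value, flags, operands) for a compress command line."""
    opts, operands = getopt.getopt(argv, "b:cdfgm:v")
    algo, value, flags = "lzw", None, set()  # lzw is the default algo
    for opt, arg in opts:
        if opt == "-g":
            algo = "gzip"        # -g is equivalent to -m gzip
        elif opt == "-m":
            algo = arg           # the last -g or -m specified wins
        elif opt == "-b":
            value = int(arg)
        else:
            flags.add(opt)
    if algo not in B_RANGE:
        raise SystemExit(3)      # unsupported algorithm: status greater than two
    low, high = B_RANGE[algo]
    if value is not None and not low <= value <= high:
        raise ValueError(f"-b {value} is outside {low}..{high} for {algo}")
    return algo, value, flags, operands


print(parse_compress_args(["-g", "-m", "lzw", "file1"]))
print(parse_compress_args(["-m", "lzw", "-g", "-b", "6", "-c", "file1"]))
```

In the first call -m lzw comes last and wins; in the second, -g comes last, so the -b value is interpreted as a DEFLATE compression level.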

The following operand shall be supported:

file
A pathname of a file to be compressed or decompressed. If a file is '-', the utility shall read from standard input at that point in the sequence and write to standard output. If more than one file operand is '-', the behavior is unspecified.

Standard input shall be used only if no file operands are specified or if a file operand is '-'.

On page 2603, L84516-84517 replace the INPUT FILES section with:
If file operands are specified, the corresponding input files contain the data to be compressed or decompressed.

On page 2603 lines 84540-84549, replace the STDOUT, STDERR, and OUTPUT FILES sections with:
For the compress and uncompress utilities, standard output shall be used if no file operands are specified, if a file operand is '-', or if the -c option is specified. Otherwise, standard output shall not be used.

The zcat utility shall write the decompressed data to standard output.

Standard error shall be used only for diagnostic and prompt messages, the optional warning message described in DESCRIPTION, and the output from -v.

When decompressing input files other than standard input, the corresponding output files shall contain the decompressed input data. When compressing input files other than standard input, the corresponding output files shall contain the compressed input data. If the selected algo is deflate or gzip, the compressed output shall be in the GZIP format described in RFC 1952. For other algorithms, the compressed output file format is implementation-defined and interchange of such files between implementations (including access via unspecified file sharing mechanisms) is not required by POSIX.1-20xx.
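
As an illustration only, the GZIP output requirement can be checked against the header fields defined in RFC 1952. The sketch uses Python's gzip module as a stand-in for an implementation of the deflate and gzip algo values; the variable names and sample data are hypothetical.

```python
import gzip

data = b"POSIX compress with -m gzip\n" * 50
member = gzip.compress(data, compresslevel=6)  # 6 matches the proposed default

magic_ok = member[:2] == b"\x1f\x8b"  # ID1 and ID2 fields from RFC 1952
method_ok = member[2] == 8            # CM field: 8 means DEFLATE
restored = gzip.decompress(member)

print(magic_ok, method_ok, restored == data)
```

Any decompressor that understands RFC 1952 should be able to read such output, which is the interchange property the proposed text requires for deflate and gzip but not for other algorithms.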

On page 2603 line 84553 (compress EXIT STATUS section), change:
The following exit values shall be returned
to:
The following exit values shall be returned for compress

On page 2603 line 84558 (compress EXIT STATUS section), add:
The following exit values shall be returned for uncompress and zcat:

0 Successful completion.

>0 An error occurred.

On page 2604 L84559-L84560 replace the CONSEQUENCES OF ERRORS section with:
If an error occurs while compressing or decompressing an input file other than standard input, the input file shall remain unmodified.

On page 2604 L84572-84574, replace the last paragraph of APPLICATION USAGE with:
In addition to trying file and file<tt>.Z</tt> when looking for a file to decompress, some implementations of uncompress and zcat also try suffixes for other known compression algorithms if neither file nor file<tt>.Z</tt> is found. This version of the standard allows, but does not require, this behavior. Portable applications should always specify the full pathname (including the suffix) of files to be decompressed.

On page 2604 lines 84577-84583 replace the RATIONALE, FUTURE DIRECTIONS, and SEE ALSO sections with:
Earlier versions of this standard limited the value of bits used by conforming applications for the lzw algorithm to 14 due to address space limitations on 16-bit architectures. Using 15 or 16 is a much more common default when using current hardware.

Earlier versions of this standard only supported LZW compression. The standard developers noted that existing implementations added other compression utilities, such as gzip, and found it desirable to support this widespread usage. Some implementations had extended the compress utility to support such other schemes. The standard developers generalized this practice by the addition of the -m option, even though this was not previous practice.

The uncompress -d option is added to match undocumented existing practice of tested implementations.

When decompressing a file, the requirement to add <tt>.Z</tt> to a file operand if the given pathname does not include a suffix associated with a known compression algorithm or if file does not exist and does not already have a <tt>.Z</tt> extension is an obsolescent feature and may be removed in a future version.

XBD Chapter 8 (on page 173), Section 12.2 (on page 216)

Replace the entire uncompress page (P3337-3339) with a pointer page to compress.

Replace the entire zcat page (P3471-3472) with a pointer page to compress.

