Austin Group Defect Tracker

Aardvark Mark III


Viewing Issue Simple Details
ID 0001202
Category [1003.1(2016)/Issue7+TC2] Shell and Utilities
Severity Objection
Type Clarification Requested
Date Submitted 2018-08-29 19:26
Last Update 2018-08-30 18:42
Reporter kre View Status public  
Assigned To
Priority normal Resolution Open  
Status New  
Name Robert Elz
Organization
User Reference
Section XCU 4 -- printf
Page Number 3113
Line Number 104118 - 104120, 104123 - 104126
Interp Status ---
Final Accepted Text
Summary 0001202: printf %.Nb with \c in arg (more than N chars into arg) behaviour unclear
Description When a %b format conversion specification is used with printf(1)
and the associated string arg contains a \c escape sequence, the
operation is that the character preceding the \c is the last one
printed (printf exits after printing that character).

The specification words this differently, but that's the effect,
and that much is clear.

What is not clear is whether the \c from the arg string needs to
actually be "written" for it to have this effect.

Example:

    printf '%.2bX' 'a string\c'

Does this print "a X" (the first 2 characters of the arg and then,
as the \c was not used, the remainder of the format, the literal X
here)?

Or does the presence of the \c in the arg string cause processing to
stop (even though the \c was not "printed"), in which case "a " should
be printed?

I only really have the builtin printfs in shells to test (and not all
shells use a builtin for it), so there are perhaps more interpretations
than I have seen, but:

Most shells' builtin printfs (and NetBSD's /usr/bin/printf) seem to
adopt the 2nd approach. bosh (the only one I have actually seen do so)
picks the first. I think both are reasonable interpretations, as the
standard is not currently clear.

The version of the FreeBSD shell I have to test ignores the precision with
%b format conversions, but that looks to have been fixed earlier this month,
and from reading their sources, the behaviour now appears to be the "most
shells" style.

ksh93 (or at least the version I tested) is just plain weird. It allows
precisions on %b formats ('printf "%.2b" "abcd"' prints "ab" as it should),
but if there's a \c in the arg string, the precision seems to be ignored
('printf "%.2b" "abcd\c"' prints "abcd"). That can't be anything but a bug.

The text can almost be read to imply the "most shells" behaviour, from the
words (line 104123)
    Bytes from the converted string shall be written
which could be interpreted to mean that the arg string should have its
escape sequences converted to actual characters first, and then written
as if the %b was %s ... if that's done (which is what I think most shells
actually do) then the \c will be seen during the conversion step, and
will cause further processing to end.
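
As a rough illustration only (this is not taken from any shell's source),
that convert-first reading can be modelled in sh. The function name
convert_first and its three operands (arg string, precision, remainder of
the format) are hypothetical, and only the \c escape is modelled; every
other byte is copied unchanged:

```shell
# Hypothetical model of the "convert first, then apply precision" reading.
# Assumption: only \c is handled; no other escape sequence is converted.
convert_first() {
    arg=$1 prec=$2 rest=$3
    case $arg in
        *'\c'*) converted=${arg%%'\c'*}; stop=1 ;;  # keep text before first \c
        *)      converted=$arg; stop=0 ;;
    esac
    # Apply the precision to the already converted string, as %s would.
    printf "%.${prec}s" "$converted"
    # The \c was seen during conversion, so the rest of the format is dropped.
    if [ "$stop" -eq 0 ]; then
        printf '%s' "$rest"
    fi
}
```

Under this model, convert_first 'a string\c' 2 X writes "a " and the
literal X is never written, which matches the 2nd approach above.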

But there is no particular reason to read the text that way. The conversion
of the arg string and the output of the bytes could easily proceed in
parallel, with bytes from the arg string that are not to be written never
being converted (converting them is really just a waste of time, certainly
for all the escapes except \c).
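
A minimal sketch of that parallel reading, again only as an illustration
and not as any shell's actual code: the hypothetical function
stream_convert writes one character at a time and stops scanning once the
precision is exhausted, so a later \c is never examined. As before, only
\c is modelled, and how a \c sitting exactly at the precision boundary
should be treated is a choice of this sketch, not something the text
settles:

```shell
# Hypothetical model of converting and writing in parallel.
# Assumption: only \c is handled; all other characters are copied as-is.
stream_convert() {
    arg=$1 prec=$2 rest=$3
    written=0
    while [ "$written" -lt "$prec" ] && [ -n "$arg" ]; do
        case $arg in
            '\c'*) return 0 ;;           # \c reached: stop all output
        esac
        tail=${arg#?}                    # everything after the first character
        printf '%s' "${arg%"$tail"}"     # write just the first character
        arg=$tail
        written=$((written + 1))
    done
    # The precision (or end of string) ended the scan before any \c was seen.
    printf '%s' "$rest"
}
```

Under this model, stream_convert 'a string\c' 2 X writes "a X", which
matches the 1st approach above.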
Desired Action Add a new sentence after the sentence which concludes on line 104120 ...

    If the \c is not reached during the processing of the string
    operand (because a precision argument limits the number printed)
    it is unspecified whether the effects of the \c, just indicated,
    apply, or whether the \c is simply ignored.

though I actually would expect that would be re-written to be something
that means much the same, but is better written!

Alternatively, if bosh builtin printf happens to be the only printf which
behaves the way it does, the added sentence could say

    Note that the presence of a \c in the operand string shall have this
    effect even if output from the converted operand string terminates,
    due to a precision argument, before the \c is reached.

Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0004092)
joerg (reporter)
2018-08-30 12:46

Some shells first convert the %b argument to expand the escape sequences and
then let the resulting string be processed by printf(3). This does not work,
since printf() cannot pass on nul characters, but as a side effect the \c
ending is seen while doing the conversion.

If you want to permit coded nul bytes inside the string parameter, you
need to process things completely without help from printf(). If you also
want to honor %.#b, the natural behavior is that of bosh: the \c is not
seen, because string processing is aborted by the %.# modifier.
(0004093)
shware_systems (reporter)
2018-08-30 14:43

I think the controlling language here is, at Line 104121:
"The interpretation of a <backslash> followed by any other sequence of characters is unspecified.
Bytes from the converted string shall be written until the end of the string or the number of bytes indicated by the precision specification is reached. If the precision is omitted, it shall be taken to be infinite, so all bytes up to the end of the converted string shall be written."

which indicates that the entire argument shall be processed before the precision specifier is applied, with \c taking effect as described for the second option. That leaves open the possibility of other sequences being used as extensions that convert to one or more final characters subject to the precision, or being treated as a syntax error that aborts without outputting anything. Because of the possibility of embedded <NUL> characters, the converted result has to be treated as a counted string, not a NUL-delimited one, to satisfy the last sentence quoted, since that sentence has no explicit 'or a NUL character is encountered'.

As an example, a platform might define \u to expand to the current GMT with microsecond precision, which would be at least 8 characters (HH:MM:SS) replacing the \u. On another platform it might output Syntax Error on the next line, without 'a X' or 'a ' at all.
(0004095)
kre (reporter)
2018-08-30 16:28

Re note 4093.

There is a reply on the mailing list to most of that, which is not
relevant to the current issue.

However this part:
    which indicates the entire argument shall be processed before
    applying the precision specifier

is incorrect. It indicates nothing of the kind, though, as the
Description of the issue says, it is easy to read it that way.
All it means is that it is converted bytes from the operand
string that are written, not that they must all be converted before
any of the string is processed.
(0004096)
shware_systems (reporter)
2018-08-30 18:13

Yes, it does say that... Your interpretation would be phrased "bytes from the argument string, as converted, shall be written as if by putc()", or similar, to require that piecemeal effect, but it says to take them from the ~wholly~ converted string, not while converting. It is only an indication because the 'wholly' is implied, but I don't see the first interpretation as allowed by the intent. In other words, the string has to be fully formed before you can take bytes from it; otherwise it is just a character sequence that may contain invalid code points, whether from bad escapes or according to the charmap of the current locale.

Granted, doing it piecemeal uses less memory, so an implementer might want to read it that first way, but elsewhere the standard is explicit about where this is permitted or required, using phrases like "as if by putc()".
(0004097)
kre (reporter)
2018-08-30 18:42
edited on: 2018-09-02 01:50

Re note 4096 ...

We could continue with the "yes it does", "no it doesn't" debate
for ages. That will accomplish nothing.

First, I don't have an interpretation - that is the point. The text
does not even consider the issue, which is not a surprise, as to emulate Sys V
echo, which is the only reason %b exists, using a precision (or field width)
makes no sense at all.

I agree that if "\c only works when consumed" were the intent, the
text would need to say so more clearly. But as you need to infer a
non-existent "wholly" (which would not be the correct way to fix it, but
never mind) to reach the other conclusion, I think we actually agree that the
text as it stands is not clear, and needs to be corrected.

We can debate what the solution should be (that should happen on the list,
not here in notes) but it is plainly obvious that a fix is needed.


- Issue History
Date Modified Username Field Change
2018-08-29 19:26 kre New Issue
2018-08-29 19:26 kre Name => Robert Elz
2018-08-29 19:26 kre Section => XCU 4 -- printf
2018-08-29 19:26 kre Page Number => 3113
2018-08-29 19:26 kre Line Number => 104118 - 104120, 104123 - 104126
2018-08-30 12:46 joerg Note Added: 0004092
2018-08-30 14:43 shware_systems Note Added: 0004093
2018-08-30 16:28 kre Note Added: 0004095
2018-08-30 18:13 shware_systems Note Added: 0004096
2018-08-30 18:42 kre Note Added: 0004097
2018-09-02 01:50 kre Note Edited: 0004097

