ID: 0001592
Category: [Issue 8 drafts] Shell and Utilities
Severity: Comment
Type: Enhancement Request
Date Submitted: 2022-07-15 08:36
Last Update: 2024-06-11 09:12
Reporter: geoffclare
View Status: public
Assigned To:
Priority: normal
Resolution: Accepted
Status: Closed
Product Version: Draft 2.1
Name: Geoff Clare
Organization: The Open Group
User Reference:
Section: printf
Page Number: 3085
Line Number: 104307
Final Accepted Text:
Summary: 0001592: Add %n$ support to the printf utility
Description

During the work on adding gettext, it was suggested that we should also add support for %n$ conversions in the format strings of the printf utility, to allow argument reordering by translators. This is not as straightforward as it might seem, as discussed in the email thread "Adding %n$ conversions to the printf utility" in September 2021. One of the points raised in the thread was that it would be preferable for printf to treat a missing argument as an error when %n$ is used. The suggested changes recommend this (via the use of "should") but also allow the null/zero behaviour that happens for unnumbered argument conversions.
Desired Action

On page 3085 line 104307 section printf, insert a new item 8:

    Conversions can be applied to the nth argument operand rather than to the next argument operand. In this case, the conversion specifier character '%' is replaced by the sequence "%n$", where n is a decimal integer in the range [1,{NL_ARGMAX}], giving the argument operand number. This feature provides for the definition of format strings that select arguments in an order appropriate to specific languages.

and renumber the later items.

On page 3085 line 104307 section printf, change:

    For each conversion specification that consumes an argument, the next argument operand shall be evaluated and converted to the appropriate type for the conversion as specified below.

to:

    For each conversion specification that consumes an argument, an argument operand shall be evaluated and converted to the appropriate type for the conversion as specified below. The operand to be evaluated shall be determined as follows:

On page 3085 line 104310 section printf, change:

    9. The format operand shall be reused as often as necessary to satisfy the argument operands. Any extra b, c, or s conversion specifiers shall be evaluated as if a null string argument were supplied; other extra conversion specifications shall be evaluated as if a zero argument were supplied. If the format operand contains no conversion specifications and argument operands are present, the results are unspecified.

to:

    10. The format operand shall be reused as often as necessary to satisfy the argument operands. If conversion specifications beginning with a "%n$" sequence are used, on format reuse the value of n shall refer to the nth argument operand following the highest numbered argument operand consumed by the previous use of the format operand.

    11. If an argument operand to be consumed by a conversion specification does not exist:

and renumber the later items.

After page 3087 line 104372 section printf, add a new paragraph:

    Unlike the printf() function, when numbered conversion specifications are used, specifying the Nth argument does not require that all the leading arguments, from the first to the (N-1)th, are specified in the format string. For example, "%3$s %1$d\n" is an acceptable format operand which evaluates the first and third argument operands but not the second.

On page 3089 line 104455 section printf, change:

    The EXTENDED DESCRIPTION section almost exactly matches the printf() function in the ISO C standard, although it is described in terms of the file format notation in [xref to XBD Chapter 5].

to:

    The format strings for the printf utility are handled differently than for the printf() function in several respects. Notable differences include:

On page 3089 line 104471 section printf, change FUTURE DIRECTIONS from:

    None.

to:

    A future version of this standard may require that a missing argument operand to be consumed by a numbered argument conversion specification is treated as an error.
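The operand-selection rules above can be modelled with a short Python sketch. It covers the new item 8, the reuse rule in item 10, and the "%3$s %1$d\n" paragraph. This is not code from any printf implementation.

- The function name operand_plan is hypothetical.
- The simplified conversion grammar is an assumption.
- The sketch refuses to mix numbered and unnumbered conversions, because mixing is undefined for the printf() function.

It reports only which operand numbers each use of the format consumes. A number larger than the operand count is the "missing argument" case covered by item 11.

```python
import re

# Simplified conversion grammar (an assumption): %[n$][flags][width][.precision]conversion
SPEC = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d*)?([diouxXfFeEgGaAcsb%])")


def operand_plan(fmt: str, nargs: int) -> list[list[int]]:
    """Return, for each use of fmt, the 1-based operand numbers consumed in order."""
    specs = [(m.group(1), m.group(2)) for m in SPEC.finditer(fmt) if m.group(2) != "%"]
    if not specs:
        return [[]]  # nothing consumes an argument: the format is used once
    numbered = [int(n) for n, _ in specs if n is not None]
    if numbered and len(numbered) != len(specs):
        raise ValueError("numbered and unnumbered conversions mixed")
    if numbered and min(numbered) < 1:
        raise ValueError("n must be in the range [1,{NL_ARGMAX}]")

    plan: list[list[int]] = []
    base = 0  # operands consumed by earlier uses of the format
    while True:
        if numbered:
            plan.append([base + n for n in numbered])
            # Item 10: on reuse, n counts on from the highest numbered operand consumed.
            base += max(numbered)
        else:
            plan.append(list(range(base + 1, base + len(specs) + 1)))
            base += len(specs)
        if base >= nargs:  # reuse only "as often as necessary"
            return plan


print(operand_plan("%3$s %1$d\n", 6))  # [[3, 1], [6, 4]]
print(operand_plan(" %s", 3))          # [[1], [2], [3]]
```

Operands 2 and 5 are never evaluated in the first example. The proposed new paragraph explicitly permits this, although the printf() function does not.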
Tags: issue8
Notes
(0005890) kre (reporter) 2022-07-16 08:23 edited on: 2022-07-16 08:57
It is probably no big surprise, but I am opposed to this as it currently stands. I accept that having %n$ in printf is close to essential for handling internationalised scripts, but it does not need to be as described. I also find it weird that, from the mailing list discussion, the one issue considered worthy of adoption was "it would be preferable for printf to treat a missing argument as an error". That change defeats the one and only potential solution to the real problem with this proposal.

That is, a script which uses this mechanism controls the arg list which is presented to printf (so, after the format is removed, $1 is something known to the invoking script, as are $2 ...). What it does not control is the format. That comes from the message translator, who provides the text of the format string and interpolates whichever of the arguments are needed for that particular translation, in the order that is appropriate for the culture and language being presented.

In the most extreme case, a script might call printf with several args which it believes would be useful to present to the user, but the translator simply provides a constant translation of the intent of the message, which uses none of the args. That would currently (both with and without %n$ being added) run foul of the restriction:

    If the format operand contains no conversion specifications that consume an argument and there are argument operands present, the results are unspecified.

That can be defeated (in an extremely ugly fashion) by adding a %999$s conversion somewhere in the format, knowing that there won't be 999 args given. If we can rely on an undefined arg presenting an empty string to a %s conversion, that can be used anywhere, with no effect on the output. But if that's allowed to be an error, even that won't work.
I believe it is time to remove that "results are unspecified" and change the "no conversions with args" case to "print the format string once and exit, ignoring unused args". This change isn't really required for normal printf use, where the situation normally shouldn't occur. But once we lose control of the format string, all bets are off.

For the same reason, this:

    The format operand shall be reused as often as necessary to satisfy the argument operands. If conversion specifications beginning with a "%n$" sequence are used, on format reuse the value of n shall refer to the nth argument operand following the highest numbered argument operand consumed by the previous use of the format operand.

does not work either. Since we have no idea which %n$ values will be used in a particular translation, we cannot supply args that are correct for this to work as desired. Consider:

    printf translated-format 10 36.3 string 5 19.4 word

Here one translation of the format is "text %1$d %3$s # %2$5.2f\n", another is "total %2$7.1g for %1$d %3$s\n", and a third is "That will be %2$.2f for %1$ objects\n" (all in diverse languages, of course). Consider what the proposed rule does with the third format. Again, the %999$s trick could save things. Essentially it causes the format string to be used exactly once, which, for cases where we don't control what is in the format, is really what we need.

The best solution is simply not to reuse the format string when any %n$ conversions are present. That avoids all kinds of issues. Further, it is hard to imagine a sane case for the need for that. Format reuse is not all that common. When it does happen, it tends to be for simple formats like

    printf " %s" "$@"

where internationalization is not an issue, or for tabular data, like

    printf '%5d\t%-12s\t%c\t%5d\n' ......

where, again, there is not really anything in the format to translate. (On the other hand, the args, particularly strings, might want to be translated, but that's a side issue.)
So, I'd suggest two changes:

- Do not reuse the format if any %n$ conversion has been used.
- Forbid it to be an error to refer to an arg that doesn't exist, at least for one conversion, even if that is a new made-up one which converts nothing.

For example, %Z could be guaranteed to print an empty string, but we could still use %20$Z to make this a format using %n$. Thus, assuming the other change is accepted, it would no longer be undefined to not reference any args.

Without changes like these, the proposal is simply unworkable, which is why I have resisted implementing %n$ in NetBSD printf. It would be truly easy to do what the other implementations have done and take the easy route (parsing and implementing it is trivial), but if the result is as hopeless as all of those are, I am not at all interested.
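The problem with the third translation can be shown with a hypothetical Python sketch. It binds each conversion to the operand it would receive, under two rules:

- the proposed reuse rule (item 10);
- kre's suggested alternative of using the format once.

Assumptions:

- "%1$ objects" was meant to be "%1$d objects".
- Only formats whose conversions are all numbered are handled.
- The names bindings and reuse are illustrative, not from any implementation.

```python
import re

# Only numbered conversions are modelled (an assumption): %n$[flags][width][.precision]conv
SPEC = re.compile(r"%(\d+)\$[-+ #0]*\d*(?:\.\d*)?([diouxXfFeEgGaAcsb])")


def bindings(fmt: str, args: list[str], reuse: bool = True) -> list[list[tuple]]:
    """Pair each conversion with the operand it would receive, one list per use of fmt."""
    specs = [(int(m.group(1)), m.group(2)) for m in SPEC.finditer(fmt)]
    if not specs:
        raise ValueError("sketch only handles formats with numbered conversions")
    highest = max(n for n, _ in specs)
    passes, base = [], 0
    while True:
        # None marks a missing operand (the case the proposal's item 11 covers).
        passes.append([(conv, args[base + n - 1] if base + n <= len(args) else None)
                       for n, conv in specs])
        base += highest
        # reuse=True follows the proposed item 10; reuse=False uses the format once.
        if not reuse or base >= len(args):
            return passes


ARGS = ["10", "36.3", "string", "5", "19.4", "word"]
FMT = "That will be %2$.2f for %1$d objects\n"  # assumes "%1$ objects" meant "%1$d"
for use in bindings(FMT, ARGS):
    print(use)
```

Under the proposed rule, the second use of the format hands "5" to %.2f and "string" to %d. With reuse=False, only the first pairing is produced and the remaining operands are ignored.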
(0005891) geoffclare (manager) 2022-07-18 11:17
Re Note: 0005890

The reason the behaviour is unspecified when there are arguments but no conversions is that some implementations write a warning message about the unused arguments. If it stops being unspecified, they will have to change.

A better work-around than %999$s is %1$.0s, i.e. consume the argument(s) without them contributing anything to the output. A similar trick also solves the "repeat the format" situation where a translator only wants to use two of three arguments: they can insert %3$.0s anywhere so that the third argument in each triple is consumed.

You say "it is hard to imagine a sane case for the need for" format reuse with numbered arguments. However, in the mailing list discussion someone (Stephane, I think) said they often use it for reordering columns and would object to this feature being disallowed.

The restrictions for the printf utility as proposed are already less stringent than those for the printf() function, where if you consume the Nth argument you must also consume all arguments from 1 to (N-1). Translators seem to have been able to cope with the printf() restrictions for many years, so they should be able to do so equally well for the printf utility. The only "extra" problem if printf is used with translated formats in a similar fashion to printf() is the arguments-but-no-conversions issue. As I said above, there is a simple work-around for that (%1$.0s).
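The %3$.0s work-around can be checked with a minimal Python model. Its scope and assumptions:

- It only understands numbered %s conversions with an optional precision.
- It reuses the format as in the proposed item 10.
- It treats a missing operand as a null string, one of the behaviours the proposal allows.
- render is a hypothetical helper, not part of any printf implementation.

```python
import re

# Only numbered %s conversions with an optional precision are modelled (an assumption).
SPEC = re.compile(r"%(\d+)\$(?:\.(\d*))?s")


def render(fmt: str, args: list[str]) -> str:
    """Expand fmt, reusing it per the proposed rule until all operands are used."""
    highest = max(int(m.group(1)) for m in SPEC.finditer(fmt))
    out, base = [], 0
    while True:
        def expand(m: re.Match, base: int = base) -> str:
            n = base + int(m.group(1))
            value = args[n - 1] if n <= len(args) else ""  # missing: null string
            prec = m.group(2)
            # "%.0s" (or "%.s") consumes the operand but writes nothing.
            return value if prec is None else value[: int(prec or "0")]

        out.append(SPEC.sub(expand, fmt))
        base += highest  # next use starts after the highest operand consumed
        if base >= len(args):
            return "".join(out)


ROWS = ["a1", "b1", "c1", "a2", "b2", "c2"]
print(render("%2$s=%1$s\n", ROWS), end="")        # rows drift out of step
print(render("%2$s=%1$s%3$.0s\n", ROWS), end="")  # each use consumes one triple
```

Without the extra %3$.0s, the second use of the format starts at operand 3, so the rows drift. With it, each use consumes exactly one triple.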
(0005892) bhaible (reporter) 2022-07-18 14:14
The proposed specification looks perfectly OK to me. It marks a couple of cases as unspecified or error that would not be needed in practical uses, and thus avoids backward-incompatible changes in existing implementations.

Unlike in C, there is no handling for a width or precision that comes from an argument, because POSIX printf(1) already does not support this. Example:

    LC_ALL=C printf '%.*f %s\n' 2 3.1415926 pi

So this is OK as well.

Replying to Robert Elz's comment:

> having %n$ in printf is close to essential for handling
> internationalised scripts

No. There can be different ways to internationalize a shell script. For example, these four statements all do the same thing:

    1) pid=$$; eval_gettext "Running as process number \$pid."; echo
    2) printf_gettext "Running as process number %d." $$; echo
    3) printf "`gettext 'Running as process number %d.'`" $$; echo
    4) printf "$(gettext 'Running as process number %d.')" $$; echo

GNU gettext currently implements only the first one. When POSIX printf(1) comes with %n$ support, it will be possible for GNU gettext to support the other three as well. Namely, when preparing a template PO file from the given script, xgettext will add a marker '#, sh-printf-format' to the extracted message:

    #, sh-printf-format
    msgid "Running as process number %d."
    msgstr ""

When translators then submit localizations, the '-c' option of 'msgfmt -c' will ensure that the msgstr consumes the same arguments, with the same formats. For example,

    #, sh-printf-format
    msgid "Running as process number %d."
    msgstr "Laufe als Prozess Nummer %1$d."

will pass, whereas

    #, sh-printf-format
    msgid "Running as process number %d."
    msgstr "Laufe als Prozess Nummer %1$s."

will be rejected (conversion specifier does not match), and

    #, sh-printf-format
    msgid "Running as process number %d."
    msgstr "Laufe als Prozess Nummer."

will be rejected as well (number of consumed arguments does not match).
This checking of format strings is essential for C, because badly localized printf(3) format strings would make the program crash. The same mechanism, applied to printf(1) format strings, will prevent the script from running into the unspecified/error cases of the specification. In other words, for the purpose of internationalization, it does not really matter what the specification says about these corner cases, because 'msgfmt -c' will ensure that they are not exercised. The important point is that POSIX defines the %n$ syntax at all. That will be the foundation for the format string checking of '#, sh-printf-format' in msgfmt.

> In the most extreme case, a script might call printf with several args
> which it believes would be useful to present to the user, but the
> translator simply provides a constant translation of the intent of the
> message, which uses none of the args.

'msgfmt -c' will prevent this.

> what it does
> not control is the format. That comes from the message translator

This is intentional. When printing an amount of money, for example,

    #, sh-printf-format
    msgid "Price: %.2f"
    msgstr "Preis: %.3f"

is OK, since the translator knows better than the program whether amounts of money should be printed with 2 or with 3 decimals.

> and "That will be %2$.2f for %1$ objects\n"
> Consider what the proposed rule does with the 3rd format.

'msgfmt -c' will reject that translation (number of consumed arguments does not match).
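A rough Python approximation of the kind of check described for 'msgfmt -c' works in two steps:

1. Map each consumed argument number to its conversion specifier, ignoring flags, width and precision.
2. Require the msgid and msgstr maps to be equal.

This is a hypothetical sketch, not GNU gettext's actual checker. The function names are illustrative, and specifiers are compared by exact character only.

```python
import re

# Simplified grammar (an assumption): %[n$][flags][width][.precision]conversion
SPEC = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d*)?([diouxXfFeEgGaAcsb%])")


def arg_specifiers(fmt: str) -> dict[int, str]:
    """Map each consumed argument number to its conversion specifier character."""
    found: dict[int, str] = {}
    next_unnumbered = 0
    for m in SPEC.finditer(fmt):
        conv = m.group(2)
        if conv == "%":
            continue  # "%%" consumes nothing
        if m.group(1) is not None:
            n = int(m.group(1))
        else:
            next_unnumbered += 1
            n = next_unnumbered
        if found.setdefault(n, conv) != conv:
            raise ValueError(f"argument {n} used with both {found[n]} and {conv}")
    return found


def formats_compatible(msgid: str, msgstr: str) -> bool:
    """Hypothetical 'msgfmt -c' style check: same arguments, same specifiers."""
    return arg_specifiers(msgid) == arg_specifiers(msgstr)


MSGID = "Running as process number %d."
print(formats_compatible(MSGID, "Laufe als Prozess Nummer %1$d."))  # True
print(formats_compatible(MSGID, "Laufe als Prozess Nummer %1$s."))  # False
print(formats_compatible(MSGID, "Laufe als Prozess Nummer."))       # False
print(formats_compatible("Price: %.2f", "Preis: %.3f"))             # True
```

Because precision is ignored, the "Price" translation that changes %.2f to %.3f still passes.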
(0005893) bhaible (reporter) 2022-07-18 14:51
Replying to Robert Elz:

> the one issue considered worthy of adoption was the:
> "it would be preferable for printf to treat a missing argument as an error"
> issue

Format strings in C also have the same restrictions:

(A) Numbered and unnumbered arguments cannot be used in the same format string.
(B) Missing arguments in a format string with numbered arguments are an error.

Restriction (A) has the purpose of ensuring a clear specification, and also of catching some kinds of unintentional programmer/translator mistakes.

Restriction (B) exists in C because sizeof(argument) can only be deduced from the conversion specifier. In languages where each argument is merely a pointer, such as Awk, Boost, JavaScript, Ruby and Tcl, this restriction is unnecessary. For shell programming, where each argument is a plain string, it is unnecessary as well.
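Both restrictions are easy to express when every operand is a string. The following hypothetical Python sketch shows each one:

- argument_style rejects a format that mixes numbered and unnumbered conversions (restriction A).
- fetch looks up a missing operand as an empty string instead of failing. A shell operand has no type to deduce, which is the argument made above about restriction (B).

The names argument_style and fetch, and the simplified grammar, are assumptions for illustration.

```python
import re

# Simplified grammar (an assumption): %[n$][flags][width][.precision]conversion
SPEC = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d*)?([diouxXfFeEgGaAcsb%])")


def argument_style(fmt: str) -> str:
    """Return "numbered", "unnumbered" or "none"; reject mixing (restriction A)."""
    kinds = {m.group(1) is not None for m in SPEC.finditer(fmt) if m.group(2) != "%"}
    if kinds == {True, False}:
        raise ValueError("numbered and unnumbered conversions mixed")
    if not kinds:
        return "none"
    return "numbered" if True in kinds else "unnumbered"


def fetch(args: list[str], n: int) -> str:
    """Shell-style operand lookup: every operand is a string, so no type has to be
    deduced and a missing operand can simply read as the empty string."""
    return args[n - 1] if 1 <= n <= len(args) else ""


print(argument_style("%2$s %1$s"))  # numbered
print(argument_style("%s %d"))      # unnumbered
print(repr(fetch(["only"], 3)))     # ''
```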