View Issue Details

ID: 0000325
Project: 1003.1(2008)/Issue 7
Category: Shell and Utilities
View Status: public
Last Update: 2013-04-16 13:06
Reporter: eblake
Assigned To: ajosey
Priority: normal
Severity: Objection
Type: Omission
Status: Closed
Resolution: Accepted As Marked
Name: Eric Blake
Organization: Red Hat
User Reference: ebb.tr
Section: tr
Page Number: 3247
Line Number: 108366
Interp Status: ---
Final Accepted Text: 0000325:0000575
Summary: 0000325: tr and m4 translit behavior when string1 has repeated characters
Description:
The description for sed's y operator is explicit that repeating characters in
the first argument of a transliteration operation has unspecified results:
"if any of the characters in string1 appear more than once, the results are
undefined." (line 105011)

However, there is no corresponding text for either tr or m4's translit
built-in. And while sed leaves the behavior unspecified, existing
implementation practice for both tr and m4 is consistent across a large set
of implementations. Oddly enough, though, the two programs historically have
a different interpretation: tr favors the last instance of a character in
string1, while m4 favors the first:

$ echo a | tr aa 01
1
$ echo 'translit(a,aa,01)' | m4
0
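
The difference between the two outputs above can be modeled outside either utility. The following is a minimal Python sketch, not the implementation of tr or m4: the function names are hypothetical, and the operands are assumed to be plain character strings with no ranges, classes, or escapes.

```python
def translit_last_wins(text, string1, string2):
    """Model the historical tr behavior: the last duplicate in string1 decides."""
    mapping = {}
    for src, dst in zip(string1, string2):
        mapping[src] = dst  # a later position overwrites an earlier one
    return "".join(mapping.get(ch, ch) for ch in text)


def translit_first_wins(text, string1, string2):
    """Model the historical m4 translit behavior: the first duplicate decides."""
    mapping = {}
    for src, dst in zip(string1, string2):
        mapping.setdefault(src, dst)  # keep the earliest position only
    return "".join(mapping.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(translit_last_wins("a", "aa", "01"))   # same result as: echo a | tr aa 01
    print(translit_first_wins("a", "aa", "01"))  # same result as: translit(a,aa,01)
```

The only difference between the two functions is whether a repeated source character overwrites or preserves the mapping already recorded for it.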

The proposal below gives two alternatives. I would prefer the first, which
mandates existing practice, but it may be deemed worth using the second
proposal to be more conservative and mirror sed's wording.

Meanwhile, 'tr -s a' falls into the category of -d not being specified, but
since it does not have a string2 argument, the text of line 108366 is not
applicable. Both options in the proposal fix the wording to make this clear.
The wording for m4 assumes the resolution for 0000242 has been applied first.

This proposal does not change the wording for sed, but it appears that most
sed implementations match the tr behavior of favoring the last instance of a
character.
Desired Action:
Option 1:

At line 108366 (XCU tr EXTENDED DESCRIPTION), change:

Each input character found in the array specified by string1 shall be replaced
by the character in the same relative position in the array specified by
string2.

to:

If string2 is present, each input character found in the array specified by
string1 shall be replaced by the character in the same relative position in
the array specified by string2. If a character occurs more than once in
string1, the replacement shall be from the final position of that character.

At line 94344 (XCU m4 EXTENDED DESCRIPTION translit), change:

The defining text of the translit macro shall be the first argument with
every character that occurs in the second argument replaced with the
corresponding character from the third argument. If no replacement character
is specified for some source character because the second argument is longer
than the third argument, that character shall be deleted from the first
argument in translit's defining text.

to:

The defining text of the translit macro shall be the first argument with
every character that occurs in the second argument replaced with the
corresponding character from the third argument. If a character appears more
than once in the second argument, the replacement shall correspond to the
first instance of the character. If no replacement character is specified for
the first instance of a source character because the second argument is longer
than the third argument, that character shall be deleted from the first
argument in translit's defining text.
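
To illustrate how the Option 1 wording for translit would combine the first-instance rule with the deletion rule, here is a hedged Python sketch. The function name is hypothetical, the arguments are assumed to be literal character strings (no range expansion), and this is a reading of the proposed text rather than code from any m4 implementation.

```python
def m4_translit_option1(first, second, third):
    """Apply the Option 1 translit rules to plain character strings."""
    mapping = {}
    for index, src in enumerate(second):
        if src in mapping:
            continue  # only the first instance of a source character counts
        # None marks "no replacement specified", which means delete
        mapping[src] = third[index] if index < len(third) else None

    result = []
    for ch in first:
        if ch not in mapping:
            result.append(ch)            # not a source character: copy as-is
        elif mapping[ch] is not None:
            result.append(mapping[ch])   # replaced by the corresponding character
        # otherwise the character is deleted
    return "".join(result)


if __name__ == "__main__":
    print(m4_translit_option1("a", "aa", "01"))
    print(m4_translit_option1("banana", "an", "x"))
```

Under this reading, a later duplicate that lacks a replacement does not cause deletion, because only the first instance is consulted.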



Option 2:

At line 108366 (XCU tr EXTENDED DESCRIPTION), change:

Each input character found in the array specified by string1 shall be replaced
by the character in the same relative position in the array specified by
string2. When the array specified by string2 is shorter that the one specified
by string1, the results are unspecified.

to:

If string2 is present, each input character found in the array specified by
string1 shall be replaced by the character in the same relative position in
the array specified by string2. If the array specified by string2 is shorter
that the one specified by string1, or if a character occurs more than once in
string1, the results are unspecified.

After line 94349 (XCU m4 EXTENDED DESCRIPTION translit), add another sentence:

The behavior is unspecified if the same character appears more than once in
the second argument.
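
Under Option 2, a portable script would need to avoid the unspecified cases rather than rely on a particular outcome. The sketch below shows one way such a check might look in Python. The function name and message strings are hypothetical, string2 is assumed to be present, and both operands are assumed to be already expanded into plain character arrays (ranges, classes, and escapes are not handled).

```python
def unspecified_reasons(string1, string2):
    """Return the reasons a tr-style mapping would be unspecified under Option 2."""
    reasons = []
    if len(string2) < len(string1):
        reasons.append("string2 is shorter than string1")

    seen = set()
    duplicates = []
    for ch in string1:
        if ch in seen and ch not in duplicates:
            duplicates.append(ch)  # record each repeated character once, in order
        seen.add(ch)
    if duplicates:
        reasons.append("repeated in string1: " + "".join(duplicates))
    return reasons


if __name__ == "__main__":
    print(unspecified_reasons("aa", "01"))
    print(unspecified_reasons("abc", "xy"))
    print(unspecified_reasons("abc", "xyz"))
```

An empty list means neither condition named in the Option 2 text applies to the given operands.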
Tags: tc1-2008

Activities

msbrown (manager), 2010-10-14 15:44, bugnote:0000575

The WG will be applying Option #2 from the Desired Action field; this will both clarify that this behavior is indeed undefined in the specification and also clean up the text for tr in this case.

Issue History

Date Modified Username Field Change
2010-09-29 15:09 eblake New Issue
2010-09-29 15:09 eblake Status New => Under Review
2010-09-29 15:09 eblake Assigned To => ajosey
2010-09-29 15:09 eblake Name => Eric Blake
2010-09-29 15:09 eblake Organization => Red Hat
2010-09-29 15:09 eblake User Reference => ebb.tr
2010-09-29 15:09 eblake Section => tr
2010-09-29 15:09 eblake Page Number => 3247
2010-09-29 15:09 eblake Line Number => 108366
2010-09-29 15:09 eblake Interp Status => ---
2010-10-14 15:44 msbrown Note Added: 0000575
2010-10-14 15:44 msbrown Resolution Open => Accepted As Marked
2010-10-14 15:46 msbrown Final Accepted Text => 0000325:0000575
2010-10-14 15:46 msbrown Tag Attached: tc1-2008
2010-10-14 15:56 geoffclare Status Under Review => Resolved
2013-04-16 13:06 ajosey Status Resolved => Closed