View Issue Details

ID: 0001915
Project: 1003.1(2016/18)/Issue7+TC2
Category: Shell and Utilities
View Status: public
Last Update: 2025-04-17 10:00
Reporter: steffen
Assigned To:
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: New
Resolution: Open
Name: steffen
Organization:
User Reference:
Section: 2.5.2
Page Number: 2479
Line Number: 80382
Interp Status:
Final Accepted Text:
Summary: 0001915: clarification of 2.6.5 field splitting of 2.5.2 special parameter $*
Description: I was implementing a shell expression parser.
It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh.
The standard defines (p. 2479, lines 80382 ff.)


   [.]one field for
  each positional parameter that is set. When the expansion occurs in a context where field
  splitting will be performed, any empty fields may be discarded and each of the non-empty
  fields shall be further split as described in Section 2.6.5.


So in an example
<code>
  a() {
          echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
          echo $#,'*'="$*"/$*,
  }
  set -- '' 'a' ''
  for f in ' ' '' : ': ' ' :'; do
          IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
  done
</code>

my parser was on par with the mentioned shells except for
<code>
  --- .1  2025-03-15 23:38:31.359307576 +0100
  +++ .2  2025-03-15 23:38:32.715974215 +0100
  @@ -6,10 +6,10 @@ a a a$
   3,*=aaa/a a a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
    a  a a$
   3,1= a / a ,2=a/a,3=a/a,4=$
   3,*= a  a a/ a a a,$
</code>

After many months I did not give up, and wrote to kre@ and to the bash-bug list:
<code>
By the very meaning of this [POSIX words] the fields are split individually,
*first*. This is exactly what i do.
Hence
    echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
->
    4,1=:a:/ a ,2=a/a,3=/,4=a
becomes
    :a: -> '' + a
    a -> a
    '' -> discarded (but remembered as it separates fields)
    a -> a
becomes, with IFS=:, when actually creating the argument
    :a:a::a
becomes the actual argument
    < a a  a>
</code>
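
The per-parameter splitting steps in the quoted message can be checked in isolation. The following is a minimal sketch, assuming IFS=: as in the message; `count_fields` is a hypothetical helper (not part of the report) that counts the fields produced by one unquoted expansion.

```shell
#!/bin/sh
# count_fields is a hypothetical helper for illustration only.
# The function body is a subshell, so IFS and "set -f" do not leak out.
count_fields() (
    IFS=:
    set -f            # disable pathname expansion of the split fields
    set -- $1         # unquoted: field splitting happens here
    echo "$#"
)

count_fields ':a:'    # leading ':' delimits an empty field, trailing ':' terminates "a"
count_fields 'a:'     # trailing ':' only terminates "a"
count_fields 'a::a'   # the second ':' delimits an empty field between the two "a"s
```

In a shell that treats a non-whitespace IFS character as a field terminator (bash and dash do for these inputs), this should print 2, 1 and 3, which matches the `:a: -> '' + a` step above.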

Long story short (initial typo corrected):

<code>
  +                       /* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
  +                        * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
  +                        * This applies to $@ as well as $* */
  +                       if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
  +                               cp = n_var_vlook(n_star, TRU1);
  +                               goto jfs_split;
  +                       }
  +
  +                       /* In all other cases individually field split the expanded parameters */
</code>

I.e., the mentioned shells use the *quoted* variant of $* to perform the expansion in the mentioned case.
This seems to have been the case for multiple decades, if not forever.
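
For reference, the "quoted variant" joins the positional parameters with the first character of IFS. A minimal sketch, using the same parameters as the test script (`''`, `a`, `''`); `join_star` is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# join_star is a hypothetical helper: it prints "$*" for the positional
# parameters '' 'a' '' under the IFS value passed as its first argument.
# The function body is a subshell, so the IFS change stays local.
join_star() (
    IFS=$1
    set -- '' 'a' ''
    printf '<%s>\n' "$*"
)

join_star ' '    # joined with a space:  < a >
join_star ''     # empty IFS, no separator:  <a>
join_star ':'    # joined with a colon:  <:a:>
```

Field splitting the last result, `:a:`, is a different operation from splitting each of the three parameters on its own, which is where the shells diverge.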
Desired Action: Please clarify whether POSIX *really* meant what it says in *all* cases, or whether the text failed to capture existing implementation behavior when it was taken over into the first version of the standard.
Or whether the above special case for a non-(IFS-)whitespace byte in IFS[0] is a regular, desired implementation detail.

(maybe reorder mantis layout so section etc are at the top again?)
Tags: No tags attached.

Activities

geoffclare

2025-03-18 10:31

manager   bugnote:0007122

The following is a copy of the Description, with the "code" tags changed to "pre".
---------------------------------------------------------------------------------------------------------------

I was implementing a shell expression parser.
It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh.
The standard defines (p. 2479, lines 80382 ff.)


   [.]one field for
  each positional parameter that is set. When the expansion occurs in a context where field
  splitting will be performed, any empty fields may be discarded and each of the non-empty
  fields shall be further split as described in Section 2.6.5.


So in an example
  a() {
          echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
          echo $#,'*'="$*"/$*,
  }
  set -- '' 'a' ''
  for f in ' ' '' : ': ' ' :'; do
          IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
  done


my parser was on par with the mentioned shells except for
  --- .1  2025-03-15 23:38:31.359307576 +0100
  +++ .2  2025-03-15 23:38:32.715974215 +0100
  @@ -6,10 +6,10 @@ a a a$
   3,*=aaa/a a a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
    a  a a$
   3,1= a / a ,2=a/a,3=a/a,4=$
   3,*= a  a a/ a a a,$


After many months I did not give up, and wrote to kre@ and to the bash-bug list:
By the very meaning of this [POSIX words] the fields are split individually,
*first*.  This is exactly what i do.
Hence
    echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
->
    4,1=:a:/ a ,2=a/a,3=/,4=a
becomes
    :a: -> '' + a
    a -> a
    '' -> discarded (but remembered as it separates fields)
    a -> a
becomes, with IFS=:, when actually creating the argument
    :a:a::a
becomes the actual argument
    < a a  a>


Long story short (initial typo corrected):

  +                       /* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
  +                        * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
  +                        * This applies to $@ as well as $* */
  +                       if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
  +                               cp = n_var_vlook(n_star, TRU1);
  +                               goto jfs_split;
  +                       }
  +
  +                       /* In all other cases individually field split the expanded parameters */


I.e., the mentioned shells use the *quoted* variant of $* to perform the expansion in the mentioned case.
This seems to have been the case for multiple decades, if not forever.

lanodan

2025-03-18 11:29

reporter   bugnote:0007123

Last edited: 2025-03-18 11:32

Is the test script output supposed to be consistent across theoretically conforming shells? Because here I got 5 different outputs.

At least AT&T ksh gives the exact same output as busybox sh (ash derivative), dash, and yash.

$ printf ',l\nq\n' | ed test.sh
178
a() {$
\techo \$#,1="\$1"/\$1,2="\$2"/\$2,3="\$3"/\$3,4="\$4"$
\techo \$#,'*'="\$*"/\$*,$
}$
set -- '' 'a' ''$
for f in ' ' '' : ': ' ' :'; do$
\tIFS=\$f ; echo "\$*"\$* \$*; a "\$*"\$* \$*;unset IFS$
done$
$ env -i POSIXLY_CORRECT=1 sh -c 'for i in sh posh ksh mksh lksh loksh dash bash yash "busybox sh"; do qfile -v `command -v ${i% *}` ; $i test.sh >|"test_${i}.txt" 2>&1; done'
app-alternatives/sh-0: /bin/sh -> lksh
app-shells/posh-0.14.1: /bin/posh
app-shells/ksh-1.0.8: /bin/ksh
app-shells/mksh-59c: /bin/mksh
app-shells/mksh-59c: /bin/lksh
app-shells/loksh-7.6: /bin/loksh
app-shells/dash-0.5.12-r1: /bin/dash
app-shells/bash-5.2_p37: /bin/bash
app-shells/yash-2.57: /bin/yash
sys-apps/busybox-1.36.1-r3: /bin/busybox
$ sha1sum test.sh *.txt | sort
031cf59fcbcfc7eede60e35c9ede332bbb962f35  test_posh.txt
7edcc231165d9b3ca2f875501c02e4f8eff73e6b  test_loksh.txt
88acd3756f017d1f74d5bb62cfa9a0f0e72a08ee  test_bash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_busybox sh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_dash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_ksh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_yash.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_lksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_mksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_sh.txt
ce30bb2a9eafa825dcd67be9a60e49529f091166  test.sh
$ diff -u test_posh.txt test_loksh.txt
--- test_posh.txt       2025-03-18 12:16:56.797930657 +0100
+++ test_loksh.txt      2025-03-18 12:16:56.822930552 +0100
@@ -1,9 +1,9 @@
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
-a a   a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a  a /a a   a ,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
 :a: a a
 3,1=:a:/  a ,2=a/a,3=a/a,4=
 3,*=:a::a:a/ a  a a,
$ diff -u test_posh.txt test_bash.txt
--- test_posh.txt       2025-03-18 12:16:56.797930657 +0100
+++ test_bash.txt       2025-03-18 12:16:56.837930489 +0100
@@ -1,15 +1,15 @@
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
-a a   a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a  a /a a   a ,
-:a: a a
-3,1=:a:/  a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a  a a,
-:a: a a
-3,1=:a:/  a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a  a a,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
+:a: a  a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a  a  a,
+:a: a  a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a  a  a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
$ diff -u test_bash.txt test_dash.txt
--- test_bash.txt       2025-03-18 12:16:56.837930489 +0100
+++ test_dash.txt       2025-03-18 12:16:56.830930519 +0100
@@ -4,12 +4,12 @@
 a a a
 3,1=a/a,2=a/a,3=a/a,4=
 3,*=aaa/a a a,
-:a: a  a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a  a  a,
-:a: a  a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a  a  a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
$ diff -u test_dash.txt test_lksh.txt
--- test_dash.txt       2025-03-18 12:16:56.830930519 +0100
+++ test_lksh.txt       2025-03-18 12:16:56.813930590 +0100
@@ -6,10 +6,10 @@
 3,*=aaa/a a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a  a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a  a a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,

geoffclare

2025-03-18 12:11

manager   bugnote:0007124

> Is the test script output supposed to be consistent across theoretically conforming shells?

Two different outputs are expected because of the optional discarding of empty fields (when the expansion occurs in a context where field splitting will be performed). If we disregard posh (which we don't usually pay attention to) and loksh (which I believe is descended from pdksh which had appallingly bad conformance), then you're seeing three behaviours. Unfortunately, ksh88 differs from those three:
$ sha1sum test_ksh88.txt
c615e994262cb2a6a6469a0f967ae8feeaa40966  test_ksh88.txt
$ sed -n l test_ksh88.txt
 a  a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a  a a/ a a a,$
a a   a $
2,1=a a /a a ,2= a / a ,3=/,4=$
2,*=a a  a /a a   a ,$
:a: a  a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a  a  a,$
:a: a  a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a  a  a,$
 a  a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a  a a/ a a a,$

(tested using /usr/xpg4/bin/sh on Solaris 11.4).

geoffclare

2025-04-17 10:00

manager   bugnote:0007138

Here is a much simpler test script which eliminates the unspecified behaviour and shows clearly the issue that Steffen has identified:
set a: a
IFS=:
printf '[%s]\n' $*

I get two different results with this script: ksh93 and dash produce:
[a]
[a]
but ksh88, bash and mksh produce:
[a]
[]
[a]

The standard requires the ksh93/dash behaviour. Shells which first do a quoted expansion and then split it end up splitting "a::a" which produces the extra empty field. This difference happens because IFS characters are terminators not separators (as stated in the RATIONALE on the sh page).
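
The two strategies can be modelled explicitly, so that the result does not depend on how a given shell expands an unquoted $*. This is a sketch under the assumptions of the simple test script (parameters `a:` and `a`, IFS=:); the function names `split_each` and `join_then_split` are hypothetical and not taken from any shell's source.

```shell
#!/bin/sh
# Strategy 1 (what the standard text describes): field split each
# positional parameter on its own and concatenate the resulting fields.
split_each() (
    IFS=:
    set -f                      # no pathname expansion on the fields
    for p in "$@"; do
        for f in $p; do         # unquoted $p is split here
            printf '[%s]\n' "$f"
        done
    done
)

# Strategy 2 (quoted expansion first): join the parameters as "$*" does,
# then field split the joined string once.
join_then_split() (
    IFS=:
    set -f
    joined="$*"                 # "a:" and "a" become "a::a"
    for f in $joined; do
        printf '[%s]\n' "$f"
    done
)

echo "split each parameter:"
split_each a: a
echo "join, then split:"
join_then_split a: a
```

The first call should print `[a]` twice. The second should print `[a]`, `[]`, `[a]`, because the terminator of the first parameter and the joining colon end up adjacent in `a::a`.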

Issue History

Date Modified Username Field Change
2025-03-17 19:17 steffen New Issue
2025-03-18 10:31 geoffclare Note Added: 0007122
2025-03-18 11:29 lanodan Note Added: 0007123
2025-03-18 11:32 lanodan Note Edited: 0007123
2025-03-18 12:11 geoffclare Note Added: 0007124
2025-04-17 10:00 geoffclare Note Added: 0007138