View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001915 | 1003.1(2016/18)/Issue7+TC2 | Shell and Utilities | public | 2025-03-17 19:17 | 2025-04-17 10:00 |
Reporter | steffen | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | New | Resolution | Open | ||
Name | steffen | ||||
Organization | |||||
User Reference | |||||
Section | 2.5.2 | ||||
Page Number | 2479 | ||||
Line Number | 80382 | ||||
Interp Status | |||||
Final Accepted Text | |||||
Summary | 0001915: clarification of 2.6.5 field splitting of 2.5.2 special parameter $* | ||||
Description | I was implementing a shell expression parser. It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh. The standard defines (p. 2479, lines 80382 ff.)
So in an example
<code>
a() {
	echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
	echo $#,'*'="$*"/$*,
}
set -- '' 'a' ''
for f in ' ' '' : ': ' ' :'; do
	IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
done
</code>
my parser was on par with the mentioned shells except for
<code>
--- .1 2025-03-15 23:38:31.359307576 +0100
+++ .2 2025-03-15 23:38:32.715974215 +0100
@@ -6,10 +6,10 @@
 a a a$
 3,*=aaa/a a a,$
 :a: a a$
 4,1=:a:/ a ,2=a/a,3=/,4=a$
-4,*=:a::a::a/ a a a,$
+4,*=:a::a::a/ a a a,$
 :a: a a$
 4,1=:a:/ a ,2=a/a,3=/,4=a$
-4,*=:a::a::a/ a a a,$
+4,*=:a::a::a/ a a a,$
 a a a$
 3,1= a / a ,2=a/a,3=a/a,4=$
 3,*= a a a/ a a a,$
</code>
After many months I did not give up and wrote to kre@ and on the bash-bug list:
<code>
By the very meaning of this [POSIX words] the fields are split individually, *first*. This is exactly what i do. Hence

echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
->
4,1=:a:/ a ,2=a/a,3=/,4=a

becomes

:a: -> '' + a
a -> a
'' -> discarded (but remembered as it separates fields)
a -> a

becomes, with IFS=:, when actually creating the argument

:a:a::a

becomes the actual argument

< a a a>
</code>
Long story short (initial typo corrected):
<code>
+   /* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
+    * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
+    * This applies to $@ as well as $* */
+   if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
+      cp = n_var_vlook(n_star, TRU1);
+      goto jfs_split;
+   }
+
+   /* In all other cases individually field split the expanded parameters */
</code>
I.e., the mentioned shells use the *quoted* variant of $* to perform the expansion in the mentioned case. This seems to have been the case for multiple decades, if not forever. | ||||
Desired Action | Please clarify whether POSIX *really* means what it says in *all* cases, or whether the text is an omission that dates from taking over existing behavior into the first version of the standard. Alternatively, clarify whether the above special case for a non-(IFS-)whitespace byte in IFS[0] is a regular, desired implementation detail. (Maybe reorder the Mantis layout so that section etc. are at the top again?) | ||||
Tags | No tags attached. |
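As a reading aid for the test script in the Description, the following minimal sketch shows only the quoted half of the comparison: how "$*" joins the three parameters for each IFS value used by the script. The variable names `results` and `joined` are made up for this illustration and are not part of the report. Quoted "$*" joins on the first character of IFS (or on nothing when IFS is null), which is the part the shells in this report agree on.

```shell
#!/bin/sh
# Illustration only: `results` and `joined` are hypothetical names.
set -- '' 'a' ''
results=
for f in ' ' '' : ': ' ' :'; do
	IFS=$f
	joined="$*"    # quoted: parameters joined by the first IFS character
	unset IFS
	results="$results<$joined>"
	printf 'IFS=<%s> "$*"=<%s>\n' "$f" "$joined"
done
```

The joined values are ` a `, `a`, `:a:`, `:a:` and ` a `, which line up with the `1=` values in the outputs quoted in this issue (whitespace aside). The disagreement between shells concerns only the unquoted $* that follows.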
|
Is the test script output supposed to be consistent across theoretically conforming shells? Because here I got 5 different outputs. At least AT&T ksh gives the exact same output as busybox sh (ash derivative), dash, and yash.

$ printf ',l\nq\n' | ed test.sh
178
a() {$
\techo \$#,1="\$1"/\$1,2="\$2"/\$2,3="\$3"/\$3,4="\$4"$
\techo \$#,'*'="\$*"/\$*,$
}$
set -- '' 'a' ''$
for f in ' ' '' : ': ' ' :'; do$
\tIFS=\$f ; echo "\$*"\$* \$*; a "\$*"\$* \$*;unset IFS$
done$
$ env -i POSIXLY_CORRECT=1 sh -c 'for i in sh posh ksh mksh lksh loksh dash bash yash "busybox sh"; do qfile -v `command -v ${i% *}` ; $i test.sh >|"test_${i}.txt" 2>&1; done'
app-alternatives/sh-0: /bin/sh -> lksh
app-shells/posh-0.14.1: /bin/posh
app-shells/ksh-1.0.8: /bin/ksh
app-shells/mksh-59c: /bin/mksh
app-shells/mksh-59c: /bin/lksh
app-shells/loksh-7.6: /bin/loksh
app-shells/dash-0.5.12-r1: /bin/dash
app-shells/bash-5.2_p37: /bin/bash
app-shells/yash-2.57: /bin/yash
sys-apps/busybox-1.36.1-r3: /bin/busybox
$ sha1sum test.sh *.txt | sort
031cf59fcbcfc7eede60e35c9ede332bbb962f35 test_posh.txt
7edcc231165d9b3ca2f875501c02e4f8eff73e6b test_loksh.txt
88acd3756f017d1f74d5bb62cfa9a0f0e72a08ee test_bash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb test_busybox sh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb test_dash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb test_ksh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb test_yash.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592 test_lksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592 test_mksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592 test_sh.txt
ce30bb2a9eafa825dcd67be9a60e49529f091166 test.sh
$ diff -u test_posh.txt test_loksh.txt
--- test_posh.txt 2025-03-18 12:16:56.797930657 +0100
+++ test_loksh.txt 2025-03-18 12:16:56.822930552 +0100
@@ -1,9 +1,9 @@
 a a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a a a/ a a a,
-a a a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a a /a a a ,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
 3,*=:a::a:a/ a a a,
$ diff -u test_posh.txt test_bash.txt
--- test_posh.txt 2025-03-18 12:16:56.797930657 +0100
+++ test_bash.txt 2025-03-18 12:16:56.837930489 +0100
@@ -1,15 +1,15 @@
 a a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a a a/ a a a,
-a a a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a a /a a a ,
-:a: a a
-3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
-:a: a a
-3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
+:a: a a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a a a,
+:a: a a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a a a,
 a a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a a a/ a a a,
$ diff -u test_bash.txt test_dash.txt
--- test_bash.txt 2025-03-18 12:16:56.837930489 +0100
+++ test_dash.txt 2025-03-18 12:16:56.830930519 +0100
@@ -4,12 +4,12 @@
 a a a
 3,1=a/a,2=a/a,3=a/a,4=
 3,*=aaa/a a a,
-:a: a a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a a a,
-:a: a a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a a a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
 a a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a a a/ a a a,
$ diff -u test_dash.txt test_lksh.txt
--- test_dash.txt 2025-03-18 12:16:56.830930519 +0100
+++ test_lksh.txt 2025-03-18 12:16:56.813930590 +0100
@@ -6,10 +6,10 @@
 3,*=aaa/a a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a a a,
 a a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a a a/ a a a,
|
|
> Is the test script output supposed to be consistent across theoretically conforming shells?

Two different outputs are expected because of the optional discarding of empty fields (when the expansion occurs in a context where field splitting will be performed). If we disregard posh (which we don't usually pay attention to) and loksh (which I believe is descended from pdksh which had appallingly bad conformance), then you're seeing three behaviours. Unfortunately, ksh88 differs from those three:

$ sha1sum test_ksh88.txt
c615e994262cb2a6a6469a0f967ae8feeaa40966 test_ksh88.txt
$ sed -n l test_ksh88.txt
a a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a a a/ a a a,$
a a a $
2,1=a a /a a ,2= a / a ,3=/,4=$
2,*=a a a /a a a ,$
:a: a a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a a a,$
:a: a a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a a a,$
a a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a a a/ a a a,$

(tested using /usr/xpg4/bin/sh on Solaris 11.4). |
|
Here is a much simpler test script which eliminates the unspecified behaviour and shows clearly the issue that Steffen has identified:

set a: a
IFS=:
printf '[%s]\n' $*

I get two different results with this script: ksh93 and dash produce:

[a]
[a]

but ksh88, bash and mksh produce:

[a]
[]
[a]

The standard requires the ksh93/dash behaviour. Shells which first do a quoted expansion and then split it end up splitting "a::a" which produces the extra empty field. This difference happens because IFS characters are terminators not separators (as stated in the RATIONALE on the sh page). |
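To see the two strategies side by side without relying on how a given shell treats unquoted $*, they can be emulated with ordinary variable expansions, whose splitting is not in dispute. This is only a sketch: the helper names `split_each` and `split_joined` and the variables `out_each` and `out_joined` are invented for the illustration, and it does not claim to reflect how any of the shells mentioned here is implemented.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from any shell's source code.
set -f    # disable pathname expansion so only field splitting is observed

# Strategy A: field split each positional parameter on its own.
split_each() {
	for p in "$@"; do
		for field in $p; do          # unquoted: split using the current IFS
			printf '[%s]' "$field"
		done
	done
	printf '\n'
}

# Strategy B: join the parameters as "$*" does, then split the joined string.
split_joined() {
	joined="$*"                      # with IFS=: and parameters a: a this is a::a
	for field in $joined; do
		printf '[%s]' "$field"
	done
	printf '\n'
}

IFS=:
out_each=$(split_each a: a)
out_joined=$(split_joined a: a)
unset IFS

printf 'split each parameter: %s\n' "$out_each"
printf 'join, then split:     %s\n' "$out_joined"
```

Run with the parameters from the script above, strategy A yields `[a][a]`, because the trailing `:` only terminates the field `a`, while strategy B yields `[a][][a]`, because the second `:` in `a::a` terminates an empty field.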
Date Modified | Username | Field | Change |
---|---|---|---|
2025-03-17 19:17 | steffen | New Issue | |
2025-03-18 10:31 | geoffclare | Note Added: 0007122 | |
2025-03-18 11:29 | lanodan | Note Added: 0007123 | |
2025-03-18 11:32 | lanodan | Note Edited: 0007123 | |
2025-03-18 12:11 | geoffclare | Note Added: 0007124 | |
2025-04-17 10:00 | geoffclare | Note Added: 0007138 |