Austin Group Defect Tracker

Aardvark Mark III


ID 0001190
Category [1003.1(2016)/Issue7+TC2] Base Definitions and Headers
Severity Comment
Type Clarification Requested
Date Submitted 2018-04-13 11:16
Last Update 2018-04-19 06:27
Reporter geoffclare
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Geoff Clare
Organization The Open Group
User Reference
Section 9.3.5
Page Number 184
Line Number 6089
Interp Status ---
Final Accepted Text
Summary 0001190: backslash has two special meanings in the shell and only loses one of them in bracket expressions
Description XBD 9.3.5 item 1 says:
The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

In the case of <backslash>, in the shell the character has two
different special meanings and this text does not make clear that it
is only referring to the pattern-matching special meaning of
<backslash> and does not affect its shell-quoting special meaning.
Desired Action On page 184 line 6089 section 9.3.5 RE Bracket Expression, after:
... shall lose their special meaning within a bracket expression.
add a small-font note:
Note: In the context of shell pattern matching, although <backslash> ('\\') loses its special meaning as a pattern matching character in bracket expressions, in situations where shell quoting is performed it is still a shell escape character as described in [xref to XCU 2.2 Quoting]. For example:
$ ls
! $ - a b c
$ echo [a\-c]
- a c
$ echo [\!a]
! a
$ echo ["!\$a-c"]
! $ - a c
$ echo [!"\$a-c"]
! b
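The example can be reproduced in a scratch directory. The following is only a sketch: the function names bracket_demo and show are illustrative, mktemp -d is assumed to be available, and LC_ALL=C is an assumption made so that the expansion order matches the listing above. A shell that follows the proposed note is expected to print the same four result lines as the example; shells with bracket-quoting bugs may differ.

```sh
#!/bin/sh
# Assumption: the C locale, so that pathname expansion sorts as in the example.
LC_ALL=C
export LC_ALL

# Print all arguments on one line without relying on echo's handling of "-".
show() { printf '%s\n' "$*"; }

# Create the six files from the example and expand the four patterns.
bracket_demo() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  : > ./'!'; : > ./'$'; : > ./-; : > a; : > b; : > c
  show [a\-c]       # backslash quotes "-", so no range is formed
  show [\!a]        # backslash quotes "!", so no negation
  show ["!\$a-c"]   # everything inside the double quotes is literal
  show [!"\$a-c"]   # unquoted "!" negates the quoted set
  cd / && rm -rf "$dir"
)

bracket_demo
```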
Tags No tags attached.
- Notes
(0003954)
kre (reporter)
2018-04-13 12:39

Just to restate what I have said on the list, I don't think this is
the correct approach to solve the problem at all.

Much better would be to completely divorce this section (which deals
with regular expressions) from anything related to shell pattern
matching (glob patterns) which are superficially similar, but
really completely different, and should be described independently.

But if it is eventually decided to do it this way for some reason,
the examples should avoid the complication of needing \$ inside the
double quotes by simply using single quotes instead - none of them
are relying on the expansions that can happen in double quoted strings.
(0003955)
geoffclare (manager)
2018-04-13 13:47

The examples demonstrate cases where backslash is a shell-quoting escape character. Inside single quotes it is not, and that is the reason I did not include any examples using single quotes.
(0003957)
stephane (reporter)
2018-04-14 17:17

Geoff, what are those "two special meanings" you're referring to? AFAICT, there's only one: a quoting operator. \ is not a glob operator in shells, it's only for fnmatch() patterns. It's just that quoted characters are not considered as glob operators in shells.

Note that the notion of "special meaning" inside bracket expressions would have to be clarified. For instance, the example seems to imply that by quoting that "-", its "special meaning" as a range operator was removed, but would [\a-c] remove "a"'s special meaning as a range start? What about ["[:alnum:]"], [[:"$class":]], [[=$'\ue9'=]], etc (and there are variations between implementations there).
(0003958)
geoffclare (manager)
2018-04-16 08:31

Special meaning 1 is an escape character in shell quoting, as per 2.2.1 "A <backslash> that is not quoted shall preserve the literal value of the following character, with the exception of a <newline>."

Special meaning 2 is an escape character in pattern matching, as per 2.13.1 "A <backslash> character shall escape the following character. The escaping <backslash> shall be discarded."
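
Special meaning 1 can be seen in isolation when the backslash is written directly in a case pattern, because it is consumed as shell quoting before any pattern matching takes place. A minimal sketch (the function name quoting_demo is illustrative, not from the standard):

```sh
#!/bin/sh
# The backslash in the first case label is shell quoting (XCU 2.2.1):
# it makes the asterisk literal, so only the string "*" matches it.
quoting_demo() {
  case $1 in
    \*) echo "literal asterisk" ;;
    *)  echo "something else" ;;
  esac
}

quoting_demo '*'
quoting_demo 'abc'
```

Special meaning 2 can only be observed when a backslash reaches the pattern matcher without having been consumed as quoting, for example through an unquoted expansion.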
(0003959)
geoffclare (manager)
2018-04-16 09:23
edited on: 2018-04-16 14:50

Another problem with this part of 9.3.5 has been identified in email discussion: it only lists the special characters for BREs. It should have different lists for BREs, EREs and shell patterns. New proposed changes...

On page 184 line 6087 section 9.3.5 RE Bracket Expression, change:
The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
to:
When the bracket expression appears within a BRE, the special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression. When the bracket expression appears within an ERE, the special characters '.', '(', '*', '+', '?', '{', '|', '$', '[', and '\\' (<period>, <left-parenthesis>, <asterisk>, <plus-sign>, <question-mark>, <left-brace>, <vertical-line>, <dollar-sign>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression; <circumflex> ('^') shall lose its special meaning as an anchor. When the bracket expression appears within a shell pattern (see [xref to XCU 2.13]), the special characters '?', '*', and '[' (<question-mark>, <asterisk>, and <left-square-bracket>, respectively) shall lose their special meaning within the bracket expression; <backslash> ('\\') shall lose its special meaning as a pattern matching character (see [xref to XCU 2.13.1]) but, in situations where shell quoting is performed, shall retain its special meaning as a shell quoting character (see [xref to XCU 2.2]). For example:
$ ls
! $ - a b c
$ echo [a\-c]
- a c
$ echo [\!a]
! a
$ echo ["!\$a-c"]
! $ - a c
$ echo [!"\$a-c"]
! b
$ echo [!\]\\]
! $ - a b c
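
The BRE and ERE parts of the proposed text can be illustrated with grep. This is only a sketch; the function name re_demo and the sample strings are made up for illustration. Each pipeline feeds two candidate lines to grep, and only the one containing the literal special character should be selected.

```sh
#!/bin/sh
# Inside a bracket expression the RE special characters are ordinary.
re_demo() {
  # BRE: "." inside brackets matches only a period.
  printf '%s\n' 'a.b' 'axb' | grep 'a[.]b'
  # ERE: "+" inside brackets matches only a plus sign.
  printf '%s\n' 'a+b' 'aab' | grep -E 'a[+]b'
  # BRE: backslash inside brackets matches only a backslash.
  printf '%s\n' 'a\b' 'ab' | grep 'a[\]b'
}

re_demo
```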


(0003960)
stephane (reporter)
2018-04-16 11:06
edited on: 2018-04-16 11:12

That's wrong for shell pattern matching. As I said earlier, backslash is not a pattern matching operator in shell wildcards (not any more than ' or "), it's only a quoting operator and quoting disables wildcard operators.

For instance,

pattern='\*'
case $string in
  $pattern) echo something;;
esac


This matches any string that starts with a backslash, not a literal star, at least in Bourne/ksh/ash/pdksh; bash and zsh instead match on a literal *.
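
A small probe makes it easy to check which reading a given shell implements. This is a sketch only: the function name probe_backslash_star and the test strings are illustrative, and no particular result is claimed for any shell.

```sh
#!/bin/sh
# Report how this shell treats a backslash that reaches the case
# pattern through an unquoted expansion.
probe_backslash_star() {
  pattern='\*'
  star=no
  prefix=no
  # Does the pattern match a literal asterisk (backslash as escape)?
  case '*' in $pattern) star=yes ;; esac
  # Does it match a string starting with a backslash (backslash literal)?
  case '\anything' in $pattern) prefix=yes ;; esac
  echo "matches literal star: $star; matches backslash-prefixed string: $prefix"
}

probe_backslash_star
```

Run it as, for example, sh ./probe or bash ./probe to compare implementations.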

That's different for

find . -name '\*'


which matches files called literally "*" (same as -name '[*]'), because in fnmatch() backslash serves as a substitute for shell quoting.
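
The find case can be checked in a scratch directory. A sketch, assuming mktemp -d is available; the function name find_demo and the extra file names are illustrative. Both find commands are expected to report only the file named "*".

```sh
#!/bin/sh
# Create a file named "*" plus two decoys, then search with both patterns.
find_demo() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  : > ./'*'; : > ./star; : > ./'\x'
  find . -name '\*'    # backslash escapes the asterisk for fnmatch()
  find . -name '[*]'   # bracket expression containing only an asterisk
  cd / && rm -rf "$dir"
)

find_demo
```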

Also note that $ loses its special meaning inside [...] in ERE (in BRE, it loses it already by not being the last character of the RE).

(0003961)
stephane (reporter)
2018-04-16 11:21
edited on: 2018-04-16 11:27

Note that:

pattern='[\]*'
case $string in
  $pattern) echo something;;
esac


matches on []anything in bash and dash, gives an error in zsh, matches on \anything in Bourne/ksh88/yash/mksh/busybox-sh (as I'd expect), only on [\]* (that I can tell) in ksh93.

So it does look like a jolly mess.

(0003962)
stephane (reporter)
2018-04-16 11:45
edited on: 2018-04-16 12:22

More fun:

$ ksh -c 'case "[\\]" in [\\]) echo yes;; esac'
yes
$ ksh -c 'case "\\" in [\\]) echo yes;; esac'
yes


(both ksh88 and ksh93, also in Bourne). Same for [\\\\] or [\\\\\\\\].

It looks as if those shells resort to string equality comparison when the patterns don't match (case [a] in [a]) echo yes;; esac also matches). AFAICT, that is not allowed by POSIX.
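
The fallback is easy to test for. A sketch (the function name probe_literal_fallback is illustrative); under plain pattern matching the bracket expression [a] matches only the one-character string "a", so a match on the three-character string "[a]" indicates the string-comparison fallback.

```sh
#!/bin/sh
# Does this shell fall back to string equality when the pattern fails?
probe_literal_fallback() {
  case '[a]' in
    [a]) echo "pattern [a] matched the string [a] (literal fallback)" ;;
    *)   echo "pattern [a] did not match the string [a]" ;;
  esac
}

probe_literal_fallback
```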

It does look like a historical "feature", as ksh doesn't do it for its other pattern matching operator, [[ $string = $pattern ]].

What that means is that it looks like it's impossible to have a variable contain a pattern meant to match a string that starts with backslash.

- pattern='\*' doesn't work in bash/zsh
- pattern='[\\]*' doesn't work reliably in ksh88/ksh93.

(0003963)
geoffclare (manager)
2018-04-16 14:41

Re: Note: 0003960, XCU 2.13.1 clearly defines a pattern-matching rule, distinct from the usual quoting rule, for backslash in the shell (in the first paragraph - it specifies it separately in the last paragraph for non-shell pattern matchers). It appears that bash and zsh are implementing the standard as written but the other shells you tested are not.

When testing this stuff note that ksh93 is known to behave incorrectly as regards quoting inside bracket expressions - that was the reason this whole discussion started in the first place. ksh88 also has some weird bugs, such as ["a\-c"] matching 'a', backslash and 'c' but not '-'.

Re: Note: 0003962, your final observation would seem to be a reason to keep the standard's requirements in 2.13.1 as-is, so that pattern='\\*' can be used for this, which works in bash and presumably zsh.
(0003964)
geoffclare (manager)
2018-04-16 14:52

I have edited Note: 0003959 to add '$' and '^' in the part about ERE special characters.
(0003965)
stephane (reporter)
2018-04-16 16:27
edited on: 2018-04-16 16:39

Re: Note: 0003963

Hmmm. Looks like pattern='\\*' is yet another different case with different differences between shells.

In Bourne/ksh88/mksh/yash/FreeBSD-sh, it matches on \\anything (as I'd expect); with dash, ksh93, bash and zsh, it matches on \anything instead.

For busybox-sh, I see two different behaviours with 2 different versions.

Note that as per your proposed text, if I understand correctly, bash, dash and zsh would not be compliant: with pattern='[\]*', they match on []anything instead of \anything, and with pattern='[\-^]', they match on - and not \. That is, backslash did not lose its special meaning as a wildcard operator.

(0003966)
joerg (reporter)
2018-04-16 16:57

Could you please present a reproducible script to verify your claims, and could you please mention which ksh93 version you are testing?
(0003967)
stephane (reporter)
2018-04-16 20:11
edited on: 2018-04-16 20:17

Re: Note: 0003966

That was ksh93u+ on Ubuntu 16.04 amd64.

Try for instance
#! /usr/bin/env bash
export PATTERN STRING
set -o noglob
while read -r PATTERN strings; do
  printf '\n%s\n' "$PATTERN"
  for shell do
    printf ' %12s[1]:' "$shell"
    for STRING in $strings; do
      (exec -a sh "$shell" -c '
        case $STRING in $PATTERN) ;; *) exit 1; esac') &&
        printf ' %s' "$STRING"
    done
    printf '\n %12s[2]:' "$shell"
    for STRING in $strings; do
      (exec -a sh "$shell" -c "
        case \$STRING in $PATTERN) ;; *) exit 1; esac") &&
        printf ' %s' "$STRING"
    done
    echo
  done
done << 'EOF'
[\]* \anything []anything [\]anything [\]* [\\]* *
[\\]* \anything []anything [\]anything [\]* [\\]* *
\*   \anything anything \*  *
\\*  \anything \\anything \\* \* *
EOF


To run as
that-script dash bash ksh93 mksh posh yash zsh busybox schily-sh


(0003968)
joerg (reporter)
2018-04-17 10:21
edited on: 2018-04-17 10:26

The Bourne Shell, ksh88 and bosh give this result:

[\]*
           sh[1]: \anything [\]*
           sh[2]:

[\\]*
           sh[1]: \anything [\\]*
           sh[2]: \anything [\]*

\*
           sh[1]: \anything \*
           sh[2]: *

\\*
           sh[1]: \\anything \\*
           sh[2]: \anything \\anything \\* \*

It has been manually verified for correctness.

(0003969)
joerg (reporter)
2018-04-17 10:34

Let me add another script:

--->
if [ "$BASH_VERSION" != "" ]; then
        echo() { command echo -e "$@"; }
fi

chk() { echo [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]; }

mkdir td && cd td || exit

printf '%s\n' '---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]'

echo ": \c"; chk

:> a; echo "a: \c"; chk; rm a

:> b; echo "b: \c"; chk; rm b

:> ./-; echo "-: \c"; chk; rm ./-

:> c; echo "c: \c"; chk; rm c

:> _; echo "_: \c"; chk; rm _

:> \\; echo "\\: \c"; chk; rm \\

:> d; echo "d: \c"; chk; rm d

rm -f *
cd ..
rmdir td
<---

Call: $shell ./test-script

Expected result:

---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
: [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]
(0003972)
shware_systems (reporter)
2018-04-19 06:27

The above discussion glosses over the fact that the current requirements, as I read them, define two contexts for evaluating patterns: before and after quote removal has been performed. Usage before removal can occur, it appears, as a glob expansion when evaluating ASSIGNMENT_WORDs, cmd_suffix WORDs and the operand to the "in" reserved word; usage after removal applies to case labels, and is limited to clauses XCU 2.13.1 and 2.13.2 as the file system is not being implicitly accessed. I believe each case has to be considered separately on how '\' is treated, as an escape or as an ordinary character, and this may entail additions to the grammar to make the distinctions fully normative.

- Issue History
Date Modified Username Field Change
2018-04-13 11:16 geoffclare New Issue
2018-04-13 11:16 geoffclare Name => Geoff Clare
2018-04-13 11:16 geoffclare Organization => The Open Group
2018-04-13 11:16 geoffclare Section => 9.3.5
2018-04-13 11:16 geoffclare Page Number => 184
2018-04-13 11:16 geoffclare Line Number => 6089
2018-04-13 11:16 geoffclare Interp Status => ---
2018-04-13 11:18 geoffclare Desired Action Updated
2018-04-13 12:39 kre Note Added: 0003954
2018-04-13 13:47 geoffclare Note Added: 0003955
2018-04-14 17:17 stephane Note Added: 0003957
2018-04-16 08:31 geoffclare Note Added: 0003958
2018-04-16 09:23 geoffclare Note Added: 0003959
2018-04-16 09:25 geoffclare Note Edited: 0003959
2018-04-16 11:06 stephane Note Added: 0003960
2018-04-16 11:10 stephane Note Edited: 0003960
2018-04-16 11:12 stephane Note Edited: 0003960
2018-04-16 11:21 stephane Note Added: 0003961
2018-04-16 11:22 stephane Note Edited: 0003961
2018-04-16 11:27 stephane Note Edited: 0003961
2018-04-16 11:45 stephane Note Added: 0003962
2018-04-16 12:18 stephane Note Edited: 0003962
2018-04-16 12:22 stephane Note Edited: 0003962
2018-04-16 14:41 geoffclare Note Added: 0003963
2018-04-16 14:50 geoffclare Note Edited: 0003959
2018-04-16 14:52 geoffclare Note Added: 0003964
2018-04-16 16:27 stephane Note Added: 0003965
2018-04-16 16:39 stephane Note Edited: 0003965
2018-04-16 16:57 joerg Note Added: 0003966
2018-04-16 20:11 stephane Note Added: 0003967
2018-04-16 20:13 stephane Note Edited: 0003967
2018-04-16 20:17 stephane Note Edited: 0003967
2018-04-17 10:21 joerg Note Added: 0003968
2018-04-17 10:26 joerg Note Edited: 0003968
2018-04-17 10:34 joerg Note Added: 0003969
2018-04-19 06:27 shware_systems Note Added: 0003972

