0000375: Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef

ID	Project	Category	View Status	Date Submitted	Last Update

0000375	1003.1(2008)/Issue 7	Shell and Utilities	public	2011-02-07 18:34	2024-06-11 08:53

Reporter	dwheeler	Assigned To	ajosey
Priority	normal	Severity	Objection	Type	Enhancement Request
Status	Closed	Resolution	Accepted As Marked

Name	David A. Wheeler
Organization
User Reference
Section	test
Page Number	3224-3225
Line Number	107503-107513
Interp Status	---
Final Accepted Text	See 0000375:0006040


Summary	0000375: Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef
Description	Many implementations of "test" (aka "["), including shell built-ins, implement conditionals beyond those specified in the current version of POSIX. What's more, many extant programs rely on these extensions. I recommend formally adding these widely-implemented extensions to the POSIX specification itself, as these extensions have become widespread and are ready to be standardized. Specifically, add:

* "s1 == s2" as a synonym for "s1 = s2". Since "=" is used for assignment in shell scripts, the spelling "==" is a clearer, less ambiguous way to state that this is a test and not an assignment. This is already implemented in bash, ksh, and busybox sh.

* "s1 < s2" and "s1 > s2" for lexicographic comparison. Users must quote "<" and ">", but even so, it's a useful operation. Alternatives tend to be ugly, e.g.:
    awk -v v1="1" -v v2="fcd" 'BEGIN{exit !(v1 "" < "" v2)}'
  This is implemented in bash, dash, busybox sh, and ksh.

* "pathname1 -nt pathname2" and "pathname1 -ot pathname2": newer than/older than, comparing modification times. You can work around this by using expressions such as [ "$(find 'file1' -prune -newer 'file2')" ], but this is ugly compared to a straightforward test. Determining whether something should be done, based on whether or not one file is newer than another, is a common operation, and thus it makes sense to include. Implemented in ksh, bash, and busybox sh. The semantics when the files don't exist appear to be the same between bash and ksh, so I've included those semantics in the "desired action" below.

* "pathname1 -ef pathname2": true if pathname1 and pathname2 are hard linked (refer to the same device and inode numbers). Implemented in bash, ksh, and busybox sh.

Some of these are identified as "bashisms" in places like http://mywiki.wooledge.org/Bashism, but in fact they are widely implemented and/or depended upon. But because they're not in the POSIX specification, they can't be universally depended upon, and I think that needs to change.
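The awk alternative mentioned above can be wrapped in a small helper until "<" and ">" are standardized. This is a minimal sketch, not part of the proposal: the function names str_lt and str_gt are hypothetical, the strings are passed through the environment and read with ENVIRON (rather than awk -v, which would interpret backslash escapes), and whether awk orders strings by byte value or by the locale's collation may differ between awk implementations.

```sh
#!/bin/sh
# str_lt S1 S2 (hypothetical helper): succeed if S1 sorts before S2,
# approximating the proposed [ "$S1" "<" "$S2" ].
# Concatenating "" forces awk to compare the values as strings, not numbers.
# Assumption: byte order vs. locale collation depends on the awk in use.
str_lt() {
  STR_LT_A=$1 STR_LT_B=$2 awk 'BEGIN {
    exit !(ENVIRON["STR_LT_A"] "" < ENVIRON["STR_LT_B"] "")
  }'
}

# str_gt S1 S2 (hypothetical helper): succeed if S1 sorts after S2.
str_gt() {
  str_lt "$2" "$1"
}

# Usage: the same comparison as the awk one-liner above.
if str_lt 1 fcd; then
  echo '"1" sorts before "fcd"'
fi
```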
Desired Action	Add to the list of primaries for "test" (circa page 3224):
* s1 == s2
  True if the strings s1 and s2 are identical; otherwise, false. This primary is equivalent to s1 = s2.
* s1 < s2
  True if the string s1 is lexicographically less than s2; otherwise, false.
* s1 > s2
  True if the string s1 is lexicographically greater than s2; otherwise, false.
* pathname1 -nt pathname2
  True if pathname1 exists and pathname2 does not, or if pathname1 is newer than pathname2 according to their modification times; otherwise, false. If pathname1 does not exist, the result is false.
* pathname1 -ot pathname2
  True if pathname2 exists and pathname1 does not, or if pathname1 is older than pathname2 according to their modification times; otherwise, false. If pathname2 does not exist, the result is false.
* pathname1 -ef pathname2
  True if pathname1 and pathname2 are hard linked, that is, refer to the same file. See section 3.191 {the section number defining "hard link".}
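Until -ef is universally available, one partial workaround for the common case of comparing two directories (such as a source directory and a build directory) is to compare their physical working-directory paths. The sketch below is an approximation under stated assumptions, not the proposed -ef semantics: the name same_dir is hypothetical, it handles directories only, and it compares resolved pathnames rather than device and inode numbers, so two paths that reach the same directory through different mount points (for example, a bind mount) would be reported as different.

```sh
#!/bin/sh
# same_dir DIR1 DIR2 (hypothetical helper): succeed if DIR1 and DIR2 resolve
# to the same physical directory, approximating "test DIR1 -ef DIR2" for
# directories only. The body is a subshell, so its variables stay local.
same_dir() (
  # cd -P resolves symbolic links; pwd -P prints the resulting physical path.
  # A directory that does not exist (or cannot be entered) makes this fail.
  d1=$(cd -P -- "$1" 2>/dev/null && pwd -P) || exit 1
  d2=$(cd -P -- "$2" 2>/dev/null && pwd -P) || exit 1
  [ "$d1" = "$d2" ]
)

# Usage (directory variables are hypothetical):
# if ! same_dir "$builddir" "$srcdir"; then echo "separate build directory"; fi
```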
Tags	issue8
Attached Files	Extendingshellconditionals.pdf (324,995 bytes)
Extendingshellconditionals.txt (23,138 bytes)

Extending shell conditionals (DRAFT) 2011-11-15

This white paper proposes various extensions to the existing POSIX shell conditional statements. It supports Austin group defect report 375 (http://austingroupbugs.net/view.php?id=375), in particular reply 967. This is a draft. As it is spread more widely, it is expected to change. The intent is to provide a single location where the issues can be discussed in an organized fashion.

To the extent possible under law, the contributors to this document have waived all copyright and released it to the copyright public domain, under the terms of the Creative Commons CC0 waiver: http://creativecommons.org/choose/zero/waiver This way, any of its material can be used by the standards developers (or others) in any way they desire.

Table of contents:
  Overview
    Issue
    Background
    Requirements
    Importance
  Proposed changes to the POSIX specification
    Add “==” to test
    Add “-nt” (newer-than) and “-ot” (older-than) to test
    Add “-ef” to test
    Add “[[”
  Rationale
    Add “==” to test
    Add “-nt” (newer-than) and “-ot” (older-than) to test
    Add “-ef” to test
    Add “[[”
  Alternative proposals
    Adding “<” and “>” to test
  Appendix A: Interpretation of -nt and -ot

Overview

Issue

Many implementations of "test" (aka "["), including shell built-ins, implement conditionals beyond those specified in the current version of POSIX. What's more, many extant programs rely on these extensions. This proposal recommends formally adding these widely-implemented extensions to the POSIX specification itself, as these extensions have become widespread and are ready to be standardized. Each of these additions is described separately, since they can be treated separately.

Background

Austin group defect report 375 (http://austingroupbugs.net/view.php?id=375) (“Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef”) proposed extensions to POSIX "test".
It proposed adding certain widely-implemented and widely-used extensions of "test" to the POSIX standard. This defect report was discussed at the September 8, 2011 teleconference meeting, and it was agreed that the submitter should “produce a whitepaper expanding the proposal (similar to proposals made in the past, for example the LFS proposal, http://www.unix.org/version2/whatsnew/lfs20mar.html (Adding Support for Arbitrary File Sizes to the Single UNIX Specification). This could then be widely circulated amongst all interested parties to look for consensus. The standard developers recommend that the white paper should pay particular attention to note 670.”

This document is the whitepaper requested by the Austin group. It attempts to expand the proposal so that it can be "widely circulated amongst all interested parties to look for consensus". It also attempts to respond to the standards developers' recommendations, in particular, to pay "particular attention to note 670."

Requirements

These extensions are proposed only because they meet the following requirements:
1. They are already implemented in at least one implementation.
2. They are used in existing programs/scripts.
3. They are easily implemented.

Importance

All of these proposals can be implemented in other ways, but their omission from POSIX can render otherwise-compatible scripts non-conforming. Some of these extensions are identified as "bashisms" in pages such as http://mywiki.wooledge.org/Bashism, but in fact they are widely implemented and/or depended upon, whether or not bash is used. Their widespread use and implementation suggest that they are ready to be added to POSIX itself.

Proposed changes to the POSIX specification

Add “==” to test

In the text of test, circa page 3224, add the following primary definition:
1. s1 == s2
   True if the strings s1 and s2 are identical; otherwise, false. This primary is equivalent to s1 = s2.
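Until “==” is standardized, a script that wants to know whether the running shell's “[” accepts it can simply try it. This is a minimal sketch under the assumption that an implementation without “==” reports a usage error (dash, for example, prints "unexpected operator" and exits with status 2) rather than silently succeeding; the function name is hypothetical. Portable scripts can always use “=” instead.

```sh
#!/bin/sh
# supports_double_equals (hypothetical helper): succeed if this shell's "["
# accepts "==" as a string comparison. "==" is quoted so that it reaches "["
# as a plain argument; error messages from implementations that reject it
# are discarded. Both an equal and an unequal pair are tried, so an
# implementation that returned true for anything would not be misdetected.
supports_double_equals() {
  if [ x "==" x ] 2>/dev/null && ! [ x "==" y ] 2>/dev/null; then
    return 0
  else
    return 1   # also normalizes error statuses such as 2 to plain "false"
  fi
}

if supports_double_equals; then
  echo 'this [ accepts =='
else
  echo 'this [ rejects ==; use = instead'
fi
```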
Add “-nt” (newer-than) and “-ot” (older-than) to test

In the text of test, circa page 3224, add the following primary definitions:
1. pathname1 -nt pathname2
   True if pathname1 exists and pathname2 does not, or if both exist and pathname1 is newer than pathname2 according to their modification times; otherwise, false.
2. pathname1 -ot pathname2
   True if pathname2 exists and pathname1 does not, or if both exist and pathname1 is older than pathname2 according to their modification times; otherwise, false.

Add “-ef” to test

In the text of test, circa page 3224, add the following primary definition:
1. pathname1 -ef pathname2
   True if pathname1 and pathname2 name the same file; otherwise, false.

Add “[[”

In XCU section 2.4, line 72478, add “[[” and “]]” to the list of reserved words. On line 72491, remove “[[” and “]]” from the reserved word list.

In section 2.6, after line 72671, add: “Expressions directly enclosed by [[ and ]] do not perform field splitting or pathname expansion; see the section on double bracket expressions for more information.”

In section 2.9.4.1 (“Grouping Commands” inside “Compound Commands”), page 2321, add the following grouping command:

[[ expression ]]
   Execute the expression in the current process environment, and return a status of 0 or 1. The expression is evaluated using the rules of DOUBLE BRACKET EXPRESSIONS.

Later in the text, add the following:

DOUBLE BRACKET EXPRESSIONS

Inside [[ … ]], expressions are evaluated to return a status of 0 or 1. Field splitting and pathname expansion are not performed on the words directly enclosed between the [[ and ]], but other expansions are performed: tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal, as described in section 2.6 (word expansions). Conditional operators must be unquoted to be recognized; if they are quoted, they are not considered conditional operators inside double bracket expressions.
All of the conditional operators of test are available, with the following exceptions and additions:
1. There must be at least two parameters. The single-parameter form (a shorthand for determining whether a string is non-empty) is not accepted; use “-n” instead.
2. Support for the -a (and) and -o (or) operators is not required. They are replaced with && and ||, as described below.
3. “string == pattern”. Returns true if string matches the pattern, as described in “Pattern Matching Notation” (section 2.13); otherwise, returns false.
4. “string != pattern”. Returns the logical negation of “string == pattern”.
5. “string = pattern”. Not defined in this specification; implementations are free to implement it the same as “==”.
6. “string =~ regex”. Returns true (0) if string matches the extended regular expression regex; otherwise, returns false (1).
7. “string1 < string2”. Compares two strings lexicographically, using the current locale settings, and returns true if string1 is less than string2.
8. “string1 > string2”. Compares two strings lexicographically, using the current locale settings, and returns true if string1 is greater than string2.

Expressions may be combined as follows (in decreasing order of precedence):
1. “( e )”. Returns the value of expression e.
2. “! e”. Returns true (0) if expression e is false; otherwise, returns false (1).
3. “e1 && e2”. Returns true if both e1 and e2 are true; otherwise, returns false. This is a short-circuit evaluation; if e1 is false, e2 is not evaluated.
4. “e1 || e2”. Returns true if either e1 or e2 is true; otherwise, returns false. This is a short-circuit evaluation; if e1 is true, e2 is not evaluated.
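Where “[[” is unavailable, the pattern-matching operators described above can be approximated with standard constructs. This minimal sketch uses “case” for “==” (including the rule that quoting makes metacharacters literal) and awk's “~” operator, which also uses extended regular expressions and likewise matches anywhere in the string, for “=~”. The function names are hypothetical, and the awk-based helper is only an approximation: awk implementations may differ in how they treat backslash escapes in a regular expression supplied as a string.

```sh
#!/bin/sh
# glob_match STRING PATTERN (hypothetical): like [[ STRING == $PATTERN ]].
# The unquoted $2 in the case pattern lets *, ? and [...] act as
# pattern metacharacters.
glob_match() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# glob_match_literal STRING TEXT (hypothetical): like [[ STRING == "$TEXT" ]].
# Quoting "$2" makes every character match only itself.
glob_match_literal() {
  case $1 in
    "$2") return 0 ;;
  esac
  return 1
}

# ere_match STRING ERE (hypothetical): like [[ STRING =~ $ERE ]]; succeeds if
# any part of STRING matches the extended regular expression ERE.
ere_match() {
  ERE_MATCH_S=$1 ERE_MATCH_R=$2 awk 'BEGIN {
    exit !(ENVIRON["ERE_MATCH_S"] ~ ENVIRON["ERE_MATCH_R"])
  }'
}
```

For example, glob_match main.c '*.c' succeeds, whereas glob_match_literal main.c '*.c' fails, because the quoted pattern matches only the literal text *.c.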
In the grammar of section 2.10.2, after line 73492, add the following:

After line 73505 (CLOBBER) and its comment, add:

    %token  DLBRACKET  DRBRACKET
    /*      '[['       ']]'      */

After line 73545 (“| subshell”), add:

    | double_bracket_expression

After line 73553 (definition of subshell), add:

    double_bracket_expression : DLBRACKET inner_db_expression DRBRACKET
                              ;

Rationale

Add “==” to test

This proposed change adds primary "s1 == s2" as a synonym for "s1 = s2". There are three major reasons for adding “==”:
1. Primary “==” is more visually distinct from assignment (“=”). Since “=” is also used for assignment in shell scripts, using “==” for “is equal to” makes the comparison visually distinctive, making it clearer to readers that a comparison is intended.
2. Allowing “==” for “is equal to” adds consistency with other programming languages that use “=” for assignment. Most languages that use “=” for assignment also use “==” for equality so that these operations are visually distinct. These include C, C++, Java, C#, Python, and Perl. It is oddly inconsistent that test/[ does not support “==” as well. Many languages (like Pascal) that use "=" for comparison use another spelling (like ":=") for assignment, again to keep their spellings separate. Some of these languages could always disambiguate the two from context, yet they intentionally do not use the same spelling. It's too late to get rid of "=" for comparison, but it's easy to add "==" as a synonym, which is what is proposed.
3. Primary “==” is already widely implemented and is used in many shell scripts. This suggests that there is value in standardizing it.

A counter-argument to adding “==” is that it is redundant with “=”. This is true, but there are many other redundancies in POSIX. For example, “[” is redundant with “test”, but this is not considered a problem. In any case, it is a redundancy that many consider valuable; “=” came first, and implementers have added “==” since.
It could be argued that perhaps “==” should mean “numeric equality” instead of “string equality”. However, “==” is already widely implemented and used for string equality, and there are no instances where it is implemented or used to mean numeric equality in a test/[ implementation. This universal agreement suggests that string equality is the right semantic to standardize.

Obviously, there is no requirement that assignment and is-equal-to be visually distinctive, since they are disambiguated by being part of test/[ or not. There is no technical ambiguity, since “=” only means “is-equal” inside test/[. But the many shell and test implementations that do support “==”, their many users, and the many other languages that do this suggest that this is widely considered to be useful.

This proposal is not a proposal that “everyone just switch to bash”. This is a widely-implemented extension, not one implemented solely by bash. The primary “==” is a very widely-implemented synonym for “=”, and in all cases it is implemented as a synonym for “=”. The “==” in test is already implemented in the following implementations:
1. GNU bash: Supports ==.
2. GNU coreutils “test”: Added "==" support on 2011-03-22.
3. ash: Supports ==.
4. pdksh (public domain Korn shell): Supports ==, see http://web.cs.mun.ca/~michael/pdksh/pdksh-man.html (note that some systems' “ksh” is actually pdksh).
5. mksh (MirBSD(TM) Korn Shell): Supports ==. See http://www.mirbsd.org/mksh.htm
6. OpenBSD's /bin/sh: Supports "==" (it's not documented, but it does work).
7. FreeBSD-current's /bin/sh and /bin/test have recently added "==". See http://svn.freebsd.org/base/head/bin/test/test.c.
8. busybox ash: Supports ==. This is particularly remarkable, since busybox is designed for relatively small systems and emphasizes small code size. Yet even busybox implements “==”.

A few implementations do not support “==”, but even in those cases it tends to be trivial to add:
1. NetBSD’s sh doesn't support "==", but a patch has been submitted to add it. The last comment (2011-03-18) on it was positive, but it is not clear what they will do with it: http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44733. However, if this is added to POSIX, it is likely to be added to NetBSD sh.
2. The dash shell does not support "==", but adding it is a one-line patch. This patch is at http://permalink.gmane.org/gmane.comp.shells.dash/498 and was submitted on 2011-03-06. The developers seemed to agree that if POSIX added "==" as a requirement, dash would implement it.

This proposal has no effect on the official ksh from AT&T; ksh doesn't have test/[ built-in, so it simply uses the underlying implementation of test/[. Note that some systems have a “ksh” that is actually a pdksh. AT&T ksh does have “[[”, and inside it, it does support “==” as a synonym for “=”; in fact, it considers “=” obsolete inside “[[”.

Add “-nt” (newer-than) and “-ot” (older-than) to test

This proposal adds primaries -nt (newer-than) and -ot (older-than) for comparing modification times. Determining whether something should be done, based on whether or not one file is newer than another, is a common operation. Thus, it makes sense to include the ability to easily compare modification times of filesystem objects. It is possible to get the same effect with standard mechanisms, using awkward expressions such as [ "$(find 'pathname1' -prune -newer 'pathname2')" ]. However, this is not at all clear, and is much more complicated. This extension is widely implemented, and this proposal adds it to the standard.

One challenge is that there is some disagreement on what the semantics should be if files do not exist. Possibilities for the standard are:
1. Both files must exist for a “true” result. This is the semantic of “dash” and some other implementations. This can be expressed as:
   1. pathname1 -nt pathname2: True if pathname1 and pathname2 exist and pathname1 is newer than pathname2 according to their modification times; otherwise, false.
   2. pathname1 -ot pathname2: True if pathname1 and pathname2 exist and pathname1 is older than pathname2 according to their modification times; otherwise, false.
2. A nonexistent file is considered older than a file that does exist. This is the semantic of bash and current pdksh. Note that pdksh version 5.2.14 switched to this semantic in 1999, suggesting that there was value to this particular semantic. This semantic can be expressed as:
   1. pathname1 -nt pathname2: True if pathname1 exists and pathname2 does not, or if both exist and pathname1 is newer than pathname2 according to their modification times; otherwise, false. (Note that if pathname1 does not exist, the result is false.)
   2. pathname1 -ot pathname2: True if pathname2 exists and pathname1 does not, or if both exist and pathname1 is older than pathname2 according to their modification times; otherwise, false. (Note that if pathname2 does not exist, the result is false.)
3. Allow either semantic. An example would be:
   1. pathname1 -nt pathname2: True if both pathname1 and pathname2 exist and pathname1 is newer than pathname2 according to their modification times. False if pathname1 does not exist. Otherwise, it is unspecified whether it returns true or false.
   2. pathname1 -ot pathname2: True if both pathname1 and pathname2 exist and pathname1 is older than pathname2 according to their modification times. False if pathname2 does not exist. Otherwise, it is unspecified whether it returns true or false.

An argument for option 1 is that its description is slightly simpler, but it is not much simpler. This proposal recommends option 2, namely, that nonexistent files be considered older, for the following reasons:
1. This makes it simple to express the case where a file “overrides” an older file, as a file that exists is considered newer than a file that does not exist.
2. Tighter semantics are in general desirable, where practical.
3. Since pdksh intentionally switched to this semantic, this suggests that it is the more useful semantic.

An argument for option 3 is that no one has to change their implementation to match. If option 2 is not accepted, option 3 would be a reasonable alternative, especially since there would always be the option to tighten the semantics in some future version of POSIX if necessary.

Add “-ef” to test

In many cases it is useful to know whether two different filenames refer to the same file. For example, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30838 reports on a shell script “gen-classlist.sh” with the following line, so that certain actions will only occur if two different directory names refer to different directories:

    if test ! "${top_builddir}" -ef "@top_srcdir@"; then

This text is worded as “refer to the same file” instead of simply “are hard linked,” as this is what extant implementations actually do. In particular, if files symbolically link to the same eventual file, comparing them with “-ef” should produce “true”. Austin group defect report 375, reply 670, reports that:

    touch a; ln -s a b; test a -ef b

sets $? to 0 on at least bash and GNU coreutils test. This primary is currently implemented in at least bash, busybox sh, and GNU coreutils test.

Add “[[”

The test/[ operator can sometimes be difficult to use correctly. Word splitting and pathname expansion can require many quote characters to do simple comparisons. Longer expressions (involving “-a” or “-o”) can be misinterpreted. Inside “[[”, the “<” and “>” comparisons do not need to be quoted, and, unlike various test extensions, they are done according to the locale. Perhaps most concerningly, it is not particularly simple to compare strings with various text patterns using test/[.
Developers sometimes use “case” to compare variables with a pattern, simply because test does not include a mechanism for doing so. And “case” only supports the simple globbing scheme, not the far more capable regular expression pattern-matching mechanism. The “[[” construct adds these pattern-matching capabilities for when they are needed.

The grammar given above stops at the point where “test” is no longer defined by a grammar; that is, the contents of the inner expression are not described by grammar rules. The proposed semantics are based on the “[[” implementations of bash (see http://www.gnu.org/s/bash/manual/html_node/Conditional-Constructs.html#Conditional-Constructs), pdksh (see http://web.cs.mun.ca/~michael/pdksh/pdksh-man.html), and AT&T ksh93 (see http://www2.research.att.com/sw/download/man/man1/ksh.html).

Alternative proposals

Adding “<” and “>” to test

Early versions of this proposal also proposed the following. In the text of test, circa page 3224, add the following primary definitions:
1. s1 < s2
   True if the string s1 is lexicographically less than s2; otherwise, false.
2. s1 > s2
   True if the string s1 is lexicographically greater than s2; otherwise, false.

However, comment #670 by eblake on 2011-02-07 (see http://austingroupbugs.net/view.php?id=375 comment #670) made some good points about the problems with these primaries. He noted that < and > inside test/[ must be quoted. Also, existing implementations often fail to implement locale-specific collation. Thus, as recommended by eblake, an effort has been made to standardize [[, where < and > do not have to be quoted, and where collation is done according to the locale.

Appendix A: Interpretation of -nt and -ot

Unfortunately, there are differences in how -nt and -ot are implemented in different shells. This appendix shows the differences in detail, to help justify the options and the one selected above.
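For shells whose test lacks -nt and -ot, or implements the older pdksh/dash behavior, the semantics recommended above (option 2, where a nonexistent file is considered older) can be emulated with find. This is a minimal sketch under stated assumptions: the function names are hypothetical; it relies on find supporting the POSIX -H option so that symbolic links named as operands are followed (GNU find also applies -H to the -newer reference file); and pathnames beginning with “-”, or equal to “!” or “(”, must be prefixed with “./” so that find does not mistake them for expression operators.

```sh
#!/bin/sh
# newer_than FILE1 FILE2 (hypothetical): emulates "FILE1 -nt FILE2" with the
# semantics where a nonexistent file is older than an existing one.
newer_than() {
  [ -e "$1" ] || return 1   # FILE1 does not exist: always false
  [ -e "$2" ] || return 0   # FILE1 exists but FILE2 does not: true
  # Both exist: find prints FILE1 only if its modification time is more
  # recent than FILE2's; -prune stops find from descending into directories.
  [ -n "$(find -H "$1" -prune -newer "$2" -print)" ]
}

# older_than FILE1 FILE2 (hypothetical): emulates "FILE1 -ot FILE2", which
# has the same truth value as "FILE2 -nt FILE1" under these semantics.
older_than() {
  newer_than "$2" "$1"
}
```

With these semantics, newer_than src out is also true when out has not been created yet, which is the “override” case cited in the rationale.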
http://austingroupbugs.net/view.php?id=375 bugnote 975 includes a report from gber on 2011-09-25 stating:

“It should be noted that there are widespread implementations of test -nt/-ot with different and incompatible semantics in FreeBSD/NetBSD/OpenBSD and dash. These test implementations all trace their roots to the test builtin of pdksh before version 5.2.14, the difference to the behavior described above is that test will return failure in case the second file does not exist. The test implementations with this behavior have been used by NetBSD since 1994 and by FreeBSD since 1999 and it seems to have been used by dash since the first Linux port of ash in 1993.”

In particular, pdksh trunk changed its semantics in 1999 with this changelog entry from http://web.cs.mun.ca/~michael/pdksh/ChangeLog :

    Wed Jun 30 17:42:54 NDT 1999 Michael Rendell (michael@lyman.cs.mun.ca)
        c_test.c(test_eval): changed -nt/-ot tests so they succeed if pathname2 (pathname2) `does not exist' (ie, the stat fails). (based on fix from Dave Hillman).

To determine various systems’ behavior, the following script was run in a directory containing files “n” (newer) and “o” (older), and no files named 1 or 2:

    result() {
      if [ "$?" = 0 ] ; then
        echo "t"
      else
        echo "f"
      fi
    }
    # Files 1 and 2 don't exist. File "o" is older than file "n" (newer):
    ITEMS="1 o n"
    echo "Smoke test: Produce false and true:"
    false ; result
    true ; result
    echo "test -nt, for files $ITEMS:"
    for left in $ITEMS ; do
      for right in $ITEMS 2 ; do
        if ! [ "$right" = "2" ] || [ "$left" = "1" ] ; then
          printf "%s -nt %s: " "$left" "$right"
          test "$left" -nt "$right" ; result
        fi
      done
    done
    echo "test -ot, for files $ITEMS:"
    for left in $ITEMS ; do
      for right in $ITEMS 2 ; do
        if ! [ "$right" = "2" ] || [ "$left" = "1" ] ; then
          printf "%s -ot %s: " "$left" "$right"
          test "$left" -ot "$right" ; result
        fi
      done
    done

The following is produced by GNU bash 4.1.10(4), GNU coreutils test, and pdksh version 5.2.14:

    Smoke test: Produce false and true:
    f
    t
    test -nt, for files 1 o n:
    1 -nt 1: f
    1 -nt o: f
    1 -nt n: f
    1 -nt 2: f
    o -nt 1: t
    o -nt o: f
    o -nt n: f
    n -nt 1: t
    n -nt o: t
    n -nt n: f
    test -ot, for files 1 o n:
    1 -ot 1: f
    1 -ot o: t
    1 -ot n: t
    1 -ot 2: f
    o -ot 1: f
    o -ot o: f
    o -ot n: t
    n -ot 1: f
    n -ot o: f
    n -ot n: f

The following is produced by dash version 0.5.6.1:

    Smoke test: Produce false and true:
    f
    t
    test -nt, for files 1 o n:
    1 -nt 1: f
    1 -nt o: f
    1 -nt n: f
    1 -nt 2: f
    o -nt 1: f
    o -nt o: f
    o -nt n: f
    n -nt 1: f
    n -nt o: t
    n -nt n: f
    test -ot, for files 1 o n:
    1 -ot 1: f
    1 -ot o: f
    1 -ot n: f
    1 -ot 2: f
    o -ot 1: f
    o -ot o: f
    o -ot n: t
    n -ot 1: f
    n -ot o: f
    n -ot n: f

The output of “diff -u ,bash ,dash” is, briefly (the results of “bash” are shown with “-” while the results of dash are shown with “+”):

     1 -nt o: f
     1 -nt n: f
     1 -nt 2: f
    -o -nt 1: t
    +o -nt 1: f
     o -nt o: f
     o -nt n: f
    -n -nt 1: t
    +n -nt 1: f
     n -nt o: t
     n -nt n: f
     test -ot, for files 1 o n:
     1 -ot 1: f
    -1 -ot o: t
    -1 -ot n: t
    +1 -ot o: f
    +1 -ot n: f
     1 -ot 2: f
     o -ot 1: f
     o -ot o: f

Extendingshellconditionals-2011-11-23.pdf (368,158 bytes)
Extendingshellconditionals-2011-11-23.odt (97,978 bytes)
Extendingshellconditionals-2011-11-23.txt (30,171 bytes)

Extending shell conditionals (DRAFT) 2011-11-23

This white paper proposes various extensions to the existing POSIX shell conditional statements. It supports Austin group defect report 375 (http://austingroupbugs.net/view.php?id=375), in particular reply 967. This is a draft. As it is spread more widely, it is expected to change. The intent is to provide a single location where the issues can be discussed in an organized fashion.
To the extent possible under law, the contributors to this document have waived all copyright and released it to the copyright public domain, under the terms of the Creative Commons CC0 waiver: http://creativecommons.org/choose/zero/waiver This way, any of its material can be used by the standards developers (or others) in any way they desire.

Table of contents:
  Overview
    Issue
    Background
    Requirements
    Importance
  Proposed changes to the POSIX specification
    Add “==” to test
    Add “-nt” (newer-than) and “-ot” (older-than) to test
    Add “-ef” to test
    Add “[[”
  Rationale
    Add “==” to test
    Add “-nt” (newer-than) and “-ot” (older-than) to test
    Add “-ef” to test
    Add “[[”
  Alternative proposals
    Adding “<” and “>” to test
  Appendix A: Interpretation of -nt and -ot

Overview

Issue

Many implementations of "test" (aka "["), including shell built-ins, implement conditionals beyond those specified in the current version of POSIX. What's more, many extant programs rely on these extensions. This proposal recommends formally adding these widely-implemented extensions to the POSIX specification itself, as these extensions have become widespread and are ready to be standardized. Each of these additions is described separately, since they can be treated separately.

Background

Austin group defect report 375 (http://austingroupbugs.net/view.php?id=375) (“Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef”) proposed extensions to POSIX "test". It proposed adding certain widely-implemented and widely-used extensions of "test" to the POSIX standard. This defect report was discussed at the September 8, 2011 teleconference meeting, and it was agreed that the submitter should “produce a whitepaper expanding the proposal (similar to proposals made in the past, for example the LFS proposal, http://www.unix.org/version2/whatsnew/lfs20mar.html (Adding Support for Arbitrary File Sizes to the Single UNIX Specification).
This could then be widely circulated amongst all interested parties to look for consensus. The standard developers recommend that the white paper should pay particular attention to note 670.”

This document is the whitepaper requested by the Austin group. It attempts to expand the proposal so that it can be “widely circulated amongst all interested parties to look for consensus”. It also attempts to respond to the standards developers' recommendations, in particular, to pay “particular attention to note 670.”

This document was developed by David A. Wheeler (dwheeler, at, dwheeler, dot com), based on feedback on the Austin group bug tracker and mailing list. The first version of this document was dated 2011-11-15. It has been changed since that time due to various feedback, in particular, historical corrections from David Korn (especially his email on November 16, 2011) and suggestions from Geoff Clare.

Requirements

These extensions are proposed only because they meet the following requirements:
1. They are already implemented in at least one implementation.
2. They are used in existing programs/scripts.
3. They are easily implemented.

Importance

All of these proposals can be implemented in other ways, but their omission from POSIX can render otherwise-compatible scripts non-conforming. Some of these extensions are identified as "bashisms" in pages such as http://mywiki.wooledge.org/Bashism, but in fact they are widely implemented and/or depended upon, whether or not bash is used. Their widespread use and implementation suggest that they are ready to be added to POSIX itself.

Proposed changes to the POSIX specification

Add “==” to test

In the text of test, circa page 3224, add the following primary definition:
1. s1 == s2
   True if the strings s1 and s2 are identical; otherwise, false. This primary is equivalent to s1 = s2.

Add “-nt” (newer-than) and “-ot” (older-than) to test

In the text of test, circa page 3224, add the following primary definitions:
1. pathname1 -nt pathname2
   True if pathname1 resolves to an existing file and pathname2 does not, or if both resolve to existing files and pathname1 is newer than pathname2 according to their last data modification timestamps; otherwise, false. If either is a symbolic link, the target of the symbolic link is used (not the symbolic link itself).
2. pathname1 -ot pathname2
   True if pathname2 resolves to an existing file and pathname1 does not, or if both resolve to existing files and pathname1 is older than pathname2 according to their last data modification timestamps; otherwise, false. If either is a symbolic link, the target of the symbolic link is used (not the symbolic link itself).

Add “-ef” to test

In the text of test, circa page 3224, add the following primary definition:
1. pathname1 -ef pathname2
   True if pathname1 and pathname2 name the same file; otherwise, false.

Add “[[”

In XCU section 2.4, line 72478, add “[[” and “]]” to the list of reserved words. On line 72491, remove “[[” and “]]” from the reserved word list.

In section 2.6, extend the first paragraph to cover double bracket expressions. After “Not all expansions are performed on every word, as explained in the following sections” add “and in section 2.9.x Double Bracket Expressions”.

In section 2.9, add a new third-level subsection “Double bracket expressions” after the “function definition” section, e.g., as a new 2.9.6:

DOUBLE BRACKET EXPRESSIONS

A “double bracket expression” is the reserved word “[[” followed by one or more words and then the reserved word “]]”, optionally followed by redirections, terminated by a control operator. A double bracket expression shall be evaluated to return a status of 0 (true) or 1 (false).
Syntactic examination of the expression to identify conditional operators and expression operators (parentheses, !, &&, and ||) shall be done before any expansions; text that is quoted or is the result of an expansion shall not be recognized as a conditional operator or expression operator when it is directly contained in a double bracket expression. Field splitting and pathname expansion shall not be performed on the words directly enclosed between the [[ and ]], but other expansions are performed as described in section 2.6 (word expansions).

All of the conditional operators of test shall be available in a double bracket expression, except that an implementation may (but need not) support the following:
1. [[ string ]] and [[ ! string ]]. For maximum portability, applications must use alternatives such as -n, -z, or comparing a string with "".
2. The -a (and) and -o (or) operators; see the description of && and || below.
3. “s1 = s2”.

In addition, an implementation shall support at least the following additional conditional operators inside a double bracket expression:
1. “string == pattern”. Returns true if string matches the pattern as described in “Pattern Matching Notation” (section 2.13); otherwise, returns false.
2. “string != pattern”. Returns the logical negation of “string == pattern”.
3. “string =~ regex”. Returns true (0) if string matches the extended regular expression regex; otherwise, returns false (1).
4. “string1 < string2”. Compares two strings lexicographically, using the current locale settings, and returns true if string1 is less than string2; otherwise, returns false.
5. “string1 > string2”. Compares two strings lexicographically, using the current locale settings, and returns true if string1 is greater than string2; otherwise, returns false.

In pattern-matching operations (==, !=, and =~), pattern metacharacters (such as “*” in globbing and “.” in regular expressions) shall only be considered metacharacters if they are outside "..." and '...'.
Inside "..." and '...' they are considered escaped and only match themselves. For example:

pattern='*'
[[ string == $pattern ]]    # matches
[[ string == "$pattern" ]]  # does not match because the * is literal

Expressions shall be combined using the following expression operators, listed in decreasing order of precedence:

1. “( e )”. Returns the value of expression e.
2. “! e”. Returns true (0) if expression e is false; otherwise, returns false (1).
3. “e1 && e2”. Returns true if both e1 and e2 are true; otherwise, returns false. This is a short-circuit evaluation; if e1 is false, e2 shall not be evaluated.
4. “e1 || e2”. Returns true if either e1 or e2 is true; otherwise, returns false. This is a short-circuit evaluation; if e1 is true, e2 shall not be evaluated.

In the grammar of section 2.10.2, after line 73492, add the following:

In lines 73515-73516, add ‘[[’ and ‘]]’ by replacing those lines with:

%token  Lbrace  Rbrace  Bang  Ldbracket  Rdbracket
/*      '{'     '}'     '!'   '[['       ']]'      */

After line 73542, after function_definition, add as a new type of command:

| double_bracket_expression

Before line 73617 (definition of simple_command), add:

double_bracket_expression : Ldbracket wordlist Rdbracket
                          | Ldbracket wordlist Rdbracket redirect_list
                          ;

Rationale

Add “==” to test

This proposed change adds primary "s1 == s2" as a synonym for "s1 = s2". There are three major reasons for adding “==”:

1. Primary “==” is more visually distinct from assignment (“=”). Since “=” is also used for assignment in shell scripts, using “==” for “is equal to” makes the comparison visually distinctive, making it clearer to readers that a comparison is intended.
2. Allowing “==” for “is equal to” adds consistency with other programming languages that use “=” for assignment. Most languages that use “=” for assignment also use “==” for “is equal to” so that these operations are more visually distinct. These include C, C++, Java, C#, Python, and Perl. It is oddly inconsistent that test/[ does not support “==” as well.
   Many languages (like Pascal) that use "=" for comparison use another spelling (like ":=") for assignment, again to keep their spellings separate. Even where these languages could always disambiguate from context, they intentionally do not use the same spelling. It is too late to get rid of "=" for comparison, but it is easy to add "==" as a synonym, which is what is proposed.
3. Primary “==” is already supported by many implementations and is used in many shell scripts. This suggests that there is value in standardizing it.

A counter-argument to adding “==” is that it is redundant with “=”. This is true, but there are many other redundancies in POSIX. For example, “[” is redundant with “test”, but this is not considered a problem. In any case, it is a redundancy that many consider valuable; “=” came first, and implementers have added “==” since.

It could be argued that “==” should mean “numeric equality” instead of “string equality”. However, “==” is already widely implemented and used to mean string equality. In contrast, there are no instances where it is implemented or used to mean numeric equality in a test/[ or shell implementation. This universal agreement strongly suggests that string equality is the right semantic to standardize.

Admittedly, there is no requirement that assignment and is-equal-to be visually distinctive, since they are disambiguated by being part of test/[ or not: an “=” only means “is equal to” inside test/[, and it only means “assignment” as a shell command. But many shell and test implementations do support “==”; their many users, and the many other languages that do this, suggest that this is widely considered to be useful.

This is not a proposal that “everyone just switch to bash”. This is a widely implemented extension, not one implemented solely by bash, and it is used by people who do not use bash.
The primary “==” is a very widely implemented extension, and every implementation that supports it treats it as a synonym for “=”. The “==” in test is already implemented in the following implementations:

1. GNU bash: Supports ==.
2. GNU coreutils “test”: Added "==" support on 2011-03-22.
3. ash: Supports ==.
4. pdksh (public domain Korn shell): Supports ==; see http://web.cs.mun.ca/~michael/pdksh/pdksh-man.html (note that some systems’ “ksh” is actually pdksh).
5. mksh (MirBSD(TM) Korn Shell): Supports ==. See http://www.mirbsd.org/mksh.htm
6. OpenBSD's /bin/sh: Supports "==" (it's not documented, but it does work).
7. FreeBSD-current's /bin/sh and /bin/test have recently added "==". See http://svn.freebsd.org/base/head/bin/test/test.c.
8. busybox ash: Supports ==. This is particularly remarkable, since busybox is designed for relatively small systems and emphasizes small code size. Yet even busybox implements “==”.
9. AT&T ksh; see below.

A few implementations do not support “==”, but even in those cases it tends to be trivial to add:

1. NetBSD’s sh doesn't support "==", but a patch has been submitted to add it. The last comment on it (2011-03-18) was positive, but it is not clear what NetBSD will do with it: http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44733. However, if this is added to POSIX, it is likely to be added to NetBSD sh.
2. The dash shell does not support "==", but adding it is a one-line patch. This patch is at http://permalink.gmane.org/gmane.comp.shells.dash/498 and was submitted on 2011-03-06. The developers seemed to agree that if POSIX added "==" as a requirement, dash would implement it.

An older version of this proposal stated that it had “no effect on the official ksh from AT&T; ksh doesn't have test/[ built-in, so it simply uses the underlying implementation of test/[.
Note that some systems have a ‘ksh’ that is actually a pdksh.” However, David Korn reported in an email (to austin-group-l at opengroup.org dated November 16, 2011) that “This is not true. test has been a built-in from day 1 of ksh. Moreover, == is supported as a synonym for =”. AT&T ksh also has “[[”, and inside it “==” is supported as a synonym for “=”; in fact, ksh considers “=” obsolete inside “[[”.

Add “-nt” (newer-than) and “-ot” (older-than) to test

This proposal adds primaries -nt (newer-than) and -ot (older-than) for comparing modification timestamps. Deciding whether something should be done, based on whether one file is newer than another, is a common operation. Thus, it makes sense to include the ability to easily compare the modification times of filesystem objects.

It is possible to get the same effect with the standard mechanisms, using awkward expressions such as [ "$(find 'pathname1' -prune -newer 'pathname2')" ]. However, this is not at all clear, and is much more complicated. This extension is widely implemented, and this proposal adds it to the standard.

An older version of this proposal defined the semantic as checking whether a file “existed”. However, Geoff Clare pointed out on November 18, 2011, that ksh and bash (at least) do not distinguish non-existence (ENOENT) from the stat() errors EACCES, ENOTDIR, and ELOOP (at least), and that this is probably true for all stat() errors. Thus, the proposed text was tweaked to say whether the file “resolves to an existing file” instead of whether it “exists”. Also, the phrase “modification time” was clarified to “last data modification timestamps”.

One challenge is that there is some disagreement on what the semantics should be if files do not exist or have stat errors. For purposes of this issue, we will ignore the distinction between “does not exist” and “has a stat error” (as that is a separate issue), and simply say “exist”, as that is easier to say. Possibilities for the standard are:

1. Both files must exist for a “true” result. This is the semantic of dash and some other implementations. This can be expressed as:
   1. pathname1 -nt pathname2: True if pathname1 and pathname2 exist and pathname1 is newer than pathname2 according to their modification times; otherwise, false.
   2. pathname1 -ot pathname2: True if pathname1 and pathname2 exist and pathname1 is older than pathname2 according to their modification times; otherwise, false.
2. A nonexistent file is considered older than a file that does exist. This is the semantic of bash and current pdksh. Note that pdksh version 5.2.14 switched to this semantic in 1999, suggesting that there was value to this particular semantic. It is also the semantic of the original KornShell (though the KornShell book incorrectly says otherwise; see below). This semantic can be expressed as:
   1. pathname1 -nt pathname2: True if pathname1 exists and pathname2 does not, or if both exist and pathname1 is newer than pathname2 according to their modification times; otherwise, false. (Note that if pathname1 does not exist, the result is false.)
   2. pathname1 -ot pathname2: True if pathname2 exists and pathname1 does not, or if both exist and pathname1 is older than pathname2 according to their modification times; otherwise, false. (Note that if pathname2 does not exist, the result is false.)
3. Allow either semantic. An example would be:
   1. pathname1 -nt pathname2: True if both pathname1 and pathname2 exist and pathname1 is newer than pathname2 according to their modification times. False if pathname1 does not exist. Otherwise, it is unspecified whether it returns true or false.
   2. pathname1 -ot pathname2: True if both pathname1 and pathname2 exist and pathname1 is older than pathname2 according to their modification times. False if pathname2 does not exist. Otherwise, it is unspecified whether it returns true or false.

An argument for option 1 is that its description is slightly simpler. But it is not much simpler.
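The find-based workaround mentioned above can be wrapped into shell functions that approximate option 2. This is a minimal sketch, not part of the proposal: the names newer_than and older_than are hypothetical, it assumes pathnames do not begin with "-" (find would parse them as primaries), and it relies on test -e, so any stat() failure counts as "does not exist".

```shell
#!/bin/sh
# Hypothetical helpers approximating "option 2" for -nt/-ot with POSIX tools.
# Assumption: pathnames do not begin with '-' (find would parse them as primaries).

# newer_than P1 P2: succeed if P1 resolves to an existing file and either
# P2 does not, or P1's last data modification time is more recent than P2's.
newer_than() {
    [ -e "$1" ] || return 1      # a nonexistent P1 is never newer
    [ -e "$2" ] || return 0      # an existing P1 is newer than a missing P2
    # -H follows a symlink named as the path operand; whether the -newer
    # operand is dereferenced too may vary between find implementations.
    [ -n "$(find -H "$1" -prune -newer "$2")" ]
}

# older_than P1 P2: under option 2 this is exactly "P2 is newer than P1".
older_than() {
    newer_than "$2" "$1"
}

# Demo in a scratch directory: "o" is older than "n"; "1" does not exist.
dir=${TMPDIR:-/tmp}/ntot-demo.$$
mkdir "$dir" && cd "$dir" || exit 1
touch -t 200001010000 o
touch -t 201001010000 n
for a in 1 o n; do
    for b in 1 o n; do
        if newer_than "$a" "$b"; then nt=t; else nt=f; fi
        if older_than "$a" "$b"; then ot=t; else ot=f; fi
        printf '%s -nt %s: %s    %s -ot %s: %s\n' "$a" "$b" "$nt" "$a" "$b" "$ot"
    done
done
cd / && rm -rf "$dir"
```

Defining older_than as the mirror of newer_than is what makes Korn's rule hold automatically: if "P1 -nt P2" is true, "P1 -ot P2" must be false, and both are false when neither file exists. For the pairs it covers, the demo should agree with the bash/pdksh results in Appendix A (for example, "1 -ot o: t" and "o -nt 1: t"), not with the dash results.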
The proposal here recommends option 2, namely, that nonexistent files be considered older, for the following reasons:

1. This makes it simple to express the case where a file “overrides” an older file, as a file that exists is considered newer than a file that does not exist.
2. Tighter semantics are in general desirable, where practical.
3. Since pdksh intentionally switched to this semantic, this suggests that it is the more useful semantic.

An argument for option 3 is that no one has to change their implementation to match. If option 2 is not accepted, option 3 would be a reasonable alternative, especially since there would always be the option to tighten the semantics in some future version of POSIX if necessary.

David Korn reported in an email (to austin-group-l at opengroup.org dated November 16, 2011) some useful history on -nt and -ot. In this email, he stated that the behavior “of -nt and -ot when file1 and file2 did not exist was not well documented the KornShell book upon which the standard is based. It was documented incorrectly in the New KornShell book published in 1995 [as] [[ file1 -nt file2 ]] is true if file1 is newer than file2 or file2 does not exits. [[ file1 -ot file2 ]] is true if file1 is older than file2 or file2 does not exits. The [[ file1 -ot file2 ]] should be true if file1 doesn't exist, not file2. Thus if [[ file1 nt -file2 ]] is true, then [[ file1 -ot -file2 ]] must be false even if file1 file2 do not exist. If both do not exist, then they must both be false.” Thus, KornShell implemented the semantics as proposed (option 2), even though the KornShell book incorrectly says otherwise.

Add “-ef” to test

In many cases it is useful to know whether two different filenames refer to the same file. For example, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30838 reports on a shell script “gen-classlist.sh” with the following line, so that certain actions only occur if two different directory names refer to different directories:

if test ! "${top_builddir}" -ef "@top_srcdir@"; then

This text is worded as “refer to the same file” instead of simply “are hard linked”, as this is what extant implementations actually do. In particular, if files symbolically link to the same eventual file, comparing them with “-ef” should produce “true”. Austin Group defect report 375, reply 670, reports that:

touch a; ln -s a b; test a -ef b

sets $? to 0 on at least bash and GNU coreutils test. This primary is currently implemented in at least bash, busybox sh, and GNU coreutils test.

Add “[[”

The test/[ operator can sometimes be difficult to use correctly. Word splitting and pathname expansion can require many quote characters to do simple comparisons. Longer expressions (involving “-a” or “-o”) can be misinterpreted, especially if an expansion produces a value that looks like a primary (e.g., “-z”). The common extension comparisons “<” and “>” must be quoted if they are used at all, and implementations differ on how the locale affects these comparisons inside test/[.

Perhaps most concerning, it is overly difficult to compare strings with text patterns using test/[. Developers sometimes use “case” to compare variables with a globbing pattern, because test does not include a mechanism for doing so. And “case” only supports the simple globbing scheme, not the far more capable regular expression pattern-matching mechanism.

Adding “[[” provides a way to perform tests that is less error-prone, and adds various useful capabilities such as pattern matching (both globbing and regular expressions) and lexical comparison. What's more, these are already in use.

An older version of this proposal added “[[” as a grouping command, but it is not really a grouping command. Instead, it is a way to compute expressions, and thus it doesn't really fit with the way the standard uses the term “grouping command” (the specification says they "provide control flow for commands").
It is not a simple command either, because preceding it with assignments or redirections causes it not to be recognised. For example, in ksh:

$ foo=bar [[ x == x ]]
-ksh: [[: not found [No such file or directory]
$ > /tmp/foo [[ x == x ]]
-ksh: [[: not found [No such file or directory]

The text is worded to make it clear that in [[...]], if a word evaluates to a conditional operator (such as “-z”), it will not be considered an operator. This is different from test/[, and is an advantage of [[...]]. For example:

$ [ $(printf '%s\n' -z) "" ]; echo $?
0
$ [[ $(printf '%s\n' -z) "" ]]; echo $?
-ksh: syntax error: `' unexpected

The grammar given above stops at the point where “test” is no longer defined in a grammar.

The effects of quoting and expansion on operators that take patterns are not well documented in ksh's man page or in bash's info page. This is not as simple as saying that quoting characters within the pattern preserves their literal value, because backslash is still special within "..." but not within '...'. The effects of quoting also apply to special characters resulting from expansions. For example:

pattern='*'
[[ string == $pattern ]]    # matches
[[ string == "$pattern" ]]  # does not match because the * is literal

This is also true for regular expression matching using “=~”, in both ksh and bash:

pattern="."
[[ stuff =~ . ]] && echo true            # prints true
[[ stuff =~ $pattern ]] && echo true     # prints true
[[ stuff =~ "." ]] && echo true          # does NOT print true
[[ stuff =~ "$pattern" ]] && echo true   # does NOT print true
[[ stuff =~ '.' ]] && echo true          # does NOT print true

The proposed semantics are based on the “[[” implementations of bash (see http://www.gnu.org/s/bash/manual/html_node/Conditional-Constructs.html#Conditional-Constructs), pdksh (see http://web.cs.mun.ca/~michael/pdksh/pdksh-man.html), and AT&T ksh93 (see http://www2.research.att.com/sw/download/man/man1/ksh.html).
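The quoting rule above is not new to [[: POSIX "case" already treats quoted pattern characters as literal, which is why scripts fall back on "case" today. A minimal sketch of portable analogues, using hypothetical helper names glob_match, glob_literal, and ere_match; the ERE helper pipes through grep -E and so assumes the string contains no newline.

```shell
#!/bin/sh
# glob_match STRING PATTERN: like [[ STRING == $PATTERN ]] (unquoted pattern).
glob_match() {
    case $1 in
        $2) return 0 ;;   # unquoted: *, ?, and [...] in PATTERN are active
    esac
    return 1
}

# glob_literal STRING TEXT: like [[ STRING == "$TEXT" ]] (quoted pattern).
glob_literal() {
    case $1 in
        "$2") return 0 ;; # quoted: every character matches only itself
    esac
    return 1
}

# ere_match STRING ERE: rough analogue of [[ STRING =~ $ERE ]].
# Assumption: STRING contains no newline (grep matches line by line).
ere_match() {
    printf '%s\n' "$1" | grep -Eq -e "$2"
}

pattern='*'
glob_match string "$pattern" && echo "unquoted *: matches"
glob_literal string "$pattern" || echo "quoted *: does not match"
ere_match stuff '.' && echo "ERE dot: matches"
ere_match stuff '\.' || echo "ERE escaped dot: does not match"
```

Unlike [[ =~ ]], the grep-based helper cannot make only part of an ERE literal by quoting; metacharacters must be escaped with backslashes instead.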
Implementations are free to implement “=” inside a double bracket expression with the same semantics as the double bracket pattern-matching operator “==”. It is not defined here, since ksh (at least) identifies “=” as obsolete inside [[...]].

The proposal given here includes “! expression”. However, it does not require support for the one-parameter “string” or “! string” as a valid expression. Many implementations interpret a “string” by itself as true if non-null and false if empty. These were not included on the theory that there are alternative ways of expressing this that are much clearer:

-n string
-z string
string == ""
string != ""

If the standards body prefers, one-parameter “string” and “! string” could easily be added to the proposal. The text has been worded so that single-parameter “string” and “! string” are valid nonstandard extensions.

Alternative proposals

Adding “<” and “>” to test

Early versions of this proposal also proposed the following. In the text of test, circa page 3224, add the following primary definitions:

1. s1 < s2: True if the string s1 is lexicographically less than s2; otherwise, false.
2. s1 > s2: True if the string s1 is lexicographically greater than s2; otherwise, false.

However, comment #670 by eblake on 2011-02-07 (see http://austingroupbugs.net/view.php?id=375 comment #670) made some good points about the problems with these primaries. He noted that < and > inside test/[ must be quoted. Also, existing implementations often fail to implement locale-specific collation with these primaries. Thus, as recommended by eblake, an effort has been made to standardize [[, where < and > do not have to be quoted, and where collation is always done according to the locale.

Appendix A: Interpretation of -nt and -ot

Unfortunately, there are differences in how -nt and -ot are implemented in different shells. This appendix shows the differences in detail, to help justify the options and the one selected above.
http://austingroupbugs.net/view.php?id=375 bugnote 975 includes a report from gber on 2011-09-25 stating: “It should be noted that there are widespread implementations of test -nt/-ot with different and incompatible semantics in FreeBSD/NetBSD/OpenBSD and dash. These test implementations all trace their roots to the test builtin of pdksh before version 5.2.14, the difference to the behavior described above is that test will return failure in case the second file does not exist. The test implementations with this behavior have been used by NetBSD since 1994 and by FreeBSD since 1999 and it seems to have been used by dash since the first Linux port of ash in 1993.”

In particular, pdksh trunk changed its semantics in 1999 with this changelog entry from http://web.cs.mun.ca/~michael/pdksh/ChangeLog :

Wed Jun 30 17:42:54 NDT 1999 Michael Rendell (michael@lyman.cs.mun.ca)
* c_test.c(test_eval): changed -nt/-ot tests so they succeed if pathname2 (pathname2) `does not exist' (ie, the stat fails). (based on fix from Dave Hillman).

To determine various systems’ behavior, the following script was run in a directory with files “n” (newer) and “o” (older), and no files named 1 or 2:

result() {
  if [ "$?" = 0 ] ; then
    echo "t"
  else
    echo "f"
  fi
}

# Files 1 and 2 don't exist. File "o" is older than file "n" (newer):
ITEMS="1 o n"
echo "Smoke test: Produce false and true:"
false ; result
true ; result
echo "test -nt, for files $ITEMS:"
for left in $ITEMS ; do
  for right in $ITEMS 2 ; do
    if ! [ "$right" = "2" ] || [ "$left" = "1" ] ; then
      printf "%s -nt %s: " "$left" "$right"
      test $left -nt $right ; result
    fi
  done
done
echo "test -ot, for files $ITEMS:"
for left in $ITEMS ; do
  for right in $ITEMS 2 ; do
    if ! [ "$right" = "2" ] || [ "$left" = "1" ] ; then
      printf "%s -ot %s: " "$left" "$right"
      test $left -ot $right ; result
    fi
  done
done

The following is produced by GNU bash 4.1.10(4), GNU coreutils test, and pdksh version 5.2.14:

Smoke test: Produce false and true:
f
t
test -nt, for files 1 o n:
1 -nt 1: f
1 -nt o: f
1 -nt n: f
1 -nt 2: f
o -nt 1: t
o -nt o: f
o -nt n: f
n -nt 1: t
n -nt o: t
n -nt n: f
test -ot, for files 1 o n:
1 -ot 1: f
1 -ot o: t
1 -ot n: t
1 -ot 2: f
o -ot 1: f
o -ot o: f
o -ot n: t
n -ot 1: f
n -ot o: f
n -ot n: f

The following is produced by dash version 0.5.6.1:

Smoke test: Produce false and true:
f
t
test -nt, for files 1 o n:
1 -nt 1: f
1 -nt o: f
1 -nt n: f
1 -nt 2: f
o -nt 1: f
o -nt o: f
o -nt n: f
n -nt 1: f
n -nt o: t
n -nt n: f
test -ot, for files 1 o n:
1 -ot 1: f
1 -ot o: f
1 -ot n: f
1 -ot 2: f
o -ot 1: f
o -ot o: f
o -ot n: t
n -ot 1: f
n -ot o: f
n -ot n: f

The output of “diff -u ,bash ,dash” is, briefly (the results of bash are shown with “-” and the results of dash with “+”):

 1 -nt o: f
 1 -nt n: f
 1 -nt 2: f
-o -nt 1: t
+o -nt 1: f
 o -nt o: f
 o -nt n: f
-n -nt 1: t
+n -nt 1: f
 n -nt o: t
 n -nt n: f
 test -ot, for files 1 o n:
 1 -ot 1: f
-1 -ot o: t
-1 -ot n: t
+1 -ot o: f
+1 -ot n: f
 1 -ot 2: f
 o -ot 1: f
 o -ot o: f

Extendingshellconditionals-2011-11-23.txt (30,171 bytes)
Extendingshellconditionals-2011-11-28.odt (53,716 bytes)
Extendingshellconditionals-2011-11-28.pdf (248,508 bytes)
Extendingshellconditionals-2013-10-24.odt (58,679 bytes)
Extendingshellconditionals-2013-10-24.pdf (256,955 bytes)
Extendingshellconditionals-2013-11-29.odt (60,345 bytes)
Extendingshellconditionals-2013-11-29-track-changes.pdf (1,443,146 bytes)
Extendingshellconditionals-2013-11-29.pdf (1,300,831 bytes)
Extendingshellconditionals-2013-12-08.odt (64,218 bytes)
Extendingshellconditionals-2013-12-08.pdf (1,269,805 bytes)
Extendingshellconditionals-2013-12-09.odt (65,088 bytes)
Extendingshellconditionals-2013-12-09.pdf (1,258,736 bytes)

dwheeler 2011-02-07 18:49 reporter bugnote:0000666	Since this is a request for new functionality in the specification, I believe "Severity" should be "Objection" (not "editorial") and "Type" should be "Enhancement Request" (not "Clarification requested"). Sorry for the confusion.

Don Cragun 2011-02-07 20:23 viewer bugnote:0000667 Last edited: 2011-02-07 22:22	Severity, Type, Description, and Desired Action changed as requested by submitter.

dwheeler 2011-02-07 21:56 reporter bugnote:0000668	Note: In the "Desired Action" text above, "file1" and "file2" should be "pathname1" and "pathname2" respectively.

dwheeler 2011-02-07 22:50 reporter bugnote:0000669	One issue that needs discussion is whether "<" and ">" should be affected by the current locale's collation sequence (LC_ALL if set, else LC_COLLATE, else LANG). Unfortunately, historical implementations seem to ignore the collation setting. So I recommend a compromise position, adding to each operator the statement that "The effective collation sequence must be set to POSIX or C; whether or not another setting is applied is implementation-defined." Other options are certainly possible; here's the background.

The GNU bash and ksh93 documentation led me to believe that they IGNORE the collation settings when using "<" or ">" inside "test" and "[". GNU bash 4 documentation suggests that it does pay attention to the collation setting if you use its nonstandard "[[" extension, but not with "test" or "[". A quick test of GNU bash leads me to believe that its documentation (http://www.gnu.org/software/bash/manual/bashref.html) is correct: LC_COLLATE is ignored by these two operators. If the POSIX group decides that these operators should match historical precedent, then this should be specifically documented. E.g., for both "<" and ">", add: "This test operates as if LC_COLLATE is set to POSIX." That's the easy thing to do.

Now, it would make sense if the proposed "<" and ">" primitives were affected by the current locale's collation sequence (LC_ALL if set, else LC_COLLATE, else LANG). If a shell implementer doesn't want to include a collation library in their shell, yet "test" is a built-in, their shell could implement "<" and ">" as a call out to the external 'test' command (and then have test implement the collation defined by the locale). You could make the case that ignoring these settings is a bug in current implementations, as "<" and ">" are fundamentally collation comparisons. However, this would be a change to current behavior.
It would require implementations to change their behavior, and applications that care would have to switch to notations like this:

if LC_COLLATE=POSIX [ "$string1" '<' "$string2" ]

I don't know how the shell implementers feel about changing the semantics of "<" or ">" in test to pay attention to the locale, or how much that would affect real code. A useful compromise could be that "<" and ">" only have a defined meaning when the effective collation (as determined by LC_COLLATE and friends) is POSIX or C. This would mean that technically you'd have to set LC_COLLATE to do the comparison. This would grandfather existing code, and create a transition process so that eventually paying attention to the collation would be required. Basically, to meet the POSIX spec, application authors would know that they should add LC_COLLATE=... to their code, and then it'd be easier to add support for LC_COLLATE. So I suggest adding the following to "<" and ">": "The effective collation sequence must be set to POSIX or C; whether or not another setting is applied is implementation-defined." If no consensus can be reached on these two primitives (< and >), I'd still like to see the others added.
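For comparison, standard expr(1) already performs a string comparison using the locale's collation sequence whenever its operands are not both integers, which gives scripts a portable lexicographic "<" today. A minimal sketch, assuming hypothetical helper names str_lt and str_gt: it sets LC_ALL=C (rather than LC_COLLATE, which an LC_ALL already set in the environment would override) to get byte-order comparison, and prefixes each operand with "x" so that expr never sees integers or its own operators.

```shell
#!/bin/sh
# str_lt A B: succeed if A sorts before B, byte by byte (C locale).
# expr prints 1 or 0 and exits 0 or 1 accordingly; the output is discarded.
str_lt() {
    LC_ALL=C expr "x$1" \< "x$2" >/dev/null
}

# str_gt A B: succeed if A sorts after B (C locale).
str_gt() {
    LC_ALL=C expr "x$1" \> "x$2" >/dev/null
}

str_lt abc abd && echo "abc < abd"
str_lt 10 9 && echo "10 < 9 as strings"   # false if compared as integers
str_gt b a && echo "b > a"
```

Dropping the LC_ALL=C assignment would make the comparison follow the caller's locale instead, which is the behavior the comment argues "<" and ">" should eventually have.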

eblake 2011-02-07 23:13 manager bugnote:0000670	Due to the fact that < and > must be quoted with test (or [), and in light of the issues of some current test implementations failing to implement locale-specific collation, I'd much rather see an effort made to standardize [[ (where < and > do not have to be quoted, and where collation must be done according to locale), than to standardize < and > for test with loose semantics. But adding the other four operators (==, -ot, -nt, -ef) makes sense, although I think the wording for -ef should be "names the same file" rather than "are hard links", seeing as how 'touch a; ln -s a b; test a -ef b' sets $? to 0 on at least bash, coreutils, and ksh.

dwheeler 2011-03-08 03:15 reporter bugnote:0000688	FYI, here's a (partial) report on implementation effort, specifically on adding "==" to the test/[ implementation in dash. It turns out to be trivial to modify dash to support "==". It's 1 line, plus a few lines to update comments and documentation. Here is the patch: http://permalink.gmane.org/gmane.comp.shells.dash/498 I do not know if the dash project will add this functionality; that's up to them. However, several have commented that they'd prefer to add this (or just about any other functionality) only if it's specifically added to POSIX. There did seem to be universal agreement that they would definitely add this functionality if "==" was in the newer POSIX spec, at least, that's how I understand the comments. As noted earlier, "==" is already in bash, busybox ash, and ksh, so it requires nothing to add to them. I now also know that it'd be trivial to add to dash, and based on that experience, it'd probably be trivial to add "==" to other shells as well. And, as I noted earlier, "==" is already in wide use in shell scripts. Not all "trivial to add" and "already in wide use" capabilities are added to the POSIX spec, but they are reasonable reasons to add something to the POSIX spec. Thanks.

jsonn 2011-04-24 13:16 reporter bugnote:0000754	I don't see any value in adding "==". The only reason why it is used by a bunch of scripts is because some popular shells like GNU bash accept it (even in POSIX mode). As such this only legalises the dependency of completely redundant vendor extensions. The GNU toolchain is not even consistent in this regard, since GNU coreutils has a test that doesn't support ==.

dwheeler 2011-04-24 19:01 reporter bugnote:0000755	Why add "=="? The issue is that "=" is already used for shell variable assignment. Using the SAME spelling for both "is-equal-to" and "assignment" is confusing, and thus is something that MANY languages avoid. Many languages that use "=" for assignment use a SEPARATE "==" operator for comparison; this includes C, C++, Java, C#, Python, and Perl. Many languages (like Pascal) that use "=" for comparison use another spelling like ":=" for assignment, again, to keep their spellings separate. In some cases these languages can always disambiguate from context, and even then, they intentionally do NOT use the same spelling. It is inconsistent and misleading that the shell, unlike all these other languages, conflates the spellings of assignment and is-equal-to. I think it's too late to get rid of "=" for comparison, but it's easy to add "==" as a synonym, which is what is proposed. In short, adding "==" (1) clarifies whether assignment or comparison is being used, and (2) improves consistency between shell and the MANY other languages that use "=" for assignment. It's true that many shell scripts use "==" and that a lot of popular shell implementations support "==". But that is not an argument against standardization; that is an argument FOR standardization. If a lot of implementations support it (they do), and a lot of users use it (they do), then I think that's a pretty good argument that it should be added to the standard. This is EASY for implementations to add, it takes a trivial amount of space (even Busybox, which is for embedded systems, has it), and it's easy to add to the standard, too. As far as GNU coreutils' test goes, FSF's development version of test added "==" support on 2011-03-22. So yes, the current GNU toolchain doesn't support "==", but that will be fixed soon. So GNU test is an argument FOR adding support for "==".
Here's the current state of "==" in some implementations of sh and test that I know of; there's a lot of support for it, and it'd be easy to add to others: * (GNU) bash: Supports ==. * pdksh (public domain korn shell): Supports ==. See: http://web.cs.mun.ca/~michael/pdksh/pdksh-man.html * mksh (MirBSD(TM) Korn Shell): Supports ==. See http://www.mirbsd.org/mksh.htm * GNU coreutils test: Added on 2011-03-22. * Official ksh: N/A; Official ksh from AT&T doesn't have test/[ built-in. I originally said that ksh supports "==", but it turns out I was actually testing a pdksh that was named ksh, so my comments were really about pdksh noted above. Its "[[" condition DOES now support "==" as a synonym for "=", and "=" is considered obsolete. * NetBSD: Doesn't support "==", but a patch has been submitted to add it. The last comment (2011-03-18) on it was positive, but I don't know what they will do about it (http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44733). * dash: Does not support "==", but doing so is a one-line patch. Patch submitted 2011-03-06. Developers seemed to agree that if POSIX added "==" as a requirement, dash would implement it. * OpenBSD's /bin/sh supports "==" (it's not documented, but it DOES work). * FreeBSD-current's /bin/sh and /bin/test have recently added "==". See http://svn.freebsd.org/base/head/bin/test/test.c. * busybox ash: Supports ==. This is such a common vendor extension that I think it's proved itself; it's time to standardize it. ALL of the implementations agree that "==" is a synonym for "=", so there's no question about "different semantics so which one should we implement?". Indeed, a good reason to add this to the standard is to make sure that everyone else does the same thing in the future.

jsonn 2011-04-24 21:48 reporter bugnote:0000756	The argument that "=" looks like an assignment is bogus. Contrary to C, Python, Pascal or any other language in this league, you can not write "test $a=foo" or "test $a==foo" and expect it to work. On the same line of reasoning, "a = foo" doesn't work either. The chance of mixing both up is a sign of not knowing the shell language at all. OpenBSD's /bin/sh is a ksh. FreeBSD added == for compatibility with bash etc. None of this makes a good reason other than "people don't know how to write portable scripts". With that argument, bash could just be declared the standard, since that's what everyone is using after all...

dwheeler 2011-04-25 20:45 reporter bugnote:0000758	See part 2 of my argument, namely, that adding == would improve "consistency between shell and the MANY other languages that use '=' for assignment". There's value to working WITH "muscle memory" and having consistency with other languages when it's easy to do so. C, C++, Objective-C, Python, Perl, PHP, C#, Java, and many other languages that use "=" for assignment also use "==" for is-equal-to. It is inconsistent that test does not, which is why this addition is implemented so widely. There are now at least 6 implementations of sh or test that accept "==" by my count; bash is only one of them. Most standards organizations prefer to only standardize stuff that's implemented at least once, and prefer to standardize something that's implemented more widely. This is already implemented widely, and that should be a point in its favor. As far as "legalising the dependency of completely redundant vendor extensions", I disagree. Redundancy for backwards-compatibility is perfectly reasonable, first of all, and while the semantic is the same, the spelling is different. And as far as extensions go, I believe POSIX has always permitted extensions and that each version of POSIX has taken some vendor extensions and pulled them into the specification itself. It's the usual way to get better ideas tried out before they're standardized, and once they're in the standard, they aren't "vendor extensions" - they ARE part of the standard. Anyway, I hope that clarifies things.

markh 2011-04-26 09:25 reporter bugnote:0000759	There are many variants of is-equal-to. "==" is commonly used to test for numeric equality, which is -eq in "test". Some languages have one equality operator that tests for either string or numeric equality depending on the types of the operands ("test" obviously does not fall into that category since it is not typed) and in that case "==" may also be used to test for string equality. Because "test" has separate operators for numeric and string equality, it seems particularly confusing to make "==" an alias for the string equality operator "=". In other languages, including C and Perl which also have separate numeric and string equality operators, "==" is used for numeric value comparisons. So if someone familiar with these languages sees the "==" they are more likely to incorrectly interpret it as a numeric comparison. For someone writing a shell script, it would seem to be better if their implementation did not support the "==" extension and instead produced an error. Besides the minor benefit of helping them to write more portable shell scripts (even if the standard is changed, "=" is portable even to older implementations), it would also help those familiar with "==" from other languages to avoid a programming error. Since the standard string comparison operator is "=", someone using "==" may do so because they are familiar with C/Perl and intended to write a numeric comparison (-eq), or because they are familiar with a language that uses "==" for both numeric and string comparisons and did not think about what kind of comparison they actually wanted, so they may end up unintentionally using the wrong comparison without realizing it.
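
markh's distinction between string and numeric equality is easiest to see with values that are numerically equal but spelled differently. The variables x and y here are illustrative only.

```sh
x=01
y=1

# "=" compares strings: "01" and "1" are different strings.
if [ "$x" = "$y" ]; then echo "string: equal"; else echo "string: not equal"; fi

# "-eq" compares decimal integers: both operands are 1.
if [ "$x" -eq "$y" ]; then echo "numeric: equal"; else echo "numeric: not equal"; fi
```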

wpollock 2011-04-26 09:57 reporter bugnote:0000760	I disagree with those who think adding == will make shell scripts less clear and more error prone. Nearly every popular language uses == for equality testing: C, C++, C#, PHP, Java, JavaScript, Ruby, Python, ... , and many modern shells. It is true that many languages support several equality tests, but there is no good reason to break shell by redefining == and -eq to match e.g. perl. What I wonder is, should expr allow == as well, for consistency? Note awk already uses ==.

jsonn 2011-04-26 10:36 reporter bugnote:0000761	I am not arguing that adding == makes shell scripts less clear. I am arguing that it is completely redundant and therefore without merit. Please sit down for a moment and go back to the start. Imagine for a moment that GNU bash didn't silently provide this "feature" even in /bin/sh mode. The basic result would be that no one would use it and we would not be having this discussion about adding a redundant operator. Note that I don't have a problem with the other operators. I would suggest being consistent and having -nte etc. too. I am not sure about the mnemonic behind -ef, and "<" and ">" beg for a string name, but all those operators add value.

ajosey 2011-09-22 08:59 manager bugnote:0000967	This defect report was discussed at the September 8 teleconference meeting. It was agreed that the best way to progress this would be for the submitter to produce a whitepaper expanding the proposal (similar to proposals made in the past, for example the LFS proposal, http://www.unix.org/version2/whatsnew/lfs20mar.html). This could then be widely circulated amongst all interested parties to look for consensus. The standard developers recommend that the white paper should pay particular attention to note 670.

dwheeler 2011-09-25 00:36 reporter bugnote:0000974	That sounds very reasonable. I will create a first draft and post its URL here for discussion.

gber 2011-09-25 18:51 reporter bugnote:0000975	It should be noted that there are widespread implementations of test -nt/-ot with different and incompatible semantics in FreeBSD/NetBSD/OpenBSD and dash. These test implementations all trace their roots to the test builtin of pdksh before version 5.2.14; the difference from the behavior described above is that test returns failure if the second file does not exist. The test implementations with this behavior have been used by NetBSD since 1994 and by FreeBSD since 1999, and this behavior seems to have been used by dash since the first Linux port of ash in 1993.
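
For shells that lack -nt, or that have the BSD/dash variant gber describes, the bash/ksh meaning can be approximated with a find -newer workaround. This is a sketch: the function name newer_than is hypothetical, and it assumes FILE1 is not a name that find would parse as part of its expression (for example one starting with "-"; prefix such names with "./").

```sh
# newer_than FILE1 FILE2 (hypothetical helper)
# Succeeds if FILE1 exists and either FILE2 does not exist or FILE1 was
# modified more recently than FILE2: the bash/ksh meaning of -nt.
# Assumes find will not parse FILE1 as an operator (prefix "./" if needed).
newer_than() {
    [ -e "$1" ] || return 1      # FILE1 missing: never newer
    [ -e "$2" ] || return 0      # FILE2 missing: FILE1 counts as newer
    # find prints FILE1 only when -newer holds; -prune stops it from
    # descending when FILE1 is a directory.
    [ -n "$(find "$1" -prune -newer "$2")" ]
}
```

To get the BSD/dash behavior instead, the second check would return 1 rather than 0.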

dwheeler 2011-11-15 17:26 reporter bugnote:0001011	Per comment 967 above, I've created a whitepaper: https://docs.google.com/document/d/1Gd9r0f0rmmUIZlBnO4NyhTz2q-_1PHHGVCHAglFbJoY/edit I tried to take all of the comments into account. In particular, I've given up adding < and > into test/[ itself; as is noted above, they're awkward to use (they require quoting) and implementations differ on whether or not they use the locale. Instead, I propose adding "[[" ... "]]", with < and > primaries that are easier to use and always locale-aware. I used the LFS proposal as a model; in particular, the rationale text is separate from the proposal itself. I also propose a specific semantic for -nt and -ot, and I give the specific rationale for it. Everything in the updated proposal is already implemented by a variety of implementations and already in use in shell scripts. There's some variation in -nt and -ot; I'd rather define a specific semantic that would require minor changes in a few implementations, but if that's a bridge too far, I also describe an alternative semantic that would require no changes to anyone who implements -nt and -ot (though at the cost of weakening the semantics somewhat). Comments are, of course, very welcome!
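
On the locale question, POSIX sh can already do a locale-aware lexicographic comparison without [[. When the two operands are not both integers, expr compares them as strings using the collating sequence of the current locale. This is a sketch with the hypothetical helper name str_lt. The shared "x" prefix is a workaround to force string comparison and to keep operands like "(" from being read as expr syntax.

```sh
# str_lt S1 S2 (hypothetical helper): succeeds if S1 collates before S2
# in the current locale (LC_COLLATE).
# expr prints 1 or 0 and exits 0 or 1 accordingly; the output is discarded.
# The shared "x" prefix forces a string comparison even for numeric-looking
# operands and stops values such as "(" being parsed as expr syntax.
str_lt() {
    expr "x$1" \< "x$2" >/dev/null
}
```

For example, with LC_ALL=C, str_lt 10 9 succeeds, because the comparison is lexicographic rather than numeric.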

ajosey 2011-11-15 19:30 manager bugnote:0001013	I have also added the proposal referenced in bugnote 1011 as an attached file to this bug in txt and pdf formats

dwheeler 2011-11-23 23:57 reporter bugnote:0001044	I have tried to respond to the comments on this proposal, including those by David Korn and Geoff Clare. The revised proposal for bug 375 is here: https://docs.google.com/document/d/1Gd9r0f0rmmUIZlBnO4NyhTz2q-_1PHHGVCHAglFbJoY/edit I have also attached the revised proposal in .pdf, .txt, and .odt formats to this bug report. Please let me know what needs fixing, and preferably how to fix it.

geoffclare 2011-11-28 15:50 manager bugnote:0001060 Last edited: 2011-11-28 16:10	I have looked at the new Nov 23 draft of the white paper and I spotted quite a lot of changes that are still needed to the proposed wording changes in the standard. Rather than describe them, I thought I would take the more direct route of downloading the odt file and making the changes myself. The changes are recorded in the new odt file; you can choose whether to see them or not in OpenOffice/LibreOffice (and I assume in other alternatives). I have also produced a clean pdf that does not show the changes, just the end result. I will attach them to the bug after posting this note. As with my emailed comments, I have only addressed the proposed changes to the standard, not the rationale, in the white paper. Also I have not attempted to fix the style issues.

dwheeler 2011-11-28 16:38 reporter bugnote:0001061	Regarding comment #1060: I like Geoff Clare's improved proposal, that's fantastic. Thanks for doing that.

Roger Marquis 2011-11-29 03:26 reporter bugnote:0001064 Last edited: 2011-11-29 03:27	The /bin/sh shell does not have these features because of the need, by systems administrators, for cross-platform, backwards and forwards compatibility. Since /bin/sh is the system (admin's) shell it is technically outside of the scope of the Austin Group to make changes. This scope has been violated in the past; however, that does not change the charter. Most importantly, this proposed change to /bin/sh would degrade compatibility more than all previous systems-administration-related changes combined. There certainly is a need for these operators in non-system shells and interpreted languages such as Perl, Ruby and Python but that does not mean there is a need for these operators in /bin/sh or that the damage to cross-platform compatibility would be offset by their addition. Indeed, people use /bin/sh instead of other shells or interpreters specifically because its syntax does not change, will not break across implementations, and does not need to be upgraded like non-system packages and applications. The proposed changes would not improve compatibility between operating systems over time, would not increase compatibility across OS distributions, they would not improve compatibility between operating systems in any way. For this reason and the other reasons spelled out above this change to /bin/sh is A) a bad idea and B) outside of the Austin Group's jurisdiction. Perhaps this would be a good time to fork the POSIX shell definition to specify ksh and bash as unique, changeable and distinct from /bin/sh?

dwheeler 2011-11-29 04:51 reporter bugnote:0001065	Regarding comment #1064: I disagree. 1. POSIX.1-2008 is to enable "portability of application programs" (page iv) - and the shell interface sh is DEFINITELY used by application programs. For example, users certainly write shell scripts; make by default executes commands using the shell (modulo optimizations); and functions system() and popen(), which are used by user applications, also depend on the shell. Therefore, the definition of sh is clearly within the POSIX specification's scope. Yes, system administrators often use sh, but they also use ls, cp, mv, and many other commands that users and application programs use. The fact that system administrators and application programs share certain interfaces does not make these commands out-of-scope for POSIX. My understanding from page iv is that facilities ONLY for system administrators are excluded from POSIX... but clearly sh is not exclusively used by system administrators. Besides, POSIX.1-2008's XCU chapter 2 defines the shell interface; it "describes the syntax of that command language as it is used by the sh utility and the system( ) and popen( ) functions defined in the System Interfaces volume of POSIX.1-2008". I'm not creating a new scope, merely trying to improve what's already there. If the Austin Group has a radically different scope than POSIX.1-2008 page iv ("Purpose"), please let me know. 2. Many shell implementations, including whatever /bin/sh is, implement [[ and ]]. POSIX page 2301 lines 72489-72491 specifically reserves [[ ... ]] because of their widespread use. Thus, anyone using [[...]] is already NOT limiting themselves to the POSIX specification, and the POSIX specification already warned people about this. The primaries are likewise not in the POSIX specification, so no POSIX-compliant shell script uses them anyway. 3. You could argue that in some environments it's too much to ask /bin/sh to implement something. 
But that's hard to justify in the case of "==", "-nt", and "-ot" at least; even busybox's sh implements these (busybox is widely used in embedded systems). You could make a better case for that with "[[", I suppose. But that doesn't seem to be the argument you're making anyway. 4. The proposals add to the specification some capabilities that are already widely used and depended upon. They would certainly improve compatibility between operating systems and distributions over time, since people could then reliably depend on them on any POSIX-compliant system. Thanks.

dwheeler 2013-10-10 01:15 reporter bugnote:0001869	I now think I put too much into a single proposal, sorry about that. I've separated the proposal about "==" into a separate bug report: http://austingroupbugs.net/view.php?id=762 My hope is that by separating that out, it will be easier to discuss each of these proposals.

dwheeler 2013-10-12 04:08 reporter bugnote:0001881	Note that http://www.opengroup.org/austin/docs/austin_630.txt discusses this (including the now-separate 762).

geoffclare 2013-10-24 15:23 manager bugnote:0001944	I have uploaded a new version of the white paper as Extendingshellconditionals-2013-10-24.{odt,pdf}. As before, the odt file records the changes since the previous version and the pdf is a clean version. This version addresses some new comments made by David Korn to the core team mailing list and fixes some other problems I spotted while working on them. I have also had a go at the context-dependent recognition rules for the new DB_UNARY and DB_BINARY tokens. As previously, I have only worked on the proposed changes to the standard, not on the rationale part of the white paper.

geoffclare 2013-10-25 10:26 manager bugnote:0001945	Just noticed something I missed in the update to the white paper. It gives the format of the "[[" command as: [[ conditional-expression ]] but it should be: [[ conditional-expression ]][io-redirect ...]

ranjit 2013-11-01 15:36 reporter bugnote:0001950	I'd just like to get my opposition to == on the record for this bug, in line with the objections from markh and jsonn, and as outlined in http://austingroupbugs.net/view.php?id=762#c1923. Note that this is not just about redundancy, nor how easy it is to implement, but simply about language design and consistency. AFAIK all [[ implementations already implement == as a synonym for =, and if POSIX merely specifies =, all implementations will conform and be free to add == if they choose. Thus = would work exactly as it does in bash [[, with fnmatch semantics, and quoting used to suppress special chars like '*'. And one would use = in both [ and [[; '=' is also the style recommended by #bash, as I know from many years of learning from that channel and seeing which form greycat and others _always_ use. greycat is the main op, and responsible for the wooledge.org[1] site which includes the BashFAQ, BashPitfalls, and BashGuide, which are in essence the best resources for learning BASH available. My point simply is that professional BASH scripters do not use '==', and stick to '=' for the reasons given: in a nutshell, to remind ourselves that we are writing shell, not some other language we may also use to implement. "Muscle memory" from other languages is a bad idea, as this is not another language, and the discussion in that note simply underlines exactly why sh was first _designed_ to use = for string comparison. There is no visual ambiguity as jsonn discussed, so that is a spurious argument as well, and should not be given any credence. It certainly does not belong in the Rationale, as it implies POSIX does not understand sh. Given: a=$foo vs: [[ $a = "$foo" ]] -- is anyone seriously claiming that those can be confused? The proposed change for DB_BINARY from "one" to "two" chars is also not in the pdf; I'd ask that it be included as well, again for consistency and clarity in language design, and simplicity of implementation and use.
[1] http://mywiki.wooledge.org/

dwheeler 2013-11-01 19:09 reporter bugnote:0001953	I'd like to record my disagreement about "==", in response to #1950. Yes, people will be free to add "==" to test if they wish. In fact, it'd be impractical to prevent it, since a vast number of shells support it (bash, pdksh, mksh, GNU coreutils test, OpenBSD /bin/sh, FreeBSD /bin/sh, busybox ash). The problem is that I am constantly trying to change "==" to "=" to work with dash, which is the only shell that I normally encounter that does not already support "==" as equality in test. This is a losing game, and absurd. It would be far more helpful to accept "==" as a synonym for "=" in sh. The "professional scripters" for bash I see use "==", not "="; clearly there are sampling differences. I agree that [[...]] should be added to the spec, but I don't see an adequate justification for making test/[ obsolescent.

shware_systems 2013-11-01 19:43 reporter bugnote:0001954	It was brought up during the 31-Oct phone call that ksh has 3 forms of equality test, namely = or == WORD for glob matching, = or == "TEXT" for straight equality, and =~ WORD or "EXPR" for ERE matching. test and [ have = WORD or "TEXT" for straight equality, with == as a synonym in most cases. What is missing is BREs as the argument, so I was thinking [[ could un-obsolete = and use = WORD as straight equality like with test, and = "EXPR" as a BRE match. This gives = and == separate, useful functionality. They're not synonyms, in other words. Whether ksh changes to use this would be up to dgk. Some ksh scripts that still use = in [[ may need editing to use == only, to work properly in sh, but if ksh doesn't change it can still execute those without edits. Those people who want to convert scripts from test to [[ are editing them anyway. I think backward incompatibility isn't really a factor, from the application side. As this is new for sh as a requirement there's no real issue from the implementation side either. I'm not taking sides on whether = or == should be used, or whether test should be deprecated. I think this is something that can be lived with and fills the hole of no BREs at all in shell.
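
For reference, POSIX sh already offers anchored BRE matching through the expr utility's ":" operator, though this sits outside the shell grammar itself. This is a sketch with the hypothetical name bre_match. It assumes the BRE contains no \( \) subexpression: with one, expr prints the captured text instead of a match length, and an empty capture would read as failure.

```sh
# bre_match STRING BRE (hypothetical helper): succeeds if BRE matches a
# prefix of STRING. expr's ":" operator anchors the BRE at the start, so
# add a trailing "$" to require a whole-string match.
# The shared "x" prefix keeps STRING from being parsed as an expr operator;
# it also means a leading "^" in BRE would be taken literally, so omit it.
bre_match() {
    expr "x$1" : "x$2" >/dev/null
}
```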

dwheeler 2013-11-10 14:46 reporter bugnote:0001979	I am the original proposer for adding [[...]], and I still think it is a good idea to add it. I'm glad it's being seriously discussed. However, I am opposed to making "test" and "[ expr ]" obsolescent as mentioned in http://austingroupbugs.net/view.php?id=762#c1948. First of all, the test/[ construct is in WIDESPREAD use. Normally marking something as obsolescent is a warning that it is likely to be removed in a future version. Yet this isn't true in this case. Comment 1948 admits that test/[ will "never be removed from the standard". And I agree that it should not be removed in the future; there are probably millions of lines of shell scripts that depend on this functionality. If it will never be removed, it is erroneous to mark it as obsolescent. Second, there are use cases where [[...]] is a bad idea. The obvious example is the widely-used autoconf tool; obsoleting "test" would mean that obvious comparisons would look like SOMETHING([[[[x == y]]]], ...). I do agree test's "-a" and "-o" are problematic. If you want to mark something as obsolescent, mark those as obsolescent, and recommend the use of "&&" and "||" respectively when using sh. But don't throw out the baby with the bathwater.
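
The rewrite recommended here for -a and -o can be sketched as follows, using the hypothetical names both_set and either_set. Each [ ] receives exactly two arguments, a case POSIX specifies fully. A combined [ -n "$1" -a -n "$2" ] passes five arguments, and for that many arguments the results are unspecified except on XSI-conformant systems.

```sh
# Hypothetical helpers showing the && / || style instead of -a / -o.
# Obsolescent form, 5 arguments to a single test:
#   [ -n "$1" -a -n "$2" ]
both_set() {
    [ -n "$1" ] && [ -n "$2" ]
}

# Obsolescent form: [ -n "$1" -o -n "$2" ]
either_set() {
    [ -n "$1" ] || [ -n "$2" ]
}
```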

dwheeler 2013-11-22 22:13 reporter bugnote:0002017	This was discussed in the 21 November 2013 Teleconference; notes are given here: http://www.opengroup.org/austin/docs/austin_635.txt In that discussion there was a preference to only add == to [ and test, and add -nt -ot etc. to [[ only. I will modify the current proposal based on that discussion.

dwheeler 2013-11-29 19:11 reporter bugnote:0002031	I've updated and posted "Extending shell conditionals" dated 2013-11-29. This responds to the decisions of 2013-11-21. The proposal adds "==" to test/[ (and nothing else), and adds "[[...]]" which in turn contains -nt, -ot, -ef, <, >, and pattern comparison operators (among others). Since Geoff Clare and David Korn have made so many proposals, I've claimed all three of us as proposal authors. Please let me know if there's an objection; I'm just trying to give credit where credit is due. This updated proposal adds various capabilities to POSIX that are already implemented (by multiple implementations) and are widely used. It'll be a big improvement to have them in the POSIX specification itself. As far as I know, all proposal issues have been resolved, but please let me know if there are additional issues or fixes that need to be made.

dwheeler 2013-11-29 19:24 reporter bugnote:0002032	Regarding comment #0001954: No, "=" should NOT mean BRE matching inside [[...]]. This would directly conflict with the MirBSD Korn shell, where its "=" means the same as the proposed "==". Indeed, the MirBSD Korn shell version I have on Cygwin doesn't even support "==" right now, never mind treating "=" as obsolete. And while "=" may be noted as "obsolete" in the Korn shell, the change would still be a silent one for implementations. That is best avoided. To be honest, I think supporting ERE matching is enough. That's usually what you want for regexes. But if it's REALLY important to support BRE matching (vs. ERE matching), I think a completely different operator syntax should be used, say "=~b" or some such.
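
Unanchored ERE matching in the spirit of =~ can be approximated in POSIX sh with grep -E. This is a sketch with the hypothetical name ere_match. Because grep works line by line, a string containing a <newline> succeeds if any one of its lines matches, which differs from =~.

```sh
# ere_match STRING ERE (hypothetical helper): succeeds if the extended
# regular expression matches anywhere in STRING, unless the ERE itself
# uses ^ or $ anchors. "--" stops an ERE starting with "-" being read
# as an option; printf avoids echo's backslash and option quirks.
ere_match() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}
```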

jilles 2013-12-01 22:52 reporter bugnote:0002033	Regarding [[ -v word ]], this is redundant with -n "${word+set}" if word is a literal. If not, you still need eval (or non-standard namerefs) to actually access the value, so why not do that for the test as well. It seems to make more sense to add [[ -v word ]] in a separate proposal, together with other features for specifying variable names indirectly. Regarding the new grammar rule, I think it makes sense to recognize ]] everywhere, so that things like [[ a == ]] && ... or [[ -n ]] ... can be rejected at the presumably erroneous ]]. Some of the rationale for 'test' (line 109222-109225 in posix.1-2008+tc1) discussing why [[ was not added should be removed if this is accepted.
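
jilles' equivalence can be sketched in POSIX sh as follows. The helper name is_set is hypothetical. It uses eval for an indirect name, as suggested above, and first rejects anything that is not a portable variable name so that eval cannot be made to run arbitrary text. Positional and special parameters are deliberately not accepted.

```sh
# is_set NAME (hypothetical helper), roughly [[ -v NAME ]] for ordinary
# variable names: succeeds if NAME is set, even to the empty string.
is_set() {
    # Only accept a portable variable name, so eval cannot run arbitrary
    # text; return 2 for anything else.
    case $1 in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 2 ;;
    esac
    # ${NAME+set} expands to "set" if NAME is set, else to nothing.
    eval "[ -n \"\${$1+set}\" ]"
}
```

With v unset, is_set v fails. After v= it succeeds. Given name=v, is_set "$name" is the indirect form.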

shware_systems 2013-12-02 08:59 reporter bugnote:0002034	Re: 0002032 Yes, you're more right than not... BREs would be nice but aren't that necessary. I overlooked a couple of things that would make adding them overly ambiguous as presented, and even as a separate operator it would mean possible grammar changes, so it is better left as a possible extension. One point, though, is that the proposed '==' text has "Characters that are quoted in pattern (e.g., inside double-quotes) shall not be treated as special.", which is not explicit about applying just to pattern special characters per XCU 2.13 or all shell special characters per XCU 2.2 and 2.13. The latter would have double quotes acting more like single quotes, which supports the straight compare "TEXT" form inside '[[ ]]' as the intent I heard, but changes it from how "..." is handled for test.

shware_systems 2013-12-02 11:15 reporter bugnote:0002035	Re: 0002033 "Regarding [[ -v word ]], this is redundant with -n "${word+set}" if word is a literal." While redundant if it doesn't exclude special parameter characters allowed by ${parameter+word} form, or ignores the set +u/-u option, it adds clarity of intent and is probably faster to process. If it is limited to WORDs that could be assignment variable names or ignores -u that should probably be mentioned, perhaps as an Application Usage or Rationale entry. If it isn't common to all current '[[' extensions, or has differing behavior amongst them, I agree it should be a separate proposal.

dwheeler 2013-12-08 22:06 reporter bugnote:0002052	Re: 0002033 and 0002035 Regarding "-v": The "-v" option is in ksh93: http://www2.research.att.com/sw/download/man/man1/ksh.html The "-v" option is also in the latest version of bash: https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html#Bash-Conditional-Expressions Note that this is a recent addition; bash version 4.0.23(1) does NOT have it. The MirBSD ksh does NOT have "-v". However, there's no conflicting interpretation of "-v", so it could be added without trouble: https://www.mirbsd.org/htman/i386/man1/mksh.htm I tried out the ksh93 version, and there is NO limitation on word; "-v VAR" checks for literal VAR, while "-v $VAR" as expected retrieves the content of VAR, allowing indirect checking if a given value exists. I could go either way on this one. Yes, it's a little redundant. However, the fact that it's implemented by multiple implementations makes me think we may as well include it in the [[ ... ]] proposal. For the moment I plan to continue to keep it, but I have no strong feelings about it. Comments?

dwheeler 2013-12-08 22:48 reporter bugnote:0002053	Re: 0002034 "One point, though, is that the proposed '==' text has "Characters that are quoted in pattern (e.g., inside double-quotes) shall not be treated as special.", which is not explicit about applying just to pattern special characters per XCU 2.13 or all shell special characters per XCU 2.2 and 2.13." You're right, that needs to be much clearer. Here's my try below, suggestions welcome: ***** In addition, the following conditional operators shall be supported inside a conditional command: string == pattern true if the entire string matches the pattern as described in “Pattern Matching Notation” (section 2.13); otherwise false. Characters have a special meaning in pattern matching notation (“?”, “*”, and “[”) that are quoted in pattern (using <backslash>, single-quote, or double-quote) shall not have a special meaning, and shall instead match themselves. string =~ regex true if string matches the extended regular expression (ERE) regex as defined in section 9.4; otherwise false. Characters that are quoted in regex (using <backslash>, single-quote, or double-quote) shall not be treated as an ERE special character, but shall instead be treated as an ERE ordinary character. *****
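
POSIX sh has no [[ ... ]], but case already uses Pattern Matching Notation (section 2.13). That makes it a convenient way to illustrate the quoting rule in the proposed "string == pattern" text: unquoted *, ? and [ are special, while quoted ones match themselves. The function names below are hypothetical.

```sh
# glob_match STRING PATTERN: the expansion of PATTERN is unquoted, so
# *, ? and [ in it keep their pattern-matching meaning.
glob_match() {
    case $1 in
        $2) return 0 ;;
    esac
    return 1
}

# literal_match STRING TEXT: the expansion is quoted, so every character
# of TEXT, including *, ? and [, matches only itself.
literal_match() {
    case $1 in
        "$2") return 0 ;;
    esac
    return 1
}

# has_star STRING: quoting can also apply to part of a pattern; the outer
# * are special while the quoted '*' matches a literal asterisk.
has_star() {
    case $1 in
        *'*'*) return 0 ;;
    esac
    return 1
}
```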

dwheeler 2013-12-08 23:05 reporter bugnote:0002054	Re: 0002033 "Regarding the new grammar rule, I think it makes sense to recognize ]] everywhere, so that things like [[ a == ]] && ... or [[ -n ]] ... can be rejected at the presumably erroneous ]]." I think the grammar rule already does this. A "]]" is going to be considered "Rdbracket", so [[ a == ]] ... would fail to meet the syntax requirements. We don't really want it recognized "everywhere", because that would interfere with backwards compatibility. For example, this should print [[ ]]: echo [[ ]] I note that both bash and mksh work this way. We could make that clearer by adding this: "Once a conditional command is begun with the command “[[”, an unquoted “]]” word shall terminate it."

dwheeler 2013-12-09 01:35 reporter bugnote:0002055	I've posted files that update the proposal per previous comments. I left "-v" in (unchanged), but I did try to clarify that quoting only (by itself) disables the special characters for pattern-matching. Please let me know if there are any remaining issues.

shware_systems 2013-12-09 06:55 reporter bugnote:0002056	Re: 0002052 "I tried out the ksh93 version, and there is NO limitation on word; "-v VAR" checks for literal VAR, while "-v $VAR" as expected retrieves the content of VAR, allowing indirect checking if a given value exists." By "special parameter character" in 0002035 I meant those of XSH 2.5.2, which '${' provides the syntax hook prefix for, so "${*+set}" evaluates to 'set'. The possible differences would be in how does -v currently handle those, as in "-v $", "-v $!" or "-v $$", or "VAR = '$'; [[ -v $VAR ]];". Re: 0002053 "In addition, the following conditional operators shall be supported inside a conditional command: string == pattern true if the entire string matches the pattern as described in “Pattern Matching Notation” (section 2.13); otherwise false. Characters have a special meaning in pattern matching notation (“?”, “*”, and “[”) that are quoted in pattern (using <backslash>, single-quote, or double-quote) shall not have a special meaning, and shall instead match themselves." Suggest: "In addition, the following conditional operators shall be supported inside a conditional command: string == pattern true if the entire string matches the pattern as described in “Pattern Matching Notation” (section 2.13); otherwise false. Characters with a special meaning in pattern matching notation (“?”, “*”, and “[”) that are quoted in the pattern (using <backslash>, or within single- or double-quote pairs) shall not have a special meaning, and shall instead match themselves." Putting it that way avoids anyone thinking they should use "?"*"[ instead of "?*[". The same applies to the '=~' text. Additionally, a WORD fully in double-quotes already inhibits field splitting and pathname expansion, so having that as a separate restriction doesn't appear warranted. A caveat that not using quotes may lead to mysterious behavior depending on the expansions used does still apply.
============================================= Also, the issue about limiting additional named unary operators to single characters in Note 10 is still up in the air, though it was discussed on the mailing list more than here: "When the TOKEN is exactly “-b”, “-c”, “-d”, “-e”, “-f”, “-g”, “-h”, “-L”, “-n”, “-p”, “-r”, “-S”, “-s”, “-t”, “-u”, “-w”, “-x”, or “-z”, and the preceding token was neither DB_UNARY nor DB_BINARY, the token identifier DB_UNARY shall result. Implementations may recognize additional DB_UNARY tokens consisting of a <hyphen> and an alphabetic character from the portable character set. When the TOKEN is exactly “==”, “!=”, “=~”, “<”, “>”, “-ef”, “-eq”, “-ne”, “-gt”, “-ge”, “-lt”, “-le”, “-nt”, or “-ot”, and the preceding token was neither DB_UNARY nor DB_BINARY, the token identifier DB_BINARY shall result. Implementations may recognize additional DB_BINARY tokens consisting of one or more punctuation characters from the portable character set or consisting of a <hyphen> and one or more alphabetic characters from the portable character set." Both "-v" and a bare "=" need to be included now, and I still think "-_" should be used as a prefix to reserve a namespace for unary tokens where more than one alphabetic character is desired, or needed for a 53rd operator. Suggest: "When the TOKEN is exactly “-b”, “-c”, “-d”, “-e”, “-f”, “-g”, “-h”, “-L”, “-n”, “-p”, “-r”, “-S”, “-s”, “-t”, “-u”, “-v”, “-w”, “-x”, or “-z”, and the preceding token was neither DB_UNARY nor DB_BINARY, the token identifier DB_UNARY shall result. Implementations may recognize additional DB_UNARY tokens consisting of a <hyphen> and an alphabetic character from the portable character set, or <hyphen><underscore> and one or more alphabetic characters from the portable character set. When the TOKEN is exactly “=”, “==”, “!=”, “=~”, “<”, “>”, “-ef”, “-eq”, “-ne”, “-gt”, “-ge”, “-lt”, “-le”, “-nt”, or “-ot”, and the preceding token was neither DB_UNARY nor DB_BINARY, the token identifier DB_BINARY shall result.
Implementations may recognize additional DB_BINARY tokens consisting of one or more punctuation characters from the portable character set, excluding '-' and '-_', or consisting of a <hyphen> and two or more alphabetic characters from the portable character set." I've also a concern for the integer comparison operators, whether it should be explicit as to numerical range that shall be supported, and 'out of range' being a new reserved error code that sh may return. This was not a potential issue with test being a separate utility, as it glossed it over as one of the '>0' return values, but is for '[[', I think. Additionally, as ( e ) already marked obsolete in test, I don't think it should be included in '[[' with the same semantic. I can see it for its usual precedence override usage, as indicated by the grammar additions, but the text doesn't reflect that. ============================================= Lastly, do any use -m integer and -o integer for negation (-minusing) and ones complementing? All I see now is ! integer for logical negation. They'd be aliases of $((-integer)) and $((~integer)), I realize, but less typing like with -v.
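
To make the lexical rule under discussion concrete, here is a simplified sketch of how a tokenizer might classify one word inside [[ ... ]] using shware_systems' suggested lists (which add "-v" and "="). The function classify_token and its PREV argument are hypothetical. The sketch ignores everything else the grammar must recognize ("]]", "!", "&&", "||", parentheses) as well as implementation-defined extra operators.

```sh
# classify_token TOKEN PREV (hypothetical): print the token identifier the
# suggested rule would assign to TOKEN, where PREV is the identifier of the
# preceding token. Simplified: "]]", "!", "&&", "||", parentheses and
# implementation-defined extra operators are not handled.
classify_token() {
    # Right after a DB_UNARY or DB_BINARY the token is an operand, so even
    # "-f" or "==" is just an ordinary word there.
    case $2 in
        DB_UNARY|DB_BINARY) echo WORD; return ;;
    esac
    case $1 in
        -[bcdefghLnprSstuvwxz]) echo DB_UNARY ;;
        =|==|!=|=~|'<'|'>'|-ef|-eq|-ne|-gt|-ge|-lt|-le|-nt|-ot) echo DB_BINARY ;;
        *) echo WORD ;;
    esac
}
```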

dwheeler 2013-12-10 04:46 reporter bugnote:0002059	Re: 0002056 "By "special parameter character" in 0002035 I meant those of XSH 2.5.2, which '${' provides the syntax hook prefix for, so "${*+set}" evaluates to 'set'. The possible differences would be in how does -v currently handle those, as in "-v $", "-v $!" or "-v $$", or "VAR = '$'; [[ -v $VAR ]];"." These examples don't make sense to me: 1. First of all, in both ksh93 and bash, running "echo ${*+set}" produces a blank line, not "set". And that makes sense to me, too. After all, the "*" variable has not been set by an assignment (it's a special parameter). I don't see any text about whether or not special parameters are considered "set", but if so, that text probably belongs earlier in the ${...+...} text. 2. Also, I think you're missing the semantics of "-v"... perhaps they are not clearly stated? It would be unusual for the parameter after "-v" to begin with "$", and many of the examples listed above seem pointless. "[[ -v $$ ]]" would retrieve the current PID, and then see if a variable by that numeric name is set. The parameter after -v could begin with "$", to do indirect access, so it should be permitted... but typically users will want to do stuff like "[[ -v flag ]]". I like your rewording of the condition text to make it clearer... I'll change it to match. Thanks! I don't know how many implementations would use -_X for more unary operators, but reserving a namespace seems like a reasonable idea. Also, thanks for the list fixes. "I've also a concern for the integer comparison operators, whether it should be explicit as to numerical range that shall be supported, and 'out of range' being a new reserved error code that sh may return. This was not a potential issue with test being a separate utility, as it glossed it over as one of the '>0' return values, but is for '[[', I think." I think we should appeal to the "Arithmetic Expansion" section, which already specifies a range (at least signed integer).
Here's what I propose: "Minimum requirements for numeric processing in conditional expressions are defined in section 2.6.4 (“Arithmetic Expansion”)." The arithmetic expansion text already admits that arithmetic can fail, and just notes that there would be an error. Do we really need more detail? It's not clear that we could get more agreement than that. "Additionally, as ( e ) already marked obsolete in test, I don't think it should be included in '[[' with the same semantic. I can see it for its usual precedence override usage, as indicated by the grammar additions, but the text doesn't reflect that." The whole reason the "( e )" rule is given is specifically to override precedence. I think it already does that, however, it sounds like that's not clear enough. So I'll append to the text about "( e )" the following: "This is used to override other precedence rules." "Lastly, do any use -m integer and -o integer for negation (-minusing) and ones complementing? All I see now is ! integer for logical negation. They'd be aliases of $((-integer)) and $((~integer)), I realize, but less typing like with -v. " I don't know of any shell with "-m integer" or "-o integer" built-in. But as you mentioned, the $((...)) expansion already includes - negation, ~ bitwise complement, and lots of other goodies, and you can include $((...)) inside [[...]]. In addition, ksh93 uses "-o" for other functionality (e.g., to determine if a given option is on).
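The point that $((...)) already provides negation and complement can be shown concretely. A minimal POSIX sh sketch, assuming two's-complement arithmetic (so that ~5 is -6); the variable names are illustrative only, and no shell is claimed to implement "-m" or "-o" this way:

```sh
#!/bin/sh
# Unary minus and bitwise complement, the operations a hypothetical
# "-m integer" / "-o integer" pair would provide, via arithmetic expansion.
n=5
neg=$((-n))     # arithmetic negation
comp=$((~n))    # bitwise complement; -6 on two's-complement systems
printf '%s %s\n' "$neg" "$comp"

# Either result can be handed directly to an ordinary numeric test.
if [ "$((-n))" -lt 0 ]; then
    echo "negated value is negative"
fi
```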

shware_systems 2013-12-10 09:41 reporter bugnote:0002060	Yes, for general usage the special parameters don't make a lot of sense, and the examples could be better, but they do expand to WORDs. As parameters, special or otherwise, can mostly be used where variable names are used, maybe it's better to ask if WORD is expected to be treated as the parameter of ${parameter} or just as a NAME like the non-switch arguments to unset. That concern is a special case of that general question. I could see someone wanting to do something like: params=1; while ! [[ -v $params ]] { . . . ;params=$(($params+1));} to walk an argument list with over 9 arguments in a script, where some might be explicitly "" when IFS includes a comma. "I don't know how many implementations would use -_X for more unary operators," Yes, it looks a bit silly but is in keeping with leading underscore being reserved identifiers for variables in C. Here '-_' would be an operator more than '-' '_op' being the semantic. I'm open to just '_' as a leading char being used also, if var names are limited to starting with portable alpha, but that could be visually confusing, e.g. '_v' and '-v', and easier to typo. "I think we should appeal to the "Arithmetic Expansion" section, which already specifies a range (at least signed integer)." That limits it to signed long as integer, which would be mostly compatible, yes. I'm thinking for scripts dealing with testing file offsets or sizes, they may need long long. It somewhat begs the question should that be updated also, or be a set option to toggle long or long long arithmetic as range, as a separate issue. "The whole reason the "( e )" rule is given is specifically to override precedence. I think it already does that, however, it sounds like that's not clear enough. So I'll append to the text about "( e )" the following: "This is used to override other precedence rules." " That text being: "( e ) true if expression e is true, otherwise false." 
It has the additional behavior of an implied "-eq 0" above the override function, as stated, converting all grouping to logical when sub-expressions may need to be string or integer still. While with the current operators that's the net effect anyways, I'm trying to leave open that the text doesn't require all implementation defined operators have a logical result type. The -m and -o are more examples of that than I expected they are implemented yet, but it'd be nice, from my perspective.
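The argument walk shware_systems has in mind can already be written in POSIX sh without [[ -v ]]: bound the loop by $# so that explicitly empty arguments are still visited, and use eval so the index can live in a variable (${10} with braces is valid POSIX, whereas a bare $10 expands $1 followed by a literal 0). A minimal sketch; the function name walk_args is hypothetical:

```sh
#!/bin/sh
# Visit positional parameters 1..$# by index. The loop is bounded by $#, not by
# whether a value is non-empty, so explicitly empty arguments are still seen.
walk_args() {
    i=1
    while [ "$i" -le "$#" ]; do
        eval "arg=\${$i}"          # e.g. arg=${10} once i reaches 10
        printf '%s:[%s]\n' "$i" "$arg"
        i=$((i + 1))
    done
}

walk_args a b c d e f g h i "" k
```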

dwheeler 2013-12-10 16:08 reporter bugnote:0002061	Re: 0002060 "Yes, for general usage the special parameters don't make a lot of sense, and the examples could be better, but they do expand to WORDs. As parameters, special or otherwise, can mostly be used where variable names are used, maybe it's better to ask if WORD is expected to be treated as the parameter of ${parameter} or just as a NAME like the non-switch arguments to unset. That concern is a special case of that general question. I could see someone wanting to do something like: params=1; while ! [[ -v $params ]] { . . . ;params=$(($params+1));} to walk an argument list with over 9 arguments in a script, where some might be explicitly "" when IFS includes a comma." That makes more sense. I tried this out with ksh when passing one parameter. ${1+set} returns "set" as expected, but "[[ -v 1 ]]" is false in the same circumstance. Ugh. I think people don't normally use special parameters with [[ -v .... ]], so nobody noticed. I think it's easily argued that this is a bug in ksh; it's hard to imagine anyone DEPENDING on this behavior. We could say that '[[ -v VAR ]]' returns the same truth result as '[ "${VAR+set}" = "set" ]'. I like that idea; that would leave them consistent, at least. But I think we're exposing that the spec is unclear about when special parameters are set. Should we require that [[ -v 1 ]] return true if $1 is set? Or should we just declare that the results are undefined for special parameters in either case? I said: "I think we should appeal to the "Arithmetic Expansion" section, which already specifies a range (at least signed integer)." Comment reply: "That limits it to signed long as integer, which would be mostly compatible, yes. I'm thinking for scripts dealing with testing file offsets or sizes, they may need long long. It somewhat begs the question should that be updated also, or be a set option to toggle long or long long arithmetic as range, as a separate issue." 
Well, technically it doesn't limit implementations to signed long, instead, that is the limit that applications can portably depend on. But ignoring that nit... I completely disagree with the premise. [[..]] would normally be implemented as a built-in in sh. As such, it doesn't make sense for [[..]] to mandate support for arithmetic calculations for data types larger than the type sh uses for arithmetic expressions. You can't compare data you can't process! If you think sh should handle larger values, I suggest arguing the case as a separate change to the "Arithmetic Expansion" section. Now I agree that sometimes you need to compare larger values. But you can use other tools, like bc, in such special circumstances. Regarding "( e )", "It has the additional behavior of an implied "-eq 0" above the override function, as stated, converting all grouping to logical when sub-expressions may need to be string or integer still. While with the current operators that's the net effect anyways, I'm trying to leave open that the text doesn't require all implementation defined operators have a logical result type. The -m and -o are more examples of that than I expected they are implemented yet, but it'd be nice, from my perspective." Oh, I see. As you note, with the proposed standard operators it doesn't make any difference anyway. But I don't mind making the text more flexible so that extensions could allow value returns; that seems to provide flexibility with no real downside. Okay, I think we should change that text to be more flexible, probably to something like "returns the value of e".
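The reference semantics suggested above, that '[[ -v VAR ]]' agree with '[ "${VAR+set}" = "set" ]', can be expressed as a small POSIX sh function. This is a sketch only: the name is_set is hypothetical, and it deliberately rejects anything that is not a plain variable name, since the "set" status of special and positional parameters is exactly the open question in this exchange:

```sh
#!/bin/sh
# Portable stand-in for [[ -v name ]], built on ${name+set}.
# Returns 0 if set (even to ""), 1 if unset, 2 if $1 is not a valid NAME.
is_set() {
    case $1 in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 2 ;;   # reject non-NAMEs before eval
    esac
    eval "[ \"\${$1+set}\" = set ]"
}

unset foo
is_set foo || echo "foo is unset"
foo=
is_set foo && echo "foo is set (to the empty string)"
```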

jilles 2013-12-10 22:36 reporter bugnote:0002063	Re: 0002054 "recognize ]] everywhere" was not clear enough. What I had meant to say was to add to the new grammar rule (that is, after "After line 74771 (2013 edition) add a new grammar rule" in the document), something like: When the TOKEN is exactly "]]", the token identifier Rdbracket shall result. Note that this leaves "!" as the only reserved word that is not recognized after a DB_UNARY or DB_BINARY token. Re: 0002059 I think the numerical range should match the range for "test". The "test" page does not specify it explicitly, so XCU 1.1.2.1 Arithmetic Precision and Operations applies, stating that the type is signed long following the rules of the C standard (therefore, overflow is undefined and may happen to be implemented as a larger type). Referring to arithmetic expressions may confuse things: the decimal numbers accepted by e.g. [ -eq are not a subset of arithmetic expressions. For example, "010" evaluated as a decimal number is ten, but "010" evaluated as an arithmetic expression is eight.
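jilles' example can be checked directly. A short sketch, assuming a shell whose test reads -eq operands as decimal and whose arithmetic expansion follows the ISO C constant rules (dash and bash are believed to behave this way; other shells may differ):

```sh
#!/bin/sh
# "010" as a test operand is read in decimal ...
if [ 010 -eq 10 ]; then
    echo 'test: [ 010 -eq 10 ] is true'
fi
# ... but as an arithmetic-expansion constant it follows ISO C rules.
echo "arithmetic: 010 -> $((010))"    # octal constant
echo "arithmetic: 0x10 -> $((0x10))"  # hexadecimal constant
```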

dwheeler 2013-12-11 00:46 reporter bugnote:0002064	Re: 2063 Comment: "recognize ]] everywhere" was not clear enough. What I had meant to say was to add to the new grammar rule (that is, after "After line 74771 (2013 edition) add a new grammar rule" in the document), something like: "When the TOKEN is exactly "]]", the token identifier Rdbracket shall result." Note that this leaves "!" as the only reserved word that is not recognized after a DB_UNARY or DB_BINARY token. Ah, now I understand. Okay, I'll make that change in an updated proposal. Comment says: "Referring to arithmetic expressions may confuse things: the decimal numbers accepted by e.g. [ -eq are not a subset of arithmetic expressions. For example, "010" evaluated as a decimal number is ten, but "010" evaluated as an arithmetic expression is eight." I've run some tests, and it turns out that this is a subtle difference between bash and ksh93/mksh (ugh). Basically, bash is more like test, while ksh93/mksh omits the capability to handle base 8 and base 16 prefixes. So... let's get into the details. The "Arithmetic Expansion" section says that, "Only the decimal-constant, octal-constant, and hexadecimal-constant constants specified in the ISO C standard, Section 6.4.4.1 are required to be recognized as constants." It turns out that bash implements this as well, so inside [[ ... ]], bash will convert 0xNUM and 0NUM from base 16 or 8 when asked to do a numeric comparison. In contrast, ksh93 and mksh do not. For example, this prints two "0" lines in bash:
    [[ 0x10 -gt 15 ]]
    echo $?
    [[ 010 -lt 9 ]]
    echo $?
In contrast, ksh93 prints "1" (false) both times, and will print "0" (true) for this:
    [[ 010 -eq 10 ]]
    echo $?
I think the bash behavior should be required in this case. These numeric operators (such as -eq and -lt) are specifically for numbers, so there's no ambiguity, and failing to support base 16 and base 8 is a gratuitous incompatibility and loss of functionality as compared to test. 
I think the goal is to encourage people to use [[...]] instead of [...] as much as possible; losing base 8 and base 16 is a strange incompatible omission. While this would require changing ksh93 and mksh, I think it highly unlikely that people are supplying "0x10" and assuming this does NOT mean hexadecimal. It's more plausible that someone means "010" to be the same as "10", but again I doubt it, and it's not hard to strip leading 0s from data if you need to. So for the moment I intend no change, but this is certainly worthy of discussion. Comments?
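Stripping leading zeros can indeed be done with parameter expansion alone. A sketch only: the helper name strip_zeros is hypothetical, it assumes the input is an optionally signed string of decimal digits (no validation is done), and it keeps a single 0 for an all-zero value:

```sh
#!/bin/sh
# Remove leading zeros so the value means the same thing to test, to
# arithmetic expansion and to [[ ]], whatever they do with a leading 0.
# Uses the global variables n and sign (POSIX sh has no "local").
strip_zeros() {
    n=$1
    sign=
    case $n in
        -*) sign=-; n=${n#-} ;;
    esac
    while :; do
        case $n in
            0?*) n=${n#0} ;;       # drop a 0 only if something follows it
            *)   break ;;
        esac
    done
    printf '%s%s\n' "$sign" "$n"
}

strip_zeros 010     # prints 10
strip_zeros 000     # prints 0
strip_zeros -007    # prints -7
echo $(( $(strip_zeros 010) + 1 ))   # 11, whereas $((010 + 1)) is 9
```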

shware_systems 2013-12-11 00:48 reporter bugnote:0002065	"Well, technically it doesn't limit implementations to signed long, instead, that is the limit that applications can portably depend on. But ignoring that nit..." No, it doesn't, as extensions. For conforming mode it does limit them, though, as overflows are expected both to produce diagnostic messages on variable contents atoi evaluation or conversion when a real is used to evaluate an expression internally, and, in all cases when untrapped, to raise a SIGFPE abort as the SIG_DFL action. That's how I interpret the last paragraph of XCU 2.6.4 and "Only signed long integer arithmetic". It may be portable, but it appears limiting nowadays on 64-bit processors. "I completely disagree with the premise. [[..]] would normally be implemented as a built-in in sh. As such, it doesn't make sense for [[..]] to mandate support for arithmetic calculations for data types larger than the type sh uses for arithmetic expressions. You can't compare data you can't process! If you think sh should handle larger values, I suggest arguing the case as a separate change to the "Arithmetic Expansion" section." I agreed with that, that it should be consistent, but it appears current practice includes it as the de facto preference, which is why I added 'mostly'. jilles' point about being consistent with test as an alternative is also worth discussion, IMO. As a separate grammar section is being used, it isn't necessarily limited to how AEs handle it. I'm not saying it has to be one way or another, just that enough thought is given so that whatever is decided is least likely to come back as portable but non-useful. "Now I agree that sometimes you need to compare larger values. But you can use other tools, like bc, in such special circumstances." That's doable, but tends to obfuscate the intent and is implicitly slower to execute. If bc had a -c switch like sh, so $(bc -c " . . . ") could be used for single expression evaluation, that would still be pretty clear even with the exec() penalty. A self-contained script needs a pipe or 'here doc' based construct for similar effect as things stand, which may have I/O-related side effects also complicating things.

dwheeler 2013-12-11 00:58 reporter bugnote:0002066	Re: 0002064 and 0002063. Whups, please allow me to correct myself. I just reviewed the spec for "test", and it's not obvious to me that test/[ requires support for 0xNUM or 0NUM. I thought it did, but while $((...)) does require support for conversion, there doesn't seem to be a requirement to do so in test/[. I did a few tests and found that at least some test/[ implementations do not implement base conversions with -eq for example. Oddly enough, bash's implementation of "test" does not support 0x10, but its implementation of [[...]] does support 0x10. So should base conversions in [[...]] be forbidden (since the test spec, ksh93, and mksh do not include it)? Required (as bash does)? Silently allowed (which is the easiest for the implementers, though not for the users)? Thoughts welcome.

shware_systems 2013-12-12 01:38 reporter bugnote:0002068 Last edited: 2013-12-12 06:04	Re: 0002066 Given mailing list discussion, it appears XCU 1.1.2.1 supersets and supersedes XBD 12.1, so both test and [[ ]] as a part of sh should be handling those bases as required, with [0[:alnum:]#] being a common extension for arbitrary base that I feel should be left as an extension at this point. Versions that handle only decimal, with or without multiple leading zeros, may be compliant with a pre SUSV6 version but not the current wording, it looks, so are buggy from this versions perspective and I'd hope for Issue 8 as well. The semantics of C insist the next char after a leading 0 shall be one of 0 to 7, 'x', 'X', 'u', 'U', 'l', 'L' for integers, or '.', 'e', 'E', 'p' and 'P' that switch it to real representation, so multiple leading zeros are only allowed for base 8, not base 10. While limiting it to signed long removes it being necessary to process suffixes, though it should be specific they'll be silently ignored as normative, the otherwise blanket language about conversions doesn't exclude properly determining base from prefix. Arithmetic expressions in [[ ]] fall under extensions permitted because binary operators may be unused combinations of punct chars in the proposed text, but those should be left as extensions also IMO.

geoffclare 2014-03-06 17:15 manager bugnote:0002176	Should [[...]] be required to support numeric values with magnitude up to the maximum file size supported by the implementation? This is required for test (see XCU 1.5), or at least it will be once the conflict with XBD 12.1 is fixed via 0000813.

ranjit 2014-03-06 19:13 reporter bugnote:0002177	In response to #1953: You have avoided the substantive point, which is that this is not about making commands easier and more consistent, nor about making POSIX sh some sort of superset of current implementations (so everyone's scripts "work"), but about the design of a language. Your justification for this addition is:
> The issue is that "=" is already used for shell variable assignment. Using
> the SAME spelling for both "is-equal-to" and "assignment" is confusing,
> and thus is something that MANY languages avoid.
...which completely ignores the point that shell is ALREADY a different language. You cannot disallow the use of [ bar = "$foo" ] (there are far too many portable sh scripters who have it in their muscle-memory already), so this "confusion" point is moot. In fact you _add_ confusion when there are two operators with the exact same meaning, as I have seen with newbs in #bash (another reason no-one in there uses ==). And if your sampling included anyone from most Linux distros, or who's learnt from TLDP ABS, then you likely haven't really spoken to very competent scripters. Linux distros, having completely failed to script anything very well, have in the main decided that means shell is bad. Hence systemd (+ javascript!). I'm in favour of -nt, etc, as said, and indeed [[ without field-splitting. Just not ==, and the reasons are nothing to do with convenience (although I feel it is more convenient with just =, based on experience of teaching others both bash and sh, and far too many hours working on scripts) and everything to do with language consistency. There is clear contextual introduction to strcmp in shell, and further it is impossible to confuse it with assignment (rendering the justification void). In addition it reads more like ALGOL, or like a mathematical statement, in tune with its history. You are in a different language: respect that, don't try to twist it into a hybrid. 
As for ease of typing, by the time I've got there, I just want to hit = and move on to the RHS. I'm well aware I'm inside a [ by that point (I have to remember to close it). So I don't recognise any of your reasons as valid, except for someone who isn't really a shell-scripter, just dabbling while their muscle-memory and brain are still in another language. We'll have to agree to disagree, evidently. With respect, I'd like the Committee to consider that they are guardians of the sh language, and that changes to it are of a different nature than changes to command-line utilities; adding sed -E is basic common sense. Adding == with the exact same meaning as = is bad language design. Can you imagine WG14 adding another operator to Standard C, with the exact same semantic and meaning as an existing one, but a different token? I certainly cannot. Since the proposer also wishes '=' inside '[[' to mean the same thing as '==', again I would recommend simply adding '=' with the fnmatch() semantic. POSIX can, and I would argue should, simply standardise a consistent approach, regardless of what implementation extensions currently exist, exactly as you are doing with make, a format and not a language.

shware_systems 2014-03-06 21:22 reporter bugnote:0002178	Re: 0002068 During the phone discussion of Bug #813, it was stated the term 'decimal number' in XBD 12.1 is to be construed as requiring all utilities in conforming mode to treat leading zeroes as permitted when converting arguments required to represent integer values, thereby precluding all syntax formats that use a leading zero to indicate an alternate base is desired.

dwheeler 2014-03-07 15:46 reporter bugnote:0002179	Re: comment #2177: The use and implementation of "==" instead of "=" is already widespread; see comment #755. This proposal merely updates the specification to match many users' expectations and widespread practice. A lot of people, including me, prefer "==" in sh (e.g., because it is visually distinct from "="). It would be possible to deprecate "=" and require all future scripts to use "==", but I think that would be an unnecessary burden on script writers for no good purpose. There are other cases where POSIX provides more than one mechanism for a purpose to support worthy goals such as portability and backwards compatibility. Even test itself can be spelled "test" or "[", so it's not true that sh tests have "only one way to do things". This is just another example of allowing multiple approaches to support larger goals.

dwheeler 2014-03-07 15:52 reporter bugnote:0002180	Re: 0002178 That's good to know, thanks for clarifying the point about leading zeros.

shware_systems 2014-03-11 14:52 reporter bugnote:0002181	Re: 0002180 I'm just reporting how I heard what the ORs felt was existing practice, despite XCU 1.1.2.1 requiring otherwise because of its deferral to c99 section 6.5.1p3. Consensus was not reached, IOW, even if it is just me dissenting. For the utilities where XCU 1.1.2.1 applies, including sh and the c99 compiler, it is up to the application developer to detect and strip leading zeroes before use or storage of an argument, and detect and convert to a decimal base any value stored using an alternate base format before using it as an argument to another utility.
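The second half of that workflow, converting an alternate-base value to decimal before passing it on, can lean on arithmetic expansion, which is required to recognize ISO C hexadecimal constants. A sketch only: the helper name to_decimal is hypothetical, it handles just the 0x/0X form (whether a leading 0 means octal or decimal is exactly what is disputed here, so that case is passed through untouched), and it does no range checking:

```sh
#!/bin/sh
# Convert a C-style hexadecimal constant to decimal; pass anything else through.
# The character check runs before $((...)) so arbitrary text is never evaluated.
to_decimal() {
    case $1 in
        0[xX])               return 1 ;;   # prefix with no digits
        0[xX]*[!0-9A-Fa-f]*) return 1 ;;   # non-hex character after the prefix
        0[xX]*)              printf '%s\n' "$(($1))" ;;
        *)                   printf '%s\n' "$1" ;;
    esac
}

# The converted value can then be handed to test, which expects decimal.
if [ "$(to_decimal 0x10)" -gt 15 ]; then
    echo "0x10 is greater than 15"
fi
```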

dwheeler 2014-03-12 23:05 reporter bugnote:0002182	Okay. It seems to me that the "easy" way forward is to not require support for leading 0 (and bases) for integers in the spec, though allowing it in implementations. It's not hard to get this functionality in other ways.

ranjit 2014-03-14 13:25 reporter bugnote:0002186	Again you are ignoring the substantive point. I am not interested in hearing a repetition of an argument I have just addressed, as if that would change my mind. This is not a command-line utility. It is a language, so arguing from the point of view of "widespread practise" is muddle-headed. WG14 refuse to standardise things just on that basis, and so should POSIX when it comes to the syntax of a language under its purview. Let's leave it there, since there is no point in talking past each other. Both sides have been adequately expressed and repeating the same points is futile, and no doubt tedious to read.

rhansen 2014-10-13 16:44 manager bugnote:0002418	I noticed that the 'Requirements' section doesn't list backwards compatibility. Is this intentional? Are we not worried about breaking existing scripts by introducing '[['? (People are unlikely to encounter any problems, but it's still theoretically possible.)

geoffclare 2014-10-13 16:52 manager bugnote:0002419	The standard already reserves '[[' in order to allow it as an extension. See XCU 2.4.

dwheeler 2014-10-13 16:55 reporter bugnote:0002420	Regarding comment 2418: As comment #2419 noted, the sequence "[[" is already reserved. In addition, support for "[[" is widespread. The goal is to standardize widely-used practice. The [[...]] form adds important functionality and reduces the risks of certain kinds of errors.

rhansen 2014-10-13 17:54 manager bugnote:0002421	Ah, right, I forgot that [[ was already reserved. Additional questions: Would it be appropriate to add '[[ -o <name> ]]' to this proposal? (This would be used to test if a shell option is enabled.) Do existing implementations always expand all WORDs, including those that have been short-circuited via '||' or '&&'? (Some expansions have side effects.)
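For single-letter options there is already a portable way to do what '[[ -o <name> ]]' would do: the special parameter $- lists the currently enabled option letters. A minimal sketch; the helper name opt_on is hypothetical, and long option names (as accepted by set -o) are not covered:

```sh
#!/bin/sh
# Succeed if the single-letter shell option given as $1 is currently enabled.
opt_on() {
    case $- in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

set -f                      # disable pathname expansion (noglob)
opt_on f && echo "-f is on"
set +f
opt_on f || echo "-f is off"
```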

stephane 2016-09-19 21:46 reporter bugnote:0003382	For the =~ part, we'd need to define a new type of token for what's on the right of =~. For instance in [[ aacc =~ (aa|bb)c ]], currently (aa|bb)c can't be a WORD as (, | and ) are token delimiters (note that [[ "a(a)" = *(a) ]] doesn't return true in bash or ksh93). See also [[ a1 =~ (a)\1 ]] that returns true in bash and false in ksh93 (\ is not like the other quoting operators). See also [[ a =~ \b ]] (or \>...). There's also the question of the effect of quoting in things like [[ a =~ ["$chars"] ]]: should [[ : =~ ["^:#"] ]] return true like in bash or false like in ksh93? There's also the question of ~ expansion. Should HOME=/ [[ / =~ ~ ]] return true like in bash or false like in ksh93?

shware_systems 2016-09-20 17:16 reporter bugnote:0003383	The intent in the proposal is the right hand side respect the extended_reg_exp production of the grammar in XBD 9.5, so it is already defined. I agree adding a production clause to db_factor, or modifying Note 10, is desirable to make this cross reference clearer in the grammar. Also, the cross reference to 9.4 in the description of =~ is missing "XBD" as volume referent. As the ~ is a non-special ERE character, I don't believe it should be expanded to / for the compare, so think the ksh93 result more the intent. For [[ / = ~ ]] as the expression, that I'd expect to return true.

stephane 2016-09-20 20:34 reporter bugnote:0003384	Re: 0000375:0003383 1- about grammar: That doesn't work. For instance the space character is not special in EREs, yet [[ $a =~ a b ]] obviously doesn't work. The behaviour for the a\ b ERE is unspecified. We may need to acknowledge somehow that backslash is overloaded as a quoting operator in the shell and as an escaping operator for EREs. For instance, is re='a\b'; [[ a =~ $re ]] meant to be treated the same as [[ a =~ a\b ]]? (It's not in bash.) And re='a"."b'; [[ a =~ $re ]] vs [[ a =~ a"."b ]]. See also:
    $ bash -c '[[ a =~ & ]]'
    bash: -c: line 0: syntax error in conditional expression: unexpected token `&'
    bash: -c: line 0: syntax error near `&'
    bash: -c: line 0: `[[ a =~ & ]]'
(same for #, <, >). IMO, the original [[ =~ ]] implementation (in bash31), which is also the one of zsh, where shell quoting doesn't make any difference in terms of RE matching, is a lot cleaner and easier to specify (applications just do [[ $var =~ '(..|X)&' ]] and make sure the argument to =~ is a normal shell WORD). Or we could just give up on specifying [[...]] altogether (my preference, as it's broken by design except in zsh IMO, as its =/== operator is not an equality operator and it has hardly any benefit over the "[" command) and add the =~ operator to "[" like in yash or zsh, which would also be straightforward to specify and unambiguous. 2- about ~: Whether ~ is an ERE operator or not is irrelevant here. The fact that it's not an ERE operator (it is in ksh93 REs though, as in [[ a =~ (?K:~(E)a) ]]) makes it even a justification that ~ can be safely expanded. ` is not an ERE operator either, yet [[ Linux =~ `uname` ]] returns true on Linux with both ksh93 and bash. It's just that we need to clarify what expansions are or may be performed in the argument to =~. Expanding ~ is not very useful because it's only expanded at the start of the RE, but I can't see why it wouldn't be when `...`, $param, $((arith)), $(cmd) are. 
Also note that in bash the expansion of ~ ends up being quoted from an ERE point of view, which makes sense as ~ is meant to expand to a path, not a RE. As in HOME=.; [[ a/a =~ ~/a ]] returns false. In bash, the same applies to process substitution. [[ "<:" =~ <(:) ]] returns false (true in ksh93) Both bash and ksh93 have taken some liberties with the standard inside [[...]] as that construct was not specified by the standard. It's going to be hard to come up with a specification that doesn't break either or both. And if you want to throw zsh in the picture as well, it's going to be more difficult... or simpler if we only specify [[ a =~ $var ]], that is leave it unspecified unless the word after =~ is an unquoted variable and the content of the variable contains a valid ERE. At the moment, that's the only thing that is portable across bash31, bash32-or-above, zsh and ksh93.

stephane 2016-09-20 21:03 reporter bugnote:0003385 Last edited: 2016-09-20 21:07	Re 0000375:0003384 > Both bash and ksh93 have taken some liberties with the standard inside [[...]] as that construct was not specified by the standard Amongst those liberties: a='+(a)'; [[ a = $a ]] currently returns true in both bash and ksh even though the current proposal would mandate it to return false. While in case a in $a) ... it would not match as required. Do we want to specify some of the ksh extended operators or force bash and ksh to modify their [[...]] when in POSIX mode? Any good reason why one may want to use that pattern matching operator (with "=" being the operator!?) when we already have the case foo in pattern) much more sensible one?
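The case construct mentioned above can be wrapped so it reads like a test. A sketch under the assumption that the shell performs plain POSIX pattern matching in case (in particular, that bash's extglob option is not enabled); the helper name matches is hypothetical. With a='+(a)' the pattern contains no POSIX wildcards, so it matches only the literal string, which is the result the current proposal would require of [[ a = $a ]]:

```sh
#!/bin/sh
# Succeed if string $1 matches the shell pattern $2. $2 is left unquoted in
# the case label so that its wildcards (*, ?, [...]) stay active.
matches() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

a='+(a)'
matches a "$a" || echo "'a' does not match $a"
matches '+(a)' "$a" && echo "'+(a)' matches $a literally"
matches abc 'a*' && echo "'abc' matches a*"
```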

shware_systems 2016-09-21 07:35 reporter bugnote:0003386	Re: 3384 Ok, I oversimplified a bit on 1. It is the result of the WORD, after substitutions and quote removal, but minus field splitting, globbing, and I was thinking tilde expansions, that is meant to be reparsed as an extended_reg_exp. It is left to the script author to figure out how much the input needs to be quoted so the result has blanks and other meta-characters EREs don't consider special appearing where required after the quote removal. I still think an additional token type isn't required for expressing this in the grammar. Possibly a Note 11 if this is cleaner than adding to Note 10, but the overall context is more similar to the pattern production than the context of ASSIGNMENT_WORD, that I see. If tilde expansion is included then 2. should return true, otherwise no. I can see limiting tilde expansion to those operators that expect a filepath string as the WORD result, and non-special otherwise.

stephane 2016-09-21 10:20 reporter bugnote:0003387	Let me rephrase. [[ =~ ]] in ksh93 and bash32+ is a bit broken by design, but to use it properly in both (which seems to be the intent of this proposal: to specify an operator compatible with both), it seems we need (for the argument after =~) to:
- quote &, <, >, blanks (locale-dependent (*)) and unmatched ) and some unmatched ] at least, or we'd get errors in bash or ksh93
- not quote the ERE operators (.?+[]{}()|$^) if we want them to retain their special ERE significance. You'll notice that several of those (()|) can't be in WORD
- quote `, ~, $ and quotes to remove their special meaning as shell tokens (but not inside [...] for some)
- to remove the special meaning of those ERE operators, we can use double quotes, single quotes, backslash or $'...', but not when they're inside [...].
- other characters can be quoted as well, but not with backslash, as that could introduce ERE extensions like \<, \>, \b, \w in ksh93.
- for parts of the argument that are the result of an unquoted expansion other than ~ expansion, \ (in the content of the expansion) removes the special meaning of ERE operators and may introduce new ones.
Examples:
    a='<foo>' bash -c '[[ $a =~ <.> ]]'
gives an error;
    a='foo' ksh93 -c '[[ $a =~ \<.\> ]]'
returns true (\<, \> taken as word boundaries). You need to use:
    a='<foo>' shell -c '[[ $a =~ "<".">" ]]'
so it works in both shells, or
    a='<foo>' shell -c 're="<.>"; [[ $a =~ $re ]]'
which also works in zsh and bash31. Other example:
    a='blah' shell -c '[[ $a =~ [xy)] ]]'
gives an error in both ksh and bash;
    a='\' shell -c '[[ $a =~ [xy\)] ]]'
doesn't give an error but matches in ksh (not in bash), and [[ $a =~ [xy")"] ]] also matches on backslash in ksh93 (bash used to have a similar bug).
    a='blah' shell -c 're="[xy)]"; [[ $a =~ $re ]]'
works in all shells (zsh, bash31 included).
(*) As already discussed, since blank recognition is locale-dependent, you get behaviours like:
    $ LC_CTYPE=fr_FR.ISO8859-15@euro bash -c '[[ $a =~ tête-à-tête ]]'
    bash: -c: line 0: syntax error in conditional expression
    bash: -c: line 0: syntax error near `tête-à-tête'
    bash: -c: line 0: `[[ $a =~ tête-à-tête ]]'
as that à UTF-8 character is 0xc3 0xa0, and 0xa0 happens to be a blank in ISO8859-15 locales on Solaris. So in effect you may need to quote every character that is not in the portable character set in case it may be a blank in the user's locale.
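For scripts that must also run where [[ is unavailable, the same keep-the-ERE-in-a-variable approach can be sketched in POSIX sh by handing the ERE to grep -E. This is an illustration, not the behaviour of any =~ implementation: the helper name ere_match is hypothetical, and the subject string is assumed to contain no newline, since grep matches line by line:

```sh
#!/bin/sh
# Succeed if string $1 contains a match for the ERE $2. The ERE arrives as an
# ordinary argument, so shell quoting never interacts with ERE syntax.
ere_match() {
    printf '%s\n' "$1" | grep -E -q -e "$2"
}

re='[xy)]'                    # unmatched ")" inside a bracket expression
ere_match blah "$re" || echo "blah: no match"
ere_match 'a)' "$re" && echo "a): match"
ere_match xbarx 'foo|bar' && echo "alternation works"
```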

shware_systems 2016-09-21 22:44 reporter bugnote:0003388	This proposal, as a white paper, is a case where the behaviors are being merged to be future compatible more than backwards. This is also why test is being deprecated and marked LEGACY. Both it and the [[ extensions are broken one way or the other, so this proposal is taking what isn't broken and giving it a firmer logical model by adding it to the grammar with extensibility provisions. The caveat about what the standard says being reliable only in the C locale and locales using nationalized ASCII applies here too, for now; future Enhancement Reqs. against the V8 drafts that define additional required locales will address remaining localization concerns. It is known this may break any current script using [[ as an extension. The changes to fix those scripts are expected to be minimal; as proposed, for EREs, enclosing the entire WORD in double quotes will suffice in most cases, only needing \", \` and \$ internally to disable substitutions processing. If tilde expansion is enabled and desired something like "head"~"tail" is required. The onus is already on script authors, if they don't enclose a WORD in quotes, to escape any character that implicitly delimits the token. New scripts are expected to do version checking that they're running on V8 or later to avoid conflicts of interpretation with older shells. It should not break any conforming scripts that were only using test with operators other than -a or -o. Last, for portable scripts blank detection is currently not locale-dependant. For tokenization only the SPC and TAB code points count, as the default members of the blank charclass. Recognizing additional members of a locale's blank charclass is undefined behavior. Adding default members from ISO-646 is possible, but not likely.

stephane 2016-09-22 08:42 reporter bugnote:0003389	Re: 0000375:0003387 > - not quote the ERE operators (.?+[]{}()\|$^) if we want them > to retain their special ERE significance. You'll notice that > several of those (()\|) can't be in WORD [...] > - to remove the special meaning of those ERE operators, we can > use double quotes, single quotes, backslash or $'...' but not > when they're inside [...]. Actually, I hadn't realised how broken ksh93's implementation is, so the above is incorrect. In ksh93, quotes other than backslash only escape the ERE operators that are also wildcard operators (?[]()\|&), not ^$.+, so [[ a =~ "." ]] returns true (so does r=.; [[ a = "$r" ]]). And [[ "\\" =~ ["?"] ]] (or any of the other glob operators in place of ?) returns true, like in old versions of bash for RE operators. It looks like internally ksh93 replaces [[ x =~ y ]] with [[ x = ~(E)y ]]. [[ a = ~(E)"." ]] also returns true, while [[ a = "?" ]] returns false.

stephane 2016-09-22 08:59 reporter bugnote:0003390	Re: 0000375:0003388 > This proposal, as a white paper, is a case where the behaviors are being > merged to be future compatible more than backwards. This is also why test is > being deprecated and marked LEGACY. Both it and the [[ extensions are broken > one way or the other, so this proposal is taking what isn't broken and giving > it a firmer logical model by adding it to the grammar with extensibility > provisions. The caveat about what the standard says being reliable only in > the C locale and locales using nationalized ASCII applies here too, for now; > future Enhancement Reqs. against the V8 drafts that define additional > required locales will address remaining localization concerns. With -a, -o, (, ) and an argument-less -t removed, "test"/"[" is no longer broken. What I'm saying is that the two requirements, that the =~ argument must be a WORD and that quoting ERE operators removes their special meaning, are incompatible (or at least make for an operator that is not useful). If you need to make it a WORD, then, for instance, you can't do [[ $string =~ foo\|bar ]] and if you do [[ $string =~ "foo\|bar" ]] then you're disabling that "\|" ERE operator. You'd need: re="foo\|bar"; [[ $string =~ $re ]] to make it work. Then you might as well specify the bash31/zsh behaviour, where [[ $string =~ "foo\|bar" ]] does test whether $string contains foo or bar, as opposed to containing "foo\|bar". > It is known this may break any current script using [[ as an extension. The > changes to fix those scripts are expected to be minimal; as proposed, for > EREs, enclosing the entire WORD in double quotes will suffice in most cases, > only needing \", \` and \$ internally to disable substitutions processing. If > tilde expansion is enabled and desired something like "head"~"tail" is > required. The onus is already on script authors, if they don't enclose a WORD > in quotes, to escape any character that implicitly delimits the token. New > scripts are expected to do version checking that they're running on V8 or > later to avoid conflicts of interpretation with older shells. It should not > break any conforming scripts that were only using test with operators other > than -a or -o. Tildes are not expanded in "head"~"tail"; they are expanded only at the beginning of a word and when not quoted (though the rules are different in assignments). > Last, for portable scripts blank detection is currently not locale-dependant. > For tokenization only the SPC and TAB code points count, as the default > members of the blank charclass. Recognizing additional members of a locale's > blank charclass is undefined behavior. Adding default members from ISO-646 is > possible, but not likely. What makes you say that? IIRC both bash and yash explicitly made their tokenisation locale-dependent for POSIX compliance. (I do agree it's not desirable, though.)
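A portable point of comparison, not taken from either proposal: the re=...; [[ $string =~ $re ]] idiom works because the ERE sits in a variable, so shell quoting never reaches the ERE operators. Plain sh can get a similar effect by handing both the string and the ERE to awk as operands. This is a minimal sketch assuming only POSIX sh and awk; ere_match is a hypothetical name, and the example writes the alternation as foo|bar.

```sh
#!/bin/sh
# Sketch only: ere_match is a hypothetical helper, not part of any standard.
# It succeeds when the string $1 contains a match for the ERE $2.
ere_match() {
  # Both values arrive as awk operands (ARGV[1], ARGV[2]) instead of being
  # spliced into the awk program text, so shell quoting and awk string-literal
  # syntax do not get in the way (awk may still honour C-style escapes such
  # as \n inside the ERE). awk exits inside BEGIN, so the operands are never
  # opened as input files.
  awk -- 'BEGIN { exit !(ARGV[1] ~ ARGV[2]) }' "$1" "$2"
}

re='foo|bar'            # alternation stays live because $re reaches awk intact
if ere_match 'xxbarxx' "$re"; then
  echo 'xxbarxx matches foo|bar'
fi

re='a\.b'               # the backslash makes "." match only a literal dot
ere_match 'a.b' "$re" && echo 'a.b matches a\.b'
ere_match 'axb' "$re" || echo 'axb does not match a\.b'
```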

shware_systems 2016-09-23 04:17 reporter bugnote:0003391	"foo\|bar" after quote removal effectively gets passed to regcomp() as the C string foo\|bar\0. A shell can use an internal workalike function that includes the effect of regexec()/regfree() too. That is what is expected to handle the extended_reg_exp production, not the code that handles shell operators. The result is then the same for re="foo\|bar"; [[ $str =~ $re ]] or even =~ `printf %s%s%s foo '\|' bar`. The concerns you have would apply if the ERE were an argument to [[ as a utility, as then the simple_command production would govern interpretation, but as a keyword pair with its own productions those constraints don't hold. The locale's charmap determines the bit patterns of each code point, so the shell depends on it to determine whether a byte is a defined char and which class it belongs to. Hardcoding a reliance on the C locale's charmap may fail, as a result, if the charmap of a locale says the TAB code point is represented by 0xFF instead. This is a separate issue from what code points are required to be in each charclass for all locales. For a script to be portable, unquoted input with grammatical significance in terms of being token delimiters is limited to these code points. Relying on code points a locale may add to a class means the script cannot function properly on a platform that doesn't provide localedef or a locale with the same additions, so to me it is undefined behavior. This does not limit scripts to using only the POSIX locale, but it is a subset of the locales scripts that don't care about portability might use.

stephane 2016-09-23 06:59 reporter bugnote:0003392	Re: 0000375:0003391 Mark, you're missing the point I've been trying to make from the beginning. What you're describing (about quoting of RE operators) is the zsh/bash31 behaviour, not that of bash32+/ksh93 or what this proposal specifies. Quoting the proposal: > string =~ regex > true if string matches the extended regular expression (ERE) > regex as defined in section 9.4; otherwise false. Characters > that are quoted in regex (using <backslash>, single-quote, or > double-quote) shall not be treated as an ERE special character, > but shall instead be treated as an ERE ordinary character and > thus only match themselves. See also appendix B. So [[ $a =~ "foo\|bar" ]] and s='foo\|bar'; [[ $a =~ "$s" ]] test whether $a contains "foo\|bar" (as if by calling regcomp("foo\\\|bar")), [[ $a =~ foo\|bar ]] is not allowed, and re='foo\|bar'; [[ $a =~ $re ]] matches $a against the foo\|bar ERE (regcomp("foo\|bar")). With bash-4.3: $ ltrace -e regcomp bash -c '[[ a =~ "a\|b" ]]' bash->regcomp(0x7ffff013f3b0, "a\\\|b", 1) = 0 +++ exited (status 1) +++ $ ltrace -e regcomp bash -c '[[ a =~ a\|b ]]' bash->regcomp(0x7fff33a64d50, "a\|b", 1) = 0 +++ exited (status 0) +++ $ ltrace -e regcomp bash -c 're="a\|b"; [[ a =~ "$re" ]]' bash->regcomp(0x7fffba6ec710, "a\\\|b", 1) = 0 +++ exited (status 1) +++ $ ltrace -e regcomp bash -c 're="a\|b"; [[ a =~ $re ]]' bash->regcomp(0x7ffe497b2700, "a\|b", 1) = 0 +++ exited (status 0) +++ $ ltrace -e regcomp bash -c '[[ a =~ ["a\|b"] ]]' bash->regcomp(0x7ffe32decc20, "[a\|b]", 1) = 0 +++ exited (status 0) +++ $ ltrace -e regcomp bash -c '[[ a =~ "[a\|b]" ]]' bash->regcomp(0x7ffea326a1e0, "\\[a\\\|b]", 1) = 0 +++ exited (status 1) +++ With zsh-5.1.1: $ ltrace -e regcomp zsh -c '[[ a =~ "a\|b" ]]' regex.so->regcomp(0x7ffc3aaa2ac0, "a\|b", 1) = 0 +++ exited (status 0) +++ $ ltrace -e regcomp zsh -c '[[ a =~ a\|b ]]' zsh:1: parse error near `\|' +++ exited (status 1) +++

chet_ramey 2016-09-23 13:05 reporter bugnote:0003393	Re: http://austingroupbugs.net/view.php?id=375#c3388 2.3 Token Recognition, rule 8: "If the current character is an unquoted <blank>, any token containing the previous character is delimited and the current character shall be discarded." <blank> is defined in 3.74 as "One of the characters that belong to the blank character class as defined via the LC_CTYPE category in the current locale. In the POSIX locale, a <blank> character is either a <tab> or a <space>." So unless you say that portable scripts are restricted to the POSIX locale, which you may be saying, blank detection during tokenization depends on the current locale.

shware_systems 2016-10-10 11:35 reporter bugnote:0003405	Re: 3392 Yes, you're right; as stated it's a bit wonky. I was reading it as applying to escaped or single-quoted sequences inside double or single quotes, so that "foo'\|'bar" or 'foo"\|"bar' would match 'foo\|bar', with quote removal still applied to the ERE as a WORD token before internal quoting like that is processed, according to the proposed text. The 'After quote removal, ' preamble is missing, though, and so is the caveat that \' and \" are needed for them to be treated as the ERE ordinary chars they usually are. I'd expect <foo\" =~ 'foo\"\|\"bar'> to return true. Re: 3393 I am not saying portable scripts are limited to the POSIX locale. I am saying that a script which limits the characters of the blank class it uses as token separators to the two of the POSIX and C locales, TAB and SPC, is portable whatever locale is in effect, as those two are required to be in every locale's blank charclass. A script author who knows that on a particular system a locale includes a <halfwidthspace> code point in its blank charclass can use it to separate tokens also, but that script is not guaranteed to be portable.

mirabilos 2017-05-19 21:20 reporter bugnote:0003698	Hello dwheeler, please DO NOT add [[ to POSIX in a way that invalidates all existing scripts using it. This means: PLEASE DO NOT ADD [[ TO POSIX! The ksh world has been using [[ in scripts since forever and told people to use it exclusively for secure shell scripting, so basically every* script uses it. If POSIX is going to invade existing territory like that, it’s like a declaration of war. Some notes on =~ (which mksh does not implement at this time): I’ve read attempts to put a RE grammar onto the RHS. I don’t think this is the correct way because we also need to be able to put the RE/extglob into a variable, in order to make it possible to specify them dynamically: x='b*r'; [[ bar = $x ]] # returns true As for quoting… incidentally, I figured this out when adding the BSD [[:<:]] and [[:>:]] to mksh (while adding character classes in the first place): [[ 'a foo b' = [[:\<:]]foo[[:\>:]]* ]] # returns true Combining the two from above: x='[[:<:]]foo[[:>:]]'; [[ 'a foo b' = $x ]] # returns true
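For the glob side of this point (a pattern held in a variable), plain POSIX sh already offers case: an unquoted expansion in the pattern position keeps its pattern characters, while a quoted one matches only literally. A minimal sketch; glob_match and literal_match are hypothetical names, and it covers only standard pattern matching notation, not the BSD [[:<:]]/[[:>:]] word-boundary classes mentioned above for mksh.

```sh
#!/bin/sh
# glob_match and literal_match are hypothetical helpers for illustration.

# Succeed if $1 matches the pattern stored in $2. $2 is left unquoted in the
# pattern position, so any *, ? or [...] it contains acts as a pattern
# character, much like $x in [[ bar = $x ]].
glob_match() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# Quoting the expansion makes every character of $2 match only itself.
literal_match() {
  case $1 in
    "$2") return 0 ;;
  esac
  return 1
}

x='b*r'
glob_match bar "$x" && echo 'bar matches the pattern b*r'
literal_match bar "$x" || echo 'bar is not the literal string b*r'
literal_match 'b*r' "$x" && echo 'b*r equals itself when quoted'
```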

shware_systems 2017-05-19 21:44 reporter bugnote:0003700 Last edited: 2017-05-19 21:46	The proposal extends the ksh93 usage rather than breaking it... Attention to being backwards compatible is part of the discussion. The main reason I see it as preferable to using [/test, security-wise, is that it's built-in and can't be overridden, not any security-specific features beyond what test provides. As it is an extension, no portable script uses it, since forever...

geoffclare 2022-09-22 16:18 manager bugnote:0005972	This bug was reviewed in the September 22, 2022 teleconference. Given the difficulties identified with the matching operations performed by == and =~ in [[ ... ]] there would seem to be two options for progressing this bug: 1. Keep [[ ... ]] but without =~ and with == "neutered" such that it can only be used portably for fixed-string comparisons. 2. Omit [[ ... ]] altogether but perhaps add -nt, -ot, and -ef to test/[ instead. We would welcome feedback on whether option 1 is an acceptable compromise or is too limiting. We need to resolve this bug by no later than 2022-11-13 if it is to make it into draft 3. After that, only option 2 will be possible (because draft 3 will be "feature complete").

stephane 2022-09-22 20:30 reporter bugnote:0005973	IMO, this bug should be split into - one to add support for -nt, -ot, -ef, <, >, ==, =~ (the latter so far only implemented in zsh and yash AFAIK) to test/[, which can be addressed now. - one to implement [[...]] IMO, [[...]] has no benefit other than cosmetic over [/test and adds only complications, as it comes with its own specific language whose syntax varies from shell to shell. -ef, -nt, -ot are useful (though care needs to be taken with the cases where files cannot be stat()ed). <, > as well, given that expr is unusable (and IMO should be deprecated) and awk's awk -- 'BEGIN{exit(!(ARGV[1]"" < ""ARGV[2]))}' is a bit cumbersome (though not that much when wrapping that in a function).
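The awk idiom above can be wrapped in functions as suggested. A minimal sketch; str_lt and str_gt are hypothetical names. Concatenating the empty string forces a string comparison even when an operand looks numeric. POSIX has awk compare such strings using the locale's collating sequence, which lines up with the "collates before/after in the current locale" wording later accepted for test, though some awk implementations may simply compare bytes.

```sh
#!/bin/sh
# Hypothetical stand-ins for [ "$1" '<' "$2" ] and [ "$1" '>' "$2" ] on
# shells whose test lacks those operators.
# The "" concatenations force string (not numeric) comparison.
str_lt() {
  awk -- 'BEGIN { exit !(ARGV[1] "" < "" ARGV[2]) }' "$1" "$2"
}

str_gt() {
  awk -- 'BEGIN { exit !(ARGV[1] "" > "" ARGV[2]) }' "$1" "$2"
}

LC_ALL=C; export LC_ALL      # pin the collation order for a predictable demo
str_lt apple banana && echo 'apple sorts before banana'
str_lt 10 9 && echo '"10" sorts before "9" as strings'
str_gt b a && echo 'b sorts after a'
```

Without the "" concatenations, awk would compare 10 and 9 numerically (both operands look numeric), and str_lt 10 9 would fail.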

chet_ramey 2022-10-14 15:12 reporter bugnote:0005991	I'm not in favor of one proposal over the other -- bash isn't going to change what it does today -- but if I had to choose I would choose a variant of option 2: change test/[ to add -nt, -ot, -ef (spec including what happens when files don't exist or otherwise error on stat(2)); add < and > (specifying whether or not they depend on the locale); and add == as a synonym for =. Omit any pattern matching or regexp matching; leave those for some later specification of `[['.

steffen 2022-10-26 14:16 reporter bugnote:0006014	While rewriting the conditional expression of my MUA i stumbled over [[..]] of which i thought there was an open issue .. and found this. Now i read stephane's "IMO, [[...]] has no benefit other than cosmetic over [" (#5973), but nonetheless i want to note that _if_ [[ is about to be standardized, then wording should be applied to make ambiguities much less likely than they are for [. The standard has a tremendous amount of warnings and notes about [ and the syntax ambiguities it has, and the initial four (?) in-order tests that should be used to avoid them. To my surprise i found yesterday that [[..]] is quite ambiguous, whereas i thought it is not: #?0\|kent:tmp$ if [[ -n == && true ]]; then echo au; fi au That is ok #?0\|kent:tmp$ if [[ -n == ] && true ]]; then echo au; fi -bash: syntax error near unexpected token `-n' #?0\|kent:tmp$ if [[ -n == ']' && true ]]; then echo au; fi -bash: syntax error in conditional expression But that is not in my opinion. #?0\|kent:tmp$ if [[ '-n' == ']' && true ]]; then echo au; fi -bash: syntax error near unexpected token `'-n'' #?2\|kent:tmp$ if [[ x'-n' == x']' && true ]]; then echo au; fi With escaping it works, but _I_ thought [[..]] is designed so escaping ambiguities is not necessary! I think the standard should note something around "if first argument is a unary operator, and second argument is a binary operator, lookahead shall be performed to see if the (a) argument list is exhausted, (b) an AND/OR list follows, or (c) a group is closed, in order to reduce ambiguities from the language as much as possible".

geoffclare 2022-11-09 09:44 manager bugnote:0006040 Last edited: 2022-11-09 09:46	There seems to be support for adding < and > to test/[ but mixed views on ==, so here are some suggested changes, adapted from Extendingshellconditionals-2013-12-09.odt, to implement option 2 from 0000375:0005972 with < and > also added ... After D2.1 page 3206 line 108874 section test, add: pathname1 -ef pathname2 True if pathname1 and pathname2 resolve to existing directory entries for the same file; otherwise, false. After D2.1 page 3207 line 108889 section test, add: pathname1 -nt pathname2 True if pathname1 resolves to an existing file and pathname2 cannot be resolved, or if both resolve to existing files and pathname1 is newer than pathname2 according to their last data modification timestamps; otherwise, false. pathname1 -ot pathname2 True if pathname2 resolves to an existing file and pathname1 cannot be resolved, or if both resolve to existing files and pathname1 is older than pathname2 according to their last data modification timestamps; otherwise, false. After D2.1 page 3207 line 108925 section test, add: s1 > s2 True if s1 collates after s2 in the current locale; otherwise, false. s1 < s2 True if s1 collates before s2 in the current locale; otherwise, false. On D2.1 page 3208 line 108934 section test, change: if a pathname argument is to: if a pathname, pathname1, or pathname2 argument is After D2.1 page 3209 line 108974 section test, add: LC_COLLATE Determine the locale for the behavior of the > and < string comparison operators. After D2.1 page 3210 line 108999 section test, add a new first paragraph to APPLICATION USAGE: Since '>' and '<' are operators in the shell language, applications need to quote them when passing them as arguments to test from a shell. 
On D2.1 page 3212 line 109090 section test, append to the paragraph: A later proposal to add [[ ]] in Issue 8 was also rejected because existing implementations of it were found to be error-prone in a similar way to historical versions of test, and there was also too much variation in behavior between shells that support it. On D2.1 page 3212 line 109092 section test, change: the error-prone historical -o flag of test. to: the error-prone historical -a and -o operators of test. On D2.1 page 3213 line 109126 section test, delete: str = pattern, str != pattern, On D2.1 page 3213 line 109127 section test, change: They were not carried forward into the test utility when the conditional command was removed from the shell because they have not been included in the test utility built into historical implementations of the sh utility. to: They were not carried forward into the test utility when the conditional command was removed from the shell because they had not been included in the test utility built into historical implementations of the sh utility. However, they were later added to this standard once support for them became widespread.
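On a shell whose test lacks the new operators, the accepted -nt/-ot wording can be approximated with the find -prune -newer workaround from the original description. A sketch under stated assumptions: newer_than and older_than are hypothetical names; pathnames are assumed not to begin with '-' (find would treat such an operand as part of its expression); and results may differ from a built-in -nt when two timestamps are very close, since find and a built-in test may compare at different resolutions.

```sh
#!/bin/sh
# Sketch only; newer_than/older_than are hypothetical helpers that follow the
# accepted -nt/-ot wording. Assumes pathnames do not start with '-'.

# newer_than A B: true if A resolves to an existing file and B cannot be
# resolved, or if both resolve and A's last data modification time is more
# recent than B's.
newer_than() {
  [ -e "$1" ] || return 1   # A does not resolve (-e follows symlinks): false
  [ -e "$2" ] || return 0   # A resolves but B does not: true
  # -H follows A if it is a symlink (whether a symlinked B is followed for
  # -newer may depend on the find implementation); -prune stops find from
  # descending into A when it is a directory; -print is implied, so any
  # output means A is newer than B.
  [ -n "$(find -H "$1" -prune -newer "$2")" ]
}

# older_than A B: the accepted -ot wording is -nt with the operands swapped.
older_than() {
  newer_than "$2" "$1"
}

# Usage: rebuild only when the source is newer than the target or the target
# is missing. "prog" and "prog.c" are hypothetical file names.
if newer_than prog.c prog; then
  echo 'prog needs rebuilding'
fi
```

Defining older_than in terms of newer_than keeps the two missing-file rules consistent: A -ot B is true exactly when B -nt A is.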

Date Modified	Username	Field	Change
2011-02-07 18:34	dwheeler	New Issue
2011-02-07 18:34	dwheeler	Status	New => Under Review
2011-02-07 18:34	dwheeler	Assigned To	=> ajosey
2011-02-07 18:34	dwheeler	Name	=> David A. Wheeler
2011-02-07 18:34	dwheeler	Section	=> test
2011-02-07 18:34	dwheeler	Page Number	=> 3224-3225
2011-02-07 18:34	dwheeler	Line Number	=> 107503-107513
2011-02-07 18:49	dwheeler	Note Added: 0000666
2011-02-07 20:23	~~Don Cragun~~	Interp Status	=> ---
2011-02-07 20:23	~~Don Cragun~~	Note Added: 0000667
2011-02-07 20:23	~~Don Cragun~~	Severity	Editorial => Objection
2011-02-07 20:23	~~Don Cragun~~	Type	Clarification Requested => Enhancement Request
2011-02-07 21:56	dwheeler	Note Added: 0000668
2011-02-07 22:20	~~Don Cragun~~	Description Updated
2011-02-07 22:20	~~Don Cragun~~	Desired Action Updated
2011-02-07 22:22	~~Don Cragun~~	Note Edited: 0000667
2011-02-07 22:50	dwheeler	Note Added: 0000669
2011-02-07 23:13	eblake	Note Added: 0000670
2011-03-08 03:15	dwheeler	Note Added: 0000688
2011-04-24 13:16	jsonn	Note Added: 0000754
2011-04-24 19:01	dwheeler	Note Added: 0000755
2011-04-24 21:48	jsonn	Note Added: 0000756
2011-04-25 20:45	dwheeler	Note Added: 0000758
2011-04-26 09:25	markh	Note Added: 0000759
2011-04-26 09:57	wpollock	Note Added: 0000760
2011-04-26 10:36	jsonn	Note Added: 0000761
2011-09-22 08:59	ajosey	Note Added: 0000967
2011-09-25 00:36	dwheeler	Note Added: 0000974
2011-09-25 18:51	gber	Note Added: 0000975
2011-11-15 17:26	dwheeler	Note Added: 0001011
2011-11-15 19:25	ajosey	File Added: Extendingshellconditionals.pdf
2011-11-15 19:27	ajosey	File Added: Extendingshellconditionals.txt
2011-11-15 19:30	ajosey	Note Added: 0001013
2011-11-23 23:49	dwheeler	File Added: Extendingshellconditionals-2011-11-23.pdf
2011-11-23 23:50	dwheeler	File Added: Extendingshellconditionals-2011-11-23.odt
2011-11-23 23:50	dwheeler	File Added: Extendingshellconditionals-2011-11-23.txt
2011-11-23 23:57	dwheeler	Note Added: 0001044
2011-11-28 15:50	geoffclare	Note Added: 0001060
2011-11-28 15:51	geoffclare	File Added: Extendingshellconditionals-2011-11-28.odt
2011-11-28 15:52	geoffclare	File Added: Extendingshellconditionals-2011-11-28.pdf
2011-11-28 16:10	geoffclare	Note Edited: 0001060
2011-11-28 16:38	dwheeler	Note Added: 0001061
2011-11-29 03:26	Roger Marquis	Note Added: 0001064
2011-11-29 03:27	Roger Marquis	Note Edited: 0001064
2011-11-29 04:51	dwheeler	Note Added: 0001065
2013-10-10 01:15	dwheeler	Note Added: 0001869
2013-10-12 04:08	dwheeler	Note Added: 0001881
2013-10-24 15:15	geoffclare	File Added: Extendingshellconditionals-2013-10-24.odt
2013-10-24 15:16	geoffclare	File Added: Extendingshellconditionals-2013-10-24.pdf
2013-10-24 15:23	geoffclare	Note Added: 0001944
2013-10-25 10:26	geoffclare	Note Added: 0001945
2013-10-31 15:55	~~Don Cragun~~	Relationship added	has duplicate 0000762
2013-11-01 15:36	ranjit	Note Added: 0001950
2013-11-01 19:09	dwheeler	Note Added: 0001953
2013-11-01 19:43	shware_systems	Note Added: 0001954
2013-11-10 14:46	dwheeler	Note Added: 0001979
2013-11-22 22:13	dwheeler	Note Added: 0002017
2013-11-29 18:56	dwheeler	File Added: Extendingshellconditionals-2013-11-29.odt
2013-11-29 18:57	dwheeler	File Added: Extendingshellconditionals-2013-11-29-track-changes.pdf
2013-11-29 18:58	dwheeler	File Added: Extendingshellconditionals-2013-11-29.pdf
2013-11-29 19:11	dwheeler	Note Added: 0002031
2013-11-29 19:24	dwheeler	Note Added: 0002032
2013-12-01 22:52	jilles	Note Added: 0002033
2013-12-02 08:59	shware_systems	Note Added: 0002034
2013-12-02 11:15	shware_systems	Note Added: 0002035
2013-12-08 22:06	dwheeler	Note Added: 0002052
2013-12-08 22:48	dwheeler	Note Added: 0002053
2013-12-08 23:05	dwheeler	Note Added: 0002054
2013-12-09 01:32	dwheeler	File Added: Extendingshellconditionals-2013-12-08.odt
2013-12-09 01:33	dwheeler	File Added: Extendingshellconditionals-2013-12-08.pdf
2013-12-09 01:35	dwheeler	Note Added: 0002055
2013-12-09 06:55	shware_systems	Note Added: 0002056
2013-12-10 04:46	dwheeler	Note Added: 0002059
2013-12-10 04:52	dwheeler	File Added: Extendingshellconditionals-2013-12-09.odt
2013-12-10 04:53	dwheeler	File Added: Extendingshellconditionals-2013-12-09.pdf
2013-12-10 09:41	shware_systems	Note Added: 0002060
2013-12-10 16:08	dwheeler	Note Added: 0002061
2013-12-10 22:36	jilles	Note Added: 0002063
2013-12-11 00:46	dwheeler	Note Added: 0002064
2013-12-11 00:48	shware_systems	Note Added: 0002065
2013-12-11 00:58	dwheeler	Note Added: 0002066
2013-12-12 01:38	shware_systems	Note Added: 0002068
2013-12-12 06:04	shware_systems	Note Edited: 0002068
2014-03-06 17:08	eblake	Relationship added	related to 0000813
2014-03-06 17:15	geoffclare	Note Added: 0002176
2014-03-06 19:13	ranjit	Note Added: 0002177
2014-03-06 21:22	shware_systems	Note Added: 0002178
2014-03-07 15:46	dwheeler	Note Added: 0002179
2014-03-07 15:52	dwheeler	Note Added: 0002180
2014-03-11 14:52	shware_systems	Note Added: 0002181
2014-03-12 23:05	dwheeler	Note Added: 0002182
2014-03-14 13:25	ranjit	Note Added: 0002186
2014-10-13 16:44	rhansen	Note Added: 0002418
2014-10-13 16:52	geoffclare	Note Added: 0002419
2014-10-13 16:55	dwheeler	Note Added: 0002420
2014-10-13 17:54	rhansen	Note Added: 0002421
2016-09-19 21:46	stephane	Note Added: 0003382
2016-09-20 17:16	shware_systems	Note Added: 0003383
2016-09-20 20:34	stephane	Note Added: 0003384
2016-09-20 21:03	stephane	Note Added: 0003385
2016-09-20 21:07	stephane	Note Edited: 0003385
2016-09-21 07:35	shware_systems	Note Added: 0003386
2016-09-21 10:20	stephane	Note Added: 0003387
2016-09-21 22:44	shware_systems	Note Added: 0003388
2016-09-22 08:42	stephane	Note Added: 0003389
2016-09-22 08:59	stephane	Note Added: 0003390
2016-09-23 04:17	shware_systems	Note Added: 0003391
2016-09-23 06:59	stephane	Note Added: 0003392
2016-09-23 13:05	chet_ramey	Note Added: 0003393
2016-10-10 11:35	shware_systems	Note Added: 0003405
2017-05-19 21:20	mirabilos	Note Added: 0003698
2017-05-19 21:44	shware_systems	Note Added: 0003700
2017-05-19 21:46	shware_systems	Note Edited: 0003700
2022-09-22 16:18	geoffclare	Note Added: 0005972
2022-09-22 20:30	stephane	Note Added: 0005973
2022-10-14 15:12	chet_ramey	Note Added: 0005991
2022-10-26 14:16	steffen	Note Added: 0006014
2022-11-09 09:44	geoffclare	Note Added: 0006040
2022-11-09 09:46	geoffclare	Note Edited: 0006040
2022-11-14 16:06	nick	Final Accepted Text	=> See 0000375:0006040
2022-11-14 16:06	nick	Status	Under Review => Resolved
2022-11-14 16:06	nick	Resolution	Open => Accepted As Marked
2022-11-14 16:07	nick	Tag Attached: issue8
2022-11-30 16:44	geoffclare	Status	Resolved => Applied
2024-06-11 08:53	agadmin	Status	Applied => Closed

Relationships

has duplicate	0000762	Closed	ajosey	1003.1(2008)/Issue 7	Add "==" as synonym for "=" in test
related to	0000813	Closed		1003.1(2013)/Issue7+TC1	Utility numeric argument syntax requirements ambiguous