Austin Group Defect Tracker

ID 0001560
Category [Issue 8 drafts] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2022-01-31 23:30
Last Update 2022-11-30 16:30
Reporter calestyo
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Applied
Product Version Draft 2.1
Name Christoph Anton Mitterer
Organization
User Reference
Section 2.6.3 Command Substitution
Page Number 2323
Line Number 74944
Final Accepted Text Note: 0005817
Summary 0001560: clarify wording of command substitution
Description In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33716&limit=100&offset=0&sid=

I had asked whether POSIX requires a conforming shell to include a last "line" without a trailing newline in the result of a command substitution, or whether a shell would in principle be allowed to ignore such a line if it had no trailing newline.

The answer was that a shell MUST in fact consider such lines.
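
A minimal sketch of that case, assuming a POSIX sh; the variable names (nl, out, last_line) are made up for illustration. The command writes a final "line" with no trailing newline, and the last line is then extracted from the substituted value:

```shell
# Newline held in a variable so it can be used inside a parameter expansion.
nl='
'
# The output ends with "last" and no trailing <newline>.
out=$(printf 'first\nlast')
# Strip everything up to and including the last <newline> in the value.
last_line=${out##*"$nl"}
printf '[%s]\n' "$last_line"
```

A shell that follows the answer above keeps the unterminated line, so this should print [last].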
Desired Action 1)

In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33741&limit=100&offset=0&sid=
Geoff Clare proposed a rewording of the current text:

From:
»replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s), removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters«

To:
»replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s); if the output ends with one or more bytes that have the encoded value of a <newline> character, they shall not be included in the replacement. Any such bytes that occur elsewhere shall be included in the replacement; however, they might be treated as field delimiters«
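
To illustrate the intended reading of both wordings (a sketch only, assuming a POSIX sh; nl, out, expected and result are illustrative names): trailing <newline> bytes are dropped from the replacement, while those occurring elsewhere are kept.

```shell
nl='
'
# Output: a, two embedded <newline>s, b, then three trailing <newline>s.
out=$(printf 'a\n\nb\n\n\n')
# The replacement should keep the embedded <newline>s and drop the trailing ones.
expected="a${nl}${nl}b"
if [ "$out" = "$expected" ]; then
    result=match
else
    result=differ
fi
printf '%s\n' "$result"
```

The quoting of "$out" matters here: without it the value would undergo field splitting and the embedded <newline> bytes could act as delimiters.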





2) The above change would seem to already indicate that the standard output is to be taken as bytes.
Similarly, the line "If the output contains any null bytes, the behavior is unspecified." may be considered an indication that anything *except* NUL bytes needs to be substituted.

I personally would further suggest stating this directly, instead of only implying it.
Tags tc3-2008
Attached Files

- Relationships
related to 0001561 (Applied): clarify what kind of data shell variables need to be able to hold
related to 0001649 (Applied): Field splitting is woefully under specified, and in places, simply wrong

-  Notes
(0005794)
geoffclare (manager)
2022-04-11 13:50

Since field splitting is performed on the results of (unquoted) command substitutions, it is also affected by this issue. Suggested changes...

On page 2323 line 74944 section 2.6.3 Command Substitution, change:
replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s), removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters
to:
replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s); if the output ends with one or more bytes that have the encoded value of a <newline> character, they shall not be included in the replacement. Any such bytes that occur elsewhere shall be included in the replacement; however, they might be treated as field delimiters
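
As a rough illustration of the final clause ("they might be treated as field delimiters"), and not part of the proposed text: a sketch assuming a POSIX sh with default IFS behaviour, using the illustrative names unquoted and quoted.

```shell
# With IFS unset the shell splits as if IFS were <space><tab><newline>.
unset IFS
set -f   # disable pathname expansion so only field splitting is in play

# Unquoted: the embedded <newline> acts as a field delimiter.
set -- $(printf 'one\ntwo\n')
unquoted=$#

# Quoted: no field splitting, the embedded <newline> stays in one field.
set -- "$(printf 'one\ntwo\n')"
quoted=$#

set +f
printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
```

This should report two fields for the unquoted substitution and one for the quoted substitution.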

On page 2325 line 75034 section 2.6.5 Field Splitting, change:
The shell shall treat each character of the IFS as a delimiter
to:
The shell shall treat the byte sequence comprising each character of the IFS as a delimiter

On page 2325 line 75038 section 2.6.5 Field Splitting, change:
any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field
to:
any sequence of bytes that have the encoded values of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of such bytes within the input shall delimit a field

On page 2326 line 75046 section 2.6.5 Field Splitting, change:
The term ``IFS white space'' is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of <space> and <tab> characters is considered IFS white space).
to:
The term ``IFS white space'' is used to mean any sequence (zero or more instances) of the byte sequences that comprise white-space characters in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of bytes that have the encoded values of <space> and <tab> characters is considered IFS white space).

On page 2326 line 75051 section 2.6.5 Field Splitting, change:
Each occurrence in the input of an IFS character that is not IFS white space
to:
Each occurrence in the input of a byte sequence comprising an IFS character that is not IFS white space
(0005802)
calestyo (reporter)
2022-04-15 00:41

AFAIU, this now involves three types of changes:

1) The first one, which improves on the wording of trailing newlines.
=> seems good to me.


2) "comprising each character of the IFS" and similar
"The shell shall treat the byte sequence comprising each character of the IFS as a delimiter"

It took me a bit to understand what's meant. I would reword this; the "each" in particular seems a bit strange here.

AFAIU, you want to say that any byte sequence in a word that equals one of the characters in IFS is to be taken as a split point.
So isn't that *any* character, not *each* character?

What about:
"The shell shall treat a byte sequence forming any character of the characters in the IFS value as a delimiter"
- "comprise" doesn't quite fit, IMO
- replaced "the byte sequence" with "a...", because "the" would be a concrete one, but there may be several (as there are several characters in IFS)
- aligned the wording with what you did in the change a bit below.


The same in:
"The term ``IFS white space'' is used to mean any sequence (zero or more instances) of the byte sequences that comprise white-space characters in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of bytes that have the encoded values of <space> and <tab> characters is considered IFS white space)."

rather something like:
"The term ``IFS white space'' is used to mean any sequence (zero or more instances) of the byte sequences that form any of the white-space characters in the IFS value..."

Perhaps also instead of "is used to mean" just "means".



3) You introduce bytes/byte sequences vs. characters.

I don't understand why you need that at all.

The old wording didn't really mention the data to be matched (i.e. the processed words, which may be bytes) at all, only what it is matched against (i.e. the characters of IFS), which more or less delegated the question of what the characters are matched against to other places.


Is this because, regardless of the lexical locale (i.e. the one in which the script/command is parsed), all the words are considered bytes (other than NUL), e.g. coming from parameter expansions, command substitutions, etc. (which may all be bytes), and you want to emphasise this?


I'm not against that change (of "type" (3)). I just wonder why, and whether it's really needed at this point (or just complicates reading), or whether it's already clear that words that undergo field splitting may be bytes.

Perhaps it would be better to generally mention that somewhere in the field splitting chapter?
I.e. something like:
"The words that undergo field splitting may be any byte (except NUL) and for the purpose of field splitting are matched against the characters of IFS".


However, as said, I wouldn't oppose that change, if you think it makes things clearer... I'd approve.


=> But there is one thing that's IMO lost along the way:
The old:
"any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field"

"sequence of those characters" indicated that a sequence of 1-n IFS characters was still regarded as one single field splitter.

With the new:
"ignored and any sequence of such bytes"
that's IMO a bit lost: a "sequence of bytes" reads more like ONE "multi-byte" character.

You don't have that problem with the 4th change, where you explicitly say:
"any sequence (zero or more instances) of the byte sequences that comprise white-space characters"

But this difference between the two (the latter, where you specifically point it out, vs. the former, where you don't) IMO makes it even more likely that the first case is read as if it were different.
(0005817)
geoffclare (manager)
2022-04-22 11:05
edited on: 2022-04-25 10:34

New proposal ...

On page 2323 line 74944 section 2.6.3 Command Substitution, change:
replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s), removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters
to:
replacing the command substitution (the text of the commands string plus the enclosing "$()" or backquotes) with the standard output of the command(s); if the output ends with one or more bytes that have the encoded value of a <newline> character, they shall not be included in the replacement. Any such bytes that occur elsewhere shall be included in the replacement; however, they might be treated as field delimiters

On page 2325 line 75034 section 2.6.5 Field Splitting, change:
The shell shall treat each character of the IFS as a delimiter
to:
The shell shall treat a byte sequence forming any of the characters in the IFS value as a delimiter

On page 2325 line 75038 section 2.6.5 Field Splitting, change:
any sequence of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field
to:
any sequence (zero or more instances) of bytes that have the encoded values of <space>, <tab>, or <newline> characters at the beginning or end of the input shall be ignored and any sequence (one or more instances) of such bytes within the input shall delimit a field

On page 2326 line 75046 section 2.6.5 Field Splitting, change:
The term ``IFS white space'' is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of <space> and <tab> characters is considered IFS white space).
to:
The term ``IFS white space'' is used to mean any sequence (zero or more instances) of the byte sequences that form any of the white-space characters in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of bytes that have the encoded values of <space> and <tab> characters is considered IFS white space).

On page 2326 line 75051 section 2.6.5 Field Splitting, change:
Each occurrence in the input of an IFS character that is not IFS white space
to:
Each occurrence in the input of a byte sequence that forms an IFS character that is not IFS white space
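
The distinction these field splitting changes draw between IFS white space and other IFS characters can be sketched as follows (an illustration only, assuming a POSIX sh and single-byte IFS characters; input, count and third are illustrative names):

```shell
# IFS holds one white-space character (<space>) and one other character (<comma>).
IFS=' ,'
input='  a  b,,c  '

set -f          # disable pathname expansion
set -- $input   # unquoted expansion undergoes field splitting
set +f
unset IFS       # back to default splitting behaviour

count=$#
third=$3
printf 'fields=%s third=[%s]\n' "$count" "$third"
```

Leading and trailing <space> bytes are ignored, the run of two <space> bytes between a and b delimits a single field, and each <comma> delimits a field on its own, so the second <comma> produces an empty third field: the expected result is four fields (a, b, an empty field, c).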


(0005819)
calestyo (reporter)
2022-04-24 22:45

See my reply to your mail on the list.

Specifically:
a) "any sequence of bytes that have the encoded values of <space>, <tab>, or <newline> characters at the beginning or end of the input"

=> I'd add "(zero or more instances)"


b) Your change from "(zero or more instances)" to "(one or more instances)" is possibly wrong.
(0005820)
geoffclare (manager)
2022-04-25 10:36

I have edited Note: 0005817 to make the two small changes suggested in Note: 0005819.

- Issue History
Date Modified Username Field Change
2022-01-31 23:30 calestyo New Issue
2022-01-31 23:30 calestyo Name => Christoph Anton Mitterer
2022-01-31 23:30 calestyo Section => 2.6.3 Command Substitution
2022-01-31 23:30 calestyo Page Number => 2323
2022-01-31 23:30 calestyo Line Number => 74944
2022-04-07 16:29 geoffclare Relationship added related to 0001561
2022-04-11 13:50 geoffclare Note Added: 0005794
2022-04-15 00:41 calestyo Note Added: 0005802
2022-04-22 11:05 geoffclare Note Added: 0005817
2022-04-22 11:14 geoffclare Note Edited: 0005817
2022-04-24 22:45 calestyo Note Added: 0005819
2022-04-25 10:34 geoffclare Note Edited: 0005817
2022-04-25 10:36 geoffclare Note Added: 0005820
2022-10-31 16:30 geoffclare Final Accepted Text => Note: 0005817
2022-10-31 16:30 geoffclare Status New => Resolved
2022-10-31 16:30 geoffclare Resolution Open => Accepted As Marked
2022-10-31 16:30 geoffclare Tag Attached: tc3-2008
2022-11-30 16:30 geoffclare Status Resolved => Applied
2023-09-07 16:41 geoffclare Relationship added related to 0001649

