Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0000282
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Editorial
Type Clarification Requested
Date Submitted 2010-07-12 15:20
Last Update 2013-04-16 13:06
Reporter bonzinip
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Closed
Name Paolo Bonzini
Organization
User Reference
Section sed
Page Number 394
Line Number 104833
Interp Status Approved
Final Accepted Text Note: 0000533
Summary 0000282: Extended Description wrong with respect to D command
Description The extended description for sed says "In default operation, sed cyclically shall append a line of input, less its terminating <newline>, into the pattern space. Normally the pattern space will be empty, unless a D command terminated the last cycle."

The POSIX standard requires an input read on every new cycle, regardless of what's in the pattern space after a D command. It makes no mention of skipping the input read on the next cycle if the pattern space is not empty. However, this is contrary to the operation of most (all?) implementations, which do *not* append a line if the cycle was restarted due to the D command.

Looking back now at POSIX ancestors, I see that the wording *used* to be:

"sed cyclically copies a line of input, less its terminating newline character, into a pattern space (unless there is something left after a D command)..."

Which is consistent with GNU sed documentation and behavior. But that was back in 1997 (Open Group, Commands and Utilities, Issue 5).

Reference:
http://www.opengroup.org/online-pubs-short?DOC=9693999599&FORM=PDF

Starting with Issue 6 (which was POSIX:2001), the wording was apparently changed to:

"sed cyclically shall append a line of input, less its terminating <newline>, into the pattern space. Normally the pattern space will be empty, unless a D command terminated the last cycle."

The "Change History" section of that document yields only one possible (vague) clue for this change:

"The EXTENDED DESCRIPTION is changed to align with the IEEE P1003.2b draft standard."

Reference:
http://www.opengroup.org/onlinepubs/009695399/toc.htm

However, a fairly late draft of 1003.2b (D12, the last or next to last version) that I found still had the (presumably correct) wording:

"In default operation, sed cyclically shall copy a line of input, less its terminating <newline>, into a pattern space (unless there is something left after a D command)"

It would seem that the POSIX standard on this aspect of sed (behavior of the D command) has been broken for almost 10 years, due to some apparently unjustified and mysterious reason.

It doesn't seem safe to use the D command portably until the standard is fixed.

This is a testcase:

/[23]/q
N
D

with input

1
2
3

The output should be "2" when following the GNU implementation, but "2<newline>3" when following POSIX.
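
The two readings can be compared with a small simulation. This is only a sketch in Python, not code from any sed implementation: the function name `run_testcase` and the `always_read` flag are hypothetical, and it assumes that "append" joins leftover text and the new line with a <newline>, which is what the expected "2<newline>3" implies.

```python
import re

def run_testcase(lines, always_read):
    """Simulate the script `/[23]/q; N; D` over a list of input lines.

    always_read=True  -> literal reading of the 2008 text: every cycle appends input.
    always_read=False -> historical behavior: no read when D left text behind.
    """
    pending = list(lines)   # unread input lines, newlines already stripped
    space = ""              # pattern space
    leftover = False        # True when D restarted the cycle with text left over
    out = []
    while True:
        if always_read or not leftover:
            if not pending:
                break                       # no more input: sed ends
            line = pending.pop(0)
            # Assumption: "append" joins with a <newline> when text is left over.
            space = space + "\n" + line if leftover else line
        leftover = False
        if re.search(r"[23]", space):       # /[23]/q prints the pattern space and quits
            out.append(space)
            break
        if not pending:                     # N with no next line: quit without printing
            break
        space += "\n" + pending.pop(0)      # N
        space = space.split("\n", 1)[1]     # D: drop through the first <newline>
        leftover = True
    return "\n".join(out)

print(repr(run_testcase(["1", "2", "3"], always_read=False)))
print(repr(run_testcase(["1", "2", "3"], always_read=True)))
```

With `always_read=False` the second cycle starts with "2" already in the pattern space and quits at once; with `always_read=True` line 3 is appended first, so both lines are printed.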
Desired Action Change back the extended description to "unless there is something left after a D command, sed cyclically shall copy a line of input, less its terminating <newline>, into a pattern space".
Tags tc1-2008
Attached Files

- Relationships

-  Notes
(0000460)
nick (manager)
2010-07-12 18:14

Issue 6 had several base documents, and some extensive editorial work was done during the ballot cycles to use uniform language (what became known as the great "shallification"). I suspect the root of the change was in this phase, and the change may well have been unintentional.
(0000483)
msbrown (manager)
2010-07-29 15:56

At line 104833, replace:

In default operation, sed cyclically shall append a line of input, less its terminating <newline>, into the pattern space. Normally the pattern space will be empty, unless a D command terminated the last cycle.

with:

In default operation, sed cyclically shall copy a line of input, less its terminating <newline> character, into a pattern space unless there is something left after a D command.
(0000484)
ajosey (manager)
2010-07-29 16:00

As an informational note, the change occurred between Draft 5 and Draft 6 of the original Austin Group draft, around April 2001. Draft 5 was the 1003.2b merge and included the wording as noted.

The Change Request report at:
http://www.opengroup.org/austin/docs/austin_75r1.txt

notes that the change was made by change request [DST-1939], which was applied.


 _____________________________________________________________________________
 OBJECTION Enhancement Request Number 453
 donnte@xxxxxxxxxxxxxx Bug in xcud5 Assorted (rdvk# 434)
 [DST-1939] Mon, 5 Feb 2001 19:57:08 -0800
 _____________________________________________________________________________
 Accept_____ Accept as marked below_X___ Duplicate_____ Reject_____
 Rationale for rejected or partial changes:

In default operation, sed cyclically shall append a line of input,
less its terminating <newline>, into the pattern space. Normally the
pattern space will be empty, unless a D command terminated the last cycle.
The sed utility shall then apply in sequence all...


 _____________________________________________________________________________
 Page: 3047 Line: 32102 Section: sed


 Problem:

 In default operation, sed cyclically shall copy a line of input,
 less its terminating <newline>, into a pattern space (unless there
 is something left after a D command), apply in sequence all

 This is unclear: "unless" what? I *think* it's trying to say the following.

 Action:

 In default operation, sed cyclically shall append a line of input,
 less its terminating <newline>, into a pattern space. Normally
 the pattern space will be empty, but if a D command has been used
 it may not be empty. It shall then apply in sequence all...
(0000524)
bonzinip (reporter)
2010-08-16 19:35

Thanks for the information!

It has now been brought to my attention that even the original text of Issue 5 is not correct.

The right wording would be:

"In default operation, sed cyclically shall copy a line of input, less its terminating <newline> character, into a pattern space. This step shall however be skipped whenever the last executed command was a D command, and the command found no newline in pattern space."

The reason is that the wording in POSIX introduces an unwanted difference in the behavior of the D command, depending on whether the last line appended to pattern space was empty or not.

For example, take the following sed invocation:

(1) $ sed -ne 'N;=;D'

GNU sed and BSD sed both implement the wording I suggested, so the POSIX behavior could be implemented with

(2) $ sed -ne 'N;=;/\n$/d;D'

in both GNU sed and BSD sed.

Now, given the two inputs "\n\n\n" and "a\na\na\n", the output should be the same, since the overall structure of the file is the same (the only difference is whether lines are empty or not). However:

- for "a\na\na\n" both scripts will output "2\n3\n";
- for "\n\n\n", script 1 will output "2\n3\n", while script 2 will output "2\n"

This shows how the output of script 1 is more coherent.
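
The comparison can be reproduced with a short simulation. This is a sketch in Python rather than real sed: `line_numbers` and its `rule` values are hypothetical names, "leftover" models the Issue 5 wording (skip the read only if something is left after D, which is what script 2 emulates), and "newline" models the behavior of script 1 in GNU sed and BSD sed.

```python
def line_numbers(text, rule):
    """Simulate `sed -n 'N;=;D'` and return the line numbers that `=` prints.

    rule="leftover": skip the read only if D left a non-empty pattern space.
    rule="newline":  skip the read whenever D found a <newline> to delete through.
    """
    pending = text.split("\n")[:-1]       # input lines; assumes a trailing "\n"
    lineno = 0
    space = ""                            # pattern space
    skip_read = False
    printed = []
    while True:
        if not skip_read:
            if not pending:
                break                     # no more input: sed ends
            space = pending.pop(0)
            lineno += 1
        if not pending:                   # N with no next line: quit
            break
        space += "\n" + pending.pop(0)    # N
        lineno += 1
        printed.append(lineno)            # =
        had_newline = "\n" in space       # always true right after N
        space = space.split("\n", 1)[1]   # D: drop through the first <newline>
        skip_read = had_newline if rule == "newline" else space != ""
    return printed

for text in ("a\na\na\n", "\n\n\n"):
    print(repr(text), line_numbers(text, "newline"), line_numbers(text, "leftover"))
```

Under the "newline" rule both inputs give 2 and 3. Under the "leftover" rule the input made of empty lines gives only 2, because the empty pattern space left by D triggers an extra read that consumes line 3.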

Thanks!
(0000525)
bonzinip (reporter)
2010-08-16 19:36

Of course, the modification should be

"In default operation, sed cyclically shall copy a line of input, less its terminating <newline> character, into a pattern space. This step shall however be skipped whenever the last executed command was a D command, and the command found a newline in pattern space."

I apologize for the confusion.
(0000526)
geoffclare (manager)
2010-08-17 08:31

There is also a related problem with the description of the D command.
It says:

    Delete the initial segment of the pattern space through the first
    <newline> and start the next cycle.

It doesn't say what happens if there is no <newline> in the
pattern space.
(0000532)
eblake (manager)
2010-08-19 18:35
edited on: 2010-08-26 15:20

Based on the additional comments, the fixed wording should be as follows.


At line 104833, replace:

In default operation, sed cyclically shall append a line of input, less its terminating <newline>, into the pattern space. Normally the pattern space will be empty, unless a D command terminated the last cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, and at the end of the script copy the pattern space to standard output (except when −n is specified) and delete the pattern space.

with:

In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space. Reading from input shall be skipped if a <newline> was in the pattern space prior to a D command ending the previous cycle. The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits. If no commands explicitly started a new cycle, then at the end of the script the pattern space shall be copied to standard output (except when −n is specified) and the pattern space shall be deleted.

At line 104926, replace:

[2addr]D Delete the initial segment of the pattern space through the first <newline> and start the next cycle.

with:

[2addr]D If the pattern space contains no <newline>, delete the pattern space and start a normal new cycle as if the d command was issued. Otherwise, delete the initial segment of the pattern space through the first <newline>, and start the next cycle with the resultant pattern space and without reading any new input.
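
As a reading aid, the replacement text for D can be written as a small function. This is a sketch in Python with a hypothetical name (`d_command`) and a hypothetical return convention; it is not taken from the standard or from any implementation.

```python
def d_command(space):
    """Apply D to the pattern space, following the replacement wording.

    Returns (new_space, skip_read). skip_read is True when the next cycle
    must start with new_space and without reading any new input.
    """
    if "\n" not in space:
        # No <newline>: behave like d, i.e. delete and start a normal new cycle.
        return "", False
    # Otherwise delete through the first <newline> and restart without reading.
    return space.split("\n", 1)[1], True

print(d_command("abc"))     # no <newline> in the pattern space
print(d_command("1\n2"))    # text remains after the first <newline>
print(d_command("\n"))      # nothing remains, but a <newline> was present
```

The last call is the case discussed in the earlier notes: the resulting pattern space is empty, yet input is still not read, because the decision depends on whether a <newline> was present and not on what is left.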

(0000533)
Don Cragun (manager)
2010-08-26 15:23

Interpretation response
------------------------
The standard states the behavior of the D command, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
The current text does not match historic practice.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes specified in Note: 0000532
(0000573)
ajosey (manager)
2010-10-14 11:28

Interpretation approved 14 October 2010

- Issue History
Date Modified Username Field Change
2010-07-12 15:20 bonzinip New Issue
2010-07-12 15:20 bonzinip Status New => Under Review
2010-07-12 15:20 bonzinip Assigned To => ajosey
2010-07-12 15:20 bonzinip Name => Paolo Bonzini
2010-07-12 15:20 bonzinip Section => sed
2010-07-12 15:20 bonzinip Page Number => ?
2010-07-12 15:20 bonzinip Line Number => ?
2010-07-12 18:14 nick Note Added: 0000460
2010-07-29 15:56 msbrown Page Number ? => 394
2010-07-29 15:56 msbrown Line Number ? => 104833
2010-07-29 15:56 msbrown Interp Status => ---
2010-07-29 15:56 msbrown Note Added: 0000483
2010-07-29 15:56 msbrown Severity Objection => Editorial
2010-07-29 15:56 msbrown Status Under Review => Resolved
2010-07-29 15:56 msbrown Resolution Open => Accepted As Marked
2010-07-29 15:56 msbrown Final Accepted Text => Note: 0000483
2010-07-29 16:00 ajosey Note Added: 0000484
2010-08-16 19:35 bonzinip Note Added: 0000524
2010-08-16 19:36 bonzinip Note Added: 0000525
2010-08-17 08:31 geoffclare Note Added: 0000526
2010-08-17 08:31 geoffclare Resolution Accepted As Marked => Reopened
2010-08-19 18:35 eblake Note Added: 0000532
2010-08-26 15:20 eblake Note Edited: 0000532
2010-08-26 15:23 Don Cragun Note Added: 0000533
2010-08-26 15:24 Don Cragun Final Accepted Text Note: 0000483 => Note: 0000533
2010-08-26 15:24 Don Cragun Status Resolved => Interpretation Required
2010-08-26 15:24 Don Cragun Resolution Reopened => Accepted As Marked
2010-08-26 16:33 Don Cragun Interp Status --- => Pending
2010-09-13 05:48 ajosey Interp Status Pending => Proposed
2010-09-24 16:18 geoffclare Tag Attached: tc1-2008
2010-10-14 11:28 ajosey Interp Status Proposed => Approved
2010-10-14 11:28 ajosey Note Added: 0000573
2013-04-16 13:06 ajosey Status Interpretation Required => Closed

