Austin Group Defect Tracker

ID 0001538
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Error
Date Submitted 2021-12-05 06:48
Last Update 2024-06-11 09:07
Reporter andras_farkas
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Closed
Name Andras Farkas
Organization
User Reference
Section what
Page Number 3437
Line Number 116041
Interp Status Approved
Final Accepted Text Note: 0005857
Summary 0001538: what -s is poorly described, uses the word "quit"
Description On:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/what.html
The -s option for what is described as follows:
Quit after finding the first occurrence of the pattern in each file.
I find the use of the word 'quit' here unfortunate, as it can be read as exiting or terminating.
Both the phrase "in each file" and the behavior of what on various BSDs lead me to believe "quit" isn't the best way to describe the behavior of -s: "what -s foo bar" on the BSDs doesn't quit after finding a pattern in foo (it also checks bar afterwards). If I understand correctly, this may be correct behavior according to the standard.

In the man pages on NetBSD and OpenBSD, they describe -s as follows:
If the -s option is specified, only the first occurrence of an identification string in each file is printed.
On FreeBSD, the following phrasing is used:
Stop searching each file after the first match.
I also checked the phrasing in Solaris 10 and AIX 7.2 but don't have access to test the actual behavior, so I don't know whether "what -s foo bar" on SysV skips bar when a pattern is found in foo.

I don't know if this is an Issue 7 or Issue 8 sort of thing to fix, so I have this marked as Issue 7.
Desired Action There may be a better wording than this, but this is what I recommend.

Change:
Quit after finding the first occurrence of the pattern in each file.
to:
For each file, print only the first occurrence of the pattern.

I wonder if there's a way to make this more clear. -s doesn't make "pathname:" not get printed, so "print only" can still be misinterpreted. I'd like feedback and suggestions on this.

On that topic, the normative STDOUT section doesn't specify how printing works when multiple patterns are found, while the informative EXAMPLES section does. (Alternatively, this could be read as the STDOUT and EXAMPLES section defining different outputs)
Tags tc3-2008
Attached Files

- Relationships
related to 0001563 Closed Wording for what seem to imply odd behavior. "all occurrences of @(#)"

-  Notes
(0005674)
geoffclare (manager)
2022-02-17 15:57

I checked Solaris 11.4 and "what -s" with two files searched the second file after finding a match in the first, as expected.
(0005675)
geoffclare (manager)
2022-02-17 17:00
edited on: 2022-06-21 09:18

OLD Interpretation response
------------------------
The standard states the output produced by the what utility includes the name of the file for each occurrence of the pattern found and that no output is produced for a file that has no occurrences of the pattern, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Implementations write the pathname once for each file, regardless of how many identification strings are found in it.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 3437 line 116038 (-s option), change:
Quit after finding the first occurrence of the pattern in each file.
to:
Skip to the next file operand (if any) after finding the first occurrence of the pattern in each file.

On page 3437 line 116063 (STDOUT), change:
The standard output shall consist of the following for each file operand:
"%s:\n\t%s\n", <pathname>, <identification string>
to:
The standard output for each file operand shall consist of:
"%s:\n", <file>
followed by one line for each identification string (if any) found in the file, in the following format:
"\t%s\n",  <identification string>

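A minimal sketch of the difference between the two STDOUT formats above, written in Python. The function names are hypothetical, and the old format is rendered under the literal reading in which the pathname is repeated for every identification string.

```python
def old_stdout(pathname, id_strings):
    # Literal reading of the old format: the pathname line is written
    # again for every identification string found in the file.
    return "".join(f"{pathname}:\n\t{s}\n" for s in id_strings)


def proposed_stdout(pathname, id_strings):
    # Proposed format: the pathname line once, followed by one
    # tab-indented line per identification string (if any).
    return f"{pathname}:\n" + "".join(f"\t{s}\n" for s in id_strings)
```

For a file foo containing two identification strings, the first function repeats "foo:" before each string while the second writes it once, which is what the Rationale says implementations do.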

(0005680)
kre (reporter)
2022-02-18 09:14

Re Note: 0005675

Rather than:

    Skip to the next file operand (if any) after finding the first
    occurrence of the pattern in each file.

which suggests a required implementation technique, it would be better
to say

    Output no more than the string after the first matching
    occurrence of the pattern in each file.

which allows the implementation to continue reading the file to EOF if
that is what it wants to do, or to simply stop reading upon encountering
the first occurrence of the pattern, and go on to the next file.

As proposed, it also perhaps suggests that with -s, what can find the
pattern, then skip to the next file (omitting writing the identification
string). That would be perverse, but can be read into the proposed wording.

While my phrasing can always be improved, I chose "no more than the first"
rather than "only the first" (which would be simpler and easier to read)
to avoid needing to add an "(if any)" or something similar to make it clear
that it isn't required that the pattern occur.

As an aside, it might also be worth, while we are here, making it clear what should
be output when (with no use of -s) a file contains

     @(#)ABC@(#)DEF@(#)GHI\n

(and similar). Does that require 3 lines of output (after the filename
output) like

     file:
             ABC@(#)DEF@(#)GHI
             DEF@(#)GHI
             GHI

or just one line (the first above), or something different?

The implementation I am familiar with outputs just one (that is, it starts
searching for the next occurrence of @(#) following the end of the
identification string that is output), but neither the standard
nor our manual page says that it should work that way.
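
The two readings can be compared with a small Python sketch. It is only an illustration: the function names and the resume_after_string parameter are hypothetical, and the terminator set (double-quote, >, newline, backslash, null) is taken from the what DESCRIPTION as I understand it.

```python
PATTERN = b"@(#)"
# Assumed terminators: double-quote, '>', newline, backslash, NUL.
TERMINATORS = b'">\n\\\0'


def id_string_at(data, start):
    # Collect bytes from start up to (not including) the first terminator or EOF.
    end = start
    while end < len(data) and data[end] not in TERMINATORS:
        end += 1
    return data[start:end], end


def scan(data, resume_after_string):
    # resume_after_string=True: search again only after the string just output.
    # resume_after_string=False: search again right after the pattern itself,
    # so patterns embedded in an identification string are found as well.
    found = []
    pos = data.find(PATTERN)
    while pos != -1:
        start = pos + len(PATTERN)
        text, end = id_string_at(data, start)
        found.append(text)
        pos = data.find(PATTERN, end if resume_after_string else start)
    return found
```

With the example input above, scan(data, True) yields the single string ABC@(#)DEF@(#)GHI, while scan(data, False) yields the three strings shown in the three-line output.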

And also while here, the text currently says:

   INPUT FILES
            The input files shall be of any file type.

which is fine by me, but I'm not sure it is consistent with all
implementations (and perhaps also the standard) to require that
directories be able to be read along with other file types.

And as a postscript, it seems kind of weird to be wasting any time at all on
sccs related commands, simply deleting them all (from the standard) might be
a simpler thing to do.
(0005681)
geoffclare (manager)
2022-02-18 09:39

> it would be better to say
>
> Output no more than the string after the first matching
> occurrence of the pattern in each file.

That would be more difficult to interpret, given that the proposed STDOUT text says "one line for each identification string (if any) found in the file".

The combination of this and your proposed text would mean that "what -s" is not allowed to find more than one string in a file. Which implies that it cannot read to the end of the file just as much as using "skip" does.

In any case there is an implied "as if" with all requirements made by the standard. An implementation can do whatever it likes internally as long as a conforming application can't tell the difference. So neither proposal actually places any requirement on implementation technique.

> As proposed, it also perhaps suggests that with -s, what can find the
> pattern, then skip to the next file (omitting writing the identification
> string).

No, because the STDOUT section requires that it writes the string it found.

The problem with multiple @(#) strings in one line should be dealt with in a separate bug so that it can have its own interpretation response.
(0005682)
andras_farkas (reporter)
2022-02-18 14:39

I agree with the accepted text in note 0005675 and would like this change.
However, kre has some good points:

> As an aside, it might also, while we are here, making it clear what should
> be output when (with no use of -s) a file contains
>
> @(#)ABC@(#)DEF@(#)GHI\n
>
> (and similar). Does that require 3 lines of output (after the filename
> output) like
>
> file:
> ABC@(#)DEF@(#)GHI
> DEF@(#)GHI
> GHI
>
> or just one line (the first above), or something different?

I feel having it only output one line makes the most sense. When we look at:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/get.html
Notice the %A% keyword: %Z%%Y%%M%%I%%Z%. Since %Z% is @(#), it would be expanded to "@(#)TextHere@(#)". If 'what' were to produce multiple lines for such a line, one of those lines would be a useless blank line.

I'll make a bug report for this momentarily.

> And as a postscript, it seems kind of weird to be wasting any time at all on
> sccs related commands, simply deleting them all (from the standard) might be
> a simpler thing to do.

If the SCCS commands are removed, I'd like 'what' to be kept. 'what' is a very useful tool when developers take the time to use "@(#)".
(0005683)
andras_farkas (reporter)
2022-02-18 15:07

I made a new, related bug report:
https://austingroupbugs.net/view.php?id=1563
(0005687)
kre (reporter)
2022-02-18 19:20

Re Note: 0005681

Quoting:
     That would be more difficult to interpret, given that the proposed STDOUT
     text says "one line for each identification string (if any) found in the
     file".

It does indeed, but that's not relevant (for the issue of how to phrase the
behaviour of the -s option).

     The combination of this and your proposed text would mean that "what -s"
     is not allowed to find more than one string in a file.

Yes. That is the desired outcome, surely?

     Which implies that it cannot read to the end of the file just as much
     as using "skip" does.

Nonsense. cat reads to end of the file, yet doesn't find any @(#) patterns
or identification strings. All that it takes not to find something is not to
look for it. You may ask why one would bother reading, and not looking, and
indeed, most implementations probably don't - unless the intent is to
actually require that (in which case the language should contain "shall" and
be considerably more precise). However, there are reasons (which would be better
discussed on the mailing list than here, if you are interested) why an
implementation might want to do this, if it is permitted - unless the intent
is to prohibit it, the language should not be capable of being read as
if it does prohibit it, which the proposed text is capable of being.

     In any case there is an implied "as if" with all requirements made
     by the standard.

Sure, that's understood.

     An implementation can do whatever it likes internally as long as a
     conforming application can't tell the difference.

Yes. The problem is that here, the difference is visible. Even leaving
aside non-POSIX techniques that allow a file descriptor to be shared between
processes, when one of them opens a magic filename (/proc/PID/fd/N or /dev/fd/N)
to access a file descriptor already opened by another process (where obviously
the final setting of the file offset when what finishes is visible to the other
process) we still have the requirement that what be able to read all file types.

One such type is a FIFO.

If we do:

    mkfifo /tmp/fifo
    what -s /tmp/fifo &
    { printf '@(#)%s\n' "${idstring}";sleep 100000;printf done\\n;}>/tmp/fifo

[Apologies for the lack of redundant spaces, which would improve readability,
-- just an attempt to avoid line wrapping...]

Does the what process finish in less than 100000 seconds, or not? That's
visible. Does the second printf get SIGPIPE or not? Also visible.

There is no "as if" here. This whole problem occurs because of the wording
indicating how the -s flag should be implemented, rather than what it means
(what its effect is). Do it that way, say nothing about whether the rest of
the file is read or not (or explicitly make it unspecified if you prefer),
and implementations can then do it whichever way they like.
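
The visibility of the difference can also be sketched without a FIFO, by looking at where the read position ends up. This is a hedged Python illustration only: first_id_string and its drain parameter are hypothetical names, the terminator set is assumed from the what DESCRIPTION, and reading one byte at a time stands in for an unbuffered implementation.

```python
import io

PATTERN = b"@(#)"
# Assumed terminators: double-quote, '>', newline, backslash, NUL.
TERMINATORS = b'">\n\\\0'


def first_id_string(stream, drain):
    # Read byte by byte until the pattern has been seen.
    window = b""
    while window != PATTERN:
        byte = stream.read(1)
        if not byte:
            return None  # EOF before any pattern
        window = (window + byte)[-len(PATTERN):]
    # Collect the identification string up to a terminator or EOF.
    text = bytearray()
    while True:
        byte = stream.read(1)
        if not byte or byte in TERMINATORS:
            break
        text += byte
    if drain:
        stream.read()  # keep reading to EOF, as a "read it all" implementation would
    return bytes(text)
```

Both variants return the same identification string, but the stream position afterwards differs, which is the kind of side effect another process sharing the descriptor (or the writer on a FIFO) can observe.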

For the next point, please do remember that this is reading the text in
a particularly pedantic way - but what is written should really be immune to
this kind of wordplay...

    > As proposed, it also perhaps suggests that with -s, what can find the
    > pattern, then skip to the next file (omitting writing the identification
    > string).

    No, because the STDOUT section requires that it writes the string it found.

Not relevant. The proposed wording is:

    Skip to the next file operand (if any) after finding the first
    occurrence of the pattern in each file.

That is, in pseudo-C (assuming we're in a loop which runs once for each
file), and ignoring the actual reading, all error checking, buffering, file
opening and closing, ...

    if (p[0] == '@' && p[1] == '(' && p[2] == '#' && p[3] == ')') {
        if (sflag)
            continue;   /* skip to the next file without writing anything */
        /* here we find the end of the identification string and write it */
    }

That is doing exactly what the line quoted above says, when we find the
pattern ("@(#)") we skip to the next file.

The proposed STDOUT section is:

     followed by one line for each identification string (if any) found
     in the file,

which is fine, but doesn't apply, as the code didn't ever find an
identification string - it didn't look for one (note that the pattern
is not part of that string - the string follows the pattern) - the code
did as instructed, as soon as the pattern is found, skip to the next file.

We should not be creating new text that even leaves the possibility of
arguments like that one - not when it is so easy to do it properly (even
if that means also slightly modifying the STDOUT section proposed as well).
(0005688)
kre (reporter)
2022-02-18 19:27

Also re Note: 0005681

  The problem with multiple @(#) strings in one line should be dealt with in a
  separate bug so that it can have its own interpretation response.

Andras already did that, so whatever is needed, can happen, but I doubt that
an interpretation is needed for that one, no more than it is for changing the
wording from "quit" to "skip to the next" or whatever it ends up saying. Those
are just clarifying what was always intended, how the implementations actually
work, but wasn't stated clearly.

What needed the interpretation (as much as I understand the processing
requirements here, which isn't much) was the change from the explicit (but
wrong) text in the old standard requiring the file name to be printed again
and again for each id string printed. That's where the standard was clear,
but incorrect - that's where (or at least one case of) an interpretation
is needed ... not just clarifying the text to say what it should always
have said, but didn't bother to say clearly.
(0005764)
agadmin (administrator)
2022-03-25 17:08

Interpretation proposed: 25 March 2022
(0005821)
geoffclare (manager)
2022-04-26 11:08

Re: Note: 0005687

Thank you for pointing out that the difference is detectable by applications if a FIFO is being read. All implementations I have tried close the file without reading to the end and this would appear to have been the intention of the current "quit" wording. So unless someone can find an existing implementation that reads to the end, I believe it would be best to require this "skip" behaviour rather than make applications cope with either behaviour. Perhaps a RATIONALE addition is warranted, such as:
The description of the -s option requires what to ``skip'' to the next file operand. This intentionally implies that it does not continue reading from the current file. Applications would experience different behavior if a file operand names a FIFO special file and what waited for an end-of-file condition rather than closing the file straight away.


On the second point, I'm happy to alter the wording to prevent the potential for it being misunderstood. I'd suggest:
Skip to the next file operand (if any) after writing to standard output the first occurrence of the pattern in each file.
(0005854)
kre (reporter)
2022-06-20 15:02
edited on: 2022-06-20 17:00

Re Note: 0005821

Apologies for the delay of this response.

First, I don't now (nor did I earlier) take any position wrt what
the what utility is supposed to do, or how it should behave (there are few utilities
that are less important than this one - some, but not a lot).

My concerns are entirely about how the standard specifies whatever it
intends to specify, so if the intent is that implementations must read
no more than the identification string following the first pattern, and
then close that file and go onto the next, that's fine - but let's be
precise in how we specify it please.

For example, the proposed wording

    Skip to the next file operand (if any) after writing to standard output
    the first occurrence of the pattern in each file.

is nothing like correct. First, and most blatantly, what *never*
writes the pattern to standard output. The pattern is the @(#)
string, the DESCRIPTION says so quite explicitly:

      The what utility shall search the given files for all occurrences of the
      pattern that get (see get) substitutes for the %Z% keyword ("@(#)")

That's clear. It continues:

     and shall write to standard output what follows until the first
     occurrence of one of the following:

(the list isn't important here) - what is important is that what is written is
not the pattern. It is also relevant that there is no term defined here
(or anywhere) to label this "what follows" text, the STDOUT section says

    The standard output shall consist of the following for each file operand:
    "%s:\n\t%s\n", <pathname>, <identification string>

but nowhere is it defined what an "identification string" is, we are just
supposed to presume that that is the "what follows" text, but nothing
actually equates the two.

What I'd suggest is changing the DESCRIPTION to be

    The what utility searches for identification strings in each file
    given, and writes those identification strings to standard output.
    An identification string is the text that follows an occurrence of
    the pattern that get (see get) substitutes for the %Z% keyword ("@(#)")
    up until the next occurrence of one of the following:

(I believe that EOF was already added to the list of terminating "characters",
if not, it needs to be.)


Then for -s I'd say

     -s After locating and writing to standard output the identification
          string following the first pattern in a file, no further data
          shall be read from that file, the search shall recommence from
          the beginning of the subsequent file, if any.

(0005857)
geoffclare (manager)
2022-06-21 09:16
edited on: 2022-06-23 15:30

(Note that bug 0001563 already fixed STDOUT, so this is just about -s)

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
The use of the word "quit" can be interpreted as allowing what to exit without processing later files, whereas the intention is that it skips to the next file.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

On page 3437 line 116032 section what, after applying bug 1563 change:
The what utility shall write to standard output what follows until the first occurrence ...
to:
The what utility shall write to standard output the identification string that follows up to, but not including, the first occurrence ...

On page 3437 line 116038 section what (-s option), change:
Quit after finding the first occurrence of the pattern in each file.
to:
Write at most one identification string for each file. After locating and writing to standard output the identification string following the first pattern (if any) in a file, no further data shall be read from that file and the search shall recommence from the beginning of the next file, if any.

On page 3438 line 116099 section what, change RATIONALE from:
None.
to:
The standard requires that when the -s option is used, what does not continue reading from the current file after writing the first identification string. This might seem an unimportant detail, but applications would experience different behavior if a file operand named a FIFO special file and what waited for an end-of-file condition rather than closing the file straight away.
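
The accepted -s wording can be illustrated with a short Python sketch. It is not a reference implementation: the what function, its s_flag parameter and the in-memory files mapping are hypothetical, the terminator set is assumed from the what DESCRIPTION, and the sketch assumes the pathname line is written for every file operand.

```python
PATTERN = b"@(#)"
# Assumed terminators: double-quote, '>', newline, backslash, NUL.
TERMINATORS = b'">\n\\\0'


def id_strings(data):
    # Lazily yield each identification string; the search resumes after
    # the end of the string that was just produced.
    pos = data.find(PATTERN)
    while pos != -1:
        start = pos + len(PATTERN)
        end = start
        while end < len(data) and data[end] not in TERMINATORS:
            end += 1
        yield data[start:end].decode("utf-8", "replace")
        pos = data.find(PATTERN, end)


def what(files, s_flag=False):
    # files maps pathname -> file contents (bytes), in operand order.
    out = []
    for name, data in files.items():
        out.append(f"{name}:\n")
        for text in id_strings(data):
            out.append(f"\t{text}\n")
            if s_flag:
                break  # at most one string per file, then go on to the next file
    return "".join(out)
```

With s_flag set, the loop stops scanning the current file after the first string and continues with the next operand instead of exiting, which is the distinction the Rationale draws between "quit" and "skip". Because the contents are already in memory here, the requirement that no further data be read from the file is not modelled.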


(0005869)
agadmin (administrator)
2022-06-24 09:16

Revised Interpretation proposed: 24 June 2022
(0005914)
agadmin (administrator)
2022-07-26 11:15

Interpretation approved: 26 July 2022

- Issue History
Date Modified Username Field Change
2021-12-05 06:48 andras_farkas New Issue
2021-12-05 06:48 andras_farkas Name => Andras Farkas
2021-12-05 06:48 andras_farkas Section => what
2022-02-17 09:02 Don Cragun Page Number => 3437
2022-02-17 09:02 Don Cragun Line Number => 116041
2022-02-17 09:02 Don Cragun Interp Status => ---
2022-02-17 15:57 geoffclare Note Added: 0005674
2022-02-17 17:00 geoffclare Note Added: 0005675
2022-02-17 17:02 geoffclare Final Accepted Text => Note: 0005675
2022-02-17 17:02 geoffclare Status New => Interpretation Required
2022-02-17 17:02 geoffclare Resolution Open => Accepted As Marked
2022-02-17 17:03 geoffclare Interp Status --- => Pending
2022-02-17 17:03 geoffclare Tag Attached: tc3-2008
2022-02-18 09:14 kre Note Added: 0005680
2022-02-18 09:39 geoffclare Note Added: 0005681
2022-02-18 14:39 andras_farkas Note Added: 0005682
2022-02-18 15:07 andras_farkas Note Added: 0005683
2022-02-18 19:20 kre Note Added: 0005687
2022-02-18 19:27 kre Note Added: 0005688
2022-03-25 17:08 agadmin Interp Status Pending => Proposed
2022-03-25 17:08 agadmin Note Added: 0005764
2022-04-26 11:08 geoffclare Note Added: 0005821
2022-06-20 15:02 kre Note Added: 0005854
2022-06-20 15:17 kre Note Edited: 0005854
2022-06-20 16:57 kre Note Edited: 0005854
2022-06-20 17:00 kre Note Edited: 0005854
2022-06-21 08:54 geoffclare Relationship added related to 0001563
2022-06-21 09:16 geoffclare Note Added: 0005857
2022-06-21 09:18 geoffclare Note Edited: 0005675
2022-06-21 09:20 geoffclare Note Edited: 0005857
2022-06-22 07:55 geoffclare Note Edited: 0005857
2022-06-23 15:29 geoffclare Interp Status Proposed => Pending
2022-06-23 15:29 geoffclare Final Accepted Text Note: 0005675 => Note: 0005857
2022-06-23 15:30 geoffclare Note Edited: 0005857
2022-06-23 15:30 geoffclare Note Edited: 0005857
2022-06-24 09:16 agadmin Note Added: 0005869
2022-06-24 09:31 agadmin Interp Status Pending => Proposed
2022-07-26 11:15 agadmin Interp Status Proposed => Approved
2022-07-26 11:15 agadmin Note Added: 0005914
2022-08-19 15:04 geoffclare Status Interpretation Required => Applied
2024-06-11 09:07 agadmin Status Applied => Closed

