0001538: what -s is poorly described, uses the word "quit"

Notes
(0005674) geoffclare (manager) 2022-02-17 15:57	I checked Solaris 11.4 and "what -s" with two files searched the second file after finding a match in the first, as expected.

(0005675) geoffclare (manager) 2022-02-17 17:00 edited on: 2022-06-21 09:18	OLD Interpretation response
------------------------
The standard states the output produced by the what utility includes the name of the file for each occurrence of the pattern found and that no output is produced for a file that has no occurrences of the pattern, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Implementations write the pathname once for each file, regardless of how many identification strings are found in it.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 3437 line 116038 (-s option), change:

    Quit after finding the first occurrence of the pattern in each file.

to:

    Skip to the next file operand (if any) after finding the first occurrence of the pattern in each file.

On page 3437 line 116063 (STDOUT), change:

    The standard output shall consist of the following for each file operand:

    "%s:\n\t%s\n", <pathname>, <identification string>

to:

    The standard output for each file operand shall consist of:

    "%s:\n", <file>

    followed by one line for each identification string (if any) found in the file, in the following format:

    "\t%s\n", <identification string>
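The difference between the two STDOUT wordings in the note above can be made concrete with a small sketch. This is an illustration only, not code from any what implementation; the function names `format_old_wording` and `format_proposed_wording` are hypothetical, and the "old" function applies the original format string literally, once per identification string.

```python
def format_old_wording(pathname, id_strings):
    """Literal reading of the old text: "%s:\n\t%s\n" once per string found.

    A file with no identification strings produces no output at all.
    """
    return "".join(f"{pathname}:\n\t{s}\n" for s in id_strings)


def format_proposed_wording(pathname, id_strings):
    """Proposed text: the file name once, then one tab-indented line per string."""
    parts = [f"{pathname}:\n"]
    for s in id_strings:
        parts.append(f"\t{s}\n")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical file name and strings, for illustration only.
    print(format_old_wording("a.c", ["v1", "v2"]), end="")
    print(format_proposed_wording("a.c", ["v1", "v2"]), end="")
```

With two strings, the literal old wording repeats the file name before each string, while the proposed wording prints it once, which is what the Rationale says implementations actually do.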

(0005680) kre (reporter) 2022-02-18 09:14	Re Note: 0005675

Rather than:

    Skip to the next file operand (if any) after finding the first occurrence of the pattern in each file.

which suggests a required implementation technique, it would be better to say

    Output no more than the string after the first matching occurrence of the pattern in each file.

which allows the implementation to continue reading the file to EOF if that is what it wants to do, or to simply stop reading upon encountering the first occurrence of the pattern, and go on to the next file.

As proposed, it also perhaps suggests that with -s, what can find the pattern, then skip to the next file (omitting writing the identification string). That would be perverse, but can be read into the proposed wording.

While my phrasing can always be improved, I chose "no more than the first" rather than "only the first" (which would be simpler and easier to read) to avoid needing to add an "(if any)" or something similar to make it clear that it isn't required that the pattern occur.

As an aside, it might also, while we are here, be worth making it clear what should be output when (with no use of -s) a file contains

    @(#)ABC@(#)DEF@(#)GHI\n

(and similar). Does that require 3 lines of output (after the filename output) like

    file:
        ABC@(#)DEF@(#)GHI
        DEF@(#)GHI
        GHI

or just one line (the first above), or something different? The implementation I am familiar with outputs just one (that is, it starts searching for the next occurrence of @(#) following the end of the identification string that is output), but neither the standard, nor our manual page, says that it should work that way.

And also while here, the text currently says:

    INPUT FILES
    The input files shall be of any file type.

which is fine by me, but I'm not sure it is consistent with all implementations (and perhaps also the standard) to require that directories be able to be read along with other file types.
And as a postscript, it seems kind of weird to be wasting any time at all on sccs related commands, simply deleting them all (from the standard) might be a simpler thing to do.

(0005681) geoffclare (manager) 2022-02-18 09:39	> it would be better to say > > Output no more than the string after the first matching > occurrence of the pattern in each file. That would be more difficult to interpret, given that the proposed STDOUT text says "one line for each identification string (if any) found in the file". The combination of this and your proposed text would mean that "what -s" is not allowed to find more than one string in a file. Which implies that it cannot read to the end of the file just as much as using "skip" does. In any case there is an implied "as if" with all requirements made by the standard. An implementation can do whatever it likes internally as long as a conforming application can't tell the difference. So neither proposal actually places any requirement on implementation technique. > As proposed, it also perhaps suggests that with -s, what can find the > pattern, then skip to the next file (omitting writing the identification > string). No, because the STDOUT section requires that it writes the string it found. The problem with multiple @(#) strings in one line should be dealt with in a separate bug so that it can have its own interpretation response.

(0005682) andras_farkas (reporter) 2022-02-18 14:39	I agree with the accepted text in note 0005675 and would like this change. However, kre has some good points:

> As an aside, it might also, while we are here, making it clear what should
> be output when (with no use of -s) a file contains
>
> @(#)ABC@(#)DEF@(#)GHI\n
>
> (and similar). Does that require 3 lines of output (after the filename
> output) like
>
> file:
> ABC@(#)DEF@(#)GHI
> DEF@(#)GHI
> GHI
>
> or just one line (the first above), or something different?

I feel having it only output one line makes the most sense. When we look at:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/get.html
Notice the %A% keyword: %Z%%Y%%M%%I%%Z%
Since %Z% is @(#) notice that it'd be expanded to "@(#)TextHere@(#)". If 'what' were to produce multiple lines for such a line, one of those lines would be a useless blank line. I'll make a bug report for this, momentarily.

> And as a postscript, it seems kind of weird to be wasting any time at all on
> sccs related commands, simply deleting them all (from the standard) might be
> a simpler thing to do.

If the SCCS commands are removed, I'd like 'what' to be kept. 'what' is a very useful tool when developers take the time to use "@(#)".
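The two candidate behaviours for several @(#) patterns on one line can be compared with a short sketch. It is illustrative only and not taken from any implementation: the helper name `find_id_strings` and its `rescan_inside` flag are hypothetical, and the terminator set (double quote, ">", newline, backslash, NUL) is an assumption, since this thread does not reproduce the standard's list.

```python
PATTERN = "@(#)"
# Assumed terminator set; the thread does not quote the standard's list.
TERMINATORS = '">\n\\\0'


def find_id_strings(data, rescan_inside=False):
    """Return the identification strings found in data.

    rescan_inside=False: resume searching after the end of the string just
    found (the behaviour kre describes for the implementation he knows).
    rescan_inside=True: resume right after the pattern, so patterns inside
    an identification string start further strings.
    """
    found = []
    pos = data.find(PATTERN)
    while pos != -1:
        start = pos + len(PATTERN)
        end = start
        # The string runs up to, but not including, the first terminator.
        while end < len(data) and data[end] not in TERMINATORS:
            end += 1
        found.append(data[start:end])
        pos = data.find(PATTERN, start if rescan_inside else end)
    return found


if __name__ == "__main__":
    for sample in ("@(#)ABC@(#)DEF@(#)GHI\n", "@(#)TextHere@(#)\n"):
        print(find_id_strings(sample), find_id_strings(sample, rescan_inside=True))
```

Under the rescanning variant, the %A%-style input "@(#)TextHere@(#)" yields a second, empty identification string, which is the useless extra line mentioned above.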

(0005683) andras_farkas (reporter) 2022-02-18 15:07	I made a new, related bug report: https://austingroupbugs.net/view.php?id=1563

(0005687) kre (reporter) 2022-02-18 19:20	Re Note: 0005681

Quoting:

> That would be more difficult to interpret, given that the proposed STDOUT text says "one line for each identification string (if any) found in the file".

It does indeed, but that's not relevant (for the issue of how to phrase the behaviour of the -s option).

> The combination of this and your proposed text would mean that "what -s" is not allowed to find more than one string in a file.

Yes. That is the desired outcome, surely?

> Which implies that it cannot read to the end of the file just as much as using "skip" does.

Nonsense. cat reads to end of the file, yet doesn't find any @(#) patterns or identification strings. All that it takes not to find something is not to look for it. You may ask why one would bother reading, and not looking, and indeed, most implementations probably don't - unless the intent is to actually require that (in which case the language should contain "shall" and be considerably more precise). However there are reasons (which would be better discussed on the mailing list than here, if you are interested) why an implementation might want to do this, if it is permitted - unless the intent is to prohibit it, then the language should not be capable of being read as if it does prohibit it, which the proposed text is capable of being.

> In any case there is an implied "as if" with all requirements made by the standard.

Sure, that's understood.

> An implementation can do whatever it likes internally as long as a conforming application can't tell the difference.

Yes. The problem is that here, the difference is visible.
Even leaving aside non-POSIX techniques that allow a file descriptor to be shared between processes, when one of them opens a magic filename (/proc/PID/fd/N or /dev/fd/N) to access a file descriptor already opened by another process (where obviously the final setting of the file offset when what finishes is visible to the other process) we still have the requirement that what be able to read all file types. One such type is a FIFO. If we do:

    mkfifo /tmp/fifo
    what -s /tmp/fifo &
    { printf '@(#)%s\n' "${idstring}";sleep 100000;printf done\\n;}>/tmp/fifo

[Apologies for the lack of redundant spaces, which would improve readability, -- just an attempt to avoid line wrapping...]

Does the what process finish in less than 100000 seconds, or not? That's visible. Does the second printf get SIGPIPE or not? Also visible. There is no "as if" here.

This whole problem occurs because of the wording indicating how the -s flag should be implemented, rather than what it means (what its effect is). Do it that way, say nothing about whether the rest of the file is read or not (or explicitly make it unspecified if you prefer), and implementations can then do it whichever way they like.

For the next point, please do remember that this is reading the text in a particularly pedantic way - but what is written should really be immune to this kind of wordplay...

> > As proposed, it also perhaps suggests that with -s, what can find the
> > pattern, then skip to the next file (omitting writing the identification
> > string).
>
> No, because the STDOUT section requires that it writes the string it found.

Not relevant. The proposed wording is:

    Skip to the next file operand (if any) after finding the first occurrence of the pattern in each file.

That is, in pseudo-C (assuming we're in a loop which runs once for each file), and ignoring the actual reading, all error checking, buffering, file opening and closing, ...

    if (p[0] == '@' && p[1] == '(' && p[2] == '#' && p[3] == ')') {
        if (sflag)
            continue;
        /* here we find the end of the identification string and write it */
    }

That is doing exactly what the line quoted above says: when we find the pattern ("@(#)") we skip to the next file.

The proposed STDOUT section is:

    followed by one line for each identification string (if any) found in the file,

which is fine, but doesn't apply, as the code didn't ever find an identification string - it didn't look for one (note that the pattern is not part of that string - the string follows the pattern) - the code did as instructed, as soon as the pattern is found, skip to the next file.

We should not be creating new text that even leaves the possibility of arguments like that one - not when it is so easy to do it properly (even if that means also slightly modifying the STDOUT section proposed as well).
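To make the two readings of the proposed -s wording easier to compare, here is a small sketch over in-memory strings. It is an illustration under stated assumptions, not any real what implementation: the function name `what` and the flag `skip_before_writing` are hypothetical, the terminator set is assumed, and the file name is written once per file as in the proposed STDOUT text.

```python
PATTERN = "@(#)"
# Assumed terminator set; the thread does not quote the standard's list.
TERMINATORS = '">\n\\\0'


def what(files, sflag, skip_before_writing=False):
    """Return output lines for (name, contents) pairs.

    skip_before_writing=True models the pedantic reading in the pseudo-C
    above: with -s, move to the next file as soon as the pattern is seen.
    skip_before_writing=False models the intended behaviour: write the
    first identification string, then move on.
    """
    out = []
    for name, data in files:
        out.append(name + ":")
        pos = data.find(PATTERN)
        while pos != -1:
            if sflag and skip_before_writing:
                break  # pattern found, nothing written
            start = pos + len(PATTERN)
            end = start
            while end < len(data) and data[end] not in TERMINATORS:
                end += 1
            out.append("\t" + data[start:end])
            if sflag:
                break  # at most one identification string per file
            pos = data.find(PATTERN, end)
    return out


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    files = [("a", "x@(#)one\ny@(#)two\n"), ("b", "@(#)three\n")]
    print(what(files, sflag=True, skip_before_writing=True))
    print(what(files, sflag=True))
```

The pedantic reading produces file names with no identification strings at all, which is the outcome the wording should rule out.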

(0005688) kre (reporter) 2022-02-18 19:27	Also re Note: 0005681 The problem with multiple @(#) strings in one line should be dealt with in a separate bug so that it can have its own interpretation response. Andras already did that, so whatever is needed, can happen, but I doubt that an interpretation is needed for that one, no more than it is for changing the wording from "quit" to "skip to the next" or whatever it ends up saying. Those are just clarifying what was always intended, how the implementations actually work, but wasn't stated clearly. What needed the interpretation (as much as I understand the processing requirements here, which isn't much) was the change from the explicit (but wrong) text in the old standard requiring the file name to be printed again and again for each id string printed. That's where the standard was clear, but incorrect - that's where (or at least one case of) an interpretation is needed ... not just clarifying the text to say what it should always have said, but didn't bother to say clearly.

(0005764) agadmin (administrator) 2022-03-25 17:08	Interpretation proposed: 25 March 2022

(0005821) geoffclare (manager) 2022-04-26 11:08	Re: Note: 0005687 Thank you for pointing out that the difference is detectable by applications if a FIFO is being read. All implementations I have tried close the file without reading to the end and this would appear to have been the intention of the current "quit" wording. So unless someone can find an existing implementation that reads to the end, I believe it would be best to require this "skip" behaviour rather than make applications cope with either behaviour. Perhaps a RATIONALE addition is warranted, such as: The description of the -s option requires what to ``skip'' to the next file operand. This intentionally implies that it does not continue reading from the current file. Applications would experience different behavior if a file operand names a FIFO special file and what waited for an end-of-file condition rather than closing the file straight away. On the second point, I'm happy to alter the wording to prevent the potential for it being misunderstood. I'd suggest: Skip to the next file operand (if any) after writing to standard output the first occurrence of the pattern in each file.

(0005854) kre (reporter) 2022-06-20 15:02 edited on: 2022-06-20 17:00	Re Note: 0005821

Apologies for the delay of this response.

First, I don't now (nor did I earlier) take any position wrt what what is supposed to do, or how it should behave (there are few utilities that are less important than this one - some, but not a lot). My concerns are entirely about how the standard specifies whatever it intends to specify, so if the intent is that implementations must read no more than the identification string following the first pattern, and then close that file and go on to the next, that's fine - but let's be precise in how we specify it please.

For example, the proposed wording

    Skip to the next file operand (if any) after writing to standard output the first occurrence of the pattern in each file.

is nothing like correct. First, and most blatantly, what never writes the pattern to standard output. The pattern is the @(#) string, the DESCRIPTION says so quite explicitly:

    The what utility shall search the given files for all occurrences of the pattern that get (see get) substitutes for the %Z% keyword ("@(#)")

that's clear. It continues:

    and shall write to standard output what follows until the first occurrence of one of the following:

(the list isn't important here) - what is important is that what is written is not the pattern.

It is also relevant that there is no term defined here (or anywhere) to label this "what follows" text. The STDOUT section says

    The standard output shall consist of the following for each file operand:

    "%s:\n\t%s\n", <pathname>, <identification string>

but nowhere is it defined what an "identification string" is, we are just supposed to presume that that is the "what follows" text, but nothing actually equates the two.

What I'd suggest is changing the DESCRIPTION to be

    The what utility searches for identification strings in each file given, and writes those identification strings to standard output.

    An identification string is the text that follows an occurrence of the pattern that get (see get) substitutes for the %Z% keyword ("@(#)") up until the next occurrence of one of the following:

(I believe that EOF was already added to the list of terminating "characters", if not, it needs to be.)

Then for -s I'd say

    -s  After locating and writing to standard output the identification string following the first pattern in a file, no further data shall be read from that file, the search shall recommence from the beginning of the subsequent file, if any.

(0005857) geoffclare (manager) 2022-06-21 09:16 edited on: 2022-06-23 15:30	(Note that bug 0001563 already fixed STDOUT, so this is just about -s)

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
The use of the word "quit" can be interpreted as allowing what to exit without processing later files, whereas the intention is that it skips to the next file.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 3437 line 116032 section what, after applying bug 1563 change:

    The what utility shall write to standard output what follows until the first occurrence ...

to:

    The what utility shall write to standard output the identification string that follows up to, but not including, the first occurrence ...

On page 3437 line 116038 section what (-s option), change:

    Quit after finding the first occurrence of the pattern in each file.

to:

    Write at most one identification string for each file. After locating and writing to standard output the identification string following the first pattern (if any) in a file, no further data shall be read from that file and the search shall recommence from the beginning of the next file, if any.

On page 3438 line 116099 section what, change RATIONALE from:

    None.

to:

    The standard requires that when the -s option is used, what does not continue reading from the current file after writing the first identification string. This might seem an unimportant detail, but applications would experience different behavior if a file operand named a FIFO special file and what waited for an end-of-file condition rather than closing the file straight away.
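The "no further data shall be read" requirement in the final -s wording can be illustrated with a sketch that reads one byte at a time and stops at the terminator of the first identification string. This is a minimal sketch under stated assumptions, not any real implementation: the function name `first_id_string` is hypothetical, the terminator set is assumed, and an in-memory stream stands in for a file or FIFO.

```python
import io

PATTERN = b"@(#)"
# Assumed terminator set; end-of-file also ends an identification string here.
TERMINATORS = b'">\n\\\0'


def first_id_string(stream):
    """Return the first identification string in stream, or None if there is none.

    Reads byte by byte and stops right after the terminating byte, so the
    rest of the stream is left unread, as the -s wording requires.
    """
    matched = 0
    while matched < len(PATTERN):
        byte = stream.read(1)
        if not byte:
            return None  # end of file before any pattern
        if byte[0] == PATTERN[matched]:
            matched += 1
        elif byte[0] == PATTERN[0]:
            matched = 1  # "@" can only restart a match at position 0
        else:
            matched = 0
    ident = bytearray()
    while True:
        byte = stream.read(1)
        if not byte or byte[0] in TERMINATORS:
            return bytes(ident)
        ident += byte


if __name__ == "__main__":
    stream = io.BytesIO(b"xx@(#)first\nmore@(#)second\n")
    print(first_id_string(stream))
    print(stream.read())  # whatever was left unread
```

A real utility would then close the file and move to the next operand; with a FIFO, that early close is what lets a writer on the other end observe the difference described in the RATIONALE text.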

(0005869) agadmin (administrator) 2022-06-24 09:16	Revised Interpretation proposed: 24 June 2022

(0005914) agadmin (administrator) 2022-07-26 11:15	Interpretation approved: 26 July 2022

Issue History
Date Modified	Username	Field	Change
2021-12-05 06:48	andras_farkas	New Issue
2021-12-05 06:48	andras_farkas	Name	=> Andras Farkas
2021-12-05 06:48	andras_farkas	Section	=> what
2022-02-17 09:02	Don Cragun	Page Number	=> 3437
2022-02-17 09:02	Don Cragun	Line Number	=> 116041
2022-02-17 09:02	Don Cragun	Interp Status	=> ---
2022-02-17 15:57	geoffclare	Note Added: 0005674
2022-02-17 17:00	geoffclare	Note Added: 0005675
2022-02-17 17:02	geoffclare	Final Accepted Text	=> Note: 0005675
2022-02-17 17:02	geoffclare	Status	New => Interpretation Required
2022-02-17 17:02	geoffclare	Resolution	Open => Accepted As Marked
2022-02-17 17:03	geoffclare	Interp Status	--- => Pending
2022-02-17 17:03	geoffclare	Tag Attached: tc3-2008
2022-02-18 09:14	kre	Note Added: 0005680
2022-02-18 09:39	geoffclare	Note Added: 0005681
2022-02-18 14:39	andras_farkas	Note Added: 0005682
2022-02-18 15:07	andras_farkas	Note Added: 0005683
2022-02-18 19:20	kre	Note Added: 0005687
2022-02-18 19:27	kre	Note Added: 0005688
2022-03-25 17:08	agadmin	Interp Status	Pending => Proposed
2022-03-25 17:08	agadmin	Note Added: 0005764
2022-04-26 11:08	geoffclare	Note Added: 0005821
2022-06-20 15:02	kre	Note Added: 0005854
2022-06-20 15:17	kre	Note Edited: 0005854
2022-06-20 16:57	kre	Note Edited: 0005854
2022-06-20 17:00	kre	Note Edited: 0005854
2022-06-21 08:54	geoffclare	Relationship added	related to 0001563
2022-06-21 09:16	geoffclare	Note Added: 0005857
2022-06-21 09:18	geoffclare	Note Edited: 0005675
2022-06-21 09:20	geoffclare	Note Edited: 0005857
2022-06-22 07:55	geoffclare	Note Edited: 0005857
2022-06-23 15:29	geoffclare	Interp Status	Proposed => Pending
2022-06-23 15:29	geoffclare	Final Accepted Text	Note: 0005675 => Note: 0005857
2022-06-23 15:30	geoffclare	Note Edited: 0005857
2022-06-23 15:30	geoffclare	Note Edited: 0005857
2022-06-24 09:16	agadmin	Note Added: 0005869
2022-06-24 09:31	agadmin	Interp Status	Pending => Proposed
2022-07-26 11:15	agadmin	Interp Status	Proposed => Approved
2022-07-26 11:15	agadmin	Note Added: 0005914
2022-08-19 15:04	geoffclare	Status	Interpretation Required => Applied
2024-06-11 09:07	agadmin	Status	Applied => Closed
