Austin Group Defect Tracker

Aardvark Mark IV


ID 0001464
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2021-04-05 13:20
Last Update 2021-11-18 16:49
Reporter mohd_akram
View Status public
Assigned To
Priority normal
Resolution Rejected
Status Closed
Name Mohamed Akram
Organization
User Reference
Section grep
Page Number 2842-2843
Line Number 93588,93590,93592,93642
Interp Status ---
Final Accepted Text
Summary 0001464: grep(1): add -o option
Description The -o option makes grep print only the matching parts of a line. It's widely available across systems.
Desired Action On page 2842, line 93588, change

  grep [−E|−F] [−c|−l|−q] [−insvx] −e pattern_list

into

  grep [−E|−F] [−c|−l|-o|−q] [−insvx] −e pattern_list

On page 2842, line 93590, change

  grep [−E|−F] [−c|−l|−q] [−insvx] [−e pattern_list]...

into

  grep [−E|−F] [−c|−l|-o|−q] [−insvx] [−e pattern_list]...

On page 2842, line 93592, change

  grep [−E|−F] [−c|−l|−q] [−insvx] pattern_list [file...]

into

  grep [−E|−F] [−c|−l|-o|−q] [−insvx] pattern_list [file...]

On page 2843, insert after line 93641

  -o Print only the matching parts of selected lines.
Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0005301)
kre (reporter)
2021-04-05 15:14

I have no problem in principle with adding the option, but even though
the typical man page for it says exactly what has been suggested here,
that's not nearly adequate to describe how the option works, or should work.

In particular, if I do:

     echo 'abc
     xxx yzpq
     abc def abc
     abcabcabc def abc' | grep abc

then I get the 3 lines that contain abc. If I add -o, on the implementation
I use anyway (no idea if this is as intended, or a bug) I get 7 lines of
output, one for each instance of 'abc' in the input stream. The 7 lines
each contain exactly 'abc'.

If I combine -c with -o, the answer is 3, just as if the -o was not there.

If the answer had been 7 things would have been more consistent, and the
-o would have been useful as a way to count how many times the pattern
appears in the file (but would probably need specification of what happens
with patterns that overlap ... it probably needs that anyway).

If I combine -n with -o ... well this one has to be seen to be believed.

     echo 'abc
     xxx yzpq
     abc def abc
     abcabcabc def abc' | grep -no abc
     1:abc
     3:abc
     abc
     4:abc
     abc
     abc
     abc

Is that what is supposed to happen? I have no idea, but what I do know
is that "-o Print only the matching parts of selected lines." is not nearly
a good enough description of what is happening. But nor is it good enough to
describe what should happen, if this was not it.

I didn't bother testing it, but I'm not sure that I could even guess what
combining -v and -o might do ... what does it mean to print only the
matching parts of lines that do not match?

Presumably -o would be a no-op when used with -q.

Personally, I'd send this option back to the drawing board, and let whoever
initially invented it do a better job of specifying it, at the very least.
As it is now (at least the implementation I use) it looks very much like
a hack invented to solve some particular problem (which would probably have
been better handled by sed) with essentially no thought given to how it
interacts with the rest of what grep does.
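
For comparison, the "count how many times the pattern appears" use case can be sketched portably with awk instead of grep -o. This is only an illustrative sketch: the helper name count_matches is hypothetical, awk treats the pattern as an ERE (not a BRE as grep does by default), and only non-overlapping matches are counted.

```shell
# count_matches: hypothetical helper, not a grep feature.
# Sums the number of non-overlapping ERE matches over all input lines.
count_matches() {
    # gsub() returns how many substitutions it made; replacing each match
    # with itself ("&") on a copy of the line leaves the input unchanged.
    awk -v re="$1" '{ line = $0; n += gsub(re, "&", line) } END { print n + 0 }'
}

printf '%s\n' 'abc' 'xxx yzpq' 'abc def abc' 'abcabcabc def abc' |
    count_matches abc    # prints 7
```

With the sample input from this note the result is 7 (1 + 0 + 2 + 4), which is the number that -c combined with -o might have been expected to report.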
(0005302)
stephane (reporter)
2021-04-05 17:08

Also note that with the GNU implementations (where the option comes from), only the non-empty matches are output (one per line). So you can be in a situation where nothing is output, but grep returns success as some lines matched.

With the ast-open implementation (which is the grep builtin of ksh93 when built as part of ast-open), empty matches are output, but then you enter an infinite loop as the same match is found again and again.

Try:

echo foo | grep -o '^'


for instance. In that implementation,

$ echo foo | grep -o '^.'
f
o
o


because after the first match, "^." is matched again against what follows on the line ("oo"), as if that remainder were the start of a new line.

Some sed implementations have the same kind of issue with s/^./x/g or s/\<./x/g
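
The extra "o" lines are what a naive extraction loop produces when it re-runs the regex on the remainder of the line as though it were a fresh line. The following awk sketch reproduces that effect; it is not the ast-open code, the helper name naive_only_matching is hypothetical, and it simply stops at the first empty match to avoid the infinite loop described above.

```shell
# naive_only_matching: hypothetical helper that mimics a naive -o loop.
# It matches, prints the match, then restarts on the rest of the line,
# so "^" wrongly anchors again at the start of each remainder.
naive_only_matching() {
    awk -v re="$1" '{
        s = $0
        # Stop on no match or on an empty match (RLENGTH == 0).
        while (s != "" && match(s, re) && RLENGTH > 0) {
            print substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

echo foo | naive_only_matching '^.'    # prints f, o, o on separate lines
echo foo | naive_only_matching '^'     # prints nothing
```

An implementation that wants only "f" here has to remember that the remainder is not at the beginning of the line, for example by passing the equivalent of REG_NOTBOL to regexec() after the first match.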
(0005303)
stephane (reporter)
2021-04-05 17:15

pcregrep has -o1, -o2 and --om-separator, which are useful with things like:

grep -Eo1 -o2 '^(....).*( error: .*)'


for instance.

Though that's the thing you'd use "sed" for portably:

sed -nE 's/^(...).*( error: )/\1\2/p'


(except that few sed implementations support perl-like regexps; you can do the same with "perl -n" but that's outside the scope of POSIX).

IMO, grep -o is useful especially when combined with -P on the command line as a quick way to extract information, but is otherwise mostly redundant with sed/perl.
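
As a sketch of the portable sed approach, the same kind of extraction can be written with a basic regular expression, which avoids relying on sed -E. The function name extract_errors and the sample input are invented for illustration, and the four-character prefix follows the pcregrep example above.

```shell
# extract_errors: hypothetical helper using only POSIX sed with a BRE.
# Keeps the first four characters and everything from " error: " onward;
# lines without " error: " are not printed (-n plus the p flag).
extract_errors() {
    sed -n 's/^\(....\).*\( error: \)/\1\2/p'
}

printf '%s\n' 'main.c:12: error: missing semicolon' \
              'util.c:3: warning: unused' |
    extract_errors    # prints: main error: missing semicolon
```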
(0005304)
geoffclare (manager)
2021-04-06 14:23

Another difference between BSD and GNU:

BSD (macOS)
$ echo abc | grep -o -e ab -e bc
ab
$ echo abc | grep -o -e bc -e ab
bc

GNU
$ echo abc | grep -o -e ab -e bc
ab
$ echo abc | grep -o -e bc -e ab
ab
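
One way to read the GNU result is as leftmost matching over the combined pattern list, where the order of the patterns does not matter. The awk sketch below illustrates that reading with an ERE alternation; it is not a statement of how any grep implements multiple -e options, and the helper name first_match is hypothetical.

```shell
# first_match: hypothetical helper that prints the leftmost match of an
# ERE on each input line, using awk's match().
first_match() {
    awk -v re="$1" '{ if (match($0, re)) print substr($0, RSTART, RLENGTH) }'
}

echo abc | first_match 'ab|bc'    # prints ab
echo abc | first_match 'bc|ab'    # prints ab
```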
(0005305)
stephane (reporter)
2021-04-06 16:29

Re: Note: 0005304

well spotted. However note that one can't just say BSD to refer to the behaviour on any system in the BSD family when it comes to grep at least.

For a long time, grep on most BSDs was GNU grep, and AFAIK it still is on FreeBSD and NetBSD (though based on a very old version). IIRC macOS relatively recently switched to a different implementation (which is why you can't get access to the source code any longer).

I don't have access to a macOS machine, but the behaviour you're describing matches that of FreeBSD and NetBSD based on GNU grep 2.5.1, so it's likely the behaviour one could find in older versions of GNU grep.

On OpenBSD, which has a different implementation (but likely also tried to preserve some backward compatibility to the GNU grep API), I get:
$ grep --version
grep version 0.9
$ echo abc | grep -o -e ab -e bc
ab
bc
(0005306)
stephane (reporter)
2021-04-06 16:55

Re: Note: 0005301

Out of curiosity, I tried
echo x | grep -vno y


I find that:

- in the ancient GNU grep 2.5.1 as found on FreeBSD or NetBSD (also verified with a build of that old version on GNU/Linux), it outputs 1: without trailing newline

- in recent GNU grep and in OpenBSD grep it outputs nothing and returns success (which sounds like the most reasonable behaviour to me)

- in ast-open grep, it outputs 1:x

- in busybox grep, it segfaults
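
To repeat this kind of comparison on another system, a small probe along the following lines may help. It is a sketch only: the helper name probe is hypothetical, and because command substitution strips trailing newlines it cannot distinguish "1:" from "1:" followed by a newline.

```shell
# probe: hypothetical helper. Feeds one line to grep with the given options
# and pattern, then reports grep's exit status and its output in brackets.
probe() {
    input=$1 opts=$2 pattern=$3
    if out=$(printf '%s\n' "$input" | grep "$opts" -e "$pattern" 2>&1); then
        status=0
    else
        status=$?    # a status above 128 usually means death by signal
    fi
    printf 'status=%s output=[%s]\n' "$status" "$out"
}

probe x -vno y    # result depends on the implementation; -o is not in POSIX
probe x -c x      # POSIX options only: status=0 output=[1]
```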
(0005307)
enh (reporter)
2021-04-06 17:04

fwiw, toybox (and thus Android) agrees with ast-open for that one:

/tmp/toybox$ echo x | ./toybox grep -vno y
1:x

it matches GNU for the '^'/'^.' and ab/bc examples.

why do you think that "output nothing and return success" makes sense? i wasn't responsible for the toybox behavior, but it's actually doing what i'd have guessed -vno would mean here. (and i don't understand the GNU behavior at all, especially with input like yxya where toybox's 1:x 1:a is again what i'd expect, but GNU outputs nothing at all!)
(0005308)
kre (reporter)
2021-04-06 22:39
edited on: 2021-04-06 22:54

The default NetBSD grep is indeed GNU grep 2.5.1a (with a local change, I
did not check what that is).

But we can also build a different grep; it is a choice when building the
system. It claims to be 2.5.1 (that's suspicious) from FreeBSD.
(It is much smaller than the normal one, which is ~100KB text,
whereas this one is just ~25K text).

The small one includes
    * Copyright (c) 1999 James Howard and Dag-Erling Coïdan Smørgrav
    * Copyright (C) 2008-2009 Gabor Kovesdan <gabor@FreeBSD.org>
It has a standard BSD licence (no GPL).

That one is much the same as shown for BSD grep above, except the
     echo x | grep -vno y
test prints 1:x (with the newline!)

and
     echo foo | grep -o '^.'

prints just one line 'f' (with the newline). No additional 'o' lines
(the default NetBSD grep - the one shipped with binary distributions, does
print the 'o' lines).

The case without the '.'
     echo foo | grep -o '^'
prints a single empty line (the default NetBSD grep prints nothing,
and yes, with an exit status of 0). With -n added, both versions
include 1: before what they otherwise (do or don't) print. That is
the small one prints "1:\n" and the big (GNU) one "1:".

For the grep -vno case, as I said in my original note, I can't guess what
might be correct. I can understand the 1:x output: that's taking the lines
that did not match (-v) and outputting what did not match from that line
(ie: -v inverts the sense of -o, which makes some sense). The case of
printing nothing also makes sense: that is taking the lines which did not
match, and printing what did match on those lines (ie: nothing). Apart from
the missing newline, the version that just prints 1: also makes some sense,
just the same as the one that prints nothing, except this one is obeying
the -n flag, even in that case. None of those is obviously right, or
obviously wrong.

The specification, however, is wholly inadequate. This request should be
withdrawn until we get some common (documented) view of what -o should do,
and the implementations have a chance to converge on that.

(0005309)
mohd_akram (reporter)
2021-04-08 19:15

Toybox's behavior with regards to -vo is definitely the most useful, although it requires a change to how -v is described. GNU is probably the most accurate with respect to the current wording. I'm not sure how useful -vo is practically in the first place, but IMO the choice should be between toybox or "unspecified", since the other implementations either print nothing (useless) or have the same output as -v (also useless).

With regards to -n, GNU is both the most accurate with respect to the current wording ("precede each output line by its relative line number"), and the most useful, since the output can be more easily processed.

For '^.', it should only print the first character - I think this one is the least controversial. GNU is also correct in printing an empty line for empty matches.

In summary, I would say both Toybox and GNU's behavior satisfy the spec as it currently stands. At minimum, the other implementations need to fix their behavior with '^' (should only match line start), empty matches (should print a newline), and with -n (should prefix every output line with the line number).
(0005528)
Don Cragun (manager)
2021-11-18 16:49

This bug was discussed during the 2021-11-18 conference call and rejected due to the differences in behavior between systems. We would welcome another bug to standardize the simple case of the -o option that has consistent behavior between systems.

- Issue History
Date Modified Username Field Change
2021-04-05 13:20 mohd_akram New Issue
2021-04-05 13:20 mohd_akram Name => Mohamed Akram
2021-04-05 13:20 mohd_akram Section => grep
2021-04-05 13:20 mohd_akram Page Number => 2842-2843
2021-04-05 13:20 mohd_akram Line Number => 93588,93590,93592,93642
2021-04-05 15:14 kre Note Added: 0005301
2021-04-05 17:08 stephane Note Added: 0005302
2021-04-05 17:15 stephane Note Added: 0005303
2021-04-06 14:23 geoffclare Note Added: 0005304
2021-04-06 16:29 stephane Note Added: 0005305
2021-04-06 16:55 stephane Note Added: 0005306
2021-04-06 17:04 enh Note Added: 0005307
2021-04-06 22:39 kre Note Added: 0005308
2021-04-06 22:54 kre Note Edited: 0005308
2021-04-08 19:15 mohd_akram Note Added: 0005309
2021-11-18 16:49 Don Cragun Interp Status => ---
2021-11-18 16:49 Don Cragun Note Added: 0005528
2021-11-18 16:49 Don Cragun Status New => Closed
2021-11-18 16:49 Don Cragun Resolution Open => Rejected
2021-11-18 19:26 mohd_akram Issue Monitored: mohd_akram

