
Chapter 13. Searching Through Files

13.1 Different Versions of grep


grep is one of Unix's most useful tools. As a result, all users seem to want their own, slightly different version that solves a different piece of the problem. (Maybe this is a problem in itself; there really should be only one grep, as the manpage says.) Three versions of grep come with every Unix system; in addition, there are six or seven freely available versions that we'll mention here, as well as probably dozens of others that you can find kicking around the Net.

Here are the different versions of grep and what they offer. We'll start with the standard versions:

Plain old grep

Great for searching with regular expressions (Section 13.2).

Extended grep (or egrep)

Handles extended regular expressions. It is also, arguably, the fastest of the standard greps (Section 13.4).

Fixed grep (or fgrep)

So named because it matches fixed strings. It is sometimes inaccurately called "fast grep"; often it is really the slowest of them all. It is useful to search for patterns with literal backslashes, asterisks, and so on that you'd otherwise have to escape somehow. fgrep has the interesting ability to search for multiple strings (Section 13.5).

Of course, on many modern Unixes all three are the same executable, just with slightly different behaviors, and so you may not see dramatic speed differences between them. Now for the freeware versions:

agrep, or "approximate grep"

A tool that finds lines that "more or less" match your search string. A very interesting and useful tool, it's part of the glimpse package, which is an indexing and query system for fast searching of huge amounts of text. agrep is introduced in Section 13.6.

Very fast versions of grep, such as GNU grep/egrep/fgrep

Most free Unixes use GNU grep as their main grep.

rcsgrep

Searches through RCS files (Section 39.5); see Section 13.7.

In addition, you can simulate the action of grep with sed, awk, and perl. These utilities allow you to write such variations as a grep that searches for a pattern that can be split across several lines (Section 13.9) and other context grep programs (Section 41.12), which show you a few lines before and after the text you find. (Normal greps just show the lines that match.)
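Here's a minimal sketch of that simulation. It runs grep, sed, and awk on the same made-up sample file with the same pattern, and all three print the same two lines. (The file contents are assumptions for illustration; perl is left out so the sketch needs only POSIX tools.)

```shell
#!/bin/sh
# Made-up sample file, removed automatically when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'compile time\nrun time\nrun-time error\n' > "$sample"

grep 'run[- ]time' "$sample"          # the real thing
sed -n '/run[- ]time/p' "$sample"     # -n: print nothing unless p says so
awk '/run[- ]time/' "$sample"         # a pattern with no action prints the line
```

sed's -n option turns off its automatic printing, so only the lines that the p command prints appear; in awk, a pattern with no action defaults to printing the line.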

— ML

13.2 Searching for Text with grep

grep offers well-known benefits to users who don't remember what their files contain. Even users of non-Unix systems wish they had a utility with its power to search through a set of files for an arbitrary text pattern (known as a regular expression).

The main function of grep is to look for strings matching a regular expression and print only the lines found. Use grep when you want to look at how a particular word is used in one or more files. For example, here's how to list the lines in the file ch04 that contain either run-time or run time (the quotation marks protect the pattern from the shell; see Section 27.12):

$ grep "run[- ]time" ch04
This procedure avoids run-time errors for not-assigned
and a run-time error message is produced.
run-time error message is produced.
program aborts and a run-time error message is produced.
DIMENSION statement in BASIC is executable at run time.
This means that arrays can be redimensioned at run time.
accessible or not open, the program aborts and a run-time

Another use might be to look for a specific HTML tag in a file. The following command will list top-level (<H1> or <h1>) and second-level (<H2> or <h2>) headings that have the starting tag at the beginning (^) of the line:

$ grep "^<[Hh][12]>" ch0[12].html
ch01.html:<h1>Introduction</h1>
ch01.html:<h1>Windows, Screens, and Images</h1>
ch01.html:<h2>The Standard Screen-stdscr</h2>
ch01.html:<h2>Adding Characters</h2>
ch02.html:<H1>Introduction</H1>
ch02.html:<H1>What Is Terminal Independence?</H1>
ch02.html:<H2>Termcap</H2>
ch02.html:<H2>Terminfo</H2>

In effect, it produces a quick outline of the contents of these files.

grep is also often used as a filter (Section 1.5), to select from the output of some other program. For example, if you've just changed the configuration file for your inetd, you need its process ID so you can send it a HUP signal to make it reread the file. Using ps (Section 24.5) and grep together allows you to do this without wading through a bunch of lines of output:

% ps -aux | grep inetd
root     321  0.0  0.2  1088  548  ??  Is   12Nov01   0:08.93 inetd -wW
deb    40033  0.0  0.2  1056  556  p5  S+   12:55PM   0:00.00 grep inetd
% kill -HUP 321

There are several options commonly used with grep. The -i option specifies that the search ignore the distinction between upper- and lowercase. The -c option tells grep to return only a count of the number of lines matched. The -w option searches for the pattern "as a word." For example, grep if would match words like cliff or knife, but grep -w if wouldn't. The -l option returns only the name of the file when grep finds a match. This can be used to prepare a list of files for another command. The -v option (Section 13.3) reverses the normal action, and only prints out lines that don't match the search pattern. In the previous example, you can use the -v option to get rid of the extra line of output:

% ps -aux | grep inetd | grep -v grep
root     321  0.0  0.2  1088  548  ??  Is   12Nov01   0:08.93 inetd -wW
% kill -HUP 321
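Here's a quick way to try those options on scratch files. It's a sketch: the file names and contents are made up, and -w, though GNU and BSD grep both have it, isn't part of POSIX and may be missing from very old greps.

```shell
#!/bin/sh
# Scratch directory with two made-up files; deleted when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'If you go\nthe cliff edge\na sharp knife\nif only\n' > "$dir/a.txt"
printf 'nothing to see\n' > "$dir/b.txt"

grep -c if "$dir/a.txt"               # 3: cliff, knife, and "if only"
grep -ci if "$dir/a.txt"              # 4: -i also counts "If you go"
grep -w if "$dir/a.txt"               # only "if only": no word "if" in cliff or knife
grep -l if "$dir/a.txt" "$dir/b.txt"  # just the name of the file that matched
grep -v if "$dir/a.txt"               # "If you go": the line without lowercase "if"
```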

— DD

13.3 Finding Text That Doesn't Match

The grep programs have one very handy feature: they can select lines that don't match a pattern just as they can select the lines that do. Simply use the -v option.

I used this most recently when working on this book. We have thousands of separate files under RCS (Section 39.5), and I sometimes forget which ones I've got checked out. Since there's a lot of clutter in the directory and several people working there, a simple ls won't do. There are a series of temporary files created by some of our printing scripts that I don't want to see. All of their filenames consist of one or more x characters: nothing else. So I use a findpt alias to list only the files belonging to me. It's a version of the find. alias described in Section 9.26, with -user tim added to select only my own files and a grep pattern to exclude the temporary files. My findpt alias executes the following command line:

find . -user tim | grep -v '^\./xx*$'

The leading ./ matches the start of each line of find output, and xx* matches one x followed by zero or more x's. I couldn't use the find operators ! -name in that case because -name uses shell-like wildcard patterns, and there's no way to say "one or more of the preceding character" (in this case, the character x) with shell wildcards.

Obviously, that's as specific and nonreproducible an example as you're likely to find anywhere! But it's precisely these kinds of special cases that call for a rich vocabulary of tips and tricks. You'll never have to use grep -v for this particular purpose, but you'll find a use for it someday.

[Note that you could use a slightly simpler regular expression by using egrep (Section 13.4), which supports the plus (+) operator to mean "one or more," instead of writing the character twice and adding the zero-or-more operator (xx*), as basic regular expressions require. The previous regular expression would then become:

find . -user tim | egrep -v '^\./x+$'

The richer regular expression language is the primary advantage of egrep. — DJPH]
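To check that the two patterns really are equivalent, this sketch builds a scratch directory with a few hypothetical filenames and runs both versions. It leaves out -user tim (every scratch file is yours anyway), uses grep -E as the POSIX spelling of egrep, and sorts the output because find doesn't promise any particular order.

```shell
#!/bin/sh
# Scratch directory with made-up names; removed when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch x xx xxx notes xylophone

# Basic regular expression: one x, then zero or more x's, anchored at both ends.
find . | grep -v '^\./xx*$' | LC_ALL=C sort
# Extended regular expression: x+ means "one or more x's".
find . | grep -Ev '^\./x+$' | LC_ALL=C sort
```

Both pipelines print ., ./notes, and ./xylophone; x, xx, and xxx are filtered out, while xylophone survives because it has more than x's in its name.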

— TOR

13.4 Extended Searching for Text with egrep

The egrep command is yet another version of grep (Section 13.2), one that extends the syntax of regular expressions. (Versions where grep and egrep are the same allow you to get egrep-like behavior from grep by using the -E option.) A plus sign (+) following a regular expression matches one or more occurrences of the regular expression; a question mark (?) matches zero or one occurrences. In addition, regular expressions can be nested within parentheses:

% egrep "Lab(oratorie)?s" name.list
AT&T Bell Laboratories
AT&T Bell Labs
Symtel Labs of Chicago

Parentheses surround a second regular expression and ? modifies this expression. The nesting helps to eliminate unwanted matches; for instance, the word Labors or oratories would not be matched.
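To try the optional group yourself, here's a sketch with a made-up name.list that includes the two near misses mentioned above. It uses grep -E, the POSIX spelling of egrep.

```shell
#!/bin/sh
list=$(mktemp)
trap 'rm -f "$list"' EXIT
# Made-up name.list: the first three lines match; Labors and oratories don't.
printf '%s\n' 'AT&T Bell Laboratories' 'AT&T Bell Labs' 'Symtel Labs of Chicago' \
    'Labors Union' 'oratories' > "$list"

# (oratorie)? matches the whole group once or not at all.
grep -E 'Lab(oratorie)?s' "$list"
```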

Another special feature of egrep is the vertical bar (|), which serves as an or operator between two expressions. Lines matching either expression are printed, as in the next example:

% egrep "stdscr|curscr" ch03
into the stdscr, a character array.
When stdscr is refreshed, the
stdscr is refreshed.
curscr.
initscr( ) creates two windows: stdscr
and curscr.

Remember to put the expression inside quotation marks to protect the vertical bar from being interpreted by the shell as a pipe symbol. Look at the next example:

% egrep "Alcuin (User|Programmer)('s)? Guide" docguide
Alcuin Programmer's Guide is a thorough
refer to the Alcuin User Guide
Alcuin User's Guide introduces new users to

You can see the flexibility that egrep's syntax can give you, matching either User or Programmer, with or without a following 's.
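Here's the same pattern run against a made-up docguide file with two near misses added, as a sketch you can paste into a shell. The double quotes do two jobs: they keep the shell from treating | as a pipe and the parentheses as a subshell, and they let the pattern contain the single quote in 's.

```shell
#!/bin/sh
doc=$(mktemp)
trap 'rm -f "$doc"' EXIT
printf '%s\n' "Alcuin Programmer's Guide is a thorough" \
    'refer to the Alcuin User Guide' \
    "Alcuin User's Guide introduces new users to" \
    'the Alcuin Users Guide (no apostrophe, so no match)' \
    'the Alcuin Admin Guide (not in the alternation, so no match)' > "$doc"

# grep -E is the POSIX spelling of egrep.
grep -E "Alcuin (User|Programmer)('s)? Guide" "$doc"
```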

Both egrep and fgrep can read search patterns from a file using the -f option (Section 13.5).

— DJPD

13.5 grepping for a List of Patterns

egrep (Section 13.4) lets you look for multiple patterns using its grouping and alternation operators (big words for parentheses and a vertical bar). But sometimes, even that isn't enough.

Both egrep and fgrep support a -f option, which allows you to save a list of patterns (fixed strings in the case of fgrep) in a file, one pattern per line, and search for all the items in the list with a single invocation of the program. For example, in writing this book, we've used this feature to check for consistent usage in a list of terms across all articles:

% egrep -f terms *

(To be more accurate, we used rcsegrep (Section 13.7), since the articles are all kept under RCS (Section 39.5), but you get the idea.)
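A minimal sketch of -f with both kinds of pattern list; the terms and text are hypothetical. grep -E -f and grep -F -f are the POSIX spellings of egrep -f and fgrep -f.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Made-up term list, one extended regular expression per line.
printf '%s\n' 'run[- ]time' 'stdscr|curscr' > "$dir/terms"
printf '%s\n' 'a run-time error' 'refresh stdscr' 'nothing here' 'use a*b literally' > "$dir/text"

# Every line matching any pattern in the file (like egrep -f terms).
grep -E -f "$dir/terms" "$dir/text"

# Fixed strings from a file (like fgrep -f): here * is just an asterisk.
printf '%s\n' 'a*b' > "$dir/fixed"
grep -F -f "$dir/fixed" "$dir/text"
```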

— TOR

13.6 Approximate grep: agrep

agrep is one of the nicer additions to the grep family. It's not only one of the faster greps around; it also has the unique feature of looking for approximate matches. It's also record oriented rather than line oriented. The three most significant features of agrep that are not supported by the grep family are as follows:

  1. The ability to search for approximate patterns, with a user-definable level of accuracy. For example:

    % agrep -2 homogenos foo

    will find "homogeneous," as well as any other word that can be obtained from "homogenos" with at most two substitutions, insertions, or deletions.

    % agrep -B homogenos foo

    will generate a message of the form:

    best match has 2 errors, there are 5 matches, output them? (y/n)
  2. agrep is record oriented rather than just line oriented; a record is by default a line, but it can be user-defined with the -d option specifying a pattern that will be used as a record delimiter. For example:

    % agrep -d '^From ' 'pizza' mbox

    outputs all mail messages (Section 1.21) (delimited by a line beginning with From and a space) in the file mbox that contain the keyword pizza. Another example:

    % agrep -d '$$' pattern foo

    will output all paragraphs (separated by an empty line) that contain pattern.

  3. agrep allows multiple patterns with AND (or OR) logic queries. For example:

    % agrep -d '^From ' 'burger,pizza' mbox

    outputs all mail messages containing at least one of the two keywords (the comma stands for OR).

    % agrep -d '^From ' 'good;pizza' mbox

    outputs all mail messages containing both keywords (the semicolon stands for AND).

Putting these options together, one can write queries such as the following:

% agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by TheAuthor dealing with Curriculum. Two errors are allowed, but they cannot be in either CACM or the year. (The < > brackets forbid errors in the pattern between them.)

Other agrep features include searching for regular expressions (with or without errors), using unlimited wildcards, limiting the errors to only insertions or only substitutions or any combination, allowing each deletion, for example, to be counted as two substitutions or three insertions, restricting parts of the query to be exact and parts to be approximate, and many more.
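agrep isn't installed everywhere, so here's a rough sketch of the idea behind feature 1 using nothing but awk. The hypothetical approx_grep function computes the edit (Levenshtein) distance, the number of single-character insertions, deletions, and substitutions between two words, and prints lines containing a word within K edits of the pattern. This illustrates the concept only; it is not agrep's algorithm. agrep matches approximate substrings anywhere in a record and is far faster, while this sketch compares whole whitespace-separated words, so punctuation stuck to a word counts as extra characters.

```shell
#!/bin/sh
# approx_grep K WORD [FILE...]: hypothetical helper that prints lines containing
# a whitespace-separated word within K edits of WORD.  Illustration only.
approx_grep() {
    k=$1 word=$2
    shift 2
    awk -v k="$k" -v w="$word" '
    # Levenshtein distance by dynamic programming: d[i, j] is the number of
    # edits needed to turn the first i characters of a into the first j of b.
    function lev(a, b,    la, lb, i, j, cost, m, d) {
        la = length(a); lb = length(b)
        for (i = 0; i <= la; i++) d[i, 0] = i
        for (j = 0; j <= lb; j++) d[0, j] = j
        for (i = 1; i <= la; i++)
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                m = d[i-1, j] + 1                                    # deletion
                if (d[i, j-1] + 1 < m) m = d[i, j-1] + 1             # insertion
                if (d[i-1, j-1] + cost < m) m = d[i-1, j-1] + cost   # substitution
                d[i, j] = m
            }
        return d[la, lb]
    }
    {
        for (f = 1; f <= NF; f++)
            if (lev($f, w) <= k) { print; next }   # print each line once
    }' "$@"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'a homogeneous mixture' 'heterogeneous' 'homogenous milk' > "$sample"
approx_grep 2 homogenos "$sample"   # the first and third lines
```

With K set to 2, "homogeneous" (two insertions away) and "homogenous" (one insertion away) both qualify; with K set to 1, only the "homogenous milk" line does.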

—JP, SW, and UM

13.7 Search RCS Files with rcsgrep

Storing multiple versions of a file in RCS (Section 39.5) saves space. How can you search a lot of those files at once? You could check out all the files, then run grep — but you'll have to remove the files after you're done searching. Or, you could search the RCS files themselves with a command like grep foo RCS/*,v — but that can show you garbage lines from previous revisions, log messages, and other text that isn't in the latest revision of your file. This article has two ways to solve that problem.

13.7.1 rcsgrep, rcsegrep, rcsfgrep

The rcsgrep script and two links to it, named rcsegrep and rcsfgrep, run grep, egrep (Section 13.4), and fgrep on all files in the RCS directory. (You can also choose the files to search.)

The script tests its name to decide whether to act like grep, egrep, or fgrep. Then it checks out each file and pipes it to the version of grep you chose. The output looks just like grep's — although, by default, you'll also see the messages from the co command (the -s option silences those messages).

By default, rcsgrep searches the latest revision of every file. With the -a option, rcsgrep will search all revisions of every file, from first to last. This is very handy when you're trying to see what was changed in a particular place and to find which revision(s) have some text that was deleted some time ago. (rcsgrep uses rcsrevs (Section 39.6) to implement -a.)

Some grep options need special handling to work right in the script: -e, -f, and -l. (For instance, -e and -f have an argument after them. The script has to pass both the option and its argument.) The script passes any other options you type to the grep command. Your grep versions may have some other options that need special handling, too. Just edit the script to handle them.

13.7.2 rcsegrep.fast

To search an RCS file, rcsgrep and its cousins run several Unix processes: co, grep, sed, and others. Each process takes time to start and run. If your directory has hundreds of RCS files (like our directory for this book does), searching the whole thing can take a lot of time. I could have cut the number of processes by rewriting rcsgrep in Perl; Perl has the functionality of grep, sed, and others built in, so all it would need to do is run hundreds of co processes . . . which would still make it too slow.

Go to http://examples.oreilly.com/upt3 for more information on: rcsegrep.fast

The solution I came up with was to do everything in (basically) one process: a gawk (Section 20.11) script. Instead of using the RCS co command to extract each file's latest revision, the rcsegrep.fast script reads each RCS file directly. (The rcsfile(5) manpage explains the format of an RCS file.) An RCS file contains the latest revision of its working file as plain text, with one difference: each @ character is changed to @@. rcsegrep.fast searches the RCS file until it finds the beginning of the latest revision. Then it applies an egrep-like regular expression to each line. Matching lines are written to standard output with the filename first; the -n option gives a line number after the filename.

rcsegrep.fast is sort of a kludge because it's accessing RCS files without using RCS tools. There's a chance that it won't work on some versions of RCS or that I've made some other programming goof. But it's worked very well for us. It's much faster than rcsgrep and friends. I'd recommend using rcsegrep.fast when you need to search the latest revisions of a lot of RCS files; otherwise, stick to the rcsgreps.

— JP

13.8 GNU Context greps

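By default, standard grep utilities show only the lines of text that match the search pattern. Sometimes, though, you need to see the matching line's context: the lines before or after the matching line. The GNU greps (grep, fgrep, and egrep) can do this. There are three context grep options:

-A n

Print n lines of context after each matching line.

-B n

Print n lines of context before each matching line.

-C n

Print n lines of context both before and after each matching line.

When the groups of output lines (matches plus their context) aren't contiguous in the file, each group is separated from the next by a line of two dashes (--).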

Let's look at an example: I'd like to search my system mail log for all messages sent to anyone at oreilly.com. But sendmail doesn't put all information about a message on the to= log line; some info is in the from= line, which is usually the previous line. So I'll search for all "to" lines and add one line of context before each match. I'll also use the -n option, which numbers the output lines, to make the context easier to see. This option also puts a marker character after the line number: a colon (:) if that line contains a match, and a dash (-) if it's a context line before or after a match. Here goes:

# grep -n -B 1 'to=<[^@]*@oreilly\.com>' maillog
7-Nov 12 18:57:42 jpeek sendmail[30148]: SAA30148: from=<jpeek@jpeek.com>...
8:Nov 12 18:57:43 jpeek sendmail[30150]: SAA30148: to=<al@oreilly.com>...
9-Nov 12 22:49:51 jpeek sendmail[1901]: WAA01901: from=<jpeek@jpeek.com>...
10:Nov 12 22:49:51 jpeek sendmail[1901]: WAA01901: to=<wfurby@oreilly.com>...
11:Nov 12 22:50:23 jpeek sendmail[2000]: WAA01901: to=<wfurby@oreilly.com>...
--
25-Nov 13 07:42:38 jpeek sendmail[9408]: HAA09408: from=<jpeek@jpeek.com>...
26:Nov 13 07:42:44 jpeek sendmail[9410]: HAA09408: to=<al@oreilly.com>...
27-Nov 13 08:08:36 jpeek sendmail[10004]: IAA10004: from=<jpeek@jpeek.com>...
28:Nov 13 08:08:37 jpeek sendmail[10006]: IAA10004: to=<wfurby@oreilly.com>...
--
32-Nov 13 11:59:46 jpeek sendmail[14473]: LAA14473: from=<jpeek@jpeek.com>...
33:Nov 13 11:59:47 jpeek sendmail[14475]: LAA14473: to=<al@oreilly.com>...
34-Nov 13 15:34:17 jpeek sendmail[18272]: PAA18272: from=<jpeek@jpeek.com>...
35:Nov 13 15:34:19 jpeek sendmail[18274]: PAA18272: to=<al@oreilly.com>...

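I've truncated each line for printing, but you can still see the matches. A few notes about what's happening here: the from= lines numbered 9, 27, and 34 are printed as context for the match that follows each of them, and because each one also sits right after an earlier match, no separator comes between them. Line 11 has no context line of its own because the line before it (line 10) is itself a match. The -- separators appear only where grep skipped lines of the file (after lines 11 and 28).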
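To watch the three options and the -- separator on something smaller, here's a sketch with a made-up six-line log. It assumes GNU grep; other greps may lack these options or format the output differently.

```shell
#!/bin/sh
log=$(mktemp)
trap 'rm -f "$log"' EXIT
# A made-up, much-shortened mail log.
printf '%s\n' 'from=<a>' 'to=<x@oreilly.com>' 'junk 1' 'junk 2' \
    'from=<b>' 'to=<y@oreilly.com>' > "$log"

grep -n -B 1 'to=' "$log"     # each to= line plus the line before it
grep -n -A 1 'from=' "$log"   # each from= line plus the line after it
grep -C 1 'junk 1' "$log"     # one line on each side of the match
```

The first two commands print a -- line between their two groups because lines 3 and 4 are skipped; the third prints lines 2 through 4 as a single group.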

— JP

13.9 A Multiline Context grep Using sed

[One weakness of the grep family of programs is that they are line oriented. They read only one line at a time, so they can't find patterns (such as phrases) that are split across two lines. agrep (Section 13.6) can do multiline searches. One advantage of the cgrep script is that it shows how to handle multiple-line patterns in sed and can be adapted for work other than searches. — JP]

Go to http://examples.oreilly.com/upt3 for more information on: cgrep

It may surprise you to learn that a fairly decent context grep (Section 13.8) program can be built using sed. As an example, the following command line:

$ cgrep -10 system main.c

will find all lines containing the word system in the file main.c and show ten additional lines of context above and below each match. (The -context option must be at least one, and it defaults to two lines.) If several matches occur within the same context, the lines are printed as one large "hunk" rather than repeated smaller hunks. Each new block of context is preceded by the line number of the first occurrence in that hunk. The script can also search for patterns that span lines. For example, this command:

$ cgrep -3 "awk.*perl"

will find all occurrences of the word "awk" where it is followed by the word "perl" somewhere within the next three lines. The pattern can be any simple regular expression, with one notable exception: because you can match across lines, you should use \n in place of the ^ and $ metacharacters.
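As a much smaller sketch of the basic sed technique for patterns split across lines (not a claim about how cgrep itself is written), the hypothetical split_phrase function below keeps a two-line window in sed's pattern space with N and slides it down the file with D. It prints each pair of lines where one line ends with the first word and the next begins with the second. It doesn't number lines or merge hunks the way cgrep does, and because the words are used as regular expressions inside a sed address, they mustn't contain a slash.

```shell
#!/bin/sh
# split_phrase FIRST SECOND [FILE...]: hypothetical helper that prints each pair
# of adjacent lines where one line ends with FIRST and the next begins with SECOND.
split_phrase() {
    first=$1 second=$2
    shift 2
    # $!N   appends the next line (unless this is the last one), so the
    #       pattern space holds a two-line window joined by a newline.
    # /.../p prints the window if FIRST, the newline, and SECOND are adjacent.
    # D     deletes the window's first line and restarts the cycle without
    #       reading new input, sliding the window down by one line.
    sed -n -e '$!N' -e "/$first\\n$second/p" -e 'D' "$@"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'Read Unix Power' 'Tools today.' 'Power Tools on one line' 'end' > "$sample"
split_phrase Power Tools "$sample"   # prints the first two lines
```

The one-line "Power Tools" on line 3 isn't reported, because plain grep already finds that case; the sketch only adds the split case.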

[While this is a wonderful example of some neat sed techniques, if this is all you're trying to do, use perl. It has features designed to do exactly this sort of thing very efficiently, and it will be much faster. — DH]

— GU

13.10 Compound Searches

You may recall that you can search for lines containing "this" or "that" using the egrep (Section 13.4) | metacharacter:

egrep 'this|that' files

But how do you grep for "this" and "that"? Conventional regular expressions don't support an and operator because it breaks the rule of patterns matching one consecutive string of text. Well, agrep (Section 13.6) is one version of grep that breaks all the rules. If you're lucky enough to have it installed, just use this:

agrep 'cat;dog;bird' files

If you don't have agrep, a common technique is to filter the text through several greps so that only lines containing all the keywords make it through the pipeline intact:

grep cat files | grep dog | grep bird

But can it be done in one command? The closest you can come with grep is this idea:

grep 'cat.*dog.*bird' files

which has two limitations — the words must appear in the given order, and they cannot overlap. (The first limitation can be overcome using egrep 'cat.*dog|dog.*cat', but this trick is not really scalable to more than two terms.)

As usual, the problem can also be solved by moving beyond the grep family to the more powerful tools. Here is how to do a line-by-line AND search using sed, awk, or perl:[1]

sed '/cat/!d; /dog/!d; /bird/!d' files
awk '/cat/ && /dog/ && /bird/' files
perl -ne 'print if /cat/ && /dog/ && /bird/' files

Okay, but what if you want to find where all the words occur in the same paragraph? Just turn on paragraph mode by setting RS="" in awk or by giving the -00 option to perl:

awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= files
perl -n00e 'print "$_\n" if /cat/ && /dog/ && /bird/' files

And if you just want a list of the files that contain all the words anywhere in them? Well, perl can easily slurp in entire files if you have the memory and you use the -0 option to set the record separator to something that won't occur in the file (like NUL):

perl -ln0e 'print $ARGV if /cat/ && /dog/ && /bird/' files

(Notice that as the problem gets harder, the less powerful commands drop out.)

The grep filter technique shown earlier also works on this problem. Just add a -l option and the xargs command (Section 27.17) to make it pass filenames, rather than text lines, through the pipeline:

grep -l cat files | xargs grep -l dog | xargs grep -l bird

(xargs is basically the glue used when one program produces output needed by another program as command-line arguments.)
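Here's a small sketch of the difference between the line-level and file-level searches, using three made-up files. Only all3 contains all three words, and it has them on separate lines, so the awk line-by-line AND finds nothing while the grep -l and xargs pipeline finds the file. (xargs splits its input on blanks, so this form assumes filenames without spaces.)

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# Made-up files: only all3 mentions all three words, each on its own line.
printf 'cat\ndog\nbird\n' > all3
printf 'cat and dog\n' > two
printf 'bird\n' > one

# Line by line: no single line has all three words, so nothing prints.
awk '/cat/ && /dog/ && /bird/' all3 two one

# File by file: each stage keeps only the names that also match its word.
grep -l cat all3 two one | xargs grep -l dog | xargs grep -l bird   # all3
```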

— GU

13.11 Narrowing a Search Quickly

If you're searching a long file to find a particular word or name, or you're running a program like ls -l and you want to filter some lines, here's a quick way to narrow down the search. As an example, say your phone file has 20,000 lines like these:

Smith, Nancy:MFG:50 Park Place:Huntsville:(205)234-5678

and you want to find someone named Nancy. When you see the rest of her entry, you'll know which of the Nancys she is:

% grep Nancy phones 
 ...150 lines of names...

Use the C shell's history mechanism (Section 30.2) and sed to cut out lines you don't want. For example, about a third of the Nancys are in Huntsville, and you know she doesn't work there:

% !! | sed -e /Huntsville/d 
grep Nancy phones | sed -e /Huntsville/d
 ...100 lines of names...

The shell shows the command it's executing: the previous command (!!) piped to sed, which deletes lines in the grep output that have the word Huntsville.

Okay. You know Nancy doesn't work in the MFG or SLS groups, so delete those lines, too:

% !! -e /MFG/d -e /SLS/d 
grep Nancy phones | sed -e /Huntsville/d -e /MFG/d -e /SLS/d
 ...20 lines of names...

Keep using !! to repeat the previous command line, and keep adding more sed expressions until the list gets short enough. The same thing works for other commands. When you're hunting for errors in a BSDish system log, for example, and you want to skip lines from named and sudo, use the following:

% cat /var/log/messages | sed -e /named/d -e /sudo/d
...

If the matching pattern has anything but letters and numbers in it, you'll have to understand shell quoting (Section 27.12) and sed regular expressions. Most times, though, this quick-and-dirty method works just fine.

[Yes, you can do the exact same thing with multiple grep -v (Section 13.3) commands, but using sed like this allows multiple matches with only one execution of sed. A chain of grep -v commands starts a new grep process for each condition. — DH]
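A sketch you can try, with a made-up four-line phones file. All three pipelines print the same line. The last one relies on the fact that POSIX grep accepts several -e patterns, so a single grep -v can drop lines matching any of them in one process.

```shell
#!/bin/sh
phones=$(mktemp)
trap 'rm -f "$phones"' EXIT
# A made-up four-line stand-in for the 20,000-line phones file.
printf '%s\n' \
    'Smith, Nancy:MFG:50 Park Place:Huntsville:(205)234-5678' \
    'Jones, Nancy:SLS:12 Elm St:Boston:(617)555-0101' \
    'Brown, Nancy:ENG:9 Main St:Denver:(303)555-0199' \
    'Green, Nancy:ENG:1 Oak Ave:Huntsville:(205)555-0142' > "$phones"

# One sed process; each -e deletes the lines matching one pattern.
grep Nancy "$phones" | sed -e /Huntsville/d -e /MFG/d -e /SLS/d

# A chain of grep -v processes, one per condition.
grep Nancy "$phones" | grep -v Huntsville | grep -v MFG | grep -v SLS

# One grep -v with several -e patterns.
grep Nancy "$phones" | grep -v -e Huntsville -e MFG -e SLS
```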

— JP

13.12 Faking Case-Insensitive Searches

This may be the simplest tip in the book, but it's something that doesn't occur to lots of users.

Some versions of egrep don't support the -i option, which requests case-insensitive searches. I find that case-insensitive searches are absolutely essential, particularly to writers. You never know whether any particular word will be capitalized.

To fake a case-insensitive search with egrep, just eliminate any letters that might be uppercase. Instead of searching for Example, just search for xample. If the letter that might be capitalized occurs in the middle of a phrase, you can replace the missing letter with a "dot" (single character) wildcard, rather than omitting it.

Sure, you could do this the "right way" with a command like:

% egrep '[eE]xample' *

but our shortcut is easier.
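If you'd rather do it the "right way" without typing all the brackets, here's a small sketch of a hypothetical helper, ci_pattern, that turns a word into a bracketed pattern such as [eE][xX][aA][mM][pP][lL][eE]. It assumes plain ASCII letters; anything that isn't a letter passes through unchanged, so it keeps whatever regular-expression meaning it has.

```shell
#!/bin/sh
# ci_pattern WORD: hypothetical helper that prints a case-insensitive
# bracket pattern for WORD, e.g. [eE][xX]... for "ex".
ci_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            lo = tolower(c); up = toupper(c)
            if (lo != up)
                out = out "[" lo up "]"   # a letter: accept either case
            else
                out = out c               # anything else passes through
        }
        print out
    }'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' Example EXAMPLE sample > "$sample"
grep -c "$(ci_pattern example)" "$sample"   # 2
```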

This tip obviously isn't limited to egrep; it applies to any utility that only implements case-sensitive searches, like more.

— ML

13.13 Finding a Character in a Column

Here's an idea for finding lines that have a given character in a column. Use the following simple awk (Section 20.10) command:

% awk 'substr($0, n, 1) == "c"' filename

where c is the character you're searching for, and n is the column you care about.

Where would you do this? If you're processing a file with strict formatting, this might be useful; for example, you might have a telephone list with a # in column 2 for "audio" telephone numbers, $ for dialup modems, and % for fax machines. A script for looking up phone numbers might use an awk command like this to prevent you from mistakenly talking to a fax machine.

If your data has any TAB characters, the columns might not be where you expect. In that case, use expand on the file, then pipe it to awk.
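Here's a sketch of that phone-list lookup as a shell function; col_match and the sample data are made up. It runs the input through expand first, so TABs become spaces before awk counts columns, and it passes the column number and character into awk with -v instead of splicing them into the program text.

```shell
#!/bin/sh
# col_match N C [FILE...]: hypothetical helper that prints lines whose Nth
# character (after TAB expansion) is C.  Reads standard input if no FILE.
col_match() {
    n=$1 c=$2
    shift 2
    expand "$@" | awk -v n="$n" -v c="$c" 'substr($0, n, 1) == c'
}

# Made-up phone list: # = voice, % = fax, $ = modem, in column 2.
phones=$(mktemp)
trap 'rm -f "$phones"' EXIT
printf '%s\n' ' #555-1234 Alice' ' %555-9999 Fax room' ' $555-4321 Modem pool' > "$phones"

col_match 2 % "$phones"    # the fax line
```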

—JP and ML

13.14 Fast Searches and Spelling Checks with "look"

Every so often, someone has designed a new, faster grep-type program. Public-domain software archives have more than a few of them. One of the fastest search programs has been around for years: look. It uses a binary search method that's very fast. But look won't solve all your problems: it works only on files that have been sorted (Section 22.1). If you have a big file or database that can be sorted, searching it with look will save a lot of time. For example, to search for all lines that start with Alpha:

% look Alpha  filename 
Alpha particle
Alphanumeric
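To show why sorting matters, here's a sketch of just the binary-search step in awk, as a hypothetical prefix_lookup function. It is not how look is written: it reads every line into memory first, which a search on the file itself wouldn't need to do. It then finds the first line that isn't less than the prefix and prints lines for as long as they start with it. Sorting and comparing both happen in the C locale so they agree on the order.

```shell
#!/bin/sh
# prefix_lookup PREFIX [FILE...]: hypothetical helper that prints the lines
# starting with PREFIX from input sorted with LC_ALL=C sort.
prefix_lookup() {
    p=$1
    shift
    LC_ALL=C awk -v p="$p" '
    { line[NR] = $0 }        # read every line into memory (look need not)
    END {
        # Binary search for the first line that is not less than p.
        lo = 1; hi = NR + 1
        while (lo < hi) {
            mid = int((lo + hi) / 2)
            if ((line[mid] "") < (p "")) lo = mid + 1   # "" forces string compare
            else hi = mid
        }
        # Any matching lines start there and are contiguous.
        for (i = lo; i <= NR && substr(line[i], 1, length(p)) == p; i++)
            print line[i]
    }' "$@"
}

names=$(mktemp)
trap 'rm -f "$names"' EXIT
printf '%s\n' Zulu Alphanumeric alpha Beta 'Alpha particle' | LC_ALL=C sort > "$names"
prefix_lookup Alpha "$names"    # Alpha particle, then Alphanumeric
```

Because the search depends on the order, running it on an unsorted file can silently miss matches.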

Go to http://examples.oreilly.com/upt3 for more information on: look

The look program can also be used to check the spelling of a word or find a related word; see Section 16.3. If you don't have look installed on your system, you can get it from the Unix Power Tools web site.

— JP

13.15 Finding Words Inside Binary Files

If you try to read binaries on your screen with cat -v (Section 12.4), you'll see a lot of nonprintable characters. Buried in there somewhere, though, are words and strings of characters that might make some sense. For example, if the code is copyrighted, you can usually find that information in the binary. The pathnames of special files read by the program will probably show up. If you're trying to figure out which program printed an error message, use strings on the binaries and look for the error. Some versions of strings do a better job of getting just the useful information; others may write a lot of junk, too. But what the heck: pipe the output to a pager (Section 12.3) or grep (Section 13.2), redirect it to a file, and ignore the stuff you don't want.

Here's a (shortened) example on FreeBSD:

% strings /usr/bin/write
/usr/libexec/ld-elf.so.1
FreeBSD
libc.so.4
strcpy
...
@(#) Copyright (c) 1989, 1993
        The Regents of the University of California.  All rights reserved.
$FreeBSD: src/usr.bin/write/write.c,v 1.12 1999/08/28 01:07:48 peter Exp $
can't find your tty
can't find your tty's name
you have write permission turned off
/dev/
%s is not logged in on %s
%s has messages disabled on %s
usage: write user [tty]
/var/run/utmp
utmp
%s is not logged in
%s has messages disabled
%s is logged in more than once; writing to %s
%s%s
Message from %s@%s on %s at %s ...

The eighth line ($FreeBSD: ... $) comes from RCS (Section 39.5) — you can see the version number, the date the code was last modified or released, and so on. The %s is a special pattern that the printf(3) function will replace with values like the username, hostname, and time.

By default, strings doesn't search all of a binary file: it only reads the initialized and loaded sections. The - (dash) option tells strings to search all of the file. Another useful option is -n, where n is a number giving the minimum length of string to print (for example, -10). Setting a higher limit will cut the "noise," but you might also lose what you're looking for.

The od command's -sn option does a similar thing: it finds all null-terminated strings that are at least n characters long.
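If strings isn't installed (on many systems it comes with the development tools), here's a rough stand-in that uses only tr and awk, offered as a sketch. The hypothetical printable_strings function turns every run of non-printable bytes into a single newline and keeps the pieces that are at least MIN characters long. Unlike strings, it knows nothing about object-file sections, so it always scans the whole file.

```shell
#!/bin/sh
# printable_strings MIN [FILE...]: hypothetical helper that prints runs of at
# least MIN printable characters, one per line.  Reads standard input if no FILE.
printable_strings() {
    min=$1
    shift
    # -c: act on the complement of [:print:]; -s: squeeze each run of those
    # bytes into one newline.  The C locale makes "printable" mean ASCII.
    cat -- "$@" | LC_ALL=C tr -cs '[:print:]' '[\n*]' |
        awk -v min="$min" 'length($0) >= min'
}
```

For example, printable_strings 6 /usr/bin/write | grep -i copyright looks for a copyright notice the way the strings example above does.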

— JP

13.16 A Highlighting grep

Do you ever grep for a word, and when lines scroll down your screen, it's hard to find the word on each line? For example, suppose I'm looking for any mail messages I've saved that say anything about the perl programming language. But when I grep the file, most of it seems useless:

% grep perl ~/Mail/save
> and some of it wouldn't compile properly.  I wonder if
Subject: install script, for perl scripts
 perl itself is installed?
> run but dies with a read error because it isn't properly
> if I can get it installed properly on another machine I
> run but dies with a read error because it isn't properly
> if I can get it installed properly on another machine I

Well, as described on its own manual page, here's a program that's "trivial, but cute." hgrep runs a grep and highlights the string being searched for, to make it easier for us to find what we're looking for.

% hgrep perl ~/Mail/save 
> and some of it wouldn't compile properly.  I wonder if
Subject: install script, for perl scripts
 perl itself is installed?
> run but dies with a read error because it isn't properly
> if I can get it installed properly on another machine I
> run but dies with a read error because it isn't properly
> if I can get it installed properly on another machine I

And now we know why the output looked useless: because most of it is! Luckily, hgrep is just a frontend; it simply passes all its arguments to grep. So hgrep necessarily accepts all of grep's options, and I can just use the -w option to pare the output down to what I want:

% hgrep -w perl ~/Mail/save 
Subject: install script, for perl scripts
 perl itself is installed?

The less (Section 12.3) pager also automatically highlights matched patterns as you search.
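For a rough idea of how such a highlighting frontend can work, here's a sketch of a hypothetical hl function (not hgrep's actual code). It runs grep and then has sed wrap each match in reverse-video escape sequences; the ANSI codes ESC[7m and ESC[m are an assumption about your terminal. Unlike hgrep, it takes only a pattern and optional filenames, and the pattern must be a basic regular expression without slashes, since sed uses it too. GNU grep can also highlight matches by itself with its --color option.

```shell
#!/bin/sh
# hl PATTERN [FILE...]: hypothetical highlighting grep.  PATTERN is a basic
# regular expression; it is handed to sed too, so it must not contain "/".
hl() {
    pat=$1
    shift
    on=$(printf '\033[7m')     # ANSI reverse video on (assumed terminal)
    off=$(printf '\033[m')     # ANSI attributes off
    # grep picks the lines; sed wraps every match (&) in the escape codes.
    grep -- "$pat" "$@" | sed "s/$pat/$on&$off/g"
}

printf '%s\n' 'compile properly' 'install script, for perl scripts' | hl perl
```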

— LM

[1]  Some versions of nawk require an explicit $0~ in front of each pattern.
