CONTENTS

Chapter 33. Wildcards

33.1 File-Naming Wildcards

Wildcards (Section 1.13) are the shell's way of abbreviating filenames. Just as in poker, where a wildcard is a special card that can match any card in the deck, filename wildcards are capable of matching letters or groups of letters in the alphabet. Rather than typing a long filename or a long chain of filenames, a wildcard lets you provide parts of names and then use some "wildcard characters" for the rest. For example, if you want to delete all files whose names end in .o, you can give the following command:

% rm *.o

You don't have to list every filename.

I'm sure you already know that wildcards are useful in many situations. If not, they are summarized in Section 33.2. Here are a few of my favorite wildcard applications:

It's a common misconception, particularly among new users, that application programs and utilities have something to do with wildcards. Given a command like grep ident *.c, many users think that grep handles the * and looks to see which files have names that end in .c. If you're at all familiar with Unix's workings, you'll realize that this is the wrong picture. The shell interprets wildcards. That is, the shell figures out which files have names ending in .c, puts them in a list, puts that list on the command line, and then hands that command line to grep. As it processes the command line, the shell turns grep ident *.c into grep ident file1.c file2.c....
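
One way to see this for yourself is to capture a command line after the shell has finished with it. The following is a minimal sketch, not part of the original article; the scratch directory and the filenames in it are made up for illustration:

```shell
# Scratch directory with made-up filenames, so nothing real is touched.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch file1.c file2.c notes.txt

# "set --" stores the words exactly as a program would receive them,
# after the shell has expanded the wildcard.
set -- grep ident *.c
expanded="$*"
argcount=$#

echo "$expanded"     # the command line grep would actually be started with
echo "$argcount"     # number of words on that command line

cd / && rm -rf "$demo"
```

Nothing here runs grep at all; the point is only that the list of .c files exists before any program is started.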

Since there are several shells, one might think (or fear!) that there should be several different sets of wildcards. Fortunately, there aren't. The basic wildcards work the same for all shells.

— ML

33.2 Filename Wildcards in a Nutshell

This article summarizes the wildcards that are used for filename expansion (see Table 33-1). The shells use the same basic wildcards, though most shells have some extensions. Unless otherwise noted, assume that wildcards are valid for all shells.

Table 33-1. Filename wildcards

Wildcard          Shells                    Description
*                 All                       Match zero or more characters. For example, a* matches the files a, ab, abc, abc.d, and so on. (zsh users: also see x# and x##, below.)
?                 All                       Match exactly one character. For example, a? matches aa, ab, ac, etc.
[12..a..z]        All                       Match any one character listed in the brackets. For example, a[ab] matches aa or ab.
[a-z]             All                       Match any one character between a and z, in a case-sensitive manner, based on the characters' values in the ASCII character set. For example, a[0-9] matches a0, a1, and so on, up to a9.
[!ab..z]          bash, ksh, zsh, newer sh  Match any character that does not appear within the brackets. For example, a[!0-9] doesn't match a0 but does match aa.
[^ab..z]          tcsh, zsh                 Match any character that does not appear within the brackets. For example, a[^0-9] doesn't match a0, but does match aa.
<m-n>             zsh                       Any number in the range m to n. If m is omitted, this matches numbers less than or equal to n. If n is omitted, it matches numbers greater than or equal to m. The pattern <-> matches all numbers.
{word1,word2...}  bash, csh, pdksh, zsh     Match word1, word2, etc. For example, a_{dog,cat,horse} matches the filenames a_dog, a_cat, and a_horse. These (Section 28.4) actually aren't filename-matching wildcards. They expand to all strings you specify, including filenames that don't exist yet, email addresses, and more. (If you want to match one or more of a group of filenames that already exist, see also the parenthesis operators ( ) below.)
?(x|y|z)          ksh, bash2                Match zero or one instance of any of the specified patterns. For example, w?(abc)w matches ww or wabcw. Also, ?(foo|bar) matches only foo, bar, and the empty string. In bash2, this works only if you've set the extglob option using shopt.
*(x|y|z)          ksh, bash2                Match zero or more instances of any of the specified patterns. For example, w*(abc)w matches ww, wabcw, wabcabcw, etc. Also, *(foo|bar) matches foo, bar, foobarfoo, etc., as well as the empty string. In bash2, this works only if you've set the extglob option using shopt.
+(x|y|z)          ksh, bash2                Match one or more instances of any of the specified patterns. For example, w+(abc)w matches wabcw, wabcabcw, etc. Also, +(foo|bar) matches foo, bar, foobarfoo, etc. In bash2, this works only if you've set the extglob option using shopt.
@(x|y|z)          ksh, bash2                Match exactly one of any of the specified patterns. For example, @(foo|bar) matches foo or bar. (See also {word1,word2...}.) In bash2, this works only if you've set the extglob option using shopt.
!(x|y|z)          ksh, bash2                Match anything except one of the specified patterns. For example, w!(abc)w doesn't match wabcw, but it does match practically anything else that begins and ends with w (including wabcabcw, because abcabc is not the same string as abc). Also, !(foo|bar) matches all strings except foo and bar. In bash2, this works only if you've set the extglob option using shopt. (For other shells, see nom (Section 33.8).)
^pat              tcsh, zsh                 Match any name that doesn't match pat. In zsh, this only works if you've set the EXTENDED_GLOB option. In tcsh, the pat must include at least one of the wildcards *, ? and [ ]. So, to match all except a single name in tcsh, here's a trick: put brackets around one character. For instance, you can match all except abc with ^ab[c]. (For other shells, see nom (Section 33.8).)
(x|y)             zsh                       Match either x or y. The vertical bar (|) must be used inside parentheses.
**                zsh                       Search recursively.
***               zsh                       Search recursively, following symbolic links to directories.
x#                zsh                       Matches zero or more occurrences of the pattern x (like the regular expression (Section 32.2) x*). The pattern can have parentheses ( ) around it. You must have set the EXTENDED_GLOB option.
x##               zsh                       Matches one or more occurrences of the pattern x (like the regular expression (Section 32.15) x+). The pattern can have parentheses ( ) around it. You must have set the EXTENDED_GLOB option.
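
The wildcards marked "All" are easy to try out in a scratch directory. Here is a short sketch along those lines; the filenames are invented to mirror the examples in the table, and LC_ALL=C is set only so that the sort order and the [0-9] range behave the same everywhere:

```shell
LC_ALL=C; export LC_ALL        # fixed sort order and plain ASCII ranges
demo=$(mktemp -d) && cd "$demo" || exit 1
touch a aa ab a0 a9 abc abc.d

# Capture what each pattern expands to in this directory.
set -- a*;       star="$*"
set -- a?;       question="$*"
set -- a[ab];    listed="$*"
set -- a[0-9];   range="$*"
set -- a[!0-9];  negated="$*"

# Print each pattern next to its expansion.
printf '%-8s %s\n' 'a*' "$star" 'a?' "$question" 'a[ab]' "$listed" \
    'a[0-9]' "$range" 'a[!0-9]' "$negated"

cd / && rm -rf "$demo"
```

Note that a? does not match a (no second character) or abc (too many), while a* matches all seven names.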

Note that wildcards do not match files whose names begin with a dot (.), like .cshrc. This prevents you from deleting (or otherwise mucking around with) these files by accident. The usual way to match those files is to type the dot literally. For example, .[a-z]* matches anything whose name starts with a dot and a lowercase letter. Watch out for plain .*, though; in most shells it also matches the directory entries . (the current directory) and .. (its parent). If you constantly need to match dot-files, you can set the bash variable glob_dot_filenames (in bash Version 2 and later, use shopt -s dotglob instead) or the zsh option GLOB_DOTS to include dot-files' names in those shells' wildcard expansion.
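
A quick sketch of the dot-file rule, using a scratch directory and made-up names (it assumes none of the dot-matching options mentioned above are set):

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
touch .cshrc .profile visible

set -- *;        plain="$*"     # dot-files are skipped
set -- .[a-z]*;  dotted="$*"    # leading dot typed literally

echo "$plain"
echo "$dotted"

cd / && rm -rf "$demo"
```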

You can prevent wildcard expansion by quoting (Section 27.12, Section 27.13), of course. In the C shells, you can stop all wildcard expansion (which is also called globbing, by the way) without quoting if you set the noglob shell variable. In bash, ksh, and zsh, set the noglob option.

And a final note: many operating systems (VAX/VMS and DOS included) consider a file's name and extension to be different entities; therefore, you can't use a single wildcard to match both. What do we mean? Consider the file abc.def. Under DOS or VMS, to match this filename you'd need the wildcard expression *.*. The first * matches the name (the part before the period), and the second matches the extension (the part after the period). Although Unix uses extensions, they aren't considered a separate part of the filename, so a single * will match the entire name.

—JP, ML, and SJC

33.3 Who Handles Wildcards?

Wildcards (Section 1.13) are actually defined by the Unix shells, rather than the Unix filesystem. In theory, a new shell could define new wildcards, and consequently, we should discuss wildcarding when we discuss the shell. In practice, all Unix shells (including ksh, bash, and other variants (Section 1.6)) honor the same wildcard conventions, and we don't expect to see anyone change the rules. (But most new shells also have extended wildcards (Section 33.2). And different shells do different things when a wildcard doesn't match (Section 33.4).)

You may see different wildcarding if you have a special-purpose shell that emulates another operating system (for example, a shell that looks like the COMMAND.COM in MS-DOS) — in this case, your shell will obey the other operating system's wildcard rules. But even in this case, operating system designers stick to a reasonably similar set of wildcard rules.

The fact that the shell defines wildcards, rather than the filesystem itself or the program you're running, has some important implications for a few commands. Most of the time, a program never sees wildcards. For example, the result of typing:

% lpr *

is exactly the same as typing:

% lpr file1 file2 file3 file4 file5

In this case everything works as expected. But there are other situations in which wildcards don't work at all. Assume you want to read some files from a tape, which requires the command tar x (Section 38.6), so you type the command tar x *.txt. Will you be happy or disappointed?

You'll be disappointed — unless older versions of the files you want are already in your current directory (Section 1.16). The shell expands the wildcard *.txt, according to what's in the current directory, before it hands the completed command line over to tar for execution. All tar gets is a list of files. But you're probably not interested in the current directory; you probably want the wildcard * to be expanded on the tape, retrieving any *.txt files that the tape has.

There's a way to pass wildcards to programs, without having them interpreted by the shell. Simply put *.txt in quotes (Section 27.12). The quotes prevent the Unix shell from expanding the wildcard, passing it to the command unchanged. Programs that can be used in this way (like ssh and scp (Section 46.6)) know how to handle wildcards, obeying the same rules as the shell (in fact, these programs usually start a shell to interpret their arguments). You only need to make sure that the programs see the wildcards, that they aren't stripped by the shell before it passes the command line to the program. As a more general rule, you should be aware of when and why a wildcard gets expanded, and you should know how to make sure that wildcards are expanded at an appropriate time.
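
To make the difference concrete, here is a sketch that uses find, whose -name test does its own pattern matching. That choice is ours for illustration (the article's examples are ssh and scp), and the directory layout is made up:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir sub
touch top.txt sub/a.txt sub/b.txt

# Unquoted: the shell expands *.txt in the current directory first,
# so find is really being asked for "-name top.txt".
unquoted=$(find . -name *.txt | LC_ALL=C sort)

# Quoted: find receives the pattern itself and applies it in every directory.
quoted=$(find . -name '*.txt' | LC_ALL=C sort)

echo "$unquoted"
echo "$quoted"

cd / && rm -rf "$demo"
```

The unquoted form only appears to work here because exactly one name in the current directory matches; with two or more matches, find would be handed extra arguments it can't make sense of.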

If your shell understands the {} characters (Section 28.4), you can use them because they can generate any string — not just filenames that already exist. You have to type the unique part of each name, but you only have to type the common part once. For example, to extract the files called project/wk9/summary, project/wk14/summary, and project/wk15/summary from a tar tape or file, you might use:

% tar xv project/wk{9,14,15}/summary
x project/wk9/summary, 3161 bytes, 7 tape blocks
x project/wk14/summary, 878 bytes, 2 tape blocks
x project/wk15/summary, 2268 bytes, 5 tape blocks

Some versions of tar understand wildcards, but many don't. There is a clever workaround (Section 38.10).

— ML

33.4 What if a Wildcard Doesn't Match?

I ran into a strange situation the other day. I was compiling a program that was core dumping. At some point, I decided to delete the object files and the core file, and start over, so I gave the command:

% rm *.o core

It works as expected most of the time, except when no object files exist. (I don't remember why I did this, but it was probably by using !! (Section 30.8) when I knew there weren't any .o's around.) In this case, you get No match, and the core file is not deleted.

It turns out, for C shell users, that if none of the wildcards can be expanded, you get a No match error. It doesn't matter that there's a perfectly good match for other name(s). That's because, when csh can't match a wildcard, it aborts and prints an error — it won't run the command. If you create one .o file or remove the *.o from the command line, core will disappear happily.

On the other hand, if the Bourne shell can't match a wildcard, it just passes the unmatched wildcard and other filenames:

*.o core

to the command (in this case, to rm) and lets the command decide what to do with it. So, with Bourne shell, what happens will depend on what your rm command does when it sees the literal characters *.o.

The Korn shell works like the Bourne shell.
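
In a Bourne-type shell you can watch the unmatched pattern go through untouched, and a loop can protect itself by checking that each name really exists. This is a sketch under the assumption that the shell is sh, ksh, or bash with default options; the filenames are made up:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
touch core                      # no .o files here on purpose

set -- *.o core
passed="$*"                     # what rm would have been given

# Guard a loop against the literal pattern when nothing matches.
removed=0
for f in *.o core
do
    [ -e "$f" ] || continue     # skips the unexpanded "*.o"
    rm -f -- "$f" && removed=$((removed + 1))
done

echo "$passed"
echo "$removed"

cd / && rm -rf "$demo"
```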

You can make csh and tcsh act a lot like sh (and ksh) by setting the shell's nonomatch option. In the example below, nonomatch starts out unset, so the shell sees a nonmatching wildcard and never runs ls at all. Then I set nonomatch, and the shell passes the unmatched wildcard on to ls, which prints its own error message:

% ls a*
ls: No match.
% set nonomatch
% ls a*
ls: a*: No such file or directory

In bash Version 1, the option allow_null_glob_expansion converts nonmatching wildcard patterns into the null string. Otherwise, the wildcard is left as is without expansion. Here's an example with echo (Section 27.5), which simply shows the arguments that it gets from the shell. In the directory where I'm running this example, there are no names starting with a, but there are two starting with s. In the first case below, allow_null_glob_expansion isn't set, so the shell passes the unmatched a* to echo. After setting allow_null_glob_expansion, the shell removes the unmatched a* before it passes the results to echo:

bash$ echo a* s*
a* sedscr subdir
bash$ allow_null_glob_expansion=1
bash$ echo a* s*
sedscr subdir

bash Version 2 leaves nonmatching wildcard patterns as they are unless you've set the shell's nullglob option (shopt -s nullglob). The nullglob option does the same thing that allow_null_glob_expansion=1 does in bash version 1.
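
The same echo experiment can be repeated with nullglob. In this sketch bash is started explicitly with bash -c, because nullglob is a bash option and the surrounding script might be run by some other sh; the scratch directory and its two names are stand-ins for the directory in the example above:

```shell
demo=$(mktemp -d) || exit 1
touch "$demo/sedscr" "$demo/subdir"

# Default behavior: the unmatched a* is passed through as typed.
default=$(cd "$demo" && bash -c 'echo a* s*')

# With nullglob set, the unmatched a* disappears from the command line.
with_nullglob=$(cd "$demo" && bash -c 'shopt -s nullglob; echo a* s*')

echo "$default"
echo "$with_nullglob"

rm -rf "$demo"
```

Be careful with nullglob and commands that do something different when they get no arguments: if a* matches nothing, ls a* turns into a plain ls and lists the whole directory.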

zsh gives you all of those choices. See the options CSH_NULL_GLOB, NOMATCH and NULL_GLOB.

—ML and JP

33.5 Maybe You Shouldn't Use Wildcards in Pathnames

Suppose you're giving a command like the one below (not necessarily rm — this applies to any Unix command):

% rm /somedir/otherdir/*

Let's say that matches 100 files. The rm command gets 100 complete pathnames from the shell: /somedir/otherdir/afile, /somedir/otherdir/bfile, and so on. For each of these files, the Unix kernel has to start at the root directory, then search the somedir and otherdir directories before it finds the file to remove.

Repeating that search for every one of the 100 files can make a significant difference, especially if your disk is already busy. It's better to cd to the directory first and run the rm from there. You can do it in a subshell (with parentheses) (Section 43.7) if you want to, so you won't have to cd back to where you started:

(For &&, see Section 35.14.)

% (cd /somedir/otherdir && rm *)

There's one more benefit to this second way: you're not as likely to get the error Arguments too long. (Another way to handle long command lines is with the xargs (Section 28.17) command.)
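
Here is a small sketch of the subshell idiom in a scratch directory (all names invented). It checks two things: the parent shell's current directory doesn't change, and the && keeps rm from running if the cd fails:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir -p somedir/otherdir
touch somedir/otherdir/afile somedir/otherdir/bfile keepme

before=$PWD
(cd somedir/otherdir && rm -- *)        # cd and rm both happen in the subshell
after=$PWD                              # the parent shell never moved
left=$(ls somedir/otherdir | wc -l)     # files remaining in otherdir

# If the cd fails, && stops rm from running in the wrong directory.
(cd no/such/dir 2>/dev/null && rm -- *)
survivors=$(ls)

echo "$left"
echo "$survivors"

cd / && rm -rf "$demo"
```

Without the &&, a mistyped directory name would leave rm * running wherever you happened to be.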

— JP

33.6 Getting a List of Matching Files with grep -l

Normally when you run grep (Section 13.1) on a group of files, the output lists the filename along with the line containing the search pattern. Sometimes you want to know only the names of the files, and you don't care to know the line (or lines) that match. In this case, use the -l (lowercase letter "l") option to list only filenames where matches occur. For example, the following command:

% grep -l R6  file1 file2 ...  > r6.filelist 

searches the files for a line containing the string R6, produces a list of those filenames, and stores the list in r6.filelist. (This list might represent the files containing Release 6 documentation of a particular product.) Because these Release 6 files can now be referenced by one list, you can treat them as a single entity and run various commands on them all at once:

(For `...` command substitution, see Section 28.14.)

% lpr `cat r6.filelist`           Print only the Release 6 files
% grep UNIX `cat r6.filelist`     Search limited to the Release 6 files

You don't have to create a file list, though. You can insert the output of a grep directly into a command line with command substitution. For example, to edit only the subset of files containing R6, you would type:

% vi `grep -l R6  file1 file2 ... `

(Of course, you also could use a wildcard like file* instead of a list of filenames.)
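
The whole sequence can be tried on throwaway files. In this sketch the three files and their contents are made up; only the grep -l and backquote usage comes from the text above:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'Release R6 notes\n'   > file1
printf 'Release R5 notes\n'   > file2
printf 'R6 on UNIX systems\n' > file3

# Save the names of the files that mention R6, then reuse the list.
grep -l R6 file* > r6.filelist
r6_files=$(cat r6.filelist)
unix_in_r6=$(grep -l UNIX `cat r6.filelist`)

echo "$r6_files"
echo "$unix_in_r6"

cd / && rm -rf "$demo"
```

One caution: backquotes split the list at spaces as well as newlines, so this approach assumes filenames without whitespace in them.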

grep -l is also good for shell programs that need to check whether a file contains a particular string. The traditional way to do that test is by throwing away grep's output and checking its exit status:

if grep something somefile >/dev/null
then ...

If somefile is huge, though, grep has to search all of it. Adding the grep -l option saves time because grep can stop searching after it finds the first matching line.
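
Filled out, the test might look like the sketch below. The contains function and the file's contents are hypothetical, added only to make the example complete:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'alpha\nsomething useful\nomega\n' > somefile

# Hypothetical helper: contains PATTERN FILE
# Succeeds (exit status 0) if FILE has at least one matching line.
contains() {
    grep -l -e "$1" "$2" >/dev/null
}

if contains something somefile; then first=yes; else first=no; fi
if contains missing somefile; then second=yes; else second=no; fi

echo "$first $second"

cd / && rm -rf "$demo"
```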

—DG and JP

33.7 Getting a List of Nonmatching Files

You can use the grep (Section 13.2) option -c to tell you how many lines in a given file contain a pattern, so you can also use it to find files that don't contain a pattern (i.e., zero matching lines). This is a handy technique to package into a shell script.

33.7.1 Using grep -c

Let's say you're indexing a DocBook (SGML) document and you want to make a list of files that don't yet contain indexing tags. What you need to find are files with zero occurrences of the string <indexterm>. (If your tags might be uppercase, you'll also want the -i option (Section 9.22).) The following command:

% grep -c "<indexterm>" chapter*

might produce the following output:

chapter1.sgm:10
chapter2.sgm:27
chapter3.sgm:19
chapter4.sgm:0
chapter5.sgm:39
   ...

This is all well and good, but suppose you need to check index entries in hundreds of reference pages. Well, just filter grep's output by piping it through another grep. The previous command can be modified as follows:

% grep -c "<indexterm>" chapter* | grep :0

This results in the following output:

chapter4.sgm:0

Using sed (Section 34.1) to truncate the :0, you can save the output as a list of files. For example, here's a trick for creating a list of files that don't contain index macros:

% grep -c "<indexterm>" * | sed -n 's/:0$//p' > ../not_indexed.list

The sed -n command prints only the lines that contain :0; it also strips the :0 from the output so that ../not_indexed.list contains a list of files, one per line. For a bit of extra safety, we've added a $ anchor (Section 32.5) to be sure sed matches only 0 at the end of a line — and not, say, in some bizarre filename that contains :0. (We've quoted (Section 27.12) the $ for safety — though it's not really necessary in most shells because $/ can't match shell variables.) The .. pathname (Section 1.16) puts the not_indexed.list file into the parent directory — this is one easy way to keep grep from searching that file, but it may not be worth the bother.
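
Here is the pipeline run against a few throwaway chapter files, as a sketch; the file contents are invented, and one file is left empty to show that it counts as not indexed too:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' '<indexterm>a</indexterm>' '<indexterm>b</indexterm>' > chapter1.sgm
printf '%s\n' 'no tags yet'                                         > chapter2.sgm
printf '%s\n' '<indexterm>c</indexterm>'                            > chapter3.sgm
: > chapter4.sgm                                                    # empty file

# Per-file counts of matching lines, then just the names with a count of 0.
counts=$(grep -c "<indexterm>" chapter*)
not_indexed=$(grep -c "<indexterm>" chapter* | sed -n 's/:0$//p')

echo "$counts"
echo "$not_indexed"

cd / && rm -rf "$demo"
```

If your grep has the -L option (GNU and BSD versions do, though it isn't required by POSIX), grep -L "<indexterm>" chapter* should give the same list of names without the sed step.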

To edit all files that need index macros added, you could type this:

% vi `grep -c "<indexterm>" * | sed -n 's/:0$//p'`

This command is more obvious once you start using backquotes a lot.

33.7.2 The vgrep Script

You can put the grep -c technique into a little script named vgrep with a couple of safety features added:

(For "$@", see Section 35.20.)

Go to http://examples.oreilly.com/upt3 for more information on: vgrep

#!/bin/sh
case $# in
0|1) echo "Usage: `basename $0` pattern file [files...]" 1>&2; exit 2 ;;
2)  # Given a single filename, grep returns a count with no colon or name.
    grep -c -e "$1" "$2" | sed -n "s|^0\$|$2|p"
    ;;
*)  # With more than one filename, grep returns "name:count" for each file.
    pat="$1"; shift
    grep -c -e "$pat" "$@" | sed -n "s|:0\$||p"
    ;;
esac

Now you can type, for example:

% vi `vgrep "<indexterm>" *`

One of the script's safety features works around a problem that happens if you pass grep just one filename. In that case, most versions of grep won't print the file's name, just the number of matches. So the first sed command substitutes a digit 0 with the filename.

The second safety feature is the grep -e option. It tells grep that the following argument is the search pattern, even if that pattern looks like an option because it starts with a dash (-). This lets you type commands like vgrep -0123 * to find files that don't contain the string -0123.
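
Both behaviors that the script works around are easy to observe directly. This sketch uses two made-up files and calls grep the same way vgrep does:

```shell
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'part -0123\n' > has_it
printf 'part 4567\n'  > lacks_it

# One filename: grep -c prints a bare count with no name in front.
single=$(grep -c -e -0123 lacks_it)

# Several filenames: grep -c prints name:count, and -e keeps the
# leading dash in the pattern from being read as an option.
several=$(grep -c -e -0123 has_it lacks_it)
missing=$(echo "$several" | sed -n 's|:0$||p')

echo "$single"
echo "$several"
echo "$missing"

cd / && rm -rf "$demo"
```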

—DG and JP

33.8 nom: List Files That Don't Match a Wildcard

Go to http://examples.oreilly.com/upt3 for more information on: nom

The nom (no match) script takes filenames (usually expanded by the shell) from its command line. It outputs all filenames in the current directory that don't match. As Section 33.2 shows, some shells have an operator — ! or ^ — that works like nom, but other shells don't. Here are some examples of nom:

Here's the script:

(See also: trap, Section 35.17; case, Section 35.11; $*, Section 35.20; comm, Section 11.8.)

#! /bin/sh
temp=/tmp/NOM$$
stat=1     # Error exit status (set to 0 before normal exit)
trap 'rm -f $temp; exit $stat' 0 1 2 15

# Must have at least one argument, and all have to be in current directory:
case "$*" in
"") echo Usage: `basename $0` pattern 1>&2; exit ;;
*/*)    echo "`basename $0` quitting: I can't handle '/'s." 1>&2; exit ;;
esac

# ls gives sorted file list. -d=don't enter directories, -1=one name/line.
ls -d ${1+"$@"} > $temp   # Get filenames we don't want to match
ls -1 | comm -23 - $temp  # Compare to current dir; output names we want
stat=0

The -d option (Section 8.5) tells ls to list the names of any directories, not their contents. The ${1+"$@"} (Section 36.7) works around a problem in some Bourne shells. You can remove the -1 option on the script's ls command line if your version of ls lists one filename per line by default; almost all versions of ls do that when they're writing into a pipe. Note that nom doesn't know about files whose names begin with a dot (.); you can change that if you'd like by adding the ls -A option (uppercase letter "A", which isn't on all versions of ls).

Finally, if you've got a shell with process substitution, such as bash, which is what we use below, you can rewrite nom without the temporary file and the trap:

#!/bin/bash
# Must have at least one argument, and all have to be in current directory:
case "$*" in
"")  echo Usage: `basename $0` pattern 1>&2; exit 1 ;;
*/*) echo "`basename $0` quitting: I can't handle '/'s." 1>&2; exit 1 ;;
esac

# ls gives sorted file list. -d=don't enter directories, -1=one name/line.
# Compare current directory with names we don't want; output names we want:
comm -23 <(ls -1) <(ls -d "$@")
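
To try the idea without installing anything, here is a sketch that wraps the same ls and comm steps in a function. The name nom_sketch, the scratch directory, and the filenames are all made up for illustration; LC_ALL=C is set so that ls and comm agree on sort order:

```shell
LC_ALL=C; export LC_ALL         # ls and comm must agree on sort order
demo=$(mktemp -d) && cd "$demo" || exit 1
touch report.ms notes.ms README Makefile
mkdir Backup

# nom_sketch: hypothetical function form of nom, for names in the current directory.
nom_sketch() {
    unwanted=$(mktemp) || return 1     # temporary file outside the current directory
    ls -d "$@" > "$unwanted"           # names we don't want
    ls -1 | comm -23 - "$unwanted"     # everything else in the current directory
    rm -f "$unwanted"
}

not_ms=$(nom_sketch *.ms)              # all names that don't end in .ms
not_backup=$(nom_sketch Backup)        # everything except the Backup directory

echo "$not_ms"
echo "$not_backup"

cd / && rm -rf "$demo"
```

Unlike the scripts above, this sketch skips the argument checks, so it is only suitable for experimenting.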

— JP
