
Chapter 9. Finding Files with find

9.1 How to Use find

The utility find is one of the most useful and important of the Unix utilities. It finds files that match a given set of parameters, ranging from the file's name to its modification date. In this chapter, we'll be looking at many of the things it can do. As an introduction, here's a quick summary of its features and basic operators:

% find path operators

where path is one or more directories in which find will begin to search and operators (or, in more customary jargon, options) tell find which files you're interested in. The operators are as follows:

-name filename

Find files with the given filename. This is the most commonly used operator. filename may include wildcards, but if it does, they must be quoted to prevent the shell from interpreting the wildcards.

-perm mode

Find files with the given access mode. You must give the access mode in octal.

-type c

Find the files of the given type, specified by c. c is a one-letter code; for example, f for a plain file, b for a block special file, l for a symbolic link, and so forth.

-user name

Find files belonging to user name. name may also be a user ID number.

-group name

Find files belonging to group name. name may also be a group ID number.

-size n

Find files that are n blocks long. A block usually equals 512 bytes. The notation +n says "find files that are over n blocks long." The notation nc says "find files that are n characters long." Can you guess what +nc means?

-inum n

Find files with the inode number n.

-atime n

Find files that were accessed n days ago. +n means "find files that were accessed over n days ago" (i.e., not accessed in the last n days). -n means "find files that were accessed less than n days ago" (i.e., accessed in the last n days).

-mtime n

Similar to -atime, except that it checks the time the file's contents were modified.

-ctime n

Similar to -atime, except that it checks the time the inode was last changed. "Changed" means that the file was modified or that one of its attributes (for example, its owner) was changed.

-newer file

Find files that have been modified more recently than file.

You might want to take some action on files that match several criteria. So we need some way to combine several operators:

operator1 -a operator2

Find files that match both operator1 and operator2. The -a isn't strictly necessary; when two search parameters are provided, one after the other, find assumes you want files that match both conditions.

operator1 -o operator2

Find files that match either operator1 or operator2.

! operator

Find all files that do not match the given operator. The ! performs a logical NOT operation.

\( expression \)

Logical precedence; in a complex expression, evaluate this part of the expression before the rest.

Another group of operators tells find what action to take when it locates a file:

-print

Print the file's name on standard output. On most modern finds, this is the default action if no action is given.

-ls

List the file's name on standard output with a format like ls -l. (Not on older versions.)

-exec command

Execute command. To include the pathname of the file that's just been found in command, use the special symbol {}. command must end with a backslash followed by a semicolon (\;). For example:

% find . -name "*.o" -exec rm -f {} \;

tells find to delete any files whose names end in .o.

-ok command

Same as -exec, except that find prompts you for permission before executing command. This is a useful way to test find commands.

A last word: find is one of the tools that vendors frequently fiddle with, adding (or deleting) a few operators that they like (or dislike). The GNU version, in particular, has many more. The operators listed here should be valid on virtually any system. If you check your manual page, you may find others.

— ML

9.2 Delving Through a Deep Directory Tree

The first, most obvious, use of find is locating old, big, or unused files whose locations you've forgotten. In particular, find's most important characteristic is its ability to travel down subdirectories.

Normally the shell provides the argument list to a command. That is, Unix programs are frequently given filenames and not directory names. Only a few programs can be given a directory name and march down the directory searching for subdirectories. The programs find, tar (Section 38.3), du, and diff do this. Some versions of chmod (Section 50.5), chgrp, ls, rm, and cp will, but only if a -r or -R option is specified.

In general, most commands do not understand directory structures and rely on the shell to expand wildcards to directory names. That is, to delete all files whose names end with a .o in a group of directories, you could type:

% rm *.o */*.o */*/*.o

Not only is this tedious to type, it may not find all of the files you are searching for. The shell has certain blind spots. It will not match files in directories whose names start with a dot. And, if any files match */*/*/*.o, they would not be deleted.

Another problem is typing the previous command and getting the error "Arguments too long." This means the shell would expand too many arguments from the wildcards you typed.

find is the answer to these problems.

A simple example of find is using it to print the names of all the files in the directory and all subdirectories. This is done with the simple command:

% find . -print

The first arguments to find are directory and file pathnames — in the example, a dot (.) is one name for the current directory. The arguments after the pathnames always start with a minus sign (-) and tell find what to do once it finds a file; these are the search operators. In this case, the filename is printed.

You can use the tilde (~), as well as particular paths. For example:

% find ~ ~barnett /usr/local -print

And if you have a very slow day, you can type:

% find / -print

This command will list every file on the system. This is okay on single-user workstations with their own disks. However, it can tie up disks on multiuser systems enough to make users think of gruesome crimes! If you really need that list and your system has fast find or locate, try the command find '/*' or locate '*' instead.

find sends its output to standard output, so once you've "found" a list of filenames, you can pass them to other commands. One way to use this is with command substitution:

% ls -l `find . -print`

The find command is executed, and its output replaces the backquoted string. ls sees the output of find and doesn't even know find was used.

An alternate method uses the xargs command. xargs and find work together beautifully. xargs executes its arguments as commands and reads standard input to specify arguments to that command. xargs knows the maximum number of arguments each command line can handle and does not exceed that limit. While the command:

% ls -ld `find / -print`

might generate an error when the command line is too large, the equivalent command using xargs will never generate that error:

% find / -print | xargs ls -ld
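To see the batching that xargs does, here is a small sketch. It uses the -n option to force an artificially low limit of two arguments per command line, and echo stands in for any command; in normal use you would leave -n off and let xargs pick the limit itself.

```shell
# Feed five arguments to xargs, allowing at most two per command line.
# xargs has to run echo three times to use them all up.
batched=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$batched"
```

Each output line is one invocation of echo: the first two get two arguments apiece, and the last gets whatever is left over.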

—BB and JP

9.3 Don't Forget -print

"Why didn't find find my file?" I wondered sometimes. "I know it's there!"

More often than not, I'd forgotten to use -print. Without -print (or -ls, on versions of find that have it), find may not print any pathnames. For a long time, this quirk of find confused new users, so most modern versions of find will assume -print if you don't supply an action; some will give you an error message instead. If you don't get the output you expected from find, check to make sure that you specified the action you meant.

—JP and DJPH

9.4 Looking for Files with Particular Names

You can look for particular files by using an expression with wildcards (Section 28.3) as an argument to the -name operator. Because the shell also interprets wildcards, it is necessary to quote them so they are passed to find unchanged. Any kind of quoting can be used:

% find . -name \*.o -print
% find . -name '*.o' -print
% find . -name "[a-zA-Z]*.o" -print

The -name operator matches only the last component of a pathname, never the directories leading up to it. For example, the previous commands would not match the pathname ./subdir.o/afile, but they would match ./subdir.o and ./src/subdir/prog.o.
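A quick way to convince yourself of this is to build a scratch tree and search it. The following is only a sketch: it assumes mktemp -d is available, and the directory and file names are the ones from the example above.

```shell
# Build a throwaway tree containing a directory and a file that end in .o.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p subdir.o src/subdir
touch subdir.o/afile src/subdir/prog.o

# -name looks only at the final component of each pathname.
matches=$(find . -name '*.o' -print | LC_ALL=C sort)
printf '%s\n' "$matches"

cd / && rm -rf "$tmp"
```

The directory ./subdir.o and the file ./src/subdir/prog.o are listed; ./subdir.o/afile is not, even though .o appears in its path.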

Section 9.27 shows a way to match directories in the middle of a path. Here's a simpler "find file" alias that can come in very handy:

alias ff "find . -name '*\!{*}*' -ls"

Give it a file or directory name; the alias will give a long listing of any file or directory names that contain the argument. For example:

% ff ch09
2796156 4 -rw-r--r--  1 deb  deb  628 Feb  2 10:41 ./oreilly/UPT/book/ch09.sgm

—BB and JP

9.5 Searching for Old Files

If you want to find a file that is seven days old, use the -mtime operator:

% find . -mtime 7 -print

An alternate way is to specify a range of times:

% find . -mtime +6 -mtime -8 -print

mtime is the last modified time of a file. If you want to look for files that have not been used, check the access time with the -atime argument. Here is a command to list all files that have not been read in 30 days or more:

% find . -type f -atime +30 -print

It is difficult to find directories that have not been accessed because the find command modifies the directory's access time.

There is another time associated with each file, called the ctime, the inode change time. Access it with the -ctime operator. The ctime will have a more recent value if the owner, group, permission, or number of links has changed, while the file itself has not. If you want to search for files with a specific number of links, use the -links operator.

Section 8.2 has more information about these three times, and Section 9.7 explains how find checks them.

— BB

9.6 Be an Expert on find Search Operators

find is admittedly tricky. Once you get a handle on its abilities, you'll learn to appreciate its power. But before thinking about anything remotely tricky, let's look at a simple find command:

% find . -name "*.c" -print

The . tells find to start its search in the current directory (.) and to search all subdirectories of the current directory. The -name "*.c" tells find to find files whose names end in .c. The -print operator tells find how to handle what it finds, i.e., print the names on standard output.

All find commands, no matter how complicated, are really just variations on this one. You can specify many different names, look for old files, and so on; no matter how complex, you're really only specifying a starting point, some search parameters, and what to do with the files (or directories or links or . . . ) you find.

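The key to using find in a more sophisticated way is realizing that search parameters are really "logical expressions" that find evaluates. That is, find looks at each file in turn, evaluates the expression built from the command-line operators against that file, and takes the specified action (such as printing the file's name) if the expression is true.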

So, -name "*.c" is really a logical expression that evaluates to true if the file's name ends in .c.

Once you've gotten used to thinking this way, it's easy to use the AND, OR, NOT, and grouping operators. So let's think about a more complicated find command. Let's look for files that end in .o or .tmp AND that are more than five days old, AND let's print their pathnames. We want an expression that evaluates true for files whose names match either *.o OR *.tmp:

-name "*.o" -o -name "*.tmp"

If either condition is true, we want to check the access time. So we put the previous expression within parentheses (quoted with backslashes so the shell doesn't treat the parentheses as subshell operators). We also add a -atime operator:

-atime +5 \( -name "*.o" -o -name "*.tmp" \)

The parentheses force find to evaluate what's inside as a unit. The expression is true if "the access time is more than five days ago and \( either the name ends with .o or the name ends with .tmp \)." If you didn't use parentheses, the expression would mean something different:

-atime +5 -name "*.o" -o -name "*.tmp"                 Wrong!

When find sees two operators next to each other with no -o between, that means AND. So the "wrong" expression is true if "either \( the access time is more than five days ago and the name ends with .o \) or the name ends with .tmp." This incorrect expression would be true for any name ending with .tmp, no matter how recently the file was accessed — the -atime doesn't apply. (There's nothing really "wrong" or illegal in this second expression — except that it's not what we want. find will accept the expression and do what we asked — it just won't do what we want.)

The following command, which is what we want, lists files in the current directory and subdirectories that match our criteria:

% find . -atime +5 \( -name "*.o" -o -name "*.tmp" \) -print
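The difference between the grouped and ungrouped expressions is easy to see on a scratch directory. This sketch assumes mktemp -d is available and that your find defaults to -print when no action is given; the filenames are made up for the illustration. It backdates the access time of two of four files with touch -a -t and runs both forms.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch new.o new.tmp old.o old.tmp
# Backdate the access time of the "old" files to January 1, 2000.
touch -a -t 200001010000 old.o old.tmp

# Grouped: accessed over five days ago AND (name is *.o OR name is *.tmp).
grouped=$(find . -atime +5 \( -name '*.o' -o -name '*.tmp' \) -print | LC_ALL=C sort)

# Ungrouped: (accessed over five days ago AND name is *.o) OR name is *.tmp.
# No action is given here, so the default -print covers the whole expression.
ungrouped=$(find . -atime +5 -name '*.o' -o -name '*.tmp' | LC_ALL=C sort)

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
cd / && rm -rf "$tmp"
```

The grouped form lists only old.o and old.tmp. The ungrouped form also lists new.tmp, because the -atime test never applies to the *.tmp half of the expression.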

What if we wanted to list all files that do not match these criteria? All we want is the logical inverse of this expression. The NOT operator is an exclamation point (!). Like the parentheses, in most shells we need to escape ! with a backslash to keep the shell from interpreting it before find can get to it. The ! operator applies to the expression on its right. Since we want it to apply to the entire expression, and not just the -atime operator, we'll have to group everything from -atime to "*.tmp" within another set of parentheses:

% find . \! \( -atime +5 \( -name "*.o" -o -name "*.tmp" \) \) -print

For that matter, even -print is an expression; it always evaluates to true. So are -exec and -ok; they evaluate to true when the command they execute returns a zero status. (There are a few situations in which this can be used to good effect.)

But before you try anything too complicated, you need to realize one thing. find isn't as sophisticated as you might like it to be. You can't squeeze all the spaces out of expressions, as if it were a real programming language. You need spaces before and after operators like !, (, ), and {}, in addition to spaces before and after every other operator. Therefore, a command line like the following won't work:

% find . \!\(-atime +5 \(-name "*.o" -o -name "*.tmp"\)\) -print

A true power user will realize that find is relying on the shell to separate the command line into meaningful chunks, or tokens. And the shell, in turn, is assuming that tokens are separated by spaces. When the shell gives find a chunk of characters like *.tmp)) (without the double quotes or backslashes — the shell took them away), find gets confused; it thinks you're talking about a weird filename pattern that includes a couple of parentheses.

Once you start thinking about expressions, find's syntax ceases to be obscure — in some ways, it's even elegant. It certainly allows you to say what you need to say with reasonable efficiency.

—ML and JP

9.7 The Times That find Finds

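The times that go with the find operators -mtime, -atime, and -ctime often aren't documented very well. The times are in days:

A number with no sign, for example, 3 (as in -mtime 3), means the 24-hour period that ended exactly 3 days ago: between 96 and 72 hours ago.

A number with a minus sign, for example, -3 (as in -mtime -3), means any time since then: between now and 72 hours ago.

A number with a plus sign, for example, +3 (as in -mtime +3), means any time before that 24-hour period: more than 96 hours ago.
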
Got that? Then you should see that -atime -2 and -atime 1 are both true on files that have been accessed between 48 and 24 hours ago. (-atime -2 is also true on files accessed 24 hours ago or less.)
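If you'd rather check than memorize, the following sketch creates three files with known ages and tries each form on them. It assumes GNU touch (whose -d option accepts relative dates such as "36 hours ago") and mktemp -d; the filenames are hypothetical, and -mtime is used in place of -atime because modification times are easier to set up.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Assumption: GNU touch, which understands relative dates with -d.
touch -d '12 hours ago' h12
touch -d '36 hours ago' h36
touch -d '60 hours ago' h60

# Exactly 1: modified between 48 and 24 hours ago.
exactly1=$(find . -type f -mtime 1 -print | LC_ALL=C sort)
# Minus 2: modified less than 48 hours ago.
within2=$(find . -type f -mtime -2 -print | LC_ALL=C sort)
# Plus 1: modified more than 48 hours ago.
over1=$(find . -type f -mtime +1 -print | LC_ALL=C sort)

printf '1:\n%s\n-2:\n%s\n+1:\n%s\n' "$exactly1" "$within2" "$over1"
cd / && rm -rf "$tmp"
```

The 36-hour-old file matches both -mtime 1 and -mtime -2, the 12-hour-old file matches only -mtime -2, and the 60-hour-old file matches only -mtime +1.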

For more exact comparisons, use find -newer with touch (Section 9.8).

— JP

9.8 Exact File-Time Comparisons

One problem with find's time operators (-atime and its brethren) is that they don't allow very exact comparisons. They only allow you to specify time to within a day, and sometimes that's just not good enough. You think that your system was corrupted at roughly 4 p.m. yesterday (March 20); you want to find any files that were modified after that point, so you can inspect them. Obviously, you'd like something more precise than "give me all the files that were modified in the last 24 hours."

Some versions of touch, and other freely available commands like it, can create a file with an arbitrary timestamp. That is, you can use touch to make a file that's backdated to any point in the past (or, for that matter, postdated to some point in the future). This feature, combined with find's -newer operator, lets you make comparisons accurate to one minute or less.

For example, to create a file dated 4 p.m., March 20, give the command:

% touch -t 03201600 /tmp/4PMyesterday

Then to find the files modified after this, give the command:

% find . -newer /tmp/4PMyesterday -print

What about "older" files? Older files are "not newer" files, and find has a convenient NOT operator (!) for just this purpose. So let's say that you want to find files that were last modified between 10:46 a.m. on July 3, 1999 and 9:37 p.m. on June 4, 2001. You could use the following commands:

% touch -t 199907031046 /tmp/file1
% touch -t 200106042137 /tmp/file2
% find . -newer /tmp/file1 \! -newer /tmp/file2 -print
% rm /tmp/file[12]
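Here is the same technique as a self-contained sketch you can run anywhere. It assumes mktemp -d is available; the directory and file names are hypothetical. The two reference files are kept outside the tree being searched, so that the second one (which is newer than the first and not newer than itself) doesn't show up in the results.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir refs tree
# Reference files marking the two ends of the time window.
touch -t 199907031046 refs/start
touch -t 200106042137 refs/end
# Three test files: before, inside, and after the window.
touch -t 199901010000 tree/early
touch -t 200006150000 tree/middle
touch -t 200201010000 tree/late

# Newer than the start marker AND NOT newer than the end marker.
between=$(find tree -type f -newer refs/start \! -newer refs/end -print)
printf '%s\n' "$between"
cd / && rm -rf "$tmp"
```

Only tree/middle is listed.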

— ML

9.9 Running Commands on What You Find

Often, when you find a file, you don't just want to see its name; you want to do something, like grep (Section 13.2) for a text string. To do this, use the -exec operator. This allows you to specify a command that is executed upon each file that is found.

The syntax is peculiar and in many cases, it is simpler just to pipe the output of find to xargs (Section 28.17). However, there are cases where -exec is just the thing, so let's plunge in and explain its peculiarities.

The -exec operator allows you to execute any command, including another find command. If you consider that for a moment, you realize that find needs some way to distinguish the command it's executing from its own arguments. The obvious choice is to use the same end-of-command character as the shell (the semicolon). But since the shell uses the semicolon itself, it is necessary to escape the character with a backslash or quotes.

Therefore, every -exec operator ends with the characters \;. There is one more special argument that find treats differently: {}. These two characters are used as the variable whose name is the file find found. Don't bother rereading that last line: an example will clarify the usage. The following is a trivial case and uses the -exec operator with echo to mimic the -print operator:

% find . -exec echo {} \;

The C shell (Section 29.1) uses the characters { and }, but doesn't change {} together, which is why it is not necessary to quote these characters. The semicolon must be quoted, however. Quotes can be used instead of a backslash:

% find . -exec echo {} ';'

as both will sneak the semicolon past the shell and get it to the find command. As I said before, find can even call find. If you wanted to list every symbolic link in every directory owned by the group staff under the current directory, you could execute:

% find `pwd` -type d -group staff -exec find {} -type l -print \;

To search for all files with group-write permission under the current directory and to remove the permission, you can use:

% find . -perm -20 -exec chmod g-w {} \;

or:

% find . -perm -20 -print | xargs chmod g-w 

The difference between -exec and xargs is subtle. The first one will execute the program once per file, while xargs can handle several files with each process. However, xargs may have problems with filenames that contain embedded spaces. (Versions of xargs that support the -0 option can avoid this problem; they expect NUL characters as delimiters instead of spaces, and find's -print0 option generates output that way.)
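The following sketch shows the NUL-delimited pipeline handling a filename with an embedded space. It assumes a find with -print0 and an xargs with -0 (the GNU and BSD versions have both), plus mktemp -d; the filenames are made up.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch 'has space.txt' plain.txt
chmod 664 'has space.txt' plain.txt

# Assumption: find supports -print0 and xargs supports -0.
# Names are separated by NUL characters, so the space is harmless.
find . -type f -perm -20 -print0 | xargs -0 chmod g-w

# Nothing should have group-write permission now.
still_writable=$(find . -type f -perm -20 -print)
printf 'still group-writable: [%s]\n' "$still_writable"
cd / && rm -rf "$tmp"
```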

Occasionally, people create a strange file that they can't delete. This could be caused by accidentally creating a file with a space or some control character in the name. find and -exec can delete this file, while xargs could not. In this case, use ls -il to list the files and i-numbers, and use the -inum operator with -exec to delete the file:

% find . -inum 31246 -exec rm {} ';'

If you wish, you can use -ok, which does the same as -exec, except the program asks you to confirm the action first before executing the command. It is a good idea to be cautious when using find, because the program can make a mistake into a disaster. When in doubt, use echo as the command. Or send the output to a file, and examine the file before using it as input to xargs. This is how I discovered that find requires {} to stand alone in the arguments to -exec. I wanted to rename some files using -exec mv {} {}.orig, but find wouldn't replace the {} in {}.orig. I learned that I have to write a shell script that I tell find to execute.

GNU find will replace the {} in {}.orig for you. If you don't have GNU find, a little Bourne shell while loop with redirected input can handle that too:

$ find ... -print |
> while read file
> do mv "$file" "$file.orig"
> done

find writes the filenames to its standard output. The while loop and its read command read the filenames from standard input then make them available as $file, one by one.
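Another way around the standalone-{} restriction is to hand find a tiny inline shell script with sh -c. In this sketch, {} appears as an argument by itself, so any find will replace it; the shell then receives the pathname as $1 and can build the new name from it. The filenames are hypothetical, and mktemp -d is assumed to be available.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch 'my notes.txt' todo.txt

# The word "sh" after the script fills in $0; {} becomes $1.
# Quoting "$1" keeps names with spaces in one piece.
find . -type f -name '*.txt' -exec sh -c 'mv "$1" "$1.orig"' sh {} \;

renamed=$(find . -type f -print | LC_ALL=C sort)
printf '%s\n' "$renamed"
cd / && rm -rf "$tmp"
```

Restricting the search to -name '*.txt' keeps find from picking up a file it has just renamed.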

Section 9.12 and Section 9.27 have more examples of -exec.

— BB

9.10 Using -exec to Create Custom Tests

Here's something that will really make your head spin. Remember that -exec doesn't necessarily evaluate to "true"; it only evaluates to true if the command it executes returns a zero exit status. You can use this to construct custom find tests.

Assume that you want to list files that are "beautiful." You have written a program called beauty that returns zero if a file is beautiful and nonzero otherwise. (This program can be a shell script, a perl script, an executable from a C program, or anything you like.)

Here's an example:

% find . -exec beauty {} \; -print

In this command, -exec is just another find operator. The only difference is that we care about its value; we're not assuming that it will always be "true." find executes the beauty command for every file. Then -exec evaluates to true when find is looking at a "beautiful" program, causing find to print the filename. (Excuse us, causing find to evaluate the -print. :-))

Of course, this ability is capable of infinite variation. If you're interested in finding beautiful C code, you could use the command:

% find . -name "*.[ch]" -exec beauty {} \; -print

For performance reasons, it's a good idea to put the -exec operator as close to the end as possible. This avoids starting processes unnecessarily; the -exec command will execute only when the previous operators evaluate to true.
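Since beauty is an imaginary program, here is a sketch of the same idea with a real command standing in for it: grep -q, which prints nothing and returns a zero status only if it finds a match. The file contents are invented for the illustration, and mktemp -d is assumed to be available.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'int main(void) { return 0; }\n' > prog.c
printf 'int helper(void) { return 1; }\n' > util.c
printf 'main notes\n' > notes.txt

# -name runs first, so grep is started only for the .c and .h files.
# -print is evaluated only when grep -q returns a zero status.
with_main=$(find . -name '*.[ch]' -exec grep -q 'main' {} \; -print)
printf '%s\n' "$with_main"
cd / && rm -rf "$tmp"
```

Only ./prog.c is listed: util.c fails the grep test, and notes.txt never gets as far as the -exec.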

—JP and ML

9.11 Custom -exec Tests Applied

My favorite reason to use find's -exec is for large recursive greps. Let's say I want to search through a large directory with lots of subdirectories to find all of the .cc files that call the method GetRaw( ):

% find . -name \*.cc -exec grep -n "GetRaw(" {} \; -print
58:    string Database::GetRaw(const Name &owner) const {
67:    string Database::GetRaw(const Name &owner,
./db/Database.cc
39:            return new Object(owner, _database->GetRaw(owner));
51:    string Object::GetRaw(const Property& property) const {
52:        return _database->GetRaw(_owner, property);
86:            Properties properties(_database->GetRaw(owner));
103:        return _database->GetRaw(_owner);
./db/Object.cc
71:        return new DatabaseObject(owner, GetDatabase( ).GetRaw(owner));
89:            return Sexp::Parse(GetRaw(property));
92:            SexpPtr parent = Sexp::Parse(GetRaw("_parent"))->Eval(this);
./tlisp/Object.cc

This output is from a real source directory for an open source project I'm working on; it shows me each line that matched my grep along with its line number, followed by the name of the file where those lines were found. Most versions of grep can search recursively (using -R), but they search all files; you need find to grep through only certain files in a large directory tree.

—JP and DJPH

9.12 Finding Many Things with One Command

Running find is fairly time consuming, and for good reason: it has to read every inode in the directory tree that it's searching. Therefore, combine as many things as you can into a single find command. If you're going to walk the entire tree, you may as well accomplish as much as possible in the process.

Let's work from an example. Assume that you want to write a command (eventually for inclusion in a Chapter 27 shell script) that sets file-access modes correctly. You want to give 771 access to all directories, 600 access for all backup files (*.BAK), 755 access for all shell scripts (*.sh), and 644 access for all text files (*.txt). You can do all this with one command:

$ find . \( -type d       -a -exec chmod 771 {} \; \) -o \
         \( -name "*.BAK" -a -exec chmod 600 {} \; \) -o \
         \( -name "*.sh"  -a -exec chmod 755 {} \; \) -o \
         \( -name "*.txt" -a -exec chmod 644 {} \; \)

Why does this work? Remember that -exec is really just another part of the expression; it evaluates to true when the following command is successful. It isn't an independent action that somehow applies to the whole find operation. Therefore, -exec can be mixed freely with -type, -name, and so on.

However, there's another important trick here. Look at the first chunk of the command — the first statement, that is, between the first pair of \( and \). It says, "If this file is a directory and the chmod command executes successfully . . . " Wait. Why doesn't the -exec execute a chmod on every file in the directory to see whether it's successful?

Logical expressions are evaluated from left to right; in any chunk of the expression, evaluation stops once it's clear what the outcome is. Consider the logical expression "'A AND B' is true." If A is false, you know that the result of "'A AND B' is true" will also be false, so there's no need to look at the rest of the statement, B.

So in the previous multilayered expression, when find is looking at a file, it checks whether the file is a directory. If it is, -type d is true, and find evaluates the -exec (changing the file's mode). If the file is not a directory, find knows that the result of the entire statement will be false, so it doesn't bother wasting time with the -exec. find goes on to the next chunk after the OR operator — because, logically, if one part of an OR expression isn't true, the next part may be — so evaluation of an OR . . . OR . . . OR . . . expression has to continue until either one chunk is found to be true, or they've all been found to be false. In this case having the directory first is important, so that directories named, for example, blah.BAK don't lose their execute permissions.
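To watch the short-circuiting at work, you can run the command on a scratch directory that includes a directory named blah.BAK. This is a sketch with made-up filenames; it assumes mktemp -d is available, and mode_of is a hypothetical helper that trims ls -ld output down to the permission string.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir blah.BAK
touch old.BAK run.sh notes.txt

find . \( -type d       -a -exec chmod 771 {} \; \) -o \
       \( -name '*.BAK' -a -exec chmod 600 {} \; \) -o \
       \( -name '*.sh'  -a -exec chmod 755 {} \; \) -o \
       \( -name '*.txt' -a -exec chmod 644 {} \; \)

# Hypothetical helper: first ten characters of the ls -ld listing.
mode_of() { ls -ld "$1" | cut -c1-10; }
dir_mode=$(mode_of blah.BAK)
bak_mode=$(mode_of old.BAK)
sh_mode=$(mode_of run.sh)
txt_mode=$(mode_of notes.txt)
printf '%s\n' "$dir_mode" "$bak_mode" "$sh_mode" "$txt_mode"
cd / && rm -rf "$tmp"
```

The directory blah.BAK ends up with mode 771, not 600: its first chunk was true, so find never evaluated the *.BAK chunk for it.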

Of course, there's no need for the -execs to run the same kind of command. Some could delete files, some could change modes, some could move them to another directory, and so on.

One final point. Although understanding our multilayered find expression was difficult, it really was no different from a "garden variety" command. Think about what the following command means:

% find . -name "*.c" -print

There are two operators: -name (which evaluates to true if the file's name ends in .c) and -print (which is always true). The two operators are ANDed together; we could stick a -a between the two without changing the result at all. If -name evaluates to false (i.e., if the file's name doesn't end in .c), find knows that the entire expression will be false. So it doesn't bother with -print. But if -name evaluates to true, find evaluates -print — which, as a side effect, prints the name.

As we said in Section 9.6, find's business is evaluating expressions — not locating files. Yes, find certainly locates files; but that's really just a side effect. For me, understanding this point was the conceptual breakthrough that made find much more useful.

— ML

9.13 Searching for Files by Type

If you are only interested in files of a certain type, use the -type argument, followed by one of the characters in Table 9-1. Note, though, that some versions of find don't have all of these.

Table 9-1. find -type characters

Character   Meaning
b           Block special file ("device file")
c           Character special file ("device file")
d           Directory
f           Plain file
l           Symbolic link
p           Named pipe file
s           Socket

Unless you are a system administrator, the important types are directories, plain files, or symbolic links (i.e., types d, f, or l).

Using the -type operator, here is another way to list files recursively:

% find . -type f -print | xargs ls -l

It can be difficult to keep track of all the symbolic links in a directory. The next command will find all the symbolic links in your home directory and print the files to which your symbolic links point. $NF gives the last field of each line, which holds the name to which a symlink points. If your find doesn't have a -ls operator, pipe to xargs ls -l as previously.

% find $HOME -type l -ls | awk '{print $NF}'
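Here is a sketch of the xargs ls -l variant on a scratch directory, for finds without -ls. The filenames are hypothetical and mktemp -d is assumed to be available. Like the command above, it relies on the target being the last whitespace-separated field, so it won't cope with targets whose names contain spaces.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch target.txt
ln -s target.txt shortcut

# ls -l prints "shortcut -> target.txt" at the end of the line,
# so the last field ($NF) is the name the link points to.
targets=$(find . -type l -print | xargs ls -l | awk '{print $NF}')
printf '%s\n' "$targets"
cd / && rm -rf "$tmp"
```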

— BB

9.14 Searching for Files by Size

find has several operators that take a decimal integer. One such argument is -size. The number after this argument is the size of the files in disk blocks. Unfortunately, this is a vague number. Earlier versions of Unix used disk blocks of 512 bytes. Newer versions allow larger block sizes, so a "block" of 512 bytes is misleading.

This confusion is aggravated when the command ls -s is used. The -s option supposedly lists the size of the file in blocks. But if your system has a different block size than ls -s has been programmed to assume, it can give a misleading answer. You can put a c after the number and specify the size in bytes. To find a file with exactly 1,234 bytes (as in an ls -l listing), type:

% find . -size 1234c -print

To search for files using a range of file sizes, a minus or plus sign can be specified before the number. The minus sign (-) means less than, and the plus sign (+) means greater than. This next example lists all files that are greater than 10,000 bytes, but less than 32,000 bytes:

% find . -size +10000c -size -32000c -print

When more than one qualifier is given, both must be true.
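To try the byte-count forms on files of known size, you can manufacture some with dd. This is only a sketch: the filenames are made up, and mktemp -d and /dev/zero are assumed to be available.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Create files of exactly 1,234, 5,000, 20,000, and 40,000 bytes.
dd if=/dev/zero of=exact  bs=1234 count=1  2>/dev/null
dd if=/dev/zero of=small  bs=1000 count=5  2>/dev/null
dd if=/dev/zero of=medium bs=1000 count=20 2>/dev/null
dd if=/dev/zero of=large  bs=1000 count=40 2>/dev/null

# An exact size in bytes.
exact_match=$(find . -type f -size 1234c -print)
# A range: over 10,000 bytes AND under 32,000 bytes.
range_match=$(find . -type f -size +10000c -size -32000c -print)

printf 'exact: %s\nrange: %s\n' "$exact_match" "$range_match"
cd / && rm -rf "$tmp"
```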

— BB

9.15 Searching for Files by Permission

find can look for files with specific permissions. It uses an octal number for these permissions. If you aren't comfortable with octal numbers and the way Unix uses them in file permissions, Section 1.17 is good background reading.

The string rw-rw-r-- indicates that you and members of your group have read and write permission, while the world has read-only privilege. The same permissions are expressed as an octal number as 664. To find all *.o files with these permissions, use the following:

% find . -name \*.o -perm 664 -print

To see if you have any directories with write permission for everyone, use this:

% find . -type d -perm 777 -print

The previous examples only match an exact combination of permissions. If you wanted to find all directories with group write permission, you want to match the pattern ----w----. There are several combinations that can match. You could list each combination, but find allows you to specify a pattern that can be bitwise ANDed with the permissions of the file. Simply put a minus sign (-) before the octal value. The group write permission bit is octal 20, so the following negative value:

% find . -perm -20 -print

will match the following common permissions:

Permission   Octal value
rwxrwxrwx    777
rwxrwxr-x    775
rw-rw-rw-    666
rw-rw-r--    664
rw-rw----    660

If you wanted to look for files that the owner can execute (i.e., shell scripts or programs), you want to match the pattern --x------ by typing:

% find . -perm -100 -print

When the -perm argument has a minus sign, all of the permission bits are examined, including the set user ID, set group ID, and sticky bits.
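The difference between an exact mode and a mode with a minus sign shows up clearly on three files with known permissions. This sketch uses hypothetical filenames and assumes mktemp -d is available.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch a b c
chmod 664 a   # rw-rw-r--
chmod 644 b   # rw-r--r--
chmod 660 c   # rw-rw----

# Exact match: only files whose mode is precisely 664.
exact=$(find . -type f -perm 664 -print | LC_ALL=C sort)
# Minus sign: any file with the group-write bit (octal 20) set.
group_w=$(find . -type f -perm -20 -print | LC_ALL=C sort)

printf 'exact:\n%s\ngroup-writable:\n%s\n' "$exact" "$group_w"
cd / && rm -rf "$tmp"
```

The exact form finds only a; the minus form finds a and c, whatever their other permission bits happen to be.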

— BB

9.16 Searching by Owner and Group

Often you need to look for a file belonging to a certain user or group. This is done with the -user and -group search operators. You often need to combine this with a search for particular permissions. To find all files that are set user ID (setuid) root, use this:

% find . -user root -perm -4000 -print

To find all files that are set group ID (setgid) staff, use this:

% find . -group staff -perm -2000 -print

Instead of using a name or group from /etc/passwd or /etc/group, you can use the UID or GID number:

% find . -user 0 -perm -4000 -print
% find . -group 10 -perm -2000 -print

Often, when a user leaves a site, his account is deleted, but his files are still on the computer. Some versions of find have -nouser or -nogroup operators to find files with an unknown user or group ID.
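The same combination of owner and permission tests can be tried safely on a scratch directory. This is only a sketch: the filenames are invented, and it searches for files owned by whoever runs it (by numeric UID) rather than by root.

```shell
# Scratch directory; the names below are invented for the demonstration.
demo=$(mktemp -d)
touch "$demo/plain" "$demo/suid"
chmod 644 "$demo/plain"
chmod 4755 "$demo/suid"        # rwsr-xr-x: setuid bit on

me=$(id -u)                    # numeric UID of whoever runs this

# Files owned by me that have the setuid bit (octal 4000) set.
suid_files=$(cd "$demo" && find . -type f -user "$me" -perm -4000 -print)
echo "$suid_files"

rm -rf "$demo"
```

Only ./suid should be listed; ./plain has the right owner but lacks the setuid bit.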

— BB

9.17 Duplicating a Directory Tree

In many versions of find, the operator {}, used with the -exec operator, only works when it's separated from other arguments by whitespace. So, for example, the following command will not do what you thought it would:

% find . -type d -exec mkdir /usr/project/{} \;

You might have thought this command would make a duplicate set of (empty) directories, from the current directory and down, starting at the directory /usr/project. For instance, when the find command finds the directory ./adir, you would have it execute mkdir /usr/project/./adir (mkdir will ignore the dot; the result is /usr/project/adir).

That doesn't work because those versions of find don't recognize the {} in the pathname. The GNU version does expand {} in the middle of a string. On versions that don't, though, the trick is to pass the directory names to sed, which substitutes in the leading pathname:

% find . -type d -print | sed 's@^@/usr/project/@' | xargs mkdir
% find . -type d -print | sed 's@^@mkdir @' | (cd /usr/project; sh)

Let's start with the first example. Given a list of directory names, sed substitutes the desired path to that directory at the beginning of the line before passing the completed filenames to xargs and mkdir. An @ is used as a sed delimiter because slashes (/) are needed in the actual text of the substitution. If you don't have xargs, try the second example. It uses sed to insert the mkdir command, then it changes to the target directory in a subshell where the mkdir commands will actually be executed.
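Here is a sketch of the first pipeline run against two scratch directories instead of /usr/project. The directory names are invented, and mkdir -p is substituted for plain mkdir so that the entry for . (which already exists in the target) doesn't produce an error. Like the original, it assumes directory names without spaces or newlines, since xargs splits its input at whitespace.

```shell
# Source and target trees; both are throwaway directories for the demo.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/adir/sub" "$src/bdir"
touch "$src/adir/file"          # plain files must not be copied

# List directories relative to $src, prefix the target path, create them.
(cd "$src" && find . -type d -print) | sed "s@^@$dest/@" | xargs mkdir -p

# Show what the target now contains.
copied=$(cd "$dest" && find . -print | LC_ALL=C sort)
echo "$copied"

rm -rf "$src" "$dest"
```

The target ends up with adir, adir/sub, and bdir, all empty.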

— JP

9.18 Using "Fast find" Databases

Berkeley added a handy feature to its find command — if you give it a single argument, it will search a database for file or directory names that match. For example, if you know there's a file named MH.eps somewhere on the computer but you don't know where, type the following:

% find MH.eps
/nutshell/graphics/cover/MH.eps

That syntax can be confusing to new users: you have to give find just one argument. With more arguments, find searches the filesystem directly. Maybe that's one reason that GNU has a "fast find" utility named locate — and its find utility always searches, as described in the rest of this chapter. The GNU slocate command is a security-enhanced version of locate. In the rest of this article, I'll describe locate — but find with a single argument (as shown previously) works about the same way.

The "fast find" database is usually rebuilt every night. So, it's not completely up-to-date, but it's usually close enough. If your system administrator has set this up, the database usually lists all files on the filesystem — although it may not list files in directories that don't have world-access permission. If the database isn't set up at all, you'll get an error like /usr/lib/find/find.codes: No such file or directory. (If that's the case, you can set up a "fast find" database yourself. Set up your own private locate database, or see Section 9.20.)

Unless you use wildcards, locate does a simple string search, like fgrep, through a list of absolute pathnames. Here's an extreme example:

% locate bin
/bin
/bin/ar
   ...
/home/robin
/home/robin/afile
/home/sally/bin
   ...

You can cut down this output by piping it through grep, sed, and so on. But locate and "fast find" also can use wildcards to limit searches. Section 9.19 explains this in more detail.

locate has an advantage over the "fast find" command: you can have multiple file databases and you can search some or all of them. locate and slocate come with a database-building program.

Because locate is so fast, it's worth trying to use whenever you can. Pipe the output to xargs and any other Unix command, or run a shell or Perl script to test its output — almost anything will be faster than running a standard find. For example, if you want a long listing of the files, here are two locate commands to do it:

% ls -l `locate whatever`
% locate whatever | xargs ls -ld

There's one problem with that trick. The locate list may be built by root, which can see all the files on the filesystem; your ls -l command may not be able to access all files in the list. But slocate can be configured not to show you files you don't have permission to see.

The locate database may need to be updated on your machine before you can use locate, if it's not already in the system's normal cron scripts. Use locate.updatedb to do this, and consider having it run weekly or so if you're going to use locate regularly.

— JP

9.19 Wildcards with "Fast find" Database

locate and all the "fast find" commands I've used can match shell wildcards (Section 1.13) (* , ?, [ ]). If you use a wildcard on one end of the pattern, the search pattern is automatically "anchored" to the opposite end of the string (the end where the wildcard isn't). The shell matches filenames in the same way.

The difference between the shell's wildcard matching and locate matching is that the shell treats slashes (/) in a special manner: you have to type them as part of the expression. In locate, a wildcard matches slashes and any other character. When you use a wildcard, be sure to put quotes around the pattern so the shell won't touch it.

Here are some examples:
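As a rough illustration that needs no locate database, the following sketch imitates the rules above with the shell's case statement, in which * also matches slashes. The function name match_paths and the sample pathnames are invented for this sketch, and real locate programs may differ in details.

```shell
# match_paths PATTERN: print the pathnames on stdin that PATTERN selects.
match_paths() {
    pattern=$1
    case $pattern in
        *\**|*\?*|*\[*) ;;             # has a wildcard: must match the whole pathname
        *) pattern="*$pattern*" ;;     # plain string: match it anywhere
    esac
    while IFS= read -r path; do
        # $pattern is left unquoted on purpose so case treats it as a pattern.
        case $path in
            $pattern) printf '%s\n' "$path" ;;
        esac
    done
}

paths='/bin/ar
/home/robin/afile
/home/sally/bin
/usr/bin/gnomon'

printf '%s\n' "$paths" | match_paths bin            # all four contain "bin"
printf '%s\n' "$paths" | match_paths '*bin'         # only /home/sally/bin
printf '%s\n' "$paths" | match_paths '/home/*file'  # only /home/robin/afile
```

Note how '*bin' is anchored to the end of the pathname, and how the * in '/home/*file' matches across the slash in robin/afile.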

— JP

9.20 Finding Files (Much) Faster with a find Database

If you use find to search for files, you know that it can take a long time to work, especially when there are lots of directories to search. Here are some ideas for speeding up your finds.

By design, setups like these that build a file database won't have absolutely up-to-date information about all your files.

If your system has "fast find" or locate, that's probably all you need. It lets you search a list of all pathnames on the system.

Even if you have "fast find" or locate, it still might not do what you need. For example, those utilities only search for pathnames. To find files by the owner's name, the number of links, the size, and so on, you have to use "slow find." In that case — or, when you don't have "fast find" or locate — you may want to set up your own version.

slocate can build and update its own database (with its -u option), as well as search the database. The basic "fast find" has two parts. One part is a command, a shell script usually named updatedb or locate.updatedb, that builds a database of the files on your system — if your system has it, take a look to see a fancy way to build the database. The other part is the find or locate command itself — it searches the database for pathnames that match the name (regular expression) you type.

To make your own "fast find":
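A minimal sketch of such a setup follows. It assumes the database is a plain list of pathnames kept in $HOME/.fastfind (the name used in the later examples) and that ffind is a thin wrapper around an egrep-style search; the function name build_fastfind is invented here, and both pieces are written as shell functions rather than separate script files.

```shell
# build_fastfind: (re)create the pathname list for your home directory.
build_fastfind() {
    # Store names relative to $HOME. Write to a temporary name first so a
    # search never reads a half-built list.
    (cd "$HOME" && find . -print | sed 's@^\./@@' > .fastfind.new &&
        mv -f .fastfind.new .fastfind)
}

# ffind REGEXP: search the list and print absolute pathnames.
ffind() {
    grep -E -- "$1" "$HOME/.fastfind" | sed "s@^@$HOME/@"
}
```

Run build_fastfind once by hand and then, for instance, nightly from cron, so the list stays reasonably current.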

To search the database, type:

% ffind somefile
/usr/freddie/lib/somefile
% ffind '/(sep|oct)[^/]*$'
/usr/freddie/misc/project/september
/usr/freddie/misc/project/october

You can do much more: I'll get you started. If you have room to store more information than just pathnames, you can feed your find output to a command like ls -l. For example, if you do a lot of work with links, you might want to keep the files' i-numbers as well as their names. You'd build your database with a command like this:

% cd
% find . -print | xargs ls -id > .fastfind.new
% mv -f .fastfind.new .fastfind

Or, if your version of find has the handy -ls operator, use the next script. Watch out for really large i-numbers; they might shift the columns and make cut give wrong output. The exact column numbers will depend on your system:

% cd
% find . -ls | cut -c1-7,67- > .fastfind.new
% mv -f .fastfind.new .fastfind

Then, your ffind script could search for files by i-number. For instance, if you had a file with i-number 1234 and you wanted to find all its links:

% ffind "^1234 "

The space at the end of that regular expression prevents matches with i-numbers like 12345. You could search by pathname in the same way. To get a bit fancier, you could make your ffind a little perl or awk script that searches your database by field. For instance, here's how to make awk do the previous i-number search; the output is just the matching pathnames:

awk '$1 == 1234 {print $2}' $HOME/.fastfind

With some information about Unix shell programming and utilities like awk, the techniques in this article should let you build and search a sophisticated file database — and get information much faster than with plain old find.

— JP

9.21 grepping a Directory Tree

Want to search every file, in some directory and all its subdirectories, to find the file that has a particular word or string in it? That's a job for find and one of the grep commands.

For example, to search all the files for lines starting with a number and containing the words "SALE PRICE," you could use:

% egrep '^[0-9].*SALE PRICE' `find . -type f -print`
./archive/ad.1290: 1.99 a special SALE PRICE
./archive/ad.0191: 2.49 a special SALE PRICE

Using the backquotes (``) might not work. If find finds too many files, egrep's command-line arguments can get too long. Using xargs can solve that; it splits long sets of arguments into smaller chunks. There's a problem with that: if the last "chunk" has just one filename and the grep command finds a match there, grep won't print the filename:

% find . -type f -print | xargs fgrep '$12.99'
./old_sales/ad.0489: Get it for only $12.99!
./old_sales/ad.0589: Last chance at $12.99, this month!
Get it for only $12.99 today.

The answer is to add the Unix "empty file," /dev/null. It's a filename that's guaranteed never to match but always to leave fgrep with at least two filenames:

% find . -type f -print | xargs fgrep '$12.99' /dev/null

Then xargs will run commands like these:

fgrep '$12.99' /dev/null ./afile ./bfile ...
fgrep '$12.99' /dev/null ./archives/ad.0190 ./archives/ad.0290 ...
fgrep '$12.99' /dev/null ./old_sales/ad.1289

That trick is also good when you use a wildcard (Section 28.3) and only one file might match it. grep won't always print the file's name unless you add /dev/null:

% grep "whatever" /dev/null /x/y/z/a*
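The effect is easy to reproduce with a single file in a scratch directory. In this sketch the filename is invented, and grep -F stands in for fgrep, which some newer systems flag as obsolete.

```shell
# One file only, so xargs hands grep a single filename.
demo=$(mktemp -d)
printf 'Get it for only $12.99 today.\n' > "$demo/ad.0689"

# Without /dev/null, grep sees one file and omits its name.
without=$(cd "$demo" && find . -type f -print | xargs grep -F '$12.99')

# With /dev/null, grep always sees at least two files and prints the name.
with=$(cd "$demo" && find . -type f -print | xargs grep -F '$12.99' /dev/null)

echo "$without"
echo "$with"
rm -rf "$demo"
```

The first line of output is the bare matching text; the second has ./ad.0689: in front of it.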

— JP

9.22 lookfor: Which File Has That Word?

The following simple shell script, lookfor, uses find to look for all files in the specified directory hierarchy that have been modified within a certain time, and it passes the resulting names to grep to scan for a particular pattern. For example, the command:

% lookfor /work -7 tamale enchilada

would search through the entire /work filesystem and print the names of all files modified within the past week that contain the words "tamale" or "enchilada." (For example, if this article is stored in /work, lookfor should find it.)

The arguments to the script are the pathname of a directory hierarchy to search in ($1), a time ($2), and one or more text patterns (the other arguments). This simple but slow version will search for an (almost) unlimited number of words:

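#!/bin/sh
# lookfor: grep recently modified files, one pass per word
temp=/tmp/lookfor$$
trap 'rm -f "$temp"; exit' 0 1 2 15
# $1 is the directory to search, $2 the -mtime argument (such as -7)
find "$1" -mtime "$2" -print > "$temp"
shift; shift
# The remaining arguments are the words to search for
for word
do grep -i "$word" `cat "$temp"` /dev/null
done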

That version runs grep once to search for each word. The -i option makes the search find either upper- or lowercase letters. Using /dev/null makes sure that grep will print the filename. Watch out, though: the list of filenames may get too long.

The next version is more limited but faster. It builds a regular expression for egrep that finds all the words in one pass through the files. If you use too many words, egrep will say Regular expression too long. Also, your egrep may not have a -i option; you can just omit it. This version also uses xargs, though xargs has its problems.

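#!/bin/sh
where="$1"
when="$2"
shift; shift
# Build egrep expression like (word1|word2|...) in $expr
expr=""    # start empty, in case expr is already set in the environment
for word
do
    case "$expr" in
    "") expr="($word" ;;
    *) expr="$expr|$word" ;;
    esac
done
expr="$expr)"

# One pass through the files; /dev/null makes egrep print filenames
find "$where" -mtime "$when" -print | xargs egrep -i "$expr" /dev/null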

—JP and TOR

9.23 Using Shell Arrays to Browse Directories

Even a graphical file manager might not be enough to help you step through a complicated directory tree with multiple layers of subdirectories. Which directories have you visited so far, and which are left to go? This article shows a simple way, using shell arrays, to step through a tree directory-by-directory. The technique is also good for stepping through lists of files — or almost any collection of things, over a period of time — of which you don't want to miss any. At the end are a couple of related tips on using arrays.

9.23.1 Using the Stored Lists

Let's start with a quick overview of expanding array values; then we'll look at specifics for each shell. A dollar sign ($) before the name of a shell variable gives you its value. In the C shells and zsh, that gives all members of an array. But, in the Korn shell and bash2, expanding an array value without the index gives just the first member. To pick out a particular member, put its number in square brackets after the name; in ksh and bash2, you also need to use curly braces ({}). A hash mark (#) gives the number of members. Finally, you can use range operators to choose several members of an array.

Here's a practical example that you might use, interactively, at a shell prompt. You're cleaning your home directory tree. You store all the directory names in an array named d. When you've cleaned one directory, you go to the next one. This way, you don't miss any directories. (To keep this simple, I'll show an example with just four directories.)

If you don't want to use shell commands to browse the directories, you could use a command to launch a graphical file browser on each directory in the array. For instance, make the nextdir alias launch Midnight Commander with mc $d[1].

Let's start with the C shell:

% set d=(`find $home -type d -print`)
% echo $#d directories to search: $d
4 directories to search: /u/ann /u/ann/bin /u/ann/src /u/ann/lib
% alias nextdir 'shift d; cd $d[1]; pwd; ls -l'
% cd $d[1]
   ...clean up first directory...
% nextdir
/u/ann/bin
total 1940
lrwxrwxrwx    1 ann    users      14 Feb  7  2002 ] -> /usr/ucb/reset
-r-xr-xr-x    1 ann    users    1134 Aug 23  2001 addup
   ...clean up bin directory...
% nextdir
/u/ann/src
   ...do other directories, one by one...
% nextdir
d: Subscript out of range.

You store the array, list the number of directories, and show their names. You then create a nextdir alias that changes to the next directory to clean. First, use the C shell's shift command; it "throws away" the first member of an array so that the second member becomes the first member, and so on. Next, nextdir changes the current directory to the next member of the array and lists it. (Note that members of a C shell array are indexed starting at 1 — unlike the C language, which the C shell emulates, where indexes start at 0. So the alias uses cd $d[1].) At the end of our example, when there's not another array member to shift away, the command cd $d[1] fails; the rest of the nextdir alias isn't executed.

Bourne-type shells have a different array syntax than the C shell. They don't have a shift command for arrays, so we'll use a variable named n to hold the array index. Instead of aliases, let's use a more powerful shell function. We'll show ksh and bash2 arrays, which are indexed starting at 0. (By default, the first zsh array member is number 1.) The first command that follows, to store the array, is different in ksh and bash2 — but the rest of the example is the same on both shells.

bash2$ d=(`find $HOME -type d -print`)
ksh$ set -A d `find $HOME -type d -print`
  
$ echo ${#d[*]} directories to search: ${d[*]}
4 directories to search: /u/ann /u/ann/bin /u/ann/src /u/ann/lib
$ n=0
$ nextdir( ) {
>   if [ $((n += 1)) -lt ${#d[*]} ]
>   then cd ${d[$n]}; pwd; ls -l
>   else echo no more directories
>   fi
> }
$ cd ${d[0]}
   ...clean up first directory...
$ nextdir
/u/ann/bin
total 1940
lrwxrwxrwx    1 ann    users      14 Feb  7  2002 ] -> /usr/ucb/reset
-r-xr-xr-x    1 ann    users    1134 Aug 23  2001 addup
   ...do directories, as in C shell example...
$ nextdir
no more directories

If you aren't a programmer, this may look intimidating — like something you'd never type interactively at a shell prompt. But this sort of thing starts to happen — without planning, on the spur of the moment — as you learn more about Unix and what the shell can do.

9.23.2 Expanding Ranges

We don't use quite all the array-expanding operators in the previous examples, so here's a quick overview of the rest. To expand a range of members in the C shell, give the first and last indexes with a dash (-) between them: $arrname[2-4] expands the second, third, and fourth members. In zsh, use a comma (,) instead, as in ${arrname[2,4]}; remember that the first zsh array member is number 1 by default. In bash (and in ksh93), the subscript is an arithmetic expression, so a dash there means subtraction; use the slice form ${arrname[@]:1:3} instead, which starts at index 1 and takes three members. If the end of a range is omitted (like $arrname[2-] in csh, or ${arrname[@]:1} in bash), this gives you all members from that point through the last.

Finally, in all shells except zsh, remember that expanded values are split into words at space characters. So if members of an array have spaces in their values, be careful to quote them. For instance, Unix directory names can have spaces in them, so we really should have used cd "$d[1]" in the nextdir alias and cd "${d[$n]}" in the nextdir function.[2] If we hadn't done this, the cd command could have gotten multiple argument words. But it would only pay attention to the first argument, so it would probably fail.

To expand a range of members safely in bash2 and ksh, quote the expansion. You can always spell the members out, as in "${foo[1]}" "${foo[2]}" "${foo[3]}"; in bash, the quoted slice "${foo[@]:1:3}" does the same job. The C shell has a :q string modifier that says "quote each word," so in csh you can safely use $foo[1-3]:q. It's hard to quote array values one by one, though, if you don't know ahead of time how many there are! So, using ${foo[*]} to give all members of the foo array suffers from word-splitting in ksh and bash2 (but not in zsh, by default). In ksh and bash2, though, you can use "${foo[@]}", which expands into a quoted list of the members; each member isn't split into separate words. In csh, $foo[*]:q does the trick.
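The word-splitting difference is easy to check by counting arguments. The sketch below uses bash array syntax; the directory names and the count_args helper are invented for the illustration, and one of the names contains a space on purpose.

```shell
# bash syntax; the directory names are made up, and one contains a space.
d=("/u/ann" "/u/ann/my docs" "/u/ann/src" "/u/ann/lib")

count_args() { echo "$#"; }    # report how many words it received

unquoted=$(count_args ${d[*]})        # split at the space: 5 words
quoted=$(count_args "${d[@]}")        # one word per member: 4
slice=$(count_args "${d[@]:1:2}")     # members 1 and 2 only: 2

echo "$unquoted $quoted $slice"
```

With four members, the unquoted expansion delivers five words because "my docs" is split in two; the quoted forms keep each member whole.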

— JP

9.24 Finding the (Hard) Links to a File

Here is how to find hard links, as well as a brief look at the Unix filesystem from the user's viewpoint. Suppose you are given the following:

% ls -li /usr/bin/at
8041 -r-sr-xr-x  4 root  wheel  19540 Apr 21  2001 /usr/bin/at*

In other words, there are four links, and /usr/bin/at is one of four names for inode 8041. You can find the full names of the other three links by using find. However, just knowing the inode number does not tell you everything. In particular, inode numbers are only unique to a given filesystem. If you do a find / -inum 8041 -print, you may find more than four files, if inode 8041 is also on another filesystem. So how do you tell which ones refer to the same file as /usr/bin/at?

The simplest way is to figure out the filesystem on which /usr/bin/at lives by using df:

% df /usr/bin/at
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1f    3360437  1644024  1447579    53%    /usr

Then start your find at the top of that filesystem, and use -xdev to tell it not to search into other filesystems:

% find /usr -xdev -inum 8041 -print
/usr/bin/at
/usr/bin/atq
/usr/bin/atrm
/usr/bin/batch

Some manpages list -x as an alternative to -xdev; -xdev is generally more portable.
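To try the technique without touching system files, the sketch below makes three links to one file in a scratch directory and then finds them by inode number. The filenames are invented; the inode number comes from ls -i, as in the listing above.

```shell
# One file with three names, in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo data > "$demo/at"
ln "$demo/at" "$demo/atq"
ln "$demo/at" "$demo/sub/batch"

# The first field of ls -i output is the inode number.
inum=$(ls -i "$demo/at" | awk '{print $1}')

# Search one filesystem only for every name that shares the inode.
links=$(cd "$demo" && find . -xdev -inum "$inum" -print | LC_ALL=C sort)
echo "$links"

rm -rf "$demo"
```

All three names should be listed, including the one in the subdirectory.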

—DJPH and CT

9.25 Finding Files with -prune

find has lots of operators for finding some particular kinds of files. But find won't stop at your current directory — if there are subdirectories, it looks there too. How can you tell it "only the current directory"? Use -prune.

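Many finds (including the GNU and BSD versions) also have a -maxdepth option that gives the maximum number of directory levels to descend. For example, find . -maxdepth 1 examines the current directory and the entries in it but goes no deeper; -maxdepth 0 applies the tests only to the starting point itself (here, .).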

-prune cuts short find's search at the current pathname. So, if the current pathname is a directory, find won't descend into that directory for any further searches. The command line looks kind of hairy. Here's one to find all files modified in the last 24 hours from the current directory:

% date
Tue Feb 12 19:09:35 MST 2002
% ls -l
total 0
drwxr-xr-x  1 deb  deb  0 Feb 12 12:11 adir
-rw-r--r--  1 deb  deb  0 Feb 12 19:08 afile
-rw-r--r--  1 deb  deb  0 Jan 10 10:37 bfile
-rw-r--r--  1 deb  deb  0 Feb 11 22:43 cfile
% find . \( -type d ! -name . -prune \) -o \( -mtime -1 -print \)
./afile
./cfile

Let's try to understand this command: once you see the pattern, you'll understand some important things about find that many people don't. Let's follow find as it looks at a few pathnames.

find looks at each entry, one by one, in the current directory (.). For each entry, find tries to match the expression from left to right. As soon as some parenthesized part matches, it ignores the rest (if any) of the expression.[3]

When find is looking at the file named ./afile, the first part of the expression, ( -type d ! -name . -prune ), doesn't match (./afile isn't a directory). So find doesn't prune. It tries the other part, after the -o (or):

Has ./afile been modified in the last day? In this (imaginary) case, it has — so the -print (which is always true) prints the pathname.

Next, ./bfile: like the previous step, the first part of the expression won't match. In the second part, ( -mtime -1 -print ), the file's modification time is more than one day ago. So the -mtime -1 part of the expression is false; find doesn't bother with the -print operator.

Finally, let's look at ./adir, a directory: the first part of the expression, ( -type d ! -name . -prune ), matches. That's because ./adir is a directory (-type d) and its name is not . (! -name .). So -prune, which is always true, makes this part of the expression true. find skips ./adir (because -prune prunes the search tree at the current pathname). Note that if we didn't use ! -name ., then the current directory would match immediately and not be searched, and we wouldn't find anything at all.
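The walk-through can be replayed on a scratch directory. This sketch uses invented file dates and adds -type f to the second part of the expression so that the starting directory itself (which was also modified recently) isn't printed. The -maxdepth line assumes a find that has that option.

```shell
demo=$(mktemp -d)
mkdir "$demo/adir"
touch "$demo/afile" "$demo/adir/inner"     # both modified just now
touch -t 200001010000 "$demo/bfile"        # modified long ago

# Prune every directory except "."; print recent plain files that remain.
pruned=$(cd "$demo" &&
    find . \( -type d ! -name . -prune \) -o \( -type f -mtime -1 -print \))

# Where -maxdepth exists, this should list the same files.
shallow=$(cd "$demo" && find . -maxdepth 1 -type f -mtime -1 -print)

echo "$pruned"
rm -rf "$demo"
```

Only ./afile is listed: ./bfile is too old, and ./adir/inner is never visited even though it was just modified.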

Section 9.26 shows handy aliases that use -prune.

— JP

9.26 Quick finds in the Current Directory

find -prune prunes find's search tree at the current pathname. Here are a couple of aliases that use -prune to search for files in the current directory. The first one, named find. (with a dot on the end of its name, to remind you of ., the relative pathname for the current directory), simply prints names with -print. The second alias gives a listing like ls -gilds. You can add other find operators to the command lines to narrow your selection of files. The aliases work like this:

% find. -mtime -1
./afile
./cfile
% find.ls -mtime -1
43073   0 -r--------  1 jerry    ora        0 Mar 27 18:16 ./afile
43139   2 -r--r--r--  1 jerry    ora     1025 Mar 24 02:33 ./cfile

The find. alias is handy inside backquotes, feeding a pipe, and other places you need a list of filenames. The second one, find.ls, uses -ls instead of -print:

alias find. 'find . \( -type d ! -name . -prune \) -o \( \!* -print \)'
alias find.ls 'find . \( -type d ! -name . -prune \) -o \( \!* -ls \)'

If you don't want the ./ at the start of each name, add a pipe through cut -c3- or cut -d'/' -f2- to the end of the alias definition.
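The aliases above use C shell syntax. In a Bourne-type shell, a function can play the same role; the following is a sketch, and the name findhere is invented (sh function names can't portably end with a dot).

```shell
# findhere: hypothetical sh-function counterpart of the find. alias.
# Any arguments are inserted as extra find operators before -print.
findhere() {
    find . \( -type d ! -name . -prune \) -o \( "$@" -print \)
}
```

For example, findhere -type f lists only the plain files in the current directory, and findhere -type f | cut -c3- strips the leading ./ from each name.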

— JP

9.27 Skipping Parts of a Tree in find

Q: I want to run find across a directory tree, skipping standard directories like /usr/spool and /usr/local/bin. A -name dirname -prune clause won't do it because -name doesn't match the whole pathname — just each part of it, such as spool or local. How can I make find match the whole pathname, like /usr/local/bin/, instead of all directories named bin?

A: It cannot be done directly. You can do this:

% find /path -exec test {} = /foo/bar -o {} = /foo/baz \; -prune -o pred

This will not perform pred on /foo/bar and /foo/baz; if you want them done, but not any files within them, try:

% find /path \( -exec test test-exprs \; ! -prune \) -o pred

The second version is worth close study, keeping the manual for find at hand for reference. It shows a great deal about how find works.

The -prune operator simply says "do not search the current path any deeper" and then succeeds a la -print.

Q: I only want a list of pathnames; the pred I use in your earlier answer will be just -print. I think I could solve my particular problem by piping the find output through a sed or egrep -v filter that deletes the pathnames I don't want to see.

A: That would probably be fastest. Using test runs the test program for each file name, which is quite slow. Take a peek at locate, described in Section 9.18.
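Two ways of leaving out whole pathnames can be compared on a scratch tree. The first relies on a -path operator, which matches the entire pathname; GNU and BSD finds have it, but treat its availability on your system as an assumption. The second is the filter approach from the question. All directory names here are invented.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/spool" "$demo/local/bin" "$demo/local/lib" "$demo/bin"
touch "$demo/spool/x" "$demo/local/bin/y" "$demo/local/lib/z" "$demo/bin/w"

# Prune by whole pathname: ./bin survives, ./local/bin does not.
by_path=$(cd "$demo" &&
    find . \( -path ./spool -o -path ./local/bin \) -prune -o -print |
    LC_ALL=C sort)

# Fallback: search everything, then filter the list of pathnames.
by_filter=$(cd "$demo" &&
    find . -print | grep -E -v '^\./(spool|local/bin)(/|$)' |
    LC_ALL=C sort)

echo "$by_path"
rm -rf "$demo"
```

Both lists should be identical. The pruning version never descends into the skipped directories, which matters when they are large or slow to reach.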

There's more about complex find expressions in other articles, especially Section 9.6 and Section 9.12.

—CT and JP

9.28 Keeping find from Searching Networked Filesystem

The most painful aspect of a large NFS environment is avoiding the access of files on NFS servers that are down. find is particularly sensitive to this because it is very easy to access dozens of machines with a single command. If find tries to explore a file server that happens to be down, it will time out. It is important to understand how to prevent find from going too far.

To do this, use -xdev or -prune with -fstype, though, unfortunately, not all finds have all of these. -fstype tests for filesystem types and expects an argument like nfs, ufs, cd9660, or ext2fs. To limit find to files only on a local disk or disks, use the clause -fstype nfs -prune, or, if your find supports it, -fstype local.

To limit the search to one particular disk partition, use -xdev. For example, if you need to clear out a congested disk partition, you could look for all files bigger than 10 MB (10*1024*1024) on the disk partition containing /usr, using this command:

% find /usr -size +10485760c -xdev -print
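The byte count can be left to the shell's arithmetic, and the command shape tested on a small scale. In this sketch the filenames are invented and a 1000-byte threshold stands in for the 10 MB limit so the demonstration stays tiny.

```shell
limit=$((10 * 1024 * 1024))    # 10 MB expressed in bytes
echo "$limit"

demo=$(mktemp -d)
printf '%2000s' '' > "$demo/big"      # 2000 bytes of spaces
printf 'tiny' > "$demo/small"         # 4 bytes

# Same shape as the command above, with a 1000-byte threshold for the demo.
found=$(cd "$demo" && find . -xdev -type f -size +1000c -print)
echo "$found"
rm -rf "$demo"
```

The first echo prints 10485760, the figure used in the command above; the search reports only ./big.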

— BB

[1]  Very old versions of find have trouble with using multiple -newer expressions in one command. If find doesn't find files that it should, try using multiple explicit -mtime expressions instead. They're not as precise, but they will work even on finds with buggy -newer handling.

[2]  We didn't do so because the syntax was already messy enough for people getting started.

[3]  That's because if one part of an OR expression is true, you don't need to check the rest. This so-called "short-circuit" logical evaluation by find is important to understanding its expressions.
