Chapter 14. Removing Files

14.1 The Cycle of Creation and Destruction

As a computer user, you spend lots of time creating files. Just as the necessary counterpart of life is death, the other side of file creation is deletion. If you never delete any files, you soon have a computer's equivalent of a population explosion: your disks get full, and you must either spend money (buy and install more disk drives) or figure out which files you don't really need.

In this chapter, we'll talk about ways to get rid of files: how to do it safely, how to get rid of files that don't want to die, and how to find "stale" files (unused files that have been around for a long time). "Safe" deletion is a particularly interesting topic, because Unix's rm command is extreme: once you delete a file, it's gone permanently. There are several ways to work around this problem, letting you (possibly) reclaim files from the dead.

— ML

14.2 How Unix Keeps Track of Files: Inodes

The ability to mumble about inodes is the key to social success at a Unix gurus' cocktail party. This may not seem attractive to you, but sooner or later you will need to know what an inode is.

Seriously, inodes are an important part of the Unix filesystem. You don't need to worry about them most of the time, but it does help to know what they are.

An inode is a data structure on the disk that describes a file. It holds most of the important information about the file, including the on-disk address of the file's data blocks (the part of the file that you care about). Each inode has its own identifying number, called an i-number.

You really don't care about where a file is physically located on a disk. You usually don't care about the i-number — unless you're trying to find the links (Section 9.24, Section 10.3) to a file. But you do care about the following information, all of which is stored in a file's inode:

  • The file's ownership: the user and the group that own the file.

  • The file's access mode (Section 1.17, Section 50.2): whether various users and groups are allowed to read, write, or execute the file.

  • The file's timestamps (Section 8.2): when the file itself was last modified, when the file was last accessed, and when the inode was last modified.

  • The file's type: whether the file is a regular file, a special file, or some other kind of abstraction masquerading (Section 1.19) as a file.

On traditional Unix filesystems, a fixed number of inodes is created when the filesystem itself is created (usually when the disk is first initialized). That number is therefore the maximum number of files the filesystem can hold, and it can't be changed without reinitializing the filesystem, which destroys all the data on it. It is possible, though rare, for a filesystem to run out of inodes just as it can run out of storage space; this tends to happen on filesystems that hold many, many small files.

The ls -l (Section 50.2) command shows much of this information. The ls -i option (Section 10.4) shows a file's i-number. The stat command lists almost everything in an inode.
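The i-number is easy to see for yourself. This sketch works in a throwaway directory created with mktemp -d (common, though not required by POSIX), and the filenames are made up. It gives one file a second name with ln, then uses ls -i to show that both names carry the same i-number, because both directory entries point at the same inode.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch original            # a new inode plus one directory entry
ln original second-name   # a second directory entry for the same inode

# ls -i prints the i-number in the first column.
ls -i original second-name

ino1=$(ls -i original | awk '{print $1}')
ino2=$(ls -i second-name | awk '{print $1}')
echo "i-numbers: $ino1 $ino2"
```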

— ML

14.3 rm and Its Dangers

Under Unix, you use the rm command to delete files. The command is simple enough; you just type rm followed by a list of files. If anything, rm is too simple. It's easy to delete more than you want, and once something is gone, it's permanently gone. There are a few hacks that make rm somewhat safer, and we'll get to those momentarily. But first, here's a quick look at some of the dangers.

To understand why it's impossible to reclaim deleted files, you need to know a bit about how the Unix filesystem works. The system contains a "free list," which is a list of disk blocks that aren't used. When you delete a file, its directory entry (which gives it its name) is removed. If there are no more links (Section 10.3) to the file (i.e., if the file only had one name), its inode (Section 14.2) is added to the list of free inodes, and its data blocks are added to the free list.

Well, why can't you get the file back from the free list? After all, there are DOS utilities that can reclaim deleted files by doing something similar. Remember, though, Unix is a multitasking operating system. Even if you think your system is a single-user system, there are a lot of things going on "behind your back": daemons are writing to log files, handling network connections, processing electronic mail, and so on. You could theoretically reclaim a file if you could "freeze" the filesystem the instant your file was deleted — but that's not possible. With Unix, everything is always active. By the time you realize you made a mistake, your file's data blocks may well have been reused for something else.

When you're deleting files, it's important to use wildcards carefully. Simple typing errors can have disastrous consequences. Let's say you want to delete all your object (.o) files. You want to type:

% rm *.o

But because of a nervous twitch, you add an extra space and type:

% rm * .o

It looks right, and you might not even notice the error. But before you know it, all the files in the current directory will be gone, irretrievably.

If you don't think this can happen to you, here's something that actually did happen to me. At one point, when I was a relatively new Unix user, I was working on my company's business plan. To be "secure," the executives had set the business plan's permissions so that you had to be root (Section 1.18) to modify it. (A mistake in its own right, but that's another story.) I was using a terminal I wasn't familiar with and accidentally created a bunch of files with four control characters at the beginning of their names. To get rid of these, I typed (as root):

# rm ????*

This command took a long time to execute. When about two-thirds of the directory was gone, I realized (with horror) what was happening: I was deleting all files with four or more characters in the filename.

The story got worse. They hadn't made a backup in about five months. (By the way, this article should give you plenty of reasons for making regular backups (Section 38.3).) By the time I had restored the files I had deleted (a several-hour process in itself; this was on an ancient version of Unix with a horrible backup utility) and checked (by hand) all the files against our printed copy of the business plan, I had resolved to be very careful with my rm commands.

[Some shells have safeguards that work against Mike's first disastrous example — but not the second one. Automatic safeguards like these can become a crutch, though . . . when you use another shell temporarily and don't have them, or when you type an expression like Mike's very destructive second example. I agree with his simple advice: check your rm commands carefully! — JP]
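One cheap way to check an rm command before running it is to put echo in front: the shell expands the wildcards exactly as it would for rm, but echo only prints the result. This sketch (throwaway directory, made-up filenames) compares the intended command with the mistyped one from the example above.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch main.o util.o main.c notes    # made-up sample files

# What you meant: only the object files.
intended=$(echo rm *.o)
# The typo: "*" matches every file, and ".o" becomes a separate argument.
typo=$(echo rm * .o)

echo "$intended"
echo "$typo"
```

At a real prompt, echo rm * .o would print every name in the directory followed by .o, which is exactly the warning you want to see before you drop the echo.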

— ML

14.4 Tricks for Making rm Safer

Summary Box

Here's a summary of ways to protect yourself from accidentally deleting files:

  • Use rm -i, possibly as an alias (Section 14.8).

  • Make rm -i less painful (Section 14.7).

  • Write a "delete" script that moves "deleted" files to a temporary directory (Section 14.9).

  • tcsh has an rmstar variable that makes the shell ask for confirmation when you type something like rm *. In zsh, this protection is automatic unless you set the RM_STAR_SILENT shell option to stop it.

  • Use revision control (Section 39.4).

  • Make your own backups, as explained in Section 38.3.

  • Prevent deletion (or renaming or creating) of files by making the directory (not necessarily the files in it!) unwritable (Section 50.2).

If you want to delete with wild abandon, use rm -f (Section 14.10).

— ML

14.5 Answer "Yes" or "No" Forever with yes

Some commands — like rm -i, find -ok, and so on — ask users to answer a "do it or not?" question from the keyboard. For example, you might have a file-deleting program or alias named del that asks before deleting each file:

% del *
Remove file1? y
Remove file2? y
   ...

If you answer y, then the file will be deleted.

What if you want to run a command that will ask you 200 questions and you want to answer y to all of them, but you don't want to type all those ys from the keyboard? Pipe the output of yes to the command; it will answer y for you:

% yes | del *
Remove file1?
Remove file2?
   ...

If you want to answer n to all the questions, you can do:

% yes n | del *

Not all Unix commands read their standard input for answers to prompts. If a command opens your terminal (/dev/tty (Section 36.15)) directly to read your answer, yes won't work. Try expect (Section 28.18) instead.
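Since rm -i reads its answers from standard input, yes can drive it. Here is a sketch in a throwaway directory with made-up filenames; rm writes its prompts to standard error, which the sketch discards to keep the output short.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch file1 file2 file3

# Answer "n" to every prompt: nothing is removed.
yes n | rm -i file1 file2 file3 2>/dev/null
left_after_no=$(ls | wc -l)

# Answer "y" to every prompt: everything is removed.
yes | rm -i file1 file2 file3 2>/dev/null
left_after_yes=$(ls | wc -l)

echo "after yes n: $left_after_no file(s); after yes: $left_after_yes file(s)"
```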

— JP

14.6 Remove Some, Leave Some

Most people use rm -i for safety: so they're always asked for confirmation before removing a particular file. Mike Loukides told me about another way he uses rm -i. When he has several files to remove, but the wildcards (Section 1.13) would be too painful to type with a plain rm, Mike gives rm -i a bigger list of filenames and answers "n" to filenames he doesn't want deleted. For instance:

% ls
aberrant    abhorred    abnormal     abominate   acerbic
aberrate    abhorrent   abominable   absurd      acrimonious
    ...
% rm -i ab*
rm: remove aberrant (y/n)? y
rm: remove aberrate (y/n)? n
rm: remove abhorred (y/n)? y
rm: remove abhorrent (y/n)? n
    ...

— JP

14.7 A Faster Way to Remove Files Interactively

The rm -i command asks you about each file, separately. The method in this article can give you the safety without the hassle of typing y as much.

Another approach, which I recommend, is to create a new script or alias and use it whenever you delete files. Call it del or Rm, for instance. This way, if you ever execute your special delete command when it doesn't exist, no harm is done; you just get an error. If you get into this habit, you can start making your delete script smarter. Here is one that asks you about each file if there are three or fewer files specified. For more than three files, it displays them all and asks you once if you wish to delete them all:

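#!/bin/sh
# del: ask about each file if there are three or fewer;
# otherwise list them all and ask just once.
case $# in
0)     echo "`basename $0`: you didn't say which file(s) to delete" >&2; exit 1;;
[123]) /bin/rm -i -- "$@" ;;     # -- protects names that start with a dash
*)     echo "$*"
       echo do you want to delete these files\?
       read a
       case "$a" in
       [yY]*) /bin/rm -- "$@" ;;
       esac
       ;;
esac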

— BB

14.8 Safer File Deletion in Some Directories

Using noclobber (Section 43.6) and read-only files only protects you from a few occasional mistakes. A potentially catastrophic error is typing:

% rm * .o

instead of:

% rm *.o

In the blink of an eye, all of your files would be gone. A simple, yet effective, preventive measure is to create a file called -i in the particular directory in which you want extra protection:

(For the ./ prefix, see Section 14.13.)

% touch ./-i

In this case, the * is expanded to match all of the filenames in the directory. In ASCII order, the file -i sorts before any file except those whose names start with a space, a control character, or one of the characters !, ", #, $, %, &, ', (, ), *, +, or , (comma). So rm sees -i as its first command-line argument and treats it as its -i option: files will not be deleted unless you verify the action. This still isn't perfect, though. If you have a file that starts with a comma (,) in the directory, it will come before the file starting with a dash, and rm will not get the -i argument first.

The -i file also won't save you from errors like this:

% rm [a-z]* .o
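You can watch this happen by letting echo stand in for rm. The sketch below sets the C locale explicitly, because the ordering described here is ASCII order; in other locales a shell may collate filenames differently (some ignore punctuation when sorting), which can move -i away from the front. The filenames are made up, and the directory is a throwaway.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL         # assume plain ASCII (C locale) ordering
dir=$(mktemp -d) || exit 1      # mktemp -d: widely available, not POSIX
cd "$dir" || exit 1
touch ./-i apple banana ,comma  # made-up sample names

# Let echo show what rm would receive.
all=$(echo *)          # ",comma" sorts ahead of "-i" in ASCII
lower=$(echo [a-z]*)   # "-i" isn't matched at all, so it gives no protection
echo "rm * .o      -> rm $all .o"
echo "rm [a-z]* .o -> rm $lower .o"
```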

If lots of users each make a -i file in each of their zillions of subdirectories, that could waste a lot of disk inodes (Section 14.2). It might be better to make one -i file in your home directory and hard link (Section 15.4) the rest to it, like this:

(For ~, see Section 30.11.)

% cd
% touch ./-i
% cd somedir
% ln ~/-i .
   ...

Also, to save disk blocks, make sure the -i file is zero-length: use the touch command, not vi or some other command that puts characters in the file.

— BB

14.9 Safe Delete: Pros and Cons

To protect themselves from accidentally deleting files, some users create a "trash" directory somewhere and then write a "safe delete" program that, instead of rming a file, moves it into the trash directory. The implementation can be quite complex, but a simple alias or shell function will do most of what you want:

alias del "mv \!* ~/trash/."

Or, for Bourne-type shells:

del ( ) { mv "$@" "$HOME/trash/."; }

Of course, now your deleted files collect in your trash directory, so you have to clean that out from time to time. You can do this either by hand or automatically, via a cron (Section 25.2) entry like this:

(For &&, see Section 35.14; for -r, see Section 14.16.)

23 2 * * * cd $HOME/trash && rm -rf *

This deletes everything in the trash directory at 2:23 a.m. daily. To restore a file that you deleted, you have to look through your trash directory by hand and put the file back in the right place. That may not be much more pleasant than poking through your garbage to find the tax return you threw out by mistake, but (hopefully) you don't make lots of mistakes.

There are plenty of problems with this approach. Obviously, if you delete two files with the same name in the same day, you're going to lose one of them. A shell script could (presumably) handle this problem, though you'd have to generate a new name for the deleted file. There are also lots of nasty side effects and "gotchas," particularly if you want an rm -r equivalent, if you want this approach to work on a network of workstations, or if you use it to delete files that are shared by a team of users.

Unfortunately, this is precisely the problem. A "safe delete" that isn't really safe may not be worth the effort. Giving people a safety net with holes in it is only good if you can guarantee in advance that they won't land in one of the holes, believing themselves protected. You can patch some of the holes by replacing this simple alias with a shell script; but you can't fix all of them.
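For instance, the same-name collision could be patched like this. The function del below is a hypothetical sketch, not a finished tool: the trash location ($TRASH, defaulting to $HOME/trash) and the timestamp naming scheme are assumptions. It still has the other holes described above (rm -r semantics, networked and shared files).

```shell
#!/bin/sh
# Hypothetical "safe delete": move each argument into a trash directory,
# adding a timestamp (and a counter if needed) so that deleting two files
# with the same name doesn't overwrite the first one.
# TRASH defaults to $HOME/trash; that location is an assumption.
del() {
    trash=${TRASH:-$HOME/trash}
    mkdir -p "$trash" || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    for f in "$@"; do
        base=$(basename -- "$f")
        target="$trash/$base.$stamp"
        n=0
        # Same name deleted within the same second? Add a counter.
        while [ -e "$target" ]; do
            n=$((n + 1))
            target="$trash/$base.$stamp.$n"
        done
        mv -- "$f" "$target" || return 1
    done
}
```

Usage: del report.txt old-notes. Getting a file back still means finding it in the trash directory by hand, but now both copies of a twice-deleted name are there.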

— ML

14.10 Deletion with Prejudice: rm -f

The -f option to rm is the extreme opposite of -i. It says, "Just delete the file; don't ask me any questions." The "f" stands (allegedly) for "force," but this isn't quite right. rm -f won't force the deletion of something that you aren't allowed to delete. (To understand what you're allowed to delete, you need to understand directory access permissions (Section 50.2).)

What, then, does rm -f do, and why would you want to use it?

I find that I rarely use rm -f on the Unix command line, but I almost always use it within shell scripts. In a shell script, you (probably) don't want to be interrupted by lots of prompts should rm find a bunch of read-only files.

You probably also don't want to be interrupted if rm -f tries to delete files that don't exist because the script never created them. rm -f keeps quiet about files that don't exist and doesn't report them as errors in its exit status: if the desired end result is for the file to be gone, a file that never existed is just as good.
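The difference also shows up in the exit status, which is what a script usually checks. POSIX specifies that rm -f doesn't change its exit status for operands that don't exist; this sketch (throwaway directory, made-up filename) compares the two.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Plain rm complains about a missing file and exits nonzero...
rm never-created 2>/dev/null
plain_status=$?

# ...while rm -f stays quiet and exits 0, so "make sure it's gone" succeeds.
rm -f never-created
force_status=$?

echo "rm: $plain_status  rm -f: $force_status"
```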

— ML

14.11 Deleting Files with Odd Names

Summary Box

A perennial problem is deleting files that have strange characters (or other oddities) in their names. The next few articles contain some hints for the following:

  • Deleting files with random control characters in their names (Section 14.12).

  • Deleting files whose names start with a dash (Section 14.13).

  • Deleting files with "unprintable" filenames (Section 14.14).

  • Deleting files by using the inode number (Section 14.15).

  • Deleting directories and problems that can arise as a result (Section 14.16).

We'll also give hints for these:

  • Deleting unused (or rarely used) files (Section 14.17).

  • Deleting all the files in a directory, except for one or two (Section 14.18).

Most tips for deleting files also work for renaming the files (if you want to keep them): just replace the rm command with mv.

— ML

14.12 Using Wildcards to Delete Files with Strange Names

Filenames can be hard to handle if their names include control characters or characters that are special to the shell. Here's a directory with three oddball filenames:

% ls
What now
a$file
prog|.c
program.c

When you type those filenames on the command line, the shell interprets the special characters (space, dollar sign, and vertical bar) instead of including them as part of the filename. There are several ways (Section 14.11) to handle this problem. One is with wildcards (Section 33.2). Type a part of the filename without the weird characters, and use a wildcard to match the rest. The shell doesn't scan the filenames for other special characters after it interprets the wildcards, so you're (usually) safe if you can get a wildcard to match. For example, here's how to rename What now to Whatnow, remove a$file, and rename prog|.c to prog.c:

% mv What* Whatnow
% rm -i a*
rm: remove a$file? y
% mv prog?.c prog.c
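Quoting is another of those ways. Inside single quotes, the shell treats the space, the dollar sign, and the vertical bar as ordinary characters. This sketch recreates the three oddball names in a throwaway directory and performs the same rename, remove, and rename without any wildcards. Quoting only helps when you can type the characters, though; for control characters, wildcards are usually easier.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
# Recreate the oddball names; single quotes keep space, $ and | literal.
touch 'What now' 'a$file' 'prog|.c'

mv 'What now' Whatnow
rm 'a$file'
mv 'prog|.c' prog.c
ls
```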

Filenames with control characters are just another version of the same problem. Use a wildcard to match the part of the name that's troubling you. The real problem with control characters in filenames is that some control characters do weird things to your screen. Once I accidentally got a file with a CTRL-L in its name. Whenever I ran ls, it erased the screen before I could see what the filename was! Section 8.12 explains how, depending on your version of ls, you can use the -q or -b options to spot the offensive file and construct a wildcard expression to rename or delete it. (Most versions of ls now behave as if -q were given whenever their output goes to a terminal, so you will probably never see this particular problem.)

— JP

14.13 Handling a Filename Starting with a Dash (-)

Sometimes you can slip and create a file whose name starts with a dash (-), like -output or -f. That's a perfectly legal filename. The problem is that Unix command options usually start with a dash. If you try to type that filename on a command line, the command might think you're trying to type a command option.

In almost every case, all you need to do is "hide" the dash from the command. Start the filename with ./ (dot slash). This doesn't change anything as far as the command is concerned; ./ just means "look in the current directory" (Section 1.16). So here's how to remove the file -f:

% rm ./-f

(Most rm commands also accept -- to mark the end of the options, so rm -- -f works too; but the ./ trick should work with all Unix commands.)
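A quick check of both approaches, in a throwaway directory with made-up names: the ./ prefix, and the -- end-of-options marker that POSIX utilities such as rm accept.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch ./-f ./-output      # ./ keeps touch from reading these as options

rm ./-f                   # the ./ prefix hides the dash
rm -- -output             # "--" means "no more options after this"
ls -A
```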

— JP

14.14 Using unlink to Remove a File with a Strange Name

Some versions of Unix have a lot of trouble with eight-bit filenames (filenames that contain non-ASCII characters). The ls -q (Section 8.12) command shows the non-ASCII characters as question marks (?), but usual tricks like rm -i * (Section 14.12) skip right over the file. You can see exactly what the filename is by using ls -b (Section 8.12):

% ls -q
    ????
afile
bfile
% rm -i *
afile: ? n
bfile: ? n
% ls -b
\t\360\207\005\254
afile
bfile

On older Unixes, the -b option to ls might not be supported, in which case you can use od -c (Section 12.4) to dump the current directory, using its relative pathname . (dot) (Section 1.16), character by character. It's messier, and isn't supported on all Unix platforms, but it's worth a try:

% od -c .
   ...
00.....   \t 360 207 005 254  \0  \0  \0  \0  ...

If you can move all the other files out of the directory, then you'll probably be able to remove the leftover file and directory with rm -rf (Section 14.16, Section 14.10). Moving files and removing the directory is a bad idea, though, if this is an important system directory like /bin. Alternatively, using the escaped name ls -b gave you, you might be able to remove the file directly with the unlink(2) system call from Perl. Use the same escape characters in Perl that ls -b displayed. (Or, if you needed to use od -c, find the filename in the od listing of the directory; it will probably end with a series of NUL characters, like \0 \0 \0.)

perl -e 'unlink("\t\360\207\005\254");'
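If Perl isn't available, the shell's printf understands the same kinds of escapes ls -b printed: \t for a tab and three-digit octal escapes such as \360. The sketch below builds the name with printf and hands it to rm. To have something to delete, it first creates such a file in a throwaway directory; this assumes a filesystem that accepts arbitrary bytes in filenames, as most Unix filesystems do (some insist on valid UTF-8 and will refuse this name).

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Same escapes that ls -b showed: a tab, then octal bytes 360 207 005 254.
name=$(printf '\t\360\207\005\254')

: > "$name"          # create the oddball file for this demonstration
ls -b                # where supported, shows the escaped form

rm -- "$name"        # remove it by its exact bytes
ls -b
```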

— JP

14.15 Removing a Strange File by its i-number

If wildcards don't work (Section 14.12) to remove a file with a strange name, try getting the file's i-number (Section 14.2). Then use find's -inum operator (Section 9.9) to remove the file.

Here's a directory with a weird filename. ls (with its default -q option (Section 8.12) on most versions) shows that the name has three unusual characters. Running ls -i shows each file's i-number. The strange file has i-number 6239. Give the i-number to find, and the file is gone:

% ls
adir      afile     b???file  bfile     cfile     dfile
% ls -i
  6253 adir        6239 b???file    6249 cfile
  9291 afile       6248 bfile       9245 dfile
% find . -inum 6239 -exec rm {} \;
% ls
adir   afile  bfile  cfile  dfile

Instead of deleting the file, I also could have renamed it to newname with the command:

% find . -inum 6239 -exec mv {} newname \;

If the current directory has large subdirectories, you'll probably want to keep find from recursing down into them by using the -maxdepth 1 operator. (Versions of find that don't support -maxdepth can use -prune (Section 9.25) to get the same speedup.)
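Here is the same idea as a self-contained sketch. It makes a throwaway directory containing a subdirectory and a file whose name has three control characters (the names are made up), looks up the i-number with ls -i, and removes the file without descending into the subdirectory. Instead of -maxdepth, it uses the ! -name . -prune idiom, which works with finds that only offer -prune.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir subdir
touch afile "$(printf 'b\001\002\003file')"   # a name with three control characters

# ls -i prints the i-number first; the wildcard b*file matches the odd name.
ino=$(ls -i b*file | awk '{print $1}')
echo "i-number: $ino"

# Prune every entry except "." itself: find examines the names here
# but never descends into subdir.
# (With GNU or BSD find, "find . -maxdepth 1 -inum ..." does the same.)
find . ! -name . -prune -inum "$ino" -exec rm {} \;
ls -A
```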

— JP

14.16 Problems Deleting Directories

What if you want to get rid of a directory? The standard — and safest — way to do this is to use the Unix rmdir "remove directory" utility:

% rmdir files

The rmdir command often confuses new users. It will only remove a directory if it is completely empty; otherwise, you'll get an error message:

% rmdir files
rmdir: files: Directory not empty
% ls files
%

As in the example, ls will often show that the directory is empty. What's going on?

It's common for editors and other programs to create "invisible" files (files with names beginning with a dot). The ls command normally doesn't list them; if you want to see them, you have to use ls -A (Section 8.9):[1]

% rmdir files
rmdir: files: Directory not empty
% ls -A files
.BAK.textfile2

Here, we see that the directory wasn't empty after all: there's a backup file that was left behind by some editor. You may have used rm * to clean the directory out, but that won't work: rm also ignores files beginning with dots, unless you explicitly tell it to delete them. We really need a wildcard pattern like .??* or .[a-zA-Z0-9]* to catch normal dotfiles without catching the directories . and ..:

% rmdir files
rmdir: files: Directory not empty
% ls -A files
.BAK.textfile2
% rm files/.??*
% rmdir files
%
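A caveat about .??*: it needs at least two characters after the dot, so it misses a short dotfile such as .a. The pair .[!.]* and ..?* together match every name that starts with a dot except . and .., and a pattern that matches nothing is passed along unchanged, which rm -f silently ignores. A sketch in a throwaway directory with made-up filenames:

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir files
touch files/.BAK.textfile2 files/.a files/..x files/visible

# *       : ordinary names
# .[!.]*  : a dot followed by any character except a dot (catches .a)
# ..?*    : two dots followed by at least one more character (catches ..x)
# None of these can match "." or "..".
rm -f files/* files/.[!.]* files/..?*
rmdir files
```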

Other pitfalls might be files whose names consist of nonprinting characters or blank spaces — sometimes these get created by accident or by malice (yes, some people think this is funny). Such files will usually give you "suspicious" ls output (Section 8.11) (like a blank line).

If you don't want to worry about all these special cases, just use rm -r:

% rm -r files

This command removes the directory and everything that's in it, including other directories. A lot of people warn you about it; it's dangerous because it's easy to delete more than you realize. Personally, I use it all the time, and I've never made a mistake. I never bother with rmdir.

— ML

14.17 Deleting Stale Files

Sooner or later, a lot of junk collects in your directories: files that you don't really care about and never use. It's possible to write find (Section 9.1) commands that will automatically clean these up. If you want to clean up regularly, you can add some find commands to your crontab file (Section 25.2).

Basically, all you need to do is write a find command that locates files based on their last access time (-atime (Section 9.5)) and use -ok or -exec (Section 9.9) to delete them. Such a command might look like this:

% find . -atime +60 -ok rm -f {} \;

This locates files that haven't been accessed in the last 60 days, asks if you want to delete the file, and then deletes the file. (If you run it from cron, make sure you use -exec instead of -ok, and make absolutely sure that the find won't delete files that you think are important.)
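Before trusting a find command like this (especially from cron), it's worth a dry run: replace the -ok or -exec with -print and read the list. This sketch sets up two files in a throwaway directory, backdates one file's access time with touch -a -t, previews, and then deletes. The -type f is an addition of the sketch, to keep directories out of the list.

```shell
#!/bin/sh
# Throwaway directory (mktemp -d is widely available but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch recent old
touch -a -t 200001010000 old     # pretend "old" was last read on Jan 1, 2000

# Dry run: list what would be removed.
preview=$(find . -type f -atime +60 -print)
echo "would remove: $preview"

# The real thing, once the list looks right.
find . -type f -atime +60 -exec rm -f {} \;
```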

Of course, you can modify this find command to exclude (or select) files with particular names; for example, the following command deletes old core dumps and GNU Emacs backup files (whose names end in ~), but leaves all others alone:

% find . \( -name core -o -name "*~" \) -atime +60 -ok rm -f {} \;

If you take an automated approach to deleting stale files, watch out for these things:

Okay, I've said that I don't really think that automated deletion scripts are a good idea. However, I don't have a good comprehensive solution. I spend a reasonable amount of time (maybe an hour a month) going through directories and deleting stale files by hand. I also have a clean alias that I type whenever I think about it. It looks like this:

alias clean "rm *~ junk *.BAK core #*"

That is, this alias deletes all of my Emacs (Section 19.1) backup files, Emacs autosave files (risky, I know), files named junk, some other backup files, and core dumps. I'll admit that since I never want to save these files, I could probably live with something like this:

% find ~ \( -name "*~" -o -name core \) -atime +1 -exec rm {} \;

But still, automated deletion commands make me really nervous, and I'd prefer to live without them.

— ML

14.18 Removing Every File but One

One problem with Unix: it's not terribly good at "excluding" things. There's no option to rm that says, "Do what you will with everything else, but please don't delete these files." You can sometimes create a wildcard expression (Section 33.2) that does what you want — but sometimes that's a lot of work, or maybe even impossible.

Here's one place where Unix's command substitution (Section 28.14) operators (backquotes) come to the rescue. You can use ls to list all the files, pipe the output into a grep -v or egrep -v (Section 13.3) command, and then use backquotes to give the resulting list to rm. Here's what this command would look like:

% rm -i `ls -d *.txt | grep -v '^john\.txt$'`

This command deletes all files whose names end in .txt, except for john.txt. I've probably been more careful than necessary about making sure there aren't any extraneous matches; in most cases, grep -v john would probably suffice. Using ls -d (Section 8.5) makes sure that ls doesn't look into any subdirectories and give you those filenames. The rm -i asks you before removing each file; if you're sure of yourself, omit the -i.

Of course, if you want to exclude two files, you can do that with egrep:

% rm `ls -d *.txt | egrep -v 'john|mary'`

(Don't forget to quote the vertical bar (|), as shown earlier, to prevent the shell from piping egrep's output to mary.)
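The backquote approach breaks on filenames that contain spaces, because the shell splits the ls output at whitespace. A loop over the wildcard avoids that, since each name stays one word. The function name rm_txt_except below is hypothetical, and this is only a sketch: it handles a single directory and plain *.txt matching.

```shell
#!/bin/sh
# Hypothetical helper: remove every *.txt file in the current directory
# except the names given as arguments. Each filename stays a single word,
# so names containing spaces are handled.
rm_txt_except() {
    for f in *.txt; do
        [ -e "$f" ] || continue          # no match: the pattern stayed literal
        keep=no
        for k in "$@"; do
            if [ "$f" = "$k" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            rm -- "$f"
        fi
    done
}
```

Usage: rm_txt_except john.txt mary.txt. As with the backquote version, you could change rm to rm -i while you gain confidence in it.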

Another solution is the nom (Section 33.8) script.

— ML

14.19 Using find to Clear Out Unneeded Files

Do you run find on your machine every night? Do you know what it has to go through just to find out if a file is three days old and smaller than ten blocks or owned by "fred" or setuid root? This is why I tried to combine all the things we need done for removal of files into one big find script:

(For more information on cleanup, go to http://examples.oreilly.com/upt3. For 2>&1, see Section 36.16.)

#! /bin/sh
#
# cleanup - find files that should be removed and clean them
# out of the file system.

find / \(    \( -name '#*'                 -atime +1 \)  \
        -o   \( -name ',*'                 -atime +1 \)  \
        -o   \( -name rogue.sav            -atime +7 \)  \
        -o   \(      \( -name '*.bak'                    \
                     -o -name '*.dvi'                    \
                     -o -name '*.CKP'                    \
                     -o -name '.*.bak'                   \
                     -o -name '.*.CKP' \)  -atime +3 \)  \
        -o   \( -name '.emacs_[0-9]*'      -atime +7 \)  \
        -o   \( -name core                           \)  \
        -o   \( -user guest                -atime +9 \)  \
\) -print -exec rm -f {} \; > /tmp/.cleanup 2>&1

This is an example of using a single find command to search for files with different names and last-access times (see Section 9.5). Doing it all with one find is much faster — and less work for the disk — than running a lot of separate finds. The parentheses group each part of the expression. The neat indentation makes this big thing easier to read. The -print -exec at the end removes each file and also writes the filenames to standard output, where they're collected into a file named /tmp/.cleanup — people can read it to see what files were removed. You should probably be aware that printing the names to /tmp/.cleanup lets everyone see pathnames, such as /home/joe/personal/resume.bak, which some people might consider sensitive. Another thing to be aware of is that this find command starts at the root directory; you can do the same thing for your own directories.

CT and JP

[1]  If your version of ls doesn't have the -A option, use -a instead. You'll see the two special directory entries . and .. (Section 8.9), which you can ignore.
