CONTENTS

Chapter 8. Directories and Files

8.1 Everything but the find Command

A computer isn't that much different from a house or an office; unless you're incredibly orderly, you spend a lot of time looking for things that you've misplaced. Even if you are incredibly orderly, you still spend some time looking for things you need — you just have a better idea of where to find them. After all, librarians don't memorize the location of every book in the stacks, but they do know how to find any book, quickly and efficiently, using whatever tools are available. A key to becoming a proficient user of any system, then, is knowing how to find things.

This chapter is about how to find things. We're excluding the find (Section 9.1) utility itself because it's complicated and deserves a chapter of its own. We'll concentrate on simpler ways to find files, beginning with some different ways to use ls.

Well, okay, towards the end of the chapter we'll touch on a few simple uses of find, but to really get into find, take a peek at Chapter 9.

— ML

8.2 The Three Unix File Times

When you're talking to experienced Unix users, you often hear the terms "change time" and "modification time" thrown around casually. To most people (and most dictionaries), "change" and "modification" are the same thing. What's the difference here?

The difference between a change and a modification is the difference between altering the label on a package and altering its contents. If someone says chmod a-w myfile, that is a change; if someone says echo foo >> myfile, that is a modification. A change modifies the file's inode; a modification modifies the contents of the file itself. A file's modification time is also called the timestamp.

As long as we're talking about change times and modification times, we might as well mention "access times," too. The access time is the last time the file was read or written. So reading a file updates its access time, but not its change time (information about the file wasn't changed) or its modification time (the file itself wasn't changed).

Incidentally, the change time or "ctime" is incorrectly documented as the "creation time" in many places, including some Unix manuals. Do not believe them.
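
A small experiment can make the distinction visible. This is only a sketch, assuming a POSIX touch and ls; the scratch filename is made up for the demonstration. touch -t sets the modification time to a date in the past, but setting it is itself a change to the inode, so the change time stays at "now":

```shell
#!/bin/sh
# Sketch: show that mtime and ctime are tracked separately.
demo=${TMPDIR:-/tmp}/filetimes.$$      # hypothetical scratch file
: > "$demo"                            # create it empty
touch -t 200001010000 "$demo"          # set mtime (and atime) to Jan 1, 2000
mtime_line=$(LC_ALL=C ls -l  "$demo")  # -l alone shows the modification time
ctime_line=$(LC_ALL=C ls -lc "$demo")  # -c shows the inode change time
echo "$mtime_line"                     # dated 2000
echo "$ctime_line"                     # dated when touch ran
rm -f "$demo"
```

The first line should carry the year 2000; the second should carry the time at which you ran the script.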

— CT

8.3 Finding Oldest or Newest Files with ls -t and ls -u

Your directory might have 50, 100, or more files. Which files haven't been used for a while? You might save space by removing them. You read or edited a file yesterday, but you can't remember its name? These commands will help you find it. (If you want a quick review of Unix file times, see Section 8.2.)

In this example, I'll show you my bin (Section 7.4) directory full of shell scripts and other programs — I want to see which programs I don't use very often. You can use the same technique for directories with text or other files.

The ls command has options to change the way it orders files. By default, ls lists files alphabetically. For finding old files, use the -t option. This sorts files by their modification time, that is, the last time each file's contents were modified. The newest files are listed first. Here's what happens:

jerry@ora ~/.bin
60 % ls -t
weather       unshar        scandrafts    rn2mh         recomp
crontab       zloop         tofrom        rmmer         mhprofile
rhyes         showpr        incc          mhadd         append
rhno          rfl           drmm          fixsubj       README
pickthis      maillog       reheader      distprompter  rtfm
cgrep         c-w           zrefile       xmhprint      saveart
dirtop        cw            zscan         replf         echoerr
which         cx            zfolders      fols
tcx           showmult      alifile       incs

I just added a shell script named weather yesterday; you can see it as the first file in the first column. I also made a change to my script named crontab last week; it's shown next. The oldest program in here is echoerr; it's listed last.[1]

ls -t is also great for file-time comparisons in a script (Section 8.15). ls -t is quite useful when I've forgotten whether I've edited a file recently. If I've changed a file, it will be at or near the top of the ls -t listing. For example, I might ask, "Have I made the changes to that letter I was going to send?" If I haven't made the changes (but only think I have), my letter will most likely appear somewhere in the middle of the listing.

The -u option shows the files' last-access time instead of the last-modification time. The -u option doesn't do anything with plain ls — you have to use it with another option like -t or -l. The next listing shows that I've recently used the rtfm and rmmer files. I haven't read README in a long time, though — oops:

jerry@ora ~/.bin
62 % ls -tu
rtfm          cx            drmm          saveart       fixsubj
rmmer         c-w           zscan         scandrafts    echoerr
rfl           cw            zrefile       rhno          dirtop
mhprofile     distprompter  xmhprint      rhyes         cgrep
showmult      recomp        zloop         replf         append
tcx           crontab       zfolders      reheader      alifile
tofrom        mhadd         which         incs          README
rn2mh         pickthis      unshar        maillog
weather       incc          showpr        fols

(Some Unixes don't update the last-access time of executable files when you run them. Shell scripts are always read, so their last-access times will always be updated.)

The -c option shows when the file's inode information was last changed. The inode change time is updated when the file is created, when you use chmod to change the permissions, and so on.

jerry@ora ~/.bin
64 % ls -tc
weather      maillog       reheader      recomp        incs
crontab      tcx           rn2mh         fols          cx
cgrep        zscan         tofrom        rmmer         cw
zloop        zrefile       mhadd         fixsubj       c-w
dirtop       rfl           drmm          mhprofile     echoerr
pickthis     showmult      alifile       append        which
rhno         rtfm          showpr        saveart       README
unshar       incc          scandrafts    distprompter
rhyes        zfolders      xmhprint      replf

If you're wondering just how long ago a file was modified (or accessed), add the -l option for a long listing. As before, adding -u shows the last-access time; -c shows inode change time. If I look at the access times of a few specific files, I find that I haven't read README since 2001.

jerry@ora ~/.bin
65 % ls -ltu README alifile maillog
-rwxr-xr-x   1 jerry    ora           59 Feb  2  2002 maillog
-rwxr-xr-x   1 jerry    ora          213 Nov 29  2001 alifile
-rw-r--r--   1 jerry    ora         3654 Nov 27  2001 README
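
If the goal is cleanup, it can help to see only the oldest names. Here is a minimal Bourne-shell sketch; the function name oldest is hypothetical, and it assumes filenames without embedded newlines:

```shell
#!/bin/sh
# oldest [N] [DIR]: print the N least recently modified names in DIR.
# Defaults: 5 names, current directory.
oldest() {
    n=${1:-5}
    dir=${2:-.}
    # -t sorts newest first; -r reverses that, so the oldest names lead
    ls -tr "$dir" | head -n "$n"
}
```

For example, oldest 10 $HOME/bin lists ten candidates for removal. Changing -tr to -tur in the function would rank by last access instead.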

— JP

8.4 List All Subdirectories with ls -R

By default, ls lists just one directory. If you name one or more directories on the command line, ls will list each one. The -R (uppercase R) option lists all subdirectories, recursively. That shows you the whole directory tree starting at the current directory (or the directories you name on the command line).

This list can get pretty long; you might want to pipe the output to a pager program such as less (Section 12.3). The ls -C option is a good idea, too, to list the output in columns. (When the ls output goes to a pipe, many versions of ls won't make output in columns automatically.)
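
Those two suggestions combine naturally into one small function. This is a sketch; the name lsr is made up, and it assumes your pager is named in the conventional PAGER environment variable (falling back to less):

```shell
#!/bin/sh
# lsr [DIR ...]: recursive listing, in columns, through a pager.
lsr() {
    # -C forces columns even though the output goes to a pipe
    ls -RC "$@" | ${PAGER:-less}
}
```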

— JP

8.5 The ls -d Option

If you give ls the pathname of a directory, ls lists the entries in the directory:

% ls -l /home/joanne
total 554
-rw-r--r--  1 joanne      15329 Oct  5 14:33 catalog
-rw-------  1 joanne      58381 Oct 10 09:08 mail
   ...

With the -d option, ls lists the directory itself:

% ls -ld /home/joanne
drwxr-x--x  7 joanne       4608 Oct 10 10:13 /home/joanne

The -d option is especially handy when you're trying to list the names of some directories that match a wildcard. Compare the listing with and without the -d option:

% ls -Fd [a-c]*
arc/                    bm/                     ctrl/
atcat.c                 cdecl/
atl.c.Z                 cleanscript.c
% ls -F [a-c]*
atcat.c                 atl.c.Z                 cleanscript.c

arc:
BugsEtc.Z       arcadd.c        arcext.c.Z      arcmisc.c.Z
   ...
bm:
Execute.c.Z     MakeDesc.c.Z    MkDescVec.c.Z   Search.c.Z
   ...
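
A related trick, offered here as a sketch rather than something from the listings above: a wildcard that ends in a slash matches only directories (and symbolic links to directories), so pairing it with -d lists just the subdirectories. The function name lsdirs is hypothetical.

```shell
#!/bin/sh
# lsdirs: list only the subdirectories of the current directory.
# The trailing slash restricts the wildcard to directories;
# -d keeps ls from listing what is inside them.
lsdirs() {
    ls -d -- */
}
```

If the current directory has no subdirectories, the pattern is passed to ls unexpanded and ls complains that */ doesn't exist.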

— JP

8.6 Color ls

The GNU ls command — which is on a lot of systems, including Linux — can display names in colors. For instance, when I enable color listings on my system, directory names are in dark blue, symbolic links are in sky blue, executable files (scripts, programs, etc.) are in green, and so on.

tcsh's built-in ls -F command can display in colors, too. Just set color in your .cshrc to enable it, and configure it using LS_COLORS as described later in this section. You may also want to look at Section 8.6.4 for another way to configure colors if --color doesn't seem to work.

8.6.1 Trying It

Go to http://examples.oreilly.com/upt3 for more information on: GNU ls

Has your system been set up for this? Simply try this command:

$ ls --color / /bin

If you don't get an error (ls: no such option -- color, or something similar), you should see colors. If you don't get an error, but you also don't get colors, try one of these commands, and see what you get:

$ ls --color=always / /bin | cat -v 
^[[00m/:
^[[01;34mbin^[[00m
^[[01;34mboot^[[00m
    ...
^[[01;34mvar^[[00m

/bin:
^[[01;32march^[[00m
^[[01;36mawk^[[00m
^[[01;32mbasename^[[00m
    ...

$ ls --color=yes / /bin | cat -v 
    ...same kind of output...

Those extra characters surrounding the filenames, such as ^[[01;34m and ^[[00m, are the escape sequences that (you hope) make the colors. (The cat -v (Section 12.4) command makes the sequences visible, if there are any to see.) The ^[ is an ESC character; the next [ starts a formatting code; the 01 code means "boldface"; the semicolon (;) is a code separator; the 34 means "blue"; and the m ends the escape sequence. ^[[00m is an escape sequence that resets the attributes to normal. If you see the escape sequences when you use cat -v, but you haven't gotten any highlighting effects when you don't use it, there's probably some kind of mismatch between your termcap or terminfo entry (Section 5.2) (which should define the sequences) and the color database (see later in this section). If you don't see the escape sequences at all, take a look at Section 8.6.4 for another way to configure color ls.
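
To check whether your terminal honors these sequences independently of ls, you can emit them by hand. A minimal sketch (the function name colorize is made up for this example):

```shell
#!/bin/sh
# colorize ATTRS TEXT: wrap TEXT in the escape sequences described above.
# ATTRS is a semicolon-separated code list such as 01;34 (bold, blue).
colorize() {
    # \033 is ESC; "[...m" sets the attributes; "[00m" resets them
    printf '\033[%sm%s\033[00m\n' "$1" "$2"
}

colorize '01;34' bin            # bold blue on most color terminals
colorize '01;32' arch | cat -v  # shows ^[[01;32march^[[00m
```

If the first call prints a plain, uncolored bin, the terminal (or its emulator settings) is the likely culprit rather than ls.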

8.6.2 Configuring It

How are the colors set? Both GNU ls and tcsh's ls -F use the LS_COLORS environment variable to decide how to format filenames. Here's a sample (truncated and split onto three lines for printing):

$ echo $LS_COLORS
no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:
bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:
*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:
    ...

The LS_COLORS value is a series of item=attribute values with a colon (:) between each pair. For instance, fi=00 means that files have the attribute (color) 00; di=01;34 means that directories have the attributes 01 (bold) and 34 (blue); and *.exe=01;32 means that filenames ending with .exe have the attributes 01 (bold) and 32 (green). There can be up to three numbers. The first is an attribute code (bold, underscore, etc.); the second is a foreground color; the third is a background color. So, 01;37;41 indicates boldfaced white foreground (37) text on a red background (41).
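
Because the value is just colon-separated item=attribute pairs, it is easy to pick apart with standard tools. The following sketch looks up one item; the function name lscolor_of is hypothetical:

```shell
#!/bin/sh
# lscolor_of ITEM: print the attribute string that LS_COLORS assigns
# to ITEM (for example di, ln, or '*.exe'); prints nothing if unset.
lscolor_of() {
    printf '%s\n' "$LS_COLORS" |
    tr ':' '\n' |                                  # one pair per line
    awk -F= -v item="$1" '$1 == item { print $2 }' # split each pair at =
}
```

With the sample value shown earlier, lscolor_of di prints 01;34. Quote wildcard items such as '*.exe' so the shell doesn't expand them first.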

The format is fairly obscure, so you won't want to set LS_COLORS directly if you don't have to. The easy way to set it is with the dircolors command, typically in a shell setup file (Section 3.3):

eval Section 27.8, `...` Section 28.14

eval `dircolors`

There, dircolors is reading the default database and outputting a command to set LS_COLORS. What if you don't want the default database settings? You can make your own. An easy place to start is with dircolors -p, which outputs a copy of the database. You can redirect the output to a file; a good option is to use a .dircolorsrc file in your home directory. Then take a look at it:

$ dircolors -p > $HOME/.dircolorsrc 
$ cat $HOME/.dircolorsrc 
     ...
# Below should be one TERM entry for each colorizable termtype
TERM linux
     ...
TERM vt100

# Below are the color init strings for the basic file types. A color
# init string consists of one or more of the following numeric codes:
# Attribute codes:
# 00=none 01=bold 04=underscore 05=blink 07=reverse 08=concealed
# Text color codes:
# 30=black 31=red 32=green 33=yellow 34=blue 35=magenta 36=cyan 37=white
# Background color codes:
# 40=black 41=red 42=green 43=yellow 44=blue 45=magenta 46=cyan 47=white
NORMAL 00     # global default, although everything should be something.
FILE 00       # normal file
DIR 01;34     # directory
LINK 01;36    # symbolic link
    ...

# List any file extensions like '.gz' or '.tar' that you would like ls
# to colorize below. Put the extension, a space, and the color init string.
# (and any comments you want to add after a '#')
.tar 01;31 # archives or compressed (bright red)
.tgz 01;31
    ...

The file starts with a listing of terminal type (Section 5.3) names that understand the color escape sequences listed in this file. Fortunately, the escape sequences are almost universal; there are some old terminals (like my old Tektronix 4106, I think . . . R.I.P.) that don't understand these, but not many. (If you have a different terminal or an odd terminal emulator, you can select a setup file automatically as you log in (Section 3.10).) The second section has a commented-out list of the attributes that these terminals recognize. You can use that list in the third section — which has standard attributes for files, directories, and so on. The fourth section lets you choose attributes for files by their filename "extensions" — that is, the part of the filename after the final dot (like .tar).

If you make your own database, you'll need to use it (again, typically in a shell setup file) to set LS_COLORS:

eval `dircolors $HOME/.dircolorsrc`

8.6.3 The --color Option

For better or for worse, the way to activate color ls is by using the --color option on the command line. Because almost no one will want to type those characters every time they run ls, most users need to make an alias (Section 29.2, Section 29.4) for ls that runs ls --color. For example, here are the three aliases defined for bash on my Linux system:

alias l.='ls .[a-zA-Z]* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'

If you're using tcsh, setting the color variable to enable ls -F's color also arranges to send --color=auto to regular ls.

The --color option gives you three choices of when the ls output should be colored: --color=never to never output color, --color=always to always output color, and --color=auto to only output color escape sequences if the standard output of ls is a terminal. I suggest using --color=auto, because --color=always means that when you pipe the output of ls to a printer or redirect it to a file, it will still have the ugly escape sequences you saw earlier in this article.

8.6.4 Another color ls

Some systems have another way to configure and use color ls. My FreeBSD systems use this scheme; if none of the configuration techniques described earlier work, use ls -G or set the CLICOLOR environment variable. If this works, you'll want to use the LSCOLORS environment variable to configure color information instead of LS_COLORS as described earlier. Spend a little time perusing your ls(1) manpage for further details if your ls seems to work this way, as configuring it is likely to be completely different from what we described previously.
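
If you share one Bourne-type shell setup file across both kinds of systems, you can probe for the option at login. This is a sketch under that assumption; the variable name ls_color_opt is made up:

```shell
#!/bin/sh
# Pick whichever color option this system's ls accepts.
# --color is probed first because GNU ls also accepts -G, where it
# means "omit the group column" rather than "use color".
if ls --color=auto -d . >/dev/null 2>&1; then
    ls_color_opt='--color=auto'
elif ls -G -d . >/dev/null 2>&1; then
    ls_color_opt='-G'
else
    ls_color_opt=''
fi

# Only define the alias when some color option was accepted
if [ -n "$ls_color_opt" ]; then
    alias ls="ls $ls_color_opt"
fi
```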

—JP and DJPH

8.7 Some GNU ls Features

A lot of the GNU utilities came from Unix utilities — but with extra features. The GNU ls command is no exception: as its info page (Section 2.9) says, "Because ls is such a fundamental program, it has accumulated many options over the years." Amen. Let's look at three of the options that aren't covered by other articles on ls.

An Emacs editor backup file (Section 19.4) has a name ending in ~ (tilde). If you use Emacs a lot, these files can really clutter your directories. The ls -B option ignores Emacs backup files:

$ ls
bar.c  bar.c~  baz.c  baz.c~  foo.c  foo.c~
$ ls -B
bar.c  baz.c  foo.c

The option -I (uppercase letter I) takes -B one step further: you can give a wildcard expression (shell wildcard pattern, not grep-like expressions) for entries not to list. (Remember that — because you want to pass the wildcard pattern to ls, and not let the shell expand it first — you need to quote (Section 27.12) the pattern.) For instance, to skip all filenames ending in .a and .o, use the wildcard pattern *.[ao], like this:

$ ls
bar.a  bar.c  bar.o  baz.a  baz.c  baz.o  foo.a  foo.c  foo.o
$ ls -I "*.[ao]"
bar.c  baz.c  foo.c

The "minimalist" side of me might argue that both -B and -I are feeping creatures because you can get basically the same effect by combining plain old ls with one of the "not this file" shell wildcard operators. This next option is in the same category. Instead of using -S to sort the files by size, you could pipe the output of plain ls -l to sort -n (Section 22.5) and sort on the size field, then strip off the information you didn't want and . . . ahem. (Grumble, grumble.) Okay, -S really is pretty useful. ;-) I use it a lot when I'm cleaning out directories and want to find the most effective files to remove:

$ ls -lS 
total 1724
-rw-rw-r--    1 jerry    ora   395927 Sep  9 06:21 SunTran_map.pdf
-rw-------    1 jerry    ora   389120 Oct 31 09:55 core
-rw-r--r--    1 jerry    ora   178844 May  8 16:36 how
-rw-------    1 jerry    ora    77122 Oct 29 08:46 dead.letter
    ...
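
For systems without GNU ls, the effect of -I can be approximated in plain Bourne shell. The sketch below is one way to do it, not the only one; the function name ls_except is hypothetical, and it prints one name per line rather than columns:

```shell
#!/bin/sh
# ls_except PATTERN: print names in the current directory that do NOT
# match the shell wildcard PATTERN (a rough stand-in for ls -I).
ls_except() {
    for f in *; do
        case $f in
            $1) ;;                       # matches the pattern: skip it
            *)  printf '%s\n' "$f" ;;    # anything else: print it
        esac
    done
}
```

As with ls -I, quote the pattern: ls_except '*.[ao]'. In an empty directory the loop sees the unexpanded * and prints it, so treat this as a convenience, not a robust tool.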

— JP

8.8 A csh Alias to List Recently Changed Files

Looking for a recently changed file? Not sure of the name? Trying to do this in a directory with lots of files? Try the lr alias:

alias lr "ls -lagFqt \!* | head"

This alias takes advantage of the -t option (Section 8.3) to ls, so that recent files can float to the top of the listing. !* is the csh syntax for "put all of the arguments to the alias here." (We have to escape the exclamation point to keep it from being interpreted when we set the alias.) head (Section 12.12) shows just the first ten lines.

A simple lr in my home directory gives me:

bermuda:home/dansmith :-) lr
total 1616
-rw-------  1 dansmith staff      445092 Oct  7 20:11 .mush256
-rw-r--r--  1 dansmith staff        1762 Oct  7 20:11 .history
drwxr-xr-x 30 dansmith staff        1024 Oct  7 12:59 text/
-rw-------  1 dansmith staff      201389 Oct  7 12:42 .record
drwxr-xr-x 31 dansmith staff        1024 Oct  4 09:41 src/
-rw-r--r--  1 dansmith staff        4284 Oct  4 09:02 .mushrc
   ...

You can also give a wildcarded pattern to narrow the search. For example, here's the command to show me the dot files that have changed lately:

bermuda:home/dansmith :-) lr .??*
-rw-------  1 dansmith staff      445092 Oct  7 20:11 .mush256
-rw-r--r--  1 dansmith staff        1762 Oct  7 20:11 .history
-rw-------  1 dansmith staff      201389 Oct  7 12:42 .record
-rw-r--r--  1 dansmith staff        4284 Oct  4 09:02 .mushrc
   ...
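
Bourne-type shells have no \!* syntax; a shell function with "$@" does the same job. A sketch of an equivalent follows. It leaves out the -g option on purpose, because -g means different things to different versions of ls:

```shell
#!/bin/sh
# lr [FILE ...]: long listing, newest first, first ten lines only.
# "$@" plays the role that \!* plays in the csh alias.
lr() {
    ls -laFqt "$@" | head
}
```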

— DS

8.9 Showing Hidden Files with ls -A and -a

The ls command normally ignores any files whose names begin with a dot (.). This is often very convenient: Unix has lots of small configuration files, scratch files, etc. that you really don't care about and don't want to be bothered about most of the time. However, there are some times when you care very much about these files. If you want to see "hidden" files, use the command ls -a. For example:

% cd 
% ls                                    Don't show hidden files
Mail       mail.txt      performance   powertools
% ls -a                                 This time, show me EVERYTHING
.        .emacs        Mail          powertools
..       .login        mail.txt
.cshrc   .mailrc       performance

With the -a option, we see four additional files: two C-shell initialization files (.cshrc and .login), and the customization files for the GNU Emacs editor (.emacs) and for mail (.mailrc). We also see two "special" entries, . and .., which represent the current directory and the parent of the current directory. All Unix directories contain these two entries (Section 10.2).

If you don't want to be bothered with . and .., many versions of ls also have a -A option:

% ls -A     Show me everything but . and ..
.cshrc   .login        Mail          performance
.emacs   .mailrc       mail.txt      powertools

— ML

8.10 Useful ls Aliases

Because ls is one of the most commonly used Unix commands and provides numerous options, it's a good idea to create aliases for the display formats that best suit your needs. For example, many users always want to know about their "hidden" files. That's reasonable — they're just as important as any other files you have. In some cases, they can grow to take up lots of room (for example, some editors hide backup files), so it's worth being aware of them.

Rather than typing ls -a every time, you can create a convenient alias that supplies the -a or -A option (Section 8.9) automatically:

$ alias la="ls -aF"
% alias la ls -aF

or:

$ alias la="ls -AF"
% alias la ls -AF

Two things to note here. First, I recommend using la as the name of the alias, rather than just renaming ls. I personally think it's dangerous to hide the pure, unadulterated command underneath an alias; it's better to pick a new name and get used to using that name. If you ever need the original ls for some reason, you'll be able to get at it without problems.

Second, what's with the -F option? I just threw it in to see if you were paying attention. It's actually quite useful; many users add it to their ls aliases. The -F option shows you the type of file in each directory by printing an extra character after each filename. Table 8-1 lists what the extra character can be.

Table 8-1. Filename types listed by ls -F

Character   Definition
(nothing)   The file is a regular file.
*           The file is an executable.
/           The file is a directory.
@           The file is a symbolic link (Section 10.4).
|           The file is a FIFO (named pipe) (Section 43.11).
=           The file is a socket.

For example:

% la          Alias includes -F functionality
.cshrc   .login        Mail/         performance/
.emacs   .mailrc       mail.txt      powertools@

This says that Mail and performance are directories. powertools is a symbolic link (ls -l will show you what it's linked to). There are no executables, FIFOs, or sockets in this directory.

[If you use tcsh, it has a built-in ls called ls -F, which not only prints this extra information, but also supports color (Section 8.6) and caching of filesystem information for speed. I generally put alias ls ls -F in my .cshrc. — DH]

You may want this version instead:

$ alias la="ls -aFC"
% alias la ls -aFC

The -C option lists the files in multiple columns. This option isn't needed with ls versions where multicolumn output is the normal behavior. Note, however, that when piped to another command, ls output is single-column unless -C is used. For example, use ls -C | less to preserve multiple columns with a paged listing.

Finally, if you often need the full listing, use the alias:

$ alias ll="ls -l"
% alias ll ls -l

This alias may not seem like much of a shortcut until after you've typed it a dozen times. In addition, it's easy to remember as "long listing." Some Unix systems even include ll as a regular command.

—DG and ML

8.11 Can't Access a File? Look for Spaces in the Name

What's wrong here?

% ls
afile    exefiles   j       toobig
% lpr afile
lpr: afile: No such file or directory

Huh? ls shows that the file is there, doesn't it? Try using:

-v Section 12.4, -t -e Section 12.5

% ls -l | cat -v -t -e
total 89$
-rw-rw-rw-  1 jerry          28 Mar  7 19:46 afile $
-rw-r--r--  1 root        25179 Mar  4 20:34 exefiles$
-rw-rw-rw-  1 jerry         794 Mar  7 14:23 j$
-rw-r--r--  1 root          100 Mar  5 18:24 toobig$

The cat -e option marks the ends of lines with a $. Notice that afile has a $ out past the end of the column. Aha . . . the filename ends with a space. Whitespace characters like TABs have the same problem, though the default ls -q (Section 8.12) option (on many Unix versions) shows them as ? if you're using a terminal.

If you have the GNU version of ls, try its -Q option to put double quotes around each name:

$ ls -Q
"afile "  "exefiles"  "j"  "toobig"

To rename afile, giving it a name without the space, type:

% mv "afile " afile

The quotes (Section 27.12) tell the shell to include the space as part of the first argument it passes to mv. The same quoting works for other Unix commands as well, such as rm.
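
To hunt for such names without reading ls output closely, you can let the shell's wildcards do the looking. A minimal sketch (the function name trailing_space is made up):

```shell
#!/bin/sh
# trailing_space: print, in brackets, the names in the current directory
# that end with a space, so the stray space is easy to see.
trailing_space() {
    for f in *' '; do
        # if nothing matched, the pattern itself is left in $f; skip it
        [ -e "$f" ] || continue
        printf '[%s]\n' "$f"
    done
}
```

In the directory shown above, this would print [afile ] and nothing else.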

— JP

8.12 Showing Nonprintable Characters in Filenames

From time to time, you may get filenames with nonprinting characters, spaces, and other garbage in them. This is usually the result of some mistake — but it's a pain nevertheless.

If you're using a version of ls that uses -q by default (and most do these days), the ls command gives you some help; it converts all nonprinting characters to a question mark (?), giving you some idea that something funny is there.[2] For example:

% ls
ab??cd

This shows that there are two nonprinting characters between ab and cd. To delete (or rename) this file, you can use a wildcard pattern like ab??cd.

Be careful: when I was new to Unix, I once accidentally generated a lot of weird filenames. ls told me that they all began with ????, so I naively typed rm ????*. That's when my troubles began. See Section 14.3 for the rest of the gruesome story. (I spent the next day and night trying to undo the damage.) The moral is: it's always a good idea to use echo to test filenames with wildcards in them.

If you're using an ls that came from System V Unix, you have a different set of problems. System V's ls doesn't convert the nonprinting characters to question marks. In fact, it doesn't do anything at all — it just spits these weird characters at your terminal, which can respond in any number of strange and hostile ways. Most of the nonprinting characters have special meanings — ranging from "don't take any more input" to "clear the screen." [If you don't have a System V ls, but you want this behavior for some reason, try GNU ls with its -N option. — JP]

To prevent this, or to see what's actually there instead of just the question marks, use the -b option.[3] This tells ls to print the octal value of any nonprinting characters, preceded by a backslash. For example:

% ls -b
ab\013\014cd

This shows that the nonprinting characters have octal values 13 and 14, respectively. If you look up these values in an ASCII table, you will see that they correspond to CTRL-k and CTRL-l. If you think about what's happening, you'll realize that CTRL-l is a formfeed character, which tells many terminals to clear the screen. That's why the regular ls command behaved so strangely.

Once you know what you're dealing with, you can use a wildcard pattern to delete or rename the file.
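
Before handing that pattern to rm, it's worth seeing exactly what it matches, in the spirit of the echo advice above. The following sketch prints each match through cat -v so control characters can't disturb the terminal; the function name preview is hypothetical:

```shell
#!/bin/sh
# preview FILE ...: print each argument on its own line, with any
# nonprinting characters made visible by cat -v.
preview() {
    for f in "$@"; do
        printf '%s\n' "$f" | cat -v
    done
}
```

For the file in the example, preview ab??cd would print ab^K^Lcd. If that is the only line, rm -i ab??cd is safe to run.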

— ML

8.13 Counting Files by Types

I use awk (Section 20.10) a lot. One of my favorite features of awk is its associative arrays. This means awk can use anything as an index into an array. In the next example, I use the output of the file (Section 12.6) command as the index into an array to count how many files there are of each type:

xargs Section 28.17

#!/bin/sh
# usage: count_types [directory ...]
# Counts how many files there are of each type
# Original by Bruce Barnett
# Updated version by yu@math.duke.edu (Yunliang Yu)
find ${*-.} -type f -print | xargs file |
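awk '{
        # blank out field 1, the filename (NULL is an unset variable, so empty)
        $1=NULL;
        # what is left of the line is the file type; count it
        t[$0]++;
}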
END {
        for (i in t) printf("%d\t%s\n", t[i], i);
}' | sort -nr   # Sort the result numerically, in reverse

The output of this might look like:

38  ascii text
32  English text
20  c program text
17  sparc executable not stripped
12  compressed data block compressed 16 bits
8   executable shell script
1   sparc demand paged dynamically linked executable
1   executable /bin/make script

— BB

8.14 Listing Files by Age and Size

If you find a large directory and most of the files are new, that directory may not be suitable for removal, as it is still being used. Here is a script that lists a summary of file sizes, broken down into the time of last modification. You may remember that ls -l will list the month, day, hour, and minute if the file is less than six months old and show the month, day, and year if the file is more than six months old. Using this, the script creates a summary for each of the last six months, as well as a summary for each year for files older than that:

xargs Section 28.17

#!/bin/sh
# usage: age_files [directory ...]
# lists size of files by age
#
# pick which version of ls you use
#   System V
#LS="ls -ls"
#   Berkeley
LS="ls -lsg"
#
find ${*:-.} -type f -print | xargs $LS | awk  '
# field 7 is the month; field 9 is either hh:mm or yyyy
# test whether field 9 is in hh:mm or yyyy format
{
   if ($9 !~ /:/) {
      sz[$9]+=$1;
   } else {
      sz[$7]+=$1;
   }
}
END {
   for (i in sz) printf("%d\t%s\n", sz[i], i);
}' | sort -nr

The program might generate results like this:

5715   1991
3434   1992
2929   1989
1738   Dec
1495   1990
1227   Jan
1119   Nov
953   Oct
61   Aug
40   Sep

[For the book's third edition, I thought about replacing this venerable ten-year-old script with one written in Perl. Perl, after all, lets you get at a file's inode information directly from the script, without the ls-awk kludge. But I changed my mind because this technique (groveling through the output of ls -l with a "summarizing" filter script) is really handy sometimes. — JP]

— BB

8.15 newer: Print the Name of the Newest File

Here's a quick alias that figures out which file in a group is the newest:

-d Section 8.5

alias newer "ls -dt \!* | head -1"

If your system doesn't have a head (Section 12.12) command, use sed 1q instead.

For example, let's say that you have two files named plan.v1 and plan.v2. If you're like me, you (often) edit the wrong version by mistake — and then, a few hours later, can't remember what you did. You can use this alias to figure out which file you changed most recently:

% newer plan.v*
plan.v1

I could also have used command substitution (Section 28.14) to handle this in one step:

% emacs `newer plan.*`
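
The alias above is written for csh. In a Bourne-type shell, a function is the usual counterpart; this sketch keeps the same name and options:

```shell
#!/bin/sh
# newer FILE ...: print the name of the most recently modified argument.
newer() {
    # -d lists directories themselves; -t puts the newest name first
    ls -dt "$@" | head -n 1
}
```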

— ML

8.16 oldlinks: Find Unconnected Symbolic Links

One problem with symbolic links is that they're relatively "fragile" (Section 10.6). The link and the file itself are different kinds of entities; the link only stores the name of the "real" file. Therefore, if you delete or rename the real file, you can be left with a "dead" or "old" link: a link that points to a file that doesn't exist.

This causes no end of confusion, particularly for new users. For example, you'll see things like this:

% ls -l nolink
lrwxrwxrwx   1 mikel     users    12 Nov  2 13:57 nolink -> /u/joe/afile
% cat nolink
cat: nolink: No such file or directory

The link is obviously there, but cat tells you that the file it points to doesn't exist.

There's no real solution to this problem, except to be careful. Try writing a script that checks links to see whether they exist. Here's one such script from Tom Christiansen; it uses find to track down all links and then uses perl to print the names of links that point to nonexistent files. (If you're a Perl hacker and you'll be using this script often, you could replace the Unix find utility with the Perl File::Find module.)

#!/bin/sh
find . -type l -print | perl -nle '-e || print'

The script only lists "dead" links; it doesn't try to delete them or do anything drastic. If you want to take some other action (such as deleting these links automatically), you can use the output of the script in backquotes (Section 28.14). For example:

% rm `oldlinks`
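
If Perl isn't installed, find can do the existence check itself by running test on each link. This is an alternative sketch, not the script credited above; the function name deadlinks is hypothetical:

```shell
#!/bin/sh
# deadlinks [DIR ...]: print symbolic links whose targets do not exist.
# test -e follows the link, so it fails exactly when the target is missing.
deadlinks() {
    [ $# -gt 0 ] || set -- .     # default to the current directory
    find "$@" -type l ! -exec test -e {} \; -print
}
```

It starts one test process per link, so on a very large tree it will be slower than the single perl process in the original.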

— ML

8.17 Picking a Unique Filename Automatically

Shell scripts, aliases, and other programs often need temporary files to hold data to be used later. If the program will be run more than once, or if the temp file needs to stay around after the program is done, you need some way to make a unique filename. Generally these files are stored in /tmp or /usr/tmp.

One way is with the shell's process ID number (Section 24.3), available in the $$ parameter. You might name a file /tmp/myprog$$; the shell will turn that into something like /tmp/myprog1234 or /tmp/myprog28471. If your program needs more than one temporary file, add an informative suffix to the names:

% errs=/tmp/myprog-errs$$
% output=/tmp/myprog-output$$

You can also use date's + option to get a representation of the date suitable for temporary filenames. For example, to output the Year, month, day, Hour, Minute, and Second:

% date
Wed Mar  6 17:04:39 MST 2002
% date +'%Y%m%d%H%M%S'
20020306170515

Use a + parameter and backquotes (``) (Section 28.14) to get a temp file named for the current date and/or time. For instance, on May 31 the following command would store foo.0531 in the Bourne shell variable temp. On December 7, it would store foo.1207:

% temp=foo.`date +'%m%d'`

If you'll be generating a lot of temporary files in close proximity, you can use both the process ID and the date/time:

% output=/tmp/myprog$$.`date +'%Y%m%d%H%M%S'`
% echo $output 
/tmp/myprog25297.20020306170222
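
In a script, the naming scheme can be wrapped in a function so that every temporary file is built the same way. A sketch, assuming a Bourne-type shell; the function name tmpname and the use of TMPDIR as an override are choices made for this example:

```shell
#!/bin/sh
# tmpname PREFIX: print a name built from PREFIX, the shell's process ID,
# and the current date and time, under $TMPDIR (or /tmp if that is unset).
tmpname() {
    printf '%s/%s%s.%s\n' "${TMPDIR:-/tmp}" "$1" "$$" "$(date +'%Y%m%d%H%M%S')"
}

output=$(tmpname myprog)
# With noclobber (set -C), the redirection fails if the name is taken
( set -C; : > "$output" ) || echo "$output already exists" >&2
# ... use the file here ...
rm -f "$output"
```

The set -C step isn't part of the naming scheme itself; it just guards against the rare case where the chosen name is already in use.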

—JP and DJPH

[1]  On some systems, ls -t will list the files in one column, with the newest file first. Although that's usually a pain, I actually find that more convenient when I'm interested in the most recent files. If your system does that and you don't like the single-column display, you can use ls -Ct. On other systems, if a single-column display would be handy, use ls -1t; the "1" option means "one column." You can also use ls -lt, since long listings also list one file per line. Throughout this article, we'll assume you're using an ls version that makes multicolumn output.

[2]  Even in lses that use it, the -q option is the default only when ls's standard output is a terminal. If you pipe the output or redirect it to a file, remember to add -q.

[3]  On systems that don't support ls -b, pipe the ls -q output through cat -v or od -c (Section 12.4) to see what the nonprinting characters are.
