Chapter 45. Printing

45.1 Introduction to Printing

This chapter discusses printing, which is a surprisingly complicated subject. To understand why printing is so complicated, though, let's think a little bit about what you might want to print.

First, in the "olden days," we had line printers and their relatives: daisy-wheel printers, dot-matrix printers, and other pieces of equipment that generated typewriter-like output. Printing a simple text file was easy: you didn't need any special processing; you only needed some software to shove the file into the printer. If you wanted, you might add a banner page and do a little simple formatting, but that was really pretty trivial.

The one area of complexity in the printing system was the "spooling system," which had to do several things in addition to force-feeding the printer. Most printers were (and still are) shared devices. This means that many people can send jobs to the printer at the same time. There may also be several printers on which your file gets printed; you may care which one is used, or you may not. The spooling system needs to manage all this: receiving data from users, figuring out whether or not an appropriate printer is in use, and sending the file to the printer (if it's free) or storing the file somewhere (if the printer isn't free).

Historical note: why is this called the "spooling system"? Dave Birnbaum, a Principal Scientist at Xerox, says:

"SPOOL (Simultaneous Printing Off and On Line)" It was written for the early IBM mainframes (of the 3-digit, i.e., 709 kind) and extended to the early 1401 machines. Output for the printer was sent to the spool system, which either printed it directly or queued it (on tape) for later printing (hence the on/off line). There was also a 2nd generation version where the 1401 would act as the printer controller for the (by then) 7094. The two were usually connected by a switchable tape drive that could be driven by either machine." [There's some controversy about exactly what the acronym means, but Dave's is as good as any I've heard. — JP]

The first few articles in this chapter, Section 45.2, Section 45.3, Section 45.4, and Section 45.5, discuss the basic Unix spooling system and how to work with it as a user.

The next few articles talk about how to format articles for printing — not the kind of fancy formatting people think of nowadays, but simpler things like pagination, margins, and so on, for text files that are to be sent to a line printer or a printer in line-printer emulation mode. Section 45.6 describes this kind of simple formatting, and Section 45.7 gets a little more complicated on the same subject.

Historical note number two: why is the print spooler called lp or lpr? It typically spooled text to a line printer, a fast printer that used a wide head to print an entire line at a time. These printers are still common in data processing applications, and they can really fly!

In the mid-1970s, lots of Unix people got excited about typesetting. Some typesetters were available that could be connected to computers, most notably the C/A/T phototypesetter. Programs like troff and TeX were developed to format texts for phototypesetters. Typesetting tools are still with us, and still very valuable, though these days they generally work with laser printers via languages like PostScript. They're discussed in Section 45.10 through Section 45.17, along with the ramifications of fancy printing on Unix.

Finally, Section 45.19 is about the netpbm package. It's a useful tool for people who deal with graphics files. netpbm converts between different graphics formats.

— ML

45.2 Introduction to Printing on Unix

Unix used a print spooler to allow many users to share a single printer long before Windows came along. A user can make a printing request at any time, even if the printer is currently busy. Requests are queued and processed in order as the printer becomes available.

Unix permits multiple printers to be connected to the same system. If there is more than one printer, one printer is set up as the default printer, and print jobs are sent there if no printer is specified.

45.2.1 lpr-Style Printing Commands

Many systems use the lpr command to queue a print job. When you use lpr, it spools the file for printing.

$ lpr notes

The lpq command tells you the status of your print jobs by showing you the print queue for a given printer.

$ lpq
lp is ready and printing
Rank   Owner      Job  Files                Total Size
active fred       876  notes                7122 bytes
1st    alice      877  standard input       28372 bytes
2nd    john       878  afile bfile ...      985733 bytes

The word active in the Rank column shows the job that's currently printing. If your job does not appear at all on the listing, it means your job has finished printing or has been completely written into the printer's input buffer (or perhaps that you accidentally printed it to a different queue). If a job is not active, it's still in the queue.

You can remove a job with the lprm command. (Run lpq first to get the job number.)

$ lprm 877 
dfA877host  dequeued
cfA877host  dequeued
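
The two steps above (look up your job numbers with lpq, then hand them to lprm) can be combined. What follows is a minimal sketch, assuming lpq prints the column layout shown earlier: one status line, one header line, then one job per line with the owner in column 2 and the job number in column 3. Real lpq implementations vary, so check yours first. The function name jobs_for is made up for this example.

```shell
# jobs_for USER: read lpq-style output on stdin, print USER's job numbers.
# Assumes the layout shown above: a status line, a header line, then jobs.
jobs_for() {
    awk -v user="$1" 'NR > 2 && $2 == user { print $3 }'
}

# Sample queue listing copied from the lpq example above.
sample='lp is ready and printing
Rank   Owner      Job  Files                Total Size
active fred       876  notes                7122 bytes
1st    alice      877  standard input       28372 bytes
2nd    john       878  afile bfile ...      985733 bytes'

printf '%s\n' "$sample" | jobs_for alice     # prints 877

# With a real spooler, a loop like this would dequeue each of your jobs:
#   for job in $(lpq | jobs_for "$USER"); do lprm "$job"; done
```

The loop form is deliberate: if you have no jobs queued, the list is empty and lprm is never run.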

The command lpc status (Section 45.3) can be used to determine which printers are connected to your system and their names. If there is more than one printer, you can then use the -P option with lpr, lpq and lprm to specify a printer destination other than the default. For instance, if a laser printer is configured as laserp, you can enter:

$ lpr -Plaserp myfile

If you'll be using a certain printer often, put its name in the PRINTER environment variable (Section 45.4).

If you're using an older system that has only lp (see below), or if you'd like a fancier lpr that supports all sorts of handy features, take a peek at LPRng (available at http://www.lprng.com). It supports everything standard lpr does and more, including a GUI for detailed configuration.

45.2.2 lp-Style Printing Commands

The System V-style print system, which Solaris uses by default, has the lp command to queue a print job. (Solaris also optionally includes lpr-style printing commands, if you install the BSD compatibility package.) When you use lp, it spools the file for printing and returns the request id of your print job. The request id can later be used to cancel the print job, if you decide to do so.

$ lp notes
request-id is lp-2354 (1 file)

The lpstat command can be used to check on the status of your print jobs. Like lpq, it will tell whether your job is in the queue or fully sent to the printer. Unlike lpq, it shows you only your own jobs by default:

$ lpstat
lp-2354          14519 fred     on lp

The message on lp indicates that the job is currently printing. If your job does not appear at all on the listing, it means your job has either finished printing or has been completely written into the printer's input buffer (or you accidentally printed it to a different queue). If the job is listed, but the on lp message does not appear, the job is still in the queue. You can see the status of all jobs in the queue with the -u option. You can cancel a job with the cancel command.

$ lpstat -u
lp-2354          14519 fred     on lp
lp-2355          21321 alice
lp-2356           9065 john
$ cancel lp-2356
lp-2356: cancelled

The lpstat command can be used to determine what printers are connected to your system and their names. If there is more than one printer, you can then use the -d option with lp to specify a printer destination other than the default. For instance, if a laser printer is configured as laserp, then you can enter:

$ lp -dlaserp myfile

If you'll be using a certain printer often, put its name in the LPDEST environment variable (Section 45.4).

DD, TOR, and JP

45.3 Printer Control with lpc

The lpc(8) command, for lpr-style printing setups, is mostly for the superuser. (You may find it in a system directory, like /usr/sbin/lpc.) Everyone can use a few of its commands; this article covers those.

You can type lpc commands at the lpc> prompt; when you're done, type exit (or CTRL-d):

% lpc
lpc> help status
status          show status of daemon and queue
lpc> ...
lpc> exit
%

Or you can type a single lpc command from the shell prompt:

% lpc status imagen
imagen:
        queuing is enabled
        printing is enabled
        no entries
        no daemon present
%

The printer daemon (Section 1.10) watches the queue for jobs that people submit with lpr (Section 45.2). If queuing is disabled (usually by the system administrator), lpr won't accept new jobs.

lpc controls only printers on your local host. lpc won't control printers connected to other hosts, though you can check the queue of jobs (if any) waiting on your local computer for the remote printer.

The commands anyone can use are:

restart [printer]

This tries to start a new printer daemon. Do this if something makes the daemon die while there are still jobs in the queue (lpq or lpc status will tell you this). It's worth trying when the system administrator is gone and the printer doesn't seem to be working. The printer name can be all to restart all printers. The printer name doesn't need an extra P. For example, to specify the foobar printer to lpr, you'd type lpr -Pfoobar. With lpc, use a command like restart foobar.

status [printer]

Shows the status of daemons and queues on the local computer (see the preceding example). The printer name can be all to show all printers.

help [command]

By default, gives a list of lpc commands, including ones for the superuser only. Give it a command name and it explains that command.

exit

Quits from lpc.
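
To decide when restart is worth trying, you can scan the output of lpc status for a queue that has jobs waiting but no daemon. This is only a sketch: it assumes the layout shown in the earlier example (the printer name followed by a colon on its own line, details indented beneath it), and the "1 entry in spool area" line in the second sample is an assumed wording for a non-empty queue. The function name stuck_queues is hypothetical.

```shell
# stuck_queues: read `lpc status` output on stdin and print each printer
# that reports "no daemon present" without also reporting "no entries".
stuck_queues() {
    awk '
        function report() {
            if (printer != "" && nodaemon && !empty) print printer
        }
        /^[^ \t].*:$/ {            # a new "name:" section starts
            report()
            printer = $0; sub(/:$/, "", printer)
            nodaemon = 0; empty = 0
            next
        }
        /no entries/        { empty = 1 }
        /no daemon present/ { nodaemon = 1 }
        END { report() }
    '
}

# An idle queue, as in the example above: nothing to restart.
idle='imagen:
        queuing is enabled
        printing is enabled
        no entries
        no daemon present'

# A queue with a job waiting and no daemon (assumed wording).
stuck='foobar:
        queuing is enabled
        printing is enabled
        1 entry in spool area
        no daemon present'

printf '%s\n%s\n' "$idle" "$stuck" | stuck_queues    # prints foobar

# With a real spooler:
#   for p in $(lpc status all | stuck_queues); do lpc restart "$p"; done
```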

— JP

45.4 Using Different Printers

Each printer on your system should have a name. By default, commands that send a file to a printer assume that the printer is named lp (a historical artifact; it stands for "Line Printer"). If you're using a single-user workstation and have a printer connected directly to your workstation, you can name your printer lp and forget about it.

In many environments, there are more options available: e.g., there are several printers in different locations that you can choose from. Often, only one printer will be able to print your normal documents: you may need to send your print jobs to a PostScript printer, not the line printer that the accounting department uses for billing.

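There are two ways to choose a printer:

Name the printer on the command line. lpr-style commands take the -P option (for example, lpr -Plaserp myfile); lp-style commands take the -d option (for example, lp -dlaserp myfile).

Set an environment variable. lpr-style commands read the printer name from PRINTER; lp-style commands read it from LPDEST. A printer named on the command line overrides the variable.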

Note that Solaris and others that use lp can include both the lp and lpr print commands. This can make things confusing, particularly if you're using a script to process documents, and that script automatically sends your documents to the printer. Unless you know how the script works, you won't know which variable to set. I'd suggest setting both PRINTER and LPDEST for these systems.

By the way, if you have only one printer, but you've given it some name other than lp, the same solution works: just set PRINTER or LPDEST to the appropriate name.
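
As a rough sketch of the advice to set both variables, the lines below could go in a login file such as ~/.profile. The printer name laserp is the example name used earlier in this chapter. The helper default_printer is hypothetical, and the order in which it consults the two variables is just a choice made for this example, not something the print commands themselves define.

```shell
# default_printer: print the name a script might use when no printer is given.
# PRINTER (lpr-style) is consulted first, then LPDEST (lp-style), then "lp".
default_printer() {
    printf '%s\n' "${PRINTER:-${LPDEST:-lp}}"
}

# Set both variables so lpr-style and lp-style commands agree.
PRINTER=laserp
LPDEST=$PRINTER
export PRINTER LPDEST

default_printer        # prints laserp
```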

— ML

45.5 Using Symbolic Links for Spooling

When you print a file, the file is copied to a "spooling directory." This can be a problem if you want to print a very large file: the copy operation might take a long time, or the act of copying might fill the spooling directory's filesystem.

Systems with the lpr family of commands provide a workaround for this problem. The -s option makes a symbolic link (Section 10.4) to your file from the spooling directory.

Here's such a command:

% lpr -s directions

Rather than copying directions, lpr creates a symbolic link to directions. The symbolic link is much faster, and you're unlikely to get a "filesystem full" error.

Using a symbolic link has one important side effect. Because the file isn't hidden away in a special spooling directory, you can delete or modify it after you give the lpr command and before the printer is finished with it. This can have interesting side effects; be careful not to do it.

Of course, this warning applies only to the file that actually goes to the printer. For example, when you format a troff file (Section 45.16) for a PostScript printer and then print using -s, you can continue to modify the troff file, because it's the resulting PostScript file that actually goes to the printer (thus the PostScript file, not the troff file, is symbolically linked).
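
Since -s only pays off for large files, a script might use it selectively. The helper below is a hedged sketch: the name spool_flag and the one-million-byte default threshold are arbitrary choices for illustration, not anything lpr defines.

```shell
# spool_flag FILE [LIMIT]: print "-s" if FILE is larger than LIMIT bytes
# (default 1000000, an arbitrary threshold chosen for this example).
spool_flag() {
    [ -f "$1" ] || return 1
    size=$(( $(wc -c < "$1") ))
    if [ "$size" -gt "${2:-1000000}" ]; then
        printf '%s\n' -s
    fi
}

# Intended use with a real spooler (the substitution is left unquoted on
# purpose, so that an empty result adds no argument):
#   lpr $(spool_flag bigfile) bigfile
# Whenever -s is used, leave the file alone until the job has printed.
```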

— ML

45.6 Formatting Plain Text: pr

The line printer spooler (Section 45.2) prints what you send it. If you send it a continuous stream of text (and the printer is set up to print text files rather than PostScript), that's probably just what you'll get: no page breaks, indenting, or other formatting features.

That's where pr comes in. It's a simple formatter that breaks its input into "pages" that will fit onto a 66-line page. (You can change that length.) It adds a header that automatically includes the date and time, the filename, and a page number. It also adds a footer that ensures that text doesn't run off the bottom of the page.

This is just what you want if you are sending program source code or other streams of unbroken text to a printer. For that matter, pr is often very handy for sending text to your screen. In addition to its default behavior, it has quite a few useful options. Here are a few common options:

-f

Separate pages using formfeed character (^L) instead of a series of blank lines. (This is handy if your pages "creep" down because the printer folds some single lines onto two or three printed lines.)

-h str

Replace default header with string str. See Section 21.15.

-l n

Set page length to n (default is 66).

-m

Merge files, printing one in each column (can't be used with -num and -a). Text is chopped to fit. See Section 21.15. This is a poor man's paste (Section 21.18).

-s c

Separate columns with c (default is a tab).

-t

Omit the page header and trailing blank lines.

-w num

Set line width for output made into columns to num (default is 72).

+num

Begin printing at page num (default is 1).

-num

Produce output having num columns (default is 1). See Section 21.15.

Some versions of pr also support these options:

-a

Multicolumn format; list items in rows going across.

-d

Double-spaced format.

-e cn

Set input tabs to every nth position (default is 8), and use c as field delimiter (default is a tab).

-F

Fold input lines (avoids truncation by -a or -m).

-i cn

For output, replace whitespace with field delimiter c (default is a tab) every nth position (default is 8).

-n cn

Number lines with numbers n digits in length (default is 5), followed by field separator c (default is a tab). See also nl (Section 12.13).

-o n

Offset each line n spaces (default is 0).

-p

Pause before each page. (pr rings the bell by writing an ALERT character to standard error and waits for a carriage-return character to be read from /dev/tty (Section 36.15).)

-r

Suppress messages for files that can't be found.

Let's put this all together with a couple of examples:
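
The command lines below are illustrative sketches built only from the options listed above. The file contents are placeholders, and the behavior assumed is that of a POSIX-style pr such as the GNU version; check your own pr manpage before relying on any of them.

```shell
# Build a small sample file to format (placeholder content).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$sample"

# Paginate with the default header, but on a 20-line page.
pr -l 20 "$sample"

# No header or trailer, each line shifted right four spaces.
pr -t -o 4 "$sample"

# Two columns, no header, in a 40-character-wide layout.
pr -2 -t -w 40 "$sample"

# Custom header text on a 20-line page. With a spooler you would
# normally pipe the result onward, for example:
#   pr -h "Draft notes" notes | lpr
pr -h "Draft notes" -l 20 "$sample"

rm -f "$sample"
```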

— TOR

45.7 Formatting Plain Text: enscript

enscript is a handy program that takes your text files and turns them into PostScript. enscript comes with a wide variety of formatting options. There is a GNU version available, and a few Unixes include a version by default. enscript is particularly useful when your main printer speaks primarily PostScript.

Detailed information on everything enscript can do is available in its manpage, but here are a few examples:

% enscript -G stuff.txt 
   Fancy ("Gaudy") headers
% enscript -2r stuff.txt 
   Two-up printing -- two pages side-by-side on each page of paper
% enscript -2Gr stuff.txt 
   Two-up with fancy headers
% enscript -P otherps stuff.txt 
   Print to the otherps printer instead of the default
% enscript -d otherps stuff.txt 
   Ditto
% enscript -i 4 stuff.txt 
   Indent every line four spaces
% enscript --pretty-print=cpp Object.cc 
   Pretty print C++ source code
% enscript -E doit.pl 
   Pretty print doit.pl (and automagically figure out that it's Perl from the .pl suffix)

One thing to watch for: enscript's default page size is A4, and in the United States most printers want letter-sized pages. You can set the default page size to letter when installing enscript (many U.S. pre-built binary packages do this for you), or you can use the -M letter or --media=letter option when you call enscript.

If you want a default set of flags to be passed to enscript, set the ENSCRIPT environment variable. Anything you pass on the command line will override values in ENSCRIPT.
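
For instance, a login file might contain something like the following. The particular flags are just the ones from the examples above, combined for illustration.

```shell
# Defaults for every enscript run: two-up, rotated, gaudy headers,
# letter-sized media.
ENSCRIPT="-2Gr -M letter"
export ENSCRIPT

# enscript stuff.txt        # picks up the defaults above
# enscript -1 stuff.txt     # a command-line option (one column) overrides -2
```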

— DJPH

45.8 Printing Over a Network

Sometimes you'd like to be able to print to a printer that's physically attached to another Unix machine. lpd, the print spool daemon, supports this easily.

lpd is configured using the printcap printer capabilities database, generally stored in /etc/printcap. A local printer is typically given an entry that looks something like this:

lp|local line printer:\
       :lp=/dev/lpt0:\
       :sd=/var/spool/output/lpd:lf=/var/log/lpd-errs:

The first line sets the printer name, in this case lp, and gives it a more descriptive name (local line printer) as well. The rest of the lines define various parameters for this printer using a parameter=value format. lp specifies the printer device — in this case, /dev/lpt0. sd specifies the local spool directory, that is, where lpd will store spooled files while it's working with them. lf specifies the log file, where lpd will write error messages and the like for this printer.

To set up a remote printer, all you have to do is provide a remote machine (rm) and a remote printer (rp) instead of a printer device:

rlp|printhost|remote line printer:\
       :rm=printhost.domain.com:rp=lp:\
       :sd=/var/spool/output/printhost:lf=/var/log/lpd-errs:

Note that we added another name; since this is the default printer for the host printhost, either rlp or printhost will work as printer names. We also used a different spool directory, to keep files spooled for printhost separate from local files; this isn't strictly necessary, but it's handy. Don't forget to create this spool directory before trying to spool anything to this printer!
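
One way to avoid forgetting a spool directory is to list every sd value in the printcap file and check which ones are missing. This is a sketch that assumes entries are laid out like the two shown above, with sd=... between colons on a single line. The function names are hypothetical.

```shell
# spool_dirs FILE: print the value of every sd= field in a printcap-style file.
spool_dirs() {
    sed -n 's/.*:sd=\([^:]*\):.*/\1/p' "$1"
}

# missing_spool_dirs FILE: print only the spool directories that don't exist yet.
missing_spool_dirs() {
    spool_dirs "$1" | while IFS= read -r dir; do
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
}

# Typical use, as root (the ownership and mode a spool directory needs
# vary by system and are not handled here):
#   missing_spool_dirs /etc/printcap |
#       while IFS= read -r d; do mkdir -p "$d"; done
```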

Some network connected printers have lpd-compatible spoolers built in. Talking to one of these printers is just as easy; just provide the printer's hostname for rm. Generally you won't have to provide rp unless the printer supports different printing modes by using different remote printer names, since the default name lp is almost always supported by these sorts of printers.

— DJPH

45.9 Printing Over Samba

Samba provides SMB networking to Unix boxes; in English, that means it allows Unix machines to share disks and printers with Windows machines and vice versa. Chapter 49 details Samba; here we'll talk a bit about tricks for printing over Samba, since it's so useful and parts of it are fairly arcane.

45.9.1 Printing to Unix Printers from Windows

This is the easy one. Simply configure your printer normally using printcap, then set this in your smb.conf:

    load printers = yes

This tells Samba to read the printcap file and allow printing to any printer defined there. The default [printers] section automatically advertises all printers found and allows anyone with a valid login to print to them. You may want to make them browsable or printable by guest if you're not particularly worried about security on your network. Some Windows configurations will need guest access to browse, since they use a guest login to browse rather than your normal one; if you can't browse your Samba printers from your Windows client, try setting up guest access and see if that fixes it.

If you want to get really fancy, current versions of Samba can support downloading printer drivers to clients, just like Windows printer servers do. Take a look at the PRINTER_DRIVER2.txt file in the Samba distribution for more about how to do this.

45.9.2 Printing to Windows Printers from Unix

This one's a little more tricky. lpd doesn't know how to print to a Windows printer directly, or how to talk to Samba. However, lpd does know how to run files through a filter (Section 45.17). So what we'll do is provide a filter that hands the file to Samba, and then send the print job right to /dev/null:

laserjet|remote SMB laserjet via Samba:\
    :lp=/dev/null:\
    :sd=/var/spool/lpd/laser:\
    :if=/usr/local/samba/bin/smbprint:

Samba comes with a sample filter called smbprint; it's often installed in an examples directory and will need to be moved to somewhere useful before setting this up. smbprint does exactly what we want; it takes the file and uses smbclient to send it to the right printer.

How does smbprint know which printer to send it to? It uses a file called .config in the given spool directory, which looks something like this:

server=WINDOWS_SERVER
service=PRINTER_SHARENAME
password="password"

The smbprint script is reasonably well documented in its comments. Look through it and tweak it to fit your own needs.
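
Because .config holds a password, it is worth creating it with tight permissions. The helper below is a sketch under assumptions: write_smb_config is a made-up name, the spool directory must already exist, and the file has to end up readable by whatever account your lpd runs filters as, which varies by system.

```shell
# write_smb_config DIR SERVER SERVICE PASSWORD
# Create DIR/.config in the three-line format shown above.
write_smb_config() {
    [ -d "$1" ] || { echo "write_smb_config: no such directory: $1" >&2; return 1; }
    (
        umask 077       # the file holds a password: owner access only
        printf 'server=%s\nservice=%s\npassword="%s"\n' "$2" "$3" "$4" \
            > "$1/.config"
    ) && chmod 600 "$1/.config"
}

# Example, using the spool directory from the printcap entry above:
#   write_smb_config /var/spool/lpd/laser WINDOWS_SERVER PRINTER_SHARENAME 'password'
```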

— DJPH

45.10 Introduction to Typesetting

Once upon a time, printers were simple. You hooked them up to your machine and dumped text out to them, and they printed the text. Nothing fancy, and not very pretty either. As printers got smarter, they became capable of more things, printing in a different font, perhaps. Printing got a bit more complex. If you wanted to use fancy features, you had to embed special characters in your text, specific to the printer.

Printers got even smarter, and could draw pictures, print images, and use all sorts of fonts. They started using complex languages (Section 45.14) to print, which made dealing with them more complex but at least somewhat more consistent. People wrote tools to convert text (Section 45.7) so it could be printed.

Webster defines typesetting as "the process of setting material in type or into a form to be used in printing," literally, the setting of type into a printing press. As computers have gotten more sophisticated, it has come to include the formatting of text and images to send to typesetting machines and then, later, smart printers. These days, your average printer is pretty smart and can handle everything the typesetters of old could do and more. Windows systems provide What You See Is What You Get (WYSIWYG, pronounced whiz-ee-wig) editors as a matter of course, most of which do all of their typesetting without any user intervention (and often badly, to boot).

On Unix, typesetting generally involves describing the formatting you want using a formatting language and then processing the source file to generate something that a printer can understand. There are a variety of tools and languages that do this, with various purposes, strengths, and weaknesses. Many formatting languages are markup languages, that is, they introduce formatting information by "marking up" the text you want formatted.

There is an entire science (and art) of typography that we won't try to get into here. My personal favorite books on the subject are Robert Bringhurst's The Elements of Typographic Style for general typography and Donald Knuth's Digital Typography for issues of typesetting with computers.

What we will try to cover are formatting languages (Section 45.12 and Section 45.13), printer languages (Section 45.14), and ways to use Unix to get those formatting languages out to your printer usefully (Section 45.15 through Section 45.17).

Relatively recently, open source WYSIWYG tools have become available for Unix. OpenOffice, available at http://www.openoffice.org, is a good example. OpenOffice does its own typesetting behind the scenes and dumps out PostScript. If you don't have a PostScript printer and you're interested in using something like OpenOffice, Section 45.18 might help.

— DJPH

45.11 A Bit of Unix Typesetting History

Unix was one of the first operating systems to provide the capability to drive a typesetter. troff is both a markup language and a tool for generating typesetter output.

Originally, troff was designed to drive a device called a C/A/T phototypesetter, and thus it generated a truly frightening collection of idiosyncratic commands. For a while, there were several versions of troff and troff-related tools, including tools to translate C/A/T output into something useful, versions of troff that output slightly saner things than C/A/T, and so forth. It was all very confusing.

Most systems these days still have a version of troff, often GNU's groff, which outputs PostScript and other device-independent formats. Unix manpages are still written in nroff, a related tool that takes the same input and spits out ASCII-formatted text, using the man macro package. However, most people don't use troff and its related tools for general text formatting much any more.

So why do we care about troff? The Jargon Dictionary (Version 4.2.2) has this to say:

troff /T'rof/ or /trof/ n.

The gray eminence of Unix text processing; a formatting and phototypesetting program, written originally in PDP-11 assembler and then in barely-structured early C by the late Joseph Ossanna, modeled after the earlier ROFF which was in turn modeled after the Multics and CTSS program RUNOFF by Jerome Saltzer (that name came from the expression "to run off a copy"). A companion program, nroff, formats output for terminals and line printers.

In 1979, Brian Kernighan modified troff so that it could drive phototypesetters other than the Graphic Systems CAT. His paper describing that work ("A Typesetter-independent troff," AT&T CSTR #97) explains troff's durability. After discussing the program's "obvious deficiencies — a rebarbative input syntax, mysterious and undocumented properties in some areas, and a voracious appetite for computer resources" and noting the ugliness and extreme hairiness of the code and internals, Kernighan concludes:

"None of these remarks should be taken as denigrating Ossanna's accomplishment with TROFF. It has proven a remarkably robust tool, taking unbelievable abuse from a variety of preprocessors and being forced into uses that were never conceived of in the original design, all with considerable grace under fire."

The success of TeX and desktop publishing systems have reduced troff's relative importance, but this tribute perfectly captures the strengths that secured troff a place in hacker folklore; indeed, it could be taken more generally as an indication of those qualities of good programs that, in the long run, hackers most admire.

— DJPH

45.12 Typesetting Manpages: nroff

The definitive documentation system for every Unix is manpages. (Much GNU software is documented fully in info pages instead, but manpages are so foundational that even those packages generally provide some sort of manpage.) What is a manpage, then?

A manpage is a text file, marked up with nroff commands, specifically using the man macro package. (Well, technically, using the tmac.an standard macro package: t/nroff takes a -m option to specify which tmac.* macro package to use. Thus, man uses nroff -man.) A simple manpage (in this case, the yes(1) manpage from FreeBSD, which is written with the BSD mdoc macros, a newer relative of the classic man macros) looks something like this:

.Dd June 6, 1993
.Dt YES 1
.Os BSD 4
.Sh NAME
.Nm yes
.Nd be repetitively affirmative
.Sh SYNOPSIS
.Nm
.Op Ar expletive
.Sh DESCRIPTION
.Nm Yes
outputs
.Ar expletive ,
or, by default,
.Dq y ,
forever.
.Sh HISTORY
The
.Nm
command appeared in
.At 32v .

This collection of difficult-to-read nroff commands, when formatted by nroff via the man command on my FreeBSD machine, looks something like this:

YES(1)                  FreeBSD General Commands Manual                 YES(1)

NAME
     yes - be repetitively affirmative

SYNOPSIS
     yes [expletive]

DESCRIPTION
     Yes outputs expletive, or, by default, "y", forever.

HISTORY
     The yes command appeared in Version 32V AT&T UNIX.

4th Berkeley Distribution        June 6, 1993                                1

The various nroff/man macros allow you to define things such as the name of the command, the short description of what it does, the list of arguments, and so forth, and they format it all into the standard look of a manpage. To write your own manpages, take a look at existing manpages for examples, and read the man(1) and man(7) manpages.
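
As a starting point for your own page, here is a hedged sketch that writes a tiny manpage with the classic man macros (.TH for the title line, .SH for section headings, .B and .I for bold and italic) and formats it with nroff -man if nroff is installed. The hello command and its wording are invented for the example.

```shell
# Write a minimal manpage for a hypothetical command called "hello".
page=$(mktemp)
cat > "$page" <<'EOF'
.TH HELLO 1 "June 1993" "Example" "User Commands"
.SH NAME
hello \- print a friendly greeting
.SH SYNOPSIS
.B hello
.RI [ name ]
.SH DESCRIPTION
.B hello
prints a greeting to
.IR name ,
or to the world if no name is given.
EOF

# Format it for the terminal, if nroff is available on this system.
if command -v nroff >/dev/null 2>&1; then
    nroff -man "$page" || echo "nroff could not format $page" >&2
fi
```

Piping the nroff output through a pager, as man itself does, makes longer pages easier to read.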

— DJPH

45.13 Formatting Markup Languages: troff, LaTeX, HTML, and So On

Section 45.12 shows an example of a simple formatting markup language: the one used by man via nroff. Don't laugh; it may seem arcane, but it is fairly simple. Like all markup languages, it attempts to abstract out certain things, to allow you to describe what you'd like the end result to look like. Manpages are simple to describe, so the markup language for them is relatively simple.

Full troff is somewhat more complex, both because it allows expressing far more complex ideas, and because it allows definition of macros to extend the core markup language. Similarly, TeX (pronounced "tek") is essentially a programming language for typesetting. It provides a very thorough model of typesetting and the ability to, essentially, write programs that generate the output you want.

Available on top of TeX is LaTeX (pronounced "lah-tek" or "lay-tek"), a complex macro package focused on general document writing. It allows you to describe the general structure of your document and let LaTeX (and underneath, TeX) sort out the "proper" way to typeset that structure. This sort of markup is very different to deal with than working in a WYSIWYG word processor, where you have to do all of the formatting yourself. As an example, a simple LaTeX document looks something like this (taken from The Not So Short Introduction to LaTeX2e):

\documentclass[a4paper,11pt]{article}
% define the title
\author{H.~Partl}
\title{Minimalism}
\begin{document}
% generates the title
\maketitle
% insert the table of contents
\tableofcontents
\section{Start}
Well, and here begins my lovely article.
\section{End}
\ldots{} and here it ends.
\end{document}

Much like the nroff input earlier, this describes the structure of the document by inserting commands into the text at appropriate places. The LyX editor (http://www.lyx.org) provides what they call What You See Is What You Mean (WYSIWYM, or whiz-ee-whim) editing by sitting on top of LaTeX. Lots of information about TeX and LaTeX is available at the TeX Users' Group web site, http://www.tug.org. TeX software is available via the Comprehensive TeX Archive Network, or CTAN, at http://www.ctan.org. I strongly recommend the teTeX distribution as a simple way to get a complete installation of everything you need in one swell foop.

In contrast, while HTML is also a markup language, its markup is focused primarily on display and hypertext references rather than internal document structure. HTML is an application of SGML; you probably know about it already because it is the primary display markup language used on the web. The following is essentially the same as the sample LaTeX document, but marked up using HTML formatting:

<html>
<head>
<title>Minimalism</title>
</head>
<body>
<h1>Minimalism</h1>
...table of contents...
<h2>Start</h2>
<p>Well, and here begins my lovely article.</p>
<h2>End</h2>
<p>&hellip; and here it ends.</p>
</body>
</html>

Other markup languages common on Unixes include DocBook, which is also an application of SGML or XML, and in which a lot of Linux documentation is written, and texinfo, the source language of info pages, in which most GNU documentation is written. The manuscript for this edition of Unix Power Tools is written in a variant of SGML-based DocBook, in fact.

— DJPH

45.14 Printing Languages — PostScript, PCL, DVI, PDF

Printing languages, also sometimes called page description languages, are representations of exactly what needs to be on the screen or printed page. They are generally a collection of drawing commands that programs can generate, often with extra features to make drawing complex pictures or doing fancy things with text easier.

PostScript was developed by Adobe in the early 1980s to provide some sort of generic page description language. It is a fairly complete language; I've written complex PostScript programs by hand. This makes it much easier to write software that can generate PostScript output. Modern troffs can generate PostScript, and ghostscript can be used to process PostScript into printer-specific output for certain non-PostScript printers, so PostScript is a very useful output form.

Printer Command Language, or PCL, was originally developed by Hewlett-Packard, also in the early 1980s, to provide a generic printer language for their entire range of printers. Early versions were very simple, but PCL 3 was sophisticated enough that other printer manufacturers started to emulate it, and it became a de facto standard. PCL's more current incarnations are quite flexible and capable. Incidentally, ghostscript can turn PostScript into PCL, and most printers that can't speak PostScript can speak some form of PCL these days. My primary printer speaks PCL 5e, and I use it from both Windows machines and Unix machines.

DVI stands for "device independent" and is the primary output from TeX (and thus LaTeX). Like PostScript, it's a generic language for describing the printed page. There are converters that convert DVI into PostScript, PCL, and PDF.

PDF is Adobe's successor to PostScript. PDF has a special place on the web, because it's been promoted as a way to distribute documents on the web and have them displayed consistently in a wide variety of environments, something not possible in HTML. This consistency is possible for the same reasons any page description language can provide it: the focus of such a language is on describing exactly what the page should look like rather than being human readable or editable, like most markup languages. However, Adobe has provided Acrobat Reader free for multiple platforms and promoted PDF extensively, so it is the de facto standard for page description languages on the web these days.

— DJPH

45.15 Converting Text Files into a Printing Language

Section 45.7 introduced one tool that can convert plain text into PostScript for printing. In general, if your printer isn't an old text-only printer and you want to be able to print text files, you'll need some sort of filter (or filters) to convert the text into something useful.

If your printer supports PostScript, tools like a2ps and enscript (Section 45.7) can do what you need. If your printer supports PCL or another printer language, you may want to add ghostscript to the mix. ghostscript can read PostScript and PDF and output correct representations to a variety of printers. Incidentally, ghostscript can also do a host of other useful things, like create PDFs from PostScript and the like.
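
As an illustration of that last point, here is a minimal sketch of turning PostScript into PDF with ghostscript's pdfwrite device. The function name ps_to_pdf is made up for this example, -dBATCH is used so that gs exits when the input is done, and the DRY_RUN variable is only a convenience for seeing the command without running it.

```shell
# ps_to_pdf is a hypothetical helper, not something ghostscript provides.
# Usage: ps_to_pdf input.ps [output.pdf]
# Set DRY_RUN=1 to print the gs command instead of running it.
ps_to_pdf() {
    src=$1
    dst=${2:-${src%.ps}.pdf}    # default: same name with a .pdf suffix
    set -- gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite "-sOutputFile=$dst" "$src"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

With this loaded, ps_to_pdf background.ps would leave the result in background.pdf. ghostscript also ships a ps2pdf wrapper script that does much the same job.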

Here's an example of using enscript, ghostscript, and lpr to print the background.txt file to my printer (an HP LaserJet 6L):

% enscript -2Gr background.txt -o background.ps
% gs -q -dNOPAUSE -sDEVICE=ljet4 -sOutputFile=background.lj4 background.ps -c quit
% lpr background.lj4
% rm background.lj4 background.ps

-2Gr tells enscript that I want two-up pages (-2) with fancy headers (-G), rotated to landscape (-r), and -o sends the output to background.ps (remember that enscript generates PostScript). -q tells gs to run quietly. -dNOPAUSE disables ghostscript's usual behavior of pausing and prompting at the end of each page. -sDEVICE=ljet4 says to create output for a ljet4 device. -sOutputFile=background.lj4 redirects the output of ghostscript to background.lj4, and -c quit says to quit once background.ps is done. Then we use lpr to spool the now-ready output file, delete the temporary files, and we're all done.

Seems like sort of a pain, but it does show all of the steps needed to get that output to go to the printer properly. Section 45.17 shows how to arrange for most of that to be done for you by the spooler automatically.
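
In the meantime, the four steps above can be wrapped in a small shell function. This is only a sketch under a few assumptions: print_text and run are made-up names, the ljet4 device is carried over from the example and won't suit every printer, and mktemp -d is assumed to be available for the scratch directory.

```shell
# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# print_text (hypothetical helper): text -> PostScript -> PCL -> spooler.
print_text() {
    txt=$1
    tmpdir=$(mktemp -d) || return 1
    run enscript -2Gr "$txt" -o "$tmpdir/job.ps" &&
    run gs -q -dNOPAUSE -sDEVICE=ljet4 "-sOutputFile=$tmpdir/job.lj4" \
        "$tmpdir/job.ps" -c quit &&
    run lpr "$tmpdir/job.lj4"
    rc=$?
    rm -rf "$tmpdir"    # the temporary files go away even if a step failed
    return "$rc"
}
```

Running DRY_RUN=1 print_text background.txt shows the three commands without touching the printer; drop DRY_RUN to print for real.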

— DJPH

45.16 Converting Typeset Files into a Printing Language

Section 45.15 showed the steps necessary to convert plain text into something printable. Generally the steps involved are similar for a typeset source file, with perhaps an extra step or two.

troff generates PostScript by default in most installations these days, or it can be made to easily enough. GNU troff (groff) can also generate PCL, DVI, and HTML by using the appropriate -T option.
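
As a rough guide to which -T option goes with which output language, here is a small lookup function. The function name groff_device is invented for this sketch; the device names (ps, lj4, dvi, html) are groff's own, but check your installation's documentation for the full list it supports.

```shell
# groff_device (hypothetical helper): map an output language to the
# device name that groff expects after -T.
groff_device() {
    case $1 in
        postscript|ps) echo ps ;;
        pcl)           echo lj4 ;;   # HP LaserJet 4 style PCL
        dvi)           echo dvi ;;
        html)          echo html ;;
        *)             echo "unknown output language: $1" >&2; return 1 ;;
    esac
}

# Example use (needs groff installed; report.ms is a made-up file that
# uses the ms macros):
#   groff -ms -T"$(groff_device pcl)" report.ms > report.lj4
```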

TeX generates DVI; the teTeX package includes dvips to convert DVI into PostScript, dvilj4 to convert it into PCL, dvipdf to convert it into PDF, and several others.

HTML can be converted into PostScript using html2ps.

Here's an example of using LaTeX, dvilj4, and lpr to print the article.tex file to my printer (an HP LaserJet 6L):

% latex article.tex
% dvilj4 article.dvi
% lpr article.lj
% rm article.lj article.dvi

This time it's slightly simpler than the example in Section 45.15, because the default options all do what we want. Even so, it can be made even simpler; Section 45.17 shows how.

— DJPH

45.17 Converting Source Files Automagically Within the Spooler

Section 45.15 and Section 45.16 showed what sorts of steps are required to get files into a printable form. They seem tedious, however, and computers are really quite good at tedium, so how can we make the spooler do all this for us automatically?

There are a couple of options. One of the more well-known is apsfilter, which is a set of filter scripts designed to work with lpd to automatically convert incoming source files to an appropriate output format before dumping them to the printer. Extensive information is available at http://www.apsfilter.org, and apsfilter has its own automatic setup scripts, but I'll give a quick overview to give you an idea of what configuring lpd's filters looks like.

In Section 45.9, we used an input filter trick to print to a Samba printer by putting an if entry in the printcap for that printer. if stands for "input filter," and there are several other kinds of filters available in standard lpd, including a ditroff filter, a Fortran filter (!), and an output filter.

apsfilter installs itself as the input filter for any printer it manages, and looks at the source file. It decides based on a number of pieces of information what kind of source file it is, automatically processes it with the right set of programs, and poof, you have correct output coming out of your printer. There's a reason this kind of tool is called a "magic filter" (and why the title of this section says "Automagically"). Having a magic filter installed makes life so much easier.

If you look at your printcap once apsfilter is installed, you'll notice this entry (or something much like it):

lp|local line printer:\
     ...
     :if=/usr/local/sbin/apsfilter:\
     ...

That's all it takes to hook into lpd and tell the spooler to give apsfilter a shot at the text on the way through. apsfilter looks at the incoming file and its configuration for the printer queue and converts the source into the appropriate printer language using whatever filter or set of filters is needed.
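
To give a feel for what a magic filter does, here is a toy version. It is emphatically not apsfilter's logic, just the general idea: peek at the first few bytes of the job, guess the type, and run the matching conversion. The function names are made up, the ljet4 device is borrowed from Section 45.15, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Toy magic filter: an illustration only, not how apsfilter works inside.

# sniff_type: classify a file by its first four bytes.
sniff_type() {
    case $(head -c 4 "$1") in
        '%PDF') echo pdf ;;
        '%!'*)  echo postscript ;;
        *)      echo text ;;
    esac
}

# magic_filter: lpd hands the job to an input filter on stdin and sends
# the filter's stdout to the printer, so save stdin to a file, inspect it,
# and write printer-ready data to stdout.
magic_filter() {
    job=$(mktemp) || return 1
    cat > "$job"
    case $(sniff_type "$job") in
        postscript|pdf)
            gs -q -dNOPAUSE -dBATCH -sDEVICE=ljet4 -sOutputFile=- "$job" ;;
        text)
            enscript -q -o - "$job" |
                gs -q -dNOPAUSE -dBATCH -sDEVICE=ljet4 -sOutputFile=- - ;;
    esac
    rc=$?
    rm -f "$job"
    return "$rc"
}

# A script installed as an if filter would end by calling: magic_filter
```

A real magic filter recognizes many more file types and reads per-queue configuration, which is exactly the work you want someone else to have done for you.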

Other magic filters include LPD-O-Matic and magicfilter. http://www.linuxprinting.org has all sorts of information about this and other printing subjects. Don't be fooled by the name — much of the information it provides can help you with printing on any Unix system, not just Linux.

— DJPH

45.18 The Common Unix Printing System (CUPS)

The Common Unix Printing System (CUPS) is a full network-capable printing package available for a wide variety of Unix platforms. From their web page:

CUPS is available at:

http://www.cups.org/

CUPS provides a portable printing layer for UNIX-based operating systems. It has been developed by Easy Software Products to promote a standard printing solution for all UNIX vendors and users. CUPS provides the System V and Berkeley command-line interfaces.

CUPS uses the Internet Printing Protocol ("IPP") as the basis for managing print jobs and queues. The Line Printer Daemon ("LPD"), Server Message Block ("SMB"), and AppSocket (a.k.a. JetDirect) protocols are also supported with reduced functionality. CUPS adds network printer browsing and PostScript Printer Description ("PPD") based printing options to support real-world printing under UNIX.

CUPS is headed towards becoming the Linux standard for printing, and it is an easy way to configure all your printing tools at once regardless of your platform. Visit their web page for extensive information.
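
Because CUPS provides both the System V and Berkeley command-line interfaces, the same job can be submitted either way. The sketch below only prints the two commands side by side rather than running them; submit_job and the queue name laser are made up for illustration.

```shell
# submit_job (hypothetical helper): show the command for either interface.
# Usage: submit_job sysv|bsd queue file
submit_job() {
    style=$1
    queue=$2
    file=$3
    case $style in
        sysv) echo lp -d "$queue" "$file" ;;    # System V style
        bsd)  echo lpr -P "$queue" "$file" ;;   # Berkeley style
        *)    echo "usage: submit_job sysv|bsd queue file" >&2; return 1 ;;
    esac
}

submit_job sysv laser background.ps   # prints: lp -d laser background.ps
submit_job bsd  laser background.ps   # prints: lpr -P laser background.ps
```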

— DJPH

45.19 The Portable Bitmap Package

There are dozens of formats used for graphics files across the computer industry. There are tiff files, PICT files, and gif files. There are different formats for displaying on different hardware, different formats for printing on different printers, and then there are the internal formats used by graphics programs. This means that importing a graphics file from one platform to another (or from one program to another) can be a large undertaking, requiring a filter written specifically to convert from one format to the next.

Go to http://examples.oreilly.com/upt3 for more information on: netpbm

The netpbm package can be used to convert between a wide variety of graphics formats. netpbm evolved from the original Portable Bitmap Package, pbmplus, written by Jef Poskanzer. A group of pbmplus users on the Internet cooperated to upgrade pbmplus; the result was netpbm. netpbm has relatively recently seen some active development again on SourceForge, and its current home page is http://netpbm.sourceforge.net.

The idea behind pbm is to use a set of very basic graphics formats that (almost) all formats can be converted into and then converted back from. This is much simpler than having converters to and from each individual format. These formats are known as pbm, pgm, and ppm: the portable bitmap, graymap, and pixmap formats. (A bitmap is a two-dimensional representation of an image; a graymap has additional information encoded that gives grayscale information for each pixel; a pixmap encodes color information for each pixel.) The name pnm is a generic name for all three portable interchange formats (with the n standing for "any"), and programs that work with all three are said to be "anymap" programs.
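
The formats really are basic. In its plain-text variant, a pbm file is just a P1 marker, the width and height, and then one digit per pixel. As a small sketch, this writes a 5x5 plus sign by hand (the file name plus.pbm is made up):

```shell
# Write a 5x5 plain-format ("P1") portable bitmap; 1 is black, 0 is white.
cat > plus.pbm <<'EOF'
P1
# a 5x5 plus sign
5 5
0 0 1 0 0
0 0 1 0 0
1 1 1 1 1
0 0 1 0 0
0 0 1 0 0
EOF
```

If netpbm is installed, pbmtoxbm plus.pbm > plus.xbm turns this into an X11 bitmap. The plain graymap and pixmap formats look similar, but start with P2 and P3 and add a line giving the maximum sample value.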

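The netpbm package contains well over a hundred conversion programs. There are three basic kinds of programs: those that convert a file in some other format into one of the pnm formats, those that convert a pnm file into some other format, and those that manipulate an image while it's in pnm format (scaling, rotating, flipping, and so on).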

I frequently like to create X11 (Section 1.22) bitmaps out of pictures in newspapers or magazines. The way I do this is first to scan the picture in on a Macintosh and save it as tiff or PICT format. Then I ftp (Section 1.21) the file to our Unix system and convert it to pnm format, and then use pbmtoxbm to convert it to X bitmap format. If the picture is too big, I use pnmscale on the intermediary pnm file. If the picture isn't right-side-up, I can use pnmrotate and sometimes pnmflip before converting the pnm file to X11 bitmap format.
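
That workflow can be sketched as a single pipeline. Treat this as an illustration rather than a recipe: tiff_to_xbm is a made-up name, the netpbm programs must be on your PATH, and the ppmtopgm and pgmtopbm steps are an assumption added here so that color or grayscale scans are reduced to one bit per pixel before pbmtoxbm sees them.

```shell
# tiff_to_xbm (hypothetical helper)
# Usage: tiff_to_xbm input.tiff scale-factor output.xbm
tiff_to_xbm() {
    tifftopnm "$1" |       # TIFF to whichever pnm format fits the image
        pnmscale "$2" |    # scale, for example 0.5 for half size
        ppmtopgm |         # drop any color information
        pgmtopbm |         # reduce grayscale to a 1-bit bitmap
        pbmtoxbm > "$3"    # write the X11 bitmap
}
```

Something like tiff_to_xbm Hobbes.tiff 0.5 Hobbes.xbm would then do the whole job (Hobbes.tiff being a made-up file name). If the picture needs turning, a pnmrotate or pnmflip stage can be slipped into the pipeline after pnmscale.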

There are far too many programs provided with the netpbm package to discuss in detail, and some of these formats are ones that you've probably never even heard of. However, if you need to fiddle with image files (or, now, video files!), netpbm almost certainly has a converter for it. Take a peek through the documentation sometime.

—LM and JP
