Emacspeak was conceived as a full-fledged, eyes-free user interface to everyday computing tasks. To be full-fledged, the system needed to provide direct access to every aspect of computing on desktop workstations. To enable fluent eyes-free interaction, the system needed to treat spoken output and the auditory medium as a first-class citizen—i.e., merely reading out information displayed on the screen was not sufficient.
To provide a complete audio desktop, the target environment needed to be an interaction framework that was both widely deployed and fully extensible. To be able to do more than just speak the screen, the system needed to build interactive speech capability into the various applications.
Finally, this had to be done without modifying the source code of any of the underlying applications; the project could not afford to fork a suite of applications in the name of adding eyes-free interaction, because I wanted to limit myself to the task of maintaining the speech extensions.
To meet all these design requirements, I picked Emacs as the user interaction environment. As an interaction framework, Emacs had the advantage of having a very large developer community. Unlike other popular interaction frameworks available in 1994 when I began the project, it had the significant advantage of being a free software environment. (Now, 12 years later, Firefox affords similar opportunities.)
The enormous flexibility afforded by Emacs Lisp as an extension language was an essential prerequisite in speech-enabling the various applications. The open source nature of the platform was just as crucial; even though I had made an explicit decision that I would modify no existing code, being able to study how various applications were implemented made speech-enabling them tractable. Finally, the availability of a high-quality advice implementation for Emacs Lisp (note that Lisp's advice facility was the prime motivator behind Aspect Oriented Programming) made it possible to speech-enable applications authored in Emacs Lisp without modifying the original source code.
Emacspeak is a direct consequence of the matching up of the needs previously outlined and the affordances provided by Emacs as a user interaction environment.
The Emacspeak code base has evolved over a period of 12 years. Except for the first six weeks of development, the code base has been developed and maintained using Emacspeak itself. This section summarizes some of the lessons learned with respect to managing code complexity over time.
Throughout its existence, Emacspeak has always remained a spare-time project. Looking at the code base across time, I believe this has had a significant impact on how it has evolved. When working on large, complex software systems as a full-time project, one has the luxury of focusing one's entire concentration on the code base for reasonable stretches of time—e.g., 6 to 12 weeks. This results in tightly implemented code that creates deep code bases.
Despite one's best intentions, this can also result in code that becomes hard to understand with the passage of time. Large software systems where a single engineer focuses exclusively on the project for a number of years are almost nonexistent; that form of single-minded focus usually leads to rapid burnout!
In contrast, Emacspeak is an example of a large software system that has had a single engineer focused on it over a period of 12 years, but only in his spare time. A consequence of developing the system single-handedly over a number of years is that the code base has tended to be naturally "bushy." Notice the distribution of files and lines of code summarized in Table 31-1.
| Layer | Files | Lines | Percentage |
|---|---|---|---|
| TTS core | 6 | 3866 | 6.0 |
| Emacspeak core | 16 | 12174 | 18.9 |
| Emacspeak extensions | 160 | 48339 | 75.0 |
| Total | 182 | 64379 | 99.9 |
Table 31-1 highlights the following points:
The TTS core responsible for high-quality speech output is isolated in 6 out of 182 files, and makes up six percent of the code base.
The Emacspeak core—which provides high-level speech services to Emacspeak extensions, in addition to speech-enabling all basic Emacs functionality—is isolated to 16 files, and makes up about 19 percent of the code base.
The rest of the system is split across 160 files, which can be independently improved (or broken) without affecting the rest of the system. Many modules, such as emacspeak-url-template, are themselves bushy—i.e., an individual URL template can be modified without affecting any of the other URL templates.
advice reduces code size. The Emacspeak code base, which has approximately 60,000 lines of Lisp code, is a fraction of the size of the underlying system being speech-enabled. A rough count at the end of December 2006 shows that Emacs 22 has over a million lines of Lisp code; in addition, Emacspeak speech-enables a large number of applications not bundled by default with Emacs.
Here is a brief summary of the insights gained from implementing and using Emacspeak:
Lisp advice, and its object-oriented equivalent Aspect Oriented Programming, are very effective means for implementing cross-cutting concerns—e.g., speech-enabling a visual interface.
advice is a powerful means for discovering potential points of extension in a complex software system.
Focusing on basic web architecture, and relying on a data-oriented web backed by standardized protocols and formats, leads to powerful spoken web access.
Focusing on the final user experience, as opposed to individual interaction widgets such as sliders and tree controls, leads to a highly efficient, eyes-free environment.
Visual interaction relies heavily on the human eye's ability to rapidly scan the visual display. Effective eyes-free interaction requires transferring some of this responsibility to the computer because listening to large amounts of information is time-consuming. Thus, search in every form is critical for delivering effective eyes-free interaction, on the continuum from the smallest scale (such as Emacs' incremental search to find the right item in a local document) to the largest (such as a Google search to quickly find the right document on the global Web).
Visual complexity, which may become merely an irritant for users capable of using complex visual interfaces, is a show-stopper for eyes-free interaction. Conversely, tools that emerge early in an eyes-free environment eventually show up in the mainstream when the nuisance value of complex visual interfaces crosses a certain threshold. Two examples of this from the Emacspeak experience are: