31.3. Painless Access to Online Information

With all the necessary affordances to generate rich auditory output in place, speech-enabling Emacs applications using Emacs Lisp's advice facility requires surprisingly small amounts of specialized code. With the TTS layer and the Emacspeak core handling the complex details of producing good quality output, the speech-enabling extensions focus purely on the specialized semantics of individual applications; this leads to simple and consequently beautiful code. This section illustrates the concept with a few choice examples taken from Emacspeak's rich suite of information access tools.

Right around the time I started Emacspeak, a far more profound revolution was taking place in the world of computing: the World Wide Web went from being a tool for academic research to a mainstream forum for everyday tasks. This was 1994, when writing a browser was still a comparatively easy task. The complexity that has been progressively added to the Web in the subsequent 12 years often tends to obscure the fact that the Web is still a fundamentally simple design where:

Notice that the basic architecture just sketched out says little to nothing about how the content is made available to the end user. The mid-1990s saw the Web move toward increasingly complex visual interaction. The commercial Web with its penchant for flashy visual interaction increasingly moved away from the simple data-oriented interaction that had characterized early web sites. By 1998, I found that the Web had a lot of useful interactive sites; to my dismay, I also found that I was using progressively fewer of these sites because of the time it took to complete tasks when using spoken output.

This led me to create a suite of web-oriented tools within Emacspeak that went back to the basics of web interaction. Emacs was already capable of rendering simple HTML into interactive hypertext documents. As the Web became complex, Emacspeak acquired a collection of interaction wizards built on top of Emacs' HTML rendering capability that progressively factored out the complexity of web interaction to create an auditory interface that allowed the user to quickly and painlessly listen to desired information.

31.3.1. Basic HTML with Emacs W3 and Aural CSS

Emacs W3 is a bare-bones web browser first implemented in the mid-1990s. Emacs W3 implemented CSS (Cascading Style Sheets) early on, and this was the basis of the first Aural CSS implementation, which was released at the time I wrote the Aural CSS draft in February 1996. Emacspeak speech-enables Emacs W3 via the emacspeak-w3 module, which implements the following extensions:

31.3.2. The emacspeak-websearch Module for Task-Oriented Search

By 1997, interactive sites on the Web, ranging from AltaVista for searching to Yahoo! Maps for online directions, required the user to go through a highly visual process that included:

  1. Filling in a set of form fields

  2. Submitting the resulting form

  3. Spotting the results in the resulting complex HTML page

The first and third of these steps were the ones that took time when using spoken output. I needed to first locate the various form fields on a visually busy page and wade through a lot of complex boilerplate material on result pages before I found the answer.

Notice that from the software design point of view, these steps neatly map into pre-action and post-action hooks. Because web interaction follows a very simple architecture based on URIs, the pre-action step of prompting the user for the right pieces of input can be factored out of a web site and placed in a small piece of code that runs locally; this obviates the need for the user to open the initial launch page and seek out the various input fields.

Similarly, the post-action step of spotting the actual results amid the rest of the noise on the resulting page can also be delegated to software.

Finally, notice that even though these pre-action and post-action steps are each specific to particular web sites, the overall design pattern is one that can be generalized. This insight led to the emacspeak-websearch module, a collection of task-oriented web tools that:

  1. Prompted the user

  2. Constructed an appropriate URI and pulled the content at that URI

  3. Filtered the result before rendering the relevant content via Emacs W3

Here is the emacspeak-websearch tool for accessing directions from Yahoo! Maps:

	(defsubst emacspeak-websearch-yahoo-map-directions-get-locations ()
	  "Convenience function for prompting and constructing the route component."
	  (concat
	   (format "&newaddr=%s"
	           (emacspeak-url-encode (read-from-minibuffer "Start Address: ")))
	   (format "&newcsz=%s"
	           (emacspeak-url-encode (read-from-minibuffer "City/State or Zip: ")))
	   (format "&newtaddr=%s"
	           (emacspeak-url-encode (read-from-minibuffer "Destination Address: ")))
	   (format "&newtcsz=%s"
	           (emacspeak-url-encode (read-from-minibuffer "City/State or Zip: ")))))

	(defun emacspeak-websearch-yahoo-map-directions-search (query)
	  "Get driving directions from Yahoo."
	  ;; Pre-action: prompt for the locations and build the query string.
	  (interactive
	   (list (emacspeak-websearch-yahoo-map-directions-get-locations)))
	  ;; Retrieve the page, then keep only the table that contains "Start".
	  (emacspeak-w3-extract-table-by-match
	   "Start"
	   (concat emacspeak-websearch-yahoo-maps-uri query)))

A brief explanation of the previous code follows:


Pre-action

The emacspeak-websearch-yahoo-map-directions-get-locations function prompts the user for the start and end locations. Notice that this function hardwires the names of the query parameters used by Yahoo! Maps. On the surface, this looks like a kluge that is guaranteed to break. In fact, this kluge has not broken since it was first defined in 1997. The reason is obvious: once a web application has published a set of query parameters, those parameters get hardcoded in a number of places, including within a large number of HTML pages on the originating web site. Depending on parameter names may feel brittle to the software architect used to structured, top-down APIs, but the use of such URL parameters to define bottom-up web services leads to the notion of RESTful web APIs.


Retrieve content

The URL for retrieving directions is constructed by concatenating the user input to the base URI for Yahoo! Maps.


Post-action

The resulting URI is passed to the function emacspeak-w3-extract-table-by-match along with a search pattern Start to:

  • Retrieve the content using Emacs W3.

  • Apply an XSLT transform to extract the table containing Start.

  • Render this table using Emacs W3's HTML formatter.

Unlike the query parameters, the layout of the results page does change about once a year, on average. But keeping this tool current with Yahoo! Maps comes down to maintaining the post-action portion of this utility. In over eight years of use, I have had to modify it about half a dozen times, and given that the underlying platform provides many of the tools for filtering the result page, the amount of code that needs to be written for each layout change is minimal.
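To make the pre-action and retrieval steps concrete, here is a minimal sketch that builds the same query fragment from four strings instead of prompting for them. The function name my-directions-query is hypothetical, and url-hexify-string from Emacs's url-util library stands in for emacspeak-url-encode, whose exact encoding rules are not shown in this section; only the parameter names come from the code above.

```elisp
(require 'url-util)

;; Hypothetical helper: a non-interactive counterpart of the
;; get-locations function shown earlier.
(defun my-directions-query (start start-csz dest dest-csz)
  "Return the query fragment for a directions request.
Each argument is percent-encoded before being attached to its
parameter name."
  (concat
   (format "&newaddr=%s" (url-hexify-string start))
   (format "&newcsz=%s" (url-hexify-string start-csz))
   (format "&newtaddr=%s" (url-hexify-string dest))
   (format "&newtcsz=%s" (url-hexify-string dest-csz))))
```

Appending the returned fragment to the base URI yields the URI that is handed to the post-action step.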

The emacspeak-w3-extract-table-by-match function uses an XSLT transformation that filters a document to return tables that contain a specified search pattern. For this example, the function constructs the following XPath expression:

	(/descendant::table[contains(., "Start")])[last()]

This effectively picks out the list of tables that contain the string Start and returns the last element of that list.
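As a rough illustration of how such an expression might be assembled from the search pattern, consider the sketch below. The helper my-table-xpath is hypothetical and is not the code Emacspeak uses; it simply quotes the pattern as an XPath string literal and assumes the pattern contains no double quote.

```elisp
;; Hypothetical helper: build the table-matching XPath for PATTERN.
(defun my-table-xpath (pattern)
  "Return an XPath selecting the last table whose text contains PATTERN.
PATTERN must not contain a double-quote character."
  (format "(/descendant::table[contains(., \"%s\")])[last()]" pattern))
```

Because the descendant axis visits nodes in document order, an outer table always precedes the tables nested inside it, so taking the last match tends to select the innermost table rather than a layout table wrapping the whole page.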

Seven years after this utility was written, Google launched Google Maps to great excitement in February 2005. Many blogs on the Web put Google Maps under the microscope and quickly discovered the query parameters used by that application. I used that to build a corresponding Google Maps tool in Emacspeak that provides similar functionality. The user experience is smoother with the Google Maps tool because the start and end locations can be specified within the same parameter. Here is the code for the Google Maps wizard:

	(defun emacspeak-websearch-emaps-search (query &optional use-near)
	  "Perform EmapSpeak search. Query is in plain English."
	  (interactive
	   (list
	    (emacspeak-websearch-read-query
	     (if current-prefix-arg
	         (format "Find what near %s: "
	                 emacspeak-websearch-emapspeak-my-location)
	       "EMap Query: "))
	    current-prefix-arg))
	  (let ((near-p ;; determine query type
	         (unless use-near
	           (save-match-data (and (string-match "near" query) (match-end 0)))))
	        (near nil)
	        (uri nil))
	    (when near-p ;; determine location from query
	      (setq near (substring query near-p))
	      (setq emacspeak-websearch-emapspeak-my-location near))
	    (setq uri
	          (cond
	           (use-near
	            (format emacspeak-websearch-google-maps-uri
	                    (emacspeak-url-encode
	                     (format "%s near %s" query
	                             emacspeak-websearch-emapspeak-my-location))))
	           (t (format emacspeak-websearch-google-maps-uri
	                     (emacspeak-url-encode query)))))
	    (add-hook 'emacspeak-w3-post-process-hook 'emacspeak-speak-buffer)
	    (add-hook 'emacspeak-w3-post-process-hook
	              #'(lambda nil
	                  (emacspeak-pronounce-add-buffer-local-dictionary-entry
	                   "ðmi" " miles ")))
	    (browse-url-of-buffer
	     (emacspeak-xslt-xml-url
	      (expand-file-name "kml2html.xsl" emacspeak-xslt-directory)
	      uri))))

A brief explanation of the code follows:

  1. Parse the input to decide whether it's a direction or a search query.

  2. In case of search queries, cache the user's location for future use.

  3. Construct a URI for retrieving results.

  4. Browse the results of filtering the contents of the URI through the XSLT filter kml2html, which converts the retrieved content into a simple hypertext document.

  5. Set up custom pronunciations in the results to pronounce mi as "miles."

Notice that, as before, most of the code focuses on application-specific tasks. Rich spoken output is produced by creating the results as a well-structured HTML document with the appropriate Aural CSS rules producing an audio-formatted presentation.
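The query-parsing step can be sketched in isolation. The helper below is hypothetical and slightly more elaborate than the wizard's own logic: it splits a plain-English query at the first occurrence of "near" and trims both halves, whereas the code above only records the text that follows the match.

```elisp
(require 'subr-x)  ; for `string-trim'

;; Hypothetical helper: separate "what" from "where" in a query.
(defun my-split-near-query (query)
  "Split QUERY at the first occurrence of \"near\".
Return a cons cell (WHAT . WHERE), or nil if QUERY does not
contain \"near\"."
  (save-match-data
    (when (string-match "near" query)
      (cons (string-trim (substring query 0 (match-beginning 0)))
            (string-trim (substring query (match-end 0)))))))
```

Like the original, this matches "near" anywhere in the string, so a query containing a word such as "linear" would be split in the wrong place; a stricter pattern with word boundaries would avoid that at the cost of a little more code.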

31.3.3. The Web Command Line and URL Templates

With more and more services becoming available on the Web, another useful pattern emerged by early 2000: web sites started creating smart client-side interaction via JavaScript. One typical use of such scripts was to construct URLs on the client side for accessing specific pieces of content based on user input. As examples, Major League Baseball constructs the URL for retrieving scores for a given game by piecing together the date and the names of the home and visiting teams, and NPR creates URLs by piecing together the date with the program code of a given NPR show.

To enable fast access to such services, I added an emacspeak-url-template module in late 2000. This module has become a powerful companion to the emacspeak-websearch module described in the previous section. Together, these modules turn the Emacs minibuffer into a powerful web command line that provides rapid access to web content.

Many web services require the user to specify a date. One can usefully default the date by using the user's calendar to provide the context. Thus, Emacspeak tools for playing an NPR program or retrieving MLB scores default to using the date under the cursor when invoked from within the Emacs calendar buffer.
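A minimal sketch of this kind of date defaulting follows. The function my-default-date-string is hypothetical and is not Emacspeak's own date collector; it relies only on the standard calendar library, where dates are lists of the form (MONTH DAY YEAR).

```elisp
(require 'calendar)

;; Hypothetical helper: pick a default date from context.
(defun my-default-date-string (&optional date)
  "Format DATE, a list (MONTH DAY YEAR), as DD-Mon-YYYY.
Without DATE, use the date under point when called from a
calendar buffer, and today's date otherwise."
  (let* ((date (or date
                   (and (derived-mode-p 'calendar-mode)
                        (calendar-cursor-to-date))
                   (calendar-current-date)))
         ;; Force English month abbreviations regardless of locale.
         (system-time-locale "C"))
    (format-time-string
     "%d-%b-%Y"
     ;; Noon avoids any ambiguity around midnight clock changes.
     (encode-time 0 0 12 (nth 1 date) (nth 0 date) (nth 2 date)))))
```

Called with the list (2 14 2006), the function returns "14-Feb-2006"; called with no argument inside the calendar, it picks up whatever date the cursor is on.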

URL templates in Emacspeak are implemented using the following data structure:

	(defstruct (emacspeak-url-template (:constructor emacspeak-ut-constructor))
	  name                                  ;; Human-readable name
	  template                              ;; template URL string
	  generators                            ;; list of param generators
	  post-action                           ;; action to perform after opening
	  documentation                         ;; resource documentation
	  fetcher)                              ;; function that retrieves and renders the content

Users invoke URL templates via the Emacspeak command emacspeak-url-template-fetch, which prompts for a URL template and:

  1. Looks up the named template.

  2. Prompts the user by calling each of the specified generators.

  3. Applies the Lisp function format to the template string and the collected arguments to create the final URI.

  4. Sets up any post actions performed after the content has been rendered.

  5. Applies the specified fetcher to render the content.

The use of this structure is best explained with an example. The following is the URL template for playing NPR programs:

	(emacspeak-url-template-define
	 "NPR On Demand"
	 "http://www.npr.org/dmg/dmg.php?prgCode=%s&showDate=%s&segNum=%s&mediaPref=RM"
	 (list
	  #'(lambda () (upcase (read-from-minibuffer "Program code:")))
	  #'(lambda ()
	      (emacspeak-url-template-collect-date "Date:" "%d-%b-%Y"))
	  "Segment:")
	 nil; no post actions
	 "Play NPR shows on demand.
	Program is specified as a program code:
	ME              Morning Edition
	ATC             All Things Considered
	day             Day To Day
	newsnotes       News And Notes
	totn            Talk Of The Nation
	fa              Fresh Air
	wesat           Weekend Edition Saturday
	wesun           Weekend Edition Sunday
	fool            The Motley Fool
	Segment is specified as a two digit number --specifying a blank value
	plays entire program."
	 #'(lambda (url)
	     (funcall emacspeak-media-player url 'play-list)
	     (emacspeak-w3-browse-xml-url-with-style
	      (expand-file-name "smil-anchors.xsl" emacspeak-xslt-directory)
	      url)))

In this example, the custom fetcher performs two actions:

  1. Launches a media player to start playing the audio stream.

  2. Filters the associated SMIL document via the XSLT file smil-anchors.xsl.
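The core of the template mechanism, running the generators and applying format to the template string, can be sketched in a few lines. Everything named my-... below is hypothetical and greatly simplified relative to the real module. In particular, treating a string generator as a prompt whose answer is percent-encoded is an assumption suggested by the "Segment:" entry above, and the READER argument exists only so that the sketch can be exercised without a minibuffer.

```elisp
(require 'cl-lib)
(require 'url-util)

;; Hypothetical, pared-down stand-in for the URL template structure.
(cl-defstruct (my-url-template (:constructor my-url-template-create))
  name template generators)

(defun my-url-template-expand (ut &optional reader)
  "Return the URL obtained by filling in the template string of UT.
Each generator is either a function of no arguments or a prompt
string.  READER is called with each prompt string and defaults to
`read-from-minibuffer'."
  (let ((reader (or reader #'read-from-minibuffer)))
    (apply #'format
           (my-url-template-template ut)
           (mapcar (lambda (gen)
                     (if (functionp gen)
                         (funcall gen)
                       ;; Assumption: prompted input is percent-encoded.
                       (url-hexify-string (funcall reader gen))))
                   (my-url-template-generators ut)))))
```

For example, a template "http://example.com/show?code=%s&seg=%s" with one function generator returning "ATC" and one prompt answered with "01" expands to "http://example.com/show?code=ATC&seg=01". Keeping the generators as plain functions is what lets a template reuse helpers such as a date collector without the expansion code knowing anything about dates.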

31.3.4. The Advent of Feed Readers

When I implemented the emacspeak-websearch and emacspeak-url-template modules, Emacspeak needed to screen-scrape HTML pages to speak the relevant information. But as the Web grew in complexity, the need to readily get beyond the superficial presentation of pages to the real content took on a wider value than eyes-free access. Even users capable of working with complex visual interfaces found themselves under serious information overload. This led to the advent of RSS and Atom feeds, and the concomitant arrival of feed reading software.

These developments have had a very positive effect on the Emacspeak code base. During the past few years, the code has become more beautiful as I have progressively deleted screen-scraping logic and replaced it with direct content access. As an example, here is the Emacspeak URL template for retrieving the weather for a given city/state:

	(emacspeak-url-template-define
	 "rss weather from wunderground"
	 "http://www.wunderground.com/auto/rss_full/%s.xml?units=both"
	 (list "State/City e.g.: MA/Boston") nil
	 "Pull RSS weather feed for specified state/city."
	 'emacspeak-rss-display)

And here is the URL template for Google News searches via Atom feeds:

	(emacspeak-url-template-define
	 "Google News Search"
	 "http://news.google.com/news?hl=en&ned=tus&q=%s&btnG=Google+Search&output=atom"
	 (list "Search news for: ") nil "Search Google news."
	 'emacspeak-atom-display)

Both of these tools use all of the facilities provided by the emacspeak-url-template module and consequently need to do very little on their own. Finally, notice that by relying on standardized feed formats such as RSS and Atom, these templates now have very little in the way of site-specific kluges, in contrast to older tools like the Yahoo! Maps wizard, which hardwired specific patterns from the results page.
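To see why feeds need so little site-specific code, consider a minimal sketch that pulls item titles out of an RSS document using the xml.el parser bundled with Emacs. This is not how emacspeak-rss-display works (its implementation is not shown in this section); the function my-rss-item-titles is purely illustrative and assumes an RSS 2.0 layout of rss, channel, item, and title elements.

```elisp
(require 'xml)

;; Hypothetical helper: list the item titles of an RSS 2.0 document.
(defun my-rss-item-titles (rss-string)
  "Return the list of item titles found in RSS-STRING."
  (with-temp-buffer
    (insert rss-string)
    (let* ((root (car (xml-parse-region (point-min) (point-max))))
           (channel (car (xml-get-children root 'channel))))
      (mapcar (lambda (item)
                ;; The first child of <title> is its text content.
                (car (xml-node-children
                      (car (xml-get-children item 'title)))))
              (xml-get-children channel 'item)))))
```

The same function works unchanged on any feed that follows the format, which is exactly the property the screen-scraping tools lacked: nothing in it refers to the layout of a particular site.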