Syntactic Abstraction: The syntax-case Expander > Expansion Algorithm

25.2. Expansion Algorithm

The syntax-case expansion algorithm is essentially a lazy variant of the KFFD algorithm that operates on an abstract representation of the input expression rather than on the traditional s-expression representation. The abstract representation encapsulates both a representation of an input form and a wrap that enables the algorithm to determine the scope of all identifiers within the form. The wrap consists of marks and substitutions.

Marks are like KFFD timestamps and are added to the portions of a macro's output that are introduced by the macro.

Substitutions map identifiers to bindings with the help of a compile-time environment. Substitutions are created whenever a binding form, such as lambda, is encountered, and they are added to the wraps of the syntax objects representing the forms within the scope of the binding form's bindings. A substitution applies to an identifier only if the identifier has the same name and marks as the substituted identifier.

Expansion operates in a recursive, top-down fashion. As the expander encounters a macro call, it invokes the associated transformer on the form, marking it first with a fresh mark and then marking it again with the same mark. Like marks cancel, so only the introduced portions of the macro's output—i.e., those portions not simply copied from the input to the output—remain marked.

When a core form is encountered, a core form in the output language of the expander (in our case, the traditional s-expression representation) is produced, with any subforms recursively expanded as necessary. Variable references are replaced by generated names via the substitution mechanism.

25.2.1. Representations

The most important aspect of the syntax-case mechanism is its abstract representation of program source code as syntax objects. As described above, a syntax object encapsulates not only a representation of the program source code but also a wrap that provides sufficient information about the identifiers contained within the code to implement hygiene:

	(define-record syntax-object (expr wrap))

The define-record form creates a new type of value with the specified name (in this case, syntax-object) and fields (in this case, expr and wrap), along with a set of procedures to manipulate it. The procedures in this case are:


make-syntax-object

Returns a new syntax object with the expr and wrap fields initialized to the values of its arguments


syntax-object?

Returns true if and only if its argument is a syntax object


syntax-object-expr

Returns the value of the expr field of a syntax-object


syntax-object-wrap

Returns the value of the wrap field of a syntax object

A complete implementation of syntax-case might also include, within each syntax object, source information to be tracked through the expansion process.

Each wrap, as explained previously, consists of a list of marks and substitutions. Marks are distinguished by their object identity and do not require any fields:

	(define-record mark ())

A substitution maps a symbolic name and list of marks to a label:

	(define-record subst (sym mark* label))

Labels, like marks, are distinguished by their identity and require no fields:

	(define-record label ())

The expand-time environment maintained by the expander maps labels to bindings. The environment is structured as a traditional association list—i.e., a list of pairs, each car of which contains a label and each cdr of which contains a binding. Bindings consist of a type (represented as a symbol) and a value:

	(define-record binding (type value))

The type identifies the nature of the binding: macro for keyword bindings and lexical for lexical variable bindings. The value is any additional information required to specify the binding, such as the transformation procedure when the binding is a keyword binding.

25.2.2. Producing Expander Output

The expander's output is a simple s-expression in the core language and is thus constructed for the most part using Scheme's quasiquote syntax for creating list structure. For example, a lambda expression may be created with formal parameter var and body body as follows:

	`(lambda (,var) ,body)

The expander does need to create fresh names, however, and does so via the gen-var helper, which makes use of the Scheme primitives for converting strings to symbols and vice versa, along with a local sequence counter:

	(define gen-var
	  (let ([n 0])
	    (lambda (id)
	      (set! n (+ n 1))
	      (let ([name (syntax-object-expr id)])
	        (string->symbol (format "~s.~s" name n))))))

25.2.3. Stripping Syntax Objects

Whenever a quote form is encountered in the input, the expander must return a representation of the constant contents appearing within the quote form. To do this, it must strip off any embedded syntax objects and wraps using the strip procedure, which traverses the syntax object and list structure of its input and recreates an s-expression representation of its input:

	(define strip
	  (lambda (x)
	    (cond
	      [(syntax-object? x)
	       (if (top-marked? (syntax-object-wrap x))
	           (syntax-object-expr x)
	           (strip (syntax-object-expr x)))]
	      [(pair? x)
	       (let ([a (strip (car x))] [d (strip (cdr x))])
	         (if (and (eq? a (car x)) (eq? d (cdr x)))
	             x
	             (cons a d)))]
	      [else x])))

Traversal terminates along any branch of the input expression when something other than a syntax object or pair is found—i.e., when a symbol or immediate value is found. It also terminates when a syntax object is found to be "top marked"—i.e., when its wrap contains a unique top mark:

	(define top-mark (make-mark))

	(define top-marked?
	  (lambda (wrap)
	    (and (not (null? wrap))
	    (or (eq? (car wrap) top-mark)
	        (top-marked? (cdr wrap))))))

When the expander creates a syntax object representing the original input, it uses a wrap that contains the top mark at its base, specifically to allow the stripping code detect when it has reached the syntax-object base and need not traverse the object further. This feature prevents the expander from traversing constants unnecessarily so that it can easily preserve shared and cyclic structure, and not be confused by the presence of quoted syntax objects in the input.

25.2.4. Syntax Errors

The expander reports syntax errors via syntax-error, which is defined as follows:

	(define syntax-error
	  (lambda (object message)
	    (error #f "~a ~s" message (strip object))))

If the implementation attaches source information to syntax objects, this source information can be used to construct an error message that incorporates the source line and character position.

25.2.5. Structural Predicates

The nonatomic structure of a syntax object is always determined with the patterns of a syntax-case form. The identifier? predicate determines whether a syntax object represents an identifier:

	(define identifier?
	  (lambda (x)
	    (and (syntax-object? x)
	         (symbol? (syntax-object-expr x)))))

Similarly, the self-evaluating? predicate is used, after stripping a syntax object, to deter-mine whether it represents a constant:

	(define self-evaluating?
	  (lambda (x)
	    (or (boolean? x) (number? x) (string? x) (char? x))))

25.2.6. Creating Wraps

A mark or substitution is added to a syntax object by extending the wrap:

	(define add-mark
	  (lambda (mark x)
	    (extend-wrap (list mark) x)))

	(define add-subst
	  (lambda (id label x)
	    (extend-wrap
	      (list (make-subst
	              (syntax-object-expr id)
	              (wrap-marks (syntax-object-wrap id))
	              label))
	x)))

If the syntax object is only partially wrapped, the wrap is extended simply by creating a syntax object encapsulating the partially wrapped structure. Otherwise, the syntax object is rebuilt with the new wrap joined to the old wrap:

	(define extend-wrap
	  (lambda (wrap x)
	    (if (syntax-object? x)
	        (make-syntax-object
	          (syntax-object-expr x)
	          (join-wraps wrap (syntax-object-wrap x)))
	        (make-syntax-object x wrap))))

Joining two wraps is almost as simple as appending the lists of marks. The only complication is that the expansion algorithm requires that two like marks cancel when they meet.

	(define join-wraps
	  (lambda (wrap1 wrap2)
	    (cond
	      [(null? wrap1) wrap2]
	      [(null? wrap2) wrap1]
	      [else
	       (let f ([w (car wrap1)] [w* (cdr wrap1)])
	         (if (null? w*)
	             (if (and (mark? w) (eq? (car wrap2) w))
	                 (cdr wrap2)
	                 (cons w wrap2))
	             (cons w (f (car w*) (cdr w*)))))])))

25.2.7. Manipulating Environments

Environments map labels to bindings and are represented as association lists. Extending an environment therefore involves adding a single pair mapping a label to a binding:

	(define extend-env
	  (lambda (label binding env)
	    (cons (cons label binding) env)))

25.2.8. Identifier Resolution

Determining the binding associated with an identifier is a two-step process. The first step is to determine the label associated with the identifier in the identifier's wrap, and the second is to look the label up in the current environment:

	define id-binding
	 (lambda (id r)
	   (label-binding id (id-label id) r)))

The marks and substitutions that appear in an identifier's wrap determine the associated label, if any. Substitutions map names and lists of marks to labels. Any substitution whose name is not the name of the identifier is ignored, as is any whose marks do not match. The names are symbols and are thus compared using the pointer equivalence operator, eq?.

The set of marks considered relevant are those that were layered onto the wrap before the substitution. Thus, the set of marks to which a substitution's marks are compared changes as the search through the wrap proceeds. The starting set of marks is the entire set that appears in the wrap. Each time a mark is encountered during the search for a matching substitution in the wrap, the first mark in the list is removed:

	(define id-label
	  (lambda (id)
	    (let ([sym (syntax-object-expr id)]
	          [wrap (syntax-object-wrap id)])
	      (let search ([wrap wrap] [mark* (wrap-marks wrap)])
	        (if (null? wrap)
	            (syntax-error id "undefined identifier")
	            (let ([w0 (car wrap)])
	              (if (mark? w0)
	                  (search (cdr wrap) (cdr mark*))
	                  (if (and (eq? (subst-sym w0) sym)
	                           (same-marks? (subst-mark* w0) mark*))
	                      (subst-label w0)
	                      (search (cdr wrap) mark*)))))))))

If no matching substitution exists in the wrap, the identifier is undefined, and a syntax error is signaled. It would be possible instead to treat all such identifier references as global variable references.

The id-label procedure obtains the starting list of marks via wrap-marks and uses the same-marks? predicate to compare lists of marks:

	(define wrap-marks
	  (lambda (wrap)
	    (if (null? wrap)
	        '()
	        (let ([w0 (car wrap)])
	          (if (mark? w0)
	              (cons w0 (wrap-marks (cdr wrap)))
	              (wrap-marks (cdr wrap)))))))

	(define same-marks?
	  (lambda (m1* m2*)
	    (if (null? m1*)
	        (null? m2*)
	        (and (not (null? m2*))
	             (eq? (car m1*) (car m2*))
	             (same-marks? (cdr m1*) (cdr m2*))))))

Once a label has been found, id-binding is used to find the associated binding, if any, using the assq procedure for performing association-list lookups. If an association is found, the binding in the cdr of the association is returned:

	(define label-binding
	  (lambda (id label r)
	    (let ([a (assq label r)])
	      (if a
	          (cdr a)
	          (syntax-error id "displaced lexical")))))

If no binding is found, the identifier is a "displaced lexical." This occurs when a macro improperly inserts into its output a reference to an identifier that is not visible in the context of the macro output.

25.2.9. The Expander

With the mechanisms for handling wraps and environments in place, the expander is straightforward. The expression expander, exp, handles macro calls, lexical variable references, applications, core forms, and constants. Macro calls come in two forms: singleton macro-keyword references and structured forms with a macro keyword in the first position.

The exp procedure takes three arguments: a syntax object x, a runtime environment r, and a meta environment mr. The runtime environment is used to process ordinary expressions whose code will appear in the expander's output, while the meta environment is used to process transformer expressions (e.g., on the righthand sides of letrec-syntax bindings), which are evaluated and used at expansion time. The difference between the runtime and meta environments is that the meta environment does not contain lexical variable bindings, because these bindings are not available when the transformer is evaluated and used:

	(define exp
	  (lambda (x r mr)
	    (syntax-case x ()
	   [id
	    (identifier? #'id)
	    (let ([b (id-binding #'id r)])
	     (case (binding-type b)
	       [(macro) (exp (exp-macro (binding-value b) x) r mr)]
	       [(lexical) (binding-value b)]
	       [else (syntax-error x "invalid syntax")]))]
	  [(e0 e1 ...)
	   (identifier? #'e0)
	   (let ([b (id-binding #'e0 r)])
	     (case (binding-type b)
	       [(macro) (exp (exp-macro (binding-value b) x) r mr)]
	       [(lexical)
	        `(,(binding-value b) ,@(exp-exprs #'(e1 ...) r mr))]
	       [(core) (exp-core (binding-value b) x r mr)]
	       [else (syntax-error x "invalid syntax")]))]
	  [(e0 e1 ...)
	   `(,(exp #'e0 r mr) ,@(exp-exprs #'(e1 ...) r mr))]
	  [_
	   (let ([d (strip x)])
	     (if (self-evaluating? d)
	         d
	         (syntax-error x "invalid syntax")))])))


					    

Macro calls are handled by exp-macro (described shortly) and then re-expanded. Lexical variables are rewritten into the binding value, which is always a generated variable name. Applications are rewritten into lists as in the traditional s-expression syntax for Lisp and Scheme, with the subforms expanded recursively. Core forms are handled by exp-core (described shortly); any recursion back to the expression expander is performed explicitly by the core transformer. A constant is rewritten into the constant value, stripped of its syntax wrapper.

The expander uses syntax-case and syntax (in its abbreviated form—i.e., #'template) to parse and refer to the input or portions thereof. Because the expander is also charged with implementing syntax-case, this may seem like a paradox. In actuality, it is handled by bootstrapping one version of the expander using a previous version. The expander would be much more tedious to write if syntax-case and syntax were not used.

The exp-macro procedure applies the transformation procedure (the value part of the macro binding) to the entire macro form, which may either be a single macro keyword or a structured expression with the macro keyword at its head. The exp-macro procedure first adds a fresh mark to the wrap of the input form, then applies the same mark to the wrap of the output form. The first mark serves as an "anti-mark" that cancels out the second mark, so the net effect is that the mark adheres only to the portions of the output that were introduced by the transformer, thus uniquely identifying the portions of the code introduced at this transcription step:

	(define exp-macro
	  (lambda (p x)
	    (let ([m (make-mark)])
	      (add-mark m (p (add-mark m x))))))

The exp-core procedure simply applies the given core transformer (the value part of the core binding) to the input form:

	(define exp-core
	  (lambda (p x r mr)
	    (p x r mr)))

The exp-exprs procedure used to process application subforms simply maps the expander over the forms:

	(define exp-exprs
	  (lambda (x* r mr)
	    (map (lambda (x) (exp x r mr)) x*)))

25.2.10. Core Transformers

Transformers for several representative core forms (quote, if, lambda, let, and letrec-syntax) are described here. Adding transformers for other core forms, such as letrec or let-syntax, is straightforward.

The exp-quote procedure produces an s-expression representing a quote form, with the data value stripped of its syntax wrap:

	(define exp-quote
	  (lambda (x r mr)
	    (syntax-case x ()
	      [(_ d) `(quote ,(strip #'d))])))

The exp-if procedure produces an s-expression representation of an if form, with the sub-forms recursively expanded:

	(define exp-if
	  (lambda (x r mr)
	    (syntax-case x ()
	      [(_ e1 e2 e3)
	        `(if ,(exp #'e1 r mr)
	             ,(exp #'e2 r mr)
	             ,(exp #'e3 r mr))])))

The exp-lambda procedure handles lambda expressions that have only a single formal parameter and only a single body expression. Extending it to handle multiple parameters is straightforward. It is less straightforward to handle arbitrary lambda bodies, including internal definitions, but support for internal definitions is beyond the scope of this chapter.

When the s-expression representation of a lambda expression is produced, a generated variable name is created for the formal parameter. A substitution mapping the identifier to a fresh label is added to the wrap on the body, and the environment is extended with an association from the label to a lexical binding whose value is the generated variable,during the recursive processing of the body:

	(define exp-lambda
	  (lambda (x r mr)
	    (syntax-case x ()
	  [(_ (var) body)
	   (let ([label (make-label)] [new-var (gen-var #'var)])
	     `(lambda (,new-var)
	        ,(exp (add-subst #'var label #'body)
	              (extend-env label
	                (make-binding 'lexical new-var)
	                r)
	              mr)))])))

The meta environment is not extended because the meta environment should not include lexical variable bindings.

The exp-let procedure that transforms single-binding let forms is similar to the transformer for lambda, but a bit more involved:

	(define exp-let
	  (lambda (x r mr)
	    (syntax-case x ()
	      [(_ ([var expr]) body)
	       (let ([label (make-label)] [new-var (gen-var #'var)])
	         `(let ([,new-var ,(exp #'expr r mr)])
	            ,(exp (add-subst #'var label #'body)
	                  (extend-env label
	               (make-binding 'lexical new-var)
	               r)
	             mr)))])))

The body is in the scope of the binding created by let, so it is expanded with the extended wrap and environment. The righthand-side expression, expr, is not within the scope, so it is expanded with the original wrap and environment.

The exp-letrec-syntax procedure handles single-binding letrec-syntax forms. As with lambda and let, a substitution mapping the bound identifier—in this case, a keyword rather than a variable—to a fresh label is added to the wrap on the body, and an association from the label to a binding is added to the environment while the body is recursively processed. The binding is a macro binding rather than a lexical binding, and the binding value is the result of recursively expanding and evaluating the righthand-side expression of the letrec-syntax form.

In contrast with let, the righthand-side expression is also wrapped with a substitution from the keyword to the label and expanded with the extended environment; this allows the macro to be recursive. This would not be done if the form were a let-syntax form instead of a letrec-syntax form. The output produced by expanding a letrec-syntax form consists only of the output of the recursive call to the expander on the body of the form:

	(define exp-letrec-syntax
	  (lambda (x r mr)
	    (syntax-case x ()
	      [(_ ((kwd expr)) body)
	       (let ([label (make-label)])
	         (let ([b (make-binding 'macro
	                    (eval (exp (add-subst #'kwd label #'expr)
	                               mr mr)))])
	    (exp (add-subst #'kwd label #'body)
	         (extend-env label b r)
	         (extend-env label b mr))))])))

Both the runtime and meta environments are extended in this case, since transformers are available both in runtime and transformer code.

25.2.11. Parsing and Constructing Syntax Objects

Macros are written in a pattern-matching style using syntax-case to match and take apart the input, and syntax to reconstruct the output. The implementation of the pattern matching and reconstruction is outside the scope of this chapter, but the following low-level operators can be used as the basis for the implementation. The syntax-case form can be built from the following set of three operators that treat syntax objects as abstract s-expressions:

	(define syntax-pair?
	  (lambda (x)
	    (pair? (syntax-object-expr x))))

	(define syntax-car
	  (lambda (x)
	    (extend-wrap
	      (syntax-object-wrap x)
	      (car (syntax-object-expr x)))))

	(define syntax-cdr
	  (lambda (x)
	    (extend-wrap
	      (syntax-object-wrap x)
	      (cdr (syntax-object-expr x)))))

The definitions of syntax-car and syntax-cdr employ the extend-wrap helper defined in the earlier section "Creating Wraps" to push the wrap on the pair onto the car and cdr.

Similarly, syntax can be built from the following more basic version of syntax that handles constant input, but not pattern variables and ellipses:

	(define exp-syntax
	  (lambda (x r mr)
	    (syntax-case x ()
	      [(_ t) `(quote ,#'t)])))

In essence, the simplified version of syntax is just like quote except that syntax does not strip the encapsulated value but rather leaves the syntax wrappers intact.

25.2.12. Comparing Identifiers

Identifiers are compared based on their intended use. They may be compared as symbols by using the pointer equivalence operator eq? on the symbolic names of the identifiers. They may also be compared according to their intended use as free or bound identifiers in the output of a macro.

Two identifiers are considered equivalent by free-identifier=? if they would resolve to the same binding if introduced into the output of a macro outside of any binding introduced by the macro. Equivalency is tested by comparing the labels to which the identifiers resolve, as described previously in the section "Identifier Resolution":

	(define free-identifier=?
	  (lambda (x y)
	    (eq? (id-label x) (id-label y))))

The free-identifier=? predicate is often used to check for auxiliary keywords, such as else in cond or case.

Two identifiers are considered equivalent by bound-identifier=? if a reference to one would be captured by an enclosing binding for another. This is accomplished by comparing the names and marks of the two identifiers:

	(define bound-identifier=?
	  (lambda (x y)
	    (and (eq? (syntax-object-expr x) (syntax-object-expr y))
	         (same-marks?
	           (wrap-marks (syntax-object-wrap x))
	           (wrap-marks (syntax-object-wrap y))))))

The bound-identifier=? predicate is often used to check for duplicate identifier errors in a binding form, such as lambda or let.

25.2.13. Conversions

The conversion from s-expression to syntax object performed by datum->syntax requires only that the wrap be transferred from the template identifier to the s-expression:

	(define datum->syntax
	   (lambda (template-id x)
	      (make-syntax-object x (syntax-object-wrap template-id))))

The opposite conversion involves stripping the wrap away from a syntax object, so syntax->datum is just strip:

	(define syntax->datum strip)

25.2.14. Starting Expansion

All of the pieces are now in place to expand Scheme expressions containing macros into expressions in the core language. The main expander merely supplies an initial wrap and environment that include names and bindings for the core forms and primitives:

	(define expand
	  (lambda (x)
	    (let-values ([(wrap env) (initial-wrap-and-env)])
	      (exp (make-syntax-object x wrap) env env))))

The initial wrap consists of a set of substitutions mapping each predefined identifier to a fresh label, and the initial environment associates each of these labels with the corresponding binding:

	(define initial-wrap-and-env
	  (lambda ()
	    (define id-binding*
	      `((quote . ,(make-binding 'core exp-quote))
	        (if . ,(make-binding 'core exp-if))
	        (lambda . ,(make-binding 'core exp-lambda))
	        (let . ,(make-binding 'core exp-let))
	        (letrec-syntax . ,(make-binding 'core exp-letrec-syntax))
	        (identifier? . ,(make-binding 'lexical 'identifier?))
	        (free-identifier=? . ,(make-binding 'lexical 'free-identifier=?))
	        (bound-identifier=? . ,(make-binding 'lexical 'bound-identifier=?))
	        (datum->syntax . ,(make-binding 'lexical 'datum->syntax))
	        (syntax->datum . ,(make-binding 'lexical 'syntax->datum))
	        (syntax-error . ,(make-binding 'lexical 'syntax-error))
	        (syntax-pair? . ,(make-binding 'lexical 'syntax-pair?))
	        (syntax-car . ,(make-binding 'lexical 'syntax-car))
	        (syntax-cdr . ,(make-binding 'lexical 'syntax-cdr))
	        (syntax . ,(make-binding 'core exp-syntax))
	        (list . ,(make-binding 'core 'list))))
	    (let ([label* (map (lambda (x) (make-label)) id-binding*)])
	      (values
	        `(,@(map (lambda (sym label)
	                   (make-subst sym (list top-mark) label))
	                 (map car id-binding*)
	                 label*)
	          ,top-mark)
	        (map cons label* (map cdr id-binding*))))))


					    

In addition to the entries listed, the initial environment should also include bindings for the built-in syntactic forms we have not implemented (e.g., letrec and let-syntax), as well as for all built-in Scheme procedures. It should also include a full version of syntax and, in place of syntax-pair?, syntax-car, and syntax-cdr, it should include syntax-case.