In principle, delta debugging could also minimize an entire program code so as to keep only what is relevant. Suppose your web browser crashes while printing a HTML page. Applying delta debugging on the program code means that only the bare code required to reproduce the failure would remain. Doesn't this sound neat? Unfortunately, it would hardly work in practice. The reason is that the elements of program code are heavily dependent on each other. Remove one piece, and everything breaks apart. The chances of getting something meaningful by randomly removing parts are very slim. Therefore, delta debugging would almost certainly require a quadratic number of tests. Even for a 1,000-line program, this would already mean a million tests—and years and years of waiting. We never had that much time, so we never implemented that feat.
Nonetheless, we still wanted to hunt down failure causes not only in the input or the set of changes, but in the actual source code—in other words, we wanted to have the statements that caused the program to fail. (And, of course, we wanted to get them automatically.)
Again, this was a task that turned out to be achievable by delta debugging. However, we did not get there directly. We wanted to make a detour via program states—that is, the set of all program variables and their values. In this set, we wanted to determine failure causes automatically, as in "At the call to shell_sort(), variable size causes the failure." How would that work?
Let us recapitulate what we had done so far. We had done delta debugging on program versions—one that worked and one that failed—and isolated the minimal difference that caused the failure. We had done delta debugging on program inputs—again, one that worked and one that failed—and isolated minimal differences that caused the failure. If we applied delta debugging on program states, we would take one program state from a working run, and one program state from a failing run, and eventually obtain the minimal difference that caused the failure.
Now, there are three problems in here. Problem number one: How does one obtain a program state? Eventually, I would instrument the gdb debugger to query all named variables first, and then unfold all data structures. If I encountered an array or a structure, I would query its elements; if I found a pointer, I would query the variable it pointed to, and so on—until I reached a fix point, or the set of all accessible variables. This program state would be represented as a graph of variables (vertices) and references (edges), as shown in Figure 28-2, abstracting away concrete memory addresses.
Next problem: How does one compare two program states? This was rather easy: there are known algorithms for computing common subgraphs between two graphs—and anything that would not be part of the subgraph became a difference. With such an algorithm properly implemented, we now could actually extract and determine the difference between two program states.
Third and last problem: How does one apply differences between states? This was a real challenge, as it involved not only observing but actually manipulating program states. To apply a difference in program state, we would have to set variables to new values, but also to replicate entire complex data structures, including allocating and deleting elements. Once this was done, we could do something quite fun; we could arbitrarily transfer program states between running processes. And not only entire program states, but also partial pro-gram states—ranging from small changes to a single variable to large changes of, say, half of a symbol table.
This idea of transferring program states while the program is executing is something that one needs time getting used to. I remember one of my first presentations at IBM where I explained the algorithm, its application on states, and came to the ultimate example: "We now have 879 differences between these two states. We now let delta debugging narrow down the failure cause. To this end, the algorithm takes half of the differences, that is, 439 state differences, and applies them. This means that in the passing run, 439 variables are now set to the values found in the failing run…."
At this moment, a fellow from the audience stepped up and said: "But doesn't this sound like a very insane thing to do?"
Of course, he was right. Nothing meaningful came out of setting 439 variables to values found in another run; nor did it help setting the other 440 variables. But this is just the situation in which delta debugging comes up with the idea of making smaller changes—that is, it tries 220 variables, 110, and so on. Eventually, it isolates the variable that caused the failure: "The compiler crash was caused by a loop in the abstract syntax tree." And this end, of course, justifies the means—in particular, for the people at IBM, who were all pretty busy developing (and debugging) compilers.
Thus, the demonstration that it worked helped people forget it was a pretty weird approach. Still, my first publication on the topic had a hard time getting accepted. One reviewer frankly admitted he was so appalled by the weird approach, he would not even bother to read on toward the results.
Nonetheless, finding failure causes in program state was only a detour toward the ultimate end. It was Holger Cleve who gave this technique the ultimate touch. Since he knew the failure-causing variables, he would simply trace back their values to the statements that caused them—and presto! We would end up in the statements that caused the failure: "The statement in line 437 caused a loop, which again caused the failure." Now this was true magic—and this paper had no trouble getting published, either.
So, as we had a complete automatic debugging solution on our hands, why do people today still use interactive debuggers? Why didn't we go public and become millionaires with automated debugging?