Loops that traverse all the elements of an array in compiled code are an essential feature that NumPy offers the Python programmer. The use of an iterator makes it relatively easy to write these loops in a straightforward and readable way that works for the most general (arbitrarily strided) arrays supported by NumPy. The iterator abstraction is an example in my mind of beautiful code because it allows simple expression of a simple idea, even though the underlying implementation details might actually be complicated. This kind of beautiful code does not just drop into existence, but it is often the result of repeated attempts to solve a set of similar problems until a general solution crystallizes.
My first attempt at writing a general-purpose, N-dimensional looping construct occurred around 1997 when I was trying to write code for both an N-dimensional convolution and a general-purpose arraymap that would perform a Python function on every element of an N-dimensional array.
The solution I arrived at then (though not formalized as an iterator) was to keep track of a C array of integers as indices for the N-dimensional array. Iteration meant incrementing this N-index counter with special code to wrap the counter back to zero and increment the next counter by 1 when the index reached the size of the array in a particular dimension.
While writing NumPy eight years later, I became more aware of the concept of, and use of, iterator objects in Python. I thus considered adapting the arraymap code as a formal iterator. In the process, I studied how Peter Vevreer (author of SciPy's ndimage package) accomplished N-dimensional looping and discovered an iterator very similar to what I had already been using. With this boost in confidence, I formalized the iterator, applying ideas from ndimage to the basic structural elements contained in the arraymap and N-dimensional convolution code.