Multidimensional Iterators in NumPy > Iterator Use

19.6. Iterator Use

A good abstraction proves its worth when it makes coding simpler under diverse conditions, or when it ends up being useful in ways that were originally unintended. Both of these affirmations of value are definitely true of the NumPy iterator object. With only slight modifications, the original simple NumPy iterator has become a workhorse for implementing other NumPy features, such as iteration over all but one dimension and iteration over multiple arrays at the same time. In addition, when we had to quickly add

some enhancements to the code for generating random numbers and for broadcast-based copying, the existence of the iterator and its extensions made implementation much easier.

19.6.1. Iteration Over All But One Dimension

A common motif in NumPy is to gain speed by concentrating optimizations on the loop over a single dimension where simple striding can be assumed. Then an iteration strategy that iterates over all but the last dimension is used. This was the approach introduced by NumPy's predecessor, Numeric, to implement the math functionality.

In NumPy, a slight modification to the NumPy iterator makes it possible to use this basic strategy in any code. The modified iterator is returned from the constructor as follows:

	it = PyArray_IterAllButAxis(array, &dim).

The PyArray_IterAllButAxis function takes a NumPy array and the address of an integer representing the dimension to remove from the iteration. The integer is passed by reference (the & operator) because if the dimension is specified as -1, the function determines which dimension to remove from iteration and places the number of that dimension in the argument. When the input dimension is -1, the routine chooses the dimension with the smallest nonzero stride.

Another choice for the dimension to remove might have been the dimension with the largest number of elements. That choice would minimize the number of outer loop iterations and reserve the most elements for the presumably fast inner loop. The problem with that choice is that getting information in and out of memory is often the slowest part of an algorithm on general-purpose processors.

As a result, the choice made by NumPy is to make sure that the inner loop is proceeding with data that is as close together as possible. Such data is more likely to be accessed more quickly during the speed-critical inner loop.

The iterator is modified by:

  1. Dividing the iteration size by the length of the dimension being removed.

  2. Setting the number of elements in the selected dimension to 1 (so the array storing one less than the total number is set to 0): dims_m1[i]=0.

  3. Setting the backstrides entry for that dimension to 0 so that the continual rewrapping of the counter in the given dimension back to 0 will never alter the data pointer.

  4. Resetting the contiguous flag to 0 because processing will not be contiguous in memory (each iteration has to skip an entire dimension of the array).

The altered iterator is returned by the function. It can now be used everywhere an iterator was previously used. Each time through the loop, the iterator will point to the first element of the selected dimension of the array.

19.6.2. Multiple Iterations

Another common task in NumPy is to iterate over several arrays in concert. For example, the implementation of array addition requires iterating over both arrays using a connected iterator so that the output array is the sum of each element of the first array multiplied by each element of the second array. This can be accomplished using a different iterator for each of the input elements and an iterator for the output array in the normal fashion.

Alternatively, NumPy provides a multi-iterator object that can simplify dealing with several iterators at once. This multi-iterator object also handles the broadcasting functionality of NumPy automatically. Broadcasting is the name given to the feature in NumPy that allows arrays with different shapes to be used together in operations that are supposed to work element-by-element. For example, broadcasting allows a (4,1)-shaped array to be added to a (3)-shaped array resulting in a (4,3)-shaped array. Broadcasting also allows simultaneous iteration over a (4,1)-shaped array, a (3)-shaped array, and a (5,1,1)-shaped array to produce a broadcasted iteration covering the elements of a (5,4,3)-shaped array.

The rules of broadcasting are:

The key to the implementation of broadcasting consists of surprisingly simple modifications to the array iterators. With these alterations, standard iterator loops can be used to implement the resulting calculations in a straightforward way. The modifications needed are changes to the shape of the iterators (not the underlying array) and changes to the strides and backstrides. The shape stored in the iterator is changed to match the broadcast shape. The strides and backstrides for broadcast dimensions are changed to 0. With a stride of 0, the standard iterator does not actually move the data pointer to the element in memory as the index in that dimension proceeds. This creates the desired effect of broadcasting without actually copying the memory.

The following code illustrates usage of the multi-iterator object:

	PyObject *multi;
	PyObject *in1, *in2;
	double *i1p, *i2p, *op;
	/* get in1 and in2 (assumed to be arrays of NPY_DOUBLE) */
	/* first argument is the number of input arrays; the
	   next (variable number of) arguments are the
	   array objects */
	multi = PyArray_MultiNew(2, in1, in2);
	/* construct output array */
	out = PyArray_SimpleNew(PyArray_MultiIter_NDIM(multi),
	                        PyArray_MultiIter_DIMS(multi),
	                        NPY_DOUBLE);
	op = PyArray_DATA(out);
	while(PyArray_MultiIter_NOTDONE(multi)) {
	    /* get (pointers to) the current value in each array */
	    i1p = PyArray_MultiIter_DATA(multi, 0);
	    i2p = PyArray_MultiIter_DATA(multi, 1);
	    /* perform the operation for this element */
	    *op = *ip1 + *ip2
	    op += 1; /* Advance output array pointer */
	    /* Advance all the input iterators */
	    PyArray_MultiIter_NEXT(multi);
	}


					    

The code is very similar to a standard iterator loop, except the multi-iterator handles adjustments of the input iterators to accomplish broadcasting, as well as incrementing all the other input iterators. This code handles the broadcasting automatically as part of the iterator processing, so that the addition of a (3,1)-shaped array to a (4)-shaped one will produce a (3,4)-shaped output array.

19.6.3. Anecdotes

The NumPy iterator is used throughout the NumPy code base to simplify the construction of N-dimensional loops. Having the iterator available allowed me to code algorithms for more general (noncontiguous) arrays. Normally, the difficulty of thinking about how to handle the noncontiguous arrays would have convinced me to just force an array to be contiguous (by making a new copy if necessary) and then use an easy looping algorithm. The existence of the NumPy iterator allowed me to write much more general code that is still as readable, with a very minor cost in speed for arrays that are actually contiguous. This slight disadvantage is offset by the very great decrease in memory requirements for arrays that are noncontiguous. The improved productivity in writing such loops is sufficient to justify the existence of the NumPy iterator.

However, it is NumPy's encapsulation for broadcasting where the utility of the abstraction really shines. It shone particularly bright when the multi-iterator allowed me to enhance the random-number generators of NumPy to deal with arrays of parameters pertaining to the random number generators. The change took about two hours with only a few lines of code.

The random-number generator facility of NumPy was written by Robert Kern. He was not familiar with the C broadcasting API that had only been recently added. As a result, the original implementation required all parameters used to specify the random numbers to be scalar values (i.e., the value of for an exponential distribution).

This was an unfortunate restriction. It is quite common to need an array of random numbers drawn from a particular distribution where different parts of the array should have different parameters. For instance, a programmer might need a matrix of random numbers drawn from the exponential distribution where each row of numbers should be sampled using a different value of . To allow arrays of parameters, the bulk of the change was to use the multi-iterator loop (with its built-in broadcasting facility) and fill in the output array with the random samples.

Another opportunity to use the iterator surfaced when the code that copied data from one array to another needed to be altered to copy in a manner consistent with NumPy's definition of broadcasting. Previously, an array was copied over to another using the standard iterator. The only shape checking done was to ensure that the destination array got filled only once. If the destination ran out of elements, its iterator started over again. Eventually, it became clear that this was not the desired copying behavior because it basically implemented a different kind of "broadcasting" (as long as the total number of elements of one array was a multiple of another, any array could be copied into any other array regardless of shape). The kind of data replication that resulted from this copy command was inconsistent with the definition of broadcasting used in other places in NumPy. It became clear that it needed to be changed. Again, the multi-iterators and its built-in concept of iterator broadcasting was a useful abstraction because it allowed me to write the code to accomplish the copy (including size checking) very quickly with only very few lines of actual new code.