19.2. Memory Models for an N-Dimensional Array

The simplest model for an N-dimensional array in computer memory can be used whenever all of the elements of the array sit next to each other in a contiguous segment. Under such circumstances, getting to the next element of the array is as simple as adding a fixed constant to the pointer to the current element. As a result, an iterator for contiguous arrays requires nothing more than adding that constant to the current data pointer. If every N-dimensional array in NumPy were contiguous, discussing iterators would be rather uninteresting.
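The fixed-constant walk described above can be mimicked in a few lines. This is only a minimal sketch using the Python standard library rather than NumPy's C internals; the names `data`, `buf`, `step`, and `offset` are illustrative, with `offset` standing in for the data pointer.

```python
import struct
from array import array

# Six C ints stored back to back in one contiguous buffer.
data = array("i", [10, 20, 30, 40, 50, 60])
buf = data.tobytes()
step = data.itemsize  # the fixed constant, in bytes

values = []
offset = 0  # plays the role of the current data pointer
while offset < len(buf):
    # Read one native C int at the current offset.
    (value,) = struct.unpack_from("i", buf, offset)
    values.append(value)
    offset += step  # "next element" is just one fixed-size step

print(values)
```

Nothing about the array's shape is needed here: for contiguous data, iteration reduces to the same step repeated until the buffer ends.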

The beauty of the iterator abstraction is that it allows us to think about processing and manipulating noncontiguous arrays with the same ease as contiguous arrays. Noncontiguous arrays arise in NumPy because an array can be created that is a "view" of some other contiguous memory area. This new array may not itself be contiguous.

For example, consider a three-dimensional array, a, that is contiguous in memory. With NumPy, you can create another array consisting of a subset of this larger array using Python's slicing notation. Thus, the statement b=a[::2, 3:, 1::3] returns another NumPy array consisting of every other element in the first dimension, all elements from the fourth onward (index 3, because indexing is zero-based) in the second dimension, and every third element starting at the second element (index 1) in the third dimension. This new array is not a copy of the memory at those locations; it is a view of the original array and shares memory with it. But this new array cannot be represented as a contiguous chunk of memory.
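The slicing statement above can be tried directly. The following is a small sketch in which the shape (4, 6, 7) is an arbitrary assumption, chosen only so that every slice selects more than one element.

```python
import numpy as np

# A contiguous three-dimensional array; the shape is an assumption.
a = np.arange(4 * 6 * 7).reshape(4, 6, 7)
b = a[::2, 3:, 1::3]

print(b.shape)                  # (2, 3, 2)
print(np.shares_memory(a, b))   # True: b is a view, not a copy
print(a.flags["C_CONTIGUOUS"])  # True
print(b.flags["C_CONTIGUOUS"])  # False

# Writing through the view changes the original array.
b[0, 0, 0] = -1
print(a[0, 3, 1])               # -1
```

The last two lines show the practical consequence of sharing memory: element `b[0, 0, 0]` and element `a[0, 3, 1]` are the same location.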

A two-dimensional illustration should further drive home the point. Figure 19-1 shows a contiguous, two-dimensional, 4 x 5 array with memory locations labeled from 1 through 20. Above the representation of the 4 x 5 array is a linear representation of the memory for the array as the computer might see it. If a represents the full memory block, b=a[1:3, 1:4] represents the shaded region (memory locations 7, 8, 9, 12, 13, and 14). As emphasized in the linear representation, these memory locations are not contiguous.

Figure 19-1. A two-dimensional array slice and its linear representation in memory
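The situation in the figure can be reproduced with a short sketch. Here each element's value is set equal to its memory-location label (1 through 20), which is an illustrative convention rather than anything NumPy requires.

```python
import numpy as np

# Element values double as the memory-location labels used in the figure.
a = np.arange(1, 21).reshape(4, 5)
b = a[1:3, 1:4]

print(b.tolist())               # [[7, 8, 9], [12, 13, 14]]
print(b.flags["C_CONTIGUOUS"])  # False
print(b.flags["F_CONTIGUOUS"])  # False
```

Locations 10 and 11 sit between the two rows of the slice in memory, which is why the view is contiguous in neither row-major nor column-major order.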


NumPy's general memory model for an N-dimensional array supports the creation of these kinds of noncontiguous views of arrays. It is made possible by attaching to the array a sequence of integers that represent the values for the "striding" through each dimension.

The stride value for a particular dimension specifies how many bytes must be skipped to get from one element of the array to the next along the associated dimension, or axis. This stride value can even be negative, indicating that the next element in the array is obtained by moving backward in memory. The extra complication of the (potentially) arbitrary striding means that constructing an iterator to handle the generic case is more difficult.
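The stride arithmetic can be made concrete with a short sketch. The helper `element_at` is hypothetical, written only for illustration, and it assumes that `base` is a C-contiguous array whose buffer the view shares. It locates an element by adding index-times-stride terms to the view's starting byte offset, which it reads from the standard `__array_interface__` attribute.

```python
import numpy as np

a = np.arange(1, 21, dtype=np.int64).reshape(4, 5)
b = a[1:3, 1:4]  # the shaded region from the figure
c = a[::-1]      # rows in reverse order

print(a.strides)  # (40, 8)
print(b.strides)  # (40, 8)
print(c.strides)  # (-40, 8)


def element_at(view, base, index):
    """Fetch view[index] by explicit stride arithmetic (illustrative helper).

    Assumes `base` is C-contiguous and `view` shares its buffer.
    """
    # Byte offset of the view's first element within base's buffer.
    start = (view.__array_interface__["data"][0]
             - base.__array_interface__["data"][0])
    # One index-times-stride term per dimension.
    byte_offset = start + sum(i * s for i, s in zip(index, view.strides))
    return int(base.ravel()[byte_offset // base.itemsize])


print(element_at(b, a, (1, 2)), b[1, 2])  # 14 14
print(element_at(c, a, (1, 2)), c[1, 2])  # 13 13
```

The view `b` has the same strides as `a` and differs only in its starting offset and shape, while `c` has a negative stride along its first axis, so stepping to its next row moves backward in memory.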