How are numpy arrays stored?

In [1]:
import numpy as np

NumPy presents an n-dimensional abstraction that has to be mapped onto one-dimensional computer memory.

Even in two dimensions (matrices), there is more than one way to do this: row-major or column-major order.

In [4]:
A = np.arange(9).reshape(3, 3)
print(A)
[[0 1 2]
 [3 4 5]
 [6 7 8]]

Strides and in-memory representation

How is this represented in memory?

In [6]:
A.strides
Out[6]:
(24, 8)
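  • strides stores, for each axis, the number of bytes to jump to get from one entry to the next along that axis.
  • So how is the array above stored? Moving along a row (axis 1) jumps 8 bytes, i.e. one 8-byte integer, while moving down a column (axis 0) jumps 24 bytes, i.e. three entries. The rows are therefore laid out one after another.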
  • This captures row-major ("C" order) and column-major ("Fortran" order), but is actually much more general.
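As a minimal sketch of the rule above, the byte offset of an entry is the sum of index times stride over all axes. The helper name byte_offset is hypothetical (it is not a NumPy function), and the dtype is fixed to int64 here so that the strides match the (24, 8) shown above on every platform.

```python
import numpy as np

# Assumption: 8-byte integers, so that the strides are (24, 8) as above.
A = np.arange(9, dtype=np.int64).reshape(3, 3)

def byte_offset(strides, index):
    """Hypothetical helper: byte offset of `index` from the start of the buffer."""
    return sum(s * i for s, i in zip(strides, index))

offset = byte_offset(A.strides, (1, 2))  # 1*24 + 2*8 = 40

# A is C-contiguous, so tobytes() reproduces its memory layout byte for byte.
raw = A.tobytes()
value = np.frombuffer(raw, dtype=A.dtype, count=1, offset=offset)[0]

print(offset)           # 40
print(value, A[1, 2])   # 5 5
```

Reading one entry at that byte offset from the raw buffer gives the same value as the ordinary index A[1, 2].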

We can also ask for Fortran order:

In [10]:
A2 = np.arange(9).reshape(3, 3, order="F")
A2
Out[10]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

NumPy defaults to row-major ("C") order.

In [11]:
A2.strides
Out[11]:
(8, 24)
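The following sketch contrasts the logical order of A2 with the order of its entries in memory. It assumes 8-byte integers (forced via dtype=np.int64) and uses ravel with order="K", which reads entries in the order they occur in memory.

```python
import numpy as np

# Assumption: 8-byte integers, so that the strides are (8, 24) as above.
A2 = np.arange(9, dtype=np.int64).reshape(3, 3, order="F")

print(A2.strides)              # (8, 24)
print(A2.flags.f_contiguous)   # True
print(A2.flags.c_contiguous)   # False

# order="K" walks the entries as they are laid out in memory.
memory_order = A2.ravel(order="K")
# order="C" walks the entries row by row, regardless of layout.
logical_order = A2.ravel(order="C")

print(memory_order)    # [0 1 2 3 4 5 6 7 8]
print(logical_order)   # [0 3 6 1 4 7 2 5 8]
```

In Fortran order the small stride belongs to axis 0, so consecutive memory locations run down a column rather than along a row.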

Strides and contiguity

How is the stride model more general than just saying "row major" or "column major"?

In [15]:
A = np.arange(16).reshape(4, 4)
A
Out[15]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [18]:
A.strides
Out[18]:
(32, 8)
In [14]:
Asub = A[:3, :3]
Asub
Out[14]:
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

Recall that Asub constitutes a view of the original data in A.

In [19]:
Asub.strides
Out[19]:
(32, 8)

Now Asub is no longer a contiguous array!

In the linear-memory representation (as shown by the increasing numbers in A), the entries 3, 7, and 11 are missing.

This is easy to check with the flags attribute:
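To illustrate how strides cover more than row-major and column-major layouts, here is a small sketch with a few other views of the same 4x4 array. The names Astep and At are made up for this example, and 8-byte integers are again assumed.

```python
import numpy as np

# Assumption: 8-byte integers, so that A has strides (32, 8) as above.
A = np.arange(16, dtype=np.int64).reshape(4, 4)

Asub = A[:3, :3]     # drops the last row and column
Astep = A[::2, ::2]  # every second row and column
At = A.T             # transpose

print(Asub.strides)   # (32, 8)
print(Astep.strides)  # (64, 16)
print(At.strides)     # (8, 32)

# None of these copy any data: they only reinterpret A's buffer.
print(np.shares_memory(A, Asub))   # True
print(np.shares_memory(A, Astep))  # True
print(np.shares_memory(A, At))     # True
```

Each view describes a different walk through the same buffer, using only a shape, a starting point, and one stride per axis.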

In [20]:
Asub.flags
Out[20]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
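If a contiguous array is needed, for example to hand to code that expects a dense buffer, one option is np.ascontiguousarray, which copies a non-contiguous input into a fresh C-ordered buffer. This is a sketch assuming 8-byte integers; the name Acopy is made up for this example.

```python
import numpy as np

# Assumption: 8-byte integers, matching the strides shown above.
A = np.arange(16, dtype=np.int64).reshape(4, 4)
Asub = A[:3, :3]

print(Asub.flags.c_contiguous)   # False

# Copy the view into its own densely packed, row-major buffer.
Acopy = np.ascontiguousarray(Asub)

print(Acopy.flags.c_contiguous)    # True
print(Acopy.strides)               # (24, 8)
print(np.shares_memory(A, Acopy))  # False
```

The copy holds the same values as Asub, but it no longer shares memory with A, so writing to it leaves A unchanged.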