Sparse Matrices in CSR Format

In addition to numpy, we need a module to handle sparse matrics. scipy.sparse has all of what we need.

In [30]:
import numpy as np
import scipy.sparse

from random import randrange
In [31]:
n = 9
A = np.zeros((n, n))

for i in range((int(n**2*0.3))):
    A[randrange(n), randrange(n)] = np.random.randn()
A.round(2)
Out[31]:
array([[ 0.  ,  0.  ,  0.23,  0.  ,  0.  ,  0.13,  0.91, -1.29,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  , -1.24,  1.46],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.32,  0.  ,  0.  ,  0.  , -0.67,  0.  ,  0.  ,  0.  , -0.26],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.61,  0.  ,  1.2 ],
       [-0.01,  0.84,  0.  ,  0.  ,  0.23,  0.  ,  0.  ,  0.  ,  1.02],
       [ 0.  , -1.09,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  , -1.43],
       [ 0.  , -0.27,  0.  ,  0.  ,  0.  ,  0.  , -1.65,  1.39,  0.  ]])

Now convert this to a CSR matrix:

In [32]:
Acsr = scipy.sparse.csr_matrix(A)
Acsr
Out[32]:
<9x9 sparse matrix of type '<class 'numpy.float64'>'
	with 20 stored elements in Compressed Sparse Row format>
In [33]:
Acsr.indptr
Out[33]:
array([ 0,  4,  6,  6,  9,  9, 11, 15, 17, 20], dtype=int32)
In [34]:
Acsr.indices
Out[34]:
array([2, 5, 6, 7, 7, 8, 0, 4, 8, 6, 8, 0, 1, 4, 8, 1, 8, 1, 6, 7], dtype=int32)
In [35]:
Acsr.data.round(2)
Out[35]:
array([ 0.23,  0.13,  0.91, -1.29, -1.24,  1.46,  0.32, -0.67, -0.26,
        0.61,  1.2 , -0.01,  0.84,  0.23,  1.02, -1.09, -1.43, -0.27,
       -1.65,  1.39])
  • How much memory does a dense matrix of double precision numbers of size $10^6 \times 10^6$ occupy?
In [3]:
# in gigabytes:
10**6 * 10**6 * 8 / 1e9

# i.e. 8 Terabytes
Out[3]:
8000.0
  • How much memory does a sparse identity matrix of the same size occupy?
In [6]:
occupied = (
    10**6 * 4 # (32-bit integers to store row indices)
    + 10**6 * 4 # (one 32-bit integer per row to store column index)
    + 10**6 * 8 # (one 64-bit double precision number per row to store value)
    )
# in gigabytes
occupied / 1e9
# I.e. 16 Megabytes
Out[6]:
0.016
In [ ]: