Building and Using Sparse Matrices

In [2]:
import numpy as np
import matplotlib.pyplot as pt

import scipy.sparse as sps

Building a sparse matrix

COOrdinate format is typically convenient for building ("assembling") a sparse matrix:

In [3]:
data = [5, 6, 7]
rows = [1, 1, 2]
columns = [2, 4, 6]

A = sps.coo_matrix(
        (data, (rows, columns)),
        shape=(10, 10), dtype=np.float64)
A
/usr/local/lib/python3.5/dist-packages/IPython/core/formatters.py:92: DeprecationWarning: DisplayFormatter._ipython_display_formatter_default is deprecated: use @default decorator instead.
  def _ipython_display_formatter_default(self):
/usr/local/lib/python3.5/dist-packages/IPython/core/formatters.py:669: DeprecationWarning: PlainTextFormatter._singleton_printers_default is deprecated: use @default decorator instead.
  def _singleton_printers_default(self):
Out[3]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
	with 3 stored elements in COOrdinate format>
In [4]:
A.todense()
Out[4]:
matrix([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  5.,  0.,  6.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  7.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
In [5]:
A.nnz
Out[5]:
3
In [6]:
pt.spy(A)
Out[6]:
<matplotlib.lines.Line2D at 0x7fcfcdd38f28>

For a COO matrix, the juicy attributes are data, row, and col.

In [9]:
print("row:", A.row)
print("col:", A.col)
print("data:", A.data)
row: [1 1 2]
col: [2 4 6]
data: [ 5.  6.  7.]

COOrdinate format is not the only format.

There is also Compressed Sparse Row:

In [12]:
Acsr = A.tocsr()
Acsr
Out[12]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
	with 3 stored elements in Compressed Sparse Row format>

For Compressed Sparse Row, look in data, indptr, and indices.

In [14]:
print("indptr:", Acsr.indptr)
print("indices:", Acsr.indices)
print("data:", Acsr.data)
indptr: [0 0 2 3 3 3 3 3 3 3 3]
indices: [2 4 6]
data: [ 5.  6.  7.]

Performance of the Matrix-Vector Product

The following code randomly generates a sparse matrix that has a given fill_percent percentage of nonzero entries:

In [18]:
fill_percent = 5

size = 1000
nentries = size**2 * fill_percent // 100

data = np.random.randn(nentries)
rows = (np.random.rand(nentries)*size).astype(np.int32)
columns = (np.random.rand(nentries)*size).astype(np.int32)

B_coo = sps.coo_matrix(
        (data, (rows, columns)),
        shape=(size, size), dtype=np.float64)

B_csr = sps.csr_matrix(B_coo)

B_dense = B_coo.todense()

Next, we time matrix-vector multiplication for different versions of B:

In [26]:
vec = np.random.randn(size)

from time import time
start = time()

for i in range(2000):
    B_dense.dot(vec)
    
print("time: %g" % (time() - start))
time: 1.96073