Benchmarking Matrix-Matrix Multiply (via BLAS)

In [1]:
import numpy as np
from time import time
In [4]:
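if 0:
    # Alternative case: small A times a very wide B (flip the condition to try it)
    A = np.random.randn(25, 25)
    B = np.random.randn(25, 800000)
else:
    # Square case: this is the one timed below
    A = np.random.randn(3000, 3000)
    B = np.random.randn(3000, 3000)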
In [3]:
# Wall-clock time of a single matrix-matrix product (the result is discarded)
start = time()
A @ B
elapsed = time() - start
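
A single timed run can include one-time costs such as thread start-up, so the number above may be pessimistic. The sketch below is one possible way to get a steadier measurement: one untimed warm-up, then the best of a few repeats. The helper name `best_time` and the repeat count are illustrative choices, not part of the original notebook.

```python
import numpy as np
from time import perf_counter


def best_time(A, B, repeats=3):
    """Return the smallest wall-clock time (seconds) over `repeats` runs of A @ B."""
    A @ B  # untimed warm-up run
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        A @ B
        best = min(best, perf_counter() - start)
    return best


# Small matrices so the demo runs quickly; substitute the A and B from above.
A_demo = np.random.randn(500, 500)
B_demo = np.random.randn(500, 500)
print(f"best of 3: {best_time(A_demo, B_demo):.6f} s")
```

Taking the minimum rather than the mean is a common convention for micro-benchmarks, since noise from other processes only ever makes a run slower.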

Was that... efficient?

In [6]:
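(m, k), n = A.shape, B.shape[1]
# 2*m*k*n flops (one multiply, one add per term); bytes: read A and B, write C once each
print(f"{2 * m * k * n / 1e9 / elapsed} GFLOP/s")
print(f"{(A.nbytes + B.nbytes + m * n * A.itemsize) / elapsed / 1e9} GB/s")
45.728458751952864 GFLOP/s
0.18291383500781147 GB/s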

How would we come up with reference quantities with which to compare the attained performance?
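
One possible answer, sketched under stated assumptions: compare against (a) a theoretical peak flop rate computed from hardware parameters and (b) a memory bandwidth estimated from a large array copy. The hardware numbers below (4 cores, 3.0 GHz, 16 double-precision flops per cycle per core) describe a hypothetical machine and need to be replaced with the values for the actual CPU. All function names are illustrative.

```python
import numpy as np
from time import perf_counter


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: cores x clock (GHz) x flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def copy_bandwidth_gbs(nbytes=2**26, repeats=5):
    """Estimate memory bandwidth (GB/s) from copying an array of about `nbytes` bytes."""
    src = np.ones(nbytes // 8)  # float64 entries, 8 bytes each
    dst = np.empty_like(src)
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        np.copyto(dst, src)
        best = min(best, perf_counter() - start)
    # A copy reads src once and writes dst once.
    return 2 * src.nbytes / best / 1e9


def matmul_intensity(m, k, n, itemsize=8):
    """Flops per byte for C = A @ B if A, B, and C each cross the memory bus once."""
    flops = 2 * m * k * n
    nbytes = itemsize * (m * k + k * n + m * n)
    return flops / nbytes


def attainable_gflops(peak, bandwidth_gbs, intensity):
    """Roofline-style bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak, bandwidth_gbs * intensity)


# Hypothetical machine parameters (assumptions, not measured values).
peak = peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=16)
bw = copy_bandwidth_gbs()
ai = matmul_intensity(3000, 3000, 3000)
print(f"peak: {peak} GFLOP/s, copy bandwidth: {bw:.1f} GB/s")
print(f"intensity: {ai} flops/byte, bound: {attainable_gflops(peak, bw, ai):.1f} GFLOP/s")
```

For the square 3000-by-3000 case the intensity works out to 250 flops per byte, so under these assumptions the product is limited by the flop rate, not by memory bandwidth. That makes the peak flop rate the more relevant reference for the measurement above.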
