numpy: Introduction

A Difference in Speed

Let's import the numpy module.

In [5]:
import numpy as np
In [19]:
n = 10  # CHANGE ME
a1 = list(range(n))
a2 = np.arange(n)

if n <= 10:
    print(a1)
    print(a2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0 1 2 3 4 5 6 7 8 9]
In [20]:
%timeit [i**2 for i in a1]
100000 loops, best of 3: 2.41 µs per loop
In [21]:
%timeit a2**2
The slowest run took 34.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 400 ns per loop
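Outside a notebook, the same comparison can be sketched with the standard-library timeit module. This is only an illustration: the larger n and the repeats count are choices made for this sketch, not values from the cells above, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 10_000  # larger than the n = 10 above; chosen for this sketch
a1 = list(range(n))
a2 = np.arange(n)

# Both expressions compute the same squares.
squares_list = [i**2 for i in a1]
squares_array = a2**2

# Time each expression; `repeats` is an arbitrary choice.
repeats = 100
t_list = timeit.timeit(lambda: [i**2 for i in a1], number=repeats)
t_array = timeit.timeit(lambda: a2**2, number=repeats)

print(f"list comprehension: {t_list / repeats * 1e6:.1f} µs per loop")
print(f"numpy array:        {t_array / repeats * 1e6:.1f} µs per loop")
```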

Numpy arrays are much less flexible than lists (every element must share one data type), but:

  • much faster
  • less memory
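The memory claim can be checked roughly as follows. This is a sketch, not an exact accounting: sys.getsizeof reports only the list's own pointer storage, so the sizes of the int objects are added separately, and small ints that Python shares between objects are counted as if they belonged to the list. The names list_bytes and array_bytes are just illustrative.

```python
import sys

import numpy as np

n = 1000
a1 = list(range(n))
a2 = np.arange(n, dtype=np.int64)

# The list stores pointers; each int is a separate Python object.
list_bytes = sys.getsizeof(a1) + sum(sys.getsizeof(i) for i in a1)

# The array stores raw 8-byte integers in one contiguous buffer.
array_bytes = a2.nbytes

print("list: ", list_bytes, "bytes (rough estimate)")
print("array:", array_bytes, "bytes")
```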

Why the difference?

In [22]:
# (This cell contains a bunch of voodoo that
# helps with the graphics below. You don't need to
# know what this does.)

%load_ext gvmagic
from objgraph_helper import dot_refs
The gvmagic extension is already loaded. To reload it, use:
  %reload_ext gvmagic
In [26]:
%dotstr dot_refs([a1])
[Figure: object graph of a1. One list object (10 items) holds references to ten separate int objects, 0 through 9.]
In [28]:
%dotstr dot_refs([a2])
[Figure: object graph of a2. A single ndarray object, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), with no separate per-element objects.]
In [29]:
a2.strides
Out[29]:
(8,)
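The strides tuple gives the number of bytes to step in memory to reach the next element along each axis. A stride of (8,) for 8-byte integers means the elements sit back to back in one buffer. The sketch below sets the dtype explicitly so the numbers are the same on every platform; the 2D array m is an extra illustration, not part of the cells above.

```python
import numpy as np

a2 = np.arange(10, dtype=np.int64)
print(a2.itemsize)  # bytes per element
print(a2.strides)   # bytes to step to the next element

# In a 2D array, each axis has its own stride.
m = np.zeros((3, 4), dtype=np.float64)
print(m.strides)    # next row is 4 * 8 bytes away, next column 8 bytes

# Transposing swaps the strides; the data itself is not copied.
print(m.T.strides)
```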

Ways to create a numpy array

  • Converting from a list
In [6]:
np.array([1,2,3])
Out[6]:
array([1, 2, 3])
  • linspace
In [7]:
np.linspace(-1, 1, 10)
Out[7]:
array([-1.        , -0.77777778, -0.55555556, -0.33333333, -0.11111111,
        0.11111111,  0.33333333,  0.55555556,  0.77777778,  1.        ])
  • zeros
In [8]:
np.zeros((10,10), np.float64)
Out[8]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
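A quick way to check what each of these constructors returned is to look at the result's shape and dtype. The sketch below also uses the retstep=True option of np.linspace to report the spacing between points; the variable names are only illustrative.

```python
import numpy as np

# From a list: a 1D array with one entry per list item.
from_list = np.array([1, 2, 3])
print(from_list.shape)

# linspace includes both endpoints, so 10 points give 9 intervals.
points, step = np.linspace(-1, 1, 10, retstep=True)
print(step)  # (1 - (-1)) / (10 - 1)

# zeros takes the shape as a tuple, then the dtype.
z = np.zeros((10, 10), np.float64)
print(z.shape, z.dtype)
```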

Operations on arrays

These propagate to all elements:

In [9]:
a = np.array([1.2, 3, 4])
b = np.array([0.5, 0, 1])

Addition, multiplication, power, etc. are all elementwise:

In [10]:
a+b
Out[10]:
array([ 1.7,  3. ,  5. ])
In [11]:
a*b
Out[11]:
array([ 0.6,  0. ,  4. ])
In [12]:
a**b
Out[12]:
array([ 1.09544512,  1.        ,  4.        ])

Matrix multiplication is np.dot(A, B) for two 2D arrays.
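To make the contrast with elementwise multiplication concrete, here is a small sketch with two 2×2 arrays. The matrices A and B are made up for illustration. In Python 3.5 and later, the @ operator gives the same result as np.dot for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B      # multiplies matching entries
matrix = np.dot(A, B)    # rows of A times columns of B
matrix_op = A @ B        # operator form of the same product

print(elementwise)
print(matrix)
```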

Important Attributes

Two attributes of numpy arrays are especially important:

In [13]:
a = np.random.rand(5, 4, 3)
a.shape
Out[13]:
(5, 4, 3)

The .shape attribute contains the dimensions of the array as a tuple. So the tuple (5,4,3) means that we're dealing with a three-dimensional array of size $5 \times 4 \times 3$.

(numpy.random.rand just generates an array of random numbers, uniformly distributed in [0, 1), of the given shape.)
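Two related attributes follow directly from the shape: ndim is the length of the shape tuple, and size is the product of its entries. A short sketch:

```python
import numpy as np

a = np.random.rand(5, 4, 3)

print(a.shape)  # the dimensions, as a tuple
print(a.ndim)   # number of axes, same as len(a.shape)
print(a.size)   # total number of elements, 5 * 4 * 3
```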

In [14]:
a.dtype
Out[14]:
dtype('float64')

The .dtype attribute gives the data type shared by all elements of the array. Other dtypes include np.complex64, np.int32, ...
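The dtype can be chosen when an array is created and changed afterwards with astype, which returns a converted copy. The sketch below also prints itemsize, the number of bytes each element occupies; the variable names are illustrative.

```python
import numpy as np

# Request a specific dtype at creation time.
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype, ints.itemsize)

# astype returns a new array with the requested dtype.
floats = ints.astype(np.float64)
print(floats.dtype, floats.itemsize)

# complex64 stores two 32-bit floats per element.
c = np.zeros(3, dtype=np.complex64)
print(c.dtype, c.itemsize)
```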