In [11]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Generate randomly distributed (x, y) pairs. Then, generate a copy of this data and shift each element by a random amount.

In [12]:
xy = np.random.randn(1000,2)
xshift = np.zeros(xy.shape)
xshift = 2*np.random.rand(xshift.shape[0],2)
xyhat = xy + xshift
In [13]:
plt.plot(xyhat[:,0], xyhat[:,1], 'rs', alpha=0.4)
plt.plot(xy[:,0], xy[:,1], 'o')
[<matplotlib.lines.Line2D at 0x7f3b584eacf8>]

Find the average (scalar) distance that each point moved. To do so, calculate the (vector) differences between points in the original data and corresponding points in the shifted data, and then take the two-norms of those differences. Finally, take the average over all of the two-norms.

This is what it looks like if we loop through each point individually:

In [14]:
n = xy.shape[0]

m1 = 0.0
for i in range(n):
    m1 += np.sqrt((xy[i,0]-xyhat[i,0])**2 + (xy[i,1]-xyhat[i,1])**2)

Since we are working with arrays, we can use numpy to be more efficient. Notice that this code gives the exact same answer:

In [15]:
err = (xy-xyhat)**2
err = err.sum(axis=1)
err = np.sqrt(err)
m1 = err.sum()

The code can be written more compactly using just one line.

In [16]:
err = np.sqrt(((xy - xyhat)**2).sum(axis=1)).sum()/n

What happens if we just take the norm of the entire difference vector? Is it the same as what we did above?

In [17]:

We can also pass an argument to the norm function to get a different norm.

In [18]:
dist = np.sqrt(((xy - xyhat)**2).sum(axis=1))
print(np.linalg.norm(dist, ord=np.inf))

Examine the documentation for numpy's norm routine.

In [19]:
In [ ]: