In [11]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Generate normally distributed (x, y) pairs. Then, generate a copy of this data and shift each coordinate by a random amount drawn uniformly from [0, 2).

In [12]:
xy = np.random.randn(1000, 2)              # 1000 standard-normal (x, y) pairs
xshift = 2*np.random.rand(xy.shape[0], 2)  # uniform random shifts in [0, 2)
xyhat = xy + xshift                        # shifted copy of the data
In [13]:
plt.plot(xyhat[:,0], xyhat[:,1], 'rs', alpha=0.4)
plt.plot(xy[:,0], xy[:,1], 'o')
Out[13]:
[<matplotlib.lines.Line2D at 0x7f3b584eacf8>]

Find the average (scalar) distance that each point moved. To do so, calculate the (vector) differences between points in the original data and corresponding points in the shifted data, and then take the two-norms of those differences. Finally, take the average over all of the two-norms.
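Written as a formula, the quantity described above is the following. The symbols are assumptions introduced here for illustration: $x_i$ is row $i$ of `xy`, $\hat{x}_i$ is row $i$ of `xyhat`, and $n$ is the number of points.

$$
\bar{d} \;=\; \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert_2
\;=\; \frac{1}{n}\sum_{i=1}^{n} \sqrt{(x_{i,1}-\hat{x}_{i,1})^2 + (x_{i,2}-\hat{x}_{i,2})^2}
$$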

This is what it looks like if we loop through each point individually:

In [14]:
n = xy.shape[0]

m1 = 0.0
for i in range(n):
    m1 += np.sqrt((xy[i,0]-xyhat[i,0])**2 + (xy[i,1]-xyhat[i,1])**2)
    
print(m1/n)
1.54746284303

Since we are working with arrays, we can let numpy operate on whole arrays at once instead of looping in Python, which is more efficient. Notice that this code gives the same answer:

In [15]:
err = (xy-xyhat)**2
err = err.sum(axis=1)
err = np.sqrt(err)
m1 = err.sum()
print(m1/n)
1.54746284303

The same computation can be written more compactly on a single line.

In [16]:
err = np.sqrt(((xy - xyhat)**2).sum(axis=1)).sum()/n
print(err)
1.54746284303
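As an alternative sketch, np.linalg.norm accepts an axis argument, so the row-wise two-norms can be computed directly. The seeded generator and the names dist, mean_dist, and mean_explicit are assumptions made here so the example is self-contained and reproducible; they are not part of the cells above.

```python
import numpy as np

# Seeded generator (an assumption, used only so the example is reproducible)
rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))
xyhat = xy + 2 * rng.random((1000, 2))

# Two-norm of each row of the difference: one distance per point
dist = np.linalg.norm(xy - xyhat, axis=1)
mean_dist = dist.mean()

# The explicit formula from the one-line version, for comparison
mean_explicit = np.sqrt(((xy - xyhat) ** 2).sum(axis=1)).sum() / xy.shape[0]

print(mean_dist, mean_explicit)
```

The two printed values should agree up to floating-point rounding.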

What happens if we just take the norm of the entire difference array, without first reducing along an axis? Is it the same as what we did above?

In [17]:
print(np.linalg.norm(xy-xyhat))
52.2236429011
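It is not the same. For a 2-D array, np.linalg.norm with no extra arguments returns the Frobenius norm, which is the square root of the sum of all squared entries. That equals the square root of the sum of the squared per-point distances, not their average. Dividing it by sqrt(n) gives the root-mean-square distance, which is never smaller than the mean distance; for the run above, 52.22/sqrt(1000) is roughly 1.65, a little above 1.547. The sketch below checks these relations on freshly generated data; the seeded generator and the names diff, fro, rms, and mean are assumptions for illustration.

```python
import numpy as np

# Seeded generator (an assumption, used only so the example is reproducible)
rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))
xyhat = xy + 2 * rng.random((1000, 2))
n = xy.shape[0]

diff = xy - xyhat
dist = np.sqrt((diff ** 2).sum(axis=1))   # per-point distances

fro = np.linalg.norm(diff)                # Frobenius norm of the n-by-2 array
rms = fro / np.sqrt(n)                    # root-mean-square distance
mean = dist.mean()                        # average distance

print(fro, rms, mean)
```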

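We can also pass the ord argument to the norm function to get a different norm. Here, the infinity norm of the vector of per-point distances gives the largest distance that any single point moved.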

In [18]:
dist = np.sqrt(((xy - xyhat)**2).sum(axis=1))
print(np.linalg.norm(dist, ord=np.inf))
2.76144343403
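To make the effect of ord concrete, here is a small sketch comparing two choices on a vector of non-negative distances: ord=np.inf returns the largest entry, and ord=1 returns the sum of the entries, so dividing the one-norm by n recovers the average distance. The seeded generator and the names max_dist and mean_via_ord1 are assumptions for illustration.

```python
import numpy as np

# Seeded generator (an assumption, used only so the example is reproducible)
rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))
xyhat = xy + 2 * rng.random((1000, 2))
n = xy.shape[0]

dist = np.sqrt(((xy - xyhat) ** 2).sum(axis=1))   # per-point distances, all >= 0

max_dist = np.linalg.norm(dist, ord=np.inf)       # max of |dist|
mean_via_ord1 = np.linalg.norm(dist, ord=1) / n   # sum of |dist|, divided by n

print(max_dist, mean_via_ord1)
```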

Examine the documentation for numpy's norm routine.

In [19]:
np.linalg.norm?