Hello GPU

In [3]:
import pyopencl as cl
import numpy as np
import numpy.linalg as la

mf = cl.mem_flags

This notebook demonstrates a simple GPU workflow that touches on all the essential pieces:

  • Data transfer
  • Kernel compilation
  • Execution
In [4]:
a = np.random.rand(50000).astype(np.float32)
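The .astype(np.float32) cast matters. np.random.rand returns float64, while the OpenCL kernel used later reads float, which OpenCL defines as a 32-bit type. The NumPy-only check below compares the two dtypes and the byte counts. The a.nbytes value is what later sizes the device buffer. The name a64 is illustrative and does not appear in the notebook.

```python
import numpy as np

# np.random.rand returns float64 by default (8 bytes per element).
a64 = np.random.rand(50000)
# The kernel reads __global float *, i.e. 32-bit floats, so cast first.
a = a64.astype(np.float32)

print(a64.dtype, a64.nbytes)  # float64 400000
print(a.dtype, a.nbytes)      # float32 200000
```

If the cast were skipped, the device would reinterpret 8-byte doubles as pairs of 4-byte floats and the kernel would silently produce garbage.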

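Now create a context, ctx, and a command queue, queue, on which operations are enqueued. cl.create_some_context() may prompt you to choose a platform and device: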

In [5]:
ctx = cl.create_some_context()

queue = cl.CommandQueue(ctx)

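Now allocate a device buffer large enough to hold a. The constructor signature is Buffer(context, flags, size=None, hostbuf=None). Here only a size is passed, and the data is transferred in a separate step: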

In [6]:
a_buf = cl.Buffer(ctx, mf.READ_WRITE, size=a.nbytes)

Then transfer the host array a into the buffer:

In [7]:
cl.enqueue_copy(queue, a_buf, a)
<pyopencl._cl.NannyEvent at 0x7f722c1a9888>

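Here is the kernel source, compiled for the devices in ctx. Each work-item doubles the one element of a selected by its global ID: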

In [8]:
prg = cl.Program(ctx, """
    __kernel void twice(__global float *a)
    {
      int gid = get_global_id(0);   // index of this work-item
      a[gid] = 2*a[gid];            // double one element in place
    }
    """).build()
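The following NumPy-only emulation runs on the CPU and makes the kernel's semantics concrete. The function name twice_reference is hypothetical. Each loop iteration stands in for one work-item whose get_global_id(0) returns gid. The device, by contrast, executes these work-items in parallel and in no guaranteed order.

```python
import numpy as np

def twice_reference(a):
    """Hypothetical CPU emulation of the OpenCL kernel 'twice'.

    Each loop iteration plays the role of one work-item whose
    get_global_id(0) returns gid; a real device runs them in parallel.
    """
    out = a.copy()
    for gid in range(out.shape[0]):
        out[gid] = 2 * out[gid]
    return out

a = np.random.rand(1000).astype(np.float32)
emulated = twice_reference(a)
# Each work-item touches only its own element, so the loop order does not
# matter and the result equals the vectorized NumPy expression.
print(np.array_equal(emulated, 2 * a))  # True
```

Order independence is what makes this kernel safe to parallelize: no work-item reads an element that another one writes.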

Run the kernel.

In [9]:
prg.twice(queue, a.shape, None, a_buf)
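In the call above, a.shape is the global size, giving one work-item per element. None leaves the local (work-group) size to the implementation. If you pick a local size yourself, OpenCL 1.x requires the global size to be a multiple of it. A common pattern is therefore to round the global size up and have the kernel skip out-of-range IDs with a guard such as if (gid < n). The sketch below emulates that pattern in NumPy. The helper round_up and the work-group size of 64 are illustrative assumptions and are not part of this notebook.

```python
import numpy as np

def round_up(n, multiple):
    """Hypothetical helper: smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple

n = 50000
local_size = 64                        # assumed work-group size, for illustration
global_size = round_up(n, local_size)
print(global_size, global_size // local_size)  # 50048 782

# Emulate a guarded variant of 'twice': the 48 extra work-items
# (gid >= n) must not touch memory, mirroring `if (gid < n)` in the kernel.
a = np.random.rand(n).astype(np.float32)
out = a.copy()
for gid in range(global_size):
    if gid < n:
        out[gid] = 2 * out[gid]
print(np.array_equal(out, 2 * a))  # True
```

Without the guard, the padded work-items would write past the end of the buffer, which OpenCL does not check.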
<pyopencl._cl.Event at 0x7f72249540f8>

Copy the data back.

In [10]:
result = np.empty_like(a)

cl.enqueue_copy(queue, result, a_buf)
<pyopencl._cl.NannyEvent at 0x7f722c916e08>

Check the result.

In [11]:
print(la.norm(result - 2*a), la.norm(a))
0.0 128.81612
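Both printed numbers can be sanity-checked. The difference norm is exactly 0.0 because multiplying a binary float by 2 only changes its exponent, so no rounding occurs (barring overflow). The second number is plausible as well. For x uniform on [0, 1), E[x^2] = 1/3, so the norm of 50000 samples should be near sqrt(50000/3) ≈ 129.1. Kernels whose arithmetic does round generally need a tolerance rather than an exact comparison. The float32 round trip below is a stand-in chosen for illustration and is not something the notebook computes.

```python
import numpy as np
import numpy.linalg as la

n = 50000
# For x uniform on [0, 1), E[x^2] = 1/3, so the 2-norm of n samples is about sqrt(n/3).
expected_norm = np.sqrt(n / 3)
print(round(float(expected_norm), 1))  # 129.1

a = np.random.rand(n).astype(np.float32)

# A computation that does round in float32 (unlike multiplying by 2):
# scale by 0.1 and back by 10 instead of leaving a unchanged.
roundtrip = (a * np.float32(0.1)) * np.float32(10.0)

rel_err = la.norm(roundtrip - a) / la.norm(a)
print(rel_err)                               # small, typically nonzero
print(np.allclose(roundtrip, a, rtol=1e-6))  # True
```

A relative tolerance of a few float32 ulps (around 1e-6 here) is a reasonable starting point, though the right bound depends on how many rounding steps the kernel performs.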