I will start my blog series with a subseries on the technology that I know and have come to love best - OpenCL™ from the Khronos Group. OpenCL aims to provide functional portability across computer systems composed of general-purpose processors (e.g. CPUs) and special-purpose accelerators (e.g. GPUs), which I will now and then refer to as accelerated systems.
OpenCL is often criticised for presenting a steep learning curve for software engineers, but I don't think that's fair: the verbosity of the OpenCL API is really a consequence of its generality. Once you have written a couple of OpenCL programs, you realise that most of the code running on the host processor is actually boilerplate.
I will sidestep this verbosity of OpenCL altogether by using PyOpenCL - a neat Python module written by Andreas Klöckner. (If you are reading this, Andreas, keep up the good work!)
Just install PyOpenCL and NumPy - and you are ready to roll!
import pyopencl as cl
import numpy as np
Trivial example
Suppose we want to create an array containing integers from 0 to 15.
That's trivial using NumPy:
N = 16
np_range = np.arange(N, dtype=np.int32)
np_range
Our PyOpenCL example, however, will fill in a similar array using OpenCL, starting from an array of zeros:
cl_range = np.zeros(N, dtype=np.int32)
cl_range
Gimme some context!
Creating a context could hardly be easier:
Ditto creating a command queue:
Building and running
An OpenCL C program equivalent to np.arange(N) follows:
source = '''
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
'''
The kernel will be launched as N work-items over the one-dimensional range [0, N-1]. Each work-item gets its unique index gid in this range (that is, an integer between 0 and N-1 inclusive), converts it to int and writes it into the buffer argument at offset gid.
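To make the work-item model concrete, here is a host-only emulation in plain Python and NumPy. No OpenCL is involved, and the helper names are hypothetical. The emulation runs one loop iteration per work-item. Because each work-item writes a distinct element, running the work-items in any order, as a real device may, yields the same array.

```python
import numpy as np

N = 16  # same size as in the example above (integers 0..15)

def arange_work_item(buffer, gid):
    """What one work-item of the arange kernel does: buffer[gid] = gid."""
    buffer[gid] = gid

def emulate_ndrange(kernel, global_size, buffer):
    """Sequentially emulate a one-dimensional NDRange launch on the host."""
    for gid in range(global_size):  # gid takes every value in [0, N-1]
        kernel(buffer, gid)

emulated = np.zeros(N, dtype=np.int32)
emulate_ndrange(arange_work_item, N, emulated)
print(emulated)

# Running the same work-items in reverse order gives the same result.
reversed_run = np.zeros(N, dtype=np.int32)
for gid in reversed(range(N)):
    arange_work_item(reversed_run, gid)
```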
Let's build the program:
Next, allocate a memory buffer:
Now launch the kernel:
Then copy the results from the buffer back to cl_range:
cl.enqueue_copy(queue, cl_range, memory, wait_for=[kernel])
cl_range
Finally, let's confirm that arrays np_range and cl_range match element-wise:
Concluding remarks
I am very interested in getting feedback. I am aiming primarily at software engineers already familiar with OpenCL, so I can jump straight to more advanced topics. But if you would like a little more explanation, just ask me: I will be happy to explain, or point you to some excellent learning resources out there, or indeed consider filling in details in my later posts...
Also, I prepared this post using the wonderful IPython Notebook environment. I haven't seen much in the way of using PyOpenCL and IPython together, so I would be very grateful for any links.
In my next post, I will start discussing how to optimise OpenCL kernels for the ARM® Mali™-T600 GPU series.