I will start my blog series with a subseries on the technology that I have come to know and love best: OpenCL™ from the Khronos Group. OpenCL aims to provide functional portability across computer systems composed of general-purpose processors (e.g. CPUs) and special-purpose accelerators (e.g. GPUs), which I will now and then refer to as accelerated systems.

OpenCL is often criticised for presenting a steep learning curve for software engineers, but I don't think that's fair: the verbosity of the OpenCL API is really a consequence of its generality. Once you have written a couple of OpenCL programs, you realise that most of the code running on the host processor is actually boilerplate.

I will sidestep this verbosity altogether by using PyOpenCL, a neat Python module written by Andreas Klöckner. (If you are reading this Andreas, keep up the good work!)

Just install PyOpenCL and NumPy, and you are ready to roll!

In [1]:

import pyopencl as cl
import numpy as np

Trivial example

Suppose we want to create an array containing integers from 0 to 15.

In [2]:

N = 16

That's trivial using NumPy:

In [3]:

np_range = np.arange(N, dtype=np.int32)

np_range

Out[3]:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)

but our PyOpenCL example will fill in a similar array using OpenCL:

In [4]:

cl_range = np.zeros(N, dtype=np.int32)

cl_range

Out[4]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Gimme some context!

Creating a context could hardly be easier:

In [5]:

context = cl.create_some_context()

Ditto creating a command queue:

In [6]:

queue = cl.CommandQueue(context)

Building and running

An OpenCL C program equivalent to np.arange(N) follows:

In [7]:

source = '''
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
'''

The kernel will be launched as N work-items over a one-dimensional range [0, N-1]. Each work-item will get its unique index gid in the range (that is, an integer between 0 and N-1 inclusive), convert it to int and write it into the buffer argument at offset gid.
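To make the execution model concrete, here is a minimal pure-Python sketch that emulates a one-dimensional NDRange sequentially on the host. The names `arange_kernel` and `emulate_ndrange` are hypothetical, and a real device runs work-items concurrently and in no guaranteed order; the sketch only shows that each work-item sees its own gid and writes to a distinct element, which is why the order does not matter here.

```python
from typing import Callable, List


def arange_kernel(gid: int, buffer: List[int]) -> None:
    """Python stand-in for the OpenCL C kernel body: buffer[gid] = convert_int(gid)."""
    buffer[gid] = int(gid)


def emulate_ndrange(global_size: int,
                    kernel: Callable[[int, List[int]], None],
                    buffer: List[int]) -> None:
    """Hypothetical helper: invoke the kernel once per work-item.

    A device would run these work-items in parallel and in no particular
    order; running them one after another is equivalent here because
    every work-item writes to a different element of the buffer.
    """
    for gid in range(global_size):  # gid takes every value in [0, global_size - 1]
        kernel(gid, buffer)


N = 16
buffer = [0] * N
emulate_ndrange(N, arange_kernel, buffer)
print(buffer)  # the integers 0 to 15, like np.arange(N)
```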

Let's build the program:

In [8]:

program = cl.Program(context, source).build()

Allocate a memory buffer:

In [9]:

memory_flags = cl.mem_flags.WRITE_ONLY | cl.mem_flags.ALLOC_HOST_PTR

memory = cl.Buffer(context, flags=memory_flags, size=cl_range.nbytes)
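The memory flags are bit masks that are OR-ed together, and the buffer size is given in bytes. WRITE_ONLY declares that kernels will write to the buffer but not read it, and ALLOC_HOST_PTR asks the implementation to allocate the buffer from host-accessible memory. As a hedged illustration that needs no device, the sketch below mirrors a few `CL_MEM_*` bit values from the OpenCL headers (`cl.h`) in a stdlib `IntFlag`; `MemFlags` is a hypothetical stand-in, not `cl.mem_flags` itself.

```python
import struct
from enum import IntFlag


class MemFlags(IntFlag):
    """Hypothetical mirror of some CL_MEM_* bit values from cl.h."""
    READ_WRITE = 1 << 0
    WRITE_ONLY = 1 << 1
    READ_ONLY = 1 << 2
    USE_HOST_PTR = 1 << 3
    ALLOC_HOST_PTR = 1 << 4
    COPY_HOST_PTR = 1 << 5


# Combining flags sets both bits in a single mask.
memory_flags = MemFlags.WRITE_ONLY | MemFlags.ALLOC_HOST_PTR
print(int(memory_flags))  # 18, i.e. 0b10010

# Size in bytes of N 32-bit integers, which is what cl_range.nbytes reports.
N = 16
int32_size = struct.calcsize("<i")  # 4 bytes with standard sizing
buffer_size = N * int32_size
print(buffer_size)  # 64
```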

Launch the kernel:

In [10]:

kernel = program.arange(queue, [N], None, memory)
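The arguments after the kernel name are the command queue, the global size, the local (work-group) size and then the kernel's own arguments. Passing None as the local size leaves the choice of work-group size to the implementation. If you fixed it, say to 4, the 16 work-items would be split into 4 work-groups, and (assuming a zero global offset) each global id decomposes as get_group_id(0) * get_local_size(0) + get_local_id(0). The hypothetical helper `decompose_ids` below just tabulates that decomposition in plain Python.

```python
from typing import List, Tuple


def decompose_ids(global_size: int, local_size: int) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: (global id, group id, local id) for a 1-D NDRange.

    Assumes a zero global offset and, as OpenCL 1.x requires, a global
    size that is a multiple of the local size.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    ids = []
    for gid in range(global_size):
        group_id, local_id = divmod(gid, local_size)
        # Invariant: gid == group_id * local_size + local_id
        ids.append((gid, group_id, local_id))
    return ids


ids = decompose_ids(16, 4)
for gid, group_id, local_id in ids[:6]:
    print(f"gid={gid:2d} group={group_id} local={local_id}")
```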

Copy the results from the buffer into cl_range once the kernel has completed (the kernel launch returns an event, which is passed to wait_for):

In [11]:

cl.enqueue_copy(queue, cl_range, memory, wait_for=[kernel])

cl_range

Out[11]:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)
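Note that with the default in-order command queue the copy would run after the kernel anyway, so the explicit wait_for dependency is harmless rather than strictly needed. As a host-side analogy only, and not PyOpenCL's API, the sketch below uses a stdlib `threading.Event` to make a "copy" wait for a "kernel" running on another thread; `run_kernel`, `kernel_done` and `device_buffer` are hypothetical names.

```python
import threading
from typing import List

N = 16
device_buffer: List[int] = [0] * N
kernel_done = threading.Event()  # stands in for the event returned by the kernel launch


def run_kernel() -> None:
    """Hypothetical 'device' work: fill the buffer, then signal completion."""
    for gid in range(N):
        device_buffer[gid] = gid
    kernel_done.set()


worker = threading.Thread(target=run_kernel)
worker.start()

kernel_done.wait()               # analogous to wait_for=[kernel]
host_copy = list(device_buffer)  # analogous to enqueue_copy into cl_range
worker.join()
print(host_copy == list(range(N)))  # True
```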

Finally, let's confirm that arrays np_range and cl_range match element-wise:

In [12]:

np.all(np_range == cl_range)

Out[12]:

True

Concluding remarks

I am very interested in getting feedback. I am aiming primarily at software engineers already familiar with OpenCL, so that I can jump straight to more advanced topics. But if you feel you would like a little more explanation, just ask: I will be happy to explain, point you to some excellent learning resources out there, or indeed consider filling in details in my later posts.

Also, I prepared this post using the wonderful IPython Notebook environment. I haven't seen much in the way of using PyOpenCL and IPython together, so I would be very grateful for any links.

In my next post, I will start discussing how to optimise OpenCL kernels for the ARM® Mali™-T600 GPU series.

Introducing PyOpenCL

