Welcome to my series of blog posts on accelerated computing technologies! Special-purpose hardware designed to execute certain computations (prominently, GPUs designed to execute graphics computations) is expected to provide better performance than general-purpose hardware (prominently, CPUs). Better performance typically means faster, or accelerated, execution, but often means lower energy consumption as well. Of course, expecting better performance assumes that the software is also up to scratch.
By way of introduction, I have been working on accelerated computing technologies for over ten years, first with CPU vector extensions like ARM Neon, then with vector co-processors like ClearSpeed CSX and Cell SPE, and more recently with GPUs supporting parallel computations like ARM Mali. I have gone from just using vendor-specific APIs to both implementing and using vendor-independent standards such as OpenCL. I have also worked in both academia and industry, which is bound to affect what I am going to write about.
I am aiming at engineering-minded people out there, so you should expect facts and informed opinions, with no hype and no politics. Following this picture,
I am telling you there is a better way of writing software for accelerated systems. Stay tuned!