The Benefits of GPU Compute
- Reduced power consumption. The architectural characteristics of the Mali-T600 and Mali-T700 series of GPUs enable computation of many parallel workloads much more efficiently than alternative processor solutions. GPU Compute accelerated applications can therefore benefit by consuming less energy, which translates into longer battery life.
- Improved performance and user experience. Where raw performance in the target, the computation of heavy parallel workloads can also be significantly accelerated through the use of the GPU. This may translate in increased frame rate, or the ability to carry out more work in the same temporal/power budget, and can result in benefits such as improved UI responsiveness, more robust finger detection for gesture UIs in challenging lighting conditions, more accurate physics simulation, the ability to apply complex pre-/post-processing effects to multimedia on-device and in real-time. In essence: a significantly improved end-user experience.
- Portability, programmability, flexibility. Heterogeneous compute APIs such as OpenCL™ and RenderScript, are designed for concurrency. They allow the developer to migrate some of the load from the CPU to the GPU or other accelerator, or to distribute it between processors in order to enable better load-balancing across system resources. For example a video codec may offload motion vector calculations to the GPU, enabling the CPU to operate with fewer cores and at lower frequencies, or to be available to compute additional tasks, for example video analytics.
- Reduction of cost, risk and time to market. System designers may be influenced by various cost, flexibility and portability concerns when considering migrating functionality from dedicated hardware accelerators to software solutions which leverage the CPU/GPU subsystem. This approach is made viable and compelling due to the additional computational power provided by the GPU, now exposed through industry standard heterogeneous compute APIs.
Over the last few years ARM has worked very hard to create and develop a strong GPU Compute ecosystem. Collaborations were established across geographies, use-cases and applications, working with partners at all levels of the value chain. These partners were able to translate the benefits of GPU Compute into reality, to the ultimate avail of the end users, and were proudly showcasing their progress at the Multimedia Seminars.
Demonstrating Reduced Power Consumption
For the first time ever Ittiam Systems publically demonstrated how a software solution leveraging on the CPU+GPU compute subsystem is able to save power compared to a solution that does not make use of the GPU. Using an instrumented development board and power probing tools connected to a National Instruments DAQ unit they were able to demonstrate a typical reduction in power consumption of over 30% for 1080p30 video playback.
Mukund Srinivasan, Director and General Manager of the Consumer and Mobility Business Unit at Ittiam, said: "These paradigm shifts open a unique window of opportunity for focused media-related Intellectual Property providers like Ittiam Systems® to offer highly differentiated solutions that are not only compute efficient but also enhance user experience by way of a longer battery life, thanks to offloading significant compute to the ARM Mali GPU. The main hurdle to cross, in these innovative solutions, comes in the form of how to manage the enhanced compute demands of a complex codec standard on mobile devices, since it bears significantly more complex coding tools, for codecs like H.265 or VP9, as compared to VP8 or H.264. In order to gain the maximum efficiencies offered by the GPU technology, collaborating with a longstanding partner and a technology pioneer like ARM enabled us to generate original solutions to the complex problems posed in the design and implementation of consumer electronic systems. Working closely with ARM, we have been able to showcase not just a prototype or a demo, but a real working product delivering significant power savings, of the order of 30-35% improvement in energy efficiencies, when measured on the whole subsystem, using the ARM Mali GPU on board a silicon chip designed for use in the Mobile market"
Demonstrating reduced CPU Load
The algorithm is very computational intensive and very taxing on the CPU resource, which can at times result in poor responsiveness. Furthermore, a performance of 20+ FPS is required in order to deliver a good user experience and this is not achievable on the CPU alone. Fortunately, the heavy level of data parallelism, and large proportion of floating point and SIMD-friendly operations, make this use case great for GPU acceleration. Using RenderScript on Mali, Thundersoft were able to improve the performance from a poor 10fps to over 20fps, and at the same time reduce the CPU load from fluctuating between 70-100% to a consistent < 40%. Dynamic power reduction techniques are therefore able to disable and scale down operational points of the CPUs in order to save power.
Delivering improved Performance and User Experience
Improved User Experience for Computer Vision Applications
eyeSight machine vision algorithms make extensive use of machine learning (using neural networks). The capability to learn from a vast amount of data, at a reasonable amount of time is a key element for success. However, the required
computational resources of neural networks, are beyond the capabilities of standard CPUs. eyeSight’s utilization of deep learning methods can greatly benefit from running on GPU processors.
Alva demonstrate video HDR and real-time video stabilization
streams at 30fps and above. Alva Systems have also implemented an advanced HDR solution that corrects image distortion (common in multi-frame processing algorithms), removes ghosting and carries out intelligent tone mapping (to enable a more realistic result). Of course all of these features increase the computational requirements of the algorithm. GPU Compute enables real-time computation. Alva were able to measure performance improvement of individual blocks of around 13-15x compared to the reference CPU implementation of the same algorithm.
In Conclusion
Modern compute APIs enable efficient and portable heterogeneous computing. This includes enabling the use of the best processor for the task, the ability to balance workloads across system resources and to offload heavy parallel computation to the GPU. GPU Compute with ARM Mali brings tangible advantages for real world applications, including reduced cost and time to market, improved performance and user experience, and improved energy efficiency (measured on consumer devices). These benefits are being enabled by our ecosystem partners who use GPU Compute on Mali for a variety of applications including: advanced imaging, computer vision, computational photography and media codecs.