Channel: ARM Mali Graphics

Do Androids have nightmares of botched system integrations?


Today ARM launched the ARM® Mali™-DP500 Display Processor, the first high-specification display processor that ARM has developed. Alongside the Display Processor itself, ARM will also be delivering an optimized software driver for Android™’s HW Composer HAL, providing a buttery smooth user experience for your Mali-DP500-based Android devices.

 

Adding the Mali-DP500 to ARM’s media processing portfolio, as explained below, is a giant leap forward in the system integration story of the platforms you build with ARM’s products; but for ARM’s software driver team, it’s just another step along a path we've been following since Android Éclair (2.1) first sparked to life on an ARM Mali-200 development platform in 2009.

 

Since that day, almost five years ago, ARM’s software driver team has added support for eight different versions of Android (from Éclair to KitKat) into thirteen releases of our driver stacks, running on top of seven generations of Mali GPU hardware, and has helped deploy those drivers into literally hundreds of different devices, including work on the bleeding edge of Android for Google’s Nexus 10. This has been no small accomplishment, and in the process of supporting Android for so long our software team has built up a huge amount of experience in the OS: not just in the GPU sub-system, but in the entire media sub-system, covering rendering, composition, video and camera, within which the GPU plays only one (albeit large and critical) part.

 

We’ve seen the struggles customers can have trying to integrate a video processor from one vendor, a camera ISP from another, a GPU from ARM and an in-house display solution together into Android, and we have worked with and guided many of them towards solutions that transform their systems into first-rate, truly beautiful Android experiences. Through these different customer experiences we've developed a clear understanding of what a fully integrated set of media sub-system drivers must look like, and what features they must support, in order to achieve the best performance and power results that any one platform is capable of on Android.

 

So why is system integration so difficult?

 

In this case I'm hoping that a picture is worth a thousand words or, for this picture in particular, a thousand or so man-hours of your engineers’ time spent on something else instead of system integration:

 

[Figure: android_stack.png — Android’s media sub-system software stack]

 

The diagram above is a simplified view (yep, in reality it’s worse than this) of the interactions between Android’s common user-space components and the underlying software drivers, kernel components and hardware that together provide the user experience on Android. If you take each of your media components from different vendors, then what you end up with is three (or more) software drivers that you first need to integrate into your platform separately, after which you must also integrate them with each other in order to get decent system performance. If you get the integration wrong, or if the different components don’t talk using the same standard interfaces, then what you’re left with is a functional platform that runs too slowly, or burns too much power, or in the worst case somehow manages to do both at the same time.

 

ARM has taken each of the common integration headaches it has seen happen time and time again on customer platforms and engineered them away by producing a collection of drivers built to integrate and perform together. Let’s have a look at the real issues we see customers facing and how our pre-integrated solution avoids them:

 

Pixel format negotiation (“My system components don’t talk the same language!”)

 

One of the key concerns during system integration is making sure each component in the media sub-system (be it video, GPU, camera or display) is actually capable of understanding the format of the graphical output from the other components it reads from, as well as ensuring each is capable of generating content in a format other components can read:

  • Your video hardware may be capable of writing out video frames in five different YUV formats, but if none of them is supported by your Display Processor then you have no choice other than to burn some GPU power to compose that video onto the display.
  • What if you’ve accidentally implemented a display processor that doesn’t understand pixel formats that have pre-multiplied alpha values (as used by most of Android’s user interface)? Suddenly your super clever display processor is nothing more than a glorified framebuffer controller, scanning out frames your GPU has had to generate for you.
  • What if your components are all able to understand 32-bit RGBA pixel formats perfectly but for some reason some of your apps are displaying with inverted colors? Now you’re wasting days of engineering time tracking down which component disagrees with everything else about the ordering of the Red and Blue components of 32-bit pixel formats as well as figuring out how to make it flip them the other way.
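The capability check these examples boil down to can be sketched in a few lines of Python. This is a toy model, not real driver code: the format names and component capability lists are illustrative, not actual Android HAL constants.

```python
# Toy pixel-format negotiation: each component advertises the formats it
# can produce or consume; a zero-copy path exists only if there is a
# format the producer can write AND the consumer can read.
# Format names below are hypothetical, not real Android HAL constants.

def negotiate_format(producer_formats, consumer_formats):
    """Return the first format both sides understand, or None."""
    for fmt in producer_formats:
        if fmt in consumer_formats:
            return fmt
    return None

video_out = ["YV12", "NV12", "NV21"]           # what the video core can write
display_in = ["RGBA8888", "BGRA8888", "NV12"]  # what the display can scan out

fmt = negotiate_format(video_out, display_in)  # "NV12": direct scan-out works
gpu_needed = fmt is None                       # no overlap -> GPU must convert
```

When `negotiate_format` returns `None`, the system is in exactly the situation described in the first bullet above: the GPU has to be burned as a format converter.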

These are just some examples of real integration issues we’ve seen, all of which are avoided by using our complete solution: each ARM component has been designed to work with the others, as well as within Android, so you won’t get any last-minute confusion over whether they all speak the same language. What’s more, alongside its software drivers ARM provides an open source allocation library (Gralloc) that is already set up with support for each of our components, reducing bring-up time even further.

 

Memory allocation negotiation (“My system components don’t talk to each other!”)

 

Another area that causes many issues is deciding where and how to allocate memory for the system’s graphics buffers. When you allocate memory you need to take into account the various constraints of the underlying hardware that will access that memory. Some key questions you have to be able to answer when integrating the components together are:

  • Do all my components have an SMMU (system MMU)? If not, then you’ll be forced to make certain allocations physically contiguous to ensure they can be read by every component.
  • What’s the ideal memory alignment for all of the targeted components? Without this knowledge for every component in the system you could end up making very inefficient memory accesses when processing the graphic buffers.
  • Is there a certain area of memory that some components cannot access? Or an area that they must access from?
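A toy Python model of how an allocator might merge these per-component constraints into a single allocation request. The component descriptions and field names here are hypothetical, for illustration only; they are not the real Gralloc interface.

```python
from math import lcm

def combined_constraints(components):
    """Merge per-component buffer constraints into one allocation request.

    Each component is a dict with 'has_iommu' (bool) and 'alignment'
    (required row alignment in bytes). Hypothetical schema for illustration.
    """
    # If any component lacks an IOMMU, the buffer must be physically contiguous.
    contiguous = any(not c["has_iommu"] for c in components)
    # The row alignment must satisfy every component simultaneously.
    alignment = lcm(*(c["alignment"] for c in components))
    return {"contiguous": contiguous, "alignment": alignment}

def aligned_stride(width_bytes, alignment):
    """Round a row length up to the common alignment."""
    return -(-width_bytes // alignment) * alignment

parts = [
    {"name": "gpu",     "has_iommu": True,  "alignment": 64},
    {"name": "display", "has_iommu": False, "alignment": 128},  # no SMMU
]
req = combined_constraints(parts)            # contiguous=True, alignment=128
stride = aligned_stride(1080 * 4, req["alignment"])  # 4320 B rounded to 4352 B
```

Getting either answer wrong is exactly the failure mode described above: a buffer one component physically cannot reach, or rows that force inefficient memory accesses.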

 

The Gralloc library provided by ARM has built-in understanding of all the system constraints of ARM’s multimedia processors and can work together with the Android kernel’s ION allocator to ensure the most appropriate and memory-efficient allocations are made for each processor in the system.

 

In addition, each of the software drivers for ARM’s multimedia processors utilizes the standard Linux dma_buf memory sharing features. By ensuring that all of the drivers use the same interface, the same allocation can be written to by one processor and read from by another, providing a “zero copy” path for all graphical and video content on the platform and ensuring that the memory bandwidth overhead remains as low as possible.
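As a back-of-envelope illustration of why zero copy matters, here is a small Python sketch of the extra copy traffic a non-shared-buffer pipeline would generate. The hop count and frame parameters are invented for illustration, not measurements of any real platform.

```python
# Back-of-envelope model of copy bandwidth in a media pipeline.
# "hops" is the number of component boundaries the frame crosses
# (e.g. video -> GPU -> display is 2 hops). Illustrative numbers only.

def frame_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

def pipeline_copy_bytes(hops, width, height, bits_per_pixel, zero_copy):
    """Extra bytes memcpy'd per frame as it moves between components."""
    if zero_copy:
        return 0  # dma_buf-style sharing: everyone reads the same allocation
    return hops * frame_bytes(width, height, bits_per_pixel)

# A 1080p RGBA frame crossing 2 boundaries without sharing:
per_frame = pipeline_copy_bytes(2, 1920, 1080, 32, zero_copy=False)
per_second_mb = per_frame * 60 / (1024 * 1024)  # ~949 MiB/s of pure copying at 60 fps
```

Even this crude model shows why a copying pipeline burns memory bandwidth (and therefore power) that a shared-allocation design simply never spends.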

 

Synchronization (“My system components talk over each other!”)

 

When you have a “zero copy” path in your system, and two or more devices are using the same piece of memory directly, synchronization between those components becomes extremely important. You don’t want your display processor to start reading a buffer before the GPU or video processor has finished writing to it, or you’ll end up with some very nasty screen corruption.

 

In older versions of Android (before Jelly Bean MR1), synchronization was handled and controlled in Android user space by way of each component in the rendering pipeline performing the following steps: processing its commands in the software driver, performing its task in the hardware, waiting for that task to complete in the software driver, and then passing responsibility on to the next stage of the pipeline. This made synchronization between components very simple, but it also caused large bubbles (gaps) in the rendering pipeline, as work continually stalled and ping-ponged between the CPU and the hardware: the CPU processing of the next stage wouldn’t start until the hardware processing of the previous stage had completed. All of these pipeline stalls could mean the difference between a “buttery smooth” and a “stuttering along” end user experience.

 

With Jelly Bean MR1, a new synchronization method, Android fences, was added to the Android platform. These fences, provided the software driver supports them, allow each stage in the pipeline to do its CPU-side processing and queue work for its component even if the previous stage hasn’t finished in the hardware, and to pass control to the next stage of the pipeline before its own hardware processing has completed (or even begun). As each component in the pipeline completes its work, it signals a fence and the next stage is automatically triggered with as little CPU involvement as possible. This allows much smaller gaps between one piece of hardware completing and the next one in the chain starting, squeezing out every last bit of performance possible for your system. In order to make full use of the benefits of Android fences, every component in the rendering pipeline needs to support them. If one of your components does not, then that stage in the pipeline falls back to waiting in user space and a performance bubble is introduced into your system.
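The difference between the two models can be sketched with a toy Python timing model. The per-stage CPU and hardware costs here are invented for illustration; they are not measurements of any ARM driver.

```python
# Toy latency model of a three-stage rendering pipeline for ONE frame.
# Each stage is (cpu_ms, hw_ms): CPU-side driver work, then hardware work.
# Illustrative numbers only.

def serialized_latency(stages):
    """Pre-fence model: each stage's CPU work waits for the previous
    stage's hardware to finish before it even starts."""
    return sum(cpu + hw for cpu, hw in stages)

def fenced_latency(stages):
    """Fence model: CPU setup for every stage can queue immediately; a
    stage's hardware job starts once its own setup AND the previous
    stage's hardware are both done (the fence signals)."""
    t_cpu_done = 0.0
    t_hw_done = 0.0
    for cpu, hw in stages:
        t_cpu_done += cpu                    # CPU stages queue back-to-back
        start = max(t_cpu_done, t_hw_done)   # wait on the previous fence
        t_hw_done = start + hw
    return t_hw_done

stages = [(1, 4), (1, 5), (1, 3)]  # e.g. GPU render, video, display (made up)
old = serialized_latency(stages)   # 15 ms: CPU bubbles between every stage
new = fenced_latency(stages)       # 13 ms: CPU setup hidden behind hardware
```

Even in this tiny example the fence model hides two of the three CPU bubbles behind hardware work; in a real pipeline, at real frame rates, those recovered milliseconds are the difference the paragraph above describes.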

 

The big problem, however, comes when all of your components support Android fences but one of them has a bug. The only way these bugs manifest is as a sudden graphical glitch that almost instantly disappears. What do you do now? You’ve got three or more different vendors providing software drivers that all support Android fences, and one of them has a bug, but how do you know which one? How do you track it down? Before you know it you've had to kick off three separate investigations with your vendors to try and find a bug that only manifests when one vendor’s component uses a standard interface to communicate with another vendor’s component. These kinds of bugs can be extremely difficult to find, especially when no single vendor knows anything about the other vendors’ software. They’re the kind of bugs that stretch your device's release date further and further out as you wait for someone to have a eureka moment. Luckily, this isn't your only option: if you've taken the complete solution from ARM then you already have a set of drivers that have been implemented and validated together to run correctly, and if you do find an issue there’s only one vendor you need to talk to, one you can be confident already has all the expertise needed to find and fix it quickly.

 

Efficient composition

 

System integration isn't all about avoiding problems; it’s also about bringing a number of components together to achieve more than they ever could alone. ARM’s new Mali-DP500 is a product tuned precisely for getting the most out of Android by efficiently offloading composition work from the GPU where it counts. We've performed detailed investigations into the common composition scenes generated by Android and seen that most applications and games found on the Android market make use of three or fewer layers, with four or five layers usually being the upper limit.

 

[Figures: image008.png, image009.png — layer counts observed in common Android composition scenes]

 

Any composition engine must trade off the number of hardware layers it supports against silicon area and cost. When composing a frame with more layers than the hardware supports, the additional layers are typically handled by flattening them together using the GPU. The Mali-DP500 software drivers have complete control over which layers are sent to the GPU to be flattened, which allows us to leverage our expert knowledge of how to get the best performance out of our GPUs. The Mali-DP500 software driver makes intelligent decisions, based on the scenes being generated by Android, about what to send to the GPU in order to use as little bandwidth and power as possible compared to doing the full composition on the GPU.

 

When technologies such as Transaction Elimination are deployed in the system’s GPU, the Mali-DP500 software driver can ensure that the GPU is processing only static or infrequently changing layers, effectively reducing the memory bandwidth the GPU uses to write those layers out to near zero. And when coupled with AFBC (ARM Frame Buffer Compression) technology in the GPU, the memory bandwidth used is greatly reduced even in cases where the GPU must actually process non-static content.
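Transaction Elimination works by comparing a per-tile signature (a CRC) between the current and previous frame and skipping the write-out of tiles whose signature is unchanged. A toy Python model of that compare, using `zlib.crc32` over byte-string “tiles” (the tiles and frames here are invented for illustration):

```python
import zlib

# Toy model of Transaction Elimination's per-tile signature compare:
# only tiles whose CRC differs from the previous frame are written out.
# Tiles are plain byte strings here; real hardware works on pixel tiles.

def changed_tiles(prev_crcs, frame_tiles):
    """Return (dirty_indices, crcs): which tiles must be written to
    memory, plus the signature list to keep for the next frame."""
    crcs = [zlib.crc32(t) for t in frame_tiles]
    dirty = [i for i, c in enumerate(crcs)
             if prev_crcs is None or c != prev_crcs[i]]
    return dirty, crcs

frame1 = [b"sky", b"sky", b"grass", b"grass"]
frame2 = [b"sky", b"bird", b"grass", b"grass"]  # only one tile changed

dirty1, crcs = changed_tiles(None, frame1)   # first frame: all 4 tiles written
dirty2, _ = changed_tiles(crcs, frame2)      # second frame: only tile 1 written
```

For the largely static layers Android’s UI produces, most frames behave like `frame2`: almost every tile’s signature matches, so almost nothing is written, which is the “near zero” bandwidth claim above.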

 

Conclusion

 

So there you have it. With the addition of optimized drivers for both display and video, sitting alongside what we’ve already been providing for the GPU, ARM is now able to offer a complete, off-the-shelf set of drivers that come pre-integrated and optimized together and, most importantly, have all been validated to ARM’s highest quality standards to work seamlessly together *before* your platform is even ready to run software for the first time.

 

System integration on Android no longer has to be a trade-off between stability and performance, or become a cross-vendor organizational nightmare. Androids can peacefully dream about electric sheep once more.

