Channel: ARM Mali Graphics

The Sensible Six (Optimization Techniques)

It's not often I get flown halfway round the world to explain common sense, but this March it happened: I was delivered to GDC in San Francisco to talk about best practices in mobile graphics. Unlike previous talks, where I wax lyrical about the minutiae of a specific optimization technique, this time I had to cover a wide range of things in just twenty minutes. Paired as I was with a similarly compressed talk from Stephen Barton about using our DS5 and MGD tools to analyse graphical applications for bottlenecks, it was a study in time management. One of the highlights of his talk was the latest MGD update, which you can read more about in his recent blog post. Pity our poor audience who, having had insufficient time to learn how to find their performance bottlenecks, were now going to be subjected to my having insufficient time to tell them how to fix them.

 

We're making the slides available for this presentation (my section starts on slide 29) but unlike previous presentations there was no video taken, so some of the pages may need a little explanation here. Whereas usually I'd have time to look at an app and check for specific changes, obviously the people watching wanted to know what they could do on their own software. I therefore had to talk about the most common places where people leave room for improvement: Batching, Overdraw, Culling, Levels of Detail, Compression and Antialiasing.

 

Batching is a topic I have been outspoken about many times in the past and I really just gave some simple solutions here, such as combining static geometry into a single mesh. Though lip service was paid to dynamic batching and instancing, that topic is explained far better in my older post Game Set & Batch.
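
As a rough illustration of combining static geometry into a single mesh, here is a minimal Python sketch. It is not taken from the talk: the function name and the (vertices, indices) mesh layout are assumptions, and it presumes every mesh shares the same material and has already been transformed into a common world space.

```python
def merge_static_meshes(meshes):
    """Combine meshes into one vertex list and one index list.

    Each mesh is a (vertices, indices) pair. Vertices are assumed to be
    in a shared world space already, so only the indices need adjusting.
    """
    merged_vertices = []
    merged_indices = []
    for vertices, indices in meshes:
        base = len(merged_vertices)  # where this mesh's vertices will start
        merged_vertices.extend(vertices)
        merged_indices.extend(base + i for i in indices)
    return merged_vertices, merged_indices


quad = ([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [0, 1, 2, 0, 2, 3])
tri = ([(5, 0, 0), (6, 0, 0), (5, 1, 0)], [0, 1, 2])
vertices, indices = merge_static_meshes([quad, tri])
print(len(vertices), indices)
```

The merged result can then be uploaded once and drawn with a single draw call instead of one call per object.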

Although I've spoken about overdraw in the context of batching before, not much has been said about scene-related overdraw beyond a flippant "front to back, y'all" before moving on to a batching solution. People often think of overdraw in terms of having to sort objects in a scene by their distance from the camera. One case lots of people complain about, however, is what to do when objects overlap or surround each other in some way, because then a sort by distance can't be trusted. In that situation there's an even easier solution. If you know one thing will always cover another, you can make a special ordering case for it in the code. There are even a number of very common savings to be made on a full-screen scale. If the camera is inside a room, anything else inside the room can be rendered before the room itself, as it will always occlude the walls and the floor. The same goes for rendering things before the ground in outdoor scenes.

It's mostly about efficient scene management, but even when you don't know beforehand what order things should be drawn in, you can make changes to reduce the impact of overdraw. If you have two pieces of geometry in the scene which use different shaders and for whatever reason it's hard to tell which to draw first, draw the one with the less expensive shader first. At least that way any overdrawn pixels come from the cheaper shader and waste less, while any occluded pixels come from the expensive shader and save more.
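
One way to picture these ordering rules is as a single sort key for opaque draws. The sketch below is only an illustration: the Draw record, its field names and the depth_bucket parameter are all hypothetical, and the shader cost is whatever rough estimate you have to hand.

```python
from collections import namedtuple

# All fields are hypothetical: layer is a hand-assigned ordering rule
# (lower draws first, e.g. props before the room that contains them),
# depth is distance from the camera, and shader_cost is a rough
# per-fragment cost estimate.
Draw = namedtuple("Draw", "name layer depth shader_cost")


def draw_order(draws, depth_bucket=1.0):
    """Sort opaque draws: known layers, then near to far, then cheap first."""
    def key(d):
        # Depths inside the same bucket count as "too close to call",
        # so the cheaper shader breaks the tie.
        return (d.layer, int(d.depth // depth_bucket), d.shader_cost)
    return sorted(draws, key=key)


scene = [
    Draw("room_walls", layer=1, depth=0.0, shader_cost=2),
    Draw("statue", layer=0, depth=4.2, shader_cost=8),
    Draw("crate", layer=0, depth=4.6, shader_cost=3),
    Draw("table", layer=0, depth=2.0, shader_cost=5),
]
order = [d.name for d in draw_order(scene)]
print(order)
```

Here the room is forced to draw last by its layer, the table comes first because it is nearest, and the crate goes ahead of the statue because the two are about the same distance away and the crate's shader is cheaper.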

 

On a similar topic of not calculating that which is unseen, I then spoke of culling, and the kind of large-scale culling which is possible on the CPU side to reduce vertex calculations. This is achieved by reducing an object down to a bounding box, defined by eight points that can then be transformed to become a bounding rectangle on the screen. This rectangle can then be very quickly checked to see if it is on or off screen, or even if it's inside the bounds of a window or doorway through which we are seeing its scene. For most scenes this is the only kind of high-level, large-scale occlusion culling that makes sense, because the next step would be to consider whether objects in a scene occlude each other. For that you need to think about the internal bounding volume, which is guaranteed to occlude everything behind it regardless of its orientation and which must be generated to fit the geometry. That is far more complicated than describing the bounding box.
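
The bounding box test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the talk: the matrix is row-major, it combines model, view and projection, and the screen or doorway is expressed as a rectangle in normalised device coordinates. All function names are made up for the example.

```python
def transform(matrix, point):
    """Multiply a row-major 4x4 matrix by the point (x, y, z, 1)."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def box_corners(lo, hi):
    """The eight corners of an axis-aligned box given its min and max points."""
    return [(x, y, z)
            for x in (lo[0], hi[0])
            for y in (lo[1], hi[1])
            for z in (lo[2], hi[2])]


FULL_SCREEN = (-1.0, -1.0, 1.0, 1.0)  # min_x, min_y, max_x, max_y in NDC


def potentially_visible(mvp, lo, hi, window=FULL_SCREEN):
    """Project the box to a screen rectangle and test it against a window."""
    clip = [transform(mvp, c) for c in box_corners(lo, hi)]
    behind = sum(1 for c in clip if c[3] <= 0.0)
    if behind == len(clip):
        return False  # the whole box is behind the camera
    if behind > 0:
        return True   # straddles the camera plane: keep it, to be safe
    xs = [c[0] / c[3] for c in clip]
    ys = [c[1] / c[3] for c in clip]
    return (min(xs) <= window[2] and max(xs) >= window[0] and
            min(ys) <= window[3] and max(ys) >= window[1])


# Toy projection for the example: camera at the origin looking down -z.
TOY_PROJECTION = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 0]]

in_view = potentially_visible(TOY_PROJECTION, (-0.5, -0.5, -3), (0.5, 0.5, -2))
off_side = potentially_visible(TOY_PROJECTION, (10, -0.5, -3), (11, 0.5, -2))
print(in_view, off_side)
```

Passing a smaller rectangle as window gives the doorway case: an object whose screen rectangle misses the doorway can be skipped even though it is technically on screen.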

 

Culling things in the distance is considered somewhat old hat in modern applications. We consider the sudden appearance of distant objects, or their emergence from a dense, opaque fog, to be an undesirably retro aesthetic. In their place we have two new techniques. The fog is replaced by clever environmental design, limiting view distance by means of occluding walls and available eye-lines. For large, open spaces, having objects pop into reality has been replaced by dynamic levels of detail. The funny thing about levels of detail is that they don't have to be dynamic to be relevant. Levels of detail go beyond reducing unnecessary vertex processing, because there's also a small amount of overhead to process each triangle. This triangle setup cost is very small, so ordinarily you never notice it, as it happens while the previous triangle is being turned into fragments. If the fragment coverage of your triangles is too low, however, you can actually notice this cost bumping up your render times. Before you even worry about implementing dynamic levels of detail, you ought to ask yourself if you've picked the right level of detail to begin with. If the average triangle coverage (which can be calculated in Streamline) is in single digits, you're probably doing something wrong. We actually see this all the time in projects where the artist has designed beautiful tree models and then they're lined up in the distance where none of that detail can be seen. If they're approachable then maybe a high detail model would be useful, switched in based on proximity, but if you just want a bunch of things in the background you may be better off with batched billboard sprites.
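
To make the arithmetic concrete, here is a small Python sketch of the coverage check and of switching models by proximity. The function names, the inputs and the switch distances are all assumptions for illustration; in practice the fragment and triangle counts would come from your profiler's counters.

```python
def average_triangle_coverage(fragments_shaded, triangles_drawn):
    """Average number of fragments produced per triangle in a frame."""
    if triangles_drawn <= 0:
        raise ValueError("need at least one triangle")
    return fragments_shaded / triangles_drawn


def coverage_is_suspicious(fragments_shaded, triangles_drawn):
    """True when the average coverage is in single digits."""
    return average_triangle_coverage(fragments_shaded, triangles_drawn) < 10.0


def pick_lod(distance, switch_distances):
    """Return 0 for the most detailed model, higher numbers for coarser ones.

    switch_distances is an ascending list of hypothetical distances at
    which the next, coarser model takes over.
    """
    for lod, limit in enumerate(switch_distances):
        if distance < limit:
            return lod
    return len(switch_distances)


# A 1920x1080 frame with no overdraw, drawn with half a million triangles.
print(average_triangle_coverage(1920 * 1080, 500_000))
print(pick_lod(25.0, [10.0, 40.0]))
```

With those numbers each triangle covers only about four fragments on average, which is the kind of figure that suggests the models are too detailed for where they sit in the scene.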

 

Having already talked about texture compression many times in the past, there's a lot of material on the specifics available from my previous presentations. This time, to take it in a different direction, I talked about how uncompressed textures have their pixels re-ordered in memory to give them better caching behaviour. This is similar to that seen in compressed textures but without the bandwidth saved when the cache misses and a block needs to be pulled from memory. This explains the block layout I've advocated many times in the past, and I went on to talk more about the other rules that make texture compression a special case among image compression algorithms: mainly the ability to immediately look up a block in the image (random access), decode it without any data from surrounding blocks (deterministic), and do so with no need for a dictionary of symbols (immediate).
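
The random access property comes down to simple arithmetic: because every block has the same size, the block holding any texel can be found without reading anything else. The Python sketch below assumes blocks stored row by row and uses ETC1's 4x4 texel, 8 byte blocks as defaults; the function name is made up for the example, and real hardware may arrange blocks differently in memory.

```python
def block_offset(x, y, width, block_w=4, block_h=4, bytes_per_block=8):
    """Byte offset of the compressed block that contains texel (x, y).

    Assumes blocks are stored row by row. The defaults match ETC1,
    which packs each 4x4 block of texels into 8 bytes.
    """
    blocks_per_row = -(-width // block_w)  # ceiling division
    block_col = x // block_w
    block_row = y // block_h
    return (block_row * blocks_per_row + block_col) * bytes_per_block


# Texel (5, 9) of a 256 texel wide texture sits in block column 1, row 2.
offset = block_offset(5, 9, width=256)
print(offset)
```

Everything needed to decode that texel lives in the 8 bytes at that offset, which is what lets a GPU fetch and decode blocks on demand.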

 

One topic I was surprised to realize I'd never mentioned before was how texture compression works with mipmapping. Mipmapping is the technique of storing images at half, quarter, eighth (and so on) resolutions to reduce interference patterns and speed up texture loads. It's like automatic level of detail selection for textures. What people might not realise, however, is that whereas uncompressed texture mipmaps can be generated at load time with a single line of OpenGL ES code, mipmaps for compressed textures have to be generated at build time, then themselves compressed and stored within the application's assets. It's a small price to pay for all that tasty, tasty efficiency, however.
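
For uncompressed textures the single line in question is glGenerateMipmap(GL_TEXTURE_2D). For compressed textures, each level has to be produced offline and uploaded separately, for example with one glCompressedTexImage2D call per level. The Python sketch below only works out what that chain looks like and how much storage it needs; the function names are made up, and the 4x4 texel, 8 byte block defaults assume ETC1.

```python
def mip_levels(width, height):
    """List the (width, height) of every mip level down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def compressed_level_bytes(width, height, block_w=4, block_h=4, bytes_per_block=8):
    """Storage for one level; partial blocks still take a whole block."""
    blocks_x = -(-width // block_w)  # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * bytes_per_block


levels = mip_levels(8, 8)
total = sum(compressed_level_bytes(w, h) for w, h in levels)
print(levels, total)
```

Note that the levels smaller than one block still cost a full block each, so the tail of the chain is slightly less efficient than the top of it.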

 

Finally I brought up antialiasing, because I figured room for improvement needn't necessarily be in terms of overhead reduction. Though I failed to bring it together due to time constraints on the day, the real message I wanted to impart in this talk was that optimization has become a dirty word in many ways. To suggest an application needs optimizing implies you've used everything the GPU's got, and that to make it run at a decent frame rate you'll have to make it look worse. That's not what optimization is. The metaphor I used was that if you don't optimize your application, it's like taking a single bite of an apple, throwing the rest away and complaining that it didn't have enough flesh. Well-optimized code can munch away at that apple, getting the absolute most out of it. Done right, optimization doesn't make your application look worse; it gives you headroom to make it look better. Batching and culling let you put more stuff in your application, and with levels of detail and billboard impostors you can even have dense arrays of objects in the background. Compressed textures let you have more textures at a higher resolution, and full-screen antialiasing is almost zero cost on Mali-based systems.

 

That's the real message here.

 

That's not where the presentation ends, though. Part of my job involves taking apart people's graphics at the API level and listing all the things they've done wrong, or at least things they could do better. When they read the laundry list of sins, however, they very much interpret it as me telling them what they did wrong. So imagine my elation when given a chance to redress the balance and pick apart one of our own demos, in public, and discuss our own mistakes and faults. We're human too, you know.

 

Though difficult to describe in blog format, the screenshot slides at the end of the presentation show me stepping through the demo render process, explaining times when bad decisions were made regarding render order, when batchable objects were drawn individually, how practically nothing was culled, and even a few glitches, such as the sky box covering the particle effects and the UI being rendered on screen even when its opacity is zero. It's almost a shame that after identifying them we had to fix all these things; it would have been nice for the audience to know they were there.

 

If you're interested in using MGD and DS5 to profile your applications, there's a two-part, in-depth case study by Lorenzo Dal Col with far more detail than I could fit in my presentation:

Mali GPU Tools: A Case Study, Part 1 — Profiling Epic Citadel

Mali GPU Tools: A Case Study, Part 2 — Frame Analysis with Mali Graphics Debugger

