
ARM Mali Brings Scalability & Power Saving to the MediaTek Helio P20


Everyone wants the best of both worlds. We don’t want to pay a fortune for a phone, but we do still expect it to do everything we need it to. Whether that’s taking crystal clear pictures to share on social media or supporting high-end 3D gaming, these tasks place additional strain on power and efficiency. The demands we place on mainstream mobile devices these days mean chip manufacturers have to make some very smart decisions with their next generation offerings.

 

ARM partner MediaTek recently launched their latest offering for the mainstream mobile segment: the Helio P20 mobile processor. Building on the features of the P10, the P20 delivers high performance and a premium user experience whilst maintaining superior power consumption and cost efficiency.


The choice to incorporate an ARM Mali-T880 GPU supports the goals of providing a premium performing mobile device without the price tag in a number of ways. The critical characteristic of a high-end mobile GPU is the ability to deliver the extra computational performance needed without draining the battery or exceeding the thermal budget of the device. The Mali-T880 is cleverly designed to deliver the best performance with the lowest energy consumption. It incorporates numerous microarchitecture optimizations as well as bandwidth reduction technologies such as ARM Frame Buffer Compression (AFBC), Smart Composition and Transaction Elimination.

 

The scalability of the Mali family of GPUs is one of its key features, and the ability to choose single or multiple cores is one of the reasons MediaTek selected it for both the P20 and the X20, which is designed for high-performing systems. This flexibility reduces the engineering effort required to incorporate the GPU into a chipset, thereby significantly reducing the time it takes to get a product to market. MediaTek took full advantage of these benefits by including the Mali-T880 in the Helio P20 processor and continue to select ARM Mali as their GPU of choice for mass market and premium mobile.


GDC 2016: What Not to Miss!



GDC is just around the corner and I’m sure you’re getting as excited as I am! There’ll be heaps to see and do and I don’t want you to miss out on any of the cool stuff we’ll be showing so I thought I’d pop down some of the highlights to get things started.

 

As in previous years, we’ll have some of our top ecosystem partners on stand with us showing how ARM®’s unbeatable leadership in the mobile sector has helped them achieve new heights of innovation and advancement in both hardware and software. Nibiru’s ROM, SDK and VR headset will provide all the tools you need to optimize for Mali™ and develop the VR games of your dreams. Tencent will be showing their new micro-console and you can even port your games to it using their brand new SDK with help from experts in the Tencent & ARM joint lab. PlayCanvas’s Mali-optimized WebGL game engine enables you to develop for any platform, while the Cocos2d-x game engine has a brand new toolchain for you to take advantage of. The best part is, both are free and open source!

 

Epic Games, the creators of the Protostar demo, and Samsung will also be on-stand showing Vulkan on Mobile with Unreal Engine 4, developed and optimized for Mali. Find out how the Unreal Editor can enable you to create your first Vulkan game.

 

As if that wasn’t enough, we also have lots of the latest demos for you to try. Snail Games, the creators of the new mobile kung-fu games Age of Wushu Dynasty and Taichi Panda bring you the latest updates running on ARM Mali to kick things up a gear. Perfect World will be pushing the boundaries of mobile graphics with their latest 3D MMO RPG games and our in-house experts will talk you through everything you need to create your own high quality mobile content.

 

Our VR circle on ARM booth 1624 will feature the latest up-and-coming industry apps and you’ll have a chance to play with next-level experiences and applications. Mali, the #1 shipping GPU in the world, will be showing its full potential powering such awesome tech as the Samsung Gear VR headset, on which you can explore the Ice Cave in full immersion! Also on the Samsung Gear VR, Umbra’s demo combines automatic occlusion culling, visibility-based level of detail and 3D content streaming to transport you into a massive 3D environment. Consisting of 30 GB of original source data that has been automatically optimized, this is one not to miss!

 

ARM’s Enlighten global illumination solution will be on display with the amazing Lab demo, which showcases the new technology available to customers from Enlighten 3.03. It improves existing Enlighten reflections to raise visual quality without sacrificing performance or ease of integration, and provides advanced screen-space independent dynamic reflections that are perfect for enhanced gameplay experiences and can add a real sense of immersion to VR applications.

 

Other things to watch out for throughout the event include exciting announcements from MediaTek, DeePoon, ARM’s Enlighten team and more, so be sure to come say hi and make sure you don’t miss out!


ARM Mali Graphics week - Thunderclap live: Win a Samsung Galaxy S7


We're running ARM Mali Graphics week again! The event will run from April 18th to 22nd 2016.

 

This year we want to run an even bigger Graphics week to support the developer community and the users of our ARM Mali GPUs. Graphics week will follow our GDC presence, where there will be a whole host of announcements and news from us and our partners. Graphics week will feature blogs, videos and other content to help developers with their projects and make use of the latest developments in the mobile industry.

 

Join our Thunderclap, coming soon, to be in with a chance to win an ARM Mali powered Samsung Galaxy S7! We need 500 supporters, so tell your friends too!

 

Please stay tuned to this blog which will be updated with all the content when the event starts.

 

Win a Samsung Galaxy S7 - Thunderclap: ARM Mali Graphics Week

 


Combined Reflections: Stereo Reflections in VR


Introduction

 

Reflections are an important element in how we visualize the real world and developers therefore use them extensively in their games. In a previous blog I discussed how to render reflections based on local cubemaps and the advantages of this technique for mobile games where resources must be carefully balanced at runtime. In this blog I want to return to this topic and show how to combine different types of reflections. Finally, I discuss the importance of rendering stereo reflections in VR and provide an implementation of stereo reflections in Unity.

 

Reflections based on Local Cubemaps

 

Although the technique of rendering reflections based on local cubemaps has been available since 2004, it has only been incorporated into the major game engines in recent years: see, for example, the “Reflection Probe” in Unity 5 and the “Sphere Reflection Capture” and “Box Reflection Capture” in Unreal 4. Since its introduction in the major game engines the technique has been widely adopted and has become very popular. Nevertheless, it has a limitation inherited from the static nature of the cubemap: if something changes in the scene after the cubemap has been baked, the cubemap is no longer valid. This is the case, for example, with dynamic objects; the reflections based on the cubemap won’t show the new state of the scene.

 

To overcome this limitation we could update the cubemap at runtime, and it is worth noting that we don’t have to update the whole cubemap in a single frame: a more performance-friendly approach is to update it partially over several frames. Even so, in most cases it is difficult to afford to update the cubemap at runtime for performance reasons, especially on mobile devices.
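
As a side note, Unity’s own realtime reflection probes already expose this kind of partial update through time slicing. The sketch below is illustrative only; it configures a standard ReflectionProbe component rather than the custom local cubemaps discussed in this blog:

using UnityEngine;
using UnityEngine.Rendering;

// Illustrative sketch: spread a realtime reflection probe update over several
// frames instead of re-rendering the whole cubemap at once.
public class TimeSlicedProbe : MonoBehaviour
{
        void Start()
        {
                ReflectionProbe probe = GetComponent<ReflectionProbe>();
                probe.mode = ReflectionProbeMode.Realtime;
                probe.refreshMode = ReflectionProbeRefreshMode.ViaScripting;
                probe.timeSlicingMode = ReflectionProbeTimeSlicingMode.IndividualFaces;
                probe.RenderProbe();   // the update is then spread across the following frames
        }
}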

 

A more practical approach is to use the local cubemap technique to render reflections from the static geometry of the scene and use other well-known techniques to render reflections from dynamic objects at runtime.


Figure 1 Combining reflections from different types of geometry.

 

Combining different types of reflections

 

In the Ice Cave demo we combine reflections based on static cubemaps with planar reflections rendered at runtime. On the central platform, the reflections from the walls of the cave (static geometry) are rendered using a static local cubemap, whereas reflections from dynamic objects (the phoenix, the butterfly, etc.) are rendered every frame using a mirrored camera. Both types of reflections are finally combined in a single shader.


Figure 2 Combined reflections in the Ice Cave demo.

 

Further to these, a third type of reflection is combined in the same shader: the reflections coming from the sky, which is visible through the big hole at the top of the cave. When rendering reflections from a distant environment we don’t need to apply the local correction to the reflection vector when fetching the texture from the cubemap. We can use the unmodified reflection vector, as this can be considered the special case where the scene bounding box is so large that the locally corrected vector equals the original reflection vector.

 

Rendering planar reflections

 

In the previous blog I showed how to render reflections using local cubemaps in Unity. Here I will explain how to render planar reflections at runtime in Unity using the mirrored camera technique and, finally, how to combine both types of reflections in the shader. Although the code snippets provided in this blog are written for Unity, they can be used in any other game engine with the necessary changes.

 


Figure 3 Reflection camera setup.

 

When rendering planar reflections at runtime in the Ice Cave demo, relative to a reflective platform in the XZ plane, we render the world upside down, which flips the winding of the geometry (see Fig. 3). For this reason we need to invert the winding of the geometry when rendering the reflections and restore the original winding once the reflections have been rendered.

 

To render planar reflections using a mirrored camera we must follow the steps described in Fig. 4.


Figure 4. Steps for rendering runtime planar reflections with a mirrored camera.

 

The functions below set up the reflection camera reflCam in the script attached to the reflective object. clipPlaneOffset is an offset you can expose as a public variable to control how the reflection fits the original object.

 

public GameObject reflCam;
public float clipPlaneOffset;

private void SetUpReflectionCamera()
{
        // Find the reflection plane: position and normal in world space
        Vector3 pos = gameObject.transform.position;
        // Reflection plane normal in the direction of the Y axis
        Vector3 normal = Vector3.up;
        float d = -Vector3.Dot(normal, pos) - clipPlaneOffset;
        Vector4 reflPlane = new Vector4(normal.x, normal.y, normal.z, d);

        Matrix4x4 reflection = Matrix4x4.zero;
        CalculateReflectionMatrix(ref reflection, reflPlane);

        // Update the reflection camera considering the main camera position and orientation
        // Set the view matrix
        Matrix4x4 m = Camera.main.worldToCameraMatrix * reflection;
        reflCam.GetComponent<Camera>().worldToCameraMatrix = m;

        // Set the projection matrix
        reflCam.GetComponent<Camera>().projectionMatrix = Camera.main.projectionMatrix;
}


private static void CalculateReflectionMatrix(ref Matrix4x4 reflectionMat, Vector4 plane)
{
        reflectionMat.m00 = 1.0f - 2.0f * plane[0] * plane[0];
        reflectionMat.m01 = -2.0f * plane[0] * plane[1];
        reflectionMat.m02 = -2.0f * plane[0] * plane[2];
        reflectionMat.m03 = -2.0f * plane[3] * plane[0];

        reflectionMat.m10 = -2.0f * plane[1] * plane[0];
        reflectionMat.m11 = 1.0f - 2.0f * plane[1] * plane[1];
        reflectionMat.m12 = -2.0f * plane[1] * plane[2];
        reflectionMat.m13 = -2.0f * plane[3] * plane[1];

        reflectionMat.m20 = -2.0f * plane[2] * plane[0];
        reflectionMat.m21 = -2.0f * plane[2] * plane[1];
        reflectionMat.m22 = 1.0f - 2.0f * plane[2] * plane[2];
        reflectionMat.m23 = -2.0f * plane[3] * plane[2];

        reflectionMat.m30 = 0.0f;
        reflectionMat.m31 = 0.0f;
        reflectionMat.m32 = 0.0f;
        reflectionMat.m33 = 1.0f;
}

 

The reflection matrix is just a transformation matrix that applies a reflection relative to a reflective plane given its normal and position.


Figure 5. The reflection transformation matrix.
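
For reference (reconstructed from CalculateReflectionMatrix above rather than from the original figure), the reflection transformation for a plane ax + by + cz + d = 0 with unit normal (a, b, c) can be written as:

\[
R =
\begin{pmatrix}
1 - 2a^2 & -2ab & -2ac & -2ad \\
-2ab & 1 - 2b^2 & -2bc & -2bd \\
-2ac & -2bc & 1 - 2c^2 & -2cd \\
0 & 0 & 0 & 1
\end{pmatrix}
\]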

When creating the reflection camera in Unity we must indicate the reflection texture it renders to. We must set the resolution of this texture according to the size of the reflective surface: if the resolution is not high enough, the reflections will appear pixelated. You can start with a value of 256x256, for example, and increase it if necessary. As we will handle the rendering of the reflection camera manually, it must be disabled.
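
As a minimal sketch of that setup (256x256 is just the example value mentioned above), the target texture can be created and assigned from the same script that owns reflCam:

// Sketch: create the reflection render target and disable the reflection camera,
// since its rendering is triggered manually in OnWillRenderObject below.
void Start()
{
        RenderTexture reflTexture = new RenderTexture(256, 256, 16);   // width, height, depth bits
        reflTexture.Create();

        Camera cam = reflCam.GetComponent<Camera>();
        cam.targetTexture = reflTexture;
        cam.enabled = false;
}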

 

Finally we can use the OnWillRenderObject callback function to perform the rendering of the reflection camera:

 

void OnWillRenderObject()
{
        SetUpReflectionCamera();
        GL.invertCulling = true;
        reflCam.GetComponent<Camera>().Render();
        GL.invertCulling = false;
        gameObject.GetComponent<Renderer>().material.SetTexture("_DynReflTex",
                reflCam.GetComponent<Camera>().targetTexture);
}

 

Combining planar reflections and reflections based on local cubemaps

 

In the previous blog we provided the shader implementation for rendering reflections based on local cubemaps. Let’s add to it the runtime planar reflections rendered with the mirrored camera.

 

We need to pass to the shader a new uniform for the runtime reflection texture _DynReflTex:

 

uniform sampler2D _DynReflTex;

 

In the vertex shader we add the calculation of the vertex coordinates in screen space as we need to pass them to the fragment shader to apply the reflection texture:

 

output.vertexInScreenCoords = ComputeScreenPos(output.pos);

 

ComputeScreenPos is a helper function defined in UnityCG.cginc.

 

Accordingly in the vertexOutput structure we add a new line:

 

float4 vertexInScreenCoords : TEXCOORD4;

 

In the fragment shader, after fetching the cubemap texel with the locally corrected reflection vector, we fetch the texel from the 2D texture (the planar reflections), which is updated every frame:

 

float4 dynReflTexCoord = UNITY_PROJ_COORD(input.vertexInScreenCoords);
float4 dynReflColor = tex2Dproj(_DynReflTex, dynReflTexCoord);

 

Before we make any use of the texel we need to revert the blending with the camera background color, i.e. divide out the alpha:

 

dynReflColor.rgb /= dynReflColor.a;

 

Then we combine the local cubemap reflections with the planar reflections as shown below using the lerp function:

 

// -------------- Combine static and dynamic reflections -----------------------------

float4 reflCombiColor;

reflCombiColor.rgb = lerp( reflColor.rgb, dynReflColor.rgb, dynReflColor.a );

reflCombiColor.a = 1.0;

 

The final fragment color is then given by the line below:

 

return _AmbientColor + lerp(reflCombiColor, texColor, _StaticReflAmount );

 

Reflections in VR – Why use stereo reflections in VR?

 

Reflections are one of the most common effects in games, so it is important to render optimized reflections in mobile VR. The reflection technique based on local cubemaps helps us implement not only efficient but also high-quality reflections. As we are fetching the texture from the same cubemap every frame, we do not get the pixel instability, or pixel shimmering, that can occur when rendering runtime reflections to a different texture each frame.

 

Nevertheless, it is not always possible to use the local cubemap technique to render reflections. When there are dynamic objects in the scene we need to combine this technique with planar reflections (2D texture render target), which are updated every frame.

 

In VR, the fact that we render left and right eyes individually leads to the question: is it ok to optimize resources and use the same reflection for both eyes?

 

Well, from our experience it is important to render stereo reflections where reflections are a noticeable effect. The point is that if we do not render left/right reflections in VR the user will easily spot that something is wrong in our virtual world. It will break the sensation of full immersion we want the user to experience in VR and this is something we need to avoid at all costs.

 

Using the same reflection picture for both eyes is a temptation we need to resist if we care about the quality of the VR experience we are providing to the user. If we use the same reflection texture for both eyes, the reflections don't appear to have any depth. When porting the Ice Cave demo to Samsung Gear VR using Unity’s native VR implementation, we decided to use a different texture for each eye for all types of reflections on the central platform (see Fig. 2) to improve the quality of the VR user experience.

 

Stereo planar reflections in Unity VR

 

Below I describe step by step how to implement stereo planar reflections in Unity VR. You must have checked the option “Virtual Reality Supported” in Build Settings -> Player Settings -> Other Settings.

 

Let’s first look at the more complex case of rendering stereo reflections for dynamic objects, i.e. when rendering runtime reflections to a texture. In this case we need to render two slightly different textures, one for each eye, to achieve the effect of depth in the planar reflections.

 

First we need to create two new cameras, targeting the left and right eyes respectively, and disable them, as we will render them manually. We then need to create a target texture the cameras will render to. The next step is to attach the below script to each camera:

 

void OnPreRender()
{
        SetUpReflectionCamera();
        // Invert winding
        GL.invertCulling = true;
}

void OnPostRender()
{
        // Restore winding
        GL.invertCulling = false;
}

 

This script uses the method SetUpReflectionCamera already explained in the previous section, with a small modification. After calculating the new view matrix by applying the reflection transformation to the main camera's worldToCameraMatrix, we also need to apply the eye shift along the local X axis of the camera. After the line:

 

Matrix4x4 m = Camera.main.worldToCameraMatrix * reflection;

 

We add a new line for the left camera:

 

m[12] += stereoSeparation;

 

For the right camera we add:

 

m[12] -= stereoSeparation;

 

The value of the eye stereo separation is 0.011f.
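
Putting those pieces together, the modified view-matrix setup inside each eye's SetUpReflectionCamera could look like the sketch below; eyeSign is an illustrative field (+1 on the left reflection camera, -1 on the right) rather than something taken from the original scripts:

// Sketch: view matrix setup for one eye's reflection camera.
// 'reflection' is the matrix built by CalculateReflectionMatrix as before.
public float eyeSign = 1.0f;                   // +1 for the left eye camera, -1 for the right
const float stereoSeparation = 0.011f;

void SetStereoViewMatrix(Matrix4x4 reflection)
{
        Matrix4x4 m = Camera.main.worldToCameraMatrix * reflection;
        m[12] += eyeSign * stereoSeparation;   // shift along the camera's local X axis
        GetComponent<Camera>().worldToCameraMatrix = m;
        GetComponent<Camera>().projectionMatrix = Camera.main.projectionMatrix;
}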

 

The next step is to attach the script below to the main camera:

 

public class RenderStereoReflections : MonoBehaviour
{
        public GameObject reflectiveObj;
        public GameObject leftReflCamera;
        public GameObject rightReflCamera;
        int eyeIndex = 0;

        void OnPreRender()
        {
                if (eyeIndex == 0)
                {
                        // Render the left camera
                        leftReflCamera.GetComponent<Camera>().Render();
                        reflectiveObj.GetComponent<Renderer>().material.SetTexture(
                                "_DynReflTex", leftReflCamera.GetComponent<Camera>().targetTexture);
                }
                else
                {
                        // Render the right camera
                        rightReflCamera.GetComponent<Camera>().Render();
                        reflectiveObj.GetComponent<Renderer>().material.SetTexture(
                                "_DynReflTex", rightReflCamera.GetComponent<Camera>().targetTexture);
                }
                eyeIndex = 1 - eyeIndex;
        }
}

 

This script handles the rendering of the left and right reflection cameras in the OnPreRender callback of the main camera. This method is called twice per frame, first for the left eye of the main camera and then for the right eye. The eyeIndex variable assigns the rendering order to each reflection camera. It is assumed that the first OnPreRender call corresponds to the left main camera (eyeIndex = 0); this has been checked, and it is indeed the order in which Unity calls OnPreRender.

 

During the implementation of stereo rendering it was necessary to check that the update of the planar reflection texture in the shader was well synchronized with the left and right main cameras. For this, purely for debugging, I passed the eyeIndex as a uniform to the shader and used two differently colored textures in place of the planar reflection texture (a minimal version of this setup is sketched after Figure 6). The image below shows a screenshot taken from the demo running on the device. Two well-defined left and right textures are visible on the platform, meaning that when the shader renders for the main left camera the correct left reflection texture is used, and likewise for the main right camera.


Figure 6. Left/Right main camera synchronization with runtime reflection texture.
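
A minimal version of that debugging setup might look like the sketch below; the _EyeIndex uniform and the two debug textures are illustrative names, not part of the original shader:

// Debug-only sketch: push the eye index and a per-eye colour texture to the
// reflective material so left/right synchronization can be verified on device.
public Texture leftDebugTex;
public Texture rightDebugTex;

void ApplyDebugReflectionTexture(int eyeIndex)
{
        Material mat = reflectiveObj.GetComponent<Renderer>().material;
        mat.SetInt("_EyeIndex", eyeIndex);
        mat.SetTexture("_DynReflTex", eyeIndex == 0 ? leftDebugTex : rightDebugTex);
}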

 

Once the synchronization had been verified there was no need to pass the eyeIndex to the shader; it is only used in the script to manage the rendering order of the left/right reflection cameras relative to the main left/right cameras. Additionally, with the synchronization working correctly, a single reflection texture is enough for the runtime reflections, as it is used alternately by the left/right reflection cameras.

 

As demonstrated, implementing stereo reflections does not add any overhead to the shader. The scripts described above are very simple, and the only overhead compared with non-stereo reflection rendering is one extra runtime planar reflection render. This can be minimized by performing it only for the objects that need it: create a new layer, add to it only the objects that need runtime planar reflections, and use this layer as a culling mask for the reflection cameras, as sketched below.
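
The sketch below (with "PlanarReflective" as a hypothetical layer name) restricts the reflection cameras to that layer via their culling masks:

// Sketch: render only the dedicated layer into the reflection texture so the
// extra planar reflection pass covers just the objects that need it.
int reflLayer = LayerMask.NameToLayer("PlanarReflective");
leftReflCamera.GetComponent<Camera>().cullingMask = 1 << reflLayer;
rightReflCamera.GetComponent<Camera>().cullingMask = 1 << reflLayer;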

 

The script attached to the main camera can be further optimized to ensure that it runs only when the reflective object needs to be rendered. For that we can use the OnBecameVisible callback in a script attached to the reflective object:

 

public class IsReflectiveObjectVisible : MonoBehaviour
{
        public bool reflObjIsVisible;

        void Start()
        {
                reflObjIsVisible = false;
        }

        void OnBecameVisible()
        {
                reflObjIsVisible = true;
        }

        void OnBecameInvisible()
        {
                reflObjIsVisible = false;
        }
}

 

Then we can put all the code in the OnPreRender method under the condition:

 

void OnPreRender()
{
        if (reflectiveObj.GetComponent<IsReflectiveObjectVisible>().reflObjIsVisible)
        {
        …
        }
}

 

Finally I will address the case of stereo reflections from static objects, i.e. when the local cubemap technique is used. In this case we need to use two different reflection vectors when fetching the texel from the cubemap, one for each of the left/right main cameras.

 

Unity provides a built-in value for accessing the camera position in world coordinates in the shader: _WorldSpaceCameraPos. However, when working in VR we do not have access to the positions of the left and right main cameras in the shader. We need to calculate those values ourselves and pass them to the shader as a single uniform.

 

The first step is to declare a new uniform in our shader:

 

uniform float3 _StereoCamPosWorld;

 

The best place to calculate the left/right main camera position is in the script we have attached to the main camera. For the eyeIndex = 0 case we add the code lines below:

 

Matrix4x4 mWorldToCamera = gameObject.GetComponent<Camera>().worldToCameraMatrix;
mWorldToCamera[12] += stereoSeparation;
Matrix4x4 mCameraToWorld = mWorldToCamera.inverse;
Vector3 mainStereoCamPos = new Vector3(mCameraToWorld[12], mCameraToWorld[13], mCameraToWorld[14]);
reflectiveObj.GetComponent<Renderer>().material.SetVector("_StereoCamPosWorld",
        new Vector3(mainStereoCamPos.x, mainStereoCamPos.y, mainStereoCamPos.z));

 

The new lines get the worldToCameraMatrix from the main non-stereo camera and apply the eye shift in the local X axis. The next step is to find the left camera position from the inverse matrix. This value is then used to update the uniform _StereoCamPosWorld in the shader.

 

For the right main camera (eyeIndex = 1) the lines are the same except the one related to the eye shift:

 

mWorldToCamera[12] -= stereoSeparation;

 

Finally in the vertex shader section of the previous blog we replace the line:

 

output.viewDirInWorld = vertexWorld.xyz - _WorldSpaceCameraPos;

 

with this line:

 

output.viewDirInWorld = vertexWorld.xyz - _StereoCamPosWorld;

 

In this way we get two slightly different view vectors and thus two different reflection vectors. After the local correction is applied in the fragment shader, the texel retrieved from the static cubemap will be slightly different for each eye.

 

Once stereo reflections are implemented, running the application in editor mode shows the reflection texture flickering, as it constantly alternates between the left and right eyes. On the device, however, we see a correct stereo reflection that shows depth and contributes to increasing the quality of the VR user experience.

 

 


Figure 7. Stereo reflections on the central platform in the Ice Cave demo.

 

Conclusions

 

The use of the local cubemap technique for reflections allows us to render efficient, high-quality reflections from static objects in mobile games. This method can be combined with other runtime rendering techniques to render reflections from dynamic objects.

 

In mobile VR it is important to render stereo reflections to ensure we are building our virtual world correctly and contributing to the sensation of full immersion the user is supposed to enjoy in VR.

 

In mobile VR, combined reflections must be handled carefully to produce stereo reflections, and in this blog we have shown that it is possible to implement combined stereo reflections in Unity with minimal impact on performance.

Our Top Five Talk Highlights @ GDC’16!


This year, GDC is 30 years old… and it has become the annual pilgrimage of any serious game developer: from the latest hardware releases to new, optimized APIs, developer tools, middleware and game engines.

 

At ARM we have a myriad of free resources and tools for game developers, from game development tutorials and developer guides to SDKs, sample code and developer tools. Our engineering team works flat out in the run-up to GDC’16 to create new developer resources as well as updating existing ones so that game developers are all set to get the most out of ARM Mali and ARM Cortex processors. They are dedicated to ensuring you can fully utilize the computational power available and achieve console quality graphics on mobile platforms.

 

TALK #1: Vulkan API on Mobile with Unreal Engine 4 Case Study

 

The new Vulkan graphics API was released just a few weeks ago. Up to now, developers have had the OpenGL graphics API for desktop environments and OpenGL ES for mobile platforms, the latter being a subset of the OpenGL feature set adapted to mobile architectures. However, with the latest technology advances on mobile, the graphics APIs were due for an upgrade, and now was a good time, in terms of hardware device capabilities, to merge to a single graphics API. Great news for game developers!

 

The talk is going to be a deep dive into how the Vulkan API works, the shaders and pipelines, synchronisation, memory management, command buffers, etc. There are three key Vulkan feature highlights for mobile platforms covered in the talk: multithreading, multi-pass render passes and memory management.

 

Multithreading is key for mobile. Mass-market mobile devices already have four to eight cores, and previous graphics APIs did not take full advantage of them, as using multiple threads meant a lot of context switches, which took a toll on performance. Vulkan brings multithreading to the next level, giving developers flexibility and control over resources and over when and how to execute threads.


The multi-pass render pass feature allows the use of on-tile cache memory on mobile. It also enables the driver to perform various optimizations when each pixel rendered in a subpass accesses the result of the previous subpass at the same pixel location. It is similar to the concept of Pixel Local Storage, introduced by ARM and available in OpenGL ES as an EXT extension.

 

Memory management behaviour depends on the GPU and pipeline architecture: immediate-mode rendering, used mainly on desktop GPUs, is very different from the tile-based deferred rendering used in mobile GPUs. In the Vulkan API, memory management is much more explicit than in previous APIs. Developers can allocate and deallocate memory themselves in Vulkan, whereas in OpenGL memory management is hidden from the programmer.

 

Finally, the talk illustrates the collaboration with Epic Games. Epic Games released Vulkan API support in Unreal Engine 4 at MWC, and Epic Games and ARM set out to showcase impressive real-time graphics with UE4 Vulkan on the Samsung Galaxy S7. The result is the awesome ProtoStar demo, and this talk covers the challenges faced and the lessons learnt.

 

    

TALK #2: Making light work of dynamic large worlds

 

You may have seen the announcement earlier today: the Enlighten team has just released a new feature set specially designed to solve the challenge of bringing dynamic global illumination to large world games. By developing advanced level of detail mechanisms for terrain, non-terrain lightmaps and probes, it is now possible for game studios to add large scale lighting effects, such as time of day, to complex worlds with huge draw distances.

Image from Seastack Bay

 

This is great news: map sizes in games have been getting bigger and bigger in recent years and open worlds are hugely popular. The current generation of gaming platforms introduced beautiful, real-time rendering of environments where the player can roam freely through forests, canyons and vast, open terrain. Running such worlds with dynamic lighting effects at acceptable framerates required new innovations.

 

Seastack Bay is the demonstration designed to showcase Enlighten’s new large world feature set. In this talk its lighting artist, Ivan Pedersen, presents the challenges he and the Enlighten team were faced with when developing a brand new global illumination technology for open worlds. He will be joined on stage by Dominic Matthews of Ninja Theory, a local game studio that collaborated with the Enlighten team to create Seastack Bay. They are the studio behind the upcoming title, Hellblade, and will discuss in this session how Enlighten is helping the project fulfil its ambitions.

 

TALK #3: Achieving High Quality Mobile VR Games

 

This talk is a joint collaboration between Unity, nDreams and ARM. Unity has been the leading game engine to integrate native VR support for the Samsung GearVR, and nDreams is the leading VR game studio with a team dedicated to developing for mobile VR using the Unity engine.

 

The talk starts by describing the steps a developer needs to take to port any application or game to VR in Unity, and to enable the GearVR developer mode. It then gives a few key recommendations regarding the challenges you might encounter when porting a specific game to a new VR environment: motion sickness, UI interaction and camera settings. Furthermore, as the VR experience is immersive, the developer needs to take into account that any frightening or unsettling situation will be amplified in VR.


The focus of the talk then moves to a series of optimized rendering effects for mobile platforms which have been achieved using the local cubemap technique. This covers reflections, the innovative way of implementing dynamic soft shadows and refractions, as well as implementing stereo reflections in VR.

 

Stereo reflections will be explained in detail. In VR, the left and right eyes are rendered individually and it is not good practice to use exactly the same reflections for both eyes. The recommendation is to render stereo reflections when reflections are a noticeable effect for the end-user; otherwise the reflections don’t appear to have any depth and the VR application will not provide a fully immersive virtual experience.

 

The talk session will cover how to implement and synchronize left and right cameras to achieve stereo reflections in custom shaders using the Unity engine. Afterwards, Unity will explain how and why 3D and virtual reality are perceived the way they are, so that developers can use this knowledge in the Unity3D engine to build architectural and gaming environments that create a sense of presence and immersion.

 

The last part of the talk is covered by nDreams, who will talk about their experience creating the renowned VR titles SkyDIEving, Gunner and Perfect Beach. They will also discuss the research they did into implementing movement using controllers for the GearVR and Google Cardboard, both designed for smartphones.

 

 

TALK #4: Optimize your Mobile Games with Practical Case Studies

 

This talk first introduces you to the ARM tools and skills needed to profile and debug your application. ARM has several tools to help game developers optimize their games, all available free of charge.

 

First of all, it’s recommended that developers use a profiling tool to analyze system performance. ARM provides DS-5 Streamline, which covers whole-system performance (CPU and GPU) so that you can find the performance bottlenecks in your code. The four main culprits are:

  • being CPU bound (for instance, the game physics being too complicated)
  • being vertex bound (for example, your assets might have too many vertices)
  • being fragment bound (most common case: you might have a high overdraw index)
  • being bandwidth bound (e.g. loading textures every frame or loading large textures).

 

Once the developer has identified the code bottlenecks, the Mali Graphics Debugger (MGD) tool traces every Vulkan, OpenGL ES or OpenCL API call to help understand the rendering cycle, showing the graphics calls and checking the shaders, textures and buffers. This helps the developer identify the issues. The talk covers all the latest features of MGD: VR support, the geometry viewer, render pass dependencies, shader reports and statistics, frame capture and analysis, alternating drawing modes, etc. These all help developers gain a quick, deeper understanding of the performance.


Mali Graphics Debugger

 

Finally, the optimization work comes along and six best practice techniques are explained in detail:

  • batching draw calls
  • eliminating overdraw
  • frustum, occlusion and distance culling
  • Level of Detail (LoD)
  • texture compression with ASTC
  • mip-mapping and the use of anti-aliasing.

 

TALK #5: An End-to-End Approach to Physically Based Rendering (PBR)

 

Wes and Sam work at Geomerics and Allegorithmic, two companies that produce technology integral to many studios’ physically based workflows. Despite the increasing number of AAA and independent studios integrating a physically based pipeline, in practice they have seen a common lack of understanding of the repercussions that decisions made in the material creation phase have on later phases of the development pipeline, most significantly the lighting phase. Yet if PBR is managed correctly across the entire art team, its benefits come alive:

 

  • The guesswork around authoring surface attributes to look realistic is removed
  • An artist can set up a material once and reuse it throughout the game
  • Materials have an accurate appearance independent of lighting conditions
  • Studios can have a universal workflow that produces consistent artwork across teams and even across companies

 

No matter whether you are a texture artist or a lighting artist, it is important to understand the fundamentals of each step in the development pipeline to ensure that the work you do has the desired contribution to the final visual result.

 

To start with, this talk explains in an artist-friendly manner the fundamentals of lighting physics and energy conservation. It will equip texture artists with the knowledge they need to bear in mind throughout the material creation phase in order to create materials that light predictably no matter what the lighting set-up, including:

  • The light ray model
  • Specular and diffuse reflections
  • Absorption and scattering

 

It will go on to explain the two main PBR workflows and describe in detail the properties texture artists need to consider when supplying information into the shader:

  • Metallic workflow
  • Specular/glossiness workflow
  • Key differences between the workflows

 

Moving from concept into reality, the talk will discuss practical guidelines for creating PBR textures and remove the guesswork from setting material values. It is worth noting that while adherence to our guidelines will ensure an artist authors maps correctly, the physics principles discussed earlier will equip the audience with the knowledge needed to follow their intuition and explore different creative styles successfully within a PBR pipeline. We will discuss:

  • Base color – diffuse (albedo)
  • Metal reflectance
  • Dielectric F0

 

The talk will conclude with a practical demonstration of the theory. Using a single scene, we will vary the texture and lighting set up and the audience will be shown the visual impact of supplying both incorrect and correct material information.

GDC 2016: A First-Timer’s Take


Day one of GDC 2016 has flown by in a whirlwind so I’m pausing for a minute to pop down my highlights. As a relative newbie to the graphics industry this is my first time at GDC (and in the US!) so I’m keen to share my thoughts.


Grey skies didn't last long

 

After an uncharacteristically soggy start to the week, the Californian sun has once again blessed us with its presence for the opening of the world’s largest professionals only game dev event. Even in the queue at passport control it was apparent just how huge GDC really is, with every second person seemingly citing it as their reason for entering. Sure enough, people turned out in their thousands today for the first day of the expo. Companies from all over the world and in every game-related field are here to flaunt the latest and greatest in game development. ARMed (sorry) with piles of ARM powered Bluetooth controllers to give away (come find me to get yours!) I took to the stand excited to see exactly how it would unfold.


Rocking the Bluetooth controller look

 

Booth Bits

On ARM booth 1624 we started the day with a fantastic talk from Tony Chia of Nibiru, who presented their interactive SDK and VR ROM for mobile and announced the launch of the ARM & Nibiru Joint Innovation Lab. Hot on their heels were PlayCanvas with some great insight into optimizing WebGL for mobile GPUs and their free, open source game engine specially optimized for ARM Mali, the world’s number one mobile GPU. Talks were even more popular than expected as we’re giving away an awesome Pipo P4 tablet after every single one!

One very happy winner!

 

Epic Games and Samsung Mobile joined us bright and early to present an amazing sponsored session case study around the recently released Vulkan API from Khronos. They provided a deep dive on the features of Vulkan and exactly how it works on mobile as well as all the benefits to performance that can be achieved when using it with Unreal Engine 4, allowing developers to reach new heights of graphics quality on mobile.

 

All the Rage

GDC veterans tell me that Virtual Reality is the tech on everyone’s lips this year with a bigger presence than ever seen before. It was of course a hot topic for us too with ARM Senior Graphics Architect & VR whitepaper author Sam Martin sharing his expertise on the dos and don’ts of mobile VR and exactly what it takes to create a super experience. Seats were at a premium for this one with people crowded round to make sure they didn’t miss out. Right after that came nDreams’ Patrick O’Luanaigh with his key VR development tips and where he sees the future of the industry.


Sam Martin sharing the tricks of the mobile VR trade

 

Some of our other in-house experts also took to our on-stand lecture theatre to share their findings on porting an engine to the Vulkan API and making sure you get the full benefit of its enhanced features for mobile. Our engineers also shared some up to the minute techniques for rendering shadows, stereo reflections and heaps of other cool effects used in the super popular Ice Cave VR demo.

 

Playtime

As if that wasn’t enough, our stand is packed to bursting with awesome demos from some of our key partners. Umbra’s demo blew my mind with its amazing use of automatic occlusion culling, visibility-based level of detail and 3D content streaming to take 30GB of original source data and let you fly over cities (and pretend to be Superman, obviously.)

Snail Games also kicked butt with the latest updates to their mobile kung-fu games Age of Wushu Dynasty and Taichi Panda and kept us big kids entertained for hours!

If you’ve missed out on any of this, or won’t be able to make everything you want to see from ARM over the next few days, never fear! As in previous years we’ll be hosting the Mali Graphics Week from Monday 18th April to bring you all of the talks and presentations from GDC as well as a whole host of new blogs and articles covering everything you might have missed.

 

Fancy a brand new Samsung S7? Join our Mali Graphics Week Thunderclap to be in with a chance to win!

 

Right now I’m off to soak my tired feet before venturing out to check out the San Fran nightlife. More on GDC 2016 tomorrow and if you’re around, come and say hi on ARM booth 1624!


GDC16: Day 2 and Still Standing (Just)!


WOW. Day two of GDC has come to an end and I’m exhausted and hyped at the same time. If anything it was even bigger and better than yesterday with people everywhere, heaps of fantastic talks and some really great visitors to the ARM booth. Having slept like the dead after a hectic but fantastic first day, it was back to our lecture theatre first thing this morning to welcome even more awesome speakers.

 

 

Lecture Theatre Life

 

It’s so humbling to see how many of our partners are keen to share the stories of what they do with ARM and how Mali GPUs enable them to create bigger and better devices, applications and games every single day.

The standout presentation for me, and a lot of others judging by the massive crowd it drew, was by Zak Parrish from Epic Games who gave an incredible live demo of exactly how easy it can be to create a mobile game in Unreal Engine 4. Not only was Zak a heap of fun but he also gave away a brand new Samsung Galaxy S7 to one lucky talk attendee, hard to beat!

It’s often the unexpected that provides the most entertainment and this proved true for me again today when we had a scheduling clash with one of our speakers. With a packed lecture theatre audience ready and waiting, a couple of us ARM bods had the chance to do what we do best: play! True to our inner kids we whipped out my current favourite Samsung Gear VR game, Keep Talking and Nobody Explodes. With the help of a volunteer we got the whole crowd involved to help defuse a virtual bomb live on stage, saving the virtual lives of all present and providing us all with a good laugh at the same time!

VR fun with 'Keep Talking and Nobody Explodes'

 

Last up for today, Ellie Stone from the ARM Enlighten team took to the stage to show us their truly beautiful new demo, Seastack Bay. This shows the effect of true global illumination on large world games and the impact it has on realism and believability. With the rapid growth of VR this could be a huge help in achieving true immersion and the expansion into mobile is really exciting for the future of mobile gaming.


Ellie entertaining the masses with Enlighten Global Illumination demo Seastack Bay

 

Smashing the Sponsored Sessions

 

When you’re manning the lecture theatre it can be hard to tell how things are going elsewhere in the event, so it’s always great to get feedback. Especially exciting for me was when people started approaching me after our sponsored sessions, which take place in various halls around the venue and seem very distant when you’re based on-stand. Not only had these guys sought us out specially but, even better, it was to tell me that they were keen to make sure I knew just how amazing those sessions had been.

 

One of our senior engineers, Roberto Lopez Mendez, gave an incredible presentation with Unity and nDreams on how to achieve truly high quality VR games and, having seen him speak before, I knew it would be good. When people go out of their way, though, to tell you just how inspiring it was, it feels like a real honour to work with such knowledgeable people so willing to share their expertise.


'Achieving High Quality Mobile VR Games' – Roberto Lopez Mendez, ARM; Carl Calleweart, Unity; Patrick O’Luanaigh, nDreams

 

 

Friday Finale

It may be a little quieter tomorrow with some attendees heading home, but there’s still so much to see. We have more lecture theatre talks running until 2pm and we’re STILL giving away a Pipo P4 tablet after every single one! Our very own Stephen Barton will appear, who might be familiar from today’s sponsored sessions where he and Stacy Smith used practical case studies to show how to optimize your mobile games. He’ll be running a live Mali Graphics Debugger session at 10.30 before speakers from NHTV, Deepoon, Cocos and Perfect World each take to the stage with their game development talks to round out your dev ed experience at GDC 16.

 

Don’t forget to join our thunderclap to share the joys of the Mali Graphics Week kicking off April 18th and I look forward to seeing you tomorrow for the last day of a truly epic week!

DeePoon Unveils ARM Powered, All-in-One VR Headset


ARM® partner DeePoon recently launched a brand new, all-in-one virtual reality headset powered by ARM. Based on an Exynos 7420 chipset, the DeePoon M2 features an octa-core ARM Cortex®-A57 and Cortex-A53 CPU cluster with an ARM Mali™-T760MP8 GPU configuration to take care of the heavy graphics requirements of a VR device.

 

Why Mali?

The Mali family of GPUs is the number one shipping GPU family in the world, and for good reason. The incorporated power saving technologies enable Mali GPUs to provide incredibly high performance whilst remaining safely within the power and thermal limitations of the mobile form factor. Advanced bandwidth saving features such as ARM Frame Buffer Compression (AFBC), Adaptive Scalable Texture Compression (ASTC) and Smart Composition are just some of the key power saving features behind DeePoon’s decision to team up with ARM and incorporate Mali into their new device.


Dennis Laudick, ARM VP of Partner Marketing, supports the M2 launch in China

 

How important are DeePoon’s tech choices?

Latency (the delay between real-life movement and the subsequent update to the VR display) can be a real issue in a VR experience. This is because noticeable latency causes a disparity between what your eyes see on the display and what your brain expects as the logical visual adjustment in real life, and this mismatch can lead to dizziness and nausea. By allowing the GPU to render directly to the front buffer, bypassing the additional off-screen buffers used in traditional graphics rendering, Mali GPUs are able to significantly reduce the number of interactions taking place. This provides a latency guarantee and ensures the device remains below the 20ms maximum motion-to-photon latency prescribed by the industry as necessary for a great VR experience.

The super sleek and powerful DeePoon M2

 

The display is another important element in a VR device, and DeePoon have covered this angle with the smart selection of Samsung’s 2K AMOLED display, which provides a refresh rate of 60-70Hz. The real selling point of an AMOLED display for VR is the ability to achieve low persistence through partial illumination. This involves lighting a section of the display only when it is showing accurate information, then immediately switching it off again. This significantly reduces blurring for the user and, if handled at a sufficiently high refresh rate, is imperceptible to the eye. It also allows multiple partial images to be shown during a single refresh, thereby updating the user’s display mid-frame, reducing latency and adding to the sense of full immersion required in VR applications. DeePoon have also ensured there’s plenty of content to play with, partnering with numerous games companies to provide more than 100 VR games and applications to the user.

The close co-operation between DeePoon, ARM and Samsung has provided a wealth of industry knowledge from the frontrunners in mobile VR, ensuring the M2 is well equipped to take the market by storm.



1, 2, 3 Easy steps to win a Samsung Galaxy S7!


Hi,

 

You might've heard of ARM Graphics Week, an online event which will run from April 18th-22nd. The week will include write-ups of a range of VR content from ARM, including a new technical demo and blogs, as well as content about Vulkan, Unity, the latest Geomerics demos and updates to our SDKs. We'll even have news and updates from our ecosystem partners. As you can see, there's a whole bunch of content coming for developers and tech enthusiasts alike! Bookmark ARM Graphics Week - Mali Developer Center for a full list of all the content, or follow ARM Mali Graphics week - Thunderclap live: Win a Samsung Galaxy S7, which will be updated with a list of all the blogs on the ARM Connected Community.

 

Here's where you can help: we want to get the word out about Graphics Week and to do so we're trying a thing called Thunderclap. With Thunderclap you're able to support ARM's message about Graphics Week using your Facebook, Twitter or Tumblr account. Thunderclap will then send out a one-time message on April 15th to let other people know that Graphics Week is starting on Monday April 18th.

 

As a thank you, we're offering a Samsung Galaxy S7 to be raffled off between everyone who supports us. In order for this to happen, though, we need to reach our goal of 500 supporters, so we have quite a way to go. Please share this with your friends to make sure we reach our target and can give our supporters a well-earned thanks.

1. Go to the ARM Graphics Week Thunderclap page Thunderclap: ARM Mali Graphics Week

 

2. Click the option to support us with your preferred Social Media account:


3. Customise the message if you wish before agreeing to have it posted on your social media. Click "ADD MY SUPPORT" and the message will go out on April 15th at 9:00am PDT (that's 5:00pm for those in the UK). No other messages will be sent and no further use will be made of your account.


Don't forget to tell your friends to make sure we reach our target otherwise we can't get the message out and give away the Samsung Galaxy S7!

 


 

Thanks,

Ryan

Get Set for Graphics Week!


With GDC behind us for another year we in the Mali™ team are looking ahead to the ARM® Mali Graphics Week running from 18th-22nd April. Last year saw the first ever Graphics Week and it was so popular we’ve made it a permanent fixture on the calendar!

We have heaps of great content to cover so to make it easier to keep up, and be sure you don’t miss out on anything that really matters to you, we’ve themed each day with its very own hashtag:

 

  • Monday 18th April: #Vulkan
  • Tuesday 19th April: #UnityTips
  • Wednesday 20th April: #VR
  • Thursday 21st April: #ARMTools
  • Friday 22nd April: #ARMMaliNext

 

Starting on Monday with #Vulkan we’ll be bringing you the Vulkan presentations from our GDC sponsored sessions and lecture theatre talks as well as brand new blogs and articles on how to get the best from Khronos’s new API.

 

The following day sees us join Unity for #UnityTips Tuesday where we’ll be sharing demos built on Unity as well as event presentations from our in-house experts and partners. Wednesday is dedicated to the tech everyone is talking about with a whole host of awesome VR content including ARM’s SDK, blogs and whitepapers as well as the latest in ARM powered VR devices. Thursday is #ARMTools day and features presentations, blogs and videos on all the ARM Mali tools you need to develop better-than-ever games. Finally, on Friday, we’ll be talking about what’s next for ARM Mali including upcoming events, talks and developer tools and guides.

 

Sign up to the Mali Developer Center newsletter to make sure you receive all the latest news and don’t forget to follow us on Twitter to be the first to see every new release!

 

Want a brand new Samsung Galaxy S7?

Help us share the ARM Mali Graphics Week joy by joining our Thunderclap to send a one-time Graphics Week tweet or Facebook message to your followers and you could win!


Get started with the new ARM® Vulkan SDK v1.0.0 for Android!


The Vulkan API, released in February 2016 by Khronos, is the new-generation graphics and compute API designed to work across all platform types, from desktop to mobile and embedded. One of its key features is that it is a very low-level API, giving more explicit control of the underlying hardware architecture as well as memory management. Developers can therefore optimize their code, achieving better performance and reducing power consumption. Furthermore, with the increasing number of CPU cores in smartphones, Vulkan supports multithreading, giving developers flexibility and control over resources and over when and how to execute threads.

 

The new ARM Vulkan SDK v1.0.0 consists of a collection of resources: a getting started guide, sample code and tutorials, so that graphics developers can produce Vulkan applications that run on Android on an ARM processor. The only prerequisite for developers is to have the latest Android Studio and NDK installed, together with an ARM Mali GPU based device that runs Android and supports Vulkan, such as the Samsung Galaxy S7.

 

Pictures above - What you need to set up your environment and start programming in Vulkan

 

After the development environment is set up and the graphics developer has gone through the basics of how the Vulkan API is designed, the SDK offers a set of samples, each with source code and a step-by-step tutorial explaining the code. This first release starts with basic examples such as “Hello Triangle”; “Rotating Texture”, to introduce texturing and uniform buffers; “Multithreading”; “Introduction to Compute Shaders”; “Multisampling”, to showcase the most efficient way to implement Multi-Sample Anti-Aliasing (MSAA) on ARM Mali; and finally a “Spinning Cube” sample to introduce depth testing and push constants. You can download the SDK from our ARM Mali Developer Center: http://malideveloper.arm.com/resources/sdks/mali-vulkan-sdk/

 

Marius Bjorge, Hans-Kristian Arntzen and Panos Christopoulos are the lead engineers who built our new Vulkan SDK v1.0.0 for Android, a precious resource for graphics developers, so I encourage them to keep up the good work! I am already looking forward to their next update, aimed at more experienced Vulkan developers. Can’t wait!

SPIRV-Cross, working with SPIR-V in your app


If you have been following Vulkan lately, you will have heard about SPIR-V, the new shading language format used in Vulkan.

We decided early on to standardize our internal engine on SPIR-V, as we needed a way to cleanly support both our OpenGL ES backend and Vulkan without modifying shaders.

 

From our experience, having #ifdefs in a shader which depend on the graphics API you are targeting is not maintainable.

Our old system was based on text replacement which became more and more unmaintainable as new API features emerged.

We wanted something more robust and having a standard IR format in SPIR-V was the final piece of the puzzle to enable this.

 

Using our engine, we developed a demo showcasing Vulkan at GDC; please see my colleague's blog post for more information on the topic:

Porting a Graphics Engine to the Vulkan API

 

Compiling down to SPIR-V

 

SPIR-V is a binary, intermediate shading language format. This is great because it means that you no longer have to deal with vendor-specific issues in the GLSL frontend of the driver.

The flipside of this however is that you now have to consider how to compile a high level shading language down to SPIR-V.

The best alternatives for this currently are Khronos' glslang library https://github.com/KhronosGroup/glslang and Google's shaderc https://github.com/google/shaderc.

These tools can compile GLSL down to SPIR-V which you can then use in Vulkan.

The optimal place to do this compilation is as part of your build process so that you don't have to ship shader source in your app.
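As an illustration of that step, here is a minimal sketch (not taken from the post) of driving shaderc's C++ API; the same compilation can equally be run offline from a build script, and the file name and shader stage below are assumptions for the example.

#include <shaderc/shaderc.hpp>
#include <stdexcept>
#include <string>
#include <vector>

// Compile a GLSL vertex shader string into a SPIR-V binary.
std::vector<uint32_t> compile_vertex_shader(const std::string &glsl_source)
{
    shaderc::Compiler compiler;
    shaderc::CompileOptions options;

    shaderc::SpvCompilationResult result = compiler.CompileGlslToSpv(
        glsl_source, shaderc_glsl_vertex_shader, "shader.vert", options);

    if (result.GetCompilationStatus() != shaderc_compilation_status_success)
        throw std::runtime_error(result.GetErrorMessage());

    // The result iterates over uint32_t words, ready for vkCreateShaderModule.
    return { result.cbegin(), result.cend() };
}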

 

We're close then, at least in theory, to having a unified system for GLES and Vulkan.

 

One language, many dialects

 

While our intention to use GLSL as our high-level language makes sense when targeting GL/Vulkan, there are problems, one of which is that there are at least 5 main dialects of GLSL:

 

  • Modern mobile (GLES3+)
  • Legacy mobile (GLES2)
  • Modern desktop (GL3+)
  • Legacy desktop (GL2)
  • Vulkan GLSL

 

Vulkan GLSL is a new entry to the list of GLSL variants. The reason for this is that we need a way to map GLSL to newer features found in Vulkan:

https://www.khronos.org/registry/vulkan/specs/misc/GL_KHR_vulkan_glsl.txt

 

Vulkan GLSL adds some incompatibilities with all the other GLSL variants, for example:

  • Descriptor sets, no such concept in OpenGL
  • Push constants, no such concept in OpenGL, but very similar to "uniform vec4 UsuallyRegisterMappedUniform;"
  • Subpass input attachments, which map well to Pixel Local Storage on ARM® Mali™ GPUs
  • gl_InstanceIndex vs gl_InstanceID. Essentially the same, but Vulkan GLSL's gl_InstanceIndex (finally!) includes the base instance offset for you

 

This makes it problematic to write GLSL that can work in both GL and Vulkan at the same time, and whilst it is always possible to use #ifdef VULKAN, this is a road we don't want to go down. As you might expect from the blog title, we solved this with SPIRV-Cross, but more on that later in the post.

 

Shader reflection

 

If you're starting out with simple applications in Vulkan you don't have to deal with this topic quite yet, but once your engine starts to grow you will very soon run into a fundamental difference between OpenGL ES/GLSL and Vulkan/SPIR-V. It is generally very useful to be able to query meta-data about the shader you are working with. This is especially important in Vulkan, since pipeline creation very much depends on information found inside your shaders. Vulkan, being a very explicit API, expects you as the API user to know this information up front and to provide a VkPipelineLayout that describes which types of resources are used inside your shader.

Kind of like a function prototype for your shader.

 

VkGraphicsPipelineCreateInfo pipeline = {
    ...
    .layout = pipelineLayout,
    ...
};

The pipeline layout describes which descriptor sets you are using as well as push constants. This serves as the "function prototype" for your shader.

 

VkPipelineLayoutCreateInfo layout = {
    .setLayoutCount = NELEMS(setLayouts),
    .pSetLayouts = setLayouts,
    ...
};

Inside the set layouts is where you describe which resources you are using, for example

 

// For first descriptor set
VkDescriptorSetLayoutBinding bindings[] = {
    {
        .binding = 0,
        .descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
        .descriptorCount = 1,
        .stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT
    },
    {
        .binding = 2,
        .descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
        .descriptorCount = 1,
        .stageFlags = VK_SHADER_STAGE_VERTEX_BIT
    },
};
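To connect the dots, a minimal sketch (not from the original post, error handling omitted) of feeding those bindings into a descriptor set layout and then a pipeline layout could look as follows; device and NELEMS are assumed to exist as in the snippets above.

VkDescriptorSetLayoutCreateInfo setLayoutInfo = {
    .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
    .bindingCount = NELEMS(bindings),
    .pBindings = bindings,
};

VkDescriptorSetLayout setLayouts[1];
vkCreateDescriptorSetLayout(device, &setLayoutInfo, NULL, &setLayouts[0]);

VkPipelineLayoutCreateInfo layoutInfo = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
    .setLayoutCount = NELEMS(setLayouts),
    .pSetLayouts = setLayouts,
};

VkPipelineLayout pipelineLayout;
vkCreatePipelineLayout(device, &layoutInfo, NULL, &pipelineLayout);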

You might ask how you are supposed to know this information before you have created your pipeline. Vulkan does not provide a reflection API for this, because it is vendor-neutral functionality that shouldn't need to be implemented the same way N times over by every vendor.

 

In your simple applications you probably know this information up front. After all, you wrote the shader, so you can fill in this layout information by hand, and you probably just have one or two resources anyway, so it's not that big a deal. However, start to consider more realistic shaders in a complex application and you soon realize that you need a better solution.

 

In GLES, the driver provides us with reflection to some extent, for example:

 

GLint location = glGetUniformLocation(program, "myUniform");
...
glUniform4fv(location, 1, uniformData);

GLint attrib = glGetAttribLocation(program, "TexCoords");

 

 

SPIRV-Cross

 

To solve the problems of reflection and Vulkan GLSL/GLSL differences, we developed a tool and library, SPIRV-Cross, that will be hosted by Khronos and you can find it on https://github.com/KhronosGroup/SPIRV-Cross shortly.

This tool was originally published on our Github as spir2cross, but we have donated the tool to Khronos Group and future development will happen there.

 

The primary focus of the library is to provide a comprehensive reflection API as well as supporting translating SPIR-V back to high level shader languages.

This allowed us to design our entire engine around Vulkan GLSL and SPIR-V while still supporting GLES and desktop GL.

 

Shader toolchain

 

We wanted to design our pipeline so that the Vulkan path was as straightforward as we could make it, while dealing with how to get back to GL/GLES in a way that was still robust and sensible. We found that it is much simpler to deal with GL specifics when disassembling from SPIR-V, since it is not trivial to meaningfully modify SPIR-V in its raw binary format. It therefore made sense to write all our shader code targeting Vulkan, and deal with GL semantics later.

 

Vulkan

In Vulkan, as we write in Vulkan GLSL, we simply use glslang to compile our sources down to SPIR-V. The resulting SPIR-V binary can then be given directly to the Vulkan driver.

We still need reflection however, as we need to build pipeline layouts.

 

#include "spirv_cross.hpp"

{
    // Read SPIR-V from disk or similar.
    std::vector<uint32_t> spirv_binary = load_spirv_file();

    spirv_cross::Compiler comp(std::move(spirv_binary));

    // The SPIR-V is now parsed, and we can perform reflection on it.
    spirv_cross::ShaderResources resources = comp.get_shader_resources();

    // Get all sampled images in the shader.
    for (auto &resource : resources.sampled_images)
    {
        unsigned set = comp.get_decoration(resource.id, spv::DecorationDescriptorSet);
        unsigned binding = comp.get_decoration(resource.id, spv::DecorationBinding);
        add_sampled_image_to_layout(set, binding);
    }

    // And so on for other resource types.
}

We also need to figure out if we are using push constants. We can get reflection information about all push constant variables which are actually in use per stage, and hence compute the range which should be part of the pipeline layout.

 

spirv_cross::BufferRanges ranges = compiler.get_active_buffer_ranges(resources.push_constant_buffers.front().id);

 

From this, we can easily build our push constant ranges.
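For example, a small sketch (continuing from the snippet above, with the stage flag assumed to be the fragment stage for this example) that folds the active ranges into a single VkPushConstantRange might look like this:

// Assumes at least one active push constant range was reported.
size_t minOffset = SIZE_MAX;
size_t maxOffset = 0;
for (auto &range : ranges)
{
    minOffset = std::min(minOffset, range.offset);
    maxOffset = std::max(maxOffset, range.offset + range.range);
}

VkPushConstantRange pushRange = {};
pushRange.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT; // assumption for this example
pushRange.offset = static_cast<uint32_t>(minOffset);
pushRange.size = static_cast<uint32_t>(maxOffset - minOffset);
// pushRange can now go into VkPipelineLayoutCreateInfo::pPushConstantRanges.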

 

GLES

In GLES, things are slightly more involved. Since our shader sources are in Vulkan GLSL we need to make some transformations before converting back to GLSL.

The GLES backend consumes SPIR-V, so we still do not have to compile shader sources at runtime. From that, we perform the same kind of reflection as in Vulkan.
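As a rough illustration (a sketch, not the engine's actual code; the ES-specific output options available depend on your SPIRV-Cross version), converting the SPIR-V back to GLSL source looks roughly like this:

#include "spirv_glsl.hpp"

#include <string>
#include <vector>

std::string to_gles_source(std::vector<uint32_t> spirv_binary)
{
    spirv_cross::CompilerGLSL glsl(std::move(spirv_binary));

    // CompilerGLSL derives from Compiler, so the same reflection API
    // (get_shader_resources() and friends) is available here as well.
    spirv_cross::ShaderResources resources = glsl.get_shader_resources();
    (void)resources; // used for remapping, as described below

    // Emit GLSL source that can be handed to glShaderSource at runtime.
    return glsl.compile();
}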

 

Push Constants

When converting back to GLSL, SPIRV-Cross implements push constant blocks as plain uniform structs, which map very closely to push constants:

 

layout(push_constant, std430) uniform VulkanPushConstant
{
    mat4 MVP;
    vec4 MaterialData;
} registerMapped;

becomes:

 

struct VulkanPushConstant
{
    mat4 MVP;
    vec4 MaterialData;
};

uniform VulkanPushConstant registerMapped;

in GLSL.

Using the reflection API for push constants we can then build a list of glUniform calls which will implement push constants for us.
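As a rough sketch of that idea (uniform and function names here are illustrative, matching the struct above; in a real engine the locations would be cached after linking rather than queried on every call):

#include <GLES3/gl3.h>

// Upload the CPU-side copy of the "push constant" data via plain uniforms.
void push_constants_update(GLuint program, const float *mvp, const float *materialData)
{
    GLint mvpLoc = glGetUniformLocation(program, "registerMapped.MVP");
    GLint materialLoc = glGetUniformLocation(program, "registerMapped.MaterialData");

    glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, mvp);
    glUniform4fv(materialLoc, 1, materialData);
}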

 

Resource binding

We also need to remap descriptor sets and bindings. OpenGL has a linear binding space which is sorted per type, e.g. binding = 0 for uniform buffers and binding = 0 for sampled images are two different binding points, but this is not the case in Vulkan.

We chose a simple scheme which allocates linear binding space from the descriptor set layouts.

 

Let's say we have a vertex and fragment shader which collectively use these bindings.

 

  • uniform (set = 0, binding = 1)
  • texture (set = 0, binding = 2)
  • texture (set = 0, binding = 3)
  • uniform (set  = 1, binding = 0)
  • texture (set = 1, binding = 1)

 

In set 0, the range of uniform bindings used is [1, 1]. Textures are [2, 3]. We allocate the first binding in uniform buffer space to set 0, with a binding offset of -1.

To remap set/binding to linear binding space, it's a simple lookup.

 

linearBinding = SetsLayout[set].uniformBufferOffset + binding

textureBinding = SetsLayout[set].texturesOffset + binding

...

 

For set 1 for example, binding = 1 would be mapped to binding = 2 since set 0 consumed the two first bindings for textures. Similarly, for uniforms, the uniform buffer in set 1 would be mapped to binding = 1.

Before compilation back to GLSL we strip off all binding information using the SPIRV-Cross reflection API. After we have linked our GL program we can bind the resources to their new correct binding points.
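One possible sketch of that flow (names such as SetsLayout, compiler, resources and program are assumed from the description above, and only uniform buffers are shown) is:

// Record the remapped binding for each resource, then strip the Vulkan-only
// decorations so the generated GLSL carries no set/binding qualifiers.
struct Remapped { std::string name; unsigned linearBinding; };
std::vector<Remapped> uboRemaps;

for (auto &resource : resources.uniform_buffers)
{
    unsigned set = compiler.get_decoration(resource.id, spv::DecorationDescriptorSet);
    unsigned binding = compiler.get_decoration(resource.id, spv::DecorationBinding);
    uboRemaps.push_back({ resource.name, SetsLayout[set].uniformBufferOffset + binding });

    compiler.unset_decoration(resource.id, spv::DecorationDescriptorSet);
    compiler.unset_decoration(resource.id, spv::DecorationBinding);
}

// After glLinkProgram, point each uniform block at its linear binding.
for (auto &remap : uboRemaps)
{
    GLuint blockIndex = glGetUniformBlockIndex(program, remap.name.c_str());
    glUniformBlockBinding(program, blockIndex, remap.linearBinding);
}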

 

Using this scheme we managed to use Vulkan GLSL and a Vulkan-style API in GLES without too many complications.

 

Pixel Local Storage

Our old system supported Pixel Local Storage, and we did not want to lose that by going to Vulkan GLSL, so we use the SPIRV-Cross PLS API to convert subpass inputs to PLS inputs and output blocks.

 

vector<PlsRemap> inputs;
vector<PlsRemap> outputs;
// Using reflection API here, can have some magic variable names that will be used for PLS in GLES and subpass inputs in Vulkan.
compiler.remap_pixel_local_storage(move(inputs), move(outputs));

 

Hopefully this blog entry gave you some insights into how you can integrate better with SPIR-V in your Vulkan apps.

Moving the Market With the Mali Powered Samsung Galaxy S7


Arguably the most high profile Mali powered device to hit the hands of customers this year is the Samsung Galaxy S7. Launched at Mobile World Congress 2016 in Barcelona, the Galaxy S7 represents Samsung’s latest offering to the premium mobile market. Similar in design to the S6, the S7 and S7 Edge strike a balance of sleek elegance and super sturdiness. One of the key features Samsung is talking about is that the S7 is an efficiently water-proofed smartphone! For those of us with a tendency to drop our phones in sinks, puddles and who knows what else, this is pretty big news in itself!

 

For the graphics geeks among us though, the incredible graphics and the clarity and depth of colour are a real attraction of Samsung’s latest offering. The chipset is the Exynos 8 Octa (8890), which features eight CPU cores based on the 64-bit ARMv8 architecture. ARM’s big.LITTLE technology is utilized to its full advantage to strike the perfect balance between super high performance and premium power efficiency. The complex user interface and incredible graphics capability are powered by a Mali-T880MP12 GPU configuration, the most powerful Mali GPU configuration on the market.

T880_Detail_Diagram1.jpg

So why is Mali the GPU of choice to power high end devices? Simple: Mali is the number 1 GPU in the world! Great leaps in energy efficiency come from built-in bandwidth saving technologies like ARM Frame Buffer Compression (AFBC), Smart Composition and Transaction Elimination, making it the perfect choice for the latest high end devices. One of the reasons there is such a focus on performance and efficiency is the rise of the VR industry, which is powering ahead at an unforeseen rate. Obviously VR requires fantastic graphics: when your eyes are just centimeters from a mobile screen the image needs to be spectacular. Another smart choice Samsung has made in this area is the display.

 

The AMOLED display works differently from a traditional LCD display in that each and every pixel is individually lit and adjusted by the amount of power travelling through the film behind it. This means that unlike LCD displays, where there is a permanent backlight, Samsung’s AMOLED display allows you to completely turn off sections of the screen. Not only does this allow you to achieve a deeper, truer black than on an LCD display, but it also means that in VR applications you can light only the part of the screen that is showing the correct view based on the user’s head position. This allows faster adjustment to the updated head position, lowering latency and providing a sharper, more immersive VR experience than is available on LCD displays. As Samsung is already ahead in VR with the Oculus-collaborated Samsung Gear VR headset, this is an important factor in staying out in front.

 

With incredible Mali based visuals, superior battery life and a fantastic user interface, the Samsung S7 represents another step up for Android devices and we look forward to seeing what comes next.

galaxy-s7_overview_kv_l.jpg

ARM® Guide for Unity Developers v3.1 is available!


Unity is a multi-platform game development engine used by the majority of game developers. It enables you to create and distribute 2D and 3D games and other graphics applications.

 

ARM_Guide_Unity.png

At ARM, we care about game developers. We know console quality games can now be achieved on mobile platforms, so we compiled the “ARM Guide for Unity Developers”, a compilation of best practices and optimized techniques to get the most from an ARM mobile platform. Whether you are a beginner or an advanced Unity user, you will find the advice you need to increase the FPS of your graphics app.

 

Optimization Process

The guide starts by covering the optimization process, so that developers learn the optimal quality settings and the fundamentals of optimizing a Unity application. It showcases how to use the Unity Profiler and Debugger as well as the ARM developer tools (Mali™ Graphics Debugger and Streamline).

 

The profiler is used as a first step to take measurements of the graphics application and analyze the data to locate any bottlenecks. The next step is to determine the relevant optimization to apply, and finally to verify that the optimization works.

 

The guide dedicates a whole sub-chapter to another very useful ARM tool for Unity developers: the Mali Offline Shader Compiler, which enables developers to compile vertex, fragment and compute shaders into binary form. It also reports the number of cycles the shaders require in each pipeline of the Mali GPU, so that developers can analyze and optimize for ARM Mali GPUs.

 

Optimizations

The optimizations chapter includes everything from ARM Cortex application processor optimizations with code snippets and settings examples, to ARM Mali GPU optimizations as well as asset optimizations.

 

The ARM Mali GPU optimization techniques include:

 

  • The use of static batching, a common optimization technique that reduces the number of draw calls therefore reducing the application processor utilization.
  • The use of 4x MSAA: ARM Mali GPUs can implement 4x multi-sample anti-aliasing (MSAA) with very low computational overhead.

LOD.png

LOD group settings

 

  • Level of Detail (LOD), a technique where the Unity engine renders different meshes for the same object depending on the distance from the camera.
  • The use of lightmaps and light probes. Lightmaps pre-compute the lighting calculations and bake them into a texture called a lightmap. This means developers lose the flexibility of a fully dynamically lit environment, but they do get very high quality images without impacting performance. On the other hand, the use of light probes enables developers to add some dynamic lighting to light-mapped scenes. The more probes there are, the more accurate the lighting is.
  • ASTC Texture Compression is the most efficient and flexible texture compression format available for Unity developers. It provides high quality, low bitrate and many control options, which are explained in detail.
  • The mipmapping technique, to enhance the visual quality as well as the performance of the graphics application. Mipmaps are pre-calculated versions of a texture at different sizes. Each texture generated is called a level and is half as wide and half as high as the preceding one. Unity can automatically generate the complete set of levels from the first level at the original size down to a 1x1 pixel version.
  • Skyboxes as a means to draw the camera background using a single cubemap, requiring only a single cubemap texture and one draw call.
  • How to implement efficient real-time shadows in Unity. Unity supports transform feedback for calculating real-time shadows and for advanced developers the guide shows how to implement custom shadows based on a very efficient technique using local cubemaps in the “Advanced Graphics Techniques” chapter.
  • Occlusion Culling consists of not rendering the objects when they are not in line of view from the camera, thereby saving GPU processing power.
  • How to efficiently use the OnBecameVisible() and OnBecameInvisible() callbacks.
  • Rendering order is very important for performance. The most optimal approach is to render opaque objects front-to-back, helping to reduce overdraw. Developers can learn which hardware techniques are also available to reduce overdraw, like early-Z and Pixel Forward Kill (PFK), as well as the options provided by the Unity engine.

 

Developers can optimize their application further by using asset optimizations and a whole sub-chapter addresses this, covering how to most effectively prepare textures and texture atlases, meshes and animations.

 

Enlighten

The Unity engine supports Global Illumination (GI) using Enlighten from v5 onwards. Enlighten is the ARM Geomerics real-time GI solution.

 

Enlighten in Unity can be used for baking light maps, light probes and for real-time, indirect lighting. The Enlighten components are not explicitly exposed in Unity, but they are referenced in the user interface and the guide therefore also explains what they are and how they work together.

 

The Enlighten section also explains how to configure Enlighten in custom shaders, the code flow and what developers need to do to set up Enlighten in the vertex and fragment shader code. It showcases a version of the Unity Standard Shader that is modified to include directional global illumination.

 

Enlighten.png

Enlighten Lightmap Images: above left – Ice Cave demo, above right – its UV Chart lightmap, below left – its Irradiance lightmap, below right – its Directionality lightmap

 

 

Advanced Graphics Techniques

Chapter 6, the longest chapter of the guide, explains Advanced Graphics Techniques. These techniques are mainly implemented using “Custom Shaders” as the Unity source code of built-in shaders does not include the majority of advanced effects. The chapter starts by describing how to write and debug custom shaders and then goes on to explain how to implement advanced graphics techniques used in the Ice Cave and Chess Room demos. It also shows source code snippets:

 

  • Reflections with a local cubemap: this technique is implemented in Unity v5 and higher with reflection probes. You can combine these reflection probes with other types of reflections, such as reflections rendered at runtime with your own custom shader.
  • Combining static reflections based on local cubemaps with dynamically generated reflections

Reflections.png

Combining Different Types of Reflections

 

  • Dynamic soft shadows: in a game scene there are moving objects and static environments such as rooms. Using dynamic soft shadows based on the local cubemap technique, developers can use a texture to represent the shadows and the alpha channel to represent the amount of light entering the room.
Refractions.png
  • Refraction based on local cubemaps – another lighting effect using the highly optimized local cubemap technique. Developers can combine the refractions with reflections at runtime.
  • Specular effects using the very efficient Blinn technique. In the example provided from the Ice Cave demo, the alpha channel is used to determine the specular intensity, thus ensuring that the specular effect is applied only to surfaces that are lit.
  • Using Early-z to improve performance by removing overdrawn fragments.
  • Dirty lens effect – this effect evokes a sense of drama and is often used together with a lens flare effect. It can be implemented in a very light and simple way that is suitable for mobile devices.
  • Light shafts - they simulate the effect of crepuscular rays, atmospheric scattering or shadowing. They add depth and realism to a scene. This effect is based on truncated cone geometry and a script that uses the position of the sun to calculate the magnitude of the lower cross-section cone expansion, and the direction and magnitude of the cross-section shift.
  • Fog effects – they add atmosphere to a scene. There are two versions of the fog effect: procedural linear fog and particle-based fog.
Light Shafts.png
  • Bloom– bloom reproduces the effects that occur in real cameras when taking pictures in a bright environment. This effect is noticeable under intense lighting and the guide demonstrates this effect implemented in a very efficient way by using a simple plane approach.
  • Icy wall effect.
  • Procedural skybox -  to achieve a dynamic time of day effect, the following elements were combined in the Ice Cave demo skybox: a procedurally generated sun, a series of fading skybox background cubemaps that represent the day to night cycle, and a skybox clouds cubemap.
  • Fireflies– they are bright flying insects that are used in the Ice Cave demo to add more dynamism and show the benefits of using Enlighten for real-time global illumination.

 

Mobile Virtual Reality

Last but not least, the final chapter of the guide covers best coding practices for developing graphics applications for mobile Virtual Reality.

 

Unity natively supports some VR devices like the Samsung Gear VR and plug-ins can enable support of other devices like the Google Cardboard. The guide describes how to port a graphics application onto native Unity VR.

VR.png

Screenshot from a VR application running in Samsung Gear VR developer mode

 

VR creates a more immersive user experience compared to running the graphics application on your smartphone or tablet screen, and therefore camera animations might not feel comfortable for the user in VR. Also, VR can benefit from controllers that connect to the VR device using Bluetooth. Tips and methods to create the ultimate user experience are described in the guide.

 

A whole sub-chapter is dedicated to how to implement reflections in VR. They can use the same local cubemap technique explained earlier in the Advanced Graphics Techniques chapter. However, the technique must be modified to work with the stereo visual output that a user sees. This chapter therefore explains how to implement stereo reflections as well as combining different types of reflections.

 

We welcome feedback on our ARM Guide for Unity Developers, which we update on a regular basis; the document history is available on our Mali Developer Center.

Multi-Threading in Vulkan


In my previous blog post I explained some of the key concepts of Vulkan and how we implemented them in our internal graphics engine. In this post I will go into a bit more detail about how we implemented multi-threading and some of the caveats to watch out for.

 

Quick background

Vulkan was created from the ground up to be thread-friendly and there's a huge amount of detail in the spec relating to thread-safety and the consequences of function calls. In OpenGL, for instance, the driver might have a number of background threads working while waiting for API calls from the application. In Vulkan, this responsibility has moved up to the application level, so it's now up to you to ensure correct and efficient multi-threading behavior. This is a good thing since the application often has better visibility of what it wants to achieve.

 

Command pools

In Vulkan, command buffers are allocated from command pools. Typically you pin a command pool to a thread and only use that thread when writing to command buffers allocated from its pool. Otherwise you need to externally synchronize access between the command buffer and the command pool, which adds overhead.

commandpool.png

For graphics use-cases you also typically pin a command pool per frame. This has the nice side-effect that you can simply reset the entire command pool once the work for the frame is completed. You can also reset individual command buffers, but it's often more efficient to just reset the entire command pool.
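As a minimal sketch of that setup (names are illustrative and error checking is omitted), each recording thread might own one pool per frame in flight and recycle it in a single call:

// One pool per (thread, frame-in-flight) pair.
VkCommandPoolCreateInfo poolInfo = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
    .queueFamilyIndex = graphicsQueueFamilyIndex,
};
VkCommandPool pool;
vkCreateCommandPool(device, &poolInfo, NULL, &pool);

// Command buffers for this thread's share of the frame come from its own pool.
VkCommandBufferAllocateInfo allocInfo = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
    .commandPool = pool,
    .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
    .commandBufferCount = 1,
};
VkCommandBuffer cmd;
vkAllocateCommandBuffers(device, &allocInfo, &cmd);

// Once the frame's work is known to be complete (e.g. its fence has signaled),
// recycle everything allocated from the pool in one go.
vkResetCommandPool(device, pool, 0);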

 

Coordinating work

In OpenGL, work is executed implicitly behind the scenes. In Vulkan this is explicit where the application submits command buffers to queues for execution.

 

blog_diagrams.png

Vulkan has the following synchronization primitives:

  • Semaphores - used to synchronize work across queues or across coarse-grained submissions to a single queue
  • Events and barriers - used to synchronize work within a command buffer or a sequence of command buffers submitted to a single queue
  • Fences - used to synchronize work between the device and the host

 

Queues have simple sync primitives for ordering the execution of command buffers. You can basically tell the driver to wait for a specific event before processing the submitted work and you can also get a signal for when the submitted work is completed. This synchronization is really important when it comes to submitting and synchronizing work to the swap chain. The following diagram shows how work can be recorded and submitted to the device queue for execution before we finally tell the device to present our frame to the display.

swap1.png

In the above sequence there is no overlap of work between different frames. Therefore, even though we're recording work to command buffers in multiple threads, we still have a certain amount of time where the CPU threads sit idle waiting for a signal in order to start work on the next frame.

 

swap2.png

 

This is much better. Here we start recording work for the next frame immediately after submitting the current frame to the device queue. All synchronization here is done using semaphores. vkAcquireNextImageKHR will signal a semaphore once the swap chain image is ready, vkQueueSubmit will wait for this semaphore before processing any of the commands and will signal another semaphore once the submitted commands are completed. Finally, vkQueuePresentKHR will present the image to the display, but it will wait for the signaled semaphore from vkQueueSubmit before doing so.
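Sketching one iteration of that loop in code (semaphore and command buffer names are assumed, error handling omitted):

uint32_t imageIndex;
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                      imageAvailableSemaphore, VK_NULL_HANDLE, &imageIndex);

VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
VkSubmitInfo submitInfo = {
    .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &imageAvailableSemaphore,   // wait for the swap chain image
    .pWaitDstStageMask = &waitStage,
    .commandBufferCount = 1,
    .pCommandBuffers = &commandBuffers[imageIndex],
    .signalSemaphoreCount = 1,
    .pSignalSemaphores = &renderFinishedSemaphore, // signaled when rendering is done
};
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);

VkPresentInfoKHR presentInfo = {
    .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &renderFinishedSemaphore,   // present only after rendering completes
    .swapchainCount = 1,
    .pSwapchains = &swapchain,
    .pImageIndices = &imageIndex,
};
vkQueuePresentKHR(queue, &presentInfo);

// CPU threads are now free to start recording commands for the next frame.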

 

Summary

In this blog post I have given a brief overview of how to get overlap between CPU threads that record commands into command buffers over multiple frames. For our own internal implementation we found this really useful as it allowed us to start preparing work for the next frame very early on, ensuring the GPU is kept busy.


VR SDK v1.0 for Android launched at GDC’16!


Virtual reality is a hot topic for mobile devices. 2015 was the year of the rise of mobile VR, with Samsung and Oculus releasing the Gear VR headset for the Samsung Galaxy Note 4 and Note 5, as well as the Galaxy S6 smartphones. Around the same time, Google launched their Google Cardboard VR headsets and the trend grew with a myriad of other players releasing Head Mounted Displays (HMDs) for VR, from Asian companies such as Deepoon with their all-in-one headsets and VIRGLASS with their smartphone-tray headsets, to Western companies like Carl Zeiss and their VR One HMD.

 

Also during 2015 Unity, the games and graphics engine most used by developers, added native support for Samsung Gear VR as well as for other VR/AR hardware using third party plug-ins.

 

At ARM®, in order to help our partner ecosystem flourish in this area, we released our first VR SDK, a v0.1 alpha, during the summer of 2015. This March at GDC we launched v1.0 publicly to developers and our OEM and SiP partners.

 

Mali VR SDK.png

 

The ARM Mali™ VR SDK is based on Android and OpenGL ES at this stage and includes sample code and libraries for VR developers. It’s applicable to everyone from VR application developers to HMD designers who would like to achieve the lowest latency, highest performance and minimal battery consumption on any ARM Mali based mobile platform.

 

To get your environment set up you need the Android Studio SDK and the Android NDK. Developers can either render the samples on any Android device or calibrate them for the specific HMD they are designing. The samples cover everything from VR basics and the fundamentals of stereoscopy, to how best to use the Multiview extension and how to implement multisampling.

 

You can download the VR SDK from our ARM Mali Developer Center, and stand by for future updates with all the new VR extensions for Android that are on their way!

Efficient Mirroring from Samsung Gear VR


The Ice Cave demo is a Unity demo released by ARM® Ecosystem. With this demo we wanted to show that it is possible to achieve high quality visual content on current mobile devices powered by ARM Cortex® CPUs and ARM Mali™ GPUs. A number of highly optimized rendering effects were developed for this demo.

 

After the demo was released we decided to port it to Samsung Gear VR using the Unity native VR implementation. During the porting work we made several changes as not all of the features of the original demo were VR friendly. We also added a couple of new features, one of which was the ability to mirror the content from the Samsung Gear VR headset to a second device. We thought it would be interesting to show people at events what the actual user of the Samsung Gear VR headset was seeing in real time. The results exceeded even our expectations.

 

MirroringAtUnityAR-VR-Vision-Summit.jpg

Figure 1. Ice Cave VR mirroring from Samsung Gear VR at Unity AR/VR Vision Summit 2016.

 

At every event where we have shown the mirroring from the Ice Cave VR demo running on the Samsung Gear VR we have been asked how we achieved it. This short blog is the answer to that question.

 

I think the reason this technique raises so much interest is because we like to socialize our personal VR experience and at the same time, other people are simply curious about what the VR user is experiencing. The desire to share the experience works both ways. For developers it is also important to know, and therefore helpful to see, how users test and experience the game.

 

Available options

 

In 2014 Samsung publicly announced their AllShare Cast Dongle to mirror the content from their Samsung Gear VR. The dongle connects to any HDMI display and then mirrors the display of the smartphone onto the secondary screen in a similar way to Google Chromecast. Nevertheless, we wanted to use our own device and decided to test an idea we had heard worked for Triangular Pixels when Katie Goode (Creative Director) delivered a talk at ARM.

 

The idea was very simple: run a non-VR version of the application on a second device and send the required info via Wi-Fi to synchronize both applications. In our case we just needed to send the camera position and orientation.

 

SharedVRExperience.png

Figure 2. The basic idea of mirroring.

 

The implementation

 

A single script described below manages all the networking for both client and server. The server is the VR application running on the Samsung Gear VR headset, while the client is the non-VR version of the same application running on a second device.

 

The script is attached to the camera Game Object (GO) and a public variable isServer defines if the script works for the server or the client side when building your Unity project. A configuration file stores the network IP of the server. When the client application starts it reads the server’s IP address and waits for the server to establish a connection.

 

The code snippet below performs the basic operations to set up a network connection and reads the server’s IP network address and port in the function getInfoFromSettingsFile. Note that the client starts in a paused state (Time.timeScale = 0) as it will wait for the server to start before establishing a connection.

 

void Start()
{
    getInfoFromSettingsFile();

    ConnectionConfig config = new ConnectionConfig();
    commChannel = config.AddChannel(QosType.Reliable);
    started = true;

    NetworkTransport.Init();

    // Maximum default connections = 2
    HostTopology topology = new HostTopology(config, 2);

    if (isServer){
        hostId = NetworkTransport.AddHost(topology, port, null);
    }
    else{
        Time.timeScale = 0;
        hostId = NetworkTransport.AddHost(topology, 0);
    }
}

 

When the server application starts running it sends the camera position and orientation data for every frame through the network connection to be read by the client. This process takes place in the Update function as implemented below.

 

void Update ()
{
    if (!started){
        return;
    }

    int recHostId;
    int recConnectionId;
    int recChannelId;
    byte[] recBuffer = new byte[messageSize];
    int recBufferSize = messageSize;
    int recDataSize;
    byte error;
    NetworkEventType networkEvent;

    do {
        networkEvent = NetworkTransport.Receive(out recHostId, out recConnectionId,
                            out recChannelId, recBuffer, recBufferSize, out recDataSize, out error);
        switch(networkEvent)
        {
        case NetworkEventType.Nothing:
            break;
        case NetworkEventType.ConnectEvent:
            connected = true;
            connectionId = recConnectionId;
            Time.timeScale = 1; // Client connected; unpause app.
            if(!isServer){
                clientId = recHostId;
            }
            break;
        case NetworkEventType.DataEvent:
            rcvMssg(recBuffer);
            break;
        case NetworkEventType.DisconnectEvent:
            connected = false;
            if(!isServer){
                Time.timeScale = 0;
            }
            break;
        }
    } while(networkEvent != NetworkEventType.Nothing);

    if (connected && isServer){ // Server
        send();
    }
    if (!connected && !isServer){ // Client
        tryToConnect();
    }
}

 

In the Update function different types of network events are processed. As soon as the connection is established the client application changes its state from paused to running (Time.timeScale = 1). If a disconnection event takes place then the client is paused again. This will occur, for example, when the device is removed from the Samsung Gear VR headset or when the user just removes the headset and the device detects this and goes into pause mode.

 

The client application receives the data sent from the server in the NetworkEventType.DataEvent case. The function that reads the data is shown below:

 

void rcvMssg(byte[] data)
{
    var coordinates = new float[data.Length / 4];
    Buffer.BlockCopy(data, 0, coordinates, 0, data.Length);

    transform.position = new Vector3 (coordinates[0], coordinates[1], coordinates[2]);

    // To provide a smooth experience on the client, average the change
    // in rotation across the current and last frame
    Quaternion rotation = avgRotationOverFrames (new Quaternion(coordinates [3], coordinates [4],
                                                                coordinates [5], coordinates [6]));
    transform.rotation = rotation;
    lastFrame = rotation;
}

 

The interesting point here is that the client doesn’t directly use the data received relative to camera position and orientation. Instead the quaternion that describes the rotation of the camera in the current frame is interpolated with the quaternion of the previous frame to smooth camera rotations and avoid sudden changes if a frame is skipped. The function avgRotationOverFrames performs the quaternion interpolation.

 

Quaternion avgRotationOverFrames(Quaternion currentFrame)
{
    return Quaternion.Lerp(lastFrame, currentFrame, 0.5f);
}

 

As can be seen in the Update function, the server sends camera data over the network every frame. The implementation of the function send is shown below:

 

public void send()
{
    byte error;
    byte[] buffer = new byte[messageSize];
    buffer = createMssg();

    if (isServer){
        try{
            NetworkTransport.Send (hostId, connectionId, commChannel, buffer, buffer.Length, out error);
        }
        catch (Exception e){
            Debug.Log("I'm Server error: +++ see below +++");
            Debug.LogError(e);
            return;
        }
    }
}

 

The function createMssg prepares an array of seven floats; three floats from the camera position coordinates and four floats from the camera quaternion that describes the camera orientation.

 

byte[] createMssg()
{
    var coordinates = new float[] { transform.position.x, transform.position.y, transform.position.z,
                                    transform.rotation.x, transform.rotation.y, transform.rotation.z,
                                    transform.rotation.w};
    var data = new byte[coordinates.Length * 4];
    Buffer.BlockCopy(coordinates, 0, data, 0, data.Length);
    return data;
}

 

This script is attached to the camera for both server and client applications; for the server build the public variable isServer must be checked. Additionally, when building the client application the option Build Settings -> Player Settings -> Other Settings -> “Virtual Reality Supported” must be unchecked, as the client is a non-VR version of the application running on the Samsung Gear VR.

 

To keep the implementation as simple as possible the server IP address and port are stored in a config file on the client device. When setting up the mirroring system, the first step is to launch the client non-VR application. The client application reads the network data from the config file and enters into a paused state, waiting for the server to start to establish a connection.

 

Due to time constraints we didn’t devote much time to improving the mirroring implementation described in this blog. We would love to hear any feedback or suggestions for improvement that we can share with other developers.

 

The picture below shows the mirroring system we use to display what the actual user of the Samsung Gear VR is seeing. Using an HDMI adapter the video signal is output to a big flat panel display in order to share the Samsung Gear VR user experience with others.

mirroring_Roberto.JPG

Figure 3. Mirroring the Ice Cave VR running on the Samsung Gear VR. The VR server application runs on a Samsung  Galaxy S6 based on the Exynos 7 Octa 7420 SoC

(4x ARM Cortex-A57 + 4x Cortex-A53 and ARM Mali-T760 MP8 GPU). The non-VR client application runs on a Samsung Galaxy Note 4 based on the Exynos 7 Octa 5433 SoC

(4x ARM Cortex-A57 + 4x Cortex-A53 and ARM Mali-T760 MP6 GPU).

 

Conclusions

 

The Unity networking API allows an easy and straightforward implementation of mirroring a VR application running on the Samsung Gear VR to a second device running a non-VR version of the application. The fact that only the camera position and orientation data are sent every frame guarantees that no additional overhead is imposed on either device.

 

Depending on the application there could be more data to send/receive in order to synchronize the server and client worlds, but the principle remains the same: for every object that needs to sync, send its transform data and interpolate it.

 

The mirroring technique described in this blog will also work in a multiplayer game environment. The server/client roles could potentially be swapped depending on the type of mirroring setup: we could have several VR headsets and a single screen, several screens for a single VR headset, or even several of each. Either way, every device running on the Samsung Gear VR sends its sync data to one or more devices that share a view on a big screen panel. Each mirroring application has to instantiate every player connected to it, update the transforms of all synced objects following the same recipe, and display a single camera view. This could be the view from any of the players or any other suitable view. Sending additional data to keep the mirrored worlds synced shouldn't have a significant impact on performance as the amount of info that needs updating per object is really minimal.

Achieving High Quality Mobile VR Games


Introduction

 

Last month the game developer community celebrated its main event in San Francisco: the Game Developers Conference (GDC). The longest-running event devoted to the game industry set a new record in its 30th edition with more than 27,000 attendees. The expo hall was crowded until the very last minute and many talks were moved to bigger rooms to accommodate the demand.

 

In this blog I would like to provide a round-up of one of the ARM sponsored talks at GDC 2016: Achieving High Quality Mobile VR Games. I had the privilege of sharing the talk with two great colleagues: Carl Callewaert (Unity Technologies Americas Director & Global Leader of Evangelism) and Patrick O'Luanaigh (nDreams CEO).

TalkThreePresenter.jpg

Figure 1. Delivering the presentation at GDC 2016.

The talk was devoted to mobile VR but each of the speakers presented different aspects of the topic. I spoke from the perspective of developers and shared our experience of porting the Ice Cave demo to Samsung Gear VR. I talked about some highly optimized rendering techniques, based on local cubemaps, that we used in the demo to achieve high quality VR content, and discussed the importance of rendering stereo reflections, showing how to implement them in Unity.

 

Carl talked from the perspective of the game platform which is used by more than half of developers all over the world. He shared with the audience the latest news about VR integration in Unity and discussed very interesting ideas about how to use well established architectural design principles to build VR gaming environments that create a sense of presence and immersion. To the delight of the attendees Carl showed the first part of the real-time rendered short film Adam, an impressive photorealistic demo that highlights Unity’s rendering capabilities.

 

Finally, Patrick presented from the perspective of the game studio that has already successfully released several VR games. As part of the development process nDreams has extensively researched movement in VR. In his talk Patrick shared some of their most interesting findings as part of their commitment to delivering the best user experience in their VR game catalogue.

 

The concept of local cubemaps

 

The content I delivered during the first part of the session was devoted mainly to describing several rendering techniques, based on local cubemaps, that we used in the Ice Cave demo. For those who are not very familiar with the concept of local cubemaps I explain it briefly below.

 

Let’s assume we have a local environment delimited by an arbitrary boundary and we have baked the surrounding environment in a cubemap from a given position inside of the local environment. We are looking at some star in the boundary in the direction defined by vector V and we want to answer the question: what is the vector we need to use to retrieve the star from the cubemap texture?

TheConceptOfLocalCubemap.png

Figure 2. The concept of local cubemap.

 

If we use the same vector V instead of the star we will get the happy face, as shown in the left picture of Figure 2. What then is the vector we need to use? As we can see from the middle picture, we need to use a vector from the cubemap position to the intersection point of the view vector with the boundaries. We can solve this type of problem only if we assume some simplifications.

 

We introduce a proxy geometry to simplify the problem of finding the intersection point P as shown in the picture on the right. The simplest proxy geometry is a box, the bounding box of the scene. We find the intersection point P and we build a new vector from the position the cubemap was baked to the intersection point and we use this new “local corrected vector” to fetch the texture from the cubemap. The lesson here is that for every vector we use to retrieve whatever we bake in the local cubemap, we need to apply the local correction.
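To make the local correction concrete, here is a small CPU-side sketch of the same math (in practice this lives in the fragment shader; the function and parameter names are illustrative, not the demo's actual code):

#include <algorithm>
#include <cmath>

// Correct a direction vector so it can be used to fetch from a local cubemap
// baked at cubemapPos inside the bounding box [boxMin, boxMax].
void localCorrect(const float fragPos[3], const float dir[3],
                  const float boxMin[3], const float boxMax[3],
                  const float cubemapPos[3], float corrected[3])
{
    // Distance along dir to the point where the ray leaves the bounding box.
    float dist = INFINITY;
    for (int i = 0; i < 3; ++i)
    {
        float a = (boxMax[i] - fragPos[i]) / dir[i];
        float b = (boxMin[i] - fragPos[i]) / dir[i];
        dist = std::min(dist, std::max(a, b)); // nearest of the exit planes
    }

    // New vector from the cubemap capture position to the intersection point P.
    for (int i = 0; i < 3; ++i)
        corrected[i] = (fragPos[i] + dir[i] * dist) - cubemapPos[i];
}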

 

Improving VR quality & performance

 

Developing games for mobile devices is challenging as we need to very carefully balance runtime resources. Mobile VR is even more challenging as we have to deal with the added complexity of stereo rendering and the strict requirements for FPS performance to achieve a successful user experience.

 

Several highly efficient rendering techniques based on local cubemaps used in the Ice Cave demo have proved very suitable for VR as well.

 

Dynamic Soft Shadows based on local cubemaps

 

As we know, runtime shadows in mobile devices are expensive; in mobile VR they are a performance killer. The new shadow rendering technique based on local cubemaps developed at ARM contributes to saving runtime resources in mobile VR while providing high quality shadows. The implementation details of this technique can be found in several publications 1, 2, 3.

DynamicSoftShadows.png

Figure 3. Dynamic soft shadows based on local cubemaps.

 

The main idea of this technique is to render the transparency of the local environment boundaries to the alpha channel of a static cubemap off-line. Then at runtime in the shader we use the fragment-to-light vector to fetch the texel from the cubemap and determine if the fragment is lit or shadowed.  As we are dealing with a local cubemap the local correction has to be applied to the fragment-to-light vector before the fetching operation. The fact that we use the same texture every frame guarantees high quality shadows with no pixel shimmering or instabilities which are present with other shadow rendering techniques.

 

Dynamic soft shadows based on local cubemaps can be used effectively with other runtime shadows techniques to combine shadows from static and dynamic geometry. Another important feature of this technique is the fact that it can efficiently reproduce the softness of the shadows, i.e. the fact that shadows are softer the further away they are from the object that creates them.

 

CombinedShadows.png

Figure 4. Combined shadows in the Ice Cave demo.

 

Reflections based on local cubemaps

 

The local cubemap technique can also be used to render very efficient and high quality reflections. When using this technique the local environment is rendered off-line into the RGB channels of the cubemap. Then at runtime in the fragment shader we fetch the texel from the cubemap in the direction of the reflection vector. Again though, as we are dealing with a local cubemap, we first need to apply the local correction to the reflection vector, i.e. build a new vector from the position where the cubemap was generated to the intersection point P (Figure 5). We finally use the new vector R’ to fetch the texel from the cubemap.

ReflectionsBasedOnLocalCubemaps.png

Figure 5. Reflections based on local cubemaps.

 

The implementation details of this technique can be found in previous blogs 3, 4, 5. This technique can also be combined with other runtime reflection techniques to integrate reflections from static and dynamic geometry 3, 6.

 

IceCaveCombinedReflections.png

Figure 6. Combined reflections in the Ice Cave demo.

 

Stereo reflections in VR

 

Stereo reflections are important in VR because if reflections are not stereo, i.e. we use the same texture for both eyes, then the user will easily notice that something is wrong in the virtual world. This will break the sense of full immersion, negatively impacting the VR user experience.

 

For planar reflections, rendered at runtime, that use the mirrored camera technique 6, we need to apply a mirror transformation to the main camera view matrix. We also need a half eye separation shift in the x axis to find the left/right position where reflections must be rendered from. The mirrored camera(s) alternately renders left/right reflections to a single texture that is used in the shader by the left/right eye of the main camera to apply the reflections to the reflective object.

 

At this point we must achieve a complete synchronization between the rendering of left/right reflection camera(s) and the left/right main camera. The picture below, taken from the device, shows how the left and right eyes of the main camera are using different colors in the reflection texture applied to the platform in the Ice Cave demo.

StereoReflCheckTwoColors.png

Figure 7. Left/right stereo reflection synchronization.

If we are dealing with reflections based on local cubemaps then we need to use two slightly different reflection vectors to fetch the texel from the cubemap. For this we need to find (if it is not provided) the left/right main camera position and build the left/right view vector used to find the reflection vector in the shader. Both vectors must be “locally corrected” before fetching the reflection texture from the cubemap.

 

A detailed implementation of stereo reflections in Unity can be found in a blog published recently 6.

 

Latest Unity improvements

 

During his presentation Carl pointed to the latest Unity effort in VR – the new VR editor that allows the building of VR environments directly from within an HMD. At GDC 2016 we saw a live demo that showed the progress of this tool in a presentation from Timoni West (Unity Principal Designer).

 

The Adam demo Carl displayed to attendees was also a nice proof-point for how much Unity has advanced in terms of real-time rendering capabilities. The picture below gives some idea of this.

AdamDemo.png

Figure 8. A photogram from the first part of the Unity real-time rendered short film “Adam”.

 

Carl also went through some highlights of a presentation he had delivered the day before about how to create a sense of presence in VR. I found his ideas about the importance of creating depth perception when designing VR environments really interesting. Greeks and Romans knew very well how important it is to correctly manage perspective, light, shadows and shapes to create the right sense of presence that invites you to walk around and understand the space.

 

Movement in VR

 

The last part of the talk was devoted to movement in VR. Patrick’s talk attracted much attention from attendees and prompted a lot of questions at the end. Movement in VR is an important topic as it directly influences the quality of the VR experience. The nDreams development team performed extensive research into different types of movement in VR and their impact on several groups of users. The figures Patrick presented about the results of this research were a valuable takeaway for attendees.

 

According to Patrick, mobile VR control will move towards controllers, tracked controllers and hand tracking, allowing more detailed input.

Initial nDreams tests confirmed some basic facts:

  • Movement needs to be as realistic as possible. When moving, aim to keep the speed to around 1.5 m/s as opposed to, for example,  Call of Duty where the player often moves at 7 m/s. Keep any strafing to a minimum, and keep the strafe speed as low as possible.
  • Don’t take control of the camera away from the player i.e. camera shakes, cutscenes etc.
  • Ensure there is no perceived acceleration. A tiny negligible acceleration in movement for example can take the edge off starting and stopping, but acceleration over any period of time is incredibly uncomfortable.

 

nDreamsTheBasics.png

Figure 9. Some nDreams basic findings.

 

In terms of translation movement nDreams researched two main modalities: instant teleport and blink. Blink is a kind of fast teleport where your move is completed within 120 ms. This movement time is so short that there is no time to experience any sickness, but the user still has a sense of motion and a tunnel effect. Teleport is seen as more precise due to the additional directional reticule, whereas blink feels more immersive.

 

The rotation study included trigger and snap modalities. Trigger rotations use the shoulder buttons of the controller to rotate in 45 degree steps to the left or right each time, while snap rotations use the joystick buttons instead. Rotation-wise, participants mostly preferred triggers; however, the consumers who understood snap effectively preferred its flexibility.

 

Some figures about results of movement and rotation research are shown below.

 

MovementPreferences.png

Figure 10. Some figures from the nDreams’ movement and rotation research.

 

The list below summarizes some of the most important findings delivered by Patrick O'Luanaigh.

 

  • Movement needs to be as realistic as possible. Ideally keep your speed to around 1.5 m/s.
  • Do not take control of the camera away from the player.
  • Ensure there is no perceived acceleration.
  • A lower moving and strafing speed is much more comfortable than a faster one. A high rotation speed is seen as more comfortable, since the rotation normally finishes before you start to feel motion sick.
  • The best solution for rotation is to turn with your body. Alternative controls, such as snap rotations, encourage players to move their body to look around.
  • Rotation-wise, participants mostly preferred triggers; however the consumers who understood snap effectively preferred its flexibility.
  • Fast teleport (blink) at 100 m/s is sickness free and more immersive than simple teleport. Instant teleport is seen as more precise due to the additional directional reticule.
  • Remove movement and rotation altogether.

Figure 11. Summary of  findings from nDreams’ research about movement in VR.

 

VR is just taking its first real steps and there is a lot still to explore and learn. This is the reason Patrick concluded his presentation with a recommendation I really liked: Test everything! What works for your game may be different from someone else’s.

 

Conclusions

 

The talk Achieving High Quality Mobile VR Games at GDC 2016 had a great turnout and lots of questions were discussed at the end. After the talk we had many people come to the ARM booth to find out more about the Ice Cave demo and the rendering techniques based on local cubemaps discussed in the talk. What GDC 2016 showed above all was the great uptake VR is experiencing and the increasing interest of the development community and game studios in this exciting technology.

 

Finally, I would like to thank Carl Callewaert and Patrick O'Luanaigh for their great contributions to the presentation.

 

References

  1. Efficient Soft Shadows Based on Static Local Cubemap. Sylwester Bala and Roberto Lopez Mendez, GPU Pro 7, 2016.
  2. Dynamic Soft Shadows Based on Local Cubemap. Sylwester Bala, ARM Connected Community.
  3. ARM Guide for Unity Developers, Mali Developer Center.
  4. Reflections Based on Local Cubemaps in Unity. Roberto Lopez Mendez, ARM Connected Community.
  5. The Power of Local Cubemaps at Unite APAC and the Taoyuan Effect. Roberto Lopez Mendez, ARM Connected Community.
  6. Combined Reflections: Stereo Reflections in VR. Roberto Lopez Mendez, ARM Connected Community.
  7. Travelling Without Moving - Controlling Movement in Virtual Reality. Patrick O'Luanaigh, Presentation delivered at VRTGO, Newcastle, 2015.

Using Mali Graphics Debugger on a Non-rooted device


Traditionally, Mali Graphics Debugger (MGD) works on a rooted device. In this mode an interceptor layer sits between your application and the driver: your application calls into the interceptor layer, which sends a copy of the data back to the MGD host application and passes the call on to the driver.

 

RootedNonRootedImage2.png

However, this isn't the only way that MGD can be used. A second option offers all of the functionality of the first, with the added benefit that it also works on a standard, non-rooted Android device with no modification. The trade-off is that you need access to the full source code of the application you want to profile. This blog explores the second option so you can debug your applications on non-rooted devices.

 

Prerequisites

 

  1. Your computer should be set up for Android development; in particular:
    • The Android SDK and NDK should be installed.
    • Your system path should include the adb binary.
  2. You should have access to the full source code of your application.
  3. Your device must be running at least Android 4.2.

 

Installation

 

  1. Copy the folder called android-non-root from the target directory in your MGD installation to your application's root folder.

  2. In your target application's Android.mk add the following code.

 

 include $(LOCAL_PATH)/../android-non-root/MGD.mk

 

  3. In your project's main activity class, add the following:

 

// Note: this block uses android.util.Log, so make sure it is imported at the top of the file.
static
{
    try
    {
        System.loadLibrary("MGD");
    }
    catch( UnsatisfiedLinkError e )
    {
        // Feel free to remove this log message.
        Log.i("[ MGD ]", "libMGD.so not loaded.");
    }
}

 

  4. Recompile the application and install it on your Android device.

 

Running your application

 

The first thing we need to do is to install the MGDDaemon application on the target. The MGDDaemon application is responsible for sending the information from the interceptor library to the host. Without it the host won't receive any data.

 

  • cd into the android-non-root directory and run adb install -r MGDDaemon.apk
  • Then, from a command prompt, run adb forward tcp:5002 tcp:5002
  • Then launch the MGD daemon application on the device. This takes you to a list of applications it has detected with the MGD interceptor library correctly included. Before you tap on one, you need to set the Mali Graphics Debugger daemon switch to on.

 

Screenshot_2016-04-18-14-12-01.png

  • Once the switch is on, you should be able to connect to the process from the MGD host, and a new tab for your current trace should be created. At this point you just need to tap on your application in the MGD daemon application and the trace should begin.

 

Following these steps, you should be able to use MGD on any Mali-based platform. If you have any issues, please raise them on the community and someone will be more than happy to assist you through the process.

The Sensible Six (Optimization Techniques)


It's not often I get flown halfway round the world in order to explain common sense, but this March it happened as I was delivered to GDC in San Francisco to talk about best practices in mobile graphics. Unlike previous talks, where I wax lyrical about the minutiae of a specific optimization technique, this time I had to cover a wide range of things in just twenty minutes. Paired as I was with a similarly compressed talk from Stephen Barton about using our DS-5 and MGD tools to analyse graphical applications for bottlenecks, it was a study in time management. One of the highlights of his talk was the latest MGD update, which you can read more about in his recent blog post. Pity our poor audience who, having had insufficient time to learn how to find their performance bottlenecks, were now going to be subjected to my having insufficient time to tell them how to fix them.

 

We're making the slides available for this presentation (my section starts on slide 29) but, unlike previous presentations, no video was taken, so some of the pages may need a little explanation here. Whereas usually I'd have time to look at an app and point out specific changes, the people watching obviously wanted to know what they could do with their own software. I therefore had to talk about the most common places where people leave room for improvement: Batching, Overdraw, Culling, Levels of Detail, Compression and Antialiasing.

 

Batching is a topic I have been outspoken about many times in the past and I really just gave some simple solutions here, such as combining static geometry into a single mesh. Though lip service was paid to dynamic batching and instancing, that topic is explained far better in my older post Game Set & Batch.
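To make the static-combining idea concrete, here is a minimal sketch (not code from the talk; the class and method names are hypothetical) that bakes the world-space positions of several static meshes into one vertex buffer so the whole group can be drawn with a single call:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.List;
import android.opengl.GLES20;

// Hypothetical helper: merges several static, world-space meshes into one
// vertex buffer so the whole group costs a single draw call per frame.
public class StaticBatch {
    private int vbo;
    private int vertexCount;

    // Each float[] holds xyz positions already transformed into world space.
    public void build(List<float[]> worldSpaceMeshes) {
        int totalFloats = 0;
        for (float[] mesh : worldSpaceMeshes) totalFloats += mesh.length;

        FloatBuffer combined = ByteBuffer.allocateDirect(totalFloats * 4)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        for (float[] mesh : worldSpaceMeshes) combined.put(mesh);
        combined.position(0);
        vertexCount = totalFloats / 3;

        int[] ids = new int[1];
        GLES20.glGenBuffers(1, ids, 0);
        vbo = ids[0];
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vbo);
        GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, totalFloats * 4,
                combined, GLES20.GL_STATIC_DRAW);
    }

    // One draw call for everything that was batched.
    public void draw(int positionAttribLocation) {
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vbo);
        GLES20.glEnableVertexAttribArray(positionAttribLocation);
        GLES20.glVertexAttribPointer(positionAttribLocation, 3,
                GLES20.GL_FLOAT, false, 0, 0);
        GLES20.glDrawArrays(GLES20.GL_TRIANGLES, 0, vertexCount);
    }
}

Anything that never moves relative to its neighbours is a candidate for this kind of merge; dynamic objects are better served by the instancing and dynamic batching approaches covered in the older post.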

Although I've spoken about overdraw in the context of batching before, not much has been said about scene-related overdraw beyond a somewhat flippant "front to back, y'all" before moving on to a batching solution. People often think of overdraw in terms of sorting objects in a scene by their distance from the camera. One case lots of people worry about, however, is what to do when objects overlap or surround each other in some way, because then a simple distance sort can't be trusted. In that situation there's an even easier solution: if you know one thing will always cover another, you can make a special ordering case for it in the code. There are a number of very common savings to be made on a full-screen scale. If the camera is inside a room, anything else inside the room can be rendered before the room itself, as it will always occlude parts of the walls and the floor. The same goes for rendering things before the ground in outdoor scenes.

This is mostly about efficient scene management, but even when you don't know beforehand what order something will be drawn in, you can make changes to reduce the impact of overdraw. If you have two pieces of geometry in the scene which use different shaders and, for whatever reason, it's hard to tell which should be drawn first, draw the one with the least expensive shader first. That way, if the cheaper geometry ends up overdrawn it wastes less work, and any occluded pixels from the expensive shader save more.
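As a minimal sketch of that ordering (my illustration rather than code from the talk; the DrawItem fields are hypothetical), the snippet below sorts opaque draws front to back by distance from the camera and, when two draws sit at the same depth, puts the cheaper shader first:

import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-draw record: squared distance to the camera and a rough
// relative cost for the shader it uses (higher = more expensive).
class DrawItem {
    float distanceSq;
    int shaderCost;
}

class OpaqueSorter {
    // Front to back so hidden-surface removal can reject occluded fragments;
    // for draws at equal depth, the cheaper shader goes first so any overdraw
    // wastes as little work as possible.
    static void sortOpaque(List<DrawItem> items) {
        Collections.sort(items, new Comparator<DrawItem>() {
            @Override
            public int compare(DrawItem a, DrawItem b) {
                int byDepth = Float.compare(a.distanceSq, b.distanceSq); // nearer first
                if (byDepth != 0) return byDepth;
                return Integer.compare(a.shaderCost, b.shaderCost);      // cheaper first
            }
        });
    }
}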

 

On a similar topic of not calculating that which is unseen, I then spoke about culling, and the kind of large-scale culling that is possible on the CPU side to reduce vertex calculations. This is achieved by reducing an object down to a bounding box, defined by eight points that can then be transformed to become a bounding rectangle on the screen. This rectangle can be very quickly checked to see if it is on or off screen, or even whether it's inside the bounds of a window or doorway through which we are seeing its scene. For most scenes this is the only kind of high-level, large-scale occlusion culling that makes sense, because the next step would be to consider whether objects in a scene occlude each other. For that you need to think about an internal bounding volume which is guaranteed to occlude everything behind it regardless of its orientation, and which must be generated to fit the geometry. That is far more complicated than describing the bounding box.
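A minimal sketch of that screen-rectangle test (my own, assuming a column-major model-view-projection matrix and a hypothetical caller that supplies the eight world-space corners) might look like this:

import android.opengl.Matrix;

class BoundsCuller {
    // Returns true if the eight world-space corners of a bounding box project
    // to a rectangle that is completely outside the screen, so the object can
    // be skipped. mvp is a 4x4 column-major model-view-projection matrix.
    static boolean isOffScreen(float[][] corners, float[] mvp) {
        float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
        float[] in = new float[4];
        float[] out = new float[4];

        for (float[] c : corners) {
            in[0] = c[0]; in[1] = c[1]; in[2] = c[2]; in[3] = 1.0f;
            Matrix.multiplyMV(out, 0, mvp, 0, in, 0);
            if (out[3] <= 0.0f) {
                // Corner is behind the camera; be conservative and keep the object.
                return false;
            }
            float x = out[0] / out[3];   // perspective divide to NDC
            float y = out[1] / out[3];
            minX = Math.min(minX, x); maxX = Math.max(maxX, x);
            minY = Math.min(minY, y); maxY = Math.max(maxY, y);
        }
        // The visible NDC range is [-1, 1] on both axes.
        return maxX < -1.0f || minX > 1.0f || maxY < -1.0f || minY > 1.0f;
    }
}

A corner behind the camera makes the projected rectangle unreliable, so the sketch simply keeps the object in that case; a production version would clip against the near plane instead.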

 

Culling things in the distance is considered somewhat old hat in modern applications. We associate the sudden appearance of distant objects, or emergence from a dense, opaque fog, with an undesirably retro aesthetic. In their place we have two newer techniques. The fog is replaced by clever environmental design, limiting view distance by means of occluding walls and available eye-lines. For large, open spaces, having objects pop into reality has been replaced by dynamic levels of detail.

The funny thing about levels of detail is that they don't have to be dynamic to be relevant. Levels of detail go beyond reducing unnecessary vertex processing: there's a small amount of overhead to process each triangle. This triangle setup cost is very small, so ordinarily you never notice it, as it happens while the previous triangle is being turned into fragments, but if the fragment coverage of your triangles is too low, you can actually see this cost bumping up your render times. Before you even worry about implementing dynamic levels of detail, you ought to ask yourself if you've picked the right level of detail to begin with. If the average triangle coverage (which can be calculated in Streamline) is in single digits, you're probably doing something wrong. We see this all the time in projects where the artist has designed beautiful tree models which are then lined up in the distance where none of that detail can be seen. If they're approachable then maybe a high-detail model is useful, switched in based on proximity, but if you just want a bunch of things in the background you may be better off with batched billboard sprites.
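As a simple illustration of proximity-based switching (my own sketch; the thresholds and level names are made up, not recommended values), the selection itself can be as small as this:

// Hypothetical LOD picker: the thresholds and the idea of falling back to a
// batched billboard beyond a certain range are illustrative, not tuned values.
class LodSelector {
    enum Lod { HIGH_DETAIL, LOW_DETAIL, BILLBOARD }

    static Lod select(float distanceToCamera) {
        if (distanceToCamera < 20.0f) return Lod.HIGH_DETAIL;
        if (distanceToCamera < 80.0f) return Lod.LOW_DETAIL;
        return Lod.BILLBOARD;   // far enough that triangle coverage would be tiny
    }
}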

 

Having already talked about texture compression many times in the past, there's a lot of material on the specifics available from my previous presentations. This time, to take it in a different direction, I talked about how uncompressed textures have their pixels re-ordered in memory to give them better caching behaviour. This is similar to the block layout seen in compressed textures, but without the bandwidth saving when the cache misses and a block needs to be pulled from memory. That explains the block layout I've advocated many times in the past, and I went on to talk about the other rules that make texture compression a special case among image compression algorithms: mainly the ability to immediately look up any block in the image (random access), decode it without any data from the surrounding blocks (deterministic), and do so with no need for a dictionary of symbols (immediate).
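The random-access rule is easy to show with a little arithmetic. Assuming a format that stores each 4x4-texel block in 8 bytes, as ETC2 RGB8 does, and blocks laid out in raster order (an assumption for this sketch), the block containing any texel can be located directly:

class BlockAddress {
    // Byte offset of the compressed block containing texel (x, y), assuming
    // 4x4-texel blocks of 8 bytes each in raster order (an ETC2 RGB8-style layout).
    static int blockOffset(int x, int y, int textureWidth) {
        final int blockDim = 4;
        final int bytesPerBlock = 8;
        int blocksPerRow = (textureWidth + blockDim - 1) / blockDim;
        int blockIndex = (y / blockDim) * blocksPerRow + (x / blockDim);
        return blockIndex * bytesPerBlock;
    }
}

No other block has to be read and no symbol table has to be consulted, which is exactly what lets the hardware decode texels on demand.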

 

One topic I was surprised to realize I'd never mentioned before was how texture compression works with mipmapping. Mipmapping is the technique of storing images at half, quarter, eighth (and so on) resolutions to reduce interference patterns and speed up texture loads. It's like automatic level-of-detail selection for textures. What people might not realise, however, is that whereas uncompressed texture mipmaps can be generated at load time with a single line of OpenGL ES code, mipmaps for compressed textures have to be generated ahead of time, then compressed and stored within the application's assets. It's a small price to pay for all that tasty, tasty efficiency however.
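For reference, the single-line path for uncompressed textures is glGenerateMipmap; compressed textures instead need each pre-built level uploaded explicitly. A rough sketch of both paths, with the format enum and level data assumed to come from your asset pipeline:

import java.nio.ByteBuffer;
import android.opengl.GLES20;

class MipmapUpload {
    // Uncompressed path: the driver builds the mip chain for the texture
    // currently bound to GL_TEXTURE_2D.
    static void generateUncompressedMips() {
        GLES20.glGenerateMipmap(GLES20.GL_TEXTURE_2D);
    }

    // Compressed path: every level was compressed offline and shipped in the
    // assets, so each one is uploaded explicitly. 'format' is the compressed
    // internal format (for example an ETC2 or ASTC enum) and each buffer is
    // assumed to hold exactly one level's worth of data.
    static void uploadCompressedMips(int format, int[] widths, int[] heights,
                                     ByteBuffer[] levelData) {
        for (int level = 0; level < levelData.length; level++) {
            GLES20.glCompressedTexImage2D(GLES20.GL_TEXTURE_2D, level, format,
                    widths[level], heights[level], 0,
                    levelData[level].capacity(), levelData[level]);
        }
    }
}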

 

Finally I brought up antialiasing, because I figured room for improvement needn't necessarily mean overhead reduction. Though I failed to bring it together due to time constraints on the day, the real message I wanted to impart in this talk was that optimization has become a dirty word in many ways. To suggest an application needs optimizing implies you've used everything the GPU's got, and that to make it run at a decent frame rate you'll have to make it look worse. That's not what optimization is. The metaphor I used was that if you don't optimize your application, it's like taking a single bite of an apple, throwing the rest away and complaining that it didn't have enough flesh. Well-optimized code can munch away at that apple, getting the absolute most out of it, and done right, optimization doesn't make your application look worse, it gives you headroom to make it look better. Batching and culling let you put more stuff in your application; with levels of detail and billboard impostors you can even have dense arrays of objects in the background. Compressed textures let you have more textures at a higher resolution, and full-screen antialiasing is almost zero cost on Mali-based systems.
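As a sketch of how you might request that multisampling when choosing an EGL config (my own example, with error handling omitted and the attribute values purely illustrative), asking for a 4x multisampled OpenGL ES 2.0 surface looks like this:

import android.opengl.EGL14;
import android.opengl.EGLConfig;
import android.opengl.EGLDisplay;

class MsaaConfigChooser {
    // Ask EGL for an OpenGL ES 2.0 config with 4x multisampling. Returns the
    // first matching config, or null if the device offers none.
    static EGLConfig choose4xMsaa(EGLDisplay display) {
        int[] attribs = {
                EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
                EGL14.EGL_RED_SIZE, 8,
                EGL14.EGL_GREEN_SIZE, 8,
                EGL14.EGL_BLUE_SIZE, 8,
                EGL14.EGL_SAMPLE_BUFFERS, 1,   // request a multisampled surface
                EGL14.EGL_SAMPLES, 4,          // 4x MSAA
                EGL14.EGL_NONE
        };
        EGLConfig[] configs = new EGLConfig[1];
        int[] numConfigs = new int[1];
        EGL14.eglChooseConfig(display, attribs, 0, configs, 0,
                configs.length, numConfigs, 0);
        return numConfigs[0] > 0 ? configs[0] : null;
    }
}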

 

That's the real message here.

 

That's not where the presentation ends, though. Part of my job involves taking apart people's graphics at the API level and listing all the things they've done wrong, or at least things they could do better. When they read the laundry list of sins however they very much interpret it as me telling them what they did wrong. So imagine my elation when given a chance to redress the balance and pick apart one of our own demos, in public, and discuss our own mistakes and faults. We're human too, you know.

 

Though difficult to describe in blog format, the screenshot slides at the end of the presentation show me stepping through the demo's render process, explaining where bad decisions were made regarding render order, where batchable objects were drawn individually, how practically nothing was culled, and even a few glitches, such as the skybox covering the particle effects and the UI being rendered on screen even when its opacity is zero. It's almost a shame that after identifying them we had to fix all these things; it would have been nice for the audience to know they were there.

 

If you're interested in using MGD and DS-5 to profile your applications, there's a two-part in-depth case study by Lorenzo Dal Col with far more detail than I could fit in my presentation:

Mali GPU Tools: A Case Study, Part 1 — Profiling Epic Citadel

Mali GPU Tools: A Case Study, Part 2 — Frame Analysis with Mali Graphics Debugger
