
ARM & Nibiru Joint Innovation Lab set to Streamline Game Development


At GDC 2016 ARM® and Nibiru, a key ecosystem partner, announced the exciting launch of the Joint Innovation Lab. Designed to give developers the best possible support when developing mobile games, the innovation lab promises to streamline and simplify the process of porting mobile games to Nibiru’s ARM-based platforms.

As VR is such a focal point for the mobile gaming industry it’s also a key focus for Nibiru. Currently offering over 40 different all-in-one VR devices designed to work with all levels of content, Nibiru are a thought leader in the standalone VR space.


One of the exciting upcoming VR releases from game studio Mad Rock, enabled by Nibiru, is X-Planet, a first-person shooter specially designed for their ARM Mali powered VR headsets. The game concept is familiar yet engaging: far, far away there is a planet called X-Planet, and you, the pilot, are charged with defending it against unknown adversaries. What's really cool is the use of eye-tracking software to interact with the game, using gaze-based targeting to pilot an armed cockpit through intense battles. Your enemies become more powerful as you progress, demanding you defeat progressively harder waves of robot enemies attacking you from all sides!

 

The awesome soundtrack demands headphones for a fully immersive experience and the game can be fully enjoyed while seated to reduce the chance of over-excited users stepping on the cat mid-battle.


X-Planet from Mad Rock

 

X-Planet is due to launch across all of Nibiru's high performance platforms, including VR Launcher and VR AIO HMD, aiming to provide the ultimate VR gaming experience. It's well known that VR places high demands on processors, and Nibiru chose ARM Mali GPUs in order to get the best possible performance with the lowest possible power cost. The ARM & Nibiru Joint Innovation Lab can help take VR gaming to the next level.

Nibiru launching the Joint Innovation Lab at the ARM Lecture Theatre at GDC 2016


Moving to Vulkan: How to Make Your 3D Graphics More Explicit


Interested in Vulkan? Look out for the next Vulkan meet up which is being held on the ARM premises in Cambridge on May 26th.

 

This will be the third Vulkan developer event and it will take a deeper than ever dive into programming 3D graphics using the Vulkan API.

Join the meetup to register, and while you're there you can also join the Khronos UK Chapter to hear about future news and events: http://www.meetup.com/khronos-uk-chapter/events/230192693/

The full agenda is also available at the link above. The event coincides with the Cambridge Beer Festival, to which ARM are providing free transport for "networking" purposes.


Further information:

In this full day of technical sessions we aim to provide 3D developers like yourself with everything you need to come up to speed on Vulkan, forge ahead and explore how to use Vulkan in your engine or application.

Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs. Khronos launched the Vulkan 1.0 specification on February 16th, 2016 and Khronos members released Vulkan drivers and SDKs on the same day. More info: Khronos.org/Vulkan

Prior Knowledge:

The sessions are aimed at 3D graphics developers who have hands-on experience of programming with APIs such as OpenGL, OpenGL ES, Direct3D and Metal.

Building a Unity Application with Mali Graphics Debugger Support


In the blog Using Mali Graphics Debugger on a Non-rooted device we discussed the idea that you could use Mali Graphics Debugger (MGD) with a non-rooted phone. This blog takes the idea further by showing you how to use MGD with a Unity application on a non-rooted device. Although this can be more complicated than with a standard application, the same principles are used as in the previous guide:

 

  1. Add the interceptor library to your build system.
  2. Edit your activity to load the interceptor library.
  3. Install the MGDDaemon application on your device.

 

Let's explore these steps in detail and how to execute them in Unity. For this guide it is assumed that you have an Android application already created in Unity.

 

The first thing you need to do is create an Assets\Plugins\Android folder in your project. Then you need to copy the libMGD.so file into it. The libMGD.so file can be found in the target\android-non-root\arm\[armeabi-v7a or arm64-v8a] folder in your MGD installation directory. This will make sure that the interceptor library gets packaged into your application.

 

Now, the standard activity that is used by Unity when making Android applications won't load the MGD interceptor library by default, so we need to make our own. This is done via Eclipse or the command line, outside of the Unity environment. Here is a template of the code you will need:

 

package test.application;
import com.unity3d.player.UnityPlayerActivity;
import android.os.Bundle;
import android.util.Log;

public class StandardActivity extends UnityPlayerActivity
{
    protected void onCreate(Bundle savedInstanceState)
    {
        try
        {
            System.loadLibrary("MGD");
        }
        catch (UnsatisfiedLinkError e)
        {
            Log.i("[ MGD ]", "libMGD.so not loaded.");
        }
        super.onCreate(savedInstanceState);
    }
}

 

Note that whatever your package is, you must make sure that your directory structure matches. So if you have a package of com.mycompany.myapplication, then your StandardActivity.java should be located in the directory structure com\mycompany\myapplication. In the case above you should store StandardActivity.java in the test\application\ directory.

 


 

As you need some functions that come directly from Android, you need to add the android.jar on your system to the classpath. It is usually located in platforms\android-<X>\, where X is the Android SDK version you are targeting. Also, as you are extending the UnityPlayerActivity class, you need to add the Unity classes.jar file, which is located in your Unity folder under the path Editor\Data\PlaybackEngines\AndroidPlayer\Variations\mono\Development\Classes. Finally, if you are using a JDK greater than 1.6 you need to add -source 1.6 and -target 1.6 to your compile line or Unity won't be able to use the class correctly.

 

So your full line to compile your Java file should look something like this:

 

C:\scratch>javac -cp "C:\Program Files\Unity\Editor\Data\PlaybackEngines\AndroidPlayer\Variations\mono\Development\Classes\classes.jar;C:\android\sdk\platforms\android-21\android.jar" -source 1.6
-target 1.6 test\application\StandardActivity.java

 

or if you are using a Mac

 

javac -cp "/Users/exampleUser/android-sdk-macosx/platforms/android-23/android.jar:/Applications/Unity/PlaybackEngines/AndroidPlayer/Variations/mono/Release/Classes/classes.jar" -source 1.6
-target 1.6 test/application/StandardActivity.java

 

We then need to turn this class into a jar file so we can include it in our Unity project. To do that we run:

 

jar cvf myActivity.jar test\application\StandardActivity.class

 

Place the created jar file in your project's Assets\Plugins\Android folder that you created in the first step.

 

Now, just because we have created a new activity class doesn't mean that Unity is going to use it. For this to happen we also need to override the Android manifest file that Unity uses. If you create an AndroidManifest.xml file in your Assets\Plugins\Android folder, Unity will automatically use it instead of the default one that is provided. The minimum recommended content for this file is:

 

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <application android:icon="@drawable/app_icon" android:label="@string/app_name">
    <activity android:name="StandardActivity"
              android:label="@string/app_name">
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
    </activity>
  </application>
</manifest>

 

Here, activity android:name is the name of the activity you have created. Once this is done you should be able to build your Android application in the usual way. One final thing to note is that your bundle identifier in Unity must match the package that you gave to your activity. In our example this would be test.application (case sensitive).


 

Once your application has been built, install it onto the device, then install the MGDDaemon app and use MGD as normal. If you need more information about installing and using the MGD daemon application, consult the blog post Using Mali Graphics Debugger on a Non-rooted device.

Making VR Dreams a Reality


Virtual Reality (VR) has been a focus area for ARM® in recent years with significant investment made in ensuring the ARM Mali™ range of graphics and multimedia processors is a great fit for mobile VR devices now and in the future.

 

We’re pleased to have been working with Google to ensure our range of Mali GPUs, Video and Display processors are able to deliver the ultimate mobile VR experience on Daydream. In addition, ARM has been working closely with a number of our leading silicon partners, enabling them to ship their first wave of Daydream ready devices.

 

Google’s announcement of high performance mobile VR support through Daydream, combined with our broad ecosystem of partners using the no.1 shipping GPU, is making VR accessible to hundreds of millions of consumers across the globe.


Why ARM and Mali for VR?

We’ve released a series of blogs over the past few months on the various VR technologies and activities which make Mali products a great fit.

 

VR places increasing performance demands on the systems we’re seeing today. Not only are we rendering for two different eyes, but we are also required to render at higher framerates and screen resolutions to produce a quality experience. Mali GPUs with their performance scalability and continual emphasis on energy efficiency ensure we are well positioned to address these ever increasing requirements.

 

Mali GPUs also offer additional features that benefit VR use-cases. ARM Frame Buffer Compression (AFBC) is a system bandwidth reduction technology and is supported across all of our multimedia IP. AFBC is able to reduce memory bandwidth (and associated power) by up to 50% across a range of content. This and other system wide technologies further enable efficient use-cases such as VR video playback. A number of other features including tile based rendering and other bandwidth saving technologies such as ASTC ensure we’re able to meet the high resolution and framerate requirements of VR. Mali GPUs also support 16x MSAA for best quality anti-aliasing. This is essential for a high quality user experience in VR as the proximity of our eyes to the images and the fact that we are viewing them in stereo means that any artefacts are much more noticeable than in traditional applications.
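
As a flavour of what using ASTC looks like from the application side, here is a minimal sketch of uploading a pre-compressed texture in OpenGL ES. It assumes the driver exposes the KHR_texture_compression_astc_ldr extension and that astc_data, astc_data_size, width and height come from your own asset pipeline (the names are purely illustrative):

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
/* Upload 4x4-block ASTC data compressed offline; the GPU samples it directly,
   so far less memory bandwidth is consumed than with an uncompressed texture. */
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_ASTC_4x4_KHR,
                       width, height, 0, astc_data_size, astc_data);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);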

 

On the software side, a large amount of driver and optimization work has gone into our Mali DDK in order to reduce latency and ensure fast context switching required for VR. In addition to optimizations, we’ve enabled a number of extensions to OpenGL ES to support efficient rendering to multiple views for both stereo and foveated rendering.

 

VR is an incredibly exciting use-case for ARM and is an area in which we intend to continually invest and innovate to make the VR experience on mobile even more awesome. We’re proud to be in close collaboration with Google on Daydream and look forward to the opportunities this opens up for the industry.

Building the Next Generation of Game Developers With FXP


As a world leading IP company ARM is passionate about protecting and promoting the ideas, innovations and skills required to produce next generation tech. A large part of that involves supporting the teaching of STEM subjects in schools and encouraging more of the future generations to get involved in programming and development.

 

One of the ways we do this is to collaborate on educational events with local institutions and share the knowledge of our experts with local students. Future Experience Points (FXP) will be held at Cambridge Regional College from June 25th to 27th and will feature a series of presentations, workshops and mentoring sessions that tie in with the computer science curriculum for students at local schools and colleges. Focussing on game development and graphic design, the event is intended to bring theoretical subjects to life through practical application and hands-on experience.


FXP will also feature a 48 hour game jam where teams of youngsters will work together to create and develop a mobile game with hands-on, practical support and training from industry experts. All the games developed as part of the event will be available for the public to play at Cambridge’s annual Big Weekend event in July. Prizes will be awarded in two categories, Concept and Development, and we’ll be giving away five Kindle Fire tablets to the winning team in the concept category!

 

With the prevalence of mobile devices on the market most students are already very familiar with mobile platforms and mobile gaming but often with no background knowledge of the technology that powers these devices. Providing an insight into the innovations and advancements that bring them the latest content adds a new dimension to the understanding of mobile technology.

 

It’s hoped the event will encourage more young people to pursue careers in graphic design and engineering, game development and related technology industries. As a local Cambridge company ARM considers it a top priority to advance the career opportunities for local teens and it would be great to see how many of these students could end up working with us in the future!

 

Can’t wait? We’ve also worked closely with Michael Warburton of Cambridge Regional College to produce a series of tutorials to help you get started developing your game for mobile devices!

Students get one-to-one advice and tips to kick start their graphics experience

Seeing the Future With Computer Vision


In 2016 so far there seems to be a big focus on automation. The rise of the Internet of Things is part of the reason for this and it's opening our eyes as to how many aspects of our everyday lives can be streamlined. Simply by allowing machines, sensors and technologies to 'talk' to each other, share data and use it to make smart decisions, we can reduce the direct input we need to have to keep our world moving.

 

Home automation is one of the first things people think of, but it soon leads to discussions on smart agriculture, automated office management and remote monitoring and maintenance of vehicles and assets. Not only that, but an area garnering a whole lot of interest is smart automotive. We know that many of these examples, in order to operate safely and effectively, need to be able to take in enormous amounts of data and analyse it efficiently for an immediate response. Before your home can decide to let you in through the front door without a key, for instance, it needs to know who you are. Before your autonomous car can be unleashed onto the streets, it needs to be able to spot a hazard, but how does it do it? One of the key drivers (see what I did there?) in this area is computer vision.


ARM®'s recent acquisition of Apical®, an innovative, Loughborough-based imaging tech company, helps us to answer these questions. With such a rich existing knowledge base and a number of established products, ARM, with Apical, is well placed to become a thought leader in computer vision technology.

So what is computer vision? Computer vision has been described as graphics in reverse: rather than us viewing the computer's world, the computer has turned around to look at ours. It is essentially exactly what it sounds like. Your computer can 'see', understand and respond to visual stimuli around it. In order to do this there are camera and sensor requirements of course, but once this aspect has been established, we have to make it recognise what it's seeing. We have to take what is essentially just a graphical array of pixels and teach the computer to understand what they mean in context.

We are already using examples of computer vision every day, possibly without even realising it. Ever used one of Snapchat's daily filters? It uses computer vision to figure out where your face is and, of course, to react when you respond to the instructions (like 'open your mouth…'). Recent Samsung smartphones use computer vision too: a nifty little feature for a bookworm like me is that it detects when your phone is in front of your face and overrides the display timeout so it doesn't go dark mid-page. These are of course comparatively minor examples, but the possibilities are expanding at breakneck speed and the fact that we already take these for granted speaks volumes about the potential next wave.

Computer vision is by no means a new idea; there were automatic number plate recognition systems as early as the 60s and 70s, but deep learning is one of the key technologies that has expanded its potential enormously. The early systems were algorithm based, removing the colour and texture of a viewed object in favour of spotting basic shapes and edges and narrowing down what they might represent. This stripped back the amount of data you had to deal with and allowed the processing power to focus on the basics in the clearest possible way. Deep learning flipped this process on its head and said: instead of algorithmically figuring out that a triangle of these dimensions is statistically likely to be a road sign, why don't we look at a whole heap of road signs and learn to recognize them? Using deep learning techniques, the computer can look at hundreds and thousands of pictures of, say, an electric guitar, and start to learn what an electric guitar looks like in different configurations, contexts, levels of daylight, backgrounds and environments. Because it sees so many variations it also starts to learn to recognise an item even when part of it is obscured, because it knows enough about it to rule out the possibility that it's something else entirely.

Sitting behind all this cleverness are neural networks, computer models that are designed to mimic what we understand of how our brains work. The deep learning process builds up connections between the virtual neurons as it sees more and more guitars. With a neural net suitably trained, the computer can become uncannily good at recognising guitars, or indeed anything else it's been trained to see.


The ImageNet competition tests how accurately computers can identify specific objects in a range of images

 

A key milestone for the adoption of deep learning was at the 2012 ImageNet competition. ImageNet is an online research database of over 14 million images and runs an annual competition to pit machines against each other to establish which of them produces the fewest errors when asked to identify the objects in a series of pictures. 2012 was the first year a team entered with a solution based on deep learning. Alex Krizhevsky’s system wiped the floor with the “shallow learning” competition that used more traditional methods and started a revolution in computer vision. The world would never be the same again. The following year there were of course multiple deep learning models and Microsoft broke records recently when their machine was actually able to beat their human control subject in the challenge!

 

A particularly exciting aspect of welcoming Apical to ARM is Spirit™, which takes data from video and a variety of sensors and produces a digital representation of the scene it’s viewing. This allows, for example, security staff to monitor the behaviour of a crowd at a large event and identify areas of unrest or potential issues based on posture, pose, mannerisms and numerous other important but oh so subtle factors. It also opens the doors for vehicles and machines to begin to be able to process their surroundings independently and apply this information to make smart decisions.


Spirit can simultaneously interpret different aspects of a scene into a digital representation

 

This shows us how quickly technology can move and gives some idea of the potential, particularly for autonomous vehicles, as we can now see how precisely they could quantify the hazard of, say, a child by the side of the road. What happens, though, when it has a choice to make? Sure, it can differentiate between children and adults and assess that the child statistically holds the greater risk of running into the road. However, if there's an impending accident and the only way to avoid it is to cause a different one, how can it be expected to choose? How would we choose between running into that bus stop full of people or the other one? By instinct? Through some internal moral code? Where does the potential of these machines to effectively think for themselves become the potential for them to discriminate or produce prejudicial responses? There is, of course, a long way to go before we see this level of automation but the speed at which the industry is advancing suggests these issues, and their solutions, will appear sooner rather than later.

 

ARM’s acquisition of Apical comes at a time when having the opportunity to exploit the full potential of technology is becoming increasingly important. We intend to be on the front line of ensuring computer vision adds value, innovation and security to the future of technology and automation. Stay tuned for more detail on up and coming devices, technologies and the ARM approach to the future of computer vision and deep learning.

Mali-G71: ARM's Most Powerful, Scalable and Efficient GPU to Date


The Mali-G71 GPU is the latest and greatest offering in the Mali high-performance family of GPUs. Built on the brand new Bifrost architecture, Mali-G71 represents a whole new level of high-end mobile graphics capabilities whilst still maintaining Mali’s position as a leading GPU in a highly competitive market.

Mali-G71 was developed taking into account the advanced, and ever advancing, use cases for high end mobile like Virtual Reality (VR), Augmented Reality (AR) and 3D gaming; and modern APIs such as Vulkan and OpenCL 2.0. It’s been a few years since the pinnacle of mobile gaming was Snake but the industry has advanced so fast and so far since then that even today’s high-end devices could struggle with the next generation of gaming requirements. Mali-G71 aims to address this potential shortfall by looking ahead to the next level of mobile graphics and ensuring the devices it powers will be more powerful, efficient (and generally more awesome) than ever before. So much so, that devices powered by the Mali-G71 GPU are even capable of competing with mid-range laptops in terms of graphics capability.


Bifrost

The new Mali Bifrost architecture represents a step change in the industry and enables the future of mobile graphics. There are numerous innovations and optimizations built in to the new design but we’ll highlight just a few.

Claused shaders allow you to group sets of instructions together into defined blocks that will run to completion atomically and uninterrupted. This means we can be sure all external dependencies are in place prior to clause execution and we can design execution units to allow temporary results to bypass accesses to the register bank. This reduces the pressure on the register file, drastically decreasing the amount of power it consumes and also contributes to area reduction by simplifying the control logic in the execution units.


Claused shaders provide significant power savings

 

Another innovation in the Bifrost architecture is Quad based vectorization. Midgard GPUs used SIMD vectorization which executed one thread at a time in the pipeline stage and was very dependent on the shader code executing vector instructions. Quad vectorization allows four threads to be executed together, sharing control logic. This makes it much easier to fill the execution units, achieving close to 100% utilization and better fits recent advances in how developers are writing shader code.


Scalability

The previous generation of high performance Mali GPUs was scalable from 1 to 16 cores. To reflect the ever growing performance requirements of mobile devices, Mali-G71 is scalable from 1 to 32 cores. The scalability of Mali-G71 means superior graphics performance is available across a wider than ever range of devices, from DTVs through high end smartphones right up to cutting edge VR headsets, either mobile-based or standalone. This flexibility, along with a 40% improvement in area efficiency, allows our partners to configure their system to their exact requirements, striking the perfect balance between power, efficiency and cost in order to perfectly position their products in their target market.

 


High End Gaming

Mobile gaming is fast becoming the platform of choice for gamers everywhere. In 2017 the market for mobile gaming is expected to hit over US$40 billion, up $10 billion from 2016.* This rapid growth needs to be sustainable on up and coming mobile devices and with greater complexity appearing year on year, this is no mean feat. Our gaming demos from just a couple of years ago had half the number of vertices as the ones we’re producing today and this all adds up in terms of power and efficiency requirements. If applications continue to advance at this rate the ability to scale to 32 cores could rapidly become a basic necessity for premium mobile devices.  On top of this, Mali-G71 delivers 20% higher energy efficiency compared to Mali-T880 under similar conditions – translating to higher sustained device performance in thermally limited premium devices.


Vulkan and OpenCL 2.0

API advancements are something we take very seriously; after all, they define how developers interact with the underlying hardware. As a GPU and CPU company we need to meet developer needs so that end users get the best possible device experience. In recent years there's been a move towards giving developers lower level access to the hardware; within Khronos, this trend led to the emergence of the new Vulkan 1.0 API. In a similar vein, OpenCL 2.0 was developed to make heterogeneous compute more developer friendly and there are high hopes that we will see some radical new use cases popping up once OpenCL 2.0 enabled devices are shipping in the market. Mali-G71 is not only designed to support Vulkan 1.0 and OpenCL 2.0 Full Profile – it even has support for fine grained buffers and shared virtual memory, enabled through full hardware coherency support. Again, this is primarily to ease software development effort, leading to better end user experiences.
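
To give a feel for what fine grained shared virtual memory means in practice, here is a minimal, hypothetical host-side sketch using standard OpenCL 2.0 calls. It assumes a context, queue and kernel have already been created and omits error checking:

/* Allocate a fine-grained SVM buffer visible to both the CPU and the GPU. */
size_t count = 1024;
float *shared = (float *) clSVMAlloc(context,
                                     CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
                                     count * sizeof(float), 0);

/* The CPU writes straight through the pointer - no clEnqueueWriteBuffer or map/unmap. */
for (size_t i = 0; i < count; i++)
    shared[i] = (float) i;

/* The very same pointer is handed to the kernel. */
clSetKernelArgSVMPointer(kernel, 0, shared);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &count, NULL, 0, NULL, NULL);
clFinish(queue);

/* Results are immediately visible to the CPU through the same pointer. */
clSVMFree(context, shared);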


Virtual reality (VR)

VR is what everyone’s talking about in the graphics industry at the moment: what it takes, what it needs and how to provide the very best VR experience to the user. The Mali-G71 GPU was built with just this sort of challenge in mind. The extensive performance requirements of VR mean that GPUs for high end devices have to be more energy efficient than ever before. Not only that, but other components of the mobile, like cameras and screen resolutions, are advancing and performing at ever higher rates and therefore all contributing to maxing out the thermal budget of the device. This puts even greater pressure on the GPU to reduce power usage wherever possible.

The Mali family of GPUs also has some great VR optimization features to allow for the best possible mobile VR experience. Front buffer rendering allows you to bypass the usual off screen buffers to render directly to the front buffer, saving time and reducing latency. Mali also supports the ‘multiview’ API extensions that allow the application to submit the draw commands for a frame to the driver once and have the driver instantiate the necessary work for each eye. This greatly reduces the CPU time required in both the application and driver. On Midgard and Bifrost based Mali GPUs we further optimize the vertex processing work, running the parts of the vertex shader that do not depend upon the eye once and sharing the results between each eye. These are just some of the features that make Mali-G71 the obvious choice for the future of mobile VR.
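
As a rough illustration of the multiview approach, the sketch below sets up a two-layer render target and attaches both eye views in one go using the OVR_multiview extension (the function pointer is assumed to have been fetched via eglGetProcAddress; eye_width, eye_height and draw_scene are placeholders):

/* One texture array layer per eye. */
GLuint eye_tex, fbo;
glGenTextures(1, &eye_tex);
glBindTexture(GL_TEXTURE_2D_ARRAY, eye_tex);
glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_RGBA8, eye_width, eye_height, 2);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
/* Attach both layers at once; the vertex shader picks its view with gl_ViewID_OVR. */
glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                 eye_tex, 0, 0, 2);

/* The application now records its draw calls once and the driver
   instantiates the work for each eye. */
draw_scene();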


Content protection

We’re using our phones for more and more, these days many of us don’t even need a home computer or laptop because we can do everything we need on our phone, including downloading and viewing content and streaming it to other devices. The recently released Mali-DP650 display processor already has the capability to handle 4K content and the Mali-G71 allows this content to be streamed seamlessly to your TV without losing any of the quality. This means that, whilst 4K hasn’t yet taken off on mobile, you don’t need to miss out on any of the benefits when viewing the content on a separate 4K device.

 

Mali-G71 was designed and optimized as part of a complete system, working better together as part of the Mali Multimedia Suite with CCI-550 providing full coherency for CPU and GPU. Mali-G71 is achieving the highest possible performance for mobile graphics within the smallest possible power budget and silicon area, allowing our partners to achieve the pinnacle of mobile graphics in the most scalable and customizable way. With Mali-G71 based devices expected to hit the shelves early in 2017, next level mobile gaming and graphics is right within your grasp.

 

 

*Fortune.com

The Hardware Requirements of VR Today


The hardware requirements of VR seemed to be an unknown quantity; there was always the argument that more GPU and CPU performance would translate to better VR. This has turned out to be partially untrue with the latest generation of mobile VR. The reality is that VR's latency-sensitive, high intensity workloads are a significant strain on both the CPU and GPU inside a smartphone and, as a result, have caused throttling and overheating in some devices. Additionally, there were no hardware specifications tied to a good experience in VR; every company made the decisions they thought were right and built on those views in order to deliver what they saw as the best VR. Until recently, most developers and hardware makers have been targeting things like GearVR and Google Cardboard, with Cardboard having much less control over the experience. In fact, with the exception of GearVR, mobile VR has been quite stagnant since the creation of Google Cardboard and GearVR. The lack of new innovation was really starting to hurt mobile VR and VR as a whole.

 

Then, everything changed with Google's Daydream VR announcement at Google I/O 2016. Google's new Daydream platform is not just a focus on software and SDKs; Google also introduced a minimum frame rate target of 60 FPS and a hardware specification that manufacturers need to meet in order to deliver it. Manufacturers making these Daydream enabled devices are going to do so later this year with the help of chip companies like ARM, Imagination, MediaTek, Qualcomm, Samsung and others. This means that the chip suppliers and manufacturers are going to have to deliver not only a certain level of performance, but to do so in a way that is sustainable over longer periods of time and doesn't suffer as much throttling. More importantly, Google has made it abundantly clear that Android N and Daydream will both focus heavily on sustained performance and delivering the best long-term experience, not the one you see for a few seconds.

 

Companies like ARM are getting ahead of the Daydream trend with their latest designs, the new ARM Cortex-A73 CPU and Mali-G71 GPU. Both of ARM's new processors are designed for the leading process nodes and to take advantage of them by delivering unprecedented sustained performance. Plus, with optimized native support for APIs like Vulkan, ARM's Mali-G71 is designed to squeeze more performance per watt out of every smartphone while utilizing as little CPU as possible. With the significant improvements to sustained performance in processors like the ARM Cortex-A73, there are fewer chances a developer or gamer will see frame rate drops due to CPU throttling. ARM's new Mali-G71 and Cortex-A73 processors won't be available in the first batch of Daydream devices, since many of the first Daydream devices will launch this fall; devices based on these new processors are expected from ARM licensees in 2017, built on new process nodes like TSMC's 10nm.

 

It has taken some time, but things are really starting to heat up in the mobile VR space. Mobile VR is going to be the VR that drives the entire industry one way or another and ultimately what will determine VR's arrival to the mainstream. Having concrete standards to follow and build to is going to be extremely helpful for many companies inside of the ecosystem. It is important that someone like Google stepped up to the plate and created a platform like Daydream to get the ecosystem to come together under one common target. It is also extremely important that there are companies out there like ARM answering Google's Daydream challenge and creating new processors utilizing the latest process technology so that their customers can build some of the best chips for VR that deliver Daydream's needs for high sustained performance and not just peak performance.


Stride argument in OpenGL ES 2.0


I'm putting this information here, as it took me way more time than it should have to understand how the stride argument works in glVertexAttribPointer.

This argument is extremely important if you want to pack data in the same order as they are accessed by the CPU/GPU.

 

When reading the manual, I thought that stride was the number of bytes the OpenGL implementation would skip after reading size elements from the provided array.

 

However, it actually works like this. glVertexAttribPointer will:

  • Start reading data from the provided address,
  • Read size elements from the address,
  • Pass the values to the corresponding GLSL attribute,
  • Jump stride bytes from the address it started reading from,
  • Repeat this procedure count times, where count is the third argument passed to glDrawArrays.

 

So, for example, let's take a float array stored at memory address 0x20000, containing the following 15 elements:

GLfloat arr[] = {
    /* 0x20000 */ -1.0f, 1.0f, 1.0f, 0.0f, 1.0f,
    /* 0x20014 */ -1.0f, 0.0f, 1.0f, 0.0f, 0.0f,
    /* 0x20028 */  0.0f, 1.0f, 1.0f, 1.0f, 1.0f
};

 

If you use glVertexAttribPointer like this:

glVertexAttribPointer(your_glsl_attrib_index, 3, GL_FLOAT, GL_FALSE, 20, arr);

 

And then use glDrawArrays, the OpenGL implementation will do something akin to this :

  • Copy the address arr (0x20000).
  • Start reading {-1.0f, 1.0f, 1.0f} from the copied address (referred to as copy_arr here) and pass these values to the GLSL attribute identified by your_glsl_attrib_index.
  • Do something like copy_arr += stride. At this point, copy_arr == 0x20014.

Then, on the second iteration, it will read {-1.0f, 0.0f, 1.0f} from the new copy_arr address, redo copy_arr += stride and continue like this for each iteration.
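
Put differently, here is a rough C sketch of what the implementation conceptually does for each of the count vertices passed to glDrawArrays (feed_attribute is just a made-up stand-in for the hardware fetching the values):

const unsigned char *base = (const unsigned char *) arr;   /* the pointer you passed */
for (int i = 0; i < count; i++)
{
    /* jump 'stride' bytes (here 20) per vertex, from the original address */
    const GLfloat *v = (const GLfloat *) (base + i * stride);
    /* read 'size' (here 3) components and hand them to the GLSL attribute */
    feed_attribute(your_glsl_attrib_index, v[0], v[1], v[2]);
}

Note that a stride of 0 is a special case meaning "tightly packed": the implementation then steps by size * sizeof(type) instead.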

 

Here's a concise diagram summarising this.

 

stride-exemple.png

Eye Heart VR



Welcome to the next installment of my VR blog series. In previous VR blogs we've considered the importance of clear focus to a VR experience, as well as the essential requirement to keep 'motion to photon' latency below 20ms in order to avoid unnecessary visual discomfort (and vomiting). This time we're going to look at eye tracking and the impact it could have on the way we use VR in the future. Eye tracking is not new – people have been doing it for nearly twenty years – but head mounted displays for VR could be the catalyst technology needed to unlock its true potential.

 

Say what you see

One of the aspects of VR that is still presenting a challenge is how to provide a quality user experience when navigating a VR environment. Current systems have a couple of options: The Samsung Gear VR uses a control pad on the side of the display to allow you to press and hold to move or tap to engage with objects. Google recently announced they will release a motion-enabled remote controller for their Daydream VR platform later this year and all tethered VR systems have fully-tracked controllers that mirror your hand movements in the virtual world. Alongside these there’s also growing interest in making more use of your eyes.

 

Eye tracking is currently getting a lot of hype on the VR circuit. The ability to follow the path of your pupil across the screen has wide ranging uses for all sorts of applications from gaming to social media to shopping. We don’t necessarily have full control over our eye movements as our eyes function as an extension of our brain. This unconscious motion is very different from how we interact with our hands so there is work still to be done to design just the right user interfaces for eye tracking. How, for example, do you glance across something without accidentally selecting it? Just think of the dangerous spending possibilities of selecting items to add to your cart simply by staring longingly at them!

 

Several eye tracking solutions are emerging from companies such as The Eye Tribe, Eyefluence and SMI, as well as eye tracking headsets such as FOVE. At GDC 2016 MediaTek were able to demonstrate eye tracking with the Helio x20. In all cases the path of your vision is minutely tracked throughout the VR experience. The only calibration typically required is a simple process of looking at basic shapes in turn so the sensors can latch on to your specific eye location and movement. This suggests eye tracking could be easy to adopt and use with mainstream audiences without specialist training. The first use for eye tracking that springs to mind is, as usual, gaming controls and there have indeed been demos released using modified Oculus and Samsung Gear VR headsets which use a built in eye tracking sensor to control direction and select certain objects simply by focussing steadily on them. FOVE have also shown how a depth-of-field effect could be driven from the area of the scene you are looking at, to give the illusion of focal depth. 

 

An additional potential benefit of eye tracking in VR is the ability to measure the precise location of each eye and use it to calculate the interpupillary distance (IPD) of the user. This measurement is the distance between the centres of your pupils and changes from person to person.  Some VR headsets, such as the HTC Vive, provide a mechanical mechanism for adjusting the distance between the lenses to match your IPD but many more simply space the lenses to match the human average. Having an accurate IPD measurement of the user would allow for more accurate calibration or image correction, resulting in a headset that would always perfectly suit your vision. Your eyes can also move slightly within the confines of the headset. Being able to detect and adjust for this in real time would allow even more precise updates of the imagery to further enhance the immersion of the VR experience.

Eye tracking allows the view to update in real time based on exactly where you’re looking in the scene

 

Beneficial blurriness

Foveated rendering is a power saving rendering technique inspired by the way our eyes and vision work. We have a very wide field of vision with the ability to register objects far to the side of the direction in which we are looking. However, those images and objects in the edges of our field of vision do not appear in perfect clarity to us. This is because our fovea – the small region in the centre of our retina that provides clear central vision – has a very limited field of view. Without eye tracking we can’t tell where the VR user is looking in the scene at any given moment, so we have to render the whole scene to the highest resolution in order to retain the quality of the experience. Foveated rendering uses eye tracking to establish the exact location of your pupil and display only the area of the image that our fovea would see in full resolution. This allows the elements of the scene that are outside of this region to be rendered at a lower resolution, or potentially multiple lower resolutions at increasing distances from the focal point. This adds complexity but saves GPU processing power and system bandwidth and reduces the amount of pressure placed on the power limits of the mobile device, whilst your brain interprets the whole scene as appearing in high resolution. This therefore allows headset manufacturers to utilize this processing power elsewhere, such as in higher quality displays and faster refresh rates.

 

The High Performance range in the ARM® Mali™ family of GPUs is ideal for the heavy requirements VR places on the mobile device. Achieving ever higher levels of performance and energy efficiency, the flexible and scalable nature of Mali GPUs allows partners to design their SoC to their individual requirements. Partners Deepoon and Nibiru have recently launched awesome Mali-powered standalone VR devices for this very reason and the recently released Mali-G71 GPU takes this another step further. Not only does it double the number of available cores but it also provides 40% bandwidth savings and 20% more energy efficiency to allow SoC manufacturers to strike their ideal balance between power and efficiency.

How foveated rendering displays only the immediate field of view in high resolution

 

Verify with vision

Another potentially game-changing use of eye tracking is for security and authentication. Retinal scanning is not an unfamiliar concept in high-end security systems so to extend the uses of eye tracking to this end is a logical step. The ability to read the user’s retinal ID for in-app purchases, secure log in and much more not only reduces boring verification steps but simultaneously makes devices and applications much more secure than they were before! So once you’ve used your unique retinal ID to access your virtual social media platform, it doesn’t stop there right? Of course not, social VR would be a weird place to hang out if your friends’ avatars never looked you in the eye. Eye tracking can take this kind of use case to a whole new level of realism and really start to provide an alternative way to catch up with people you maybe rarely get to see in person. Early examples are already inciting much excitement for the added realism of being able to interpret eye contact and body language.

 

Seemingly simple innovations like this can actually have a huge impact on an emerging technology like VR and provide incremental improvements to the level of quality we’re able to reach in a VR application. Foveated rendering in particular is a huge step up for bandwidth reduction in the mobile device so with advancements like these we’re getting ever closer to making VR truly mainstream.

[Quick tips] Use ffmpeg to convert pictures to raw RGB565


Here's a quick tip to convert pictures to raw format with FFmpeg, in order to use them as a texture in OpenGL with no extra conversion:

BMP files

ffmpeg -vcodec bmp -i /path/to/texture-file.bmp -vcodec rawvideo -f rawvideo -pix_fmt rgb565 texture.raw

PNG files

ffmpeg -vcodec png -i /path/to/texture-file.png -vcodec rawvideo -f rawvideo -pix_fmt rgb565 texture.raw

 

Loading a raw format picture as a texture in OpenGL

/* Assumes a GLES context is current, a texture object is bound to GL_TEXTURE_2D,
   and texture_buffer points to picture_width * picture_height * 2 bytes of storage. */
int fd = open("texture.raw", O_RDONLY);
read(fd, texture_buffer, raw_picture_file_size_in_bytes);
close(fd);
/* RGB565 : GL_RGB format combined with the GL_UNSIGNED_SHORT_5_6_5 type */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, picture_width, picture_height, 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, texture_buffer);
glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_NEAREST);

 

The biggest advantage is that FFMPEG will implicitly flip the picture upside-down during the conversion, meaning that the upper-left of your original texture is at UV coordinates 0.0,0.0 instead of 1.0,0.0.

 

Got this quick tip from BMPとRAWデータ(RGB8888/RGB565)の相互変換メモ (notes on converting between BMP and raw RGB8888/RGB565 data) - Qiita

Depth testing: Context is everything


I just lost a few hours trying to play with the Z index between draw calls, in order to try Z-layering, as advised by peterharris in my question For binary transparency: Clip, discard or blend?

However, for reasons I did not understand, the Z layer seemed to be completely ignored. Only the order of the glDraw calls was taken into account.

 

I really tried everything :

glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
glDepthMask(GL_TRUE);
glClearDepthf(1.0f);
glDepthRangef(0.1f, 1.0f);
glClear( GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT );

 

Still... each glDrawArrays call drew pixels over previously drawn pixels that had a lower Z value. I switched the value provided to glDepthFunc, switched the Z values... same result.

I really started to think that Z-layering only worked for one draw batch...

 

Until I searched the OpenGL wiki for "Depth Buffer" information and stumbled upon Common Mistakes: Depth Testing Doesn't Work:

Assuming all of that has been set up correctly, your framebuffer may not have a depth buffer at all. This is easy to see for a Framebuffer Object you created. For the Default Framebuffer, this depends entirely on how you created your OpenGL Context.

"... Not again ..."

 

After a quick search for "EGL Depth buffer" on the web, I found the EGL manual page, eglChooseConfig - EGL Reference Pages, which states:

EGL_DEPTH_SIZE

 

    Must be followed by a nonnegative integer that indicates the desired depth buffer size, in bits. The smallest depth buffers of at least the specified size is preferred. If the desired size is zero, frame buffer configurations with no depth buffer are preferred. The default value is zero.

    The depth buffer is used only by OpenGL and OpenGL ES client APIs.

The solution

Adding EGL_DEPTH_SIZE, 16 in the configuration array provided to EGL solved the problem.
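
For reference, here is roughly what the attribute list looks like with the depth size added (the surrounding attributes are just an example configuration, not necessarily the ones you need):

static const EGLint config_attribs[] = {
    EGL_SURFACE_TYPE,    EGL_WINDOW_BIT,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
    EGL_RED_SIZE,    8,
    EGL_GREEN_SIZE,  8,
    EGL_BLUE_SIZE,   8,
    EGL_DEPTH_SIZE, 16,   /* without this, the default is 0 : no depth buffer at all */
    EGL_NONE
};

EGLConfig config;
EGLint num_configs;
eglChooseConfig(display, config_attribs, &config, 1, &num_configs);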

 

I should have known.

Preview the Upcoming Mali Video Processor

$
0
0

We’ve recently been talking about a brand new video processor about to join the ARM Mali Multimedia Suite (MMS) of GPU, Video & Display IP. Egil, our next generation Mali video processor due for release later this year, takes a step forward in functionality and performance to meet the needs of advancing video content. With more than 500 hours of video uploaded to YouTube every single minute, it’s no surprise that optimizing video processing in sync with the full suite of products has been a key focus for us.

 

The MMS comprises software drivers and hardware optimized to work together right out of the box, with the aim of maximizing efficiency, enabling faster time to market and vastly reducing potential support requirements. It has been designed to optimize performance between the various IP blocks through use of bandwidth saving technologies such as ARM Frame Buffer Compression (AFBC). AFBC can be implemented across the entire range of multimedia IP within an SoC and, depending on the type of content we’re talking about, this can produce bandwidth savings of up to 50%. An AFBC-capable display controller or an AFBC-capable GPU can directly read the compressed frames produced by an AFBC-capable video decoder, such as Egil, reducing overall pipeline latency.

 

ARM approaches video processing in a different way from other IP providers. We believe it is better to provide all the codecs required in a unified Video IP solution, controlled by a single API, making it easier to develop flexible, multi-standard SoCs. To do this we analyse the codecs to be supported, establish which functions are required and develop hardware blocks to address each function - such as motion estimation & compensation, transforms, bitstreams and so on.

 

The hardware IP is developed as a core to operate at a set performance level, with multiple cores being used to address higher performance points. The ‘base core’ in Egil is designed to operate at 1080p at 60 frames per second, which will provide two Full HD encode and/or decode streams running simultaneously at 30 frames per second – assuming a 28HPM manufacturing process. To address 4K UHD 2160p at 60 frames per second (as for a 4K UHD TV) would require a four-core implementation.

 

At the same time as developing the hardware IP we also develop firmware to manage the video IP and interface with the host software. The firmware manages codec implementation, multi-core synchronization and communication requirements as well as additional specialist functions such as error concealment, picture reordering and rate control, saving the hardware and host CPU from getting involved in these steps at all. The result is unified video IP providing an easy to use, multi-standard, scalable solution capable of simultaneous encode and decode of multiple video streams, potentially even using different codecs at different resolutions!

 

Brand new in Egil is VP9 encode and decode capability, making it the first multi-standard video processor IP to support VP9 encode. We’ve also significantly enhanced HEVC encode and deliver an Android reference software driver. Whilst currently OpenMAX IL based, this will be updated to V4L2 as it is introduced in future versions of Android. This driver takes responsibility for setting up a particular video session, allocating memory and gating power dynamically, and dramatically reduces the CPU load. The built-in core scheduler manages multiple encode/decode streams and maps single or multiple video streams across multiple cores for maximum performance. This makes the new Mali video processor perfect for video conferencing and allows you to seamlessly share your viewed content with others. Not only that, but it means you can view multiple content streams at once, allowing you to keep one eye on the game throughout your meeting!

 

Another exciting aspect of ARM’s presence in the video space is our involvement with the Alliance for Open Media. As a founding member we’ve been working with leading internet companies in pursuit of an open and royalty-free AOMedia Video codec. We are heavily involved in this Joint Development Foundation project to define and develop media technologies addressing marketplace demand for a high quality open standard for video compression and delivery over the web. The timeline for the new codec, AV1, is to freeze the bitstream in Q1 2017, with first hardware support expected in the year that follows.

 

The multi standard nature of our new processor allows both encode and decode as well as supporting new and legacy codecs, all in a single piece of IP. Scalable to allow for every level of use case this next generation processor provides the perfect balance of efficiency and performance in the low power requirements of the mobile device. Egil is due for launch later in 2016.

Happy 10th Birthday Mali!


Mali, the #1 shipping family of GPUs in the world, is celebrating 10 years with ARM this month! In honour of the occasion I’m going to take a look at some of the key milestones along the way and how Mali has developed to become the GPU of choice for today’s devices. Back in early 2006 Mali was just a twinkle in ARM’s eye; it wasn’t until June of that year that ARM announced the acquisition of Norwegian graphics company Falanx, and ARM Mali was born.


This of course is not the real beginning of Mali’s story. Before Mali became part of the ARM family she was created by the Falanx team in Trondheim, Norway. In 1998 a small group of university students were tinkering with CPUs when someone suggested they try their hand at graphics. By 2001 a team of five had managed to prototype the Malaik 3D GPU with the intention of targeting the desktop PC market. They scouted a series of potential investors and whilst there was plenty of interest, they never quite got the support they were hoping for in order to break into the market.

Original (and short-lived) Falanx branding from 2001, and their final logo, edvardsorgard's handwriting codified

 

Research showed them that the mobile market had the most potential for new entrants and that an IP model was potentially their best option. With that in mind, they set about building the GPU to make it happen. Having revised the architecture to target the smaller, sleeker requirements of the mobile market, the Falanx team felt the Malaik name needed streamlining too.

The four final Falanx founders

 

Mario Blazevic, one of the founders originally from Croatia, recognized “mali” as the Croatian word for “small” and this was deemed just right for the new mobile architecture. So, armed with the very first incarnation of Mali, they set about selling it. The prototype became Mali-55 and the SoC which featured it reached great success in millions of LG mobile phones. By this time they were six people and one development board and the dream was alive and well.

 

Meanwhile, ARM was very interested in the GPU market and had an eye on Falanx as a potential provider. Jem Davies, ARM Fellow and VP of Technology, was convinced the Falanx team’s culture, aspiration and skillset were exactly the right fit and ultimately recommended we move forward. Over the course of a year, and a few sleepless nights for the Falanx team, the conversations were had, the value was established and the ARM acquisition of Falanx was completed on June 23rd 2006.

The Falanx team at acquisition

 

In February 2007 the Mali-200 GPU was the first to be released under the ARM brand and represented the start of a whole new level of graphics performance. It wasn’t long before it became apparent that the Mali-200 had a lot of unexploited potential, and so its multi-core version, the Mali-400, entered development. The first major licence proved the catalyst for success when its performance took the world by storm and Mali-400 was well on its way to where it stands today, as the world’s most popular GPU with a market share of over 20% all by itself. Mali-400 is a rockstar of the graphics game and still the go-to option for power-sensitive devices.

 

In late 2010 the continued need for innovation saw us announce the start of a ‘New Era In Embedded Graphics With the Next-Generation Mali GPU’. The Mali-T604, the first GPU to be built on the Midgard architecture, prompted a ramping up of development activities and Mali began to expand into the higher performance end of the market whilst still maintaining the incredible power efficiency so vital for the mobile form.

 

At Computex 2013 the Mali-V500 became the first ARM video processor and complemented the Mali range of GPUs perfectly. Now on the way to the third Mali VPU this is a product gaining more and more importance, particularly in emerging areas like computer vision and content streaming. Just a year on from that we were celebrating the launch of the Mali-DP500 display processor and the very first complete Mali Multimedia Suite became a possibility. Part of the strength of the ARM Mali Multimedia Suite is the cohesive way the products work together and fully exploit bandwidth saving technologies like ARM Frame Buffer Compression.  This allows our partners to utilise an integrated suite of products and reduce their time to market. Another key Mali milestone came in mid-2014 when the Mali-T760 GPU became a record breaker by appearing in its first SoC configuration less than a year after it was launched. By the end of the year ARM partners had shipped 550 million Mali GPUs during 2014.

 

This year saw the launch of the third generation of Mali GPU architecture, Bifrost. Bifrost is designed to meet the growing needs and complexity of mobile content like Virtual Reality and Augmented Reality, and new generation graphics APIs like Vulkan. The first product built on the Bifrost architecture is the Mali-G71 high performance GPU for premium mobile use cases. Scalable to 32 cores it is flexible enough to allow SoC vendors to customise the perfect balance of performance and efficiency and differentiate their device for their specific target market.

 

Today Mali is the number one shipping GPU in the world: 750 million Mali-based SoCs were shipped in 2015 alone. As the Mali family of GPUs goes from strength to strength I’d like to take this opportunity to wish her and her team a very happy birthday!

6 mali infographic.png

Pint of Science – Quenching Your Thirst for Knowledge


Late May sees the annual Pint of Science events take place in pubs (and other somewhat unusual venues) around Cambridge, the UK and indeed the rest of the world. Designed to combine the city’s love of learning with its love of libation, the event has grown more popular every year with new venues and themes popping up every which way you look. Covering every aspect of science, and arguably every nuance of drinking culture, these volunteer-run events are a great way to learn and laugh simultaneously.

Logo.png

ARM is always keen to support the furthering of knowledge and share some of the wisdom of our experts with the masses. The recent announcement of our acquisition of Apical, the Loughborough-based imaging and vision gurus, gave us a whole new thing to talk about this year: computer vision. ARM Fellow and general legend Jem Davies was on hand at the Architect pub to talk about why he thinks this is such an important industry development. He explained the different approaches required to make the most of the information we can receive from computer vision, as well as the difficulty of processing and somehow interpreting the overwhelming quantity of mobile data produced every day. With 3.7 exabytes of mobile data traffic every single month, and more than half of that being mobile video, it was eye-opening to think about how we can possibly store it all, let alone view it and take anything meaningful from it.

Hawking.png

 

Jem explained that whilst seeing and understanding images comes naturally to us, our fundamental lack of understanding about what actually happens in the brain means it’s very difficult to provide this ability to computers. Deep learning and neural networks come into play here, letting us effectively train an artificial ‘brain’ to understand what we’re showing it and why it’s important. This is where this mountain of data comes in handy, because it acts as a textbook for the computer to learn about our world and begin to recognise it.

focus.png

For example, Spirit™, part of the Apical product line, is a system which takes object recognition and expands upon it to extract huge amounts of valuable data. It recognises people even in crowded and confusing situations and can be used to help assess large groups and provide early warnings of suspicious activity or potential trouble. Just think how valuable something like this could be in spotting dangerous over-crowding on a subway platform or a collapsed reveller at a concert. Not only does computer vision and deep learning enable these possibilities, but it will also be key to the mainstream adoption of things like autonomous vehicles. It will be the mechanism to allow the vehicle to see what’s happening around it and make smart decisions for safety and efficiency. It soon became clear how many exciting opportunities this kind of technology presents and I for one can’t wait to see where we can take it next.

attendees.jpg

Meanwhile, across town at La Raza, some of us were getting chemical with our educational evening. Molecular cocktails were the order of the day and we were shown some super cool techniques like gelification (yes, apparently it’s a word), which allows you to make solid cocktails in a range of different ways, the miniature jelly Long Island iced tea being a highlight for me. We also had the opportunity to try ‘spherification’, the technique by which you can make tasty fruity bubbles to add to a cocktail. Using a sodium alginate solution, you can cause a skin to form around a drop of fruit syrup (or similar), holding it together whilst still keeping the centre liquid. It was great to be able to have a go at it ourselves and see just how much fun science can be.

cocktails.png  

With Pint of Science events taking place all over the world I highly recommend you check them out next time they’re in town. Not only can you learn a lot but you can also have a lot of fun in the process. Initiatives like this are really helping open up the sciences to the masses and get a lot more people interested in the tech that makes our world work.


Mali Graphics Debugger and Streamline Developer Survey is now live


Have your say on the development of the Streamline and Mali Graphics Debugger (MGD) products. Is there a feature in MGD/Streamline that you would love to have and that would make your development so much easier? Or is there a particular part of MGD/Streamline that frustrates you, and do you have ideas on how we can improve it? Fill in the short survey below and let us know; we are always looking for feedback to improve our products in a way that matters to you.

 

https://www.surveymonkey.co.uk/r/developer-tools-survey

Bitesize Bifrost: The benefits of clause shaders


The recently released Mali-G71 GPU is our most powerful and efficient graphics processor to date and is all set to take next generation high performance devices by storm. The Mali family of GPUs is well known for providing unbeatable flexibility and scalability in order to meet the broad-ranging needs of our customers but we’ve taken another step forward with this latest product. ARM®’s brand new Bifrost architecture, which forms the basis of the Mali-G71, will enable future generations of Mali GPUs to power all levels of devices from mass market to premium mobile. In a few short blogs I’m going to take a look at some of the key features that make Bifrost unique and the benefits they bring to ARM-powered mobile devices.

 

The first feature we’re going to look at is the innovative introduction of clauses for shader execution. In a traditional setup, the control flow might change between any two instructions. We therefore need to make sure that the execution state is committed to the architectural registers after each instruction and is retrieved at the start of the next. This means the instructions are executed sequentially, with a scheduling decision made before each one.

classic.png

Classic Instruction Execution

 

The revolutionary changes ARM has implemented in the Bifrost architecture mean instructions are grouped together and executed in clauses. These clauses provide more flexibility than a Very Long Instruction Word (VLIW) instruction set in that they can be of varying lengths and can contain multiple instructions for the same execution unit. However, the control flow within each clause is much more tightly controlled than in a traditional architecture. Once a clause begins, execution runs from start to finish without any interruptions or loss of predictability. This means the control flow logic doesn’t need to be executed after every individual instruction. Branches may only appear at the end of clauses and their effects are therefore isolated in the system. A quad’s program counter can never be changed within a clause, allowing us to eliminate costly edge cases. Also, if you examine how typical shaders are written, you will find that they have large basic blocks, which automatically makes them a good fit for the clause system. Since instructions within a clause execute back-to-back without interruption, this provides us with the predictability we need to be able to optimize aggressively.

clause.png

Clause Execution

 

As is the case in a classic instruction set, the instructions work on values stored in a register file. Each instruction reads values from the registers and then writes the results back to the same register file shortly afterwards. Instructions can then be combined in sequence due to the knowledge that the register retains its written value.

The register file itself is generally something of a power drain due to the large number of accesses it receives. Since wire length contributes to dynamic power (long wires have more capacitance), the larger the register file, or the further away it is, the higher the power requirement to address it. The Bifrost architecture allocates a thread of execution to exactly one execution unit for its entire duration so that its working values can be stored in that Arithmetic Logic Unit (ALU)’s register file close by. Another optimization uses the predictability to eliminate back-to-back accesses to the register file, further reducing the overall power requirements for register access.

 

In a fine-grained, multi-threaded system we need to allow threads to request variable-latency operations, such as memory accesses, and to sleep and wake very quickly. We implement this using a lightweight dependency system. Dependencies are discovered by the compiler, which removes runtime complexity, and each clause can both request a variable-latency operation and also depend on the results of previous operations. Clauses always execute in order, and may continue to execute even if unrelated operations are pending. While waiting for a previous result, clauses from other quads can be scheduled, and this gives us a lot of run-time flexibility to deal with variable latencies with manageable complexity. Again, by executing this logic only at clause boundaries we reduce the power cost of the system.

 

The implementation of clause shaders not only reduces the overhead by spreading it across several instructions, but also guarantees the sequential execution of all instructions contained in a clause and gives us significant scope for optimization thanks to the added predictability, with an overall power saving. This is just one of the many features of the Bifrost architecture which will allow new Mali based systems to perform more efficiently than ever before, including for high end use cases such as virtual reality and computer vision.

 

Many thanks to seanellis for his technical wizardry and don't forget to check back soon for the next blog in the Bitesize Bifrost series!

Stereo Reflections in Unity for Google Cardboard


Introduction

 

Developers have used reflections extensively in traditional game development and we can therefore expect the same trend in mobile VR games. In a previous blog I discussed the importance of rendering stereo reflections in VR to achieve a successful user experience and demonstrated how to do this in Unity. In this blog I demonstrate how to render stereo reflections in Unity specifically for Google Cardboard because, while Unity has built-in support for Samsung Gear VR, for Google Cardboard it uses the Google VR SDK for Unity.

 

This latest VR SDK supports building VR applications on Android for both Daydream and Cardboard. The use of an external SDK in Unity leads to some specific differences when implementing stereo reflections. This blog addresses those differences and provides a stereo reflection implementation for Google Cardboard.

 

Combined reflections – an effective way of rendering reflections

 

In previous blogs [1, 2] I discussed the advantages and limitations of reflections based on local cubemaps. Combined reflections have proved an effective way of overcoming the main limitation of this rendering technique, which derives from the static nature of the cubemap. In the Ice Cave demo, reflections based on local cubemaps are used to render reflections from static geometry, while planar reflections rendered at runtime using a mirrored camera are used to render reflections from dynamic objects.

 

CombinedReflectionsBoxes.png

Figure 1 Combining reflections from different types of geometry.

 

The static nature of the local cubemap does have a positive impact in that it allows for faster and higher quality rendering. For example, reflections based on local cubemaps are up to 2.8 times faster than planar reflections rendered at runtime. The fact that we use the same texture every frame guarantees high quality reflections with none of the pixel instabilities that are present with other techniques which render reflections to a texture every frame.

 

Finally, as there are only read operations involved when using static local cubemaps, the bandwidth use is halved. This feature is especially important in mobile devices where bandwidth is a limited resource. The conclusion here is that when possible, use local cubemaps to render reflections. When combining with other techniques they allow us to achieve higher quality at very low cost.

 

In this blog I show how to render stereo reflections for Google Cardboard, covering both reflections based on local cubemaps and runtime planar reflections rendered using the mirrored camera technique. We assume here that the shader of the reflective material, which combines the reflections from static and dynamic objects, is the same as in the previous blog.

 

Rendering stereo planar reflections from dynamic objects

 

In the previous blog I showed how to set up the cameras responsible for rendering planar reflections for left and right eyes. For Google Cardboard we need to follow the same procedure but when creating the cameras we need to correctly set the viewport rectangle as shown below:

 

ReflectionCameraViewportRectangleAndRedRec.png

Figure 2. Viewport settings for reflection cameras.

 
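
As a side note, if you prefer to configure the viewports from a script rather than in the Inspector, the same split can be applied through Camera.rect. The snippet below is only a minimal sketch under the assumption of a standard side-by-side stereo layout (left eye on the left half of the screen, right eye on the right half); the class name is hypothetical and the exact rectangles should match the settings shown in the figure above and your own project.

using UnityEngine;

// Hypothetical helper: gives each reflection camera the viewport of its eye.
// Assumes a side-by-side layout; adjust the rectangles to match your setup.
public class SetReflectionViewports : MonoBehaviour {
    public Camera leftReflectionCamera;
    public Camera rightReflectionCamera;

    void Awake() {
        // x, y, width, height in normalized viewport coordinates
        leftReflectionCamera.rect  = new Rect(0.0f, 0.0f, 0.5f, 1.0f);
        rightReflectionCamera.rect = new Rect(0.5f, 0.0f, 0.5f, 1.0f);
    }
}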

The next step is to attach the script below to each reflection camera:

 

void OnPreRender() {
    SetUpReflectionCamera();
    // Invert winding
    GL.invertCulling = true;
}

void OnPostRender() {
    // Restore winding
    GL.invertCulling = false;
}

 

The method SetUpReflectionCamera positions and orients the reflection camera. Nevertheless, its implementation differs from the one provided in the previous blog. The Google VR SDK directly exposes the main left and right cameras, which appear in the hierarchy as children of the Main Camera:

 

LeftRightMainCamera.png

Figure 3. Main left and right cameras exposed in the hierarchy.

 

Note that LeftReflectionCamera and RightReflectionCamera game objects appear disabled because we render those cameras manually.

 

As we can directly access the main left and right cameras, the SetUpReflectionCamera method can build the worldToCameraMatrix of the reflection camera without any additional steps:

 

void SetUpReflectionCamera() {
    // Set up the reflection camera
    // Find the reflection plane: position and normal in world space
    Vector3 pos = chessBoard.transform.position;
    // Reflection plane normal in the direction of the Y axis
    Vector3 normal = Vector3.up;
    float d = -Vector3.Dot(normal, pos) - clipPlaneOffset;
    Vector4 reflectionPlane = new Vector4(normal.x, normal.y, normal.z, d);

    Matrix4x4 reflectionMatrix = Matrix4x4.zero;
    CalculateReflectionMatrix(ref reflectionMatrix, reflectionPlane);

    // Update the left reflection camera using the main left camera position and orientation
    Camera reflCamLeft = gameObject.GetComponent<Camera>();

    // Set the view matrix
    Matrix4x4 m = mainLeftCamera.GetComponent<Camera>().worldToCameraMatrix * reflectionMatrix;
    reflCamLeft.worldToCameraMatrix = m;

    // Set the projection matrix
    reflCamLeft.projectionMatrix = mainLeftCamera.GetComponent<Camera>().projectionMatrix;
}

 

The code snippet shows the implementation of the SetUpReflectionCamera method for the left reflection camera. mainLeftCamera is a public variable that must be populated by dragging and dropping the Main Camera Left game object. For the right reflection camera the implementation is exactly the same, but uses the Main Camera Right game object instead.

 

The implementation of the function CalculateReflectionMatrix is provided in the previous blog.

 

The rendering of the reflection cameras is handled by the main left and right cameras. We attach the script below to the main right camera:

 

using UnityEngine;
using System.Collections;

public class ManageRightReflectionCamera : MonoBehaviour {
    public GameObject reflectiveObj;
    public GameObject rightReflectionCamera;
    private Vector3 rightMainCamPos;

    void OnPreRender() {
        rightReflectionCamera.GetComponent<Camera>().Render();
        reflectiveObj.GetComponent<Renderer>().material.SetTexture("_ReflectionTex",
            rightReflectionCamera.GetComponent<Camera>().targetTexture);
        rightMainCamPos = gameObject.GetComponent<Camera>().transform.position;
        reflectiveObj.GetComponent<Renderer>().material.SetVector("_StereoCamPosWorld",
            new Vector4(rightMainCamPos.x, rightMainCamPos.y, rightMainCamPos.z, 1));
    }
}

 

 

This script issues the rendering of the right reflection camera and updates the reflection texture _ReflectionTex in the shader of the reflective material. Additionally, the script passes the position of the right main camera to the shader in world coordinates.

 

A similar script is attached to the main left camera to handle the rendering of the left reflection camera; replace the public variable rightReflectionCamera with leftReflectionCamera, as sketched below.
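
For completeness, here is a sketch of that left-eye counterpart; it simply mirrors the script above, swapping in the left reflection camera and reading the position of the main left camera it is attached to.

using UnityEngine;
using System.Collections;

public class ManageLeftReflectionCamera : MonoBehaviour {
    public GameObject reflectiveObj;
    public GameObject leftReflectionCamera;
    private Vector3 leftMainCamPos;

    void OnPreRender() {
        // Render the left reflection camera manually and update the reflection texture
        leftReflectionCamera.GetComponent<Camera>().Render();
        reflectiveObj.GetComponent<Renderer>().material.SetTexture("_ReflectionTex",
            leftReflectionCamera.GetComponent<Camera>().targetTexture);
        // Pass the position of the main left camera to the shader in world coordinates
        leftMainCamPos = gameObject.GetComponent<Camera>().transform.position;
        reflectiveObj.GetComponent<Renderer>().material.SetVector("_StereoCamPosWorld",
            new Vector4(leftMainCamPos.x, leftMainCamPos.y, leftMainCamPos.z, 1));
    }
}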

 

The reflection texture _ReflectionTex is updated in the shader by the left and right reflection cameras alternately. It is worth checking that the reflection cameras are in sync with the main camera rendering: we can set the reflection cameras to update the reflection texture with different colours and confirm that each eye sees the expected one. The screenshot below, taken from the device, shows a stable picture of the reflective surface (the chessboard) for each eye.

 

CardboardTestingSynchronization.png

Figure 4. Left/Right main camera synchronization with runtime reflection texture.

 

The OnPreRender method in the script can be further optimized, as it was in the previous blog, to ensure that it only runs when the reflective object needs to be rendered. Refer to the previous blog for how to use the OnWillRenderObject callback to determine when the reflective surface needs to be rendered; a minimal sketch of this gating is shown below.
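
As a rough illustration only (the full approach is described in the previous blog), the gating might look something like this: a small script on the reflective object raises a flag when a camera is about to render it, and the OnPreRender methods above early-out while that flag is clear. The class and flag names here are hypothetical.

using UnityEngine;

// Hypothetical sketch: attach to the reflective object (e.g. the chessboard).
// OnWillRenderObject is called when a camera is about to render this object,
// so we use it to flag that the reflection cameras need to run this frame.
public class ReflectiveObjectVisibility : MonoBehaviour {
    public static bool needsReflection;

    void OnWillRenderObject() {
        needsReflection = true;
    }
}

The scripts attached to the main cameras would then check ReflectiveObjectVisibility.needsReflection at the start of OnPreRender, skip the reflection rendering when it is false, and clear the flag once the reflection texture has been updated.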

 

Rendering stereo reflections based on local cubemap from static objects

 

To render reflections based on static local cubemaps we need to calculate the reflection vector in the fragment shader and apply the local correction to it. The locally corrected reflection vector is then used to fetch the texel from the cubemap and render the reflection [1]. Rendering stereo reflections based on static local cubemaps means that we need to use a different reflection vector for each eye.

 

The view vector D is built in the vertex shader and is passed as a varying to the fragment shader:

 

D = vertexWorld - _WorldSpaceCameraPos;

 

In the fragment shader, D is used to calculate the reflection vector R, according to the expression:

 

R = reflect(D, N);

 

where N is the normal to the reflective surface.

 

To implement stereo reflections we need to provide the vertex shader with the positions of the left and right main cameras to calculate two different view vectors and thus two different reflection vectors.

 

The last instruction in the scripts attached to the main left and right cameras sends the position of the main left/right cameras to the shader and updates the uniform _StereoCamPosWorld. This uniform is then used in the vertex shader to calculate the view vector:

 

D = vertexWorld - _StereoCamPosWorld;

 

Once reflections from both static and dynamic objects have been implemented in “stereo mode” we can feel the depth in the reflections rendered on the chessboard when seen through the Google Cardboard headset.

CardboardStereoReflections.png

Figure 5. Stereo reflections on the chessboard.

Conclusions

 

The local cubemap technique for reflections allows rendering of high quality and efficient reflections from static objects in mobile games. When combined with other techniques it allows us to achieve higher reflection quality at very low cost.

 

Implementing stereo reflections in VR contributes to building a realistic virtual world and achieving the sensation of full immersion we want the VR user to enjoy. In this blog we have shown how to implement stereo reflections in Unity for Google Cardboard with minimal impact on performance.

 

References

 

  1. Reflections Based on Local Cubemaps in Unity
  2. Combined Reflections: Stereo Reflections in VR

Vulkan & Validation Layers


Why the validation layers?

Unlike OpenGL, Vulkan drivers don't have a global context, don't maintain a global state and don't have to validate inputs from the application side. The goal is to reduce CPU consumption by the drivers and give applications a bit more freedom in engine implementation. This approach is feasible because a reasonably good application or game should not provide incorrect input to the drivers in release mode, and all the internal checks drivers usually do are therefore a waste of CPU time. However, during the development and debugging stages, a mechanism for detecting invalid input is a useful and powerful tool which can make a developer's life a lot easier. With Vulkan, all input validation has been moved out of the driver into a separate standalone module called the validation layers. While debugging or preparing a graphics application for release, running the validation layers is a good self-assurance that there are no obvious mistakes being made by the application. While "clean" validation layers don't necessarily guarantee a bug-free application, they’re a good step towards a happy customer. The validation layers are an open source project which belongs to the Khronos community, so everyone is welcome to contribute or raise an issue: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues

 

My application runs OK on this device. Am I good to ship it?

No you are not! The Vulkan specification is the result of contributions from multiple vendors and, as such, the API offers functionality that is essential for Vendor A but may be somewhat irrelevant to Vendor B. This is especially true for Vulkan operations that are not directly observable by applications, for instance layout transitions, execution of memory barriers etc. While applications are required to manage resources correctly, you don't know exactly what happens on a given device when, for example, a memory barrier is executed on an image sub-resource. In fact, it depends heavily on the specifics of the memory architecture and GPU. From this perspective, mistakes in areas such as sharing of resources, layout transitions, selecting visibility scopes and transferring resource ownership may have different consequences on different architectures. This is really a critical point, as incorrectly managed resources may not show up on one device due to the implementation options chosen by the vendor, but may prevent the application from running on another device, powered by another vendor.

 

Frequently observed application issues with the Vulkan driver on Mali.

 

External resource ownership.

Resources like presentable images are treated as external to the Vulkan driver, meaning that it doesn’t have ownership of them. The driver obtains a lock on such an external resource on a temporary basis to execute a certain rendering operation or a series of rendering operations. When this is done the resource is released back to the system. When ownership is changed to the driver's, the external resource has to be mapped and given valid entries in the MMU tables in order to be correctly read/written on the GPU. Once graphics operations involving the resource are finished, it has to be released back to the system and all the MMU entries invalidated. It is the application's responsibility to tell the driver at which stage the ownership of a given external resource is supposed to change, by providing this information as part of the render pass creation structure or as part of the execution of a pipeline barrier.

 

For example, when the presentable resource is expected to be in use by the driver, its layout is transitioned from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to VK_IMAGE_LAYOUT_GENERAL or VK_IMAGE_LAYOUT_COLOR{DEPTH_STENCIL}_ATTACHMENT_OPTIMAL. When rendering to the attachment is done and it is expected to be presented on the display, the layout needs to be transitioned back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.

 

Incorrectly used synchronization

The lifetime of Vulkan objects is another critical area in Vulkan applications. The application must ensure that Vulkan objects, or the pools they were allocated from, are destroyed or reset only when they are no longer in use. The consequence of incorrectly managing object lifetimes is unpredictable; the most likely problem is MMU faults that result in rendering issues and the loss of the device. Most of these situations can be caught and reported by the validation layers. For example, if the application tries to reset a command pool while a command buffer allocated from it is still in flight, the validation layers should intercept it with the following report:

 

[DS] Code 54: Attempt to reset command pool with command buffer (0xXXXXXXXX) which is in use

 

Another example: when the application tries to record commands into a command buffer which is still in flight, the validation layers should intercept it with the following report:

 

[MEM] Code 9: Calling vkBeginCommandBuffer() on active CB 0xXXXXXXXX before it has completed.
You must check CB fence before this call.

 

Memory requirements violation.

Vulkan applications are responsible for providing memory backing for image or buffer objects via the appropriate calls to vkBindBufferMemory or vkBindImageMemory. The application must not make assumptions about the memory requirements of an object, even if it is, for example, a VkImage object created with VK_IMAGE_TILING_LINEAR tiling, as there is no guarantee of contiguous memory. Allocations must be done based on the size and alignment values returned from vkGetImageMemoryRequirements or vkGetBufferMemoryRequirements. Data upload to the sub-resource must then be done with respect to the sub-resource layout values, such as the offset to the start of the sub-resource, its size, and the row/array/depth pitch values. Violating the memory requirements of a Vulkan object can often result in segmentation faults or MMU faults on the GPU and eventually VK_ERROR_DEVICE_LOST. It is recommended to run the validation layers as a means of protection against these kinds of issues. While the validation layers can detect situations like memory overflow, cross-object memory aliasing and mapping/unmapping issues, insufficient memory being bound isn't currently detected by the validation layers today.

Mali Graphics Debugger V4.0 Released


Recently we released V4.0 of the Mali Graphics Debugger. This is a key release that greatly improves the Vulkan support in the tool. The improvements are as follows:

 

Frame Capture has now been added for Vulkan: This is a hugely popular feature that has been available to OpenGL ES MGD users for several years. Essentially it is a snapshot of your scene after every draw call as it is rendered on the target. This means that if there is a rendering defect in your scene you immediately know which draw call is responsible. It is also a great way to see how your scene is composed, which draw calls contribute to your scene, and which draw calls are redundant.

 

Property Tracking for Vulkan: As MGD tracks all of the API calls that occur during an application's execution, it has pretty extensive knowledge of all of the graphics API assets that exist in the application. This spans everything from shaders to textures. Here is a list of Vulkan assets that are now tracked in MGD: pipelines, shader modules, pipeline layouts, descriptor pools, descriptor sets, descriptor set layouts, images, image views, device memories, buffers and buffer views.

 

Don't forget you can have your say on features we develop in the future by filling out this short survey:

https://www.surveymonkey.co.uk/r/developer-tools-survey
