Channel: ARM Mali Graphics

ARM Mali GPUs: Striking the perfect balance of power & efficiency


Some devices, applications or use cases require the absolute peak of performance capability in order to deliver on their requirements. Some devices, applications or use cases however, need to save every little bit of energy expenditure in order to deliver extended battery power and run within the bounds of a thermally limited form factor. So how do we decide which end of the spectrum to target? Here in Team Mali, we don’t. Mali, the number 1 shipping GPU in the world, has reached such heights partly because it is able to target every single use case across this range. From the most powerful of mobile VR headsets needing lightning-fast refresh rates, to the tiniest of smartwatches required to run for as long as physically possible, there really is a Mali GPU for every occasion.


This mini-series of blogs will first introduce the overall scalability and flexibility of the ARM Mali range before taking a deeper dive into two products from either end of the spectrum. We will examine how these products have incorporated Mali in order to target the perfect balance of performance and efficiency their device requires. Not only does this flexibility help our partners reduce their time to market but it also means they can carefully balance resources to target the ideal positioning for their product.

 

So many choices, so little time

There are three product roadmaps in the Mali family: Ultra low power, High area efficiency and High performance. These groupings allow partners to easily select the right set of products for their device’s needs. The Ultra low power range includes the Mali-400 GPU, one of the first in the ARM range of GPUs and still the world’s favourite option with over 25%* market share all by itself. The latest product in this roadmap is Mali-470, featuring advanced energy saving features to bring smartphone quality graphics to low power devices like wearables and Internet of Things applications. It halves the power consumption of the already hyper efficient Mali-400 in order to provide even greater device battery life and extended end use.

 

The high area efficiency roadmap is focused on providing optimum performance in the smallest possible silicon area to reduce cost of production for mass market smartphones, tablets and DTVs. IP in this roadmap includes Mali-T820 & Mali-T830, a pair of products which incorporate the cost and energy saving features of their predecessor, Mali-T720, with the superior power of the simultaneously released high performance Mali-T860. The first cost efficient ARM Mali GPUs to feature ARM Frame Buffer Compression, these represented a big step up in terms of the flexibility to balance power and performance.

 

The high performance roadmap is exactly as you might expect based on the name. It features the latest and greatest in GPU design to optimize performance for high end use cases and premium mobile devices. The Mali-T880 represents the highest performing GPU based upon ARM’s famous Midgard architecture and is powering many of today’s high end devices including the Samsung Galaxy S7, the Huawei P9 smartphone as well as a whole host of awesome standalone VR products. You may have read recently of our brand new high performance GPU on the market, Mali-G71. The change in naming format indicates another step up in Mali GPU architecture with the advent of the Bifrost architecture. The successor to Midgard, Bifrost has been strategically designed to support Vulkan, the new graphics API from Khronos, which is giving developers a lot more control as well as a great new feature set especially for mobile graphics. Not only that but it’s also been designed to exceed the requirements of today’s advanced content, like 360 video and high end gaming, and support the advanced requirements of growing industries like virtual reality, augmented reality and computer vision.

[Image: Mali-G71 chip diagram]

The possibilities are endless…

A large part of the flexibility inherent in the Mali range of products is down to the inbuilt scalability. Mali-400 came into being as the first dual core implementation of the original Mali-200 GPU once it became apparent there was a lot to be gained from this approach. High end Midgard based GPUs like Mali-T860 and Mali-T880 scale from 1 to 16 cores to allow even greater choice for our partners. We’ve seen configurations featuring up to 12 of those available cores at the top end of today’s premium smartphones to support specific use cases like mobile VR, where the requirements push the boundaries of mobile power limits. The new Bifrost GPU, Mali-G71, takes that to another level again with the ability to scale up to a possible 32 cores. The additional options were deemed necessary in order to comfortably support not only today’s premium use cases like mobile VR, but also to allow room to adapt to the growing content complexity we’re seeing every day.

 

After the customer has established their required number of cores there is still a lot of scope for flexibility within the configuration itself. Balances can be reached between power, performance and efficiency in the way the chipset is implemented in order to provide another level of customizable options. The following images show a basic example of the flexibility inherent in the configuration of just one Mali based chipset but this is just the tip of the iceberg.

[Image: example configuration table for one Mali based chipset]

Example optimization points of one Mali GPU

[Image: graph of example optimization points]

 

 

Practical application

In the next blog we’ll be examining an example of a Mali implementation in a current high performance device and how the accelerated performance and graphical capability supports next-level mobile content. Following on from that we’ll look at a device with requirements to keep power expenditure to a minimum and how Mali’s superior power and bandwidth saving technologies have been implemented to achieve this. The careful balance between power and efficiency is an eternal problem in the industry but one we are primed to address with the flexibility and scalability of the ARM Mali range.

 

*Unity Mobile (Android) Hardware Stats 2016-06


What we’re up to at SIGGRAPH 2016


This year’s SIGGRAPH is the 43rd international conference and exhibition on Computer Graphics & Interactive Techniques and takes place from the 24th to 28th July in Anaheim, California. A regular event on the ARM calendar, it promises another great turnout with heaps to do and see, from all the established faces in the industry to some of the hot new tech on the scene.

Moving Mobile Graphics

A particularly exciting part of SIGGRAPH this year is the return of the popular Moving Mobile Graphics course. Taking place on Sunday 24th July from 2pm to 5.15pm, this half day course will take you through a technical introduction to the very latest in mobile graphics techniques, with a particular focus on mobile VR. Talks and speakers include:

  • Welcome & Introduction - Sam Martin, ARM
  • Best Practices for Mobile - Andrew Garrard, Samsung R&D UK
  • Advanced Real-time Shadowing - mbjorge, ARM
  • Video Processing with Mobile GPUs - Jay Yun, Qualcomm
  • Multiview Rendering for VR - Cass Everitt, Oculus
  • Efficient use of Vulkan in UE4 - Niklas Smedberg, Epic Games
  • Making EVE: Gunjack - Ray Tran, CCP Games Asia

Visit the course page for more information. Slides will be available after the event so sign up to our Graphics & Multimedia Newsletter to be sure to receive all the latest in ARM Mali news.

 

Tech Talk

We’ll also be giving a talk on Practical Analytic 2D Signed-Distance Field Generation. Unlike existing methods, which first rasterize a path to a bitmap and then derive the SDF, we calculate the minimum distance from each pixel to the nearest segment directly from a path description comprised of line segments and Bezier curves. The method is novel because none of the existing techniques work in vector space, and our distance calculations are done in canonical quadratic space. Be sure to come along to Ballroom B on Thursday from 15:45-17:15 to learn about this ground-breaking technique.
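As a flavour of the geometry involved (this is a generic illustration of distance-to-segment computation, not the talk’s actual algorithm), the minimum distance from a pixel to a single line segment can be found analytically by projecting the point onto the segment:

```python
# Standard analytic minimum distance from a point to a line segment.
# The Bezier case solves a similar per-curve minimization; this sketch only
# covers the line-segment primitive mentioned in the path description.
def point_segment_distance(px, py, ax, ay, bx, by):
    abx, aby = bx - ax, by - ay            # segment direction vector
    apx, apy = px - ax, py - ay            # point relative to segment start
    ab_len_sq = abx * abx + aby * aby
    # Project the point onto the segment, clamping t to [0, 1] so the
    # closest point never falls outside the endpoints.
    t = 0.0 if ab_len_sq == 0 else max(0.0, min(1.0, (apx * abx + apy * aby) / ab_len_sq))
    cx, cy = ax + t * abx, ay + t * aby    # closest point on the segment
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

# Point directly above the middle of a horizontal segment:
print(point_segment_distance(0.0, 1.0, -1.0, 0.0, 1.0, 0.0))  # 1.0
```

Evaluating this for every pixel against every nearby segment, keeping the minimum, yields the unsigned distance field; the sign is then derived from winding.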

 

Poster session

Elsewhere at the event we’ll be talking about Optimized Mobile Rendering Techniques Based on Local Cubemaps. The static nature of the local cubemap allows for faster and higher quality rendering, and the fact that we use the same texture every frame guarantees high quality shadows and reflections with none of the pixel instabilities present in other runtime rendering techniques. Also, as there are only read operations involved when using static cubemaps, bandwidth use is halved, which is especially important on mobile devices where bandwidth must be carefully balanced at runtime. Our Connected Community members have already produced a number of blogs on this subject and have demonstrated how to work with soft dynamic shadows, reflections and refractions amongst other great techniques. Check these out here and come along to the event to speak to our experts!

Pokemon Go-es to show the power of AR


Ok I did it. I downloaded Pokemon Go. Yes I was trying to resist, yes it was futile, yes it’s an awesome concept. Whilst a strong believer in Virtual Reality as a driving force in how we will handle much of our lives in the future (see my extensive blog series on the subject), I can see that apps like this have the potential to take Augmented Reality (AR) mainstream much faster than VR. What with the safety (and aesthetic) issues inherent in walking round with a headset on, AR allows you to enter a semi immersive environment but still see the world around you. Although that fact doesn’t negate the need for a warning not to walk blindly into traffic mid-game. By overlaying graphics, user interface and interactive elements over the real world environment we can experience a much more ‘real life’ feel to gaming. The fact that it also gets a generation of console gamers on their feet and out into the big wide world is just an added bonus.

 

It turns out Pokemon Go isn’t the company’s first attempt at this kind of application. Back in 2012 they sought users to test a beta version of a similar real world game based on spies. The idea was that you followed the map on your phone to relevant locations to solve puzzles, make drops etc. You could argue that the reason this has taken off when that didn’t is that it now has the marketing superpower of Pokemon and Nintendo behind it, but I think it’s a little more than that. All anyone in the tech industry has been talking about in recent months is VR, AR and Computer Vision and this uses two of the three straight away. Not only that but it does so in a form that’s accessible to absolutely everyone with a smartphone (and in its early days, an external battery pack for those who want to use it for more than about ten minutes).

[Image: ross.hookway & alexmercer catching the Pokemon bug at the ARM Cambridge campus]

 

The idea of playing an adventure style game in my home city appeals to me anyway. The fact that Pokemon Go overlays itself onto your actual surroundings, rather than just as a point on an animated map, makes it a whole lot more relatable. This is where Computer Vision comes in, as your phone has to be able to recognise and interpret the locations and landmarks it sees in order to use AR to realistically overlay the Pokemon onto your surroundings. Without computer vision it could prove difficult to avoid bugs like trapping Pokemon in unreachable environments, or enticing people into dangerous situations.

 

There’s been something of a misconception that you need ‘special’ dedicated chips to be able to do things like computer vision, and that the subsequent additional silicon is unfeasible in the mobile form factor, but this just isn’t the case. Not only can you do this level of basic computer vision exclusively on the CPU, but some companies also have an engine which can recognize if your device has an ARM Mali GPU and automatically redirect some of the workload to it. Not only does this free up the processing power and bandwidth of the CPU but it also allows us to access the superior graphical capabilities of the existing GPU with no need for additional hardware.

 

The huge and lightning fast adoption of Pokemon Go, in spite of its quite considerable bugs and glitches, demonstrates just how keen we are to jump on board with the next big thing in smartphones. It also shows that a new, and potentially confusing, technology can reach global uptake simply due to clever and compelling packaging. Whilst I fully expect the game to be optimized and bug free in a very short time, it will also no doubt prompt a wave of similar concept applications. I’ll be interested to see how this develops and whether (or maybe when) it will make AR truly the next big thing.

Nibiru’s next-generation VR headsets harness the power of Mali


In the first of my Mali™ Power & Efficiency blogs we looked at the inbuilt flexibility and scalability of the Mali range of GPUs. It’s this that allows ARM® partners to target exactly the right performance and efficiency balance to suit their specific product, whether that’s a low power smartwatch or a top of the range premium smartphone. In this blog we’re going to take a deeper dive into the high performance end of this spectrum and look at one key ecosystem partner, Nibiru, who are implementing High Performance Mali GPUs in their range of awesome, standalone VR headsets.

 

Anyone who’s read my previous blogs on the growing market for Virtual Reality knows I feel strongly that VR and AR are set to change the way we work, live and play. Like most things in life however, it’s not that simple. In order to achieve a truly immersive, high quality VR experience there are some technical challenges we need to overcome. We’ve discussed the need for clear focus to help our brains believe what we’re seeing, we’ve talked about how low latency is key to avoiding nausea and dizziness, and we’ve looked at the future of the field with eye tracking and foveated rendering.

 

As you can see from the image below, there are a lot of intricate elements to be balanced in a VR headset and ensuring each of them is just right is not an easy task. By featuring a high quality, WQHD (2560x1440), 5.67” Samsung AMOLED display, Nibiru ensures the user can experience the clearest imagery with the crispest possible colors due to the advanced technology of the screen. Every single pixel in an AMOLED display provides its own light source, whereas a typical LCD screen is continuously backlit by white LEDs. Because colors are achieved by individually driving each self-emitting pixel, it is possible to get brighter and sharper hues with stronger saturation. You can also turn off sections of the panel to achieve a deeper, truer black than is typically possible on a continuously-lit LCD. This is also beneficial for VR due to the latency reduction benefits discussed previously.


 

 

So how do we make all this come together into a truly awesome VR product? The answer is power.

A High Performance GPU is essential to achieving a truly great VR experience and Nibiru recognised this when they started designing their VR products. Focusing on mobile VR, Nibiru initially launched their VR OS and VR Launcher to support virtual reality via smartphone, followed by their VR ROM when they began designing standalone devices. With around three million headsets shipped in 2016 so far, this is a company getting ahead of the VR curve. Their latest high end product, the Pro One Plus, is due for release towards the end of 2016 and uses one of the most powerful Mali-based SoCs available, the Samsung Exynos 8890. This SoC features an MP12 configuration of the Mali-T880, the highest performing Mali GPU currently appearing in devices. Powering the Samsung Galaxy S7, and therefore the Samsung Gear VR, the Exynos 8890 has already proven its merits in the high performance smartphone space and is a perfect fit for a standalone VR device like Nibiru’s.

 

The Exynos implementation of MP12 is the highest number of cores we’ve seen in a Mali-T880 based chipset but we’re due for yet another step up with the recently released Mali-G71 which can scale up to 32 cores, double that available in the Mali-T880. Operating on Nibiru’s in-house VR OS this new device has 3GB RAM, 32GB in-built memory, HDMI input and supports customized third party VR apps for gaming, video streaming and more. It’s also optimized for Google Play and YouTube to make sure you never run out of awesome content.

[Image: Pro One Plus provisional design (powered by Nibiru)]

 

So why did Nibiru choose ARM Mali to power their devices? Nibiru co-founder Tony Chia explained that it was very important to them to choose a GPU that could effectively provide sufficient performance levels to ensure a smooth VR experience with minimal latency. He went on to explain that ‘user experience is very important to us and to make sure we can bring a great mobile VR experience to the mass market we had to have the right hardware in place from the beginning. Our initial focus has been around providing excellent VR video and experience based applications rather than high end gaming due to the challenges of interacting with a virtual environment. ARM Mali GPUs allow us to choose an SoC that gives us peak performance whilst still saving power and extending battery life as long as possible.’ Not only are Mali GPUs scalable to allow multiple core implementation options but the very way the chipset is configured allows vast scope for customization too. ARM Mali’s specialized bandwidth saving technologies like ARM Frame Buffer Compression (AFBC) and Adaptive Scalable Texture Compression (ASTC) contribute to efficiency by reducing bandwidth and freeing up power where it’s needed most.

 

With its sleek wireless design, Nibiru’s next generation, standalone VR headset represents the future of mobile VR. As we continue to work together on the ARM & Nibiru Joint Innovation Lab we aim to help streamline the game development process and enable fantastic content to complement it. Here in the ARM Mali team we can’t wait to see what they come up with next!

IndieDev innovation leads the VR revolution


As a graphics blogger I’m always interested in the next big thing in tech and particularly, VR, so when I saw a news item about a guy cycling the length of the country from the comfort of his living room I just had to know more.

 

I, for one, hate spin classes. I go because I know logically that it’s doing me good but my common complaint (to just about anyone who’ll listen) is ‘but what’s the point of pedalling your backside off when you’re going nowhere?!’ Well it’s almost as if innovative developer Aaron Puzey heard my lament and decided to address it with his Cycle VR app, so clearly I had to track him down. A little bit of cyber stalking and a desperate bid for more information led me to the kind of innovation that is at the heart of what we’re trying to do in bringing VR and AR to the mass market.

 

In deciding which platform to target in developing his app, Aaron realised that whilst console and desktop options are taking off with releases like the HTC Vive, ‘In two years time EVERYONE will have a phone capable of VR. It seems like an obvious market to head for.’ In choosing the Samsung Gear VR as his mobile platform of choice, Aaron was able to utilize the superior visual quality of the Galaxy S6’s AMOLED display as well as the high powered Mali-T760 MP8 GPU as part of the inbuilt Exynos 7420 SoC. The Mali High Performance range of GPUs supports the demanding performance requirements of VR whilst saving maximum power and bandwidth to ensure a super slick experience.

[Image: The stereoscopic display allows for minor differences between each eye, providing a sense of depth]

 

We know that some of the key challenges to a successful VR experience are latency, framerate and resolution, but these tricky areas didn’t actually present a problem for Aaron. As he was working with a single mesh and single texture with very little geometry to add complexity, he was able to achieve the quality he needed without too many issues. The biggest struggle was transforming the data from Google Maps Street View into a working 3D model because of the lack of available information, but with a little digging Aaron was able to make use of work already done on this elsewhere. He initially attempted to stream the data live but the lack of multithreading in Unity meant this was causing a stall on every new texture load. He’s discovered the best way around this is to cache the required data prior to each session and run it offline until a workaround to prevent the stalls can be found. The camera moves smoothly from one panorama to the next, producing some visual distortion but keeping the motion of the bike as realistic as possible.

 

We’ve seen lots of different ways of navigating a VR environment beginning to crop up, from VR chairs and stools which respond like a Segway when you lean in the direction of travel, to fully encapsulated treadmills. The latter let you move, walk and run freely around your virtual environment without the risk of crashing into people, pets or objects. However, instead of relying on expensive, dedicated hardware like these, Aaron simply customised his own existing exercise bike using a simple cadence monitor to record the RPM. Whilst it doesn’t measure the amount of effort put in, just the distance travelled, by simply adjusting the bike’s friction setting to emulate real road conditions Aaron could get a pretty accurate output.


So, whilst still in its early stages, Aaron has high hopes for the project and is looking for the right partner to take it to the broader market. With plans to enhance the user experience, including adding multiplayer capability so you can race your friends cross country, I for one can’t wait to get my hands on the commercial version and ditch dull spin classes for good!

 

Got a great developer story? Get in touch!

 

Twitter: @FreddiJeffries

The Mali GPU: An Abstract Machine, Part 4 - The Bifrost Shader Core


We have recently announced the first GPU in the Mali Bifrost architecture family, the Mali-G71. While the overall rendering model it implements is similar to that of previous Mali GPUs (the Bifrost family is still a deeply pipelined tile-based renderer; see the first two blogs in this series, The Mali GPU: An Abstract Machine, Part 1 - Frame Pipelining and The Mali GPU: An Abstract Machine, Part 2 - Tile-based Rendering, for more information), there are sufficient changes in the programmable shader core to require a follow-up to the original "Abstract Machine" blog series.

 

In this blog, I introduce the block-level architecture of a stereotypical Bifrost shader core, and explain what performance expectations application developers should have of the hardware when it comes to content optimization and understanding the hardware performance counters exposed via tools such as DS-5® Streamline. This blog assumes you have read the first two parts in the series, so I would recommend starting with those if you have not read them already.

 

GPU Architecture

 

The top-level architecture of a Bifrost GPU is the same as the earlier Midgard GPUs.

 

[Image: Mali GPU top-level architecture]

 

 

The Shader Cores

 

Like Midgard, Bifrost is a unified shader core architecture, meaning the design contains only a single class of shader core, capable of executing all types of shader programs and compute kernels.

 

The exact number of shader cores present in a particular silicon chip varies; our partners can choose how many shader cores they implement based on their performance needs and silicon area constraints. The Mali-G71 GPU can scale from a single core for low-end devices all the way up to 32 cores for the highest performance designs.

 

Work Dispatch

 

The graphics work for the GPU is queued in a pair of queues, one for vertex/tiling/compute workloads and one for fragment workloads, with all work for one render target being submitted as a single submission into each queue.

 

The workload in each queue is broken into smaller pieces and dynamically distributed across all of the shader cores in the GPU, or in the case of tiling workloads to a fixed function tiling unit. Workloads from both queues can be processed by a shader core at the same time; for example, vertex processing and fragment processing for different render targets can be running in parallel (see the first blog for more details on this pipelining methodology).

 

Level 2 Cache and Memory Bandwidth

 

The processing units in the system share a level 2 cache to improve performance and to reduce memory bandwidth caused by repeated data fetches. The size of the L2 cache is configurable by our silicon partners depending on their requirements, but is typically 64KB per shader core in the GPU.

 

The number of bus ports out of the GPU to main memory, and hence the available memory bandwidth, depends on the number of shader cores implemented. In general we aim to be able to write one 32-bit pixel per core per clock, so it would be reasonable to expect an 8-core design to have a total of 256 bits of memory bandwidth (for both read and write) per clock cycle. The maximum number of AXI ports has been increased over Midgard, allowing larger configurations with more than 12 cores to access a higher peak bandwidth per clock if the downstream memory system can support it.
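As a back-of-the-envelope illustration of the figure above (the clock frequency here is an assumed example, not a product specification):

```python
# Theoretical write bandwidth for a Mali configuration, assuming (as stated
# above) one 32-bit pixel written per core per clock. Illustrative only.
def theoretical_write_bandwidth_gb_per_sec(num_cores, gpu_clock_hz):
    bytes_per_clock = num_cores * 32 // 8   # one 32-bit pixel per core per clock
    return bytes_per_clock * gpu_clock_hz / 1e9

# Example: an 8-core design (256 bits = 32 bytes per clock) at an assumed 600 MHz
print(theoretical_write_bandwidth_gb_per_sec(8, 600e6))  # 19.2 (GB/s)
```

As the next paragraph notes, the usable figure also depends on the AXI clock and the downstream memory system, so real applications see less than this theoretical number.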

 

Note that the available memory bandwidth depends on both the GPU (frequency, AXI port width) and the downstream memory system (frequency, AXI data width, AXI latency). In many designs the AXI clock will be lower than the GPU clock, so not all of the theoretical bandwidth of the GPU is actually available to applications.

 

The Bifrost Shader Core

 

All Mali shader cores are structured as a number of fixed-function hardware blocks wrapped around a programmable core. The programmable core is the largest area of change in the Bifrost GPU family, with a number of significant changes over the Midgard "Tripipe" design discussed in the previous blog in this series:

 

[Image: Bifrost shader core block diagram]

 

The Bifrost programmable Execution Core consists of one or more Execution Engines – three in the case of the Mali-G71 – and a number of shared data processing units, all linked by a messaging fabric.

 

The Execution Engines

 

The Execution Engines are responsible for actually executing the programmable shader instructions, each including a single composite arithmetic processing pipeline as well as all of the required thread state for the threads that the execution engine is processing.

 

The Execution Engines: Arithmetic Processing

 

The arithmetic units in Bifrost implement a quad-vectorization scheme to improve functional unit utilization. Threads are grouped into bundles of four, called a quad, and each quad fills the width of a 128-bit data processing unit. From the point of view of a single thread this architecture looks like a stream of scalar 32-bit operations, which makes achieving high utilization of the hardware a relatively straightforward task for the shader compiler. The example below shows how a vec3 arithmetic operation may map onto a pure SIMD unit (pipeline executes one thread per clock):

 

[Image: vec3 operation mapped onto a pure SIMD unit]


... vs a quad-based unit (pipeline executes one lane per thread for four threads per clock):

 

[Image: vec3 operation mapped onto a quad-based unit]

 

The power efficiency and performance provided by the narrower than 32-bit types is still critically important for mobile devices, so Bifrost maintains native support for int8, int16, and fp16 data types which can be packed to fill the 128-bit data width of the data unit. A single 128-bit maths unit can therefore perform 8x fp16/int16 operations per clock cycle, or 16x int8 operations per clock cycle.
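The packing arithmetic above can be captured in a one-line model:

```python
# Operations per clock for a single 128-bit data unit, assuming full packing
# of narrow types as described above. Illustrative, not a hardware spec.
DATA_WIDTH_BITS = 128

def ops_per_clock(type_bits):
    return DATA_WIDTH_BITS // type_bits

print(ops_per_clock(32))  # 4  -- fp32/int32: one lane per thread in the quad
print(ops_per_clock(16))  # 8  -- fp16/int16 operations per clock
print(ops_per_clock(8))   # 16 -- int8 operations per clock
```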

 

The Execution Engines: Thread State

 

To improve performance and performance scalability for complex programs, Bifrost implements a substantially larger general-purpose register file for the shader programs to use. The Mali-G71 provides 64x 32-bit registers while still allowing the maximum thread occupancy of the GPU, removing the earlier trade off between thread count and register file usage described in this blog: ARM Mali Compute Architecture Fundamentals.

 

The size of the fast constant storage, used for storing OpenGL ES uniforms and Vulkan push constants, has also been increased which reduces cache-access pressure for programs using lots of constant storage.

 

Data Processing Unit: Load/Store Unit

 

The load/store unit handles all general purpose (non-texture) memory accesses, including vertex attribute fetch, varying fetch, buffer accesses, and thread stack accesses. It includes 16KB L1 data cache per core, which is backed by the shared L2 cache.

 

The load/store cache can access a single 64-byte cache line per clock cycle, and accesses across a thread quad are optimized to reduce the number of unique cache access requests required. For example, if all four threads in the quad access data inside the same cache line that data can be returned in a single cycle.
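A toy model of this merging behaviour: the number of cache accesses a quad needs is driven by how many distinct 64-byte lines its four addresses touch (addresses here are made-up examples):

```python
# Quad access-merging sketch: accesses falling in the same 64-byte cache
# line are serviced together, so the cost is the number of unique lines.
CACHE_LINE_BYTES = 64

def unique_cache_lines(addresses):
    return len({addr // CACHE_LINE_BYTES for addr in addresses})

# Four threads reading consecutive 4-byte elements: one line, one access.
print(unique_cache_lines([0, 4, 8, 12]))       # 1
# Four threads striding 256 bytes apart: four lines, four accesses.
print(unique_cache_lines([0, 256, 512, 768]))  # 4
```

This is why compute kernels with adjacent threads touching adjacent data tend to be much faster than ones with large per-thread strides.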

 

Note that this load/store merging functionality can significantly accelerate many data access patterns found in common OpenCL compute kernels, which are commonly memory access limited, so maximizing its utility in algorithm design is a key optimization objective. It is also worth noting that even though the Mali arithmetic units are scalar, data access patterns still benefit from well written vector loads, so we still recommend writing vectorized shader and kernel code whenever possible.

 

Data Processing Unit: Varying Unit

 

The varying unit is a dedicated fixed-function varying interpolator. It implements a similar optimization strategy to the programmable arithmetic units; it vectorizes interpolation across the thread quad to ensure good functional unit utilization, and includes support for faster fp16 interpolation.

 

The unit can interpolate 128-bits per quad per clock; e.g. interpolating a mediump (fp16) vec4 would take two cycles per four thread quad. Optimization to minimize varying value vector length, and aggressive use of fp16 rather than fp32 can therefore improve application performance.
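Those interpolation costs follow directly from the 128-bits-per-quad-per-clock figure, and can be estimated with a simple cycle model:

```python
# Varying interpolation cost per four-thread quad, following the quoted
# 128 bits per quad per clock. Illustrative estimate, not a hardware spec.
QUAD_SIZE = 4
BITS_PER_QUAD_PER_CLOCK = 128

def varying_cycles_per_quad(components, bits_per_component):
    total_bits = components * bits_per_component * QUAD_SIZE
    return -(-total_bits // BITS_PER_QUAD_PER_CLOCK)  # ceiling division

print(varying_cycles_per_quad(4, 16))  # 2 -- mediump (fp16) vec4
print(varying_cycles_per_quad(4, 32))  # 4 -- highp (fp32) vec4
print(varying_cycles_per_quad(2, 16))  # 1 -- mediump vec2
```

The fp32-vs-fp16 comparison makes the optimization advice concrete: halving varying precision halves the interpolation cycles.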

 

Data Processing Unit: ZS/Blend

 

The ZS and Blend unit is responsible for handling all accesses to the tile-memory, both for built-in OpenGL ES operations such as depth/stencil testing and color blending, and for the programmatic tile buffer access needed by functionality such as the shader framebuffer fetch and pixel local storage extensions.

 

Unlike the earlier Midgard designs, where the LS Pipe was a monolithic pipeline handling load/store cache access, varying interpolation, and tile-buffer accesses, Bifrost has implemented three smaller and more efficient parallel data units.  This means that tile-buffer access can run in parallel to varying interpolation, for example. Graphics algorithms making use of programmatic tile buffer access, which all tended to be very LS Pipe heavy on Midgard, should see a measurable reduction in contention for processing resources.

 

Data Processing Unit: Texture Unit

 

The texture unit implements all texture memory accesses. It includes a 16KB L1 data cache per core, which is backed by the shared L2 cache. The architectural performance of this block in the Mali-G71 is the same as in the earlier Midgard GPUs; it can return one bilinear filtered (GL_LINEAR_MIPMAP_NEAREST) texel per clock. For example, performing a bilinear texture lookup for each thread in a four-thread quad would take four cycles.

 

Some texture access modes require multiple cycles to generate data:

  • Trilinear filtering (GL_LINEAR_MIPMAP_LINEAR) requires two bilinear samples per texel and so requires two cycles per texel.
  • Volumetric 3D textures require twice the number of cycles of their 2D equivalents; e.g. a trilinear filtered 3D lookup takes 4 cycles and a bilinear filtered 3D lookup takes 2 cycles.
  • Wide texture formats (16 bits or more per color channel) may require multiple cycles per texel.

 

One exception to the wide format rule, which is a new optimization in Bifrost, is depth texture sampling. Sampling from DEPTH_COMPONENT16 or DEPTH_COMPONENT24 textures, which is commonly needed for both shadow mapping techniques and deferred lighting algorithms, has been optimized and is now a single cycle lookup, doubling the performance relative to GPUs in the Midgard family.
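The cycle rules above can be summarised in a small, hypothetical estimator (illustrative only; the ×2 factor for wide formats is an assumption, since such formats are only said to "may require multiple cycles"):

```python
# Hypothetical per-texel cycle estimator for a texture unit that returns one
# bilinear 2D texel per clock, based only on the rules listed above.
def texel_cycles(trilinear=False, volumetric=False, wide=False, depth=False):
    if depth:
        return 1        # DEPTH_COMPONENT16/24: optimized single-cycle lookup
    cycles = 1          # bilinear 2D baseline
    if trilinear:
        cycles *= 2     # two bilinear samples per texel
    if volumetric:
        cycles *= 2     # 3D textures cost twice their 2D equivalent
    if wide:
        cycles *= 2     # assumption: wide formats modelled as one extra pass
    return cycles

# A trilinear-filtered 3D lookup costs 4 cycles per texel, so a full
# four-thread quad would take 16 cycles for that one sample.
```

A sketch like this is handy when budgeting shader sampling cost: it shows immediately why, for example, shadow-map taps (depth) are cheap while trilinear volumetric fog taps are not.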

 

The Bifrost Geometry Flow

 

In addition to the shader core change, Bifrost introduces a new Index-Driven Vertex Shading (IDVS) geometry processing pipeline. Earlier Mali GPUs processed all of the vertex shading before tiling, often resulting in wasted computation and bandwidth related to the varyings which only related to culled triangles (e.g. outside of the frustum, or failing a facing test).

 

Blog5.png

 

The IDVS pipeline splits the vertex shader into two halves; one processing the position, and one processing the remaining varyings.

 

Blog6.png

 

This flow provides two significant optimizations:

  • The index buffer is read first, and vertex shading is only submitted for small batches of vertices where at least one vertex in each batch is referenced by the index buffer. This allows vertex shading to jump spatial gaps in the index buffer.
  • Varying shading is only submitted for primitives which survive the clip-and-cull phase; this removes a significant amount of redundant computation and bandwidth for vertices contributing only to triangles which are culled.

 

To get the most benefit from the Bifrost geometry flow it is useful to partially de-interleave packed vertex buffers: place attributes contributing to position in one packed buffer, and attributes contributing to non-position varyings in a second packed buffer. This means that the non-position varyings are not pulled into the cache for vertices which are culled and never contribute to an on-screen primitive. My colleague stacysmith has written a good blog on optimizing buffer packing to exploit this type of geometry processing pipeline here: Eats, Shoots and Interleaves.
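As an illustration of this split, here is a hypothetical helper (plain Python, with an example position/normal/uv layout) that partially de-interleaves a vertex array into a position buffer and a non-position buffer:

```python
# Sketch of partially de-interleaving a vertex buffer for an IDVS-style
# pipeline: position-contributing attributes go into one packed buffer,
# everything else into a second. Sizes are in floats; the layout below
# (position vec3, normal vec3, uv vec2) is just an example.
def split_vertex_buffer(interleaved, layout, position_attrs):
    stride = sum(size for _, size in layout)
    pos_buf, var_buf = [], []
    for base in range(0, len(interleaved), stride):
        offset = base
        for name, size in layout:
            attr = interleaved[offset:offset + size]
            (pos_buf if name in position_attrs else var_buf).extend(attr)
            offset += size
    return pos_buf, var_buf

verts = [0.0, 1.0, 2.0,  0.0, 0.0, 1.0,  0.5, 0.5,   # vertex 0
         3.0, 4.0, 5.0,  0.0, 1.0, 0.0,  0.1, 0.9]   # vertex 1
layout = [("position", 3), ("normal", 3), ("uv", 2)]
pos, rest = split_vertex_buffer(verts, layout, {"position"})
```

The two resulting arrays would then be uploaded as two separate packed VBOs, so the position-only vertex shading pass never touches the normal/uv data of culled vertices.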

 

Performance Counters

 

As with Midgard, we are planning a document which maps the hardware performance counters present in the GPU back to the block architecture presented in this blog, and to the relevant OpenGL ES and Vulkan concepts which could cause application performance inefficiencies. This document will be available shortly, and I will update this blog with a link when it's available.

 

Comments and questions welcomed as always,

Cheers,

Pete

 


Pete Harris is the lead performance engineer for the Mali OpenGL ES driver team at ARM. He enjoys spending his time working on a whiteboard with other engineers to determine how to get the best performance out of combined hardware and software compute sub-systems.

Optimizing Virtual Reality: Understanding Multiview


Introduction

As you may have seen, Virtual Reality (VR) is getting increasingly popular. From its modern origins on desktop it has quickly spread to other platforms, mobile being the most popular. Every time a new mobile VR demo comes out I am stunned by its quality; each one is a giant leap forward for content quality. As of today mobile VR is leading the way: being based on our everyday phones makes it the most accessible platform, and because you are not bound to a particular location or wrapped in cables, you can use it wherever you want, whenever you want.

 

As we all know, smooth framerate is critical in VR, where just a slight swing in framerate can cause nausea. The problem we are therefore facing is simple, yet hard to address. How can we keep reasonable performance while increasing the visual quality as much as possible?

 

As everybody in the industry is starting to talk about multiview, let us pause and take a bit of time to understand multiview, what kind of improvements one can expect and why you should definitely consider adding it to your pipeline.

 

Stereoscopic rendering

What is stereoscopic rendering? A full theoretical treatment is beyond the scope of this post, but the important point is that we need to trick the brain into perceiving the object as real 3D, not screen flat. To do this we give the viewer two points of view on the object, or in other words, emulate the way our eyes work: we generate two cameras with a slight horizontal offset, one on the left, the other on the right. They share the same projection matrix, but their view matrices differ. That way, we have two different viewpoints on the same scene.

Fig. 1: Stereo camera setup.
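As a rough sketch of this setup (plain Python, hypothetical helper names, illustrative values), both eyes can share one projection matrix while each view matrix is the head's view translated by half the eye separation along the local x axis; exact sign conventions vary by engine:

```python
# Illustrative stereo camera setup: one shared projection, two view
# matrices offset by +/- half the eye separation along x.
# Matrices are row-major 4x4 lists; all values are example numbers.
EYE_SEPARATION = 0.064  # metres, a typical inter-pupillary distance

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def stereo_views(head_view, separation=EYE_SEPARATION):
    half = separation / 2.0
    left = mat_mul(translation(+half, 0.0, 0.0), head_view)
    right = mat_mul(translation(-half, 0.0, 0.0), head_view)
    return left, right

# Head two metres back from the origin, eyes offset left/right.
left, right = stereo_views(translation(0.0, 0.0, -2.0))
```

Both resulting matrices feed the same scene and projection; only the x translation differs, which is exactly the pair of view matrices multiview later consumes as an array.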

Now, let us have a look at an abstract of a regular pipeline for rendering stereo images:

  1. Compute and upload left MVP matrix
  2. Upload Geometry
  3. Emit the left eye draw call
  4. Compute and upload right MVP matrix
  5. Upload Geometry
  6. Emit the right eye draw call
  7. Combine the left and right images onto the backbuffer

 

We can obviously see a pattern here: we are emitting two draw calls and sending the same geometry twice. While Vertex Buffer Objects can mitigate the latter, doubling the draw calls is still a major issue, as it adds significant overhead on your CPU. That is where multiview kicks in: it allows you to render the same scene from multiple points of view with a single draw call.

 

Multiview Double Action Extension

Before going into the details of the expected improvements, I would like to take a quick look at the code needed to get multiview up and running. Multiview currently exists in two major flavors: OVR_multiview and OVR_multiview2. While they share the same underlying mechanism, OVR_multiview restricts the use of the gl_ViewID_OVR variable to the computation of gl_Position. This means you can only use the view ID in the vertex shader's position computation; if you want to use it in your fragment shader, or in other parts of your shaders, you will need multiview2.

 

As antialiasing is one of the key requirements of VR, multiview also comes in a version with multisampling called OVR_multiview_multisampled_render_to_texture. This extension is built against the specification of OVR_multiview2 and EXT_multisampled_render_to_texture.

 

Some devices might only support some of the multiview extensions, so remember to always query your OpenGL ES driver before using one of them. This is the code snippet you may want to use to test if OVR_multiview is available in your driver:

const GLubyte* extensions = GL_CHECK( glGetString( GL_EXTENSIONS ) );
/* Note: strstr also matches longer names such as GL_OVR_multiview2,
 * so search for the exact token if you need to tell the variants apart. */
char* found_extension = strstr( (const char*)extensions, "GL_OVR_multiview" );
if ( NULL == found_extension )
{
    exit( EXIT_FAILURE );
}

 

In your code multiview manifests itself on two fronts; during the creation of your frame buffer and inside your shaders, and you will be amazed how simple it is to use it.

PFNGLFRAMEBUFFERTEXTUREMULTISAMPLEMULTIVIEWOVRPROC glFramebufferTextureMultisampleMultiviewOVR =
    (PFNGLFRAMEBUFFERTEXTUREMULTISAMPLEMULTIVIEWOVRPROC)eglGetProcAddress("glFramebufferTextureMultisampleMultiviewOVR");
/* target, attachment, texture, level, samples, baseViewIndex, numViews */
glFramebufferTextureMultisampleMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, textureID, 0, 4, 0, 2);

 

That is more or less all you need to change in your engine code. "More or less", because instead of sending a single view matrix uniform to your shader you now need to send an array containing the view matrix for each eye.

Now for the shader part:

#version 300 es

#extension GL_OVR_multiview : enable

layout(num_views = 2) in;

in vec3 vertexPosition;

uniform mat4 MVP[2];

void main()
{
    gl_Position = MVP[gl_ViewID_OVR] * vec4(vertexPosition, 1.0);
}

 

Simple isn’t it?

 

Multiview will automatically run the shader once per view, setting gl_ViewID_OVR to the index of the view currently being processed.

For more in depth information on how to implement multiview, see the sample code and article "Using Multiview Rendering".

 

Why use Multiview?

Now that you know how to implement multiview, I will try to give you some insights as to what kind of performance improvements you can expect.

The Multiview Timeline

Before diving into the numbers, let’s discuss the theory.

Fig. 2: Regular Stereo job scheduling timeline.

 

In this timeline, we can see how our CPU-GPU system is interacting in order to render a frame using regular stereo. For more in depth information on how GPU scheduling works on Mali, please see Peter Harris’ blogs.

 

First the CPU works to get all the information ready, then the vertex jobs are executed, and finally the fragment jobs. On this timeline the light blue sections are the jobs related to the left eye, the dark blue those for the right eye, and the orange the composition (rendering our two eyes side by side into a buffer).

  

Fig. 3: Multiview job scheduling timeline.


In comparison, this is the same frame rendered using multiview. As expected, since the CPU is only sending one draw call, we only process the frame once on the CPU. On the GPU the vertex job is also smaller, since the non-multiview parts of the vertex shader are no longer run twice. The fragment job, however, remains the same, as we still need to shade every pixel of the screen one by one.

Relative CPU Time

As we have seen, multiview mainly helps on the CPU side by reducing the number of draw calls you need to issue in order to draw your scene. Let us consider an application where the CPU is lagging behind the GPU, in other words one that is CPU bound.

Fig. 4: Scene used to measure performances.

 

In this application the number of cubes changes over time, starting from one and climbing to one thousand. Each cube is drawn with its own draw call; we could obviously use batching, but that is not the point here. As expected, the more cubes we add, the longer the frame takes to render. The graph below, where smaller is better, shows the relative CPU time of regular stereo (blue) and multiview (red). If you remember the timeline, this result is expected: multiview halves the number of draw calls and therefore the CPU time.

Fig. 5: Relative CPU time between multiview and regular stereo. The smaller the better, with the number of cubes on the x-axis and the relative time on the y-axis.

Multiview in red, and regular stereo in blue.

 

Relative GPU Time

On the GPU we run vertex and fragment jobs. As we saw in the timeline (Fig. 3), they are not equally affected by multiview; in fact, only the vertex jobs are. On Midgard and Bifrost based Mali GPUs, only the multiview-related parts of the vertex shader are executed once per view.

In our previous example we looked at relative CPU time; this time we have recorded the relative time of the GPU vertex jobs. Again, the smaller the better, with regular stereo in blue and multiview in red.

Fig. 6: Relative GPU time between multiview and regular stereo. The smaller the better, with the number of cubes on the x-axis and the relative time on the y-axis.

Multiview in red, and regular stereo in blue.

 

The savings are immediately visible on this chart as we are no longer computing most of the shader twice.

Wrap it up

From our measurements, multiview is the perfect extension for CPU bound applications, where you can expect improvements of between 40% and 50%. Even if your application is not yet CPU bound, multiview should not be overlooked, as it can also improve your vertex processing time at very little cost.

 

It is worth noting that multiview renders into an array of textures inside a framebuffer, so the result is not directly ready for the front buffer. You first need to render the two views side by side; this composition step is mandatory, but in most cases the time it takes is small compared to the rendering time and can be neglected. Moreover, this step can be integrated directly into the lens distortion or timewarp process.

 

Multiview Applications

The obvious way, and the one already discussed in this article, is to use multiview in your VR rendering pipeline: both views are then rendered using the same draw calls onto a shared framebuffer. If we try to think outside the box, though, it opens up a whole new field in which we can innovate.

Foveated Rendering

Each year our device screens get bigger, our content becomes increasingly complicated, and our rendering time budget stays the same. We have already seen what we can save on the CPU side, but sometimes fragment shaders are the real bottleneck. Foveated rendering is based on a physical property of the human eye: the fovea, only about 1% of the retina, maps to around 50% of the visual cortex.

 

Foveated rendering uses this property to only render high resolution images in the center of your view, allowing us to render a low resolution version on the edges.

Fig. 7: Example of an application using foveated rendering.

 

For more information on foveated rendering and eye tracking applications, you can have a look at Freddi Jeffries’ blog Eye Heart VR. Stay tuned for a follow-up of this blog on foveated rendering theory.

 

We then need to render four versions of the same scene, two per eye, one high, one low resolution. Multiview makes this possible by sending only one draw call for all four views.

Stereo Reflections

Fig. 8: A different reflection for each eye, demonstrated here in Ice Cave VR.

 

Reflections are a key factor in achieving true immersion in VR; however, as with everything in VR, they have to be in stereo. I won’t discuss the details of real time stereo reflections here; please see Roberto Lopez Mendez’s article Combined Reflections: Stereo Reflections in VR for that. In short, this method uses a secondary camera rendering a mirrored version of the scene. Multiview can help us achieve stereo reflection at little more than the cost of a regular reflection, making real time reflections viable in mobile VR.

Conclusions

As we have seen throughout this article, multiview is a game changer for mobile VR, as it allows us to lighten the load on our applications and finally treat the two similar views as one. Each draw call we save is a new opportunity for artists and content creators to add more life to their scenes and improve the overall VR experience.

 

If you are using a custom engine and OpenGL ES 3.0 for your project, you can already start working with multiview on some ARM Mali based devices, such as the Samsung Galaxy S6 and S7. Multiview is also drawing increased attention from industry leaders: Oculus, starting from Mobile SDK 1.0.3, directly supports multiview on Samsung Gear VR, and if you are using a commercial engine such as Unreal, plans are in progress to support multiview inside the rendering pipeline.

Bitesize Bifrost 2: System coherency


In the first Bitesize Bifrost blog we introduced you to our new GPU architecture, Bifrost, and looked specifically at the extensive optimization and power saving benefits provided by clause shaders.  This time around we’re looking at system coherency, which allows the CPU and GPU to more effectively collaborate on workloads, and why this was considered an important focus for our newest GPU architecture.

SC.png

In earlier systems there was no coherency between the CPU and GPU. If you created data on the CPU but wanted the GPU to work on it, the CPU needed to write the data out to main memory first so that the GPU could see and access it. However, as the CPU operates through a cache, it was difficult to be certain that all data had actually reached main memory rather than sitting only in the cache. This meant the cache had to be written back to main memory and invalidated (flushed) to ensure all the data was available to the GPU.

 

The issue this raises is that should you forget to flush the cache, you can’t be sure of the consequences. In some instances all the data would have been written out to main memory and you’d have no problem, or the data may be only marginally out of date and still not cause major issues. However, if the data is largely outdated you can experience serious, visible errors which are difficult to diagnose due to the different timings in the debugger affecting what’s in the cache. This makes it hard to reproduce the error and subsequently address it.

 

Additionally, as CPU cache sizes grow the cost of flushing them grows too. This can mean it’s only efficient to use the GPU for large, data heavy jobs which make the cache clean worthwhile and that the majority of jobs are therefore quicker and easier to keep on the CPU because of this overhead.

 

Our previous generation of GPU architecture, Midgard, used a concept known as IO coherency, which was originally used for input/output peripherals. This allows the GPU to check the CPU’s cache when it requests data from memory and effectively ask the CPU to confirm if it has the requested data in its cache. If it has, the GPU will copy the data into its own cache directly from the CPU cache, without going via the external memory. This way the memory latency is significantly reduced, as is external read bandwidth. However, this was a one-way system. Whilst the GPU also has caches of its own, in an IO-coherent system, the CPU cannot peek into the GPU’s caches.

 

As most of the required data in a graphics system flows from CPU to GPU rather than the other way around, this is an efficient tool for graphical tasks. Also, as GPU caches tend to be smaller, cleaning them at the end of a rendering pass is comparatively less costly and occurs at a single, regulated point in time making it less likely to be missed.

Capture.JPG

However, compute workloads can be vastly varying in size and the data needs to be able to travel between the CPU and GPU in both directions. This is why our new Bifrost architecture introduces full system coherency to products in the High Performance roadmap, allowing both the CPU and GPU to access each other’s caches. This eliminates the need for software to clean the caches and allows the CPU and GPU to collaborate on smaller jobs as well as larger ones. This extends the potential uses of the GPU’s compute capability and removes the risk of producing those difficult to detect errors that occur when a cache clean operation is missed.

 

As the Bifrost architecture is capable of scaling to 32 cores we’ve redesigned the level two cache to feature a modular design which is accessible by the cores as a single cache. This cache size is configurable to allow partners to balance just the right size and bandwidth for their specific system.

 

The single logical cache makes it simple for software to work with, both in the driver and on the GPU, so we can make the most of reusing cached data between shader cores. Partial cache line support means that we can effectively use it as a merging write buffer, resulting in fewer partial writes to DRAM and improving overall bandwidth utilization. The GPU also supports TrustZone™ memory protection, working to enforce restrictions on protected memory accesses.

 

As we look towards our next range of Bifrost based GPUs further advancements are on their way, so stay tuned and we’ll keep you up to date with the very latest in mobile graphics.


Indepth Comparison Review: Deepoon M2 vs Pico Neo, which one should you buy?



After Oculus, HTC, Sony and Google pioneered Virtual Reality, VR headsets settled into two categories: tethered headsets for powerful performance and mobile headsets for portability. As the market matures into niches, the mobile VR headset is on the rise. A mobile VR headset has its own standalone processor and gets rid of wires; it is an all-in-one headset that gives users total freedom to move and play. It is also called a mobile VR all-in-one headset.

 

Deepoon M2 vs Pico Neo

 

The first VR all-in-one headset was GameFace, which featured its own screen and Android processor, while the first mainstream mobile VR headset was Samsung's Gear VR. The Deepoon M2 and Pico Neo are two great new mobile VR all-in-one headsets launched in early 2016. But how do they differ from each other? We are going to find out.

Design Difference

 

Pico Neo

 

Deepoon M2

 

As seen above, the Pico Neo features an anti-glare black plastic material on its front, with the rest of the device wrapped in polycarbonate, so it feels very cool to the touch. The Pico logo lies on the left, while the Deepoon M2's front is divided into two elliptical surfaces with a logo and a breathing light.

Pico Neo weighs 350 grams and Deepoon M2 weighs 398 grams. For more details on their specifications, please refer to the Pico Neo page and Deepoon M2 page on IWEARVR.

Happy 10th Birthday Mali!


Mali, the #1 shipping family of GPUs in the world, is celebrating 10 years with ARM this month! In honour of the occasion I’m going to take a look at some of the key milestones along the way and how Mali has developed to become the GPU of choice for today’s devices. Back in early 2006 Mali was just a twinkle in ARM’s eye; it wasn’t until June of that year that ARM announced the acquisition of Norwegian graphics company Falanx and ARM Mali was born.

Mali 10_RGB_Balloon.png

This of course is not the real beginning of Mali’s story. Before Mali became part of the ARM family she was created by the Falanx team in Trondheim, Norway. In 1998 a small group of university students were tinkering with CPUs when someone suggested they try their hand at graphics. By 2001 a team of five had managed to prototype the Malaik 3D GPU with the intention of targeting the desktop PC market. They scouted a series of potential investors and whilst there was plenty of interest, they never quite got the support they were hoping for in order to break into the market.

Capture.JPGOriginal (and shortlived) Falanx branding 2001, and their final logo, edvardsorgard's handwriting codified

 

Research showed them that the mobile market had the most potential for new entrants and that an IP model was potentially their best option. With that in mind, they set about building the GPU to make it happen. Having revised the architecture to target the smaller, sleeker requirements of the mobile market, the Falanx team felt the Malaik name needed streamlining too.

4 falanx founders.jpgThe four final Falanx founders

 

Mario Blazevic, one of the founders and originally from Croatia, recognized “mali” as the Croatian word for “small”, and this was deemed just right for the new mobile architecture. So, armed with the very first incarnation of Mali, they set about selling it. The prototype became the Mali-55, and the SoC which featured it reached great success in millions of LG mobile phones. By this time they were six people with one development board, and the dream was alive and well.

 

Meanwhile, ARM was very interested in the GPU market and had an eye on Falanx as a potential provider. Jem Davies, ARM fellow and VP of technology, was convinced the Falanx team’s culture, aspiration and skillset were exactly the right fit and ultimately recommended we moved forward. Over the course of a year, and a few sleepless nights for the Falanx team, the conversations were had, the value was established and the ARM acquisition of Falanx was completed on June 23rd 2006.

6 falanx whole team.jpg  The Falanx team at acquisition

 

In February 2007 the Mali-200 GPU was the first to be released under the ARM brand and represented the start of a whole new level of graphics performance. It wasn’t long before it became apparent that the Mali-200 had a lot of unexploited potential, and so its multi-core version, the Mali-400, entered development. The first major licence proved the catalyst for success when its performance took the world by storm, and Mali-400 was well on its way to where it stands today, as the world’s most popular GPU with a market share of over 20% all by itself. Mali-400 is a rockstar of the graphics game and still the go-to option for power sensitive devices.

 

In late 2010 the continued need for innovation saw us announce the start of a ‘New Era In Embedded Graphics With the Next-Generation Mali GPU’. The Mali-T604, the first GPU to be built on the Midgard architecture, prompted a ramping up of development activities and Mali began to expand into the higher performance end of the market whilst still maintaining the incredible power efficiency so vital for the mobile form.

 

At Computex 2013 the Mali-V500 became the first ARM video processor and complemented the Mali range of GPUs perfectly. Now on the way to the third Mali VPU this is a product gaining more and more importance, particularly in emerging areas like computer vision and content streaming. Just a year on from that we were celebrating the launch of the Mali-DP500 display processor and the very first complete Mali Multimedia Suite became a possibility. Part of the strength of the ARM Mali Multimedia Suite is the cohesive way the products work together and fully exploit bandwidth saving technologies like ARM Frame Buffer Compression.  This allows our partners to utilise an integrated suite of products and reduce their time to market. Another key Mali milestone came in mid-2014 when the Mali-T760 GPU became a record breaker by appearing in its first SoC configuration less than a year after it was launched. By the end of the year ARM partners had shipped 550 million Mali GPUs during 2014.

 

This year saw the launch of the third generation of Mali GPU architecture, Bifrost. Bifrost is designed to meet the growing needs and complexity of mobile content like Virtual Reality and Augmented Reality, and new generation graphics APIs like Vulkan. The first product built on the Bifrost architecture is the Mali-G71 high performance GPU for premium mobile use cases. Scalable to 32 cores it is flexible enough to allow SoC vendors to customise the perfect balance of performance and efficiency and differentiate their device for their specific target market.

 

Today Mali is the number 1 shipping GPU in the world, 750 million Mali-based SoCs were shipped in 2015 alone. As the Mali family of GPUs goes from strength to strength I’d like to take this opportunity to wish her and her team a very happy birthday!

6 mali infographic.png

Building an Unreal Engine Application with Mali Graphics Debugger Support


In a previous blog we talked about running Mali Graphics Debugger on a non-rooted device. In this blog we will focus on how you can add support for Mali Graphics Debugger, on a non-rooted device, to your Unreal Engine application. The plan we are going to follow is very simple:

  1. Add the interceptor library to the build system
  2. Edit the activity to load the interceptor library
  3. Install the MGD Daemon application on the target device

 

For our first step, we will need to download a version of Unreal Engine from the sources available on Github. For more information on this step, please see Epic’s guide.

 

Once you have a working copy of the engine, we can focus on getting MGD working. You will first need to locate the android-non-root folder in your MGD installation directory, and your Unreal Engine installation folder (where you cloned the repository). Copy the android-non-root folder to Engine\Build\Android\Java\.

 

Next, we will need to change the Android makefile to ensure that the interceptor is properly packaged inside the engine build. To do this, edit the Android.mk file under “Engine/Build/Android/Java/jni/” and add this line at the end: include $(LOCAL_PATH)/../android-non-root/MGD.mk. It should look like this:

LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)
LOCAL_MODULE := UE4
LOCAL_SRC_FILES := $(TARGET_ARCH_ABI)/libUE4.so
include $(PREBUILT_SHARED_LIBRARY)


include $(LOCAL_PATH)/../android-non-root/MGD.mk

 

We now need to tell the main game activity to load the MGD library. Locate GameActivity.java inside Engine\Build\Android\Java\src\com\epicgames\ue4\ and edit the onCreate function to look like this:

@Override
public void onCreate(Bundle savedInstanceState)
{
    super.onCreate(savedInstanceState);

    try
    {
        System.loadLibrary("MGD");
    }
    catch( UnsatisfiedLinkError e )
    {
        Log.debug( "libMGD not loaded" );
    }

    // create splashscreen dialog (if launched by SplashActivity)
    Bundle intentBundle = getIntent().getExtras();
    // Unreal Engine code continues here

 

Engine-wise we are all set; next we will prepare the device. Install the MGD daemon on the target phone using the following command from within the android-non-root folder:

adb install -r MGDDaemon.apk

 

Now before running your app you will need to run this command from the host PC (please ensure that the device is visible by running adb devices first):

adb forward tcp:5002 tcp:5002

 

Run the MGD daemon application on the target phone and activate the daemon itself:

mgddaemon.PNG

 

At that point you can connect it to MGD on the host PC, start your application and begin debugging it. Please refer to the MGD manual for more in-depth information on how to use it.

Following these steps you should be able to use MGD with Unreal applications on any Mali based platform. If you have any issues please raise them on the community and someone will be more than happy to assist you through the process.

MALI GPU driver for the imx6 freescale sabrelite


Hello,

 

I have tried to use ARM DS-5 Streamline to monitor the GPU on the i.MX6 Sabre Lite board, but I have encountered a problem with the Mali GPU driver for this board. I can't configure the GPU-related performance counters in the DS-5 tool.

So can someone help me resolve this problem and install the Mali GPU driver?

 

thanks for the help,

 

Mohamad

OpenGL Pitfalls: Distorted textures when packing texel colour components as GL_UNSIGNED_SHORT


I lost a few days wondering why some textures were completely distorted when loaded in OpenGL.

The thing is, they were only distorted when the colour components were packed as GL_UNSIGNED_SHORT_5_5_5_1 or GL_UNSIGNED_SHORT_4_4_4_4. When packing colour components as GL_UNSIGNED_BYTE (RGBA8888), the textures were loaded correctly.

 

Why ?

 

Since I'm using a small personal Ruby hack to generate raw textures from BMPs with the desired colour packing, I really thought the problem was in the Ruby code. After verifying that the generated 4444 and 5551 textures were the exact counterparts of the working 8888 textures, and tracing the glTexImage2D calls to be sure that the data were sent correctly, I wondered if a special parameter had to be passed to glTexImage2D after all.

 

Ok, maybe I missed something in the glTexImage2D manual...

 

Sure did...

 

width × height texels are read from memory, starting at location data. By default, these texels are taken from adjacent memory locations, except that after all width texels are read, the read pointer is advanced to the next four-byte boundary. The four-byte row alignment is specified by glPixelStorei with argument GL_UNPACK_ALIGNMENT, and it can be set to one, two, four, or eight bytes.

 

The solution

 

Either:

  • use textures whose rows are a multiple of 4 bytes wide (for these 16-bit formats, an even width is enough), or
  • call glPixelStorei(GL_UNPACK_ALIGNMENT, 2); (or 1) before calling glTexImage2D, so the unpack alignment matches the tightly packed data.

 

RTFM, as they always say!

Breaking news: VR is going mainstream!


In previous blogs we’ve looked at the scalability of the Mali™ family of GPUs which allows partners’ implementations to be tailored to fit all levels of device across multiple price, performance and area points. We’ve also taken a closer look at a high performance Mali implementation in Nibiru’s standalone VR headsets.

 

This time we’re exploring the other end of the Mali spectrum: Ultra low power. Today, the most shipped GPU in the world is still the Mali-400. Based on our original Utgard architecture, Mali-400 is the GPU of choice for devices where minimizing power consumption is key. Since the Mali-400 GPU was released, further optimizations have been applied in the design and implementation of subsequent Ultra-low power GPUs, Mali-450 and Mali-470.

Mali-450-Chip-Diagram.png

As you’ll know if you’ve read my previous blogs, VR places a whole lot of pressure on the power and thermal limitations of the mobile form factor. To ensure a great, immersive experience you need a solid framerate, high resolution and super low latency, amongst other things. To achieve this for top end content like AAA gaming can often require the highest performance hardware and a greater power budget than can be supported by a mid-range SoC. That, however, doesn’t necessarily mean you need to queue up and pay out for the next big flagship smartphone just to get on board with mobile VR.

 

In the tech industry it can often take a long time for high end content, use cases, or applications to become sufficiently well understood and developed to trickle down to the more mainstream device. The beauty of mobile VR is that the flexibility of the medium means you’re not locked out altogether just because you don’t want to spend on a top of the line device. In spite of the comparatively recent take off of VR products, every day use cases are already starting to become available and accessible to all on mainstream hardware. Whilst you wouldn’t want to try high end gaming (you’d almost certainly feel sick, if your system handled it at all) there are other, arguably more useful, ways in which the virtual world can change our lives.

 

Virtual spaces are where VR can meet mainstream devices to support a vast majority of business, social and communications needs. Whether you want to collaborate with overseas colleagues or just catch up with friends, virtual spaces allow you to interact in a more lifelike manner and can be supported within a much lower power budget than more complex content. The beauty of this concept is that there’s no need to navigate around a fully interactive virtual environment as you do for VR gaming. Users can be limited to a smaller setting such as a virtual boardroom, bar or café, which reduces the rendering complexity. This means you don’t need the highest performance SoC to support devices targeted at this type of content, as one of our innovative partners has recently shown.

 

Actions Semiconductor (Actions) is a leading Chinese fabless semiconductor company providing dedicated multimedia SoC solutions for mobile devices. Founded in 2001 and publicly listed in 2005, Actions now has ~700 employees and one of the most informed and influential engineering teams in the industry.

 

One of their most recent products, the V700, is an SoC expressly designed for the cost-efficient end of the virtual reality market. Based on a 64-bit Quad-core ARM® Cortex®-A53 processor with TrustZone® Security system, graphics are provided by the powerful but highly efficient Mali-450 MP6 GPU. This provides maximized 3D/2D graphics processing delivering excellent rendering within a very small power and bandwidth budget, making it ideal for mid-range standalone VR devices.

v700.png

When asked why they chose the ARM Mali family of processors for this device Actions explained that it was very important to them to enable high quality VR content for the mainstream market. Not everyone is interested in spending vast sums of money on emerging technologies, particularly when there’s still some (in my opinion, misplaced) skepticism in the industry about the uptake of VR. Supporting VR content such as virtual spaces for social and business uses allows more people to access and utilize this exciting new technology. The superior power and bandwidth saving features of the products in the Mali Multimedia Suite make them the perfect choice for such a power hungry application as VR. In-built optimizations and synchronized technologies such as ARM Frame Buffer Compression and TrustZone allow our partners to achieve the high quality and security they need without limiting uptake to high-earning consumers.

 

It’s always great to see partners like Actions take such leaps in supporting exciting new Mali-based products and it will be interesting to watch the emergence of virtual spaces for the mainstream user in the coming months.

Unity’s first “Vulkan Renderer Preview” recommends the ARM® Mali™ Galaxy S7 as development platform


Unity-Vulkan-Logo.png

On the 29th September, as promised at Google I/O, Unity released the first developer preview for their upcoming Vulkan renderer. Developers have been eagerly awaiting the release since Android Nougat was announced on the 22nd of August with Vulkan support as one of its key features.

 

Here at ARM we have been supporting graphics developers’ uptake of the Vulkan API since Khronos launched it publicly in February. ARM Mali graphics debugger and driver support were made available on release day and we’ve subsequently provided a set of educational developer blogs on using Vulkan, a Vulkan SDK and sample code. We also gave a series of talks and demonstrations on Vulkan at GDC, the world’s largest game developer conference, just a few weeks after the API was launched. All of our developer resources and content can be found here:  http://malideveloper.arm.com/vulkan.

vulkanbanner.pngFig 1. An example of a Vulkan demo developed by ARM

 

Developer resources and tools are not all we provide at ARM. Not only were we heavily involved in the development of Vulkan as part of Khronos’s Working Group, but we’ve also collaborated closely with Unity, the leading game engine platform downloaded by over 5 million game developers, to support this renderer release.

The results of this collaboration have been great news for mobile game developers, as the ARM Mali-based Samsung Galaxy S7 (European version) has been recommended (and tested) as the first Android developer platform to run Unity’s initial Vulkan Renderer Preview. Developers can download the experimental build from Unity’s beta page.

 

At this early stage of development, the main benefit Vulkan brings to the Unity engine is speed, thanks to its multithreading support. Current mobile devices have multi-core CPUs, and the ability to carefully balance workloads across these cores is key to achieving these improvements. The gain in power efficiency comes from spreading workloads across several CPU cores so that voltage and frequency can be reduced, while the gain in performance and speed comes from being able to use the full compute resources of all the CPU cores.

We in the ARM Mali team are pleased to be able to support such important industry advancement and look forward to seeing what our broad ecosystem of developers can do with the first Vulkan Renderer on Unity!

 

To know more about Unity's Vulkan Renderer Preview: https://blogs.unity3d.com/2016/09/29/introducing-the-vulkan-renderer-preview/


Initial comparison of Vulkan API vs OpenGL ES API on ARM


 

Reducing power consumption and optimizing CPU utilization in a multi-core architecture are key to satisfying the increasing demand for sustained high-quality graphics while maintaining lasting battery life. The new Vulkan API facilitates this, and this blog covers a real demo recording showing the improvements in power efficiency and CPU usage that Vulkan provides compared to OpenGL ES.

 

Vulkan unifies graphics and compute across multiple platforms in a single API. Until now, developers had the OpenGL graphics API for desktop environments and OpenGL ES for mobile platforms. The GL APIs were designed for previous generations of GPU hardware and, while the capabilities of hardware and technology evolved, the API evolution took a little longer. With Vulkan, the latest capabilities of modern GPUs can be exploited.

 

Vulkan gives developers far more control of the hardware resources than OpenGL ES. For instance, memory management in the Vulkan API is much more explicit than in previous APIs. Developers can allocate and deallocate memory in Vulkan, whereas in OpenGL the memory management is hidden from the programmer.

 

The Vulkan API has a much lower CPU overhead than OpenGL ES thanks to its support for multithreading. Multithreading is a key feature for mobile, as mainstream mobile devices generally have between four and eight cores.

 

On the left-hand side of the video image, you can see the OpenGL ES CPU utilization at the bottom. The OpenGL ES API makes a single CPU core work very hard. On the right-hand side, you can see the difference the Vulkan API brings with improved threading. The multithreading capability allows the system to balance the workload across multiple CPUs, to lower the voltage and frequency, and to run the code on the little CPU cores.

 

OpenGL ES - Vulkan comparison - FINAL.png

Fig.1 Video screen capture, showcasing CPU utilisation

 

With regards to energy consumption, the video shows an energy dial on top which demonstrates the improved system efficiency that Vulkan brings. Running the sequence to the end, measured on a real SoC, the multithreading benefits bring a considerable saving in energy consumption. Even at this very early stage of software development on Vulkan, we could see an overall system power saving of around 15%.

OpenGL ES - Vulkan comparison - FINAL 2.png

Fig.2 Video screen capture, showcasing overall system power saving

 

To get you started using the Vulkan API, there is a wealth of developer resources here, from an SDK with sample code, to tutorials and developer tools to profile and debug your Vulkan application.

China’s tech industry speeds ahead


It’s no secret that everything in China moves fast. At ‘China Speed’ in fact. From building a skyscraper in 19 days, to the fastest trains in the world, China is all about embracing change and getting things done in the shortest possible time. In the technology industry this trend continues with innovation and technology uptake at an all-time high.  Wherever you look, people are talking about it.

 

In the past, China’s tech industry has been compromised by perceptions that it was a follower in the market and primarily focused on replicating ideas from global leaders.  Although some of this might have been true in the past, it was parallel with a rapid technology learning curve.  More recently, it’s easy to see that China has learned quickly and now has an intensity of innovation and development that puts it on a par with other major players and has enabled a technology revolution of its own. These days, from CNN to Bloomberg, all the major influencers are acknowledging China’s leadership in tech and potential to change the world, with lists of ‘Top Chinese Tech Companies’ rife. China is not only keeping up with global tech trends, but is surging ahead to break new ground and it’s doing it at China speed.

 

Alibaba, a member of ARM’s new family, the Softbank Group, is possibly the best known name in Chinese tech and indeed, one of the most famous e-commerce giants in the world. Its share price is around 30 times where it was a little over a decade ago and its services have expanded to a point where they rival any standard retail infrastructure. Baidu, China’s answer to Google, has fast become the country’s favourite search engine and has already expanded into food services, online payment systems and much more. Tencent is another name you’ve likely heard. Launching WeChat, the second incarnation of their QQ instant messaging service in 2010, it today boasts over 700 million users and acts as a one stop social shop for chatting, image sharing, news, payment and so much more.  All of these companies boast a now-established trend in innovation with rich R&D activities certain to continue to impress.

 

ARM partner Huawei is another perfect example of the power of China speed. With sales jumping 40% in the first half of 2016 and smartphone shipments up 25% in the same period, the meteoric rise of China’s premium smartphone maker is nothing if not impressive. On top of that, with flagship devices like the Mali-T880 powered P9, Huawei has managed to break into the global high end premium mobile market. This ability to compete with the world’s leading technology brands has allowed them to become a major driving force, with sales outlets alone up 116%.

CHina mkt.jpg

China’s smartphone uptake is still on the rise, with >62% of mobile users adopting them compared to ~55% in Europe

 

Not only is the smartphone market one to watch in China but mobile gaming is growing at breakneck speed too. In Q2 2016 the market reached an estimated RMB 24.4 billion (US$3.66 bn) which equates to a phenomenal 120% increase year on year. It’s also predicted that mobile gaming will continue to grow and take a larger share of the overall gaming market, from 33% in 2015 up to around 48% in 2019.

 

So why is China so much quicker off the mark with new and emerging tech and how have they turned around their image to become the ones to watch for new and exciting products? The answer is of course not a simple one, as multiple factors must come together to make it happen, but an example is the very different approach they take to projects when compared to the West.  While the West has a tradition and preference for exhaustive analysis and planning, China doesn’t wait. They see a great idea and an opportunity to develop it and they leap on it. They’re bold, brave and unafraid of risk and are therefore paving the way in new and emerging technology areas.

huawei-p9-1.jpg

Huawei's P9 smartphone ranks amongst the top premium smartphones of 2016

 

This assertiveness and drive is key in the technology industry, companies need a new product on the shelves before anyone’s even realised they want it. Not only that, but whether they realize it or not, it’s also important to the consumer. China ships the most smartphones worldwide and consumers expect to upgrade relatively frequently, but don’t want to pay a fortune. The market is so big that there is extensive competition and end users are therefore able to demand more premium performance even from a mainstream priced device such as an internet TV box or a smartphone. This puts pressure on all levels of the supply chain to design, implement and release the next bigger and better offering faster and cheaper.

 

It’s this need for speed which makes the ARM Mali Multimedia Suite such a great fit for the Chinese tech industry. Not only have Mali products led a relentless march in device capability but the flexibility and scalability of the Mali range means our partners can quickly achieve the right balance of performance and efficiency for their particular market needs. Indeed, records were broken in 2014 when Chinese semiconductor company Rockchip were able to produce the first Mali-T760 based silicon just a matter of months after the GPU launched. It’s not just about GPUs though and the ability to address an increasing range of media capability and functionality can be key to whether a product speeds ahead or idles on the side lines. This is why our pre-optimized Mali Multimedia Suite of GPU, Video and Display processors work together seamlessly. This not only reduces risk and implementation time when designing a new product but also allows our partners to fully exploit unique features and significant bandwidth savings through technologies like ARM Frame Buffer Compression (AFBC). In addition to the technological benefits, ARM is also able to tap into the expertise of our rich ecosystem of software, middleware and application partners to help strengthen the offerings of a new or emerging licensee.

Mali no 1.png

 

Given the rich cultural history, incredible rate of change and huge potential of the Chinese market, it’s no surprise that the Chinese government is keen to secure the country’s tech supply chain. It’s important for them to ensure it develops rapidly and is capable of capitalising on this opportunity, and we can see this in initiatives such as the government’s ‘Made in China 2025’ plan.

ARM and the Mali team are happy to be able to support the flourishing of such a strong new ecosystem by working closely to help provide the products and flexibility to make that happen. Our range of dedicated multimedia products, perfectly aligned to support a vast range of configurations and designs, is allowing us to help our partners develop the scalability and flexibility required to reduce time to market and continue leading the way in the future of tech.

The Mali™-G51 GPU brings premium performance to mainstream mobile


It’s not just high-end mobile devices which need to work like a dream, we expect a certain level of performance and a relatively advanced feature set even from a more modest, mainstream handset budget. Whilst it’s the top of the line products which often garner the most media attention, a large proportion of the global smartphone market is based on mainstream rather than premium devices. In manufacturing the high volume chips required for this market segment, the cost to the system contributed by silicon area will have a big impact on the final cost. In order to retain quality performance points within a mainstream budget, silicon area is therefore one of the key areas of focus for cost reduction.

 

The second GPU to be built on our innovative new Bifrost architecture, Mali-G51 is the first Bifrost GPU in ARM®’s High Area Efficiency roadmap. Exploiting the very latest ARM advances in bandwidth and power efficiency, combined with all-important area reduction, Mali-G51 is our most cost efficient GPU to date with up to 60% more area efficiency than Mali-T830 and 60% more energy efficiency.

 

Designed to bring premium experience to the mainstream device, Mali-G51 supports all the key everyday use cases from augmented reality (AR) and virtual spaces to casual gaming and a smooth, fluid user interface.

1.6x.png

Bringing Bifrost mainstream

In May 2016 you may have seen the launch of the first of our Bifrost based GPUs, Mali-G71. This propelled the new Bifrost architecture into the premium mobile space with the highest performance capabilities designed to support VR gaming and other complex, power hungry content. This doesn’t mean however, that Bifrost is all about the biggest and best of premium mobile capability. Designed from the outset to scale across all levels of device, Bifrost can be carefully deployed to achieve the perfect performance point for any level of product.

 

Targeting the mainstream smartphone market, Mali-G51 brings the Bifrost architecture to a different market tier with features and capability specifically tuned to the area and power limitations of mainstream mobile. Individual features of the underlying architecture have been analysed and assessed against real graphics applications in order to ensure mainstream graphics needs are prioritized for a well-balanced design.

 

Bifrost’s low level instruction set, which gives control to the compiler, has been further optimized for Mali-G51 and specifically rebalanced for power sensitive graphics workloads. Not only that, but a new dual-pixel shader core has been implemented to double texel and pixel rates and can be used asymmetrically with a uni-pixel shader core in order to access even further configurability and versatility.

 

bifrost.png

A step change in efficiency

It’s no secret that there are challenges inherent in the mobile form factor that aren’t present in other types of device. Not only do we not have the PC’s lovely big fans cooling everything down, but we also don’t have a handy mains power connection running continuously. Every component in an SoC needs power and, in using it, creates heat that the device has to dissipate. This heat dissipation has actually become harder in newer mobile devices where the bezel is getting smaller and ever more of the surface area is taken up by the screen, which doesn’t have the same cooling capacity as the metal case. Reducing the power consumed by the GPU frees up this power to be used elsewhere in the SoC and decreases the thermal pressure the GPU adds to the device. It also means less power is consumed from the system’s total budget, a key requirement not only for a smooth experience but also for smartphone users to get the most from their device’s battery life.

 

AFBC 1.2

Another exciting feature of the Mali-G51 GPU is the addition of the newest version of our advanced bandwidth saving technology, ARM Frame Buffer Compression (AFBC) 1.2. Latest optimizations include improved GPU performance in bandwidth limited scenarios as well as improved display processor performance for rotation use cases.

 

AFBC 1.2 also improves compression for constant colour blocks, providing further significant savings for user interface and 2D graphics applications. Fully backwards compatible with former versions, AFBC is therefore available across the full Mali Multimedia Suite (MMS) of Graphics, Display and Video processors with the newly launched Mali-V61 video processor. System wide optimizations like AFBC 1.2, Adaptive Scalable Texture Compression, and ARM TrustZone® allow all parts of the MMS to work seamlessly together, optimizing performance and bandwidth reduction and reducing our partners’ time to market.

 

Virtual spaces & AR

Virtual reality (VR) is one of the more demanding of today’s use cases when it comes to the burden it places on a mobile system. To ensure a fully immersive experience in VR gaming requires extensive power and performance optimization. This however, doesn’t mean that you can only join the virtual world by purchasing top of the line devices. Low power VR is becoming a market segment all of its own and is facilitating some of the arguably more useful, every day virtual interactions.

 

‘Virtual Spaces’ are how we refer to virtual environments that don’t require the fully interactive, highly reactive elements of AAA VR gaming. Virtual spaces are finite environments that can support interactive elements, like the people within them, whilst keeping the surroundings static and therefore minimizing GPU workload. Virtual spaces represent the obvious business application for VR, where you can collaborate with teammates, colleagues and customers across the globe in a much more realistic manner in a virtual boardroom, conference suite or even breakout area.

Picture1.png

 

Socially, virtual spaces allow you to meet up with friends in comfortable surroundings and talk face to face, without ever leaving your sofa. The ability to look at the person talking to you and respond in real time makes the distance between separated loved ones seem much easier to bridge.

 

The fact that these low power VR solutions can now be supported by mainstream, area- and energy-efficient GPUs like Mali-G51 means they are accessible to a much wider audience. Businesses no longer need to be constrained by a tighter tech budget and everyday consumers can experience the future of virtual communication without breaking the bank.

 

In designing the Mali-G51 GPU the ARM Mali team are excited to have brought such significant savings to such an important area of the market and we look forward to seeing them appearing in next generation mobile devices in 2018.

Mali™-V61 – Premium video processing for Generation Z and beyond


Earlier in 2016 we gave you a sneak preview of our brand new Mali video processor, then codenamed Egil. There was a great deal of interest in this exciting new product, not least because of some of the ground-breaking features included and the industry has been impatient to hear more. Well, the big day has finally arrived and we can now announce the official launch of the Mali-V61 video processor.

Vid_diagram_V61_expanded.png

In developing Mali-V61 we’ve continued to take an alternative approach to the standard video processor, which tends to target a specific codec or a very limited selection. Instead, we’ve developed a single, unified video solution which controls all the necessary features of the relevant codecs through firmware, with all pixel processing handled by dedicated hardware blocks. Our firmware is controlled through a single API and we currently provide reference drivers based on the latest Android releases along with a host interface specification. This not only allows flexibility in SoC design but also provides a multi-standard solution to the industry.

 

Something we consider of high importance through all of our IP development is the need to not only support today’s high end content and devices, but also to be able to adapt to the challenges and additional complexity the future of the industry may bring. With this in mind, we have implemented significant enhancements to our HEVC encode capability as well as creating support for VP9 encode and decode, making Mali-V61 the first multi-standard video processor to be contained in a single IP block.

 

As well as advanced encode and decode options, we provide an Android reference software driver. It handles numerous tasks including the setup of a video session as well as dynamic power gating and memory allocation. The built-in core scheduler manages multiple encode/decode streams and maps single or multiple video streams across multiple cores in order to provide maximum performance.

dots.png

 

Video conferencing and ‘chat’

The new Mali-V61 video processor’s flexibility in handling multiple encode/decode streams makes it the ideal solution for a range of important use cases. Two-way, real-time video communication is a rapidly increasing use case, whether in more formal video conferencing applications or the growing range of video chat applications that are now prevalent. The complexity required to simultaneously handle multiple video streams from different devices, locations and performance points often means that there are serious compromises in the quality of the final video output. Mali-V61 is able to efficiently handle all of these streams and allocate just the required amount of bandwidth in order to retain the maximum possible video quality and provide a superior visual experience to the end user. This avoids all the awkward delays and accidental interruptions we saw with early mobile video conferencing capabilities.

 

Video capture

With the rise of 4K displays has come a need for higher quality content to exploit them to their best advantage. Mali-V61 supports 4K video capture and streaming to a larger screen device, such as your home TV, as well as sharing your content directly with friends or on social media. This allows you to take, watch and share higher quality videos without the need for external hardware.

Picture2.png

Configurability and scalability

We’ve designed Mali-V61 to be sufficiently configurable to enable multiple levels of use of the video IP whilst retaining the same high quality encoding and decoding performance. This enables our partners to differentiate based on the requirements of their target devices. They can take into account considerations such as the preferred resolution and frame rate to be supported and whether they want to enable encode or offer only decode capability, for example if they are producing a video player without a camera. Partners can also design their configuration based on whether or not they want to support 10-bit and 8-bit video, or just 8-bit, as well as if they want to support all video codecs, or just a subset. This range of options allows partners using Mali-V61 to deliver very specific points of differentiation for their products, providing them with far greater control.

scalable.png

 

Mali Multimedia Suite

Following on from the launch of the Mali-DP650 display processor in January 2016 and the Mali-G71 GPU in May 2016, Mali-V61 provides the third element to complete the latest high performance ARM® Mali Multimedia Suite configuration, designed for next generation premium devices. The entire Mali Multimedia Suite comes pre-optimized to work together to produce the highest quality user experience whilst exploiting the latest advances in energy efficiency and bandwidth saving.

 

A new version of one of ARM’s top bandwidth saving technologies, ARM Frame Buffer Compression (AFBC) has also been adopted for the Mali-V61 VPU as well as the newly released Mali-G51 GPU. This latest version of AFBC is fully backwards compatible while advances in this technology provide a new level of efficiency across the full Mali Multimedia Suite of products.

Tech Symposia 2016 – Shanghai keynotes


The first of this year’s ARM® Tech Symposia kicked off this morning on a rather damp day in Shanghai. With the rain coming down outside it was a perfect opportunity to check out our latest demos before convening in the ballroom for the first of the day’s presentations. Allen Wu, EVP and President of ARM China welcomed us to the event and discussed ARM’s commitment to supporting the development of China’s technology ecosystem. He then handed over to Ian Ferguson, VP Worldwide Marketing, for a deeper look at what we can expect from this year’s events and the future of ARM and our partners under our new umbrella, the Softbank Group. Ian talked about the opportunities this collaboration has provided for the future of automation and IoT technology as well as stressing the importance of continuity under this new model and ARM’s commitment to ensuring business as usual for all our partners and colleagues.

20161030_155533[1].jpg

In terms of the opportunities for automation and IoT technology, China is ahead of the game and taking the lead in accelerating ARM based server infrastructure. For example, parking spaces at the new Disneyland Shanghai have been enabled by Huawei for smart monitoring and reporting of capacity and usage patterns. This accelerated adoption will support the faster deployment of IoT based systems. The importance of this can be seen in a study by the EIU IoT Business Index, which showed that in 2013 around 90% of businesses surveyed expected to be using IoT in 2016. Now, 75% of those are indeed seeing the impact of the IoT revolution on their business, with key focuses around security, cost and establishing a sufficient knowledge base to truly enable the industry’s growth.

 

Ian discussed security requirements in the context of recent incidents such as the covert deployment of thousands of DVRs to simultaneously attack DNS servers, bringing down huge sites like Twitter and Spotify whilst appearing to continue functioning as normal. Silent attacks such as this highlight the need for security technologies like ARM’s TrustZone® in protecting both content and devices. Security is not the only concern for a connected world, with a strong ecosystem required to facilitate sustainable growth. China’s ecosystem is not the same as that seen in the US, with initiatives like OPNFV providing a shift from proprietary hardware to open source software, allowing our China partners to compete on a global scale. The distribution of the ARM powered BBC micro:bit to UK schoolchildren can be expanded to the Chinese education system to grow the next generation of programmers with open source platforms, software and specifications.


We’re seeing developments too in automated vehicles. Whilst widespread use may still be a way off, with safety critical implications under careful consideration, a beer truck in the US recently travelled 120 miles to deliver its important cargo without a driver in sight. Drones too are becoming more valuable, with Amazon trialling them for deliveries in the air near our offices in Cambridge. These also require additional layers of technology to sustain their use. It’s not enough to use GPS to program their destination: they need computer vision combined with machine learning in order to assess and avoid hazards, connectivity for real time updates, and safety critical mechanisms to ensure security and protection for both content and consumer. Healthcare, too, is beginning to benefit from advances in IoT based applications and microprocessors, with innovative initiatives emerging that use smell sensors to detect cancer cells and so enable earlier treatment. Elsewhere, sound sensors on streetlights in dangerous areas can immediately alert police to gunfire in the vicinity without the delay of waiting for an emergency call from a member of the public. With such a vast range of applications for connected devices and automation, it’s clear that the IoT revolution really is upon us, and it’s great to see the huge leaps we and our partners are taking in making this happen.


Next up was the product keynote, with Noel Hurley, VP & GM of the Business Segments Group, discussing the rapid uptake of ARM’s latest Premium Mobile products, the Cortex®-A73 CPU and Mali™-G71 GPU, launched in May 2016. With these products starting to appear in devices, it’s great to see how the annual product launch cycle has benefited our partners’ time to market. The product keynote was a key milestone for us in the Mali multimedia and graphics team, with Noel announcing the exciting launch of not one, but two new products in the ARM Mali Multimedia Suite. First up was the Mali-V61 video processor (VPU), which might be familiar to some of you as it was previewed earlier this year under the codename Egil. Now fully fledged, our brand new video processor boasts better than ever scalability and configurability, as well as high quality VP9 encode and vast improvements to HEVC encode. Designed to support video across all device types and tiers, from smartphones and drones to cameras, Mali-V61 is set to be the go-to IP for next generation video applications.


Hot on the heels of the Mali-V61 VPU was the Mali-G51 GPU, the first mainstream GPU built on our exciting new Bifrost architecture. Bifrost launched earlier this year with the high performance Mali-G71 GPU, and for Mali-G51 it has undergone specialized optimizations to perfectly balance quality graphics performance with area and energy efficiency, allowing the GPU to meet the needs of the mainstream device market. Not only is VR reaching the mainstream in areas like virtual spaces, but new APIs such as Khronos’ Vulkan, along with ever-growing screen resolutions, have been instrumental in creating the need for high performance graphics capability within a mainstream silicon budget.


Not to be forgotten was the recent acquisition of Apical, which allowed us to add computer vision and assertive camera and display technologies to our Imaging and Vision portfolio. Read more about the importance of computer vision to the future of technology here. Back on to IoT, Noel filled us in on the recent launch of the IoT subsystem block, enabling a fast, secure route all the way from chip to cloud. Nandan Nayampally’s blog explains why this was such a significant area of focus for us and what it brings to the IoT environment.


The next stop on the Tech Symposia tour takes us to Beijing, where I’ll be bringing you all the highlights from the more technical presentations across the three streams: Next generation processing; Smart embedded & IoT; and Intelligent implementation and infrastructure. See you there!
