
ARM’s experiences with the drawElements Quality Program


Here at ARM we continuously work to increase the quality of our driver software. Thorough testing is a significant proportion of our driver production process and its efficiency enables us to deliver drivers that meet and exceed our partners’ quality expectations sooner than we would otherwise be able to. You might have seen the announcement made in fall 2013: ARM Ltd., the leading semiconductor intellectual property (IP) supplier, expands its quality assurance processes with the adoption of the OpenCL™ and OpenGL® ES 3.0 test modules of the drawElements Quality Program – the dEQP™. This subscription was all part of ensuring that as the graphics industry evolves, the drivers that we deliver continue to be of the highest standard possible.


Based on our experience with the ARM® Mali™-400 GPU series we certainly had confidence in our test coverage for OpenGL ES 2.0, which we had built up over multiple releases. Even though the ARM Mali-T600 GPU series is a radically new design comprising a unified shader architecture, the pool of test cases targeting the 2.0 API was easy to re-use right from the start. But for OpenGL ES 3.0, being a new API, there was barely anything out there - real world OpenGL ES 3.0 content was still to come. We based our initial testing on the conformance test package from Khronos and, to a much larger extent, on in-house testing of the new features. However, we wanted to take the quality of the driver higher than these two processes allow in order to exterminate any stubborn bugs. To do this, an external testing suite was in order. Why? Well, it’s good that you asked.

 

For brand new features our in-house testing is typically what you might refer to as "white box" testing. Engineers familiar with the driver’s internals develop targeted tests against new features, based on the OpenGL ES 3.0 specification from Khronos. Factoring in the steady inflow of new colleagues, one might be willing to shift this into the "gray" zone, but the tests are certainly not of the "black box" kind. While such internal driver knowledge makes it possible to write tests targeting even very specific driver behaviour, it ends up creating a one-sided view of the driver. Engineers simply "know more than they should" to develop black-box tests. Yet such black-box tests are vital because the ultimate end user, our partner, will not have the same intricate knowledge of ARM software as our engineers, and so their actions and experience will be quite different.

 

Still, one might raise the question: “Your driver passed the conformance tests - what else is left to test?” There's a short summary written up here describing how one gains confidence on a per-feature basis from the conformance test package. But ARM is more interested in combinations of features - which are what real world applications typically use - and these have less coverage. So even though we passed conformance, without additional in-house testing a higher number of bugs could go out and impact our partners, and our only method for finding and fixing them would be partner feedback. Hardly an ideal situation.


So, what were our expectations when licensing an external test suite, adding more than 30,000 new test cases to our in-house testing? Pass all of them? That would have been really cool and deserved plenty of cake (our replacement for beer here in Norway). The reality was that, when running the latest Mali-T600 GPU driver on an Exynos 5250 based platform running Linux with dEQP version 2013.4.1, we happily passed 98.5% of the OpenGL ES 3.0 functional test group and an even larger share for OpenCL 1.1. We did not pass all of them - which, at the very least, proved to us the value of drawElements’ testing suite.


If your testing group tells you that there are roughly a hundred new cases waiting for further investigation, your first response certainly is not "Yeah - great!". But thinking a bit more broadly, maybe it should have been. Getting over a hundred failing test cases "in" all of a sudden certainly has an impact on daily work and schedules. But that's what we bought the suite for - to prevent partners and developers from discovering these issues over time. It's better to see all potential issues in one go than to have them trickle in across one or two years from partners or even the developer community. Within ARM’s release schedule - which, due to the target market, is quite different from what you might be used to from your desktop GPU vendor - there is no room for a "quick fix" once a driver is out. So everything we find and fix in our products upfront is very important to ARM and our partners.


dEQP provides challenging test cases for a variety of areas. The ones most interesting to us are:


"Exhaustive positive and negative testing of individual API features"

For positive testing the situation is quite clear: if our driver did not allow something the specification requires, we would have a severe bug. Luckily for us we passed that hurdle well.
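
To make this concrete, here is a minimal sketch of what a positive test against the C API can look like (an invented example, not an actual dEQP case): the OpenGL ES 3.0 specification mandates minimum values for certain implementation-defined limits, so a conforming driver must report at least those.

```c
/* Minimal positive-test sketch; assumes a current OpenGL ES 3.0
 * context has already been created (e.g. via EGL). */
#include <GLES3/gl3.h>
#include <stdio.h>

int check_spec_minimums(void)
{
    GLint max_vertex_attribs = 0;
    GLint max_texture_size = 0;

    glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &max_vertex_attribs);
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_texture_size);

    /* ES 3.0 guarantees at least 16 vertex attributes and a maximum
     * 2D texture size of at least 2048 texels. */
    if (max_vertex_attribs < 16 || max_texture_size < 2048) {
        fprintf(stderr, "FAIL: implementation-defined limit below spec minimum\n");
        return 1;
    }
    return 0;
}
```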

For negative testing the situation is a bit different: if our driver allows things it should not, is this really a problem? Isn't it perhaps more of a feature, given that it works in a sane way? Actually, it is a problem, as it causes fragmentation in the market and leads to the unfortunate situation of "But this works with Vendor A!". Those issues hit developers when they start to migrate from a single development platform into the wild world to test their apps. If "outside the spec" behaviour is considered valuable it can always be properly captured in an extension specification.

Similarly, negative testing involves testing error handling by executing API calls which are not even supposed to work due to, for example, using the wrong parameters. Even though it is not standard practice to base application behaviour on the specific error code returned, we recognize the importance of correct error codes in debugging problems during development (you might want to read further about the debugging extension, which eases the situation a bit). Long story short - with the help of dEQP we greatly improved our ability to return the right error codes.
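
To illustrate the shape of such a test (again an invented sketch, not a dEQP case), the snippet below passes an invalid target to glBindBuffer and checks that the driver rejects it with exactly the error code the specification mandates:

```c
/* Negative-test sketch; assumes a current OpenGL ES 3.0 context. */
#include <GLES3/gl3.h>
#include <assert.h>

void negative_bind_buffer_test(void)
{
    /* Drain any stale error state first, as glGetError returns only
     * one queued error per call. */
    while (glGetError() != GL_NO_ERROR)
        ;

    /* 0xDEADBEEF is not a valid buffer target, so the specification
     * requires the call to fail with GL_INVALID_ENUM. */
    glBindBuffer(0xDEADBEEF, 0);
    assert(glGetError() == GL_INVALID_ENUM);
}
```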

 

"Stress tests"

There is one error an application should always be aware of: the famous GL_OUT_OF_MEMORY. It is raised whenever there are not enough system resources left to successfully complete an API call. One scarce resource is the amount of available (and free to use) memory. The drawElements’ test suite covers this by forcefully driving the system into a low-memory state to check how stably the driver handles the situation.
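
The core of such a test can be sketched as follows (a simplified illustration assuming a current context; a real harness is considerably more careful): allocate until the error appears, then verify the driver survives.

```c
/* Out-of-memory stress-test sketch; assumes a current OpenGL ES 3.0
 * context. Note that some drivers defer physical allocation, so the
 * error may surface later than the glBufferData call itself. */
#include <GLES3/gl3.h>
#include <stdio.h>

#define CHUNK_BYTES (16 * 1024 * 1024)  /* 16 MiB per allocation */
#define MAX_CHUNKS  4096

void exhaust_memory_test(void)
{
    GLuint buffers[MAX_CHUNKS];
    int count = 0;

    while (count < MAX_CHUNKS) {
        glGenBuffers(1, &buffers[count]);
        glBindBuffer(GL_ARRAY_BUFFER, buffers[count]);
        glBufferData(GL_ARRAY_BUFFER, CHUNK_BYTES, NULL, GL_STATIC_DRAW);
        count++;
        if (glGetError() == GL_OUT_OF_MEMORY) {
            printf("GL_OUT_OF_MEMORY raised after %d chunks\n", count);
            break;
        }
    }

    /* The driver must remain stable: release everything and confirm
     * that subsequent calls succeed without further errors. */
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glDeleteBuffers(count, buffers);
}
```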

As we saw during testing, this is a difficult situation to operate in. The Android™ OS, for example, has a low-memory process killer that triggers at a higher threshold than the one on plain Linux, sometimes not even leaving the application time to close properly before it is killed by the system underneath. Passing these tests on every platform is a challenge, but one that we are overcoming more rapidly with the help of drawElements’ testing suite.


"Precision tests"

Due to the way precision is specified by OpenGL ES 3.0, testing for it is a challenge. Rounding behaviour and INF/NAN handling are implementation defined, and only the minimum precision that must be maintained is specified. We realize it is challenging to come up with stable test cases as soon as they touch any of these "implementation defined" areas. And a few tests do touch on these areas. So when it came to answering the question of whether unexpected (failing) test results were still valid results within the constraints of the specification, we spent quite some time verifying that our driver, the GPU compiler backend and finally the GPU all treat the 'mediump' and 'lowp' precision qualifiers as mandated by the specification. In the end, the effort between us and drawElements was well spent on those cases. For example, 'mediump' is a great tool for saving memory and bandwidth and for reducing cache pressure wherever possible. But bear in mind that it is up to the developer to ensure that calculations stay within the guaranteed minimum limits. For more details I refer you to Tom Olson's detailed series.
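
As a small invented illustration of the pitfall (not a dEQP case), the fragment shader below keeps an intermediate value in 'mediump'. The specification only guarantees a 'mediump' range of roughly ±2^14, so on hardware that implements 'mediump' as a 16-bit float the squared coordinate overflows to infinity:

```c
/* GLSL ES 3.0 fragment shader embedded as a C string; a contrived
 * example of a mediump range overflow, not real dEQP content. */
static const char *fragment_src =
    "#version 300 es\n"
    "precision mediump float;\n"
    "out vec4 frag_color;\n"
    "void main() {\n"
    "    // gl_FragCoord is highp; copying it into mediump saves\n"
    "    // bandwidth but narrows the guaranteed range to ~2^14.\n"
    "    mediump vec2 p = gl_FragCoord.xy;\n"
    "    // On a 1080p surface dot(p, p) can reach ~4.8e6, far\n"
    "    // outside the mediump range - declare d2 as highp when an\n"
    "    // intermediate value can grow this large.\n"
    "    mediump float d2 = dot(p, p);\n"
    "    frag_color = vec4(fract(d2 * 1.0e-7), 0.0, 0.0, 1.0);\n"
    "}\n";
```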


"Cut-down real-world use cases like shadow algorithms, post-processing chains for SSAO, complex transform feedback chains"

These test cases are the most difficult ones to investigate due to their complexity. We take them one by one, and as we build up confidence in the earlier areas we get better and better at pinpointing which assumptions might be wrong in the complex test cases. Sometimes we might even consider a test case "overly complex/complicated to achieve a specific goal", but the question of "why would one do this?" is no excuse if it doesn't work on our driver.

 

So far ARM has closely investigated around 130 test failures reported by drawElements’ test suite for OpenGL ES 3.0 which were covered neither by ARM’s existing test set nor by the Khronos conformance test suite. Compare that number to the number of passing tests, which is over 35,000! Roughly half of these failures were real bugs in our drivers, whereas the other half we found to be targeting behaviour outside of the specification. And what happened with those tests we found to be in conflict with the specification? Well, there are excellent engineers working at drawElements who take feedback seriously and certainly won't accept it blindly. A brief e-mail exchange was usually enough to decide whether the fixes were needed in the test case or in the implementation. If a case is really ambiguous and cannot easily be decided based on the current specification, we can raise the problem together within the Khronos Group to find agreement amongst all participating members.


Last but not least - such a big test suite is also valuable for automated regression testing. Whenever you pass, remember it was just a snapshot of your code base that was okay. New features, optimizations, refactoring, bug fixes - all carry a risk of breaking something unintentionally. With dEQP now part of our regression testing safety net, ARM’s confidence in the quality of our releases is even stronger.

