In 2016 so far there seems to be a big focus on automation. The rise of the Internet of Things is part of the reason for this, and it’s opening our eyes to how many aspects of our everyday lives can be streamlined. Simply by allowing machines, sensors and technologies to ‘talk’ to each other, share data and use it to make smart decisions, we can reduce the direct input we need to provide to keep our world moving.
Home automation is one of the first things people think of, but it soon leads to discussions on smart agriculture, automated office management and remote monitoring and maintenance of vehicles and assets. Not only that, but an area garnering a whole lot of interest is the smart automotive space. We know that many of these examples, in order to operate safely and effectively, need to take in enormous amounts of data and analyse it efficiently for an immediate response. Before your home can decide to let you in through the front door without a key, for instance, it needs to know who you are. Before your autonomous car can be unleashed onto the streets, it needs to be able to spot a hazard, but how does it do it? One of the key drivers (see what I did there?) in this area is computer vision.
ARM®’s recent acquisition of Apical®, an innovative, Loughborough-based imaging tech company, helps us to answer these questions. With such a rich existing knowledge base and a number of established products, ARM, with Apical, is well placed to become a thought leader in computer vision technology.

So what is computer vision? Computer vision has been described as graphics in reverse: rather than us viewing the computer’s world, the computer has turned around to look at ours. It is essentially exactly what it sounds like: your computer can ‘see’, understand and respond to the visual stimuli around it. There are camera and sensor requirements of course, but once those are in place, we have to make the computer recognise what it’s seeing. We have to take what is essentially just a graphical array of pixels and teach the computer to understand what they mean in context. We are already using examples of computer vision every day, possibly without even realising it. Ever used one of Snapchat’s daily filters? It uses computer vision to figure out where your face is and, of course, to react when you respond to the instructions (like ‘open your mouth…’). Recent Samsung smartphones use computer vision too: a nifty little feature for a bookworm like me is that the phone detects when it’s in front of your face and overrides the display timeout so the screen doesn’t go dark mid-page. These are comparatively minor examples, but the possibilities are expanding at breakneck speed, and the fact that we already take them for granted speaks volumes about the potential next wave.
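To make that ‘array of pixels’ idea concrete, here is a minimal sketch in Python (not ARM or Apical code; the file name is hypothetical) showing what a camera frame looks like to a program before any recognition has happened:

```python
# A raw camera frame is just a grid of numbers: one red, green and blue value
# per pixel. Computer vision is the job of turning those numbers into meaning.
import numpy as np
from PIL import Image

frame = Image.open("front_door_camera.jpg")   # hypothetical camera snapshot
pixels = np.asarray(frame)

print(pixels.shape)    # e.g. (1080, 1920, 3) for a full-HD colour image
print(pixels[0, 0])    # the [R, G, B] values of the top-left pixel
```

Everything the system goes on to do, whether recognising a face or spotting a hazard, has to be inferred from arrays like this.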
Computer vision is by no means a new idea; there were automatic number plate recognition systems as early as the 1970s, but deep learning is one of the key technologies that has expanded its potential enormously. The early systems were algorithm based, removing the colour and texture of a viewed object in favour of spotting basic shapes and edges and narrowing down what they might represent. This stripped back the amount of data to deal with and allowed the processing power to focus on the basics in the clearest possible way.

Deep learning flipped this process on its head and said: instead of algorithmically figuring out that a triangle of these dimensions is statistically likely to be a road sign, why don’t we look at a whole heap of road signs and learn to recognise them? Using deep learning techniques, the computer can look at many thousands of pictures of, say, an electric guitar and start to learn what an electric guitar looks like in different configurations, contexts, levels of daylight, backgrounds and environments. Because it sees so many variations, it also starts to recognise an item even when part of it is obscured, because it knows enough about it to rule out the possibility that it’s something else entirely. Sitting behind all this cleverness are neural networks: computer models designed to mimic what we understand of how our brains work. The deep learning process builds up connections between the virtual neurons as the network sees more and more guitars. With a neural net suitably trained, the computer can become uncannily good at recognising guitars, or indeed anything else it’s been trained to see.
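As a rough illustration of the deep learning approach described above, the sketch below uses a convolutional neural network that has already been trained on a large labelled image collection to classify a single photo. It assumes the PyTorch and torchvision libraries and a hypothetical image file; it is a simplified example, not ARM’s or Apical’s implementation:

```python
# Classify one image with a network pretrained on a large labelled image set.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(pretrained=True)   # small pretrained convolutional network
model.eval()                               # inference mode, no further training

# The same preprocessing the network was trained with: resize, crop, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("electric_guitar.jpg").convert("RGB")   # hypothetical photo
batch = preprocess(image).unsqueeze(0)                     # add a batch dimension

with torch.no_grad():
    scores = model(batch)                  # one score per known class

best = scores.argmax(dim=1).item()         # index of the most likely class label
print(best)
```

The interesting part is what is missing: there is no hand-written rule about guitar shapes or edges anywhere. The recognition behaviour lives entirely in the weights the network learned from its training images.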
The ImageNet competition tests how accurately computers can identify specific objects in a range of images
A key milestone in the adoption of deep learning came at the 2012 ImageNet competition. ImageNet is an online research database of over 14 million images, and it runs an annual competition that pits machines against each other to establish which of them produces the fewest errors when asked to identify the objects in a series of pictures. 2012 was the first year a team entered with a solution based on deep learning. Alex Krizhevsky’s system wiped the floor with the “shallow learning” competition built on more traditional methods and started a revolution in computer vision. The world would never be the same again. The following year there were, of course, multiple deep learning entries, and Microsoft broke records more recently when its system actually managed to beat the human benchmark in the challenge!
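For a sense of how such entries are scored, here is a small sketch in Python of the kind of metric these challenges report. It assumes the commonly used ‘top-5’ rule, under which an image counts as correctly identified if the true label appears among a model’s five highest-scoring guesses; the labels and numbers below are made up for illustration:

```python
# Sketch of a "top-5" error calculation (illustrative, not competition code).
def top5_error(predictions, true_labels):
    """predictions: per image, a list of guessed labels, best guess first;
       true_labels: the correct label for each image."""
    errors = sum(1 for guesses, truth in zip(predictions, true_labels)
                 if truth not in guesses[:5])
    return errors / len(true_labels)

# Hypothetical example: three images, one miss -> roughly 33% top-5 error.
preds = [["cat", "dog", "fox", "wolf", "lynx"],
         ["guitar", "banjo", "cello", "violin", "ukulele"],
         ["bus", "tram", "lorry", "van", "car"]]
truth = ["cat", "piano", "tram"]
print(top5_error(preds, truth))   # 0.333...
```

Lower is better, and the dramatic drop in this number in 2012 is what made Krizhevsky’s entry stand out.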
A particularly exciting aspect of welcoming Apical to ARM is Spirit™, which takes data from video and a variety of sensors and produces a digital representation of the scene it’s viewing. This allows, for example, security staff to monitor the behaviour of a crowd at a large event and identify areas of unrest or potential issues based on posture, pose, mannerisms and numerous other important but oh-so-subtle factors. It also opens the door for vehicles and machines to begin to process their surroundings independently and apply that information to make smart decisions.
Spirit can simultaneously interpret different aspects of a scene into a digital representation
This shows us how quickly technology can move and gives some idea of the potential, particularly for autonomous vehicles, as we can now see how precisely they could quantify the hazard of, say, a child by the side of the road. What happens, though, when there is a choice to make? Sure, the car can differentiate between children and adults and assess that the child statistically holds the greater risk of running into the road. However, if there’s an impending accident and the only way to avoid it is to cause a different one, how can it be expected to choose? How would we choose between running into that bus stop full of people or the other one? By instinct? Through some internal moral code? Where does the potential for these machines effectively to think for themselves become the potential for them to discriminate or produce prejudiced responses? There is, of course, a long way to go before we see this level of automation, but the speed at which the industry is advancing suggests these issues, and their solutions, will appear sooner rather than later.
ARM’s acquisition of Apical comes at a time when the opportunity to exploit the full potential of this technology is becoming increasingly important. We intend to be on the front line of ensuring computer vision adds value, innovation and security to the future of technology and automation. Stay tuned for more detail on up-and-coming devices, technologies and the ARM approach to the future of computer vision and deep learning.