For GDC 2014 I wrote a presentation intended to capture people’s interest and get them to the ARM lecture theatre using a lot of the buzz surrounding the current uptake in 3D tech. Alas, we never got to see the talk in the lecture theatre as so many partners offered to speak on our stand that we didn’t have time for it, so instead we recorded it in the office to present in ARMflix format.
When I first started at ARM on the demo team, the first demo I saw was True//Force, running in stereoscopic 3D on a 36 inch screen. Connected to that screen was a single core ARM® Mali™-400 MP GPU based development board.
That demo was adapted to many other platforms through the years, and later we reinvented the 3D camera system for a gesture input UI, again designed for large stereoscopic scenes. Throughout this development we made several interesting observations about pitfalls faced in generating stereoscopic content, which this year we were able to share over the ARMflix channel.
Admittedly the video above may be a little overlong in its description of 3D screen technology; most of this is already common knowledge, although the advice about splitting the signal top/bottom instead of left/right still throws some developers. Screens with alternating polarised lines already halve the vertical resolution per eye, so the further loss of vertical resolution from a top/bottom split isn’t seen. A left/right split, on the other hand, halves the horizontal resolution, forcing the screen to interpolate between pixels, while the vertical detail gets lost in the interlacing of lines anyway.
The basics of stereoscopy can be understood by thinking about lines of sight coming out from your eyes and intersecting at an object. If the object gets closer, those lines converge nearer; if it moves into the distance, they converge further away. When an object is drawn on a stereoscopic display, each eye sees the object in a different place, and where those eye lines cross is where it appears to be in space, which could be in front of or even behind (or inside) the screen.
Projecting into different views to emulate this difference in what each eye sees is the basis of stereoscopy, but stereoscopy is only one of the cues your brain uses to discern the shape of the world. If you’re going to implement stereoscopy, it has to remain congruent with other mental models of how the world works.
One such model is occlusion: you would never expect a window to crop your view of something that is closer to you than the window itself. Anything which comes out of the screen in a stereoscopic setup is exactly this scenario. It’s okay so long as the object is in the middle of the screen, because then it could conceivably be in front of it, but if it clips the edge of the screen your brain becomes aware that it is looking through the screen like a window, and yet the object being occluded by it is closer. This creates a visual dissonance which breaks the spell, so to speak.
As it approaches the edge, what you will observe is a kind of impossible monocular defect, where the eyes disagree to such a degree that it becomes hard to discern spatial information. Monocular defects occur in real life too: imagine peering round the corner of a wall such that only one eye sees an object in the distance, or looking at something with a highly angle-sensitive iridescence or a noisy high-gloss texture, so that it seems a different colour or pattern in each eye. The difference in these cases is that they arise from genuine circumstance, and we can rely on other factors to provide additional information, from small lateral movements of the head to using optical focus as a secondary source of depth information.
In a simulation, the focus of the eyes is always on the screen, as there is currently no way to defocus the image on screen to converge at a different focal point for your eyes, and without sophisticated head tracking hardware, your TV can’t correct perspective for you leaning to the left or coming closer.
More importantly, as the generation of stereoscopic viewpoints is being faked on a 2D screen, it’s possible to do odd things like occlude something with an object that appears to be behind it. This mistake is frequently made by overlaying a menu on the screen for each eye without thinking about how far into or out of the screen it will appear.
The screen itself can also occlude things that protrude from it. The imagined reality of watching something stereoscopic on a screen is that the screen is a window you’re looking through into a fully 3D world. Occasionally things may bulge out of this window, and that’s fine, but if they then slide sideways out of view, they’re being occluded by an object behind them: the frame of this virtual window.
Best to try to keep the space in front of the screen a little bit special, by keeping the collision plane of the camera and the convergence plane of the projections in the same place. With that in mind, let’s talk about convergence planes.
This is a good point to remind you that this is advice for stereoscopy on screens, as virtual reality headsets use a very different method of projection.
So you’ve got two cameras pointing forwards, some distance apart. This will give you a kind of 3D, but unfortunately you’re not quite done yet. As mentioned before, the distance at which your eyes converge on the screen is the place in 3D space where objects should appear in the same place in the two eye images. But if you draw two view frustums pointing parallel, you’ll see that there is actually no point where an object appears in the same place in both. It only tends towards convergence at infinity, when the distance between the frustums is negligible compared to the width of the view plane. That means that the screen is essentially the furthest away anything can be in this setup, whereas what you probably want is a sense of depth into the screen. To achieve this you need to cross the streams: make the eyes look inward a little.
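To see why, take two parallel cameras a distance 2s apart, both projecting onto a plane at distance d (the same s and d used below). A quick back-of-the-envelope sketch, ignoring the vertical axis, puts the separation between the two images of a point at depth z at

$$ \text{separation} = \frac{2sd}{z} $$

which only falls to zero (both images in the same place, i.e. at screen depth) as z tends to infinity. Everything at a finite distance ends up with crossed sight lines, in front of the screen.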
It’s not as simple as just rotating the frustums however. Unlike the spherical optics of an actual eye (admittedly rather exaggerated in the video), the flat projection of a frustum will intersect but never truly converge. For convergence of the view plane you must skew the frustum.
The equation for this is far simpler than it looks: a multiplication by a matrix that looks like this:
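(Assuming column vectors, with x horizontal and z as depth in view space, that skew works out as:)

$$
\begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & g & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
$$

In other words x' = x + g·z, with the sign of g flipped between the left and right eye so that each frustum leans in towards the shared convergence plane.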
g is the gradient of the skew, defined as s/d (where s is half the separation between the two cameras, and d is the perpendicular distance between the desired convergence plane and the cameras).
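As a minimal sketch of how that might be wired up in code (plain C, column-major OpenGL-style matrices; the function names are illustrative rather than from any particular SDK):

```c
#include <string.h>

/* Build the 4x4 skew matrix described above (column-major, OpenGL style).
 * g is the skew gradient s/d; use opposite signs of g for the two eyes. */
static void stereo_skew_matrix(float m[16], float g)
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = m[5] = m[10] = m[15] = 1.0f; /* identity...                   */
    m[8] = g;                           /* ...plus the x' = x + g*z term */
}

/* g from half the camera separation (s) and the perpendicular distance
 * to the convergence plane (d), as defined above. */
static float skew_gradient(float s, float d)
{
    return s / d;
}
```

Each eye then renders with its camera shifted sideways by ±s and with this skew multiplied into its usual perspective projection, so the two frustums converge on the same rectangle at distance d. Which eye takes the positive sign depends on your handedness conventions.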
So now you need to decide what the eye distance and the convergence plane are. The best advice I can give is to attempt to replicate, as best you can, the relationship between the physical world and the virtual. Basically, you would expect someone playing a game to sit at least a meter from the screen and have eyes which are between 6 and 7cm apart. However, more important than an exact emulation is the ability to parameterise it for the far distance. The draw frustum also brings an angular factor into play: the field of view.
A game might use a horizontal field of view of anything up to 90 degrees, but let’s take a more modest 40 degrees here. The width of the convergence plane is 2·tan(FOV/2)·d, which for a 40 degree FOV is roughly 0.7d, so in the scenario above, where we assume the screen is about 1 meter away from the cameras, the convergence plane is about 70cm wide. I’m currently sat in front of a 24 inch monitor and I measure the width as approximately 50cm. If the virtual cameras were placed 7cm apart in this scenario, the separation of the far plane images would tend towards 7cm across that 70cm convergence plane, which is exactly the separation required for the viewer’s eye lines to be parallel. Scaled down onto my 50cm screen, however, that becomes a 5cm separation. My eye lines would not be parallel and so the far plane would actually have a tangible optical distance*.
Ideally you want the far plane separation on screen to be the same as your user’s actual physical eye separation. This of course requires parameterisation for user eye separation and screen size. Sticking with a 7cm eye separation and a 50cm screen for a moment, let’s look at our options.
1. You can move the cameras further apart (by a ratio of 7/5 to a virtual 9.8cm in this case) and adjust the skew gradient to suit.
2. You can make the field of view (FOV) narrower (to 2·tan⁻¹(0.25), or roughly 28 degrees in this case) so that the convergence plane is the same size as the screen.
In case it isn’t obvious which of these is the better option (a 28 degree FOV, really, you think that’s okay?), consider the math if the screen is 10cm across. Obviously you wouldn’t see much through a 5.7 degree FOV, so that option is right out.
Option 1 has one major drawback however, which is that the scale of the whole world changes with the eye distance. In this case the whole world would be 5/7ths its original size. Again, do the math with a 10cm screen and you’ll see that the world would be 1/7th the size.
There is a temptation to move the cameras closer to the convergence plane, but to keep the screen the same size and place that plane closer, the FOV would have to widen, creating a mismatch between the perspective in the real and virtual space. What this mismatch does is move things towards the far plane more rapidly, as the sense of depth is an asymptotic power curve linked to perspective. The wider the FOV goes, the higher the power and the tighter the curve. If this curve is different to what you’re used to, the depth will feel very odd and less real. However, an incorrect convergence or even a lack of convergence is far preferable to the worst sin of stereoscopy: divergence.
Since the entire basis of stereoscopy is your eyes pointing to two different places on the screen and the imaginary sight lines crossing at the depth being simulated, we know that these lines being parallel means the object is infinitely far away. But what if the images for each eye move even further apart? If the gap is wider than the gap between your eyes, the imaginary lines of sight are no longer parallel but actually diverge, and this is one of the biggest headache inducers in stereoscopy.
It is a scenario which you can literally never replicate in real life without optical equipment. If the images diverge by a reasonable amount it’s not too bad because you just can’t focus on them, which ruins the picture but not your eyes. If they diverge by a tiny amount your eyes will have a go at achieving binocular focus and you’ll probably end up with the mother of all migraines in very little time. This scenario is the most likely to arise if you don’t allow users to fine tune their own stereoscopy settings.
There are other simple mistakes and pitfalls covered in the video at the top of this blog, such as forgetting to make menus stereoscopic, and the care needed over the perceived depth of overlays. It makes sense for a pop-up menu to have its text at screen depth on the convergence plane, so that if the stereoscopy needs adjusting the menu is still easy to read… but the menu has to be drawn over everything else, so how do you avoid objects behind the menu overlapping the convergence plane at these times? Again, this comes down to keeping the space in front of the convergence plane special.
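As a concrete sketch of a drawing order that makes this work (with hypothetical draw_scene and draw_menu helpers standing in for your own engine code): render the scene for each eye with its stereo projection, then clear the depth buffer and draw the menu with no per-eye offset, so it is always on top yet reads at screen depth.

```c
#include <GLES2/gl2.h>

/* Hypothetical engine hooks, not part of any real SDK. */
void draw_scene(const float *view, const float *projection);
void draw_menu(void);

void render_eye(const float *eye_view, const float *eye_projection)
{
    /* Scene pass with this eye's skewed stereo projection. */
    draw_scene(eye_view, eye_projection);

    /* Clearing depth lets the menu draw over everything, even geometry
     * that has strayed in front of the convergence plane. */
    glClear(GL_DEPTH_BUFFER_BIT);

    /* Drawn with no per-eye offset: zero disparity puts the text
     * exactly at screen depth, on the convergence plane. */
    draw_menu();
}
```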
Which just leaves me time to add one side ramble which isn’t in the video. By far the most popular stereoscopic display currently in use is the one on the 3DS handheld game console from Nintendo, and when it first came out people talked about the weirdness of the little slider on the side which seemed to scale the 3D effect on the screen. If it was at the bottom the display went 2D, and sliding it up slowly took it to full depth simulation. At the time people wondered why you would want a game to be in half 3D, where it was still stereoscopic but appeared semi-flattened.
The answer is simple: the slider was actually adjusting the eye separation, so that full depth stereoscopy could be tuned for use by children of all ages. If the anatomical eye gap of the user is wider than the highest setting of the device, it doesn’t matter too much; it just means that the far plane won’t look as far away. But if the eye separation were fixed and someone with a smaller eye gap tried to use it, it would cause line of sight divergence, and all the associated headaches.
So make sure you’ve got that slider in your software.
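A minimal sketch of what that slider might control, under the assumption that the separation is capped so the far plane images can never end up further apart on screen than the viewer’s own eyes (names and the 0..1 slider range are illustrative):

```c
/* Map a 0..1 depth slider onto the virtual camera separation.
 * Returns the full separation between the two cameras; halve it
 * for the s used in the skew gradient earlier. */
float camera_separation(float slider,       /* 0 = flat 2D, 1 = full depth      */
                        float eye_gap,      /* viewer's physical eye separation */
                        float screen_width, /* physical width of the display    */
                        float plane_width)  /* width of the convergence plane   */
{
    /* Widest separation whose far plane images, once scaled from the
     * convergence plane down to the physical screen, are no further
     * apart than the viewer's eyes - i.e. sight lines never diverge. */
    float max_separation = eye_gap * (plane_width / screen_width);
    return slider * max_separation;
}
```

With the 7cm eyes, 50cm screen and 70cm convergence plane from earlier, that caps the separation at the same 9.8cm figure as option 1 above.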
* I know you’ll be knocking yourselves out over this so I’ll do the math for you. The far plane would appear to be 2.5 meters behind the screen.
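(For the curious, the similar-triangles sketch behind that number: with viewing distance v, eye separation e and on-screen image separation p, the sight lines converge at a distance

$$ D = \frac{v\,e}{e - p} $$

from your eyes. With v = 1m, e = 7cm and p = 5cm, that’s 3.5m from the eyes, or 2.5m behind the screen.)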