The VR software gap

When it comes to VR in the public’s mind, it’s all about the hardware. And that’s understandable, when there’s all that new and shiny tech out there: the Oculus Rift, the Leap Motion Leap, the Razer Hydra, the newly-announced Sony HMZ-T2 (couldn’t find an actual Sony link), you name it. But with that comes the unstated assumption that the hardware is all you need, that if you just buy the gadget, it will somehow work on its own. And at least when it comes to VR, that’s simply not the truth. Without proper software running it, the gadget is nothing but a glorified paperweight (of course the reverse is just as true, but I’m a software guy, so there).

The emphasis here is on proper software. Because all that shiny tech that came out in the past, and that nobody remembers (or tries to purge from their minds — remember the Virtual Boy? You’re welcome!), it all came with software. Just not software anybody was willing to use.

Which is why I was delighted to read this recent interview with Valve Software’s Michael Abrash. Here’s a guy who gets it:

“So first, I’ll tell you what’s necessary for VR to work well. For VR to work well, you need display technology that gives you an image both your brain and eye are happy with. Trust me, that’s much harder than you think. Even if it was just a HUD, people wouldn’t be that happy, because you’re always moving. Your head is never still. And this is moving relative to the world, and if your brain is trying to fuse it, that can be rather tiresome. I’ll tell you there are lots of issues with getting that image up in front of you.”

I couldn’t agree more. Here’s what I have to add to this statement: as a game developer, Mr. Abrash should not have to worry about it in the first place. Should game developers worry about implementing the projective geometry and arithmetic necessary to turn triangles forming a 3D world into pixels on a screen in massively-parallel special-purpose silicon? No, that’s what OpenGL or Direct3D are for. Should game developers worry about how to scan the hardware of a keyboard to read key presses, or how to safely send data packets across a heterogeneous network of interconnected computers? No, that’s what the operating system is for.

Along the same lines, someone else should have to worry about how to properly display a 3D virtual world on a head-mounted display so that it looks correct and doesn’t cause eye strain or nausea, because that’s really hard, and really important. And while the Michael Abrashes and John Carmacks of this world can surely do it, others will get it wrong. I know that because others have been getting it wrong, for going on twenty years now. And it’s the wrong approaches that sour people on the whole VR idea.

But the problem is that, at this time, game developers still have to worry about it, because there is not an equivalent to OpenGL for VR yet, in the sense that it is a widely-accepted industry-standard toolkit that has all the functionality that’s required to build successful applications on it. Now, there are plenty of VR toolkits out there — and, for disclosure, I created one of them — but none fulfill these criteria. Let’s talk about that.

Like other support or middleware software, VR toolkits can work at several levels of abstraction. I’m going to use “standard” 3D graphics toolkits as analogies here, assuming that those who have read this far know about such things.

At the low level, we have things that are the equivalent of OpenGL itself, or the glut windowing toolkit built on top of it. These are things that give you minimum abstractions, and offload any higher-level functionality to individual applications. Take glut: it will open a display window for you (no easy task in itself), and allow you to query the mouse, but if you want to use the mouse to rotate your 3D scene in the window, you’re on your own, pal. Result: glut developers roll their own navigation interfaces, and if their actual goal is something besides navigation, and they just do the bare minimum, the results usually suck hard.

The equivalent to glut in the VR world would be a toolkit that opens windows for you and sets them up to do proper stereo, and gives you abstract input devices, typically represented as 4×4 homogeneous matrices. If you want to use those input devices to do anything, let’s hope you really grok projective geometry. The results are often, let’s say, somewhat glut-ish in nature.
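
To give a feel for what "a bunch of matrices" means in practice, here is a minimal sketch of the first thing an application has to do with such a device: turn its 4×4 matrix into a pointing ray. The conventions are assumptions made for this example only (row-major storage, position in the last column, device pointing along its local -Z axis), and the function names are hypothetical, not taken from any particular toolkit.

```python
# Sketch only: conventions (row-major, position in the last column, device
# points along local -Z) are assumptions, not any specific toolkit's rules.

def transform_point(m, p):
    """Apply the 4x4 homogeneous matrix m to the 3D point p (w = 1)."""
    x, y, z = p
    out = [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)


def transform_direction(m, d):
    """Apply only the upper-left 3x3 part of m to the direction d (w = 0)."""
    x, y, z = d
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z for r in range(3))


def pointing_ray(m):
    """Return (origin, direction) of the ray the device is pointing along."""
    origin = transform_point(m, (0.0, 0.0, 0.0))
    direction = transform_direction(m, (0.0, 0.0, -1.0))
    return origin, direction


# Hypothetical device held at (1, 2, 3), rotated 90 degrees about the Y axis.
device = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [-1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]

origin, direction = pointing_ray(device)
print(origin, direction)
```

And that is only the ray. Intersecting it with the scene, dragging objects, or navigating with it is all still up to the application at this level.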

The canonical example of a low-level VR toolkit is the cavelib, but there are many others. I want to mention one other, because I might catch flak for it: VR Juggler. Now I haven’t looked at version 3.0 yet, but in the VR Juggler I know, the above abstractions are what you get. There is a lot of work going on under the hood, with clever ways of dynamically managing input devices and displays etc., but in the end what you get is a number of set-up-for-3D windows, and a bunch of matrices. Everything else is up to you. Don’t get me wrong: I’m not saying that these toolkits are bad, I’m merely saying that they’re low-level. If low-level is what you want or need, these are for you.

At the other extreme, there are high-level toolkits (who’da thunk?). These are basically content creation and management engines, equivalent to things like commercial or open-source game engines: think id Tech, Unreal Engine, Ogre, Horde 3D, etc. These are very powerful and easy to use (at least relative to their feature sets), but they are written with a particular purpose in mind. You could probably tweak the Unreal Engine to do 3D visualization of volumetric data, but you’d really be better off not doing it.

The only high-level VR toolkit I know by more than just name is WorldViz, but I think it’s pretty canonical. It’s very easy to put together a 3D world with it, and show it on a wide variety of VR display devices, but if you have more specific needs, it will be much harder to punch through the high abstraction layer to get to the guts you need to get to.

A quick secondary analogy: low-level is like raw X11, high level is like a certain office software suite, and the middle level is like gtk+ or Qt. You can see where I’m going: nobody has been writing apps in raw X11 for twenty years (with very good reason), and the really exciting part is in the middle, because developing apps in that unnamed office software suite is for code monkeys (that was a joke).

I haven’t seen many medium-level VR toolkits. In the non-VR world, scene graph toolkits like OpenSceneGraph or OpenSG would qualify for this level, but while there exist some VR embeddings of these toolkits, those are not quite standard, and — I believe — are still lacking in the input department.

It was this lack of medium-level software that led me to start my own VR toolkit back in the day. There’s much to be said about that, but it’s a topic for another post. For now, I just want to mention that what separates it from low-level are its built-in 3D interaction metaphors, such as navigation. If you want to rotate your scene with the mouse, you don’t have to reinvent the wheel. But if you do really want to make your own navigation metaphor, there’s an “official” way to do so — and that’s what separates it from high-level toolkits.
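
As a rough illustration of that split, here is a minimal sketch of a toolkit that ships a built-in rotation metaphor and also offers a sanctioned hook for adding your own. Every name in it (`NavigationTool`, `Toolkit`, `register_tool`, and so on) is hypothetical and invented for this example; it is not the API of my toolkit or of any other.

```python
# Sketch only: all class and method names are hypothetical.

class NavigationTool:
    """Base class: turns abstract drag input into changes of the view state."""

    def apply(self, view, dx, dy):
        raise NotImplementedError


class RotateTool(NavigationTool):
    """Built-in metaphor: dragging rotates the scene (degrees per pixel)."""

    def __init__(self, speed=0.5):
        self.speed = speed

    def apply(self, view, dx, dy):
        view["yaw"] = (view["yaw"] + dx * self.speed) % 360.0
        # Clamp pitch so the scene cannot flip upside down.
        view["pitch"] = max(-90.0, min(90.0, view["pitch"] + dy * self.speed))


class Toolkit:
    """Owns the view state and dispatches input to the active tool."""

    def __init__(self):
        self.view = {"yaw": 0.0, "pitch": 0.0, "x": 0.0, "y": 0.0}
        self._tools = {"rotate": RotateTool()}  # available out of the box
        self._active = "rotate"

    def register_tool(self, name, tool):
        """The "official" extension point for custom metaphors."""
        self._tools[name] = tool

    def select_tool(self, name):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        self._active = name

    def drag(self, dx, dy):
        self._tools[self._active].apply(self.view, dx, dy)


class PanTool(NavigationTool):
    """Application-defined metaphor: dragging slides the scene sideways."""

    def apply(self, view, dx, dy):
        view["x"] += dx / 100.0
        view["y"] += dy / 100.0


toolkit = Toolkit()
toolkit.drag(40, 20)                     # built-in rotation, no app code needed
toolkit.register_tool("pan", PanTool())  # custom metaphor via the official hook
toolkit.select_tool("pan")
toolkit.drag(100, -50)
print(toolkit.view)
```

The application never touches raw device data for the common case, yet it is not locked out when it needs something special.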

But back on topic. Why do I insist that game developers use VR middleware, instead of working on the bare metal themselves? I already mentioned that there’s the danger of getting it wrong, and having middleware that does it right prevents that, but there’s another reason that holds even if all game developers do it right.

Games have been rolling their own user interfaces since day 1, and there’s a certain appeal to having vastly different looking interfaces in different games that fit with the visual style of each game, but here’s the thing: take away the skin, and they all work the same. You don’t have to read the manual to know how to navigate a game’s menu (or if you do, you should ask for your money back), and if you play certain genres, say first-person shooters, you know that they all use WASD+mouse, so you’re right at home. But imagine if games used functionally different interfaces. Simple example: imagine half of FPS games looking up when you push the mouse forward, and the other half looking down, and there being no way to change that. Now imagine you’re really good at one, and try the other. You’ll love it.

And that’s a problem in VR, because the number of potential ways of doing the same thing, multiplied by the number of fundamentally different input devices (gamepad? Wiimote? data glove? Kinect? else?) would lead to an explosion of mutually incompatible choices. Using a common middleware, which is based on tested and working interaction metaphors, and allows users to pick their own favorite metaphors out of a large pool and use them across all applications, would really help here.

To break up this Wall of Text, I’m going to throw in two related videos. Both show VR “games,” with somewhat different ways of incorporating the players’ bodies into the action. The first one is a pretty straight-up FPS. It’s decidedly old-school, being based on maps and models from 1997’s Descent (best game ever!), so look past the dated graphics and observe the seamless integration of the player, and the physical user interface, particularly the aiming. Keep in mind the catch-22 of VR: in order to film this, the user can’t see properly, which is why my aim is so poor. If done for real, it’s much better. Please watch both halves; the second (starting at 2:18) makes it clearer what’s going on in the first:

The second video also shows an FPS, at least on the surface, being based on maps from Doom 3. But the video is not only a lot more whimsical (feel free to roll your eyes), it also doesn’t feature shooting, and shows a larger variety of bodily interactions, including being able to draw free-handedly in 3D space for an ad-hoc noobs’ round of tic-tac-toe. It’s not a game, as it’s meant to show remote collaboration with virtual holograms, but it was a blast nonetheless, some hardware trouble notwithstanding:

Back to middleware, and the final issue: think configuring a desktop PC game is bad? With all the drivers and graphics options and knobs and twiddles, and FAQs on the web about how to get it wide-screen etc.? VR is a hundred times worse, and on top of that, if you get it slightly wrong, it will make you sick. Now imagine you’ve just set it up perfectly in one game, and have to do it all over again for the next game, and the knobs and dials you have to twiddle are completely different. If there’s common middleware, you only have to do it once. Wouldn’t it be nice if same-genre games of today, like FPSs, shared mouse and keyboard settings at least? Or if you told one to run at 1920×1080, the next one would, too? One can dream, right? Well, with a good medium-level toolkit, that is exactly what happens.
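
A minimal sketch of the idea, assuming a made-up settings format: the user's choices live with the middleware, and every game asks the middleware instead of keeping its own copy. The keys, class names, and numbers below are all hypothetical and only serve to illustrate the principle.

```python
import json

# Sketch only: a hypothetical per-user settings file, stored once by the
# middleware rather than separately by every game. All keys are made up.
USER_SETTINGS = json.loads("""
{
  "display": {"width": 1920, "height": 1080},
  "look": {"invert_y": true, "sensitivity": 0.25}
}
""")


class Middleware:
    """Hands the same user settings to every application built on it."""

    def __init__(self, settings):
        self._settings = settings

    def display_size(self):
        display = self._settings["display"]
        return (display["width"], display["height"])

    def look_pitch(self, mouse_dy):
        """Convert vertical mouse motion (pixels) to a pitch change (degrees)."""
        look = self._settings["look"]
        sign = -1.0 if look["invert_y"] else 1.0
        return sign * mouse_dy * look["sensitivity"]


class Game:
    """A stand-in for an application; it stores no settings of its own."""

    def __init__(self, name, middleware):
        self.name = name
        self.middleware = middleware

    def describe(self):
        width, height = self.middleware.display_size()
        pitch = self.middleware.look_pitch(8)
        return f"{self.name}: {width}x{height}, 8 px of mouse motion -> {pitch} degrees"


shared = Middleware(USER_SETTINGS)
games = [Game("Shooter A", shared), Game("Shooter B", shared)]
for game in games:
    print(game.describe())
```

Change the inversion flag or the resolution once, and both "games" pick it up, because neither of them ever had its own knob for it.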

I am not proposing that all games should work exactly the same, not even games in the same genre. Even when using a medium-level toolkit with powerful built-in user interface features, there is still a huge amount of design space for individual games to establish their own look & feel, or provide special-purpose interaction metaphors — allowing that is the whole point of medium-level toolkits — but at least the fundamentals are strong.

I kept the best news for last: a good medium-level VR toolkit does not only work in actual VR, it also works splendidly on a desktop — in fact, an application based on a VR toolkit done right is functionally indistinguishable from a native desktop application. I have plenty of VR applications to prove it. That means that game developers will not have to make specific VR and desktop versions of their games. VR (or the desktop, depending on your perspective) will come for free.

So here’s my call to arms: now is exactly the right time to get going on that middleware thing. We can’t wait until great consumer-level VR hardware hits the mainstream market; the software has to be already there and ready the moment it does. The people making VR middleware, and the people who should be using it, should already be talking at this point. Are they?

10 thoughts on “The VR software gap”

  1. I hope your input is taken into account with the oculus SDK!

    What about VR’s potential in real estate? Is there software out there that can integrate several hundred photos into a 3d panorama (the user stays still, and can only tilt their head around)?

    The gigapan epic 100 is perfect for this application, but manually editing hundreds of photos, combined with the varied lighting sounds very hard! But I don’t write much if any software or know much about 3d. Help.

    I know Microsoft bought a company that did something like this. I imagine a single 3d photo that is zoomable, which is a bit different than what is currently possible.

    • I hope so, too. 🙂

      There are a couple of promising approaches. An interesting one is Building Rome in a Day by the University of Washington. Then there’s Autodesk’s 123D, but that seems aimed more at desktop applications.

      I’m personally most interested in using 3D cameras, such as the Kinect, to capture and reconstruct 3D spaces. Best way to find those is to google for “Kinect SLAM.”

      The end result is a full 3D model of the captured scene, meaning that it’s not just a 3D photo to zoom around in, but a fully walkable 3D environment, more like those seen in 3D video games. I don’t have a good video showing that, right now.

    • I’m not sure. I am talking to the OSVR people, but I haven’t looked at their proposed SDK/API in detail. Until I find out otherwise, I’m assuming their abstractions are similar to existing VR toolkits or SDKs, which I generally consider too low-level to lead to truly portable VR applications. I know a bit more about their proposed input device abstraction model, and it’s along the lines of VRPN (Virtual Reality Peripheral Network), which would, for example, make it possible to abstract the differences between a STEM and a PS Move or any comparable 6-DOF input device, but not the differences between a STEM and a mouse/keyboard, or a joystick, or a tracked skeleton, etc. The kind of middleware that I’m talking about would still sit above that level.