How do all the technical components of VR headsets, e.g., screens, lenses, tracking, etc., actually come together to create realistic-looking virtual environments? Specifically, why do virtual environment in VR look more “real” compared to when viewed via other media, for example panoramic video?
The reason I’m bringing this up again is that the question keeps getting asked, and that it’s really kinda hard to answer. Most attempts to answer it fall back on technical aspects, such as stereoscopy, head tracking, etc., but I find that this approach somewhat misses the point by focusing on individual components, or at least gets mired in technical details that don’t make much sense to those who have to ask the question in the first place.
I prefer to approach the question from the opposite end: not through what VR hardware produces, but instead through how the viewer perceives 3D objects and/or environments, and how either the real world on the one hand, or virtual reality displays on the other, create the appropriate visual input to support that perception.
The downside with that approach is that it doesn’t lend itself to short answers. In fact, last summer, I gave a 25 minute talk about this exact topic at the 2016 VRLA Summer Expo. It may not be news, but I haven’t linked this video from here before, and it’s probably still timely:
After some initial uncertainty, and accidentally raising a stink on reddit, I did manage to attend Oculus Connect last weekend after all. I guess this is what a birthday bash looks like when the feted is backed by Facebook and gets to invite 1200 of his closest friends… and yours truly! It was nice to run into old acquaintances, meet new VR geeks, and it is still an extremely weird feeling to be approached by people who introduce themselves as “fans.” There were talks and panels, but I skipped most of those to take in demos and mingle instead; after all, I can watch a talk on YouTube from home just fine. Oh, and there was also new mobile VR hardware to check out, and a big surprise. Let’s talk VR hardware. Continue reading →
I’ve recently received an Oculus Rift Development Kit Mk. II, and since I’m on Linux, there is no official SDK for me and I’m pretty much out there on my own. But that’s OK; it’s given me a chance to experiment with the DK2 as a black box, and investigate some ways how I could support it in my VR toolkit under Linux, and improve Vrui’s user experience while I’m at it. And I also managed to score a genuine Oculus VR Latency Tester, and did a set of experiments with interesting results. If you just want to see those results, skip to the end.
The Woes of Windows
If you’ve been paying attention to the Oculus subreddit since the first DK2s have been delivered to developers/enthusiasts, there is a common consensus that the user experience of the DK2 and the SDK that drives it could be somewhat improved. Granted, it’s a developer’s kit and not a consumer product, but even developers seem to be spending more time getting the DK2 to run smoothly, or run at all, than actually developing for it (or at least that’s the impression I get from the communal bellyaching).
I have talkedmanytimes about the importance of eye tracking for head-mounted displays, but so far, eye tracking has been limited to the very high end of the HMD spectrum. Not anymore. SensoMotoric Instruments, a company with around 20 years of experience in vision-based eye tracking hardware and software, unveiled a prototype integrating the camera-based eye tracker from their existing eye tracking glasses with an off-the-shelf Oculus Rift DK1 HMD (see Figure 1). Fortunately for me, SMI were showing their eye-tracked Rift at the 2014 Augmented World Expo, and offered to bring it up to my lab to let me have a look at it.
Figure 1: SMI’s after-market modified Oculus Rift with one 3D eye tracking camera per eye. The current tracking cameras need square cut-outs at the bottom edge of each lens to provide an unobstructed view of the user’s eyes; future versions will not require such extensive modifications.
I just moved all my Kinects back to my lab after my foray into experimental mixed-reality theater a week ago, and just rebuilt my 3D video capture space / tele-presence site consisting of an Oculus Rift head-mounted display and three Kinects. Now that I have a new extrinsic calibration procedure to align multiple Kinects to each other (more on that soon), and managed to finally get a really nice alignment, I figured it was time to record a short video showing what multi-camera 3D video looks like using current-generation technology (no, I don’t have any Kinects Mark II yet). See Figure 1 for a still from the video, and the whole thing after the jump.
Figure 1: A still frame from the video, showing the user’s real-time “holographic” avatar from the outside, providing a literal kind of out-of-body experience to the user.
Why do I think it’s worth talking about? Because, while there is an actual design for something called Holovision, and that design is theoretically feasible, and possibly even practical, the public’s impression of the product advertised on Kickstarter is decidedly not. The concept imagery associated with the Kickstarter project presents this feasible technology in a way that (intentionally?) taps into people’s misconceptions about holograms (and I’m talking about the “real” kind of holograms, those involving lasers and mirrors and beam splitters). In other words, it might not be a scam per se, and it might even be unintentional, but it is definitely creating a false impression that might lead to very disappointed backers.
My friend Serban got his Oculus Rift dev kit in the mail today, and he called me over to check it out. I will hold back a thorough evaluation until I get the Rift supported natively in my own VR software, so that I can run a direct head-to-head comparison with my other HMDs, and also my screen-based holographic display systems (the head-tracked 3D TVs, and of course the CAVE), using the same applications. Specifically, I will use the Quake ||| Arena viewer to test the level of “presence” provided by the Rift; as I mentioned in my previous post, there are some very specific physiological effects brought out by that old chestnut, and my other HMDs are severely lacking in that department, and I hope that the Rift will push it close to the level of the CAVE. But here are some early impressions.
Figure 1: What it would look like to unbox an Oculus VR dev kit, if one were to have such a thing.
I don’t think there’s need to introduce the Oculus Rift HMD. Everyone’s heard of it, and everyone’s psyched – including me.
However, HMDs are prone to certain issues, and while that shouldn’t detract us from embracing them, we should be careful to do it right this time. The last thing the VR field needs right now is a viral YouTube video along the lines of “Oh, an Oculus Rift! Cool! Let me try it on… Wow, that’s awesoBLEEAAARRGHHH.”
To back up a little: when HMDs became a thing in the 80s, they tended to induce dizziness and nausea in viewers, after a relatively short time of using them. Interestingly, HMDs had generally worse effects than other types of immersive display environments such as CAVEs. The basic theory of simulation sickness is based on virtual motion, and does not account for this difference.
The commonly stated explanation for this difference is display lag. In an HMD, the screens move with the viewer’s head, and any delay will cause the virtual world to move along with the viewer until the display system catches up. Imagine wearing an HMD and quickly turning your head to the side. Say it takes 30ms total until this motion is noticed by the head tracking system, the application updates its internal state, renders the new state, and refreshes the HMD’s screens. During this interval, the world will turn with you, and it will snap back to its original orientation once the delay time has passed. The real world does not behave like that, and because HMD-based graphics tap deeply into our brain’s visual system, this is very disorienting and adds to the discomfort. In a CAVE, on the other hand, the screens do not move with the viewer. Delay will still cause a disturbance in the projection of the virtual world, as the actual viewer position will not match the virtual one, but because screens are large and relatively far away, this will be barely noticeable. So far, so good.
Alas, there is an additional, often overlooked, factor — display calibration. Any immersive graphics system, HMD or CAVE or else, needs to exactly replicate how virtual objects are projected onto the system’s real screens, and then seen by the user (how exactly that works is a topic for another post). The bottom line is that the graphics software needs to know the absolute positions and orientations of all screens, and the absolute positions of the viewer’s eyes. Determining this is the job of head tracking and system calibration. But in an HMD, unlike in a CAVE, the tolerances for calibration are very low. The screens are very small and very close to the viewer’s eyes, which doesn’t leave much room for error (see Figure 1). Even worse, there is no way to precisely don an HMD short of putting screws into one’s skull; every time you put it on, it sits slightly differently. And that means any pre-configured projection parameters will not match reality.
Figure 1: Diagram of a hypothetical HMD for calibration purposes. The HMD consists of small real screen mounted directly in front of the viewer’s eyes, and uses optics to create larger virtual screens at a longer distance away to allow users to properly focus on those screens. For proper calibration, graphics software needs to know the precise positions of the viewer’s pupils and the exact positions and sizes of the virtual screens, in some coordinate system. Head tracking will provide the mapping from this viewer-attached coordinate system to the world coordinate system to allow users to look and walk around.
These mismatches have several effects. For one, imagine that a viewer wears an HMD slightly askew, so that the two screens have different vertical positions in front of their respective eyes. If the software does not account for that, the two stereo images will be vertically displaced, something that does not happen in real life. The viewer’s eyes will make up for it, up to a point, by moving up/down independently, but that is an unnatural motion and causes eye strain. It’s the same effect as watching a 3D movie in a theater while not holding one’s head level — it will hurt later.
Another, more subtle, effect is that in a miscalibrated display system the virtual world does not behave as the real world would. Do a simple experiment: fire up some first-person video game that allows view configuration, such as Doom3, and set a high field of view. Then rotate the view and observe. The virtual world will display a strong distortion effect, meaning that the sizes of objects, and their internal angles, change as the viewpoint changes. This is an extreme example, but even slight discrepancies are subconsciously unsettling, because our visual system is very good at detecting if something is not right with the world, and it tells us that by making us sick.
Even in non-immersive 3D graphics, a too large discrepancy between real field of view (how large the screen looms in our visual field) and programmatic field-of-view is known to cause motion sickness, and immersive 3D graphics with the same issue will be much worse. FOV discrepancy is only one symptom of miscalibration, but it’s the one that’s easiest to demonstrate; the others are more subtle (but that’s a topic for another post). In the end, miscalibration is a nasty problem because it is subtle, very hard to correct, and causes significant ill effects.
I noticed these things when I started experimenting with my own HMDs a while ago (I have an eMagin Z800 3DVisor and a Sony HMZ-T1). I experimented with rapid motions, but those didn’t really make me dizzy. I did notice, however, that the world didn’t seem solid, but as if it was made from jelly. I expected that, not having done proper calibration yet, so I used an interactive calibration utility to set up the system just so. After that, the world seemed stable, and interestingly I didn’t notice any more issues from lag. Not having done any further experiments, my hunch is that miscalibration is actually a bigger problem than lag. (Disclosure: while I was using a low-latency Intersense IS-900 tracking system, the computer running the show was fairly old, and the Quake3 renderer had no particular performance tweaking, so I estimated total system delay around 30ms).
So what’s the take-home message from this wall of text? If we want HMDs to succeed, we need to treat them properly in our graphics software. We need to use proper projection models instead of the standard camera model (but that’s a topic for another post), and not simply apply ad-hoc stereo models such as toe-in etc. (but that’s a topic for another post). It might work for a demo, but it won’t be pretty, and it will make our users sick. Instead, we need to know exactly how the HMD is laid out internally (screen placement and size, effects from the optical system in front of the screens, lens distortions, etc.), and, just as importantly, we need to know exactly where the viewer’s eyes are with respect to the screens (see Figure 1). This last one is the hard part. Maybe a future perfect HMD will contain one pair of stereo cameras per screen that will accurately track the viewer’s pupils and allow the graphics software to set up the projection parameters correctly, no matter how the HMD is worn and how the viewer moves. But until then, we need to come up with a practical approach, and we need to find simple methods to calibrate HMDs on the fly, and teach our users how to use those methods.
Well, and, of course, we mustn’t forget about minimizing lag, either. That would be too easy.
Oh, and by the way, want to get a quick glimpse of just how immersive the Oculus Rift will be (going by current specs)? If your monitor is X inches wide, put your eye X/2 inches in front of the monitor’s center — that’s about what it will look like. If you want to play a first-person game from that viewpoint and have it look right, set the horizontal field of view to 90 degrees.