How Does VR Create the Illusion of Reality?

I’ve recently written a loose series of articles trying to explain certain technical aspects of virtual reality, such as what the lenses in VR headsets do, or why there is some blurriness, but I haven’t — or at least haven’t in a few years — tackled the big question:

How do all the technical components of VR headsets, e.g., screens, lenses, tracking, etc., actually come together to create realistic-looking virtual environments? Specifically, why do virtual environment in VR look more “real” compared to when viewed via other media, for example panoramic video?

The reason I’m bringing this up again is that the question keeps getting asked, and that it’s really kinda hard to answer. Most attempts to answer it fall back on technical aspects, such as stereoscopy, head tracking, etc., but I find that this approach somewhat misses the point by focusing on individual components, or at least gets mired in technical details that don’t make much sense to those who have to ask the question in the first place.

I prefer to approach the question from the opposite end: not through what VR hardware produces, but instead through how the viewer perceives 3D objects and/or environments, and how either the real world on the one hand, or virtual reality displays on the other, create the appropriate visual input to support that perception.

The downside with that approach is that it doesn’t lend itself to short answers. In fact, last summer, I gave a 25 minute talk about this exact topic at the 2016 VRLA Summer Expo. It may not be news, but I haven’t linked this video from here before, and it’s probably still timely:

On the road for VR: Silicon Valley Virtual Reality Conference & Expo

I just got back from the Silicon Valley Virtual Reality Conference & Expo in the awesome Computer History Museum in Mountain View, just across the street from Google HQ. There were talks, there were round tables, there were panels (I was on a panel on non-game applications enabled by consumer VR, livestream archive here), but most importantly, there was an expo for consumer VR hardware and software. Without further ado, here are my early reports on what I saw and/or tried.

Figure 1: Main auditorium during the “60 second” lightning pitches.

Small Correction to Rift’s Projection Matrix

In a previous post, I looked at the Oculus Rift’s internal projection in detail, and did some analysis of how stereo rendering setup is explained in the Rift SDK’s documentation. Looking at that again, I noticed something strange.

In the other post, I simplified the Rift’s projection matrix as presented in the SDK documentation to

$P = \begin{pmatrix} \frac{2 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{HScreenSize} / 2} & 0 & 0 & 0 \\ 0 & \frac{2 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{VScreenSize}} & 0 & 0 \\ 0 & 0 & \frac{z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}} & \frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}} \\ 0 & 0 & -1 & 0 \end{pmatrix}$

which, to those in the know, doesn’t look like a regular OpenGL projection matrix, such as created by glFrustum(…). More precisely, the third row of P is off. The third-column entry should be $\frac{z_\mathrm{near} + z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}}$ instead of $\frac{z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}}$, and the fourth-column entry should be $2 \cdot \frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}}$ instead of $\frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}}$. To clarify, I didn’t make a mistake in the derivation; the matrix’s third row is the same in the SDK documentation.

What’s the difference? It’s subtle. Changing the third row of the projection matrix doesn’t change where pixels end up on the screen (that’s the good news). It only changes the z, or depth, value assigned to those pixels. In a standard OpenGL frustum matrix, 3D points on the near plane get a depth value of 1.0, and those on the far plane get a depth value of -1.0. The 3D clipping operation that’s applied to any triangle after projection uses those depth values to cut off geometry outside the view frustum, and the viewport projection after that will map the [-1.0, 1.0] depth range to [0, 1] for z-buffer hidden surface removal.

Using a projection matrix as presented in the previous post, or in the SDK documentation, will still assign a depth value of -1.0 to points on the far plane, but a depth value of 0.0 to points on the (nominal) near plane. Meaning that the near plane distance given as parameter to the matrix is not the actual near plane distance used by clipping and z buffering, which might lead to some geometry appearing in the view that shouldn’t, and a loss of resolution in the z buffer because only half the value range is used.

I’m assuming that this is just a typo in the Oculus SDK documentation, and that the library code does the right thing (I haven’t looked).

Oh, right, so the fixed projection matrix, for those working along, is

$P = \begin{pmatrix} \frac{2 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{HScreenSize} / 2} & 0 & 0 & 0 \\ 0 & \frac{2 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{VScreenSize}} & 0 & 0 \\ 0 & 0 & \frac{z_\mathrm{near} + z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}} & 2 \cdot \frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}} \\ 0 & 0 & -1 & 0 \end{pmatrix}$

A Follow-up on Eye Tracking

Now this is why I run a blog. In my video and post on the Oculus Rift’s internals, I talked about distortions in 3D perception when the programmed-in camera positions for the left and right stereo views don’t match the current left and right pupil positions, and how a “perfect” HMD would therefore need a built-in eye tracker. That’s still correct, but it turns out that I could have done a much better job approximating proper 3D rendering when there is no eye tracking.

This improvement was pointed out by a commenter on the previous post. TiagoTiago asked if it wouldn’t be better if the virtual camera were located at the centers of the viewer’s eyeballs instead of at the pupils, because then light rays entering the eye straight on would be represented correctly, independently of eye vergence angle. Spoiler alert: he was right. But I was skeptical at first, because, after all, that’s just plain wrong. All light rays entering the eye converge at the pupil, and therefore that’s the only correct position for the virtual camera.

Well, that’s true, but if the current pupil position is unknown due to lack of eye tracking, then the only correct thing turns into just another approximation, and who’s to say which approximation is better. My hunch was that the distortion effects from having the camera in the center of the eyeballs would be worse, but given that projection in an HMD involving a lens is counter-intuitive, I still had to test it. Fortunately, adding an interactive foveating mechanism to my lens simulation application was simple.

Turns out that I was wrong, and that in the presence of a collimating lens, i.e., a lens that is positioned such that the HMD display screen is in the lens’ focal plane, distortion from placing the camera in the center of the eyeball is significantly less pronounced than in my approach. Just don’t ask me to explain it for now — it’s due to the “special properties of the collimated light.” 🙂

A Closer Look at the Oculus Rift

I have to make a confession: I’ve been playing with the Oculus Rift HMD for almost a year now, and have been supporting it in Vrui for a long time as well, but I haven’t really spent much time using it in earnest. I’m keenly aware of the importance of calibrating head-mounted displays, of course, and noticed right away that the scale of virtual objects seen through the Rift was way off for me, but I never got around to doing anything about it. Until now, that is.

The Holovision Kickstarter “scam”

Update: Please tear your eyes away from the blue lady and also read this follow-up post. It turns out things are worse than I thought. Now back to your regularly scheduled entertainment.

I somehow missed this when it was hot a few weeks or so ago, but I just found out about an interesting Kickstarter project: HOLOVISION — A Life Size Hologram. Don’t bother clicking the link, the project page has been taken down following a DMCA complaint and might not ever be up again.

Why do I think it’s worth talking about? Because, while there is an actual design for something called Holovision, and that design is theoretically feasible, and possibly even practical, the public’s impression of the product advertised on Kickstarter is decidedly not. The concept imagery associated with the Kickstarter project presents this feasible technology in a way that (intentionally?) taps into people’s misconceptions about holograms (and I’m talking about the “real” kind of holograms, those involving lasers and mirrors and beam splitters). In other words, it might not be a scam per se, and it might even be unintentional, but it is definitely creating a false impression that might lead to very disappointed backers.

Figure 1: This image is a blatant lie.

Stereo Sue

I was just reminded of an article in The New Yorker that I read a long time ago, in June 2006. The article, written by eminent neurologist Oliver Sacks, describes the experience of an adult woman, Susan R. Barry, a professor of neurobiology herself, who had been stereoblind her entire life, and suddenly regained stereoscopic vision after intensive visual training at the age of 48. While the full original article, titled “Stereo Sue,” is behind the New Yorker’s pay wall, I just found an awesome YouTube video of a joint interview with Drs. Barry and Sacks:

Intel’s “perceptual computing” initiative

I went to the Sacramento Hacker Lab last night, to see a presentation by Intel about their soon-to-be-released “perceptual computing” hardware and software. Basically, this is Intel’s answer to the Kinect: a combined color and depth camera with noise- and echo-cancelling microphones, and an integrated SDK giving access to derived head tracking, finger tracking, and voice recording data.

Figure 1: What perceptual computing might look like at some point in the future, according to the overactive imaginations of Intel marketing people. Original image name: “Security Force Field.jpg” Oh, sure.

How head tracking makes holographic displays

I’ve talked about “holographic displays” a lot, most recently in my analysis of the upcoming zSpace display. What I haven’t talked about is how exactly such holographic displays work, what makes them “holographic” as opposed to just stereoscopic, and why that is a big deal.

Teaser: A user interacting with a virtual object inside a holographic display.

GPU performance: Nvidia Quadro vs Nvidia GeForce

One of the mysteries of the modern age is the existence of two distinct lines of graphics cards by the two big manufacturers, Nvidia and ATI/AMD. There are gamer-level cards, and professional-level cards. What are their differences? Obviously, gamer-level cards are cheap, because the companies face stiff competition from each other, and want to sell as many of them as possible to make a profit. So, why are professional-level cards so much more expensive? For comparison, an “entry-level” $700 Quadro 4000 is significantly slower than a$530 high-end GeForce GTX 680, at least according to my measurements using several Vrui applications, and the closest performance-equivalent to a GeForce GTX 680 I could find was a Quadro 6000 for a whopping \$3660. Granted, the Quadro 6000 has 6GB of video RAM to the GeForce’s 2GB, but that doesn’t explain the difference.