I have to make a confession: I’ve been playing with the Oculus Rift HMD for almost a year now, and have been supporting it in Vrui for a long time as well, but I haven’t really spent much time using it in earnest. I’m keenly aware of the importance of calibrating head-mounted displays, of course, and noticed right away that the scale of virtual objects seen through the Rift was way off for me, but I never got around to doing anything about it. Until now, that is.
The primary reason for my negligence was that I didn’t know enough about the internal details of the Rift to completely understand how to account for different viewer parameters, i.e., mainly the positions of the viewer’s eyes in front of the screens/lenses. Vrui uses the viewer/screen camera model, which makes it very easy to calibrate a wide range of VR display environments because calibration is based entirely on directly measurable parameters: the size and position of screens, and the position of the viewer’s eyes relative to those screens, in real-world measurement units.
The Rift’s firmware exposes the basic required parameters: screen size and position, and distance from the viewer’s pupils to the screen. Screen size is reported as 149.76mm x 93.6mm for both halves combined, and because there’s only one physical screen, the left half starts at -74.88mm along the X axis, and the right half at 0.0mm, when using a convenient HMD-centered coordinate system (each half is therefore 74.88mm wide). Both screens start at -46.8mm along the Z axis (I like Z going up), and the viewer’s pupils are 49.8mm from the screen. The Rift SDK assumes that the viewer’s face is symmetrical, i.e., that the left pupil is at -ipd/2 and the right at +ipd/2 along X where ipd is the viewer’s inter-pupillary distance in mm, and both are at 0.0mm along Z (by the way, Vrui is more flexible than that and handles asymmetric conditions without extra effort).
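To make the viewer/screen setup concrete, here is a minimal Python sketch (not Vrui's actual code) that turns the numbers above into per-eye view frustum extents. The names `ScreenHalf` and `frustum_tangents` are made up for illustration, the screen is assumed to be a flat rectangle at the reported pupil distance, and the 64mm IPD is just an example value.

```python
from dataclasses import dataclass

@dataclass
class ScreenHalf:
    """One half of the Rift's screen in HMD-centered coordinates (mm)."""
    x_min: float
    x_max: float
    z_min: float
    z_max: float

EYE_TO_SCREEN = 49.8  # mm, pupil-to-screen distance as quoted above
LEFT_HALF = ScreenHalf(-74.88, 0.0, -46.8, 46.8)
RIGHT_HALF = ScreenHalf(0.0, 74.88, -46.8, 46.8)

def frustum_tangents(eye_x, eye_z, screen, eye_to_screen=EYE_TO_SCREEN):
    """Return (left, right, bottom, top) frustum extents as tangents.

    Each value is the offset from the eye to a screen edge divided by the
    eye-to-screen distance; multiply by a near-plane distance to get the
    usual frustum bounds.
    """
    return (
        (screen.x_min - eye_x) / eye_to_screen,
        (screen.x_max - eye_x) / eye_to_screen,
        (screen.z_min - eye_z) / eye_to_screen,
        (screen.z_max - eye_z) / eye_to_screen,
    )

ipd = 64.0  # example viewer IPD in mm
left_eye = frustum_tangents(-ipd / 2, 0.0, LEFT_HALF)
right_eye = frustum_tangents(+ipd / 2, 0.0, RIGHT_HALF)
print("left eye :", left_eye)
print("right eye:", right_eye)
```

Because each pupil sits off-center relative to its screen half, the left and right extents of each frustum differ in magnitude, and the two eyes' frusta mirror each other. Lens effects are deliberately left out of this sketch.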
That would be all that’s needed to set up a perfectly calibrated display, if it weren’t for those darned lenses. The lenses’ image distortion, and their larger effects on the Rift’s viewing parameters, are treated as black magic by the Rift SDK and its documentation (quote: “Here, lens separation is used instead of the eye separation due to the properties of the collimated light”; sure, that explains it). The Rift’s firmware only reports the horizontal distance between the lenses’ centers and the coefficients for the post-rendering lens distortion correction shaders. It doesn’t report the lenses’ distance from the screen, or anything about the lenses’ optical properties.
So when I tried doing what I normally do and configured the eye positions according to my own IPD, things got worse — virtual objects appeared even more distorted than when using the SDK-default eye separation (and fixed lens separation) of 64mm. That’s when I decided to shelve the issue for later, and then never got around to un-shelving it.
At least, until we started buying additional Rifts for an undisclosed project we’re cooking, and most test users reported issues with scale and motion sickness. We can’t have that, so deeper exploration was in order. Since the SDK documentation wasn’t helpful, and the Googles didn’t turn up anything useful either, I figured I’d have to write a small “HMD simulator” to finally grok what’s really going on, and enable proper calibration when running Vrui on the Rift. Here’s the result:
As this video is rather long (my longest, actually, at 21:50m), here’s the executive summary: lens distortion affects the viewer/screen model in rather counter-intuitive ways, and while there is a simple approximation to get things more or less right even for viewers with IPDs other than 64mm, doing it properly would require precise knowledge of the Rift’s lenses, i.e., their geometric shape and the index of refraction of their material. Oh, and it would require eye tracking as well, but I knew that going in. The good news is that I now know how to create the proper approximating calibration, and that doing it on a per-user basis is straightforward. Turns out the viewer/screen model works even with lenses involved.
Update: Following a suggestion by reader TiagoTiago, see this follow-up post for an improved approximation to correct 3D rendering under lack of eye tracking.
For future reference, let me set a few things straight that really threw me off when trying to parse the Oculus SDK documentation. The main document (Oculus VR SDK Overview, version 0.2.5) is almost obfuscating in its description, and if it had done a better job, it would have saved me a lot of time. So let’s complain a little.
At the top of page 23 (section 5.5.1), the doc says that “unlike stereo TVs, rendering inside of the Rift does not require off-axis or asymmetric projection,” which is then followed by two pages deriving — guess what — an off-axis or asymmetric projection matrix. Yeah. Multiplying an on-axis projection matrix (P in the doc) from the left with a translation matrix (H in the doc) results in an off-axis projection matrix. If you don’t believe me, calculate matrix P’ = HP yourself, invert it, and multiply the eight corners of the normalized device coordinate view volume, (±1, ±1, ±1, 1), with that matrix. The eight transformed vertices will indeed form a skewed truncated pyramid in eye space. What I don’t understand is why the doc makes a big deal out of the projection’s on- or off-axisness in the first place.
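For readers who would rather run the check than do it by hand, here is a small pure-Python sketch of the experiment described above. It assumes P has the usual on-axis form with a bottom row of (0, 0, -1, 0) and that H is a plain translation by h along x; the value of `H_SHIFT` and the near/far distances are arbitrary illustrative numbers, not values taken from the SDK.

```python
from itertools import product

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Product of a 4x4 matrix and a 4-vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is invertible)."""
    n = 4
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def on_axis_projection(eye_to_screen, h_size, v_size, z_near, z_far):
    """On-axis projection for one screen half (assumed form of P)."""
    return [
        [4.0 * eye_to_screen / h_size, 0.0, 0.0, 0.0],
        [0.0, 2.0 * eye_to_screen / v_size, 0.0, 0.0],
        [0.0, 0.0, z_far / (z_near - z_far), z_far * z_near / (z_near - z_far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def x_translation(h):
    """Translation by h along x (assumed form of H for one eye)."""
    return [[1.0, 0.0, 0.0, h], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def eye_space_corners(m):
    """Map the corners (+-1, +-1, +-1, 1) back to eye space through m^-1."""
    inv = invert(m)
    corners = []
    for x, y, z in product((-1.0, 1.0), repeat=3):
        cx, cy, cz, cw = mat_vec(inv, [x, y, z, 1.0])
        corners.append((cx / cw, cy / cw, cz / cw))
    return corners

def horizontal_tangents(m):
    """Smallest and largest x/(-z) over the eight eye-space corners."""
    tans = [x / -z for x, _, z in eye_space_corners(m)]
    return min(tans), max(tans)

H_SHIFT = 0.15  # illustrative only
P = on_axis_projection(49.8, 149.76, 93.6, 10.0, 10000.0)
P_OFF = mat_mul(x_translation(H_SHIFT), P)

print("P  :", horizontal_tangents(P))      # left and right extents mirror each other
print("HP :", horizontal_tangents(P_OFF))  # left and right extents differ
```

With P alone the left and right extents are equal and opposite; with HP they are not, which is exactly what off-axis means.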
Edit: I just found out I can directly embed code into my posts! OMG!
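While I’m talking about matrix P, the doc derives it through a field-of-view angle, following the convention used by the canonical camera model. First the doc calculates $\phi_\mathrm{fov} = 2 \arctan\left(\frac{\mathrm{VScreenSize}}{2 \cdot \mathrm{EyeToScreenDistance}}\right)$, but in the derivation of P, $\phi_\mathrm{fov}$ is only used in the form of $\tan(\phi_\mathrm{fov}/2)$, which conveniently cancels out to $\frac{\mathrm{VScreenSize}}{2 \cdot \mathrm{EyeToScreenDistance}}$.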
P also uses a, the screens’ aspect ratio, which the doc calculates as $a = \frac{\mathrm{HResolution}}{2 \cdot \mathrm{VResolution}}$, which is conceptually wrong. Aspect ratio should be calculated as $a = \frac{\mathrm{HScreenSize}}{2 \cdot \mathrm{VScreenSize}}$, i.e., in physical units like all other parameters. This makes sense, because HScreenSize/2 and VScreenSize are precisely the width and height of the screen half to which the projection matrix applies. What’s the difference, you ask? The result is the same, so much is true. But it’s only the same because the Rift’s LCD uses square pixels. Think about it: if the Rift’s screen had twice the horizontal resolution, but the same physical size, i.e., 1:2 rectangular pixels, the aspect ratio between horizontal and vertical field-of-view would still be 0.8, and not 1.6. I know LCDs typically have square pixels, but see below why having done it right would have simplified things.
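The thought experiment is easy to put into numbers. The sketch below assumes the Rift dev kit's 1280x800 panel resolution (not stated above) alongside the physical screen size quoted earlier; the function names are invented for this example.

```python
def aspect_from_pixels(h_res, v_res):
    """Aspect ratio of one screen half, computed from pixel counts."""
    return h_res / (2.0 * v_res)

def aspect_from_size(h_size_mm, v_size_mm):
    """Aspect ratio of one screen half, computed from physical size."""
    return h_size_mm / (2.0 * v_size_mm)

H_SIZE_MM, V_SIZE_MM = 149.76, 93.6  # physical screen size quoted earlier

# Square pixels: both approaches agree.
print(aspect_from_pixels(1280, 800), aspect_from_size(H_SIZE_MM, V_SIZE_MM))

# Hypothetical panel with doubled horizontal resolution, same physical size:
# the pixel-based value doubles, the physical one does not change.
print(aspect_from_pixels(2560, 800), aspect_from_size(H_SIZE_MM, V_SIZE_MM))
```

Only the physically derived value keeps describing the actual shape of the view frustum once pixels stop being square.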
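In toto, P is presented in the doc as
$$P = \begin{pmatrix} \frac{1}{a \cdot \tan(\phi_\mathrm{fov}/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\phi_\mathrm{fov}/2)} & 0 & 0 \\ 0 & 0 & \frac{z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}} & \frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
where $a = \frac{\mathrm{HResolution}}{2 \cdot \mathrm{VResolution}}$ and $\phi_\mathrm{fov} = 2 \arctan\left(\frac{\mathrm{VScreenSize}}{2 \cdot \mathrm{EyeToScreenDistance}}\right)$.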
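Had the doc calculated aspect ratio correctly, based on physical sizes (see above), and not insisted on using a field-of-view angle in the first place, it could have derived P as
$$P = \begin{pmatrix} \frac{4 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{HScreenSize}} & 0 & 0 & 0 \\ 0 & \frac{2 \cdot \mathrm{EyeToScreenDistance}}{\mathrm{VScreenSize}} & 0 & 0 \\ 0 & 0 & \frac{z_\mathrm{far}}{z_\mathrm{near} - z_\mathrm{far}} & \frac{z_\mathrm{far} \cdot z_\mathrm{near}}{z_\mathrm{near} - z_\mathrm{far}} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$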
Wouldn’t that have been easier? (Update: If the above doesn’t look like a normal OpenGL projection matrix to you, then that’s because it isn’t one. Please see my correction post on the matter.)
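As a sanity check that the detour through a field-of-view angle buys nothing, here is a short Python sketch that builds P both ways and compares the results. It assumes the doc's P has scale terms 1/(a·tan(fov/2)) and 1/tan(fov/2), with fov = 2·atan(VScreenSize / (2·EyeToScreenDistance)) and a pixel-based aspect ratio; the 1280x800 resolution and the near/far distances are my own illustrative inputs.

```python
import math

def depth_rows(z_near, z_far):
    """Third and fourth rows, shared by both variants (assumed form)."""
    return [
        [0.0, 0.0, z_far / (z_near - z_far), z_far * z_near / (z_near - z_far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def p_from_fov(h_res, v_res, v_size, eye_to_screen, z_near, z_far):
    """P via field-of-view angle and pixel-based aspect ratio."""
    a = h_res / (2.0 * v_res)
    fov = 2.0 * math.atan(v_size / (2.0 * eye_to_screen))
    t = math.tan(fov / 2.0)
    return [[1.0 / (a * t), 0.0, 0.0, 0.0],
            [0.0, 1.0 / t, 0.0, 0.0]] + depth_rows(z_near, z_far)

def p_from_sizes(h_size, v_size, eye_to_screen, z_near, z_far):
    """P directly from physical screen size and eye-to-screen distance."""
    return [[4.0 * eye_to_screen / h_size, 0.0, 0.0, 0.0],
            [0.0, 2.0 * eye_to_screen / v_size, 0.0, 0.0]] + depth_rows(z_near, z_far)

def max_abs_difference(a, b):
    """Largest element-wise difference between two 4x4 matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

P_FOV = p_from_fov(1280, 800, 93.6, 49.8, 10.0, 10000.0)
P_SIZE = p_from_sizes(149.76, 93.6, 49.8, 10.0, 10000.0)
print(max_abs_difference(P_FOV, P_SIZE))  # rounding noise only
```

The two matrices agree up to floating-point rounding, but only because the pixels are square; the size-based version never needs the angle or the resolution at all.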
To be honest, the above complaints are mere nitpicks. What really caused problems for me was the calculation of the translation (or, rather, skew) matrix H. Based on the Rift’s internal layout, and the rules of 3D perception (and the viewer/screen model), the displacement value h should have been based on the viewer’s eye distance, not the Rift’s lens distance (this is also where the doc waves its hands and refers to the special “properties of the collimated light”).
Since this setup of H deviates from physical reality, and lens distortion correction is a black box, this is where I gave up on first reading. After having run the simulation, it is now perfectly clear: the truth is that using lens distance for h is not supposed to make sense — it’s a performance optimization. What’s really happening here is that a component of lens distortion correction is bleeding into projection set-up. Looking through a lens from an off-axis position introduces a lateral shift, which should, in principle, be corrected by the post-rendering lens distortion correction shader. But since shift is a linear operation, it can be factored out and be put into the projection matrix, where it incurs no extra cost while saving an addition in the per-pixel shader. So using lens distance for h in matrix H is a composite of using eye distance during projection, and the difference of eye distance and lens distance during distortion correction. That’s all perfectly fine, but optimization has no place in explanation — or as the master said, “premature optimization is the root of all evil.”
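The factoring argument can be illustrated with a few lines of Python. This is only a sketch of the linear part: it models the lateral shift as a constant offset added to the projected x coordinate and ignores the actual (nonlinear) distortion polynomial. The shift values `h_eye` and `h_lens` and the test point are arbitrary numbers chosen for illustration.

```python
def make_projection(eye_to_screen, h_size, v_size, z_near, z_far):
    """On-axis projection for one screen half (assumed form of P)."""
    return [
        [4.0 * eye_to_screen / h_size, 0.0, 0.0, 0.0],
        [0.0, 2.0 * eye_to_screen / v_size, 0.0, 0.0],
        [0.0, 0.0, z_far / (z_near - z_far), z_far * z_near / (z_near - z_far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def with_shift(p, h):
    """Return H*P, where H translates by h along x (row 0 += h * row 3)."""
    out = [row[:] for row in p]
    out[0] = [a + h * b for a, b in zip(p[0], p[3])]
    return out

def project_x(m, point):
    """Project an eye-space point and return its normalized x coordinate."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return clip[0] / clip[3]

P = make_projection(49.8, 149.76, 93.6, 10.0, 10000.0)
h_eye, h_lens = 0.20, 0.15      # illustrative shifts only
point = (12.0, -7.0, -300.0)    # some point in front of the eye

# Variant 1: project with the eye-based shift, then apply the remaining
# (lens minus eye) shift as a separate per-pixel step.
x_two_step = project_x(with_shift(P, h_eye), point) + (h_lens - h_eye)

# Variant 2: fold everything into the projection by using the lens-based shift.
x_folded = project_x(with_shift(P, h_lens), point)

print(x_two_step, x_folded)
```

Both variants land on the same screen position, which is why the shift can be moved out of the shader and into the matrix at no cost.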
I don’t want to get into the derivation of lens distortion correction and field-of-view (section 5.5.4 in the doc), where $\phi_\mathrm{fov}$ makes an appearance again, besides saying that using physical sizes instead of field-of-view would have made this simpler as well.
So I guess what I’m really saying is that explaining the Rift’s projection setup using the viewer/screen model would have been considerably better than bending over backwards and framing it in terms of the canonical camera model. But I think that’s what they call fighting a lost battle.