I have to make a confession: I’ve been playing with the Oculus Rift HMD for almost a year now, and have been supporting it in Vrui for a long time as well, but I haven’t really spent much time using it in earnest. I’m keenly aware of the importance of calibrating head-mounted displays, of course, and noticed right away that the scale of virtual objects seen through the Rift was way off for me, but I never got around to doing anything about it. Until now, that is.
The primary reason for my negligence was that I didn’t know enough about the internal details of the Rift to completely understand how to account for different viewer parameters, i.e., mainly the positions of the viewer’s eyes in front of the screens/lenses. Vrui uses the viewer/screen camera model, which makes it very easy to calibrate a wide range of VR display environments because calibration is based entirely on directly measurable parameters: the size and position of screens, and the position of the viewer’s eyes relative to those screens, in real-world measurement units.
The Rift’s firmware exposes the basic required parameters: screen size and position, and distance from the viewer’s pupils to the screen. Screen size is reported as 149.76mm x 93.6mm for both halves, and because there’s only one physical screen, the left half starts at -74.88mm along the X axis, and the right half at 0.0mm, when using a convenient HMD-centered coordinate system. Both screens start at -46.8mm along the Z axis (I like Z going up), and the viewer’s pupils are 49.8mm from the screen. The Rift SDK assumes that the viewer’s face is symmetrical, i.e., that the left pupil is at -ipd/2 and the right at +ipd/2 along X where ipd is the viewer’s inter-pupillary distance in mm, and both are at 0.0mm along Z (by the way, Vrui is more flexible than that and handles asymmetric conditions without extra effort).
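To make that geometry concrete, here is a minimal C++ sketch of how those firmware numbers would populate a viewer/screen description in the HMD-centered coordinate system used above (X to the right, Z up, millimeters). The struct and variable names are illustrative only, not Vrui's actual API.

```c++
#include <cstdio>

// Illustrative types only -- not Vrui's actual classes.
struct ScreenHalf {
    double x, z;          // lower-left corner in HMD coordinates (mm)
    double width, height; // physical size (mm)
};

struct Viewer {
    double ipd;          // inter-pupillary distance (mm)
    double eyeToScreen;  // distance from the pupils to the screen plane (mm)
};

int main() {
    // Values reported by the Rift's firmware, as quoted above:
    const double hScreenSize = 149.76, vScreenSize = 93.6, eyeToScreen = 49.8;

    // Left half starts at -74.88mm, right half at 0.0mm; both start at -46.8mm in Z:
    ScreenHalf left  = { -hScreenSize * 0.5, -vScreenSize * 0.5, hScreenSize * 0.5, vScreenSize };
    ScreenHalf right = {  0.0,               -vScreenSize * 0.5, hScreenSize * 0.5, vScreenSize };

    // The SDK's symmetric-face assumption: pupils at +/- ipd/2 along X, at 0.0mm along Z:
    Viewer viewer = { 64.0, eyeToScreen };
    std::printf("left eye at (%.1f, 0.0), right eye at (%.1f, 0.0), %.1fmm from screens [%.2f..%.2f] and [%.2f..%.2f]\n",
                -viewer.ipd * 0.5, viewer.ipd * 0.5, viewer.eyeToScreen,
                left.x, left.x + left.width, right.x, right.x + right.width);
    return 0;
}
```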
That would be all that’s needed to set up a perfectly calibrated display, if it weren’t for those darned lenses. The lenses’ image distortion, and their broader effects on the Rift’s viewing parameters, are treated as black magic by the Rift SDK and its documentation (quote: “Here, lens separation is used instead of the eye separation due to the properties of the collimated light” — sure, that explains it). The Rift’s firmware only reports the horizontal distance between the lenses’ centers and the coefficients for the post-rendering lens distortion correction shaders. It doesn’t report the lenses’ distance from the screen, or anything about the lenses’ optical properties.
So when I tried doing what I normally do and configured the eye positions according to my own IPD, things got worse — virtual objects appeared even more distorted than when using the SDK-default eye separation (and fixed lens separation) of 64mm. That’s when I decided to shelve the issue for later, and then never got around to un-shelving it.
At least, until we started buying additional Rifts for an undisclosed project we’re cooking, and most test users reported issues with scale and motion sickness. We can’t have that, so deeper exploration was in order. Since the SDK documentation wasn’t helpful, and the Googles didn’t turn up anything useful either, I figured I’d have to write a small “HMD simulator” to finally grok what’s really going on, and enable proper calibration when running Vrui on the Rift. Here’s the result:
As this video is rather long (my longest, actually, at 21:50m), here’s the executive summary: lens distortion affects the viewer/screen model in rather counter-intuitive ways, and while there is a simple approximation to get things more or less right even for viewers with IPDs other than 64mm, doing it properly would require precise knowledge of the Rift’s lenses, i.e., their geometric shape and the index of refraction of their material. Oh, and it would require eye tracking as well, but I knew that going in. The good news is that I now know how to create the proper approximating calibration, and that doing it on a per-user basis is straightforward. Turns out the viewer/screen model works even with lenses involved.
Update: Following a suggestion by reader TiagoTiago, see this follow-up post for an improved approximation to correct 3D rendering under lack of eye tracking.
For future reference, let me set a few things straight that really threw me off when trying to parse the Oculus SDK documentation. The main document (Oculus VR SDK Overview, version 0.2.5) is almost obfuscating in its description, and if it had done a better job, it would have saved me a lot of time. So let’s complain a little.
At the top of page 23 (section 5.5.1), the doc says that “unlike stereo TVs, rendering inside of the Rift does not require off-axis or asymmetric projection,” which is then followed by two pages deriving — guess what — an off-axis or asymmetric projection matrix. Yeah. Multiplying an on-axis projection matrix (P in the doc) from the left with a translation matrix (H in the doc) results in an off-axis projection matrix. If you don’t believe me, calculate matrix P’ = HP yourself, invert it, and multiply the eight corners of the normalized device coordinate view volume, (±1, ±1, ±1, 1), with that matrix. The eight transformed vertices will indeed form a skewed truncated pyramid in eye space. What I don’t understand is why the doc makes a big deal out of the projection’s on- or off-axisness in the first place.
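For the skeptical, here is the multiplication written out, using a generic on-axis projection matrix with scale factors $s_x$, $s_y$ and arbitrary depth-row entries $A$, $B$ (the exact depth convention doesn't matter for the argument):

$$H \cdot P = \begin{pmatrix} 1 & 0 & 0 & h \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} s_x & 0 & -h & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

Because $w_{clip} = -z_{eye}$, the new $-h$ entry in the first row adds a constant $+h$ to $x$ in normalized device coordinates, i.e., it shifts the near-plane window sideways — which is precisely an off-axis (asymmetric) frustum.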
Edit: I just found out I can directly embed code into my posts! OMG!
While I’m talking about matrix P, the doc derives it through a field-of-view angle, following the convention used by the canonical camera model. First the doc calculates $\phi_{fov} = 2 \cdot \arctan\left(\frac{VScreenSize/2}{EyeToScreenDistance}\right)$, but in the derivation of P, $\phi_{fov}$ is only used in the form of $\frac{1}{\tan(\phi_{fov}/2)}$, which conveniently cancels out to $\frac{EyeToScreenDistance}{VScreenSize/2}$.
P also uses a, the screens’ aspect ratio, which the doc calculates as $a = \frac{HResolution/2}{VResolution}$, which is conceptually wrong. Aspect ratio should be calculated as $a = \frac{HScreenSize/2}{VScreenSize}$, i.e., in physical units like all other parameters. This makes sense, because HScreenSize/2 and VScreenSize are precisely the width and height of the screen half to which the projection matrix applies. What’s the difference, you ask? The result is the same, so much is true. But it’s only the same because the Rift’s LCD uses square pixels. Think about it: if the Rift’s screen had twice the horizontal resolution, but the same physical size, i.e., 1:2 rectangular pixels, the aspect ratio between horizontal and vertical field-of-view would still be 0.8, and not 1.6. I know LCDs typically have square pixels, but see below why having done it right would have simplified things.
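As a quick arithmetic check (the DK1’s 1280×800 panel resolution is my assumption here, not stated in the doc excerpt above):

$$a = \frac{HResolution/2}{VResolution} = \frac{640}{800} = 0.8 = \frac{74.88\,\mathrm{mm}}{93.6\,\mathrm{mm}} = \frac{HScreenSize/2}{VScreenSize}$$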
In toto, P is presented in the doc as

$$P = \begin{pmatrix} \frac{f}{a} & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n - z_f} & \frac{z_f \cdot z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

where $f = \frac{1}{\tan(\phi_{fov}/2)}$ and $a = \frac{HResolution/2}{VResolution}$.
Had the doc calculated aspect ratio correctly, based on physical sizes (see above), and not insisted on using a field-of-view angle in the first place, it could have derived P as

$$P = \begin{pmatrix} \frac{EyeToScreenDistance}{HScreenSize/4} & 0 & 0 & 0 \\ 0 & \frac{EyeToScreenDistance}{VScreenSize/2} & 0 & 0 \\ 0 & 0 & \frac{z_f}{z_n - z_f} & \frac{z_f \cdot z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
Wouldn’t that have been easier? (Update: If the above doesn’t look like a normal OpenGL projection matrix to you, then that’s because it isn’t one. Please see my correction post on the matter.)
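To show what “derived directly from the viewer/screen model” would look like in practice, here is a hedged C++ sketch that builds an off-axis projection straight from a physical screen rectangle and an eye position. The function and variable names are mine, not the SDK’s or Vrui’s, and the matrix follows the standard OpenGL depth convention rather than the SDK’s.

```c++
#include <cstdio>

// Build a column-major, OpenGL-style off-axis frustum from a physical screen
// rectangle and an eye position, both given in the same HMD-centered
// coordinate system (mm). A sketch of the viewer/screen model, not real code
// from Vrui or the Oculus SDK.
void screenProjection(double screenLeft, double screenRight,    // X extents of screen half
                      double screenBottom, double screenTop,    // Z extents of screen half
                      double eyeX, double eyeZ, double eyeDist, // eye position and eye-screen distance
                      double zNear, double zFar,
                      double m[16]) {
    // Scale the screen edges, as seen from the eye, onto the near plane:
    double s = zNear / eyeDist;
    double l = (screenLeft - eyeX) * s, r = (screenRight - eyeX) * s;
    double b = (screenBottom - eyeZ) * s, t = (screenTop - eyeZ) * s;

    // Standard glFrustum matrix, column-major:
    for (int i = 0; i < 16; ++i) m[i] = 0.0;
    m[0]  = 2.0 * zNear / (r - l);
    m[5]  = 2.0 * zNear / (t - b);
    m[8]  = (r + l) / (r - l);
    m[9]  = (t + b) / (t - b);
    m[10] = -(zFar + zNear) / (zFar - zNear);
    m[11] = -1.0;
    m[14] = -2.0 * zFar * zNear / (zFar - zNear);
}

int main() {
    // Left screen half and a left eye at -ipd/2 = -32mm, using the numbers from the text:
    double m[16];
    screenProjection(-74.88, 0.0, -46.8, 46.8, -32.0, 0.0, 49.8, 1.0, 1000.0, m);
    std::printf("x-scale %.3f, x-skew %.3f\n", m[0], m[8]);
    return 0;
}
```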
To be honest, the above complaints are mere nitpicks. What really caused problems for me was the calculation of the translation (or, rather, skew) matrix H. Based on the Rift’s internal layout, and the rules of 3D perception (and the viewer/screen model), the displacement value h should have been based on the viewer’s eye distance, not the Rift’s lens distance (this is also where the doc waves its hands and refers to the special “properties of the collimated light”).
Since this setup of H deviates from physical reality, and lens distortion correction is a black box, this is where I gave up on first reading. After having run the simulation, it is now perfectly clear: the truth is that using lens distance for h is not supposed to make sense — it’s a performance optimization. What’s really happening here is that a component of lens distortion correction is bleeding into projection set-up. Looking through a lens from an off-axis position introduces a lateral shift, which should, in principle, be corrected by the post-rendering lens distortion correction shader. But since shift is a linear operation, it can be factored out and be put into the projection matrix, where it incurs no extra cost while saving an addition in the per-pixel shader. So using lens distance for h in matrix H is a composite of using eye distance during projection, and the difference of eye distance and lens distance during distortion correction. That’s all perfectly fine, but optimization has no place in explanation — or as the master said, “premature optimization is the root of all evil.”
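In formula form (using my own symbols, not the SDK’s): if the complete per-pixel correction in normalized device coordinates were

$$x' = x + \Delta + d(x),$$

with $\Delta$ the constant lateral shift caused by looking through the lens off-axis and $d(x)$ the non-linear radial term, then the constant part can be hoisted out of the shader and into the projection,

$$h_{\mathrm{lens}} = h_{\mathrm{eye}} + \Delta, \qquad x' = x + d(x),$$

which is exactly the composite of eye distance and eye/lens difference described above.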
I don’t want to get into the derivation of lens distortion correction and field-of-view (section 5.5.4 in the doc), where $\phi_{fov}$ makes an appearance again, besides saying that using physical sizes instead of field-of-view would have made this simpler as well.
So I guess what I’m really saying is that explaining the Rift’s projection setup using the viewer/screen model would have been considerably better than bending over backwards and framing it in terms of the canonical camera model. But I think that’s what they call fighting a lost battle.
(continuing the discussion from http://doc-ok.org/?p=730#comment-1248 )
What I mean is, since any ray entering the pupil perfectly straight is always gonna be passing thru the center of the eyeball, couldn’t you project the environment on an imaginary sphere sharing the same center as the eyeball, and calculate the distortions that way? I guess my original question could be rephrased as: What would change if you just consider all directions as being stared at straight on, instead of calculating the peripheral vision for each direction?
That clarifies it. It’s true, when the eye focuses on a 3D point, the ray from that point enters the pupil on-axis, and in that sense a 3D projection onto an imaginary sphere could simulate that. But the problem is that at any time, the eye can only point in a single direction, and at that time, all rays from other points on the imaginary sphere would enter the pupil at a different angle than prescribed by the model, and the instantaneous impression on the retina would be distorted. I would have to write a new simulator to properly explain it, but the world would still wobble like Jell-O as the eye flits around. While the effect would differ in detail from what’s happening now, it would be just as wrong.
Would the distortions be as noticeable if the correction targeted the straight-on rays for all eyeball positions, instead of correcting the peripheral vision for a static eyeball?
OK, I think what you’re describing is the equivalent of placing the virtual camera position at the center of each eyeball, and leaving everything else the same. I just tried that with the simulator I’m using in the video, and the effect would be as you describe: as the viewer foveates on any point in the 3D scene, that point is perceived at the proper position and with locally correct size, but everything else appears distorted, and the distortions change rapidly as the viewer’s eyes dart around the scene. It would be hard to do a real test in the Rift without first getting rid of all the other distortions, but I have a hunch that it would feel very weird indeed.
Weirder than seeing things distorted when not looking straight ahead?
Short answer: no.
See follow-up post.
And thank you.
Btw, is there such a thing as a digital camera that has the same size and optical properties as the human eye? Are they affordable?
If the answer to both is yes, I was thinking, perhaps you could make a rig with a few servos to turn the camera, then show some test patterns rendered in 3D in the Oculus Rift and automatically extract a model of its optics, or calibration parameters to counter-distort 3D scenes?
I don’t think it’s even necessary to have a camera with the same properties as the human eye. I was thinking about a calibration rig where you remove the lenses from the Rift and put them onto a printed checkerboard pattern, and place a normal camera or webcam at the position of the viewer’s left or right eye relative to the lens. Then you take a picture, and run a standard lens distortion estimation algorithm. This would result in a set of correction parameters for each point you sample, and you could pick the optimal set of parameters for a given viewer setup and apply it. If you had eye-tracking, it could even be dynamic and real-time.
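To make that concrete, here is a rough sketch of the estimation step using OpenCV (my choice of library, not something discussed here), assuming a handful of photos of the printed checkerboard taken through the lens from the eye position. The recovered distortion coefficients would lump together the camera’s own distortion and the Rift lens’s, so a pre-calibrated camera would be preferable.

```c++
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main(int argc, char* argv[]) {
    const cv::Size boardSize(9, 6);  // inner corners of the printed checkerboard (assumed)
    const float squareSize = 25.0f;  // mm per checkerboard square (assumed)

    std::vector<std::vector<cv::Point2f> > imagePoints;
    std::vector<std::vector<cv::Point3f> > objectPoints;
    cv::Size imageSize;

    // Each command-line argument is a photo of the checkerboard taken through the lens:
    for (int i = 1; i < argc; ++i) {
        cv::Mat img = cv::imread(argv[i], cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners)) continue;
        cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
        imagePoints.push_back(corners);

        // Known 3D positions of the corners on the (flat) checkerboard:
        std::vector<cv::Point3f> obj;
        for (int y = 0; y < boardSize.height; ++y)
            for (int x = 0; x < boardSize.width; ++x)
                obj.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.0f));
        objectPoints.push_back(obj);
    }

    // Standard lens distortion estimation:
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\n"
              << "Distortion coefficients: " << distCoeffs << "\n";
    return 0;
}
```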
Still, having a precise geometric model of the lens(es) and calculating correction parameters on-the-fly would be my preferred approach.
Wouldn’t it be better not to disassemble it, to make sure everything stays exactly in place, and to not ignore unknown details that might be present (like the distance between the pixels and the lens, the angle of the screen relative to the lens, etc.)?
And since it’s a screen, you could have the program test the validity of the calculations immediately and automatically by applying the estimated calibration and checking if everything is where it is supposed to be.
Fortunately, the Rift’s lenses are meant to be exchangeable. But you’re right, the removable lens holders don’t sit directly on the screen, so the precise lens-screen distance remains unknown. The screen/lens angle is 0 degrees by design.
It sounds like an HMD really needs a dynamic lens positioner that would scan the wearer’s face, determine the eye separation, the distance from the screen, and any other “yaw”-type variances, and auto-adjust the lenses into place. An “automatic” solution would be ideal, but there is no reason to think that one couldn’t write a manual calibration tool. However, I don’t think the current Rift lets you finely adjust the position of the lenses in order to perform such a calibration.
Just knowing the viewer’s pupil positions relative to the lenses in real time would be the major step. Being able to adjust the lens separation to exactly match the viewer’s eye separation would be the icing on the cake.