Boy, is my face red. I just uploaded two videos about intrinsic Kinect calibration to YouTube, and wrote two blog posts about intrinsic and extrinsic calibration, respectively, and now I find out that the factory calibration data I’ve always suspected was stored in the Kinect’s non-volatile RAM has actually been reverse-engineered. With the official Microsoft SDK out, that should definitely not have been a surprise. Oh, well, my excuse is that I’ve been focusing on other things lately.
So, how good is it? A bit too early to tell, because some bits and pieces are still not understood, but here’s what I know already. As I mentioned in the post on intrinsic calibration, there are several required pieces of calibration data:
1. 2D lens distortion correction for the color camera.
2. 2D lens distortion correction for the virtual depth camera.
3. Non-linear depth correction (caused by IR camera lens distortion) for the virtual depth camera.
4. Conversion formula from (depth-corrected) raw disparity values (what’s in the Kinect’s depth frames) to camera-space Z values.
5. Unprojection matrix for the virtual depth camera, to map depth pixels out into camera-aligned 3D space.
6. Projection matrix to map lens-corrected color pixels onto the unprojected depth image.
Of these components, number 4 was the easiest to find. The calibration data contain measurements of the Kinect’s camera layout, and those are all that’s needed to convert from raw disparity to Z value. I grabbed those data from my current Kinect, and they are similar to the values I derived using my own calibration procedure. There are significant differences, but at least I’m certain that I’m extracting and parsing the calibration parameters correctly (see below for some analysis).
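To make the disparity-to-Z step concrete, here is a minimal sketch. It assumes the generic triangulation form Z = a / (b - raw), which is a common way to express this kind of conversion and not necessarily the exact formula encoded in the factory data. The names `a` and `b` and all numeric values are made up for illustration; they are not parameters read from a real Kinect.

```python
def disparity_to_z(raw, a, b):
    """Convert a raw disparity value to a camera-space Z value.

    Assumes the generic triangulation form Z = a / (b - raw).
    a and b are hypothetical constants that would be derived from the
    camera layout measurements (baseline, focal length, reference plane).
    """
    denom = b - raw
    if denom <= 0:
        raise ValueError("raw disparity is at or beyond the valid range")
    return a / denom


# Illustrative constants only, not values from an actual device.
A, B = 34000.0, 1090.0
for raw in (400, 600, 800, 1000):
    print(raw, round(disparity_to_z(raw, A, B), 2))
```

Note how Z grows faster and faster as the raw value approaches `b`, which is why small disparity errors matter much more for distant objects.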
Number 5 is a mite fuzzier. Those same layout measurements also imply an unprojection matrix, under the assumption that the virtual depth camera’s pixel grid is exactly upright and square. Since that’s usually not the case, there might be some undiscovered parameters describing pixel skew and aspect ratio. But the unprojection matrix I created from the known values was very close to the one I created myself, minus a small skew component and a tiny deviation in aspect ratio.
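As a rough sketch of what such an unprojection matrix looks like under the "upright and square pixel grid" assumption: the 4x4 homogeneous matrix below maps (pixel x, pixel y, raw disparity, 1) to a 3D point. The focal length `f`, principal point `(cx, cy)`, and disparity constants `a` and `b` (with Z = a / (b - raw) assumed) are hypothetical names and values chosen for illustration, not the factory parameters.

```python
def make_unprojection(f, cx, cy, a, b):
    """Build a 4x4 unprojection matrix for an upright, square pixel grid.

    Hypothetical parameters: f is the focal length in pixels, (cx, cy) the
    principal point, and a, b the disparity constants in Z = a / (b - raw).
    """
    return [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -f / a, f * b / a],
    ]


def unproject(m, x, y, raw):
    """Apply matrix m to (x, y, raw, 1) and divide by the w component."""
    p = (x, y, raw, 1.0)
    hx, hy, hz, hw = (sum(row[i] * p[i] for i in range(4)) for row in m)
    return (hx / hw, hy / hw, hz / hw)


# Illustrative values only.
M = make_unprojection(f=500.0, cx=320.0, cy=240.0, a=1000.0, b=1100.0)
print(unproject(M, 820.0, 240.0, 600.0))
```

In this form, a pixel skew or a non-unit aspect ratio would show up as extra non-zero or non-unit entries in the first two rows, which is exactly the part that cannot be derived from layout measurements alone.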
There is no sign of number 3 anywhere in the decoded data. The Kinect’s depth camera has very obvious non-linear depth distortion, especially in the corners of the depth image, and correcting for that is a very simple procedure. In the calibration data, the correction would most probably be represented by two sets of coefficients for bivariate polynomials, but I didn’t see any evidence of that.
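For illustration, this is roughly what a polynomial-based depth correction could look like. It is only a sketch under my own assumption that the two coefficient sets would describe a per-pixel scale and a per-pixel offset applied to the raw disparity; the function names and the coefficient layout are hypothetical.

```python
def poly2d(coeffs, x, y):
    """Evaluate a bivariate polynomial given as {(i, j): c} for c * x**i * y**j."""
    return sum(c * x**i * y**j for (i, j), c in coeffs.items())


def correct_depth(raw, x, y, scale_coeffs, offset_coeffs):
    """Apply a per-pixel linear correction: scale(x, y) * raw + offset(x, y).

    Assumption: one polynomial gives the scale, the other the offset.
    x and y are pixel coordinates normalized to [-1, 1].
    """
    return poly2d(scale_coeffs, x, y) * raw + poly2d(offset_coeffs, x, y)


# Made-up coefficients: no correction at the center, growing toward the corners.
scale = {(0, 0): 1.0, (2, 0): 0.002, (0, 2): 0.002}
offset = {(0, 0): 0.0, (2, 2): -1.5}
print(correct_depth(700.0, 0.0, 0.0, scale, offset))  # image center
print(correct_depth(700.0, 1.0, 1.0, scale, offset))  # image corner
```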
When someone asked me about factory calibration in the comments on the calibration video, I guessed that the focus would be primarily on mapping the color image onto the depth image, and less on accurately mapping the depth image back out into 3D space. While the latter is the most important point for me, it’s not useful for the Kinect’s intended purpose, and that might be why it’s not part of factory calibration. Depth correction parameters might still be discovered, but until further notice I believe they’re not there.
Finally, numbers 1, 2, and 6 are handled, but in a strange way, by being all mashed together. The known calibration data contain coefficients for a third-degree bivariate polynomial that slightly perturbs depth pixel locations. On top of that, there is a horizontal pixel shift that depends on a depth pixel’s camera-space Z value. Taken together, these parameters can map depth pixels to color pixels non-linearly, accounting for 2D distortion in both cameras in one go. Essentially, plugging a depth pixel and its Z value into the formulas results in a color pixel; in other words, this is a mapping from depth image space into color image space (providing scattered depth measurements in the color image) rather than the more traditional texture mapping from color image space into depth image space. It makes sense for the Kinect’s intended purpose (finding the Z position of objects that have been identified by their color), but it’s backwards for texture-mapping 3D geometry.
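The two-stage mapping can be sketched as follows. This is not the decoded factory formula: I am assuming that the cubic polynomial yields a small additive offset per axis, and that the Z-dependent shift has the parallax-like form `shift_a + shift_b / z`. All names and coefficient values are hypothetical.

```python
def cubic_poly(c, x, y):
    """Evaluate a third-degree bivariate polynomial with 10 coefficients."""
    terms = (1.0, x, y, x * x, x * y, y * y,
             x ** 3, x * x * y, x * y * y, y ** 3)
    return sum(ci * t for ci, t in zip(c, terms))


def depth_to_color_pixel(x, y, z, x_coeffs, y_coeffs, shift_a, shift_b):
    """Map a depth pixel (x, y) with camera-space Z to a color pixel.

    Step 1: perturb the depth pixel location with the cubic polynomials.
    Step 2: apply a horizontal shift that depends only on Z
            (assumed form: shift_a + shift_b / z).
    """
    px = x + cubic_poly(x_coeffs, x, y)
    py = y + cubic_poly(y_coeffs, x, y)
    px += shift_a + shift_b / z
    return px, py


# Made-up coefficients: a tiny constant offset plus a slight x-dependent stretch.
xc = [0.5, 0.001, 0, 0, 0, 0, 0, 0, 0, 0]
yc = [-0.25, 0, 0.001, 0, 0, 0, 0, 0, 0, 0]
print(depth_to_color_pixel(320.0, 240.0, 150.0, xc, yc, 0.0, 300.0))
```

Because the shift in step 2 is purely horizontal and comes after the polynomial, nothing in this scheme can express a rotation between the two pixel grids.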
That said, the horizontal pixel shift operator assumes that the pixel grids of the color camera and the virtual depth camera are exactly aligned. Because polynomial distortion is applied to depth pixels before the Z-based shift is applied, there is no way to account for rotated pixel grids. That’s definitely not the greatest way of handling it.
Now’s the time to do some experiments with the factory calibration. Fortunately, I have a calibration target with very precisely known dimensions (the semi-transparent checkerboard). After adding a factory calibration parser to the Kinect code, and comparing both my own and the factory calibration on the target, here are the results. The target is exactly flat, 17.5″ wide, and 10.5″ tall. Using either calibration data, I measured all four sides of the target using KinectViewer.
I should mention that my own calibration is a quick&dirty one using only four tie points (it’s exactly the one I created in the calibration tutorial video; I forgot to back up the “real” calibration beforehand — d’oh!). Still, it is already significantly better than the factory one. Both calibrations are skewed towards the lower-right corner of the image (the right and bottom edges are oversized), but the overall RMS error of mine is 0.511%, and the factory’s is 2.190%, more than a factor of 4 worse. This, of course, is assuming that there is no additional depth-related “secret sauce” in the factory calibration data.
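For readers who want to run the same kind of comparison, here is a small sketch of one way to compute a relative RMS error from the four measured edge lengths. I am assuming the RMS is taken over the per-edge relative errors; the measured values below are placeholders, not my actual measurements.

```python
import math


def rms_relative_error_percent(measured, true):
    """RMS of the per-edge relative errors, expressed in percent."""
    rel = [(m - t) / t for m, t in zip(measured, true)]
    return 100.0 * math.sqrt(sum(r * r for r in rel) / len(rel))


# True target dimensions in inches: top, right, bottom, left edges.
true_edges = [17.5, 10.5, 17.5, 10.5]
# Placeholder measurements for illustration only.
measured_edges = [17.5, 10.6, 17.6, 10.5]

print(rms_relative_error_percent(measured_edges, true_edges))
```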
Is custom calibration worth the extra effort? That’s for everyone to decide for themselves, but after looking at these numbers, I don’t feel so bad about having spent the time to calibrate each of my Kinects. I should also mention that my calibration does not contain depth correction, because that wasn’t part of the tutorial video. My software already supports it, and it should give even better overall results.
Update: I re-did the quick&dirty calibration used in the table above with non-linear depth correction and 12 tie points (to see how far it can be pushed), and the resulting RMS error for the same target, over a range of distances from the Kinect, is 0.188%. That’s not too shabby; a 2m object would be measured with an error of ±3.76mm (with factory calibration, the error would be ±43.8mm).
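The millimeter figures follow directly from the RMS percentages. A tiny helper (the function name is just for illustration) makes it easy to redo the arithmetic for other object sizes:

```python
def expected_error_mm(length_mm, rms_percent):
    """Expected measurement error for an object of the given length."""
    return length_mm * rms_percent / 100.0


# A 2 m object with the RMS errors quoted in the text.
print(round(expected_error_mm(2000.0, 0.188), 2))  # custom calibration
print(round(expected_error_mm(2000.0, 2.190), 2))  # factory calibration
```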
I haven’t tested color calibration yet because I still have to convert the factory calibration parameters to my software’s way of doing things. But I expect that factory color calibration will be better than mine, because it includes non-linear lens correction, and because mapping color onto depth is the main intended application for the Kinect.
Update: I finished color calibration, and compared factory calibration to my own. I was surprised that factory calibration was slightly worse (see Figure 1). In the factory calibration, the depth image was overall too “low”, meaning that foreground objects had hanging fringes of background color at their bottoms. In my own calibration, color and geometry were very well aligned, even though my software does not do non-linear lens correction. Fortunately, the Kinect cameras’ lenses are quite good.
Yet Another Update: As of version 2.8, my Kinect package has a utility to download factory calibration data from the Kinect (KinectUtil getCalib <camera index>). However, there is one minor issue. My own calibration procedure uses centimeters as world-coordinate units (that’s not hard-coded in, but just how I specify the size of the calibration target on the command line), and the KinectViewer application assumes centimeters as world-space units (now that’s hard-coded in). As a result, when using factory calibration data, which evidently are not expressed in centimeters, the Kinect’s image won’t immediately show up centered in the window when starting KinectViewer. One has to zoom out a bit first, most easily by rolling the mouse wheel up about ten clicks, but then everything works as normal.