I finally managed to upload a pair of tutorial videos showing how to use the new grid-based intrinsic calibration procedure for the Kinect camera. The procedure made it into the Kinect package at least 1.5 years ago, but somehow I never found the time to explain it properly. Oh well. Here are the videos: “Intrinsic Kinect Camera Calibration with Semi-transparent Grid” and “Intrinsic Kinect Camera Calibration Check.”
Unlike the initial calibration method, which used a simple prop, this one requires a rather complex calibration target (on the upside, this one actually does calculate, and not just guesstimate, reprojection parameters, so it leads to physically accurate 3D reconstruction). To wit, it needs a semi-transparent checkerboard (see Figure 1). Checkerboards are standard camera calibration props, so that’s a no-brainer, but why semi-transparent? The answer is simple: A regular black-and-white checkerboard looks like a checkerboard to a color camera, but it looks like a simple rectangle to the Kinect’s depth camera, because that one is entirely colorblind. A semi-transparent checkerboard, on the other hand, will look like a checkerboard to both cameras. This enables a procedure where both cameras can be calibrated at the same time, and with respect to each other. The latter is very important, because without it, the resulting 3D reconstructions would have mismatches between 3D geometry and color texture.
Building a precise semi-transparent checkerboard is not simple. Here’s one approach that worked really well for me: First, print a large grid (with very thin grid lines) onto a large piece of paper, for example using a large-format printer or plotter. My target has 7 by 5 grid tiles (both need to be odd numbers), and each grid tile is 3.5″ x 3.5″. So the overall grid size is 24.5″ x 17.5″. Here is a PDF file to print this grid; many copy shops have large-format printers. The grid should not be smaller, because then it would be too small to reliably calibrate the Kinect at larger distances, and it should not be much larger, because then it would not fit into the Kinect’s field-of-view up close. My target more or less fills the Kinect’s field-of-view at the Kinect’s minimum viewing distance, but is large enough to be used up to about 2m away (which is the practical upper range on the Kinect’s depth perception, anyway).
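As a quick sanity check on these numbers, here is a minimal Python sketch that recomputes the grid size and estimates the distance at which the grid fills the camera's view. The tile size and tile counts come from the text above; the field-of-view angles are commonly quoted nominal figures for the first-generation Kinect and are an assumption here, not measured values.

```python
import math

TILE_IN = 3.5             # tile edge length in inches (from the text)
TILES_X, TILES_Y = 7, 5   # tile counts (from the text)
INCH_M = 0.0254           # meters per inch

# ASSUMPTION: nominal horizontal/vertical field of view in degrees.
FOV_H_DEG, FOV_V_DEG = 57.0, 43.0


def grid_size_in():
    """Overall printed grid size in inches (width, height)."""
    return TILES_X * TILE_IN, TILES_Y * TILE_IN


def fill_distance_m(extent_m, fov_deg):
    """Distance at which an object of the given extent spans the full field of view."""
    return extent_m / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


width_in, height_in = grid_size_in()
d_h = fill_distance_m(width_in * INCH_M, FOV_H_DEG)
d_v = fill_distance_m(height_in * INCH_M, FOV_V_DEG)

print(f"grid: {width_in} x {height_in} in")
# The grid fits entirely once the camera is at least this far away.
print(f"fills the view at roughly {max(d_h, d_v):.2f} m")
```

Under those assumed angles, the fill distance comes out at a bit over half a meter, which is in line with the grid filling the view near the Kinect's minimum viewing distance.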
Then, buy a sufficiently large piece of plate glass (about 35″ x 28″ to leave around 5″ of border around the grid), and glue the entire printed grid to the glass plate so that it is roughly centered.
Then, use a long metal ruler and a very sharp knife to cut along all grid lines, horizontally and vertically. Ensure that all grid tiles are cleanly separated from each other.
Finally, carefully peel off every other grid tile in a checkerboard pattern, making sure to leave all four corner tiles in place, and carefully remove any glue residue from the now-transparent grid tiles.
The result should be a very precise regular grid. I found that the alternative process, cutting out individual 3.5″ x 3.5″ grid tiles and glueing them to the glass individually, is not nearly as precise. The tiles won’t line up properly, and the resulting grid will not be exactly rectangular. It also takes a lot longer.
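To see which tiles stay and which come off, here is a small Python sketch of the checkerboard pattern. It assumes tiles are indexed by column and row starting at zero, with a tile kept when column plus row is even; the function names are my own for illustration. It also shows why both tile counts need to be odd: only then do all four corners end up as kept tiles.

```python
TILES_X, TILES_Y = 7, 5  # tile counts from the text


def keep_tile(col, row):
    """A tile stays on the glass when its column and row indices have the same parity."""
    return (col + row) % 2 == 0


def layout(tiles_x, tiles_y):
    """Rows of the target as strings: '#' = paper tile kept, '.' = tile peeled off."""
    return [
        "".join("#" if keep_tile(c, r) else "." for c in range(tiles_x))
        for r in range(tiles_y)
    ]


def corners_kept(tiles_x, tiles_y):
    """True if all four corner tiles are kept tiles."""
    corners = [(0, 0), (tiles_x - 1, 0), (0, tiles_y - 1), (tiles_x - 1, tiles_y - 1)]
    return all(keep_tile(c, r) for c, r in corners)


for line in layout(TILES_X, TILES_Y):
    print(line)

print("all corners kept:", corners_kept(TILES_X, TILES_Y))
```

With an even tile count in either direction, at least one corner would fall on a peeled tile, so the target would not be bounded by paper tiles on all sides.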
Then, once you have the calibration target, follow the instructions in the first video to capture a large enough number of calibration tie points. Update: As of version 2.8 of the Kinect package, there is a slight change in the procedure to bind the grid drawing tool, and this change is not reflected in Kinect 2.8’s README file due to a packaging oversight. Instead of using buttons “1”-“5” when binding the “Draw Grids” tool, use buttons “1”, “W”, “2”, “3”, “4”, and “5”, in that order. Afterwards, calibration proceeds as shown in the video.
I recommend taking tie points from at least four different distances, starting at the closest distance where the Kinect can reliably see the grid, and working up to the maximum working distance at which you intend to use the Kinect. For each of the distances, capture one tie point head on, and two tie points from increasing angles, if possible going up to an almost grazing angle (in practice, say around 60°). Take one view from the left, and one (with a different angle) from the right. This results in twelve calibration tie points (four distances from three angles each). The at-an-angle poses are very important to establish constraints on the depth conversion formula that converts raw depth values reported by the Kinect into real 3D distances.
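The capture schedule described above can be written down as a simple checklist. This is only a sketch: the near and far distances and the two oblique angles are placeholder values I picked for illustration (negative angles mean "from the left"), and evenly spaced distances are an assumption, not a requirement.

```python
def capture_plan(near_m, far_m, n_distances=4, left_deg=30.0, right_deg=60.0):
    """Return a list of (distance in m, view angle in degrees) tie-point poses.

    Each distance gets one head-on view, one view from the left, and one view
    from the right at a different angle.
    """
    step = (far_m - near_m) / (n_distances - 1)
    plan = []
    for i in range(n_distances):
        distance = round(near_m + i * step, 2)
        for angle in (0.0, -left_deg, right_deg):
            plan.append((distance, angle))
    return plan


# Hypothetical working range: 0.6 m up to 2.0 m.
plan = capture_plan(0.6, 2.0)
for distance, angle in plan:
    print(f"{distance:.2f} m at {angle:+.0f} deg")
print(len(plan), "tie points")
```

Adjust the range to the distances at which you actually intend to use the Kinect, and vary the oblique angles between distances if your setup allows it.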
Afterwards, check the new calibration by following the instructions in the second video. Visually, the color image and 3D geometry should line up well. Specifically, check the edges of the grid tiles for alignment. Then measure the reconstructed size of the target, and compare it to the known real size. If the difference is larger than acceptable (about 1mm for an up-close target is achievable), redo the calibration with slightly different poses. The Kinect’s depth images are very blotchy due to the interpolation method that converts the raw, scattered depth measurements into a continuous depth image, and this blotchiness makes it difficult to align the observed and virtual grids (and prohibits an automatic matching method in the first place). Practice makes perfect.
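The size comparison is simple arithmetic, sketched here in Python. The known size is the 24.5″ x 17.5″ grid converted to millimeters; the measured values in the example are made up, and the 1mm tolerance is the up-close figure mentioned above.

```python
INCH_MM = 25.4
KNOWN_MM = (24.5 * INCH_MM, 17.5 * INCH_MM)  # real grid size: about 622.3 x 444.5 mm


def size_errors_mm(measured_mm, known_mm=KNOWN_MM):
    """Signed difference (measured minus known) per dimension, in millimeters."""
    return tuple(m - k for m, k in zip(measured_mm, known_mm))


def calibration_ok(measured_mm, tolerance_mm=1.0):
    """True if every dimension is within the tolerance of the known size."""
    return all(abs(e) <= tolerance_mm for e in size_errors_mm(measured_mm))


# Hypothetical measurement taken from the 3D reconstruction.
measured = (622.8, 444.1)
print("errors (mm):", [round(e, 2) for e in size_errors_mm(measured)])
print("within tolerance:", calibration_ok(measured))
```

For targets measured farther away, a looser tolerance than 1mm is probably more realistic.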
One thing to keep in mind is that the current calibration procedure does not account for lens distortion. In the Kinect, lens distortion has three effects:
- Radial distortion in the color camera. This is lens distortion as it’s normally understood. Its effect is that the manually drawn virtual grids don’t exactly match the observed grid.
- Radial distortion in the depth camera. The depth camera is a virtual camera; it is the result of displacement matching on the IR projector and real IR camera, both of which have their own lens distortions. The depth camera is already rectified, but there is some subtle distortion left over.
- Depth distortion in the depth camera. The secondary effect of lens distortion in the real IR camera is radially increasing depth distortion. This manifests in the 3D reconstruction of a flat surface looking a little like a dinner plate.
The first effect has well-known remedies and calibration procedures, but the Kinect software doesn’t contain them yet. The second and third effects are specific to the Kinect, and there are no tried and true correction methods. I have an experimental depth distortion correction procedure, but it’s not fully integrated into all Kinect applications, and the depth radial distortion is so subtle that it is drowned out by the depth image’s blotchiness — it is measurable in the final 3D reconstruction, but I’m not sure how to measure it for calibration purposes.
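For reference, one common way to model the first effect is a polynomial radial distortion model with two coefficients. The sketch below is a generic illustration of that model, not something the Kinect package does; the coefficient values are invented, and coordinates are assumed to be normalized image coordinates centered on the optical axis.

```python
def distort(x, y, k1, k2):
    """Apply polynomial radial distortion to normalized image coordinates."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration (converges for mild distortion)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y


# Hypothetical coefficients, for illustration only.
K1, K2 = -0.1, 0.01
xd, yd = distort(0.3, 0.2, K1, K2)
xu, yu = undistort(xd, yd, K1, K2)
print("distorted:", (xd, yd))
print("recovered:", (xu, yu))
```

The distortion grows with the squared distance from the image center, which is consistent with the calibration being least reliable toward the edges of the frame.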
The bottom line is that intrinsic calibration is not yet reliable at the edges of the depth image, and therefore the grid target should be lined up so that it is roughly centered in the depth camera’s frame for each of the calibration tie points.