Intrinsic camera calibration, as I explained in a previous post, calculates the projection parameters of a single Kinect camera. This is sufficient to reconstruct color-mapped 3D geometry in a precise physical coordinate system from a single Kinect device. Specifically, after intrinsic calibration, the Kinect reconstructs geometry in camera-fixed Cartesian space. This means that, looking along the Kinect’s viewing direction, the X axis points to the right, the Y axis points up, and the negative Z axis points along the viewing direction (see Figure 1). The measurement unit for this coordinate system is centimeters.
Extrinsic camera calibration, on the other hand, calculates a transformation from camera-fixed Cartesian space to an arbitrary other 3D coordinate system that is shared by multiple Kinect cameras, and a 3D graphics application. In other words, extrinsic calibration is required to use multiple Kinect cameras capturing the same space from different viewpoints, or to combine the 3D reconstructions of one or more Kinect cameras with other 3D geometry. In yet other words, extrinsic calibration calculates the position and orientation of any number of Kinect cameras in a shared space.
As an example, consider a setup where three Kinect cameras are set up in a triangle looking inwards, capturing a single 3D object (see Figure 2). Without extrinsic calibration, the partial reconstructions done by each Kinect, the so-called “facades,” won’t align properly because the Kinects don’t know their own positions and orientations in space. Only after extrinsic calibration can a 3D object captured by multiple Kinects be reconstructed properly.
Extrinsic calibration procedure
Update: the procedure described in the rest of this article is obsolete. I now have a much better procedure, and will release a new Kinect software package containing it as soon as possible, and create a tutorial video and article.
The actual procedure for extrinsic calibration is similar to the one for intrinsic calibration; in fact, it uses the same semi-transparent calibration target, and is performed using RawKinectViewer.
The first step, obviously, is to securely mount all Kinect cameras in the capture space. It is important that the cameras do not move at all after calibration; even the tiniest shift or, worse, rotation, will destroy the camera alignment. There is no recommended camera setup; it depends entirely on the number of cameras to be used, and the size and layout of the intended capture space.
The next step is to place the calibration target in the capture space such that its front side can be seen by all Kinect cameras that need to be aligned. There is currently no procedure for setup where there is no single location that can be observed by all cameras; while such setups are possible, the procedure becomes a lot more complicated, and is not yet implemented. For example, in a setup like the one in Figure 2, it will be necessary to place the calibration target horizontally, in the center of the capture space, below the heights of the cameras.
After placing the calibration target, it is necessary to define the shared coordinate system in which all cameras will work. In the most common setup, this coordinate system can be chosen completely arbitrary. The simplest approach is to align the coordinate system with the calibration target itself, by placing the origin in the target’s lower-left corner, and having the X and Y axes aligned with the target’s horizontal and vertical grid lines, respectively. In practice, the coordinate system is defined by entering the (x, y, z) locations of the calibration target’s interior grid corners into a text file, in left-to-right, bottom-to-top order. For example, my own calibration target has 7 x 5 grid tiles, meaning it has 6 x 4 interior grid corners. Each tile is 3.5″ x 3.5″, so the coordinate file for my target (which I arbitrarily call “TargetPoints.csv”) will be:
0,0,0 3.5,0,0 7,0,0 10.5,0,0 14,0,0 17.5,0,0 ... points for y=3.5 and y=7 omitted ... 0,10.5,0 3.5,10.5,0 7,10.5,0 10.5,10.5,0 14,10.5,0 17.5,10.5,0
In this case, because the point positions are measured in inches, the measurement unit of the resulting shared 3D space will be in inches, as well. If the points were measured in any other unit, say centimeters by replacing, e.g., (7,10.5,0) with (17.78,26.67,0), then the resulting space would use centimeters.
Another common situation is where the 3D video from the Kinects must line up with other 3D geometry, such as 3D tracking or motion capture results. In this case, the shared coordinate system is defined by the other 3D geometry, as in the coordinate space used by the motion capture system. There still needs to be a file containing the positions of the target’s interior grid corners, but now the positions are not assigned arbitrarily, but measured using the external coordinate space. For example, when the 3D video has to line up with 3D tracking results, one would touch each interior grid point with a tracking marker in turn, and record the reported 3D positions. This is left as an exercise for the reader.
In any way, after defining a shared 3D coordinate space by creating a file with the 3D positions of the target’s interior grid corners, each Kinect camera is mapped into that space in turn, by using RawKinectViewer and a grid drawing tool, just as explained in the Intrinsic Kinect Camera Calibration with Semi-transparent Grid video. Unlike in intrinsic calibration, the color image stream can be ignored; the grid only needs to be aligned in the depth image stream. After the grid is manually aligned — and special care has to be taken to ensure that the lower-left corner of the virtual grid coincides with the lower-left corner of the calibration target — an average depth frame is requested, and the grid stored as usual by pressing “2.” Then, instead of running an intrinsic calibration by pressing “4,” one unprojects the just-captured grid by pressing “5.” This will calculate and print the 3D positions of the interior grid corners of the virtual grid, based on the Kinect’s depth image. In other words, this will produce the same sequence of 3D grid positions that were created in the previous step, but now in the Kinect’s own camera-fixed coordinate space. The printed 3D positions have to be pasted into another coordinate file, ideally named by the serial number of the Kinect camera that gathered them. This step is repeated for all Kinect cameras, while being very careful not to move the calibration target. For example, here are the 3D positions for my calibration target, printed by RawKinectViewer. Because my Kinect has serial number B00364705077104B, I put them into a KinectPoints-B00364705077104B.csv file:
-17.202, -4.32324, -87.2923 -8.81969, -5.49674, -84.7768 -0.448412, -6.66869, -82.2645 7.91183, -7.83909, -79.7556 16.2611, -9.00795, -77.2499 24.5993, -10.1753, -74.7476 -16.8564, 4.14695, -84.8005 -8.45365, 2.96496, -82.2804 -0.0620604, 1.78454, -79.7636 8.31845, 0.605681, -77.2502 16.6879, -0.571623, -74.7401 25.0463, -1.74737, -72.2332 -16.509, 12.6589, -82.2964 -8.08581, 11.4684, -79.7718 0.326193, 10.2794, -77.2505 8.72706, 9.092, -74.7325 17.1168, 7.90619, -72.2179 25.4955, 6.72194, -69.7066 -16.1599, 21.2129, -79.7799 -7.71616, 20.0137, -77.2507 0.716363, 18.8162, -74.7249 9.13769, 17.6202, -72.2024 17.5478, 16.4258, -69.6832 25.9468, 15.233, -67.1674
Note that these measurements are in centimeters, because that’s the measurement unit of the Kinect’s camera-fixed coordinate space. That’s fine; a proper scaling factor is automatically calculated during the next calibration step.
The result of the previous step is a set of coordinate files, one for each Kinect camera, and one for the calibration target. The last step is to calculate the position, orientation, and scaling factor for each Kinect by finding the best possible alignment between each Kinect’s reconstructed position file, and the calibration target’s position file. The Kinect package contains a utility for that purpose, AlignGridPoints:
AlignPoints -OG KinectPoints-B00364705077104B.csv TargetPoints.csv
The -OG parameter instructs AlignPoints to calculate position, orientation, and a scaling factor for the Kinect all at once. Immediately after startup, the program will print the resulting calibration transformation, as in
Best transformation: translate (16.126455338041314, 11.7998499984038361, 29.0552340903434185) * rotate (-0.656249274504848379, 0.722081639485140925, 0.218940621243091993), 24.3572161500956561 * scale 0.395616359808937446
This is the actual output from running the two position files above. Parsing the result, the translate (…) component indicates that my Kinect was placed 29″ back from the calibration target, and 16″ to the right and 12″ above its lower-left corner. The rotate (…),… component indicates that its viewing direction was rotated by 24.3° relative to the target’s Z axis, and the 0.396 scale factor is almost exactly the conversion factor from the centimeters used by the Kinect to the calibration target’s inches.
This best transformation, or, more precisely, everything after “Best transformation:”, now needs to be copied into a file called ExtrinsicParameters-<serial number>.txt in the Kinect package’s configuration directory, which in the default installation is ~/Vrui-2.6/etc/Kinect-2.4, where <serial number> is the serial number of the Kinect camera that captured the points. In my case, the file name will be ExtrinsicParameter-B00364705077104B.txt.
After AlignPoints has been run on all 3D position files, and all ExtrinsicParameters-… files have been created, calibration is done. The next time KinectViewer is started for any of the calibrated cameras, it will automatically pick up the extrinsic calibration file and map the camera into the shared 3D space. If more than one camera are given on KinectViewer’s command line, their individual facade reconstructions should now line up as expected.
If this single-target calibration step does not lead to a good overall alignment, one can either redo extrinsic calibration until things improve, or use more than one target position for calibration. This is done by running calibration once, as described above, and then re-positioning the target. The target’s new interior grid positions are then appended to the target position file (meaning the file now has twice as many positions), and the reconstructed positions from RawKinectViewer are appended to their respective files, as well. Then AlignPoints is run as previously, but it will now use twice as many positions to calculate an alignment, which will result in better overall alignment if the target’s new position are correctly measured. In practice, this requires an external 3D tracking system to measure the actual target positions, or a very careful placement of the target.
A good example of two independent multi-Kinect capture setups, each extrinsically calibrated to its own 3D tracking system, can be seen in my Collaborative Visualization of Microbialites video (see Figure 3). In that video, two Kinect cameras are placed in our CAVE and calibrated with the CAVE’s 3D tracking system, and two more Kinects are placed in front of my low-cost 3D TV VR environment and are calibrated with that environment’s optical tracking system. Vrui‘s tele-collaboration infrastructure then maps both capture systems into the same shared 3D space to allow two users to jointly explore a 3D data set, even though they are not in the same place physically.