Multi-Kinect camera calibration

Intrinsic camera calibration, as I explained in a previous post, calculates the projection parameters of a single Kinect camera. This is sufficient to reconstruct color-mapped 3D geometry in a precise physical coordinate system from a single Kinect device. Specifically, after intrinsic calibration, the Kinect reconstructs geometry in camera-fixed Cartesian space. This means that, looking along the Kinect’s viewing direction, the X axis points to the right, the Y axis points up, and the negative Z axis points along the viewing direction (see Figure 1). The measurement unit for this coordinate system is centimeters.

Figure 1: Kinect’s camera-relative coordinate system after intrinsic calibration. Looking along the viewing direction, the X axis points to the right, the Y axis points up, and the Z axis points against the viewing direction. The unit of measurement is centimeters.

Extrinsic camera calibration, on the other hand, calculates a transformation from camera-fixed Cartesian space to an arbitrary other 3D coordinate system that is shared by multiple Kinect cameras, and a 3D graphics application. In other words, extrinsic calibration is required to use multiple Kinect cameras capturing the same space from different viewpoints, or to combine the 3D reconstructions of one or more Kinect cameras with other 3D geometry. In yet other words, extrinsic calibration calculates the position and orientation of any number of Kinect cameras in a shared space.

As an example, consider a setup where three Kinect cameras are set up in a triangle looking inwards, capturing a single 3D object (see Figure 2). Without extrinsic calibration, the partial reconstructions done by each Kinect, the so-called “facades,” won’t align properly because the Kinects don’t know their own positions and orientations in space. Only after extrinsic calibration can a 3D object captured by multiple Kinects be reconstructed properly.

Figure 2: Misreconstruction when multiple Kinects capture the same 3D object without extrinsic calibration. Each Kinect will reconstruct the part of the object that it can see (the “facade”) in its own camera-fixed coordinate space, without regard for the Kinects’ actual positions and orientations (right).
After extrinsic calibration, the individual facades line up and reconstruct the real 3D object (left).

Extrinsic calibration procedure

Update: the procedure described in the rest of this article is obsolete. I now have a much better procedure, and will release a new Kinect software package containing it as soon as possible, and create a tutorial video and article.

The actual procedure for extrinsic calibration is similar to the one for intrinsic calibration; in fact, it uses the same semi-transparent calibration target, and is performed using RawKinectViewer.

The first step, obviously, is to securely mount all Kinect cameras in the capture space. It is important that the cameras do not move at all after calibration; even the tiniest shift or, worse, rotation, will destroy the camera alignment. There is no recommended camera setup; it depends entirely on the number of cameras to be used, and the size and layout of the intended capture space.

The next step is to place the calibration target in the capture space such that its front side can be seen by all Kinect cameras that need to be aligned. There is currently no procedure for setup where there is no single location that can be observed by all cameras; while such setups are possible, the procedure becomes a lot more complicated, and is not yet implemented. For example, in a setup like the one in Figure 2, it will be necessary to place the calibration target horizontally, in the center of the capture space, below the heights of the cameras.

After placing the calibration target, it is necessary to define the shared coordinate system in which all cameras will work. In the most common setup, this coordinate system can be chosen completely arbitrary. The simplest approach is to align the coordinate system with the calibration target itself, by placing the origin in the target’s lower-left corner, and having the X and Y axes aligned with the target’s horizontal and vertical grid lines, respectively. In practice, the coordinate system is defined by entering the (x, y, z) locations of the calibration target’s interior grid corners into a text file, in left-to-right, bottom-to-top order. For example, my own calibration target has 7 x 5 grid tiles, meaning it has 6 x 4 interior grid corners. Each tile is 3.5″ x 3.5″, so the coordinate file for my target (which I arbitrarily call “TargetPoints.csv”) will be:

... points for y=3.5 and y=7 omitted ...

In this case, because the point positions are measured in inches, the measurement unit of the resulting shared 3D space will be in inches, as well. If the points were measured in any other unit, say centimeters by replacing, e.g., (7,10.5,0) with (17.78,26.67,0), then the resulting space would use centimeters.

Another common situation is where the 3D video from the Kinects must line up with other 3D geometry, such as 3D tracking or motion capture results. In this case, the shared coordinate system is defined by the other 3D geometry, as in the coordinate space used by the motion capture system. There still needs to be a file containing the positions of the target’s interior grid corners, but now the positions are not assigned arbitrarily, but measured using the external coordinate space. For example, when the 3D video has to line up with 3D tracking results, one would touch each interior grid point with a tracking marker in turn, and record the reported 3D positions. This is left as an exercise for the reader.

In any way, after defining a shared 3D coordinate space by creating a file with the 3D positions of the target’s interior grid corners, each Kinect camera is mapped into that space in turn, by using RawKinectViewer and a grid drawing tool, just as explained in the Intrinsic Kinect Camera Calibration with Semi-transparent Grid video. Unlike in intrinsic calibration, the color image stream can be ignored; the grid only needs to be aligned in the depth image stream. After the grid is manually aligned — and special care has to be taken to ensure that the lower-left corner of the virtual grid coincides with the lower-left corner of the calibration target — an average depth frame is requested, and the grid stored as usual by pressing “2.” Then, instead of running an intrinsic calibration by pressing “4,” one unprojects the just-captured grid by pressing “5.” This will calculate and print the 3D positions of the interior grid corners of the virtual grid, based on the Kinect’s depth image. In other words, this will produce the same sequence of 3D grid positions that were created in the previous step, but now in the Kinect’s own camera-fixed coordinate space. The printed 3D positions have to be pasted into another coordinate file, ideally named by the serial number of the Kinect camera that gathered them. This step is repeated for all Kinect cameras, while being very careful not to move the calibration target. For example, here are the 3D positions for my calibration target, printed by RawKinectViewer. Because my Kinect has serial number B00364705077104B, I put them into a KinectPoints-B00364705077104B.csv file:

-17.202, -4.32324, -87.2923
-8.81969, -5.49674, -84.7768
-0.448412, -6.66869, -82.2645
7.91183, -7.83909, -79.7556
16.2611, -9.00795, -77.2499
24.5993, -10.1753, -74.7476
-16.8564, 4.14695, -84.8005
-8.45365, 2.96496, -82.2804
-0.0620604, 1.78454, -79.7636
8.31845, 0.605681, -77.2502
16.6879, -0.571623, -74.7401
25.0463, -1.74737, -72.2332
-16.509, 12.6589, -82.2964
-8.08581, 11.4684, -79.7718
0.326193, 10.2794, -77.2505
8.72706, 9.092, -74.7325
17.1168, 7.90619, -72.2179
25.4955, 6.72194, -69.7066
-16.1599, 21.2129, -79.7799
-7.71616, 20.0137, -77.2507
0.716363, 18.8162, -74.7249
9.13769, 17.6202, -72.2024
17.5478, 16.4258, -69.6832
25.9468, 15.233, -67.1674

Note that these measurements are in centimeters, because that’s the measurement unit of the Kinect’s camera-fixed coordinate space. That’s fine; a proper scaling factor is automatically calculated during the next calibration step.

The result of the previous step is a set of coordinate files, one for each Kinect camera, and one for the calibration target. The last step is to calculate the position, orientation, and scaling factor for each Kinect by finding the best possible alignment between each Kinect’s reconstructed position file, and the calibration target’s position file. The Kinect package contains a utility for that purpose, AlignGridPoints:

AlignPoints -OG KinectPoints-B00364705077104B.csv TargetPoints.csv

The -OG parameter instructs AlignPoints to calculate position, orientation, and a scaling factor for the Kinect all at once. Immediately after startup, the program will print the resulting calibration transformation, as in

Best transformation: translate (16.126455338041314, 11.7998499984038361, 29.0552340903434185)
* rotate (-0.656249274504848379, 0.722081639485140925, 0.218940621243091993), 24.3572161500956561
* scale 0.395616359808937446

This is the actual output from running the two position files above. Parsing the result, the translate (…) component indicates that my Kinect was placed 29″ back from the calibration target, and 16″ to the right and 12″ above its lower-left corner. The rotate (…),… component indicates that its viewing direction was rotated by 24.3° relative to the target’s Z axis, and the 0.396 scale factor is almost exactly the conversion factor from the centimeters used by the Kinect to the calibration target’s inches.

This best transformation, or, more precisely, everything after “Best transformation:”, now needs to be copied into a file called ExtrinsicParameters-<serial number>.txt in the Kinect package’s configuration directory, which in the default installation is ~/Vrui-2.6/etc/Kinect-2.4, where <serial number> is the serial number of the Kinect camera that captured the points. In my case, the file name will be ExtrinsicParameter-B00364705077104B.txt.

After AlignPoints has been run on all 3D position files, and all ExtrinsicParameters-… files have been created, calibration is done. The next time KinectViewer is started for any of the calibrated cameras, it will automatically pick up the extrinsic calibration file and map the camera into the shared 3D space. If more than one camera are given on KinectViewer’s command line, their individual facade reconstructions should now line up as expected.

If this single-target calibration step does not lead to a good overall alignment, one can either redo extrinsic calibration until things improve, or use more than one target position for calibration. This is done by running calibration once, as described above, and then re-positioning the target. The target’s new interior grid positions are then appended to the target position file (meaning the file now has twice as many positions), and the reconstructed positions from RawKinectViewer are appended to their respective files, as well. Then AlignPoints is run as previously, but it will now use twice as many positions to calculate an alignment, which will result in better overall alignment if the target’s new position are correctly measured. In practice, this requires an external 3D tracking system to measure the actual target positions, or a very careful placement of the target.

A good example of two independent multi-Kinect capture setups, each extrinsically calibrated to its own 3D tracking system, can be seen in my Collaborative Visualization of Microbialites video (see Figure 3). In that video, two Kinect cameras are placed in our CAVE and calibrated with the CAVE’s 3D tracking system, and two more Kinects are placed in front of my low-cost 3D TV VR environment and are calibrated with that environment’s optical tracking system. Vrui‘s tele-collaboration infrastructure then maps both capture systems into the same shared 3D space to allow two users to jointly explore a 3D data set, even though they are not in the same place physically.

Figure 3: Screenshot from collaborative visualization video, showing two users in different locations jointly exploring a shared 3D data set. In either location, the local user is captured by two extrinsically-calibrated Kinect cameras, and the remote user is displayed as a virtual hologram. Because all involved Kinect cameras are calibrated to the same shared 3D space, physically distant users can interact just as if they were in the same space.

24 thoughts on “Multi-Kinect camera calibration

  1. Pingback: Kinect factory calibration |

  2. Pingback: Behind the scenes: “Virtual Worlds Using Head-mounted Displays” |

  3. Which version of the kinect are you using? If it’s the original do you expect much of an improvement with the new one?

  4. Hi sir,

    I want to Measure size of a box using kinect sensor.
    for example:

    suppose i have CPU box and that box placed in front of kinect sensor…and As a output i want to find all measurement of that box like width ,height and length.

    is this possiable with help of kinect sensor?

    Thanks in advance,

    • In principle yes. Doing it automatically would require computer vision approaches that can detect the object in question, and extract geometric measurements. For boxes, that would be comparatively easy. A detail problem is that the Kinect, like any camera, can only measure what it sees. So to measure all three extents of a box, three adjacent faces must be visible to the camera.

  5. Kindly can you tell me where is exactly the position of coordinate frame of kinect;where is the origin.I need it to make a relation between this coordinate frame and the coordinate frame of other camera.Can you help me if you found the answer.
    I have another question,I wonder if the camera tilted,this tilting affect on its coordinate frame.
    Thank you in advance

  6. To calibrate two kinect cameras, we place an object viewable to both the cameras. Now, you wrote that the origin of the shared Euclidean space can be lower-left corner of the object.
    I think we should have that origin at one of the cameras. In this way, we would have only one transformation (ie from one camera to another). Both camera will know their positions with respect to each other.

    • It depends on what you want to do. If all you need is to align multiple Kinects to each other, then you can pick an arbitrary one as the “origin,” so to speak, and align all the others to the picked one’s coordinate system using the method you and I describe. The AlignPoints utility in the Kinect package does exactly that, by calculating a transformation from the second point set to the first point set.

      However, if you need to align all Kinects to some global coordinate space, for example such that the floor of your capture space is mapped to the (x, y) plane such that people’s captured 3D models are upright and properly positioned, then you need to align all cameras to that global coordinate space, which is what I’m describing here because that’s my primary application.

      If you only align the Kinects to each other, then you would have to rotate your captured objects after the fact, using trial&error in, say, blender, to get them to be upright.

  7. When I get the resulting calibration transformation using AlignPoints, I tried to use it to create my own 4×4 transformation matrix, but I’m not quite sure how to do this? especially with the rotation matrix. what does the three numbers mean in the rotation part? In your article it is ” rotate (-0.656249274504848379, 0.722081639485140925, 0.218940621243091993), 24.3572161500956561 ”

    Many thanks!

    • It’s described in more detail in Vrui’s HTML documentation, specifically the configuration file reference, but the first three numbers are the rotation axis written as a vector, and the fourth number is the right-handed rotation angle around the rotation axis in degrees.

  8. Dear Okreylos

    It is said
    “Update: the procedure described in the rest of this article is obsolete. I now have a much better procedure, and will release a new Kinect software package containing it as soon as possible, and create a tutorial video and article”
    Where can I find your new multi-kinect calibration method? Dis you post it already?

    • I haven’t released that yet, unfortunately. It relies on features that would have been in Vrui-3.1-003, but that release is delayed because I decided to do a major change and make it Vrui-3.2-001. I’ll keep everyone posted.

  9. Hi Oliver,

    Your latest update says:
    “Update: the procedure described in the rest of this article is obsolete. I now have a much better procedure, and will release a new Kinect software package containing it as soon as possible, and create a tutorial video and article”

    Do you have a plans or date of publishing the procedure and articles soon ?


      • Thank you for the reponse.
        Between I am running into a problem. The extrinsic calibration is working well with two kinect camera using the steps described above but when I connect the third kinect camera and run command ./bin/KinectUtil list, I get this error:

        “Kinect 0: Kinect for Xbox 1414 , USB address 003:017, device serial number A00361A01660112A
        Kinect 1: Kinect for Xbox 1414 , USB address 002:011, device serial number B00365207720044B
        Kinect 2: Kinect for Xbox 1414 , USB address 002:008, device serial number B00362606196044B
        Warning: USB bus 2 is shared by Kinect devices 1 and 1.
        This will not work unless bus 2 is a super-speed USB 3 bus.”

        And When I try to run ./bin/KinectViewer -c 0 -c 1 -c 2
        I get this:
        Caught exception USB::Device::claimInterface: Interface 0 does not exist

        Could you please help me with this problem?

        • Sometimes the Kinects flake out. You can usually get them back by resetting the ones that misbehave via KinectUtil reset <index> or KinectUtil reset all, but once in a while you might have to unplug/replug one.

          The larger issue is bus contention. You can only have one Kinect per USB 2 bus, due to bandwidth limitations. As you have two on USB bus 002, you have to ensure that it’s a USB 3 bus (you can use lsusb to check for a “3.0 root hub”).