3D Camera Calibration for Mixed-Reality Recording

Mixed-reality recording, i.e., capturing a user inside of and interacting with a virtual 3D environment by embedding their real body into that virtual environment, has finally become the accepted method of demonstrating virtual reality applications through standard 2D video footage (see Figure 1 for a mixed-reality recording made in VR’s stone age). The fundamental method behind this recording technique is to create a virtual camera whose intrinsic parameters (focal length, lens distortion, …) and extrinsic parameters (position and orientation in space) exactly match those of the real camera used to film the user; to capture a virtual video stream from that virtual camera; and then to composite the virtual and real streams into a final video.

Figure 1: Ancient mixed-reality recording from inside a CAVE, captured directly on a standard video camera without any post-processing.

In most cases, including the video in Figure 1, the real camera employed to film the user is a regular 2D video camera, but there are certain benefits to using a 3D camera such as Microsoft’s Kinect for mixed-reality recording. Most importantly, a crucial step in compositing real and virtual video is to remove the user’s real surroundings from the recorded footage. This typically requires a carefully prepared environment, with a large blue or green screen and flat uniform lighting for subsequent chroma-keying. With a 3D camera, which records depth information for (almost) every image pixel, chroma-keying is unnecessary. One can simply remove all pixels whose depth exceeds a pre-defined threshold, or capture a background image of the environment sans user, and then, during recording, remove all pixels whose depth values match or exceed those of the background image. This allows recording in an unprepared environment with normal lighting, which is much easier for non-professional applications.
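The two depth-based removal strategies can be sketched in a few lines of Python. This is only an illustration, not the Kinect 3D Video package's actual code: the function name, the use of None for pixels without a depth measurement, and the small noise tolerance are all assumptions made for this example.

```python
def foreground_mask(depth, background=None, max_depth=None, tolerance=0.05):
    """Return a mask that is True for pixels to keep (the user).

    depth and background are row-major lists of depths in meters, with None
    marking pixels that have no depth measurement. tolerance is a
    hypothetical allowance for sensor noise; it is not from the article.
    """
    mask = []
    for row_index, depth_row in enumerate(depth):
        mask_row = []
        for col_index, d in enumerate(depth_row):
            keep = d is not None
            # Strategy 1: drop everything beyond a fixed depth threshold.
            if keep and max_depth is not None and d > max_depth:
                keep = False
            # Strategy 2: drop pixels at or behind the captured background.
            if keep and background is not None:
                bg = background[row_index][col_index]
                if bg is not None and d >= bg - tolerance:
                    keep = False
            mask_row.append(keep)
        mask.append(mask_row)
    return mask
```

Both strategies can be combined by passing a background image and a maximum depth at the same time. Pixels whose background depth is unknown are kept in this sketch, which is a design choice, not a requirement.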

A second benefit of using a 3D camera is that the recorded footage is not just a flat video stream, but a texture-mapped piece of dynamic 3D geometry at 1:1 scale. This enables embedding the real user into a virtual 3D scene as just another 3D object, directly inside the VR rendering engine used to display that scene. As a result, a user’s body can occlude and be occluded by virtual objects as if they were real, and, if the VR engine supports it, can even be affected by environmental effects such as lighting, transparency, or (volumetric) fog. Last but not least, all recording and compositing can be done inside the VR rendering pipeline as well, meaning post-processing is no longer required, and final video can even be captured and potentially uploaded in real time.

3D Camera Calibration

Regardless of whether a 2D or 3D camera is used, the main task in setting up mixed-reality recording is to calculate the real camera’s intrinsic and extrinsic parameters, so that the real and virtual cameras can be matched to each other with high accuracy. Mismatches between the cameras will lead to noticeable and distracting misalignments in the compositing result, which is especially problematic when trying to record fine-scale or precise interactions such as the user pressing a virtual button, interacting with a virtual piece of machinery, or holding a virtual object in their hand.
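To get a feeling for how sensitive the compositing result is to calibration errors, here is a minimal sketch that projects the same 3D point through a "real" and a slightly mismatched "virtual" camera. It assumes an ideal pinhole camera without lens distortion that looks along its positive z axis; all names and numbers are made up for illustration.

```python
def world_to_camera(point, rotation, position):
    """Transform a world-space point into camera space.

    rotation is a 3x3 matrix whose columns are the camera's axes in world
    space, position is the camera's position. The inverse of a rotation
    matrix is its transpose, which is what the index order below applies.
    """
    d = [point[i] - position[i] for i in range(3)]
    return [sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3)]


def project(point_cam, focal_px, center_px):
    """Ideal pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = point_cam
    return (focal_px * x / z + center_px[0], focal_px * y / z + center_px[1])


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A point 2 m in front of the camera, e.g., a virtual button:
button = (0.0, 0.0, 2.0)

# Assumed intrinsics: 1000 pixel focal length, 1920x1080 image center.
real = project(world_to_camera(button, IDENTITY, (0.0, 0.0, 0.0)), 1000.0, (960.0, 540.0))
# Same intrinsics, but the virtual camera is placed 1 cm off to the side:
virtual = project(world_to_camera(button, IDENTITY, (0.01, 0.0, 0.0)), 1000.0, (960.0, 540.0))

misalignment_px = abs(real[0] - virtual[0])
```

With these assumed numbers, a 1 cm position error already shifts the point by about five pixels, and the shift grows as the point gets closer to the camera.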

Figure 2: The calibration target, a CD or DVD covered in white paper on both sides (to avoid interfering with the HTC Vive’s Lighthouse tracking system). The cardboard sleeve on the back is to attach to a Vive controller (see Figure 3).

Fortunately, the Kinect 3D Video package has a built-in extrinsic calibration utility, ExtrinsicCalibrator, that makes this process straightforward, both for recordings where the camera is static in the real environment, and for those where it has a 6-DOF tracker attached to it and can be moved freely through the environment. The basic idea is to use a calibration target object of known shape, in this case a CD or DVD covered with white paper on both sides (see Figure 2), and to capture 3D images of the target at multiple positions and orientations inside the desired capture space. In the case of a static 3D camera, the target is attached to a 6-DOF tracker (see Figure 3), and in the case of a moving camera with a 6-DOF tracker attached to it (see Figure 4), the target is held in place at some arbitrary position inside the capture space (see Figure 5).

Figure 3: The calibration target attached to the back of an HTC Vive controller. The attachment sleeve needs to be tight enough for the disk not to move during calibration.

Figure 4: An HTC Vive controller attached to a 3D camera (a second-generation Microsoft Kinect) via three squares of double-sided tape and a rubber band. A stand-alone Vive Tracker would have been much easier to handle.

Figure 5: The calibration target held in place at an arbitrary position in the desired capture space. The target should be placed with good visibility from both tracking base stations.

With camera, controller, and calibration target prepared per the desired calibration mode, it is now time to start Vrui’s tracking driver (if it is not already running):

$ VRDeviceDaemon -rootSection Vive

and then the calibration utility, ExtrinsicCalibrator, installed with the Kinect 3D Video package:

$ ExtrinsicCalibrator

Without any command line arguments, ExtrinsicCalibrator connects to a tracking driver running on the same computer, and to the first 3D camera on the computer’s USB bus. On startup, ExtrinsicCalibrator shows a live 3D video feed from that camera, drawn from the camera’s point of view, and the calibration configuration dialog.

If the display window’s field-of-view is too narrow, not all of the camera image might be visible. In that case, it is easiest to dolly out of the camera view by holding down the left Shift key, and rolling the mouse wheel up until the entire image appears.

The configuration dialog has three panels: “Disk Center,” “Disk Extractor,” and “Extrinsic Calibration.” The first panel is mostly irrelevant for the new calibration algorithm. It was previously used to measure the calibration target’s position with respect to the controller to which it is attached. The new method performs this calculation as part of the main calibration procedure.

The second panel allows tweaking the disk extraction algorithm’s parameters. The default values should work well, but if the software has trouble following the target disk, its search parameters can be adjusted until it works reliably. The “Disk Radius” value is in centimeters, and corresponds to the actual radius of a standard CD or DVD. It should not be changed.

The third panel switches the calibration mode between static and moving camera, sets the maximum inlier distance (in meters) for the RANSAC optimization algorithm, and displays the RMS residual of the current alignment, also in meters. The default inlier distance of 1.5 cm (0.015 m) should work well in most cases. If the algorithm has a hard time converging to a result, the inlier distance should be increased slowly.
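To illustrate what the inlier distance and the RMS residual mean, the following sketch evaluates one candidate alignment against a set of tie points. It is not ExtrinsicCalibrator's implementation: the function names are hypothetical, and computing the residual over the inliers only is an assumption made for this example.

```python
import math


def apply_transform(rotation, translation, point):
    """Apply a rigid transformation (3x3 rotation plus translation)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def alignment_stats(rotation, translation, camera_points, tracker_points,
                    max_inlier_dist=0.015):
    """Return (inlier indices, RMS residual in meters) for one alignment.

    A tie point counts as an inlier if its transformed camera-space
    position lies within max_inlier_dist of its tracker-space position.
    """
    inliers = []
    squared_sum = 0.0
    for index, (cam, trk) in enumerate(zip(camera_points, tracker_points)):
        dist = math.dist(apply_transform(rotation, translation, cam), trk)
        if dist <= max_inlier_dist:
            inliers.append(index)
            squared_sum += dist * dist
    rms = math.sqrt(squared_sum / len(inliers)) if inliers else float("nan")
    return inliers, rms
```

A RANSAC-style loop would compute such statistics for many candidate alignments and keep the one with the most inliers. A larger inlier distance accepts more tie points and makes it easier to find an alignment at all, but it also lets noisier measurements influence the result.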

Static Camera Calibration

To calibrate a static camera, un-select the “Moving Camera” toggle in the “Extrinsic Calibration” configuration panel, and collect tie points as shown in the video in Figure 6:

Figure 6: Calibration procedure for static camera.

To save the current calibration, press the “Save Alignment” button in the “Extrinsic Calibration” configuration panel. This will take a few seconds, as the utility will run a full non-linear optimization with a very large number of RANSAC iterations. The calibration from camera space to the world space of the tracking system will be stored in the selected 3D camera’s configuration files, meaning that it will be picked up automatically the next time a 3D video stream is requested from that camera.

Moving Camera Calibration

To calibrate a moving tracked camera, select the “Moving Camera” toggle in the “Extrinsic Calibration” configuration panel, and collect tie points as shown in the video in Figure 7:

Figure 7: Calibration procedure for moving tracked camera.

To save the current calibration, press the “Save Alignment” button in the “Extrinsic Calibration” configuration panel. This will again take a few seconds. The calibration from camera space to the local coordinate system of the controller or tracker attached to the camera will be stored in the selected 3D camera’s configuration files, meaning that it will be picked up automatically the next time a 3D video stream is requested from that camera.
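The difference between the two calibration modes comes down to which transformations have to be chained together at recording time. For a static camera, the stored calibration maps camera space to world space directly. For a moving camera, the stored camera-to-tracker calibration has to be combined with the tracker's current pose on every frame. A minimal sketch, using hypothetical (rotation, translation) pairs to represent rigid transformations:

```python
def compose(outer, inner):
    """Return the transformation that applies inner first, then outer."""
    (r_out, t_out), (r_in, t_in) = outer, inner
    rotation = [
        [sum(r_out[i][k] * r_in[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
    translation = [
        sum(r_out[i][k] * t_in[k] for k in range(3)) + t_out[i]
        for i in range(3)
    ]
    return rotation, translation


def transform(xf, point):
    """Apply a (rotation, translation) pair to a 3D point."""
    rotation, translation = xf
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Made-up example values. The calibration is fixed after "Save Alignment";
# the tracker pose would change every frame as the camera moves.
camera_to_tracker = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                     [0.1, 0.0, 0.0])
tracker_to_world = ([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                    [1.0, 2.0, 3.0])

camera_to_world = compose(tracker_to_world, camera_to_tracker)
world_point = transform(camera_to_world, [1.0, 0.0, 0.0])
```

In the static case, the composition step simply drops out, because the saved calibration already plays the role of camera_to_world.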

Calibration Results

The video in Figure 8 shows a mixed-reality recording with a static camera, following the calibration procedure performed in the video in Figure 6.

Figure 8: Mixed-reality recording with static calibrated camera. The final video footage was created entirely in-engine, showing proper occlusion between the captured 3D geometry and the virtual 3D environment, and illumination of the captured geometry from light sources in the environment, specifically a subtle red glow cast from the lightsaber blade.
