My foray into experimental mixed-reality theater ended a week ago, so I moved all my Kinects back to my lab and rebuilt my 3D video capture space / tele-presence site, consisting of an Oculus Rift head-mounted display and three Kinects. Now that I have a new extrinsic calibration procedure to align multiple Kinects to each other (more on that soon), and have finally managed to get a really nice alignment, I figured it was time to record a short video showing what multi-camera 3D video looks like using current-generation technology (no, I don’t have any Kinects Mark II yet). See Figure 1 for a still from the video, and the whole thing after the jump.
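As a taste of what that extrinsic calibration has to produce (the procedure itself will have to wait for that upcoming post): each Kinect needs a rigid transform that maps its own camera space into a shared world frame. Here is a minimal sketch, not my actual code, of the textbook least-squares way to recover such a transform from a handful of matched 3D points, using Eigen’s umeyama() function; the point coordinates are made-up sample data.

```cpp
// Recover a per-Kinect extrinsic transform from matched 3D points:
// "cam" holds points measured in one Kinect's camera space, "world"
// holds the same points in the shared world frame. Eigen::umeyama()
// solves for the rigid motion (rotation + translation) that maps cam
// onto world in the least-squares sense.
#include <iostream>
#include <Eigen/Dense>

int main() {
  // Made-up correspondences; columns are 3D points. These numbers
  // satisfy world = R*cam + t for a 90-degree rotation about z and a
  // translation of (0.5, 0, 1), so the result is easy to check.
  Eigen::Matrix3Xd cam(3, 4), world(3, 4);
  cam   << 0.0, 1.0,  0.0,  1.0,
           0.0, 0.0,  1.0,  1.0,
           2.0, 2.0,  2.0,  3.0;
  world << 0.5, 0.5, -0.5, -0.5,
           0.0, 1.0,  0.0,  1.0,
           3.0, 3.0,  3.0,  4.0;

  // false = rigid transform only, no scale factor.
  Eigen::Matrix4d extrinsic = Eigen::umeyama(cam, world, false);
  std::cout << "Extrinsic matrix:\n" << extrinsic << "\n";
  return 0;
}
```

One such 4×4 matrix per Kinect is all it takes to fuse the three depth streams into a single scene, and it is exactly what gets invalidated when a camera is bumped.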
I decided to embed the live 3D video into a virtual 3D model of an office, to show a possible setting for remote collaboration / tele-presence (more on that coming soon), and to contrast the “raw” nature of the 3D video with the much more polished look of the 3D model. One of the things we’ve noticed since we started working with 3D video to create “holographic” avatars many years ago is that, even with low-res and low-quality 3D video, the resulting avatars just feel real, in some sense even more real than higher-quality motion-captured avatars. I believe it’s related to the uncanny valley principle, in that fuzzy 3D video that moves in a very lifelike fashion is more believable to the brain than high-quality avatars that don’t quite move right. But that’s a topic for another post.
Now, one of the things that always echoes right back when I bring up Kinect and VR is latency, or rather the claim that the Kinect’s latency is too high to be useful for VR. Well, we need to be careful here, and distinguish between the latency of the skeletal reconstruction algorithm used by the Xbox, which is deservedly knocked for being too slow, and the latency of the raw depth and color video received from the Kinect. In my applications I’m using the latter, and while I still haven’t managed to properly measure the end-to-end latency of the Kinect in “raw” mode, it appears to be much lower than that of skeletal reconstruction. Which makes sense, because skeletal reconstruction is a very involved process that runs on the general-purpose Xbox processors, whereas raw depth image creation runs on the Kinect itself, in dedicated custom silicon.
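For the curious, here is a minimal sketch of how one can at least watch the arrival cadence of raw depth frames on the host, using the open-source libfreenect driver (not the code my capture site actually runs, but enough to illustrate the point). It prints the time between consecutive frames, which should hover around 33 ms at the raw stream’s 30 Hz; true photon-to-photon latency still needs an external reference, such as filming a running timer through the whole pipeline.

```cpp
// Log host arrival times of raw Kinect depth frames via libfreenect.
// Build with: g++ latency.cpp -lfreenect
#include <chrono>
#include <cstdio>
#include <libfreenect.h>

static std::chrono::steady_clock::time_point g_last;
static bool g_haveLast = false;

// Called once per completed depth frame; we only look at the host clock.
static void depthCallback(freenect_device*, void*, uint32_t /*timestamp*/) {
  auto now = std::chrono::steady_clock::now();
  if (g_haveLast) {
    double ms = std::chrono::duration<double, std::milli>(now - g_last).count();
    std::printf("depth frame, %.1f ms since previous\n", ms);
  }
  g_last = now;
  g_haveLast = true;
}

int main() {
  freenect_context* ctx = nullptr;
  freenect_device* dev = nullptr;
  if (freenect_init(&ctx, nullptr) < 0) return 1;
  if (freenect_open_device(ctx, &dev, 0) < 0) { freenect_shutdown(ctx); return 1; }

  freenect_set_depth_mode(dev,
      freenect_find_depth_mode(FREENECT_RESOLUTION_MEDIUM, FREENECT_DEPTH_11BIT));
  freenect_set_depth_callback(dev, depthCallback);
  freenect_start_depth(dev);

  // Pump USB events; depth frames arrive via the callback above.
  while (freenect_process_events(ctx) >= 0) { /* run until error or Ctrl-C */ }

  freenect_stop_depth(dev);
  freenect_close_device(dev);
  freenect_shutdown(ctx);
  return 0;
}
```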
The bottom line is that, at least to me and everybody else who has tried my system, the latency of the 3D video is either not noticeable or not a problem. Even when waving my hands directly in front of my face, they feel completely like my hands, and whatever latency is there does not lead to a disconnect. Several other observations we have made, such as the thing about swiveling in a chair that I point out in the video, make me believe that 3D video, fuzziness and artifacts and all, creates a strong sense of presence in one’s own body.
A little more information about the capture site: It’s run by a single Linux computer (Intel Core i7 @ 3.5 GHz, 8 GB RAM, Nvidia Geforce GTX 770), which receives raw depth and color image streams from three Kinects-for-Xbox, is connected to an external tracking server for head position and wand position and orientation, and drives an Oculus Rift plus, on the desktop display, a secondary monoscopic view from the same viewpoint (exactly the view shown in the video).
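To make the “raw depth and color image streams” part concrete, here is a deliberately simplified sketch of the geometry involved, again not my actual code: each depth pixel is unprojected through a pinhole camera model into that Kinect’s camera space, and then moved into the shared world frame by the camera’s extrinsic matrix from calibration. The intrinsic values and the raw-to-meters formula below are rough numbers circulated by the early OpenKinect community, stand-ins for proper per-device calibration data.

```cpp
// Turn one raw 11-bit Kinect depth sample at pixel (u, v) into a 3D
// point in the shared world frame. Hypothetical helpers, for illustration.
#include <iostream>
#include <Eigen/Dense>

// Placeholder depth-camera intrinsics (focal lengths and principal
// point, in pixels); a real system uses per-device calibrated values.
constexpr double fx = 590.0, fy = 590.0, cx = 320.0, cy = 240.0;

// Rough raw-disparity-to-meters conversion from the early OpenKinect
// days; again a stand-in for proper depth calibration.
double rawDepthToMeters(int raw) {
  return 1.0 / (raw * -0.0030711016 + 3.3309495161);
}

Eigen::Vector3d depthPixelToWorld(int u, int v, int raw,
                                  const Eigen::Matrix4d& extrinsic) {
  double z = rawDepthToMeters(raw);
  Eigen::Vector4d camPoint((u - cx) * z / fx,  // pinhole unprojection
                           (v - cy) * z / fy,
                           z, 1.0);
  return (extrinsic * camPoint).head<3>();     // into the world frame
}

int main() {
  // Identity extrinsic as a stand-in for one Kinect's calibration result.
  Eigen::Matrix4d extrinsic = Eigen::Matrix4d::Identity();
  std::cout << depthPixelToWorld(320, 240, 700, extrinsic).transpose() << "\n";
  return 0;
}
```

Run over every pixel of all three depth streams, and textured with the matching color images, this is what yields the merged “holographic” view seen in the video.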
To provide some (limited) freedom of movement to the user, the Rift is connected to the computer via extra-long cables: a 15′ HDMI cable, an 11′ USB cable, and a 12′ power cord (I cut the Rift’s original cord in two and spliced in a 6′ extension). The user wears the Rift’s control box on a belt, and the long cables to the main computer are tied together via a spiral cable tunnel. That setup works quite well, but, as evident at 4:55 in the video, one had better be careful not to yank the cable, or it might knock a Kinect out of alignment. Oops. 🙂 Fortunately, with the new calibration procedure, fixing that only takes a few minutes.
Update: Caving in to overwhelming public demand — well, some guy asked for it on reddit — I uploaded an Oculus Rift-formatted version of the above video to my YouTube channel. It’s exactly the same video, but if you already own an Oculus Rift dev kit version 1, you can watch the video in full 3D by dragging the playback window to the display feeding your Rift, and — this is very important! — full-screening it in 1280×800 resolution (1280×800, not 1280×720 or 1920×1080). Since I move my head a lot, and your view will be hard-locked to mine no matter how you move your head, the video might make you dizzy, so be careful. There are more detailed instructions and warning labels in the video description on YouTube: