I spent the last couple of days at the first annual meeting of “The Higher Education Campus Alliance for Advanced Visualization” (THE CAAV), where folks managing or affiliated with advanced visualization centers such as KeckCAVES came together to share their experiences. During the talks, I saw slides showing Vrui’s Collaboration Infrastructure pop up here and there, and generally remote collaboration was a big topic of discussion. During breaks, I showed several people the following video on my smartphone (yes, I finally joined the 21st century), and afterwards realized that I had never written a post about this work, as most of it predates this blog. So here we go.
The above video, from early 2012, shows a collaborative visual data exploration and analysis session between three users: myself, wearing a maroon T-shirt, using a (then) low-cost VR environment consisting of a 72″ 3D TV with optical head tracking and an optically-tracked Wiimote as an input device; Dawn Sumner, one of KeckCAVES’ core faculty and a Mars Curiosity scientist, wearing an olive-ish sweater, using our CAVE; and Burak Yikilmaz, who is not seen in the video because he was operating the virtual camera through which the video was filmed (you can see him changing the camera’s viewpoint whenever the white crosshairs show up).
Both Dawn and I are captured as 3D pseudo-holographic avatars using two first-generation Kinect cameras each (remember, this is from 2012), and the resulting 3D video is streamed in real time between the CAVE in the Earth & Planetary Sciences building, the 3D TV in the Academic Surge building, and the desktop on which the video was recorded. The video has an audio track that was also recorded on the desktop, featuring Dawn and me discussing her data set. I am wearing a USB headset with an integrated microphone; Dawn is wearing a clip-on microphone and hears my voice through the four-speaker surround sound system in the CAVE.
The crucial feature of this collaboration framework, and what makes it the next best thing to direct in-person collaboration, is how users from different locations are mapped into the same shared virtual space. In the Vrui VR toolkit, each local VR environment defines its own so-called physical coordinate system, which is the system in which the local display screens are placed and/or head-mounted displays are tracked. Vrui applications, on the other hand, define a so-called navigational coordinate system, in which they construct their 3D geometry. In OpenGL lingo, the equivalent of navigational coordinates is model coordinates. The transformation from navigational coordinates to physical coordinates is the navigation transformation, and any viewpoint changes or locomotion of the user in the virtual space are expressed as changes to that single transformation (via translation, rotation, or uniform scaling).
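To make that concrete, here is a minimal sketch of the idea (the type and function names are mine for illustration, not Vrui’s actual API): a navigation transformation modeled as a uniform scale, a rotation, and a translation, mapping a point from navigational (model) coordinates into the physical coordinates in which a given environment’s screens and trackers live. Navigating, whether by walking, grabbing, or zooming, just replaces this one transformation.

```cpp
#include <cmath>
#include <cstdio>

/* Illustrative sketch, not Vrui's actual API: a navigation transformation
   built from a uniform scale, a rotation about the vertical (z) axis, and a
   translation, applied in that order. */
struct Vec3 { double x, y, z; };

struct NavTransform {
	double scale;     // uniform scale: navigational (model) units -> physical units
	double angle;     // rotation about the vertical axis, in radians
	Vec3 translation; // offset in physical coordinates

	/* Map a point from navigational (model) space into physical space: */
	Vec3 toPhysical(const Vec3& p) const {
		double c = std::cos(angle), s = std::sin(angle);
		return { scale*(c*p.x - s*p.y) + translation.x,
		         scale*(s*p.x + c*p.y) + translation.y,
		         scale*p.z + translation.z };
	}
};

int main() {
	/* One possible navigation state: the user has zoomed in 2x and pulled
	   the model 1m towards themselves: */
	NavTransform nav{2.0, 0.0, {0.0, -1.0, 0.0}};

	/* Where a model-space point ends up among the physical screens: */
	Vec3 phys = nav.toPhysical({0.5, 0.0, 1.0});
	std::printf("physical: (%g, %g, %g)\n", phys.x, phys.y, phys.z);
	return 0;
}
```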
In the collaboration framework, the physical coordinate systems of several VR environments are linked through the shared navigational coordinate system of the application that they are all running. The effect of this is that everything works exactly as expected. When a user navigates towards a virtual object in the shared environment, say by physically walking towards it, or by pulling the object towards them, the other users see the first user’s avatar move towards the shared object. If a user points at a virtual object — or another user’s avatar — with an input device or part of their body, the other users see exactly the same thing. Specifically, if one user looks at another user, the latter sees the former’s avatar looking at them. No matter how users navigate, sight lines are always maintained correctly. Users can high-five each other, or pretend to shake hands (there is no force feedback).
In addition, even movements that are not possible in reality work in an intuitive fashion. If a user picks up a shared model and turns it upside down, the other users see the first user’s avatar flip upside down — but pointing, sight lines, and touch still work. If one user zooms into a shared data set, that user’s avatar will appear to shrink from the other users’ perspectives — and sight lines and touch still work. It’s quite something to have a face-to-face conversation with a miniaturized human who is standing on the palm of one’s hand.
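This behavior falls straight out of the coordinate setup: each avatar’s pose travels from its own physical space into the shared navigational space, and from there into every other environment’s physical space. A minimal sketch of that composition, again with made-up names and with the rotation omitted for brevity, shows why a zoomed-in user’s avatar appears shrunk to everyone else:

```cpp
#include <cstdio>

/* Illustrative sketch, not Vrui's actual API: a navigation transformation
   reduced to a uniform scale plus a translation (the real one also carries
   a rotation). */
struct Vec3 { double x, y, z; };

struct NavTransform {
	double scale;     // navigational (model) units -> physical units
	Vec3 translation; // offset in physical coordinates

	Vec3 toPhysical(const Vec3& nav) const {
		return { scale*nav.x + translation.x,
		         scale*nav.y + translation.y,
		         scale*nav.z + translation.z };
	}
	Vec3 toNavigational(const Vec3& phys) const {
		return { (phys.x - translation.x)/scale,
		         (phys.y - translation.y)/scale,
		         (phys.z - translation.z)/scale };
	}
};

/* Map a position from a remote user's physical space into the local physical
   space by going through the shared navigational space: */
Vec3 remoteToLocal(const NavTransform& remoteNav, const NavTransform& localNav, const Vec3& remotePhys) {
	Vec3 nav = remoteNav.toNavigational(remotePhys); // remote physical -> shared navigational
	return localNav.toPhysical(nav);                 // shared navigational -> local physical
}

int main() {
	NavTransform localNav{1.0, {0.0, 0.0, 0.0}};   // local user views the model at 1:1
	NavTransform remoteNav{10.0, {0.0, 0.0, 0.0}}; // remote user has zoomed in 10x

	/* The remote user's head and feet, 1.8m apart in their physical space,
	   end up only 0.18m apart locally: the avatar appears one-tenth size. */
	Vec3 head = remoteToLocal(remoteNav, localNav, {0.0, 0.0, 1.8});
	Vec3 feet = remoteToLocal(remoteNav, localNav, {0.0, 0.0, 0.0});
	std::printf("avatar height seen locally: %g m\n", head.z - feet.z);
	return 0;
}
```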
The above video, having been filmed on a desktop system, does a pretty good job of showing how interactions work, but does not really show how each user sees the other users’ avatars. Do they appear projected onto some screen, or are they flat cardboard cutouts? To try and show this aspect, I made another video a while ago, which is simply me interacting with a previously recorded version of myself, displayed at 1:1 scale, in a CAVE:
This video is intercut between two perspectives: an outside fixed camera, and a hand-held tracked camera showing only the avatar (and a desk chair for scale).