I am a research computer scientist at the University of California, Davis. My research areas are scientific visualization, particularly in immersive ("virtual reality") environments, human/computer interaction in immersive environments, and 3D computer graphics.
My primary work is software development, from architecture and design to implementation and coding. I am the primary software developer for the UC Davis W.M. Keck Center for Active Visualization in the Earth Sciences (KeckCAVES). Some of my released packages are Vrui (a VR development toolkit), CollaborationInfrastructure (a tele-collaboration plug-in for Vrui), Kinect (a driver suite to capture 3D video from Microsoft Kinect cameras), LiDAR Viewer (a visualization package for very large 3D point clouds), 3D Visualizer (a system for interactive visual analysis of 3D volumetric data), Nanotech Construction Kit (an interactive molecular design program), and SARndbox (an augmented reality sandbox).
I also dabble in VR hardware, in the sense that I take existing custom or commodity hardware components (3D TVs, head-mounted displays, projectors, tracking systems, Wiimotes, Kinect cameras, ...) and build fully integrated immersive environments out of them. This includes a fair share of driver development to support hardware that either doesn't have drivers, or whose vendor-supplied drivers are not up to par.
I’ve written at length (here, here, here, and here) about the challenges of properly supporting immersive displays such as CAVEs or HMDs such as the upcoming Oculus Rift, and the additional degrees of freedom introduced by 3D tracking.
I just found this interesting post by James Iliff, talking about the same general issue more from a game design than game implementation point of view.
Out of his three points, motion tracking, and the challenges posed by it, is the one most closely related to my own interests. The separation of viewing direction, aiming direction (as related to shooting games), and movement direction is something that falls naturally out of 3D tracking, and that needs to be implemented in VR applications or games at a fundamental level. Specifically, aiming with a tracked input device does not, in my opinion, work within the canonical architecture established by existing desktop or console shooter games (see the video below for an example).
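To make that separation concrete, here is a minimal C++ sketch (not Vrui's actual API; all names are made up for illustration): the view, aim, and movement directions are fed by three independent tracked or analog sources, instead of all collapsing into one mouse-driven camera direction.

```cpp
// Minimal sketch (not Vrui's actual API) of the decoupling that 3D tracking
// makes possible: the view direction comes from the head tracker, the aiming
// direction from the hand-held controller, and the movement direction from a
// separate analog input. All names here are illustrative.
struct Vec3 { float x, y, z; };
struct Pose { Vec3 position; Vec3 forward; };   // simplified 6-DOF pose

struct PlayerState {
    Pose head;        // head tracker: what the user sees
    Pose wand;        // tracked controller: where the user aims
    Vec3 moveInput;   // joystick/thumbstick: where the user moves
};

// Each subsystem reads only the source it needs; none of them are coupled,
// unlike a mouse-look shooter where all three collapse into one direction.
Vec3 viewDirection(const PlayerState& p) { return p.head.forward; }
Vec3 aimDirection (const PlayerState& p) { return p.wand.forward; }
Vec3 moveDirection(const PlayerState& p) { return p.moveInput; }
```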
My main concern with James’ post is the uncritical mention of the Razer Hydra controller. We are using those successfully ourselves (that’s a topic for another post), but it needs to be pointed out that we are using them differently than other tracked controllers. This is due to their lack of global precision: while the controllers are good at picking up relative motions (relative to their previous position, that is), they are not good at global positioning. What I mean is that the tracking coordinate system of the Hydra is non-linearly distorted, a very common effect with magnetic 3D trackers (also see Polhemus Fastrak or Ascension Flock of Birds for old-school examples). It is possible to correct for this non-linear distortion, but the problem we observed with the Hydra is that the distortion changes over relatively short time frames. What this means is that the Hydra is best not used as a 1:1 input device, where the position of the device in virtual space exactly corresponds to the position of the device in real space (see the video below for how that works and looks), but as an indirect device. Motions are still tracked more or less 1:1, but the device’s representation is offset from the physical device, and by a significant amount to prevent confusion. This has a direct impact on usability: instead of being able to use the physical device itself as an interaction cursor, embodying the “embodiment” principle (pun intended), the user has to work with an explicit virtual representation of the device instead. It still works (very well, in fact), but it is a step down in immersion and effectiveness from globally-tracked input devices, such as the optically tracked Wiimote used in our low-cost VR system design.
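In code, the indirect mapping boils down to something as simple as the following hypothetical sketch: relative motions pass through 1:1, but the device's virtual representation is displaced from the physical controller by a fixed offset (the offset value below is made up).

```cpp
// Hypothetical illustration of the "indirect" mapping described above:
// relative motions pass through 1:1, but the device's virtual representation
// is displaced from the physical controller by a fixed offset.
struct Vec3 { float x, y, z; };

Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Large enough that users don't confuse the virtual cursor with the physical
// device in their hand; the value is purely illustrative.
static const Vec3 kCursorOffset = {0.0f, 0.3f, 0.0f};   // e.g. 30 cm upwards

Vec3 virtualCursorPosition(const Vec3& trackedHydraPosition)
{
    return trackedHydraPosition + kCursorOffset;
}
```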
And just because it’s topical and I’m a really big fan of Descent (after all, it is the highest form of patriotism!), here’s that old chestnut again:
Note how the CAVE wand is used as a “virtual gun,” and how the virtual gunsights are attached directly to the physical controller itself, not to a virtual representation of the physical controller. As far as the user is concerned, the CAVE wand is the gun. (The slight offset between controller and target reticle is primarily due to problems when setting up a CAVE for filming). This globally-precise tracking comes courtesy of the high-end Intersense IS-900 tracking system used in our CAVE, but we achieve the same thing with a (comparatively) low-cost NaturalPoint OptiTrack optical tracking system. The Hydra is a really good input device if treated properly, but it’s not the same thing.
We are currently involved in an NSF-funded project to study the changes in global ocean flow patterns in response to past climate change, specifically the difference in flow patterns between the last glacial maximum (otherwise known as the “Ice Age”, ~25000 years ago) and the Holocene (otherwise known as “today”).
In layman’s terms, the basic idea is to use differences in the chemical composition, particularly the abundance of isotopes of carbon (13C) and oxygen (18O), of benthic core samples collected from the ocean floor all around the world to establish correlations between sampling sites, and from that derive a global flow model that best explains these correlations. (By the way, 13C is not the carbon isotope used in radiocarbon dating; that honor goes to 14C).
This is a multi-institution collaborative project. The core sample isotope ratios are collected and collated by Lorraine Lisiecki and her graduate students at UC Santa Barbara, and the mathematical method to reconstruct flow patterns based on those samples is developed by Jake Gebbie at Woods Hole Oceanographic Institution. Howard Spero at UC Davis is the overall principal investigator of the project, and UC Davis’ contribution is visualization and analysis software, building on the strengths of the KeckCAVES project. I’ve posted previously about our efforts to construct low-cost immersive display systems at our collaborators’ sites so that they can use the visualization software developed by us in its native habitat, and also collaborate with us and each other remotely in real-time using Vrui’s collaboration infrastructure.
So here is the first major piece of visualization software developed specifically for this project. It was developed by Rolf Westerteiger, a visiting PhD student from Germany, based on the Vrui VR toolkit. Here is Rolf himself, using his application in the CAVE:
PhD student Rolf Westerteiger using his immersive visualization application in the KeckCAVES CAVE.
This application reads a database of core sample compositions created by Lorraine Lisiecki, and a reconstructed 3D flow field created by Jake Gebbie, and puts both into a global three-dimensional context. The software shows a block model of the Earth’s global ocean floor (at the same resolution as the 3D flow field, and vertically exaggerated by a significant factor), and allows a user to interactively query and explore the 3D flow.
The primary flow visualization method is line integral convolution (LIC), which creates dense and intuitive visualizations of complex flows. As LIC works best when applied to 2D surfaces instead of 3D volumes, Rolf’s application is based on a set of interactively controllable surfaces (one sphere of constant depth, two cones of constant latitude, two semicircles of constant longitude) which slice through the implicitly-defined 3D LIC volume. To indicate flow direction, the LIC texture is animated by cycling through a phase offset, and color-coded by either flow velocity or water temperature.
The special thing about this LIC visualization is that the LIC textures are not pre-computed, but generated in real time using the GPU and a set of GLSL shaders. This allows for even more interactive exploration than shown in this first result; a user could specify arbitrary slicing surfaces using tracked 3D input devices, and see the LIC pattern displayed on those surfaces immediately. From our experience with the 3D Visualizer software, which is based on very similar principles, we believe that this will lead to a very powerful exploratory tool.
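For readers who haven't encountered LIC before, here is a minimal CPU-side sketch of the core algorithm; the real application evaluates the same idea per fragment in GLSL, and additionally animates the convolution kernel's phase offset. Grid size, step size, and kernel length below are arbitrary choices for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

const int W = 256, H = 256;

Vec2 sampleFlow(const std::vector<Vec2>& flow, float x, float y)
{
    int ix = std::min(std::max(int(x), 0), W - 1);   // nearest-neighbor lookup
    int iy = std::min(std::max(int(y), 0), H - 1);
    return flow[iy * W + ix];
}

// For every pixel, trace a short streamline forward and backward through the
// flow field and average the values of a random noise texture along it.
std::vector<float> lic(const std::vector<Vec2>& flow, const std::vector<float>& noise)
{
    const int kernelSteps = 20;    // streamline length in each direction
    const float stepSize = 0.5f;   // integration step in pixels
    std::vector<float> result(W * H);

    for (int py = 0; py < H; ++py)
        for (int px = 0; px < W; ++px) {
            float sum = noise[py * W + px];
            int count = 1;
            for (int dir = -1; dir <= 1; dir += 2) {   // backward, then forward
                float x = px + 0.5f, y = py + 0.5f;
                for (int s = 0; s < kernelSteps; ++s) {
                    Vec2 v = sampleFlow(flow, x, y);
                    float len = std::sqrt(v.x * v.x + v.y * v.y);
                    if (len < 1e-6f) break;
                    x += dir * stepSize * v.x / len;   // Euler step along the flow
                    y += dir * stepSize * v.y / len;
                    if (x < 0.0f || x >= W || y < 0.0f || y >= H) break;
                    sum += noise[int(y) * W + int(x)];
                    ++count;
                }
            }
            result[py * W + px] = sum / count;         // box-filter convolution
        }
    return result;
}
```

The result is an image in which pixels along the same streamline share similar values, which is exactly why the flow structure "pops out" visually.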
A secondary flow visualization method is tracer particles, which can be injected into the global ocean at arbitrary positions using a tracked 3D input device, and which leave behind a trail of their past positions. Together, these two methods provide rich insight into the structure of these reconstructed flows, and especially their evolution over geologic time.
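Here is a bare-bones sketch of that second method, assuming a made-up sampleVelocity() stand-in for the application's real flow-field lookup: each particle takes a forward-Euler step per frame and records its past positions as its trail.

```cpp
#include <cstddef>
#include <deque>

struct Vec3 { float x, y, z; };
Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(float s, const Vec3& v) { return {s * v.x, s * v.y, s * v.z}; }

// Stand-in flow field for illustration only: solid-body rotation about the z axis.
Vec3 sampleVelocity(const Vec3& p) { return {-p.y, p.x, 0.0f}; }

struct Tracer {
    Vec3 position;
    std::deque<Vec3> trail;   // past positions, drawn as the particle's tail
};

// Advance one particle by one time step and record its previous position.
void advect(Tracer& t, float dt, std::size_t maxTrailLength = 200)
{
    t.trail.push_back(t.position);
    if (t.trail.size() > maxTrailLength) t.trail.pop_front();
    t.position = t.position + dt * sampleVelocity(t.position);   // forward Euler
}
```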
A third visualization method is used to put the raw data that were used to create the flow models into context. A set of labels, one for each core sample in the database and each showing the relative abundances of the important isotopes, is mapped onto the virtual globe at the samples' proper positions to enable visual inspection of the flow reconstruction method.
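For illustration, this is roughly how a sample site given as latitude, longitude, and water depth ends up on the virtual globe; the constants (exaggeration factor, Earth radius) are placeholders, not the application's actual values.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Place a sample site on the virtual globe; depth below sea level is
// exaggerated so the sea floor's relief remains visible at global scale.
Vec3 globePosition(double latDeg, double lonDeg, double depthMeters,
                   double verticalExaggeration = 50.0,   // illustrative value
                   double earthRadiusMeters = 6.371e6)
{
    const double degToRad = 3.14159265358979323846 / 180.0;
    double lat = latDeg * degToRad;
    double lon = lonDeg * degToRad;
    double r = earthRadiusMeters - depthMeters * verticalExaggeration;
    return { r * std::cos(lat) * std::cos(lon),
             r * std::cos(lat) * std::sin(lon),
             r * std::sin(lat) };
}
```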
Unfortunately, Rolf had to return to Germany before we were able to film a video showing off all features of his visualization application, so I had to make a video with myself standing in for him:
The next development steps are to replace the ocean floor block model read from the flow file with a high-resolution bathymetry model (see below), and to integrate the visualization application with Vrui’s remote collaboration infrastructure such that it can be used by all collaborators for virtual joint data exploration sessions.
Global high-resolution bathymetry model at 75x vertical exaggeration. View is centered on Northern Atlantic.
I’ve already mentioned KeckCAVES’ involvement in NASA’s newest Mars mission, the Mars Science Laboratory, in a previous post, but now I have an update. Dawn Sumner, UC Davis’ member of the Curiosity science team, was interviewed last week for “Onward California,” which I guess is some new system-wide outreach and public relations effort to get the public’s mind off last fall’s “unpleasantries.” Just kidding UC, you know I love you.
Anyway… Dawn decided that the best way to talk about her work on Mars would be to do the interview in the CAVE, showing how our software, particularly Crusta Mars, was used during the planning stages of the mission, specifically landing site selection. I then suggested that it would be really nice to do part of the interview about the rover itself, using a life-size and high-resolution 3D model of the rover. So Dawn went to her contacts at the Jet Propulsion Laboratory, and managed to get us a very detailed 3D model, made of several million polygons and high-resolution textures, to load into the CAVE.
What someone posing with a life-size 3D model of the Mars Curiosity rover might look like.
As it so happens, I have a 3D mesh viewer that was able to load and render the model (which came in Alias|Wavefront OBJ format), albeit with some missing features, specifically specular highlights and bump mapping. The renderer is fast enough to draw the full, undecimated mesh at a frame rate sufficient for immersive display, around 30 frames per second.
The next problem, then, was how to film the beautiful rover model in the CAVE without making it look like garbage, another topic about which I’ve posted before. The film team, from the Department of the 4th Dimension, fortunately was on board, and filmed the interview in several segments, using hand-held and static camera setups.
We have pretty much figured out how to film hand-held video using a secondary head tracker attached to the camera, but static setups where the camera is outside the CAVE, and hence outside the tracking system’s range, always take a lot of trial and error to set up. For good video quality, one has to precisely measure the 3D position of the camera lens relative to the CAVE and then configure that in the CAVE software.
I used to do that by guesstimating the camera position, entering the values into the configuration file, and then using a Vrui calibration utility to visually judge the setup’s correctness. This involves looking at the image, figuring out why it’s wrong, mentally changing the camera position to correct for the wrongness, editing the configuration file, and repeating the whole process until it looks OK. Quite annoying, that, especially if there’s an entire film crew sitting in the room checking their watches and rolling their eyes.
After that filming session, I figured that Vrui could use a more interactive way of setting up CAVE filming, a user interface to set up and configure several different filming modes without having to leave a running application. So I added a “filming support” vislet, and to properly test it, filmed myself posing and playing with the Curiosity rover (MSL Design Courtesy NASA/JPL-Caltech):
Pay particular attention to the edges and corners of the CAVE, and how the image of the 3D model and the image backdrop seamlessly span the three visible CAVE screens (left, back, floor). That’s what a properly set up CAVE video is supposed to look like. Also note that I set up the right CAVE wall to be rendered for my own point of view, in stereo, so that I could properly interact with the 3D model and knew what I was pointing at. Without such a split-CAVE setup, it’s very hard to use the CAVE when in filming mode.
The filming support vislet supports head-tracked recording, static recording, split-CAVE recording (where some screens are rendered for the user, and some for the camera), setting up custom light sources, and a draggable calibration grid and input device markers to simplify calibrating a static camera setup when the camera is outside the tracking system’s range and cannot be measured directly.
All in all, it works quite well, and is a significant improvement over the previous setup method. It is now possible to change filming modes and camera setups from within a running application, without having to exit, edit configuration files, and restart.
I started working on low-cost VR, that is, cheap (at least compared to a CAVE or other high-end systems) professional-grade holographic display systems, about 4 1/2 years ago, after seeing one at the 2008 IEEE VR conference. It consisted of a first-generation DLP-based projection 3D TV and a NaturalPoint OptiTrack optical tracking system. I put together my own in Summer 2008, and have been building, or helping others build, more at a steadily increasing rate — one in my lab, one in our med school, one at UC Berkeley, one at UC Merced, one at UC Santa Barbara, a handful more at NASA labs all over the country, and probably some I don’t even know about. Here’s a video showing me using one to explore a CAT scan of a patient with a nasty head fracture:
Back then, I created a new subsite of my web site dedicated to low-cost VR, with a detailed shopping list and detailed installation and configuration instructions. However, I did not update either one for a long time after, leading to a very outdated shopping list and installation instructions that were increasingly divergent from state-of-the-art approaches.
But that has changed recently. As part of an NSF-funded project on paleoceanography, we promised to install two such systems at our partner institutions, the University of California, Santa Barbara, and the Woods Hole Oceanographic Institution. I installed the first one a couple of months ago. On top of that, I currently have two exchange students from the University of Georgia (this Georgia, not that Georgia) who came here to learn how to build these systems in order to build one for their department at home. To train them, I rebuilt my own system from scratch, let them take the lead on rebuilding the one at our medical school, and right now they’re on the east coast to install the new system at WHOI.
Observing “newbies” following my guide trying to build a system from scratch allowed me to significantly improve the instructions, to the point that I believe they’re now comprehensive and can be followed by first-time builders with some computing knowledge. I also updated the shopping list to again represent a currently-available system, with current prices.
So the bottom line is that I now feel comfortable to let people go wild with the low-cost VR subsite and build their own display systems. If no existing equipment (computers, 3D TVs, …) can be used, a very nice, large (65″ TV), and powerful system can be built for around $7000, depending on daily deals. While not exactly cheap-cheap, one has to keep in mind that this is a professional-grade system, fit for scientific and other serious uses.
I should mention that we have an even lower-cost design, replacing the $3500 optical tracking system with a $150 Razer Hydra controller, but there’s a noticeable difference in functionality between the two. I should also mention that there’s a competing design, the IQ Station, but I believe that ours is better (and I’m not biased at all!).
While I was in Santa Barbara recently to install a low-cost VR system, I also took the chance to visit the Allosphere. One of the folks behind the Allosphere is Tobias Höllerer, a computer science professor at UCSB who I’ve known for a number of years; on this visit, I also met JoAnn Kuchera-Morin, the director of the Allosphere Research Facility, and Matthew Wright, the media systems engineer.
Allosphere Hardware
The Allosphere is an audacious design for a VR environment: a sphere ten meters in diameter, completely illuminated by more than a dozen projectors. Visitors stand on a bridge crossing the sphere at the equator, five meters above ground. While I did take my camera, I forgot to take good pictures; Figure 1 is a pretty good impression of what the whole affair looks like.
Figure 1: What the Allosphere kinda looks like. Image taken from the Marvel Movies Wiki.
I got interested in remote collaboration because I hate traveling, so it’s somewhat funny that I’ll be traveling all over the place in the near future to install remote collaboration-capable immersive display systems. I guess I brought it upon myself.
The first stop in the grand low-cost VR world tour was UC Santa Barbara, where I just finished installing a system following our blueprint, with some updates to account for the inexorable march onwards of technology. Lorraine Lisiecki, in the UC Santa Barbara department of Earth Science, is one of the collaborators in our NSF grant on studying paleoclimate, and leading one of the two remote sites that will be equipped with one of our “holo-phone” prototypes, the other one being Woods Hole Oceanographic Institution on Cape Cod, where I’ll travel next.
Here’s a video showing what it’s like to use a “holo-phone:”
One of the biggest concerns going into this installation was how we would fit a 70″ 3D TV into a small faculty office. It’s not just that the TV itself is huge, but the tracking system cameras need to be mounted relatively far away from the TV so that their tracking volume covers the entire workspace in front of the TV. In the past, I have relied on high ceilings and mounted the cameras straight above the left and right edges of the TV and above the center, but that wasn’t an option in this case due to Lorraine’s office’s low 8.5 ft ceilings.
My solution was to push the left and right cameras further out, so that they look diagonally across the TV screen instead of straight down (see Figure 1). That turned out to cause more problems when designing “tracking antlers” for the tracked 3D glasses and the Wiimote, but it really helped increase the tracking volume — we managed to track the entire screen surface — and had the side benefit of slightly increasing tracking accuracy due to the larger stereo reconstruction baseline.
Figure 1: Position of the three OptiTrack cameras above the TV. Unlike in other installations, I pushed the cameras out to the sides to get a larger tracking volume in spite of the low ceiling.
Another, secondary, worry was the stereo quality of the TV. I had only seen this particular TV model in a store before, and due to the store environment and the low-depth, low-contrast 3D content typically shown in demos, it is impossible to judge how well a 3D TV will actually work once installed.
Stereo quality is primarily determined by the amount of cross-talk (or “ghosting”) between the left- and right-eye views. Noticeable cross-talk interferes with the brain’s ability to fuse stereo pairs into 3D perception. LCD-based active-stereo 3D TVs such as this Sharp are problematic, because LCD pixels are relatively slow to switch from a full-on to a full-off state. In a high-contrast stereo pair, a pixel might have to switch from full-black to full-white 120 times per second. While that doesn’t sound so bad, one has to consider that in order to avoid cross-talk, a pixel that is white in one view and black in the other must look exactly as black as its neighboring pixel that’s black in both images, and as white as its other neighboring pixel that’s white in both images. This means the pixel must have completed its switch during the short time in which the active stereo glasses do their own switch from opaque to transparent, or there will be a perceived brightness difference. The bottom line is that LCDs simply can’t do it, and the resulting stereo quality differs unpredictably between manufacturers, models, and potentially even individual units in the same model line. The work-arounds common in 3D movies, low contrast and limited eye separation, don’t work in immersive graphics because eye separation is directly determined by the user’s eye positions and not a free parameter, and high contrast is an important factor for effective scientific visualization.
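To put some rough numbers on the timing problem, here is a back-of-the-envelope calculation; the pixel response time and the exponential response model are assumptions for illustration, not measurements of this particular TV.

```cpp
// Back-of-the-envelope illustration: at 120 Hz, each eye image is on screen
// for about 8.3 ms. If an LCD pixel approaches its target brightness roughly
// exponentially with some response time constant, whatever fraction of the
// transition is still missing when the other eye's shutter opens shows up as
// cross-talk.
#include <cmath>
#include <cstdio>

int main()
{
    const double frameTime = 1.0 / 120.0;        // seconds per eye image
    const double responseTimeConstant = 0.004;   // assumed 4 ms pixel response
    // Fraction of a black-to-white transition still missing at shutter time:
    double residual = std::exp(-frameTime / responseTimeConstant);
    std::printf("Per-eye frame time: %.1f ms\n", frameTime * 1000.0);
    std::printf("Residual transition (perceived cross-talk): %.1f%%\n", residual * 100.0);
    return 0;
}
```

With these assumed numbers, more than a tenth of a full black-to-white transition would still be in flight when the other eye starts looking, which is very visible on high-contrast content.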
That said, the stereo quality of this particular TV turned out to be OK, but not great. It is about on the same level as other LCD 3D TVs with which I’ve worked, about on par with passive-stereo 3D TVs (where cross-talk is caused by imperfect polarization, not pixel switch time), and significantly worse than DLP-based 3D projectors or projection TVs, where pixel switch time is simply not an issue, and what little cross-talk there is comes from the 3D glasses. In the Nanotech Construction Kit, cross-talk manifested itself as a marble-like texture on the atomic building blocks, which looked strange but actually worked OK. In applications with point- or line-based rendering, white points or lines on a black background had borderline-worrisome cross-talk.
One unexpected issue with the 3D TV was that it refused to accept the 1080p frame-packed HDMI 1.4a video signal that I tried to send it. Since the TV doesn’t support Quincunx (“checkerboard”) interleaved stereo, I had to use a top-to-bottom stereo signal, which caused a loss in resolution, and anisotropy effects when using point or line primitives. Not ideal but workable, and a firmware upgrade might fix it.
Apart from that, there were the usual off-site installation issues, requiring several trips to the hardware or computer store to solve. There’s always a missing cable or power strip, or a small broken widget. What took the cake in this installation was the mouse that shipped with the Dell tracking PC: someone at Dell must have clipped the mouse cable with a set of pliers (see Figure 2); the broken mouse was wrapped in a cut plastic bag inside an undamaged cardboard box together with the keyboard.
Figure 2: Someone at Dell must have had a bad case of the Mondays. At least it was a clean cut.
In the end, the first day went into assembling the main PC and installing Linux and all the goodies on it, and into installing the tracking software on the Dell PC. Added difficulty: the tracking software insisted on calling home during installation, but getting a PC on the network at UCSB requires intervention from IT support staff, who had left for the day. So I needed to set up ad-hoc NAT on the main PC, which only had a single network interface card. After an unsuccessful trip to Best Buy to buy a second network card, I found a way to do NAT over a single interface (whew!).
The second day was spent mounting the TV and tracking cameras on Lorraine’s office wall, with help from friendly staff (thanks Tim!), and then aligning the tracking cameras and internally calibrating the system. The last part of that was calibration of the Vrui side of the system using a Leica TCR407 power Total Station, with help from Lorraine’s grad student Carlye. Just when I thought I was done for the day, the real trouble started, from unexpected problems calculating the calibration equations to random X server lock-ups and crashes caused by the Nvidia graphics driver. Fortunately I found the problem with the former, and a work-around for the latter.
The calibration issues were caused by the TV’s stupid default setting, where incoming native 1920×1080 HDMI images are zoomed (overscanned) and then resampled back down to the 1920×1080 display engine. Who ever thought this was a good idea? Anyway, the result was that we measured the zoomed 2D test pattern, but because 3D mode doesn’t zoom, we got a screen size mismatch.
So we had to re-do the screen part of the calibration on the third day; fortunately, I hadn’t moved or taken down the Total Station yet, so we didn’t have to re-do the other two parts. The rest of the third day was spent cleaning up by moving the computers to their temporary “final” position (see Figure 3), sorting out several odds and ends, installing support scripts and launcher icons, and, most importantly, training the new users (Lorraine and Carlye, see Figure 4).
Figure 3: Picture of the 3D system’s temporary “final” installation with dangling cables and the main and tracking computers jammed into a corner, using an old chair to hold the keyboard, mouse, and monitor. This will be cleaned up in the coming days after a trip to the furniture store to get a proper computer desk.
Figure 4: Lorraine (standing) and Carlye (wearing the tracked 3D glasses and holding the Wiimote input device) playing with the Nanotech Construction Kit during initial user training.
In the final analysis, it took three full days (8:15am – 10:15pm, 7:30am – 2:00am, 9:15am – 5:00pm) to build, install, and calibrate the system entirely from scratch and give the users enough initial training to get them going, without any site preparation except having the components already on site. There are a few outstanding issues, such as the Wiimote requiring a push of its reset button instead of 1+2 to connect, but those can be solved remotely in the future.
It turned out to be a very good system, with OK stereo quality, ample screen space, a very good tracking volume, and excellent tracking calibration (no more than 1mm discrepancy between tracking marker and virtual indicator over the entire space). The TrackingTools software has improved a lot since the early versions I use in my own system; after moving away from Euler angles to represent orientations, it no longer suffers from gimbal lock, and performance and accuracy have increased overall. The software is a lot more automated as well, requiring literally zero interaction after double-clicking the icon to get up and running.
Having run into a lot of new issues, I am now hoping that knowing about them ahead of time will make the next off-site installation go more smoothly and take less time. Here’s hoping.
I just got a piece of unsolicited email from Microsoft, daring me to test Bing against Google. Since when did Microsoft start spamming? Do they think that sending HTMLy mass emails full of remote content is going to garner them lots of sympathy? What the hell is wrong with you people?
This is what it looked like:
Spam email from Microsoft daring me to take the Bing challenge. As if. Sure, sending mass emails full of remote content is going to make you real popular.
I feel kinda dirty for clicking on “show remote content” just so that I could take this screenshot.
One of the big problems with advocating for virtual reality is that it can really only be experienced first-hand; I can make all the movies I want, and still won’t get the main point across.
So I figured it might be a good idea to host a “VR Open House” at UC Davis, and invite readers of my blog, or the redditors of the virtualreality subreddit where I sometimes hang out, to come and see what I mean by immersive 3D graphics (or “VR”) in person.
But before I do any planning, I need to find out if there’s any interest in the first place; after all, I don’t know how many people live around northern California, or how far people would be willing to travel for something like this (we’re an hour north-east of San Francisco, right off I-80).
So here’s my idea: if you read this post, and would attend such an open house, please leave a reply. If enough people show interest, I’ll schedule something, and then do another poll for people to sign up on a first-come first-serve basis, as I can’t handle a group larger than, say, a dozen. If you do leave a reply, maybe indicate what date ranges won’t work for you.
Some details on what we could look at / play with:
Earlier this year, I branched out into augmented reality (AR) to build an AR Sandbox:
Photo of AR Sandbox, with a central “volcano” and several surrounding lakes. The topographic color map and contour lines are updated in real time as the real sand surface is manipulated, and virtual water flows over the real sand surface realistically.
I am involved in an NSF-funded project on informal science education for lake ecosystems, and while my primary part in that project is creating visualization software to drive 3D displays for larger audiences, creating a hands-on exhibit combining a real sandbox with a 3D camera, a digital projector, and a powerful computer seemed like a good idea at the time. I didn’t invent this from whole cloth; the project got started when I saw a video of such a system done by a group of Czech students on YouTube. I only improved on that design by adding better filters, topographic contour lines, and a physically correct water flow simulation.
The idea is to have these AR sandboxes as more or less unsupervised hands-on exhibits in science museums, and allow visitors to informally learn about geographical, geological, and hydrological principles by playing with sand. The above-mentioned NSF project has three participating sites: the UC Davis Tahoe Environmental Research Center, the Lawrence Hall of Science, and the ECHO Lake Aquarium and Science Center. The plan is to take the current prototype sandbox, turn it into a more robust, museum-worthy exhibit (with help from the exhibit designers at the San Francisco Exploratorium), and install one sandbox each at the three sites.
But since I published the video shown above on YouTube, where it went viral and gathered around 1.5 million views, there has been a lot of interest from other museums, colleges, high schools, and private enthusiasts to build their own versions of the AR sandbox using our software. Fortunately, the software itself is freely available and runs under Linux and Mac OS X, and all the hardware components are available off-the-shelf. One only needs a Kinect 3D camera, a data projector, a recent-model PC with a good graphics card (Nvidia GeForce 480 et al. to run the water simulation, or pretty much anything with water turned off) — and an actual sandbox, of course.
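For the curious, here is a rough C++ sketch of the core of the topographic visualization; the real AR Sandbox does this in shaders, with additional filtering of the Kinect depth stream, and all thresholds and colors below are made up for the example.

```cpp
#include <algorithm>
#include <cmath>

struct Color { float r, g, b; };

// Map an elevation (relative to the calibrated base plane of the sandbox) to
// a color, and darken pixels near multiples of the contour interval to draw
// contour lines.
Color topographicColor(float elevationCm,
                       float minElevationCm = -10.0f,
                       float maxElevationCm = 20.0f,
                       float contourIntervalCm = 2.0f)
{
    // Normalize elevation into [0,1] and blend blue -> green -> brown-ish.
    float t = (elevationCm - minElevationCm) / (maxElevationCm - minElevationCm);
    t = std::max(0.0f, std::min(1.0f, t));
    Color c = t < 0.5f ? Color{0.0f, 2.0f * t, 1.0f - 2.0f * t}
                       : Color{2.0f * (t - 0.5f), 1.0f - (t - 0.5f), 0.0f};
    // Contour lines: darken anything within 1.5 mm of a contour level.
    float d = std::fabs(std::remainder(elevationCm, contourIntervalCm));
    if (d < 0.15f) { c.r *= 0.3f; c.g *= 0.3f; c.b *= 0.3f; }
    return c;
}
```

Everything else, including the water simulation, builds on the same depth-to-elevation pipeline.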
In order to assist do-it-yourself efforts, I’ve recently created a series of videos illustrating the core steps necessary to add the AR component to an already existing sandbox. There are three main steps: two to calibrate the Kinect 3D camera with respect to the sandbox, and one to calibrate the data projector with respect to the Kinect 3D camera (and, by extension, the sandbox). These videos elaborate on steps described in words in the AR Sandbox software’s README file, but sometimes videos are worth more than words. In order, these calibration steps are:
Step 1 is optional and will get a video as time permits, and steps 3, 6, and 8 are better explained in words.
Important update: when running the SARndbox application, don’t forget to add the -fpv (“fix projector view”) command line argument. Without it, the SARndbox won’t use the projector calibration matrix that you so carefully calibrated in step 7. It’s in the README file, but apparently nobody ever reads that. 😉
The only component that’s completely left up to each implementer is the sandbox itself. Since it’s literally just a box of sand with a camera and projector hanging above, and since its exact layout depends a lot on its intended environment, I am not providing any diagrams or blueprints at this point, except a few photos of our prototype system.
Basically, if you already own a fairly recent PC, a Kinect, and a data projector, knock yourself out! It should be possible to jury-rig a working system in a matter of hours (add 30 minutes if you need to install Linux first). It’s fun for the whole family!
So you just built yourself a CAVE. Or maybe you already have one, but you just finished implementing an awesome CAVE application. Either way, you want to tell the world about your achievement. You can invite people to come over and experience it, and while that’s the best way of going about it, you can only reach a handful of people.
So what do you do? Well, you make a movie of course. A movie can be seen by millions of people, and they can all see how awesome CAVEs are. Just one tiny little problem: how do you actually go about filming that movie?
The most obvious approach is to take a video camera, fire up the CAVE, put someone in there, point, and shoot. Since the CAVE’s walls are projection screens, the camera will record what’s on the walls, and that should be good enough, right? Well, I don’t think so. There are plenty of such videos out there, and I picked one more or less at random (no offense to whoever made this):
Now, I honestly don’t think this is convincing at all. If you haven’t already experienced a CAVE first hand, can you actually tell what’s going on? The images on the screens are blurry due to the stereoscopic display, severely distorted due to head-tracked rendering, and move and wobble around as the user moves in the CAVE. It is impossible to tell what the 3D model displayed in this CAVE really looks like, and I believe it is impossible to understand — for a lay audience — what the person in the CAVE is actually experiencing. The user being all “woo” and “aah” and “awesome” doesn’t really help either — looking at this with a cynical eye, you’d rightfully feel someone is trying to pull the wool over your eyes. If I didn’t already know better, and saw this video, I wouldn’t be keen on seeing a CAVE in person. It looks rather lame and headache-inducing, to be honest.
So what’s the mistake? A CAVE works because the images on the screens are generated specifically for the person viewing them, which is why positional head tracking is a required ingredient for any CAVE (and why CAVEs are in principle single-user environments). But in this case, the person in the CAVE is not the intended audience: the people watching the movie are the audience. The solution is very simple: instead of head-tracking the user in the CAVE, you have to head-track the camera. And turn off stereoscopy while you’re at it, because the video camera is only monoscopic after all.
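For the technically inclined, this is the essence of what "head-tracking the camera" means: each CAVE wall is rendered with an off-axis frustum whose apex is the tracked viewpoint, and swapping the user's head position for the camera's lens position is all it takes to make the walls correct for the camera instead of the user. The sketch below follows the classic glFrustum conventions and is an illustration, not Vrui's actual projection code.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// A CAVE wall, described by its lower-left corner, two unit vectors spanning
// it, and its physical size.
struct Screen { Vec3 lowerLeft; Vec3 rightDir; Vec3 upDir; double width, height; };

// Compute glFrustum-style parameters for an eye (or camera lens) at 'eye'
// looking "through" the given screen. The full projection also needs a
// rotation into the screen's plane and a translation by the eye position,
// omitted here for brevity.
void offAxisFrustum(const Screen& s, const Vec3& eye, double nearDist,
                    double& left, double& right, double& bottom, double& top)
{
    Vec3 va = s.lowerLeft - eye;             // eye to lower-left corner
    Vec3 n = cross(s.rightDir, s.upDir);     // screen plane normal
    double dist = -dot(va, n);               // eye-to-screen-plane distance
    double scale = nearDist / dist;          // project screen extents onto near plane
    left   = dot(va, s.rightDir) * scale;
    right  = (dot(va, s.rightDir) + s.width) * scale;
    bottom = dot(va, s.upDir) * scale;
    top    = (dot(va, s.upDir) + s.height) * scale;
}
```

Feed the tracked head position into 'eye' and the CAVE is correct for the user; feed the measured camera lens position instead and it is correct for the camera.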
Here is one of my old movies, showing what that looks like:
This is a bit better. The movie is not particularly good quality (it’s old; I need to record a few new ones), but at least viewers can see what’s going on. The difference between this and the previous movie is that here the images are correctly projected onto the CAVE walls. Looking closely, you will see the seams where the walls meet, but you will also see that the 3D images cross those seams without being broken up or distorted. If you squint enough to no longer see the seams, it will seem as if the CAVE is one big flat screen. As a result, virtual 3D objects show up at their proper size and in proper relation to the real user in the CAVE. If the user touches a part of the data using the hand-held input device, this will show up properly in the video. In the latter parts, where a yellow “selection sphere” is attached to the input device, the sphere’s image in the video shows up in exactly the right place and size.
There are still two things wrong with this video: for one, it appears as if the user is having trouble working with the CAVE. We’ll address that later. The second issue is that the CAVE doesn’t look particularly dynamic. This is because the camera is on a tripod and doesn’t move throughout the movie (the movie only cuts between two different camera setups). One very strong depth cue that particularly applies to 2D video is motion parallax, where 3D objects move in a very particular way as the camera filming them is moved, and our brains are particularly good at picking up on that. Because the camera here doesn’t move, there is no motion parallax, and the CAVE looks somewhat “flat.”
This next movie addresses that issue by using a hand-held camera, which is still tracked by the CAVE as in the previous movie:
This looks a lot better. In fact, it looks so convincing that I have gotten many questions asking how I made the video to be 3D. Answer is, of course, that it’s not a 3D movie; it’s a regular 2D movie exploiting motion parallax. The trick is that I’m moving the camera as much as I can, to show how the virtual 3D objects appear to move in the exact same particular way as the real objects (in this case the user). Real filmmakers would tell me to cool it, but in this case egregious camera movement is a necessary evil. While I’m panning the camera, the virtual and real objects move in exactly the same way; in other words, the virtual objects appear real, which is exactly how and why a CAVE works.
But even this does not address the remaining issue: the 3D interactions captured in these movies seem awkward, as if the users didn’t know what they were doing, or, worse, as if a CAVE or VR in general were very hard to use, and not particularly effective. I call this the catch-22 of filming VR. Above I mentioned that in a CAVE, the images on the walls are generated specifically for one viewer, in order to appear real. But in these movies, the images are generated for the camera — and not even in stereo, to boot.
This means the actual user in the movies does not see the virtual objects properly, and is essentially flying blind. That’s the reason why the interactions here look awkward. Instead of simply being able to touch a virtual object as if it were real and then interact with it, the poor user has to judge his or her actions against the feedback of the generated images (which, from his or her perspective, don’t look like real objects at all), and adjust accordingly. This is why it was so hard to properly measure a distance on the globe in the second video; it was basically trial-and-error pointing.
So the choice seems to be: let the user in the CAVE be the “main” viewer, allowing them to interact properly and fluently, but create a movie that looks utterly unconvincing, or let the camera be the “main” viewer, capturing beautiful video, but giving the impression that CAVEs are hard to use. If you want to communicate that CAVEs are awesome to look at and easy to use, that’s a lose-lose situation.
Or at least that’s what I thought, until a member of the KeckCAVES group applied some lateral thinking and suggested to “split” the CAVE: on half of the screens, show the images from the user’s point of view; on the other half, show them from the camera’s point of view. Then, if the user only looks at the first half, and the camera only looks at the second half, you can capture good-looking video with fluent interactions. The only thing left for me to do was say “D’oh” and do it right away. Here’s one of the early videos showing this new approach:
This is more like it. For this particular setup, I only set up the right CAVE wall to render for the camera, and left the other three screens (back and left wall and floor) for myself. I put the camera on a tripod and aimed it straight at the right wall, and stood at the very edge of the CAVE to give the camera the best possible view. You’ll notice how I misaimed a bit: at the left edge of the video, the camera is capturing some of the user-centered stereo projection on the back wall, and it looks weird. But it’s not bad enough to warrant a complete re-shoot.
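In code-shaped form (in the real system this is purely a configuration matter, not something you program), the split boils down to assigning each screen its own viewer; the sketch below is schematic only, with made-up types and names.

```cpp
#include <string>
#include <vector>

// Schematic only: each screen carries a reference to whichever viewer
// (head-tracked user or tracked/static camera) its images are generated for.
struct Viewer { std::string name; /* tracked eye or camera-lens position, ... */ };

struct CaveScreen {
    std::string name;       // "left", "right", "back", "floor"
    const Viewer* viewer;   // whose point of view this screen is rendered for
};

int main()
{
    Viewer user{"head-tracked user"};
    Viewer camera{"static camera on tripod"};

    // The setup used in the video above: right wall for the camera, the
    // other three screens for the user. The render loop itself is unchanged;
    // each screen simply uses its assigned viewer's frustum.
    std::vector<CaveScreen> screens = {
        {"left", &user}, {"back", &user}, {"floor", &user}, {"right", &camera}
    };

    (void)screens;   // in the real system, the per-screen render pass starts here
    return 0;
}
```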
Now, unfortunately this means we’re back to static non-moving cameras. In this particular case, I shot the whole movie by myself, meaning I couldn’t have done hand-held anyway, but I’m not quite sure how to do hand-held in this setup. While the camera is moving around, it will naturally capture more than one screen unless one is very careful, and if the camera sees one of the for-user screens, the illusion will break down. I guess the best compromise for now is to film two versions of the same movie and intercut them (like I did for the LiDAR Viewer movie): one with a hand-held camera to show how a CAVE looks, and one with a split CAVE and fixed camera to show how interactions work. With some clever editing, that could create stellar results — but I haven’t tried it yet, primarily for lack of time. Capturing good CAVE video is not quick.