There has been a lot of discussion about VR movies in the blogosphere and forosphere (just to pick two random examples), and even on Wired, recently, with the tenor being that VR movies will be the killer application for VR. There are even downloadable prototypes and start-up companies.
But will VR movies actually ever work?
This is a tricky question, and we have to be precise. So let’s first define some terms.
When talking about “VR movies,” people are generally referring to live-action movies, i.e., the kind that is captured with physical cameras and shows real people (well, actors, anyway) and environments. But for the sake of this discussion, live-action and pre-rendered computer-generated movies are identical.
We’ll also have to define what we mean by “work.” There are several things that people might expect from “VR movies,” but not everybody expects the same things. The first big component, probably expected by all, is panoramic view, meaning that a VR movie shows not just a small section of the viewer’s field of view, but the entire sphere surrounding the viewer — primarily so that viewers wearing a head-mounted display can freely look around. Most people refer to this as “360° movies,” but since we’re all thinking 3D now instead of 2D, let’s use the proper 3D term and call them “4π sr movies” (sr: steradian), or “full solid angle movies” if that’s easier.
The second component, at least as important, is “3D,” which is of course a very fuzzy term itself. What “normal” people mean by 3D is that there is some depth to the movie, in other words, that different objects in the movie appear at different distances from the viewer, just like in reality. And here is where expectations will vary widely. Today’s “3D” movies (let’s call them “stereo movies” to be precise) treat depth as an independent dimension from width and height, due to the realities of stereo filming and projection. To present filmed objects at true depth and with undistorted proportions, every single viewer would have to have the same interpupillary distance, all movie screens would have to be the exact same size, and all viewers would have to sit in the same position relative to the screen. This previous post and video talk in great detail about what happens when that’s not the case (they are about head-mounted displays, but the principle and effects are the same). As a result, most viewers today would probably not complain about the depth in a VR movie being off and objects being distorted, but — and it’s a big but — as VR becomes mainstream, and more people experience proper VR, where objects are at 1:1 scale and undistorted, expectations will rise. Let me posit that in the long term, audiences will not accept VR movies with distorted depth.
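To make the distortion concrete, here is a minimal sketch of the underlying similar-triangles geometry (my own illustration, not from the post; all numbers are hypothetical): an on-screen disparity baked for one reference viewer places an object at the intended depth only for that viewer, while anyone with a different interpupillary distance perceives it at a different depth.

```python
def screen_disparity(ipd, screen_dist, depth):
    # On-screen disparity (same units as ipd) that places a point at 'depth'
    # for a viewer with interpupillary distance 'ipd' sitting at 'screen_dist',
    # by similar triangles: p / ipd = (depth - screen_dist) / depth.
    return ipd * (depth - screen_dist) / depth

def perceived_depth(ipd, screen_dist, disparity):
    # Depth at which the two eye rays through the disparate screen points
    # converge: Z = ipd * screen_dist / (ipd - disparity).
    return ipd * screen_dist / (ipd - disparity)

# Bake stereo for a reference viewer: 65 mm IPD, 2 m from the screen,
# object intended to appear at 4 m.
p = screen_disparity(65.0, 2000.0, 4000.0)  # 32.5 mm of on-screen disparity

print(perceived_depth(65.0, 2000.0, p))  # 4000.0 mm: reference viewer sees it right
print(perceived_depth(55.0, 2000.0, p))  # ~4889 mm: narrower IPD pushes it deeper
print(perceived_depth(75.0, 2000.0, p))  # ~3529 mm: wider IPD pulls it closer
```

The same mismatch happens when the screen size or seating position differs from what the disparities were produced for, which is exactly why "every viewer identical" is the only configuration in which baked-in stereo is undistorted.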
Using these two cornerstones, what is the gold standard of VR movie? Simple: interactive real-time computer-generated experiences, i.e., VR video games. VR video games are full solid angle by nature, and if users properly calibrate their displays, they also show proper, distortion-free 3D, at least as long as developers don’t screw up. (I won’t name names here, but I’ve seen some very bad stuff among the current Oculus Rift demos. Fortunately, the market will soon weed those out.) Since video games rely too much on interactivity to scratch the same itch as watching a movie, the closest thing to perfect VR movies right now is Half-Life-style cutscenes, where the viewer can freely move and look around while scripted events are unfolding, and the developers took great care to make sure viewers don’t miss important things by wandering off (watch the excellent Half-Life 2 embedded commentaries to learn what exactly goes into that).
So what’s the difference between VR video games as movie-like experiences and stereo movies? The former are free-viewpoint video, i.e., the viewer can change viewing position and direction dynamically after the movie has been produced, whereas the latter have baked-in stereo, i.e., they show a 3D scene from a fixed viewpoint and using fixed viewing parameters, which cannot be changed afterwards and only look correct if viewed using the exact same parameters. Machinima is a good illustration: if machinima movies were distributed as script files that are viewed live inside the game engine that produced them, they would be VR movies; if they are rendered into (stereo or mono) movie files and viewed on YouTube, they are not.
To reiterate: stereo movies can only be watched properly if the projection parameters used during production precisely match those used during viewing (the old toe-in vs. skewed frustum debate is just one little aspect of that larger issue). That’s why I don’t call them 3D, and that’s why they don’t work in VR (with “work” defined as above).
Let’s assume there exists some camera rig that can capture completely seamless full-solid-angle video, to cover the panoramic aspect. How could this rig be used to capture stereo video? In regular stereo filming, there are two cameras side-by-side, approximating the average viewing parameters used during playback, i.e., approximating the position of the viewers’ eyeballs. But if the viewer is free to look around during playback, how can the fixed camera positions used during recording match that? Spoiler alert: they can’t. Now, there are approximations to create stereo movies that provide some sense of depth for 360° panoramic projection — but notice I’m using the 2D term “360°” here, on purpose. That’s because those methods account for viewers looking around horizontally, i.e., for yaw, but they don’t account for viewers looking up or down, i.e., for pitch, or at least they don’t right now. But even if the approximations did account for pitch (which would be a straightforward extension), there are two things they cannot account for: head tilt, i.e., roll, and translation.
Let’s talk roll first. When viewing a stereo picture, the displacement vector between the left and right views must be parallel to the line connecting the viewer’s pupils. Do a simple experiment: look at a stereo picture, or go see a stereo movie, and tilt your head 90° to the left or right. Boom, stereo gone, instant eye pain. With baked-in stereo, there is no way to account for different roll angles short of rotating the entire video with the viewer, which would be a recipe for instant nausea (think about it).
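The geometry makes this precise. Here is a back-of-the-envelope sketch (my own, using one common yaw-pitch-roll convention chosen for illustration): rotate the interocular axis by the head orientation and check its vertical component. With zero roll, the axis stays horizontal for any yaw and pitch, which is the property the yaw-parameterized panorama approximations exploit; any nonzero roll tilts it out of the horizontal plane, where a recording with baked-in stereo simply has no matching left/right image pair.

```python
import math

def interocular_axis(yaw, pitch, roll):
    # Unit vector along the line connecting the pupils, for a head rotated
    # by yaw (about world up), then pitch (about the head's side axis), then
    # roll (about the view direction). Obtained by composing the three
    # rotations applied to the head-frame axis (1, 0, 0); angles in radians.
    return (
        math.cos(roll) * math.cos(yaw) + math.sin(roll) * math.sin(pitch) * math.sin(yaw),
        math.sin(roll) * math.cos(pitch),  # vertical component
        -math.cos(roll) * math.sin(yaw) + math.sin(roll) * math.sin(pitch) * math.cos(yaw),
    )

# Any yaw/pitch combination with zero roll keeps the axis horizontal:
for yaw in (0.0, 0.7, 2.0):
    for pitch in (-0.5, 0.0, 0.5):
        assert abs(interocular_axis(yaw, pitch, 0.0)[1]) < 1e-12

# 30 degrees of roll lifts one eye above the other:
print(interocular_axis(0.0, 0.0, math.radians(30.0))[1])  # 0.5
```

The vertical component works out to sin(roll)·cos(pitch), so it vanishes exactly when roll is zero, independent of yaw and pitch, and grows immediately with any head tilt.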
Translation, such as imposed by positional head tracking, has the same fundamental problem. The reason why positional head tracking is important is because it introduces motion parallax, an important depth cue. But how can a movie with baked-in stereo reproduce motion parallax? If there is a cube directly in front of you, you see the front face. If you move to the left, you see the side face as well. But if the capturing camera didn’t see the side face, how can the movie show it to you during playback?
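For a rough sense of scale (my own numbers, not from the post): a point straight ahead at depth Z appears to shift by atan(offset / Z) when the head translates sideways, so near and far objects shift by very different amounts, and that depth-dependent difference is precisely the motion parallax a fixed-viewpoint recording cannot supply.

```python
import math

def apparent_shift_deg(head_offset, depth):
    # Angle (degrees) by which a point at 'depth', initially straight ahead,
    # appears to shift when the head translates sideways by 'head_offset'.
    # Units cancel, so meters are used here for both.
    return math.degrees(math.atan2(head_offset, depth))

# Lean 10 cm to the side:
print(apparent_shift_deg(0.10, 1.0))   # ~5.7 degrees for an object at 1 m
print(apparent_shift_deg(0.10, 10.0))  # ~0.57 degrees for an object at 10 m
```

A baked-in stereo movie would have to shift everything by the same amount (or not at all), immediately flattening the scene the moment the viewer leans.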
The bottom line is that showing a movie with baked-in stereo in a VR setting, e.g., on a head-mounted display, implies disabling positional head tracking, and disabling roll tracking. And even if viewers are conditioned not to move or tilt their heads, even yaw and pitch are only approximations that introduce some noticeable distortions. As the Oculus Rift dev kit v1 has shown, disabling positional tracking is bad enough for most people; disabling roll tracking is much, much worse.
Does this mean live-action VR movies are impossible? No. It only means that baked-in stereo is not an applicable technology, so don’t buy stock in companies that produce “360° 3D movie capture cameras.” They might get away with it now, but they won’t be able to for long. Just like black-and-white or silent movies don’t really have blockbuster appeal these days.
What are the alternatives? Machinima is one, clearly. Why shouldn’t there be a machinima Citizen Kane? If zombie Orson Welles were working today, he probably would be able to use the constraints of the medium to glorious effect. And Lucasfilm are on the job.
But to get to real live action, we need true 3D cameras instead of stereo cameras. The distinguishing feature of a true 3D camera is that it doesn’t capture baked-in stereo views of whatever scene it’s recording, but a true 3D model of the scene, which can then be played back from arbitrary viewpoints later on. The Kinect is one example of a (low-res) 3D camera. LiDAR is an alternative, and so are algorithms creating 3D models from many 2D photographs. The problem with the latter two, at least right now, is that they don’t work in real time. But for movie production, that’s not really a problem. And before you get all nitpicky: yes, it’s possible to convert a stereo camera into a 3D camera by running a stereo reconstruction algorithm and creating 3D models from the stereo footage. The distinction is in the product (baked-in stereo vs live-rendered 3D models), not in the technology to create the latter.
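To illustrate what “a true 3D model” means at the lowest level, here is a minimal pinhole back-projection sketch of the kind a Kinect-class depth camera enables (the intrinsics below are hypothetical round numbers, not any specific device’s calibration): each depth pixel becomes a 3D point, and the resulting point cloud can later be rendered from any viewpoint, with correct parallax and roll for free.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    # Pinhole-model back-projection: pixel (u, v) with measured depth z
    # (meters) to a 3D point in camera space. fx/fy are focal lengths in
    # pixels; (cx, cy) is the principal point.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Hypothetical intrinsics for a 640x480 depth sensor:
fx = fy = 570.0
cx, cy = 320.0, 240.0

print(backproject(320.0, 240.0, 2.0, fx, fy, cx, cy))  # (0.0, 0.0, 2.0): center pixel
print(backproject(420.0, 240.0, 2.0, fx, fy, cx, cy))  # ~0.35 m to the side at 2 m depth
```

Occlusion is visible right in this model: any surface the sensor’s rays never hit produces no points, which is why single-viewpoint 3D cameras leave holes behind foreground objects and why multi-camera capture spaces exist.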
Any way it’s done, the bottom line is that the technology for proper live action VR movies is not there yet. 3D cameras are low-res and have occlusion problems, or 3D capture spaces are so involved gear-wise that they are not practical for larger or longer scenes. In the short run, if you’re an aspiring VR movie producer, better read up on how to use game engines to create machinima.
Let me close with a related thought: I’m expecting that proper VR movies will use an entirely different language, so to speak, than “traditional” movies. Watching a VR movie would be more like being on stage during a live play, instead of viewing a scene unfold through a viewpoint prescribed by the director. It’s going to be very exciting to see how this develops.