Oculus recently presented the “Crystal Cove,” a version of the Rift head-mounted display with built-in optical tracking, which is combined with the existing inertial tracker to provide a full 6-DOF (position and orientation) tracking solution at low latency, and it is rumored that the Crystal Cove will be released as development kit mark 2 after this year’s Game Developers Conference.
This is great news. I’ve been saying for a long time that Oculus cannot afford to drop positional head tracking on developers at the last minute, because it will break several assumptions built into game engines and other VR software (but let’s talk about game engines here). I’m also happy because the Crystal Cove uses precisely the tracking technology that I predicted: active markers (LEDs) on the headset, and an external camera placed at a fixed position in the environment. I am also sad because I didn’t manage to finish my own after-market optical tracking add-on before Oculus demonstrated their new integrated technology, but that’s life.
So why does positional head tracking break existing games? Because for the first time, the virtual camera used to render a game world is no longer under sufficient control of the software. Let’s take a step back. In a standard, desktop, 3D game, the camera is entirely controlled by the software. The software sets it to some position and orientation determined by the game logic, the 3D engine renders the virtual world for that camera setup, and the result is the displayed image.
For a more concrete example, take a first-person game like Quake (third-person games face the same issues, but things are easier to explain in FPS terms). In such a game, the player is represented in-world by a player character or avatar, which has a position and orientation in space. The avatar is controlled by the user using any combination of input devices, and the game logic takes care to keep the avatar rooted in the virtual world, i.e., keeps it upright, keeps its feet on the ground, and prevents it from moving through obstacles such as walls. The virtual camera will typically be attached to the avatar’s head, and generally face in the same direction as the avatar is facing. The camera will be able to pitch up and down, but left/right rotation will usually be applied to the entire avatar — in other words, most avatars can’t turn their heads sideways.
Adding orientational head tracking, like in the Oculus Rift developer kit mark 1, takes some camera control away from the software, but it turns out the old rules still fundamentally apply. Following the FPS example, the camera’s position will still be locked to the avatar’s head, but the camera’s orientation will be determined by head tracking. In general, there will be a dedicated “forward” direction in real (player) space, such as facing forward in front of the keyboard and mouse, where camera orientation matches avatar orientation, and if the player turns her head away from that forward direction, the avatar will turn its virtual head accordingly. Left/right rotations might again be applied to the avatar’s entire body, but even allowing full rotation, where the avatar can, somewhat unrealistically, rotate its head by 360° or more, wouldn’t lead to a break in immersion.
The important part of all this is that the virtual camera doesn’t move freely, but only pivots around a fixed position. Conceptually, this is similar to rendering the 3D world onto a sphere centered around the avatar’s head, and then copying the appropriate part of that sphere onto the headset’s screens. In practical terms, this means that camera rotation due to head tracking happens mostly outside the game logic, as part of rendering. This is why injection drivers such as Vireio or vorPX work: they can intercept a game’s rendering command stream, and when detecting that the game sends a projection matrix to the underlying 3D library (OpenGL or Direct3D), the driver can multiply a relative orientation matrix from head tracking, and a displacement matrix for stereo, onto that projection matrix, and rendering still works as before overall.
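To sketch what such an injection driver does, here is a minimal numpy illustration. This is not actual Vireio or vorpX code, and matrix conventions differ between engines; the point is only that the modification happens entirely at the projection stage, in eye space, after the game’s own modelview transform:

```python
import numpy as np

def inject_head_tracking(projection, head_rotation, eye_offset):
    """Modify an intercepted projection matrix so rendering reflects
    head orientation and a stereo eye displacement.

    projection:    4x4 projection matrix sent by the game
    head_rotation: 4x4 rotation matrix from the head tracker
    eye_offset:    4x4 translation matrix shifting the camera for one eye
    """
    # Rotating the camera by R means transforming eye-space geometry by
    # the inverse of R before projecting; the stereo offset shifts the
    # camera sideways for the current eye.
    return projection @ eye_offset @ np.linalg.inv(head_rotation)

identity = np.eye(4)
p = np.diag([1.0, 1.0, -1.0, 1.0])  # stand-in for a real projection matrix

# With no head rotation and no eye offset, rendering is unchanged:
assert np.allclose(inject_head_tracking(p, identity, identity), p)
```

Note how the game’s matrix is only post-multiplied; the driver never needs to see, or touch, the game logic that produced it — which is exactly why this trick stops working once collision detection has to get involved.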
But the moment positional tracking is added, things break. If positional tracking data is applied to the virtual camera position during rendering, collision detection will no longer work reliably. Imagine the following scenario: in an FPS game, the player avatar is moving along a very narrow hallway, so narrow in fact, that the avatar cannot move left or right at all. Now imagine that the player, while wearing a head-tracked HMD, suddenly takes a step to the side. If head position is applied to the camera only during rendering, the camera will move to the side by the same amount the player moved in the real world, and will clip right through the virtual wall, breaking rendering in a big way. Depending on the rendering engine, the screen might turn entirely black, or show a “hall of mirrors,” or invisible parts of the game world. But whatever happens, immersion is completely broken.
So the bottom line is that head movement must be taken into account by the game logic; specifically, it must be considered during collision detection. The easiest way to achieve that is to treat head tracking not as an independent controller for the virtual camera, but as another type of input device. Put differently, the player moving in the real world should not displace the camera, but move the avatar. Fortunately, this is quite simple.
In any game engine, there should be a (single) place where controller input is applied to the current state of the avatar. This is where mouse, keyboard, joystick, gamepad, …, input is translated into movement. Conceptually, it works along the following lines:
Vector movement = (0, 0, 0)
Scalar rotation = 0
if("W key is pressed")
    movement.y += movementSpeed * timeStep
if("A key is pressed")
    movement.x -= movementSpeed * timeStep
// Same for S and D keys...
Scalar mouseDeltaX = newMouse.x - lastMouse.x
rotation += mouseDeltaX * rotationFactor
// Similar for mouse y axis, joystick, gamepad, ...
In the above pseudocode, notice how player movement due to key presses is multiplied by the game time step, while camera rotation due to mouse movement is not. This is because movement keys cause a constant movement velocity, whereas mouse movement causes a constant instantaneous rotation.
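To make the velocity-vs.-displacement distinction concrete, here is the same per-frame accumulation as a small Python sketch (names like movementSpeed and rotationFactor follow the pseudocode above; the values are arbitrary):

```python
def accumulate_input(keys_down, mouse_delta_x, time_step,
                     movement_speed=5.0, rotation_factor=0.01):
    """Translate one frame's controller state into a movement vector
    (x, y, z) and a rotation angle, as in the pseudocode above."""
    movement = [0.0, 0.0, 0.0]
    rotation = 0.0
    # Held keys produce a constant velocity, so scale by the time step:
    if "W" in keys_down:
        movement[1] += movement_speed * time_step
    if "S" in keys_down:
        movement[1] -= movement_speed * time_step
    if "A" in keys_down:
        movement[0] -= movement_speed * time_step
    if "D" in keys_down:
        movement[0] += movement_speed * time_step
    # Mouse motion is an instantaneous rotation; no time step here:
    rotation += mouse_delta_x * rotation_factor
    return movement, rotation

# Holding W during a 0.1 s frame moves 0.5 units forward; a 20-pixel
# mouse swipe rotates by the same angle regardless of frame duration.
move, rot = accumulate_input({"W"}, 20, 0.1)
```

If the mouse delta were also multiplied by the time step, turning speed would depend on frame rate — which is exactly the asymmetry the paragraph above describes.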
With positional head tracking as another input source, this code would be amended as follows:
Vector playerMove = newHeadPos - lastHeadPos
movement += playerMove * movementScale
That’s basically it. In this code, movementScale is a scaling factor between real-world units, as used by the head tracker, and game units. For example, if the head tracker measures in meters and the game world uses inches as its unit of measurement, movementScale should be 1000.0/25.4 = 39.37 to apply player movements in the real world at 1:1 scale.
After all movement inputs have been collected to create a final movement vector and rotation angle, the game logic takes over again, and normal collision detection will limit the movement vector to the player-accessible space. Then, during rendering, the virtual camera is fixed to the avatar’s head as in an orientation tracking-only context.
For the player, this will have the following effect: as long as the player avatar moves freely through the game world, head tracking is applied at 1:1 scale. If the player bends down and leans forward to get a closer look at some game object, the virtual world will respond appropriately. More generally, virtual objects in the game world move like real objects. Players can peek around corners by leaning sideways, look through keyholes by kneeling, etc. If, on the other hand, the player avatar bumps into an obstacle, head tracking will be diminished by a proportional amount.
As a concrete example, imagine the player avatar standing half a meter (in game space) in front of a virtual wall, and the player taking a step of one meter forward in the real world. The resulting movement vector would be (0, 1, 0), i.e., one meter forward, but collision detection would clip that vector to (0, 0.5, 0), i.e., placing the avatar smack dab against the wall. To the player, this would feel as if she moved one half meter forward, and then the entire game world moved another half meter forward. The feeling of the game world moving in response to player movements is the same as when there is no positional head tracking at all, so it causes some loss of immersion, but it is not too bad because the transition from 1:1 head tracking to reduced or no head tracking is smooth and seamless.
To continue the example, if the player now takes a step of 1 meter backwards, back to her initial position, the movement vector would be (0, -1, 0), and if no other obstacles are in the way, the player avatar will be one meter (in game space) away from the wall. This means that the player taking a step forward, and then stepping back to the original spot, effectively and permanently moved the game world forward by half a meter, relative to the real world. But that is not a problem at all, because there is usually no large-scale 1:1 correspondence between player movement and avatar movement anyway — movement using the mouse or other input devices has the exact same effect of moving the game world relative to the real world.
“But wait,” you might say, “isn’t disabling positional head tracking a Bad Thing™? Didn’t you write an entire rant against it a while back?” Yes, that’s correct, but what I’m describing here is very different from having no head tracking at all. Let’s go back to the example where the player avatar is right up against a wall. If the player moves her head left, right, up, down, or backwards, her motion will still be translated 1:1 onto the player avatar. Only moving the head any further forward is prohibited.
When the player intentionally violates game space constraints, like poking her head through a wall, there are only two ways the game engine can react: it can either do nothing, continue with 1:1 head tracking, and cause a major rendering failure and break in immersion, or it can constrain head tracking in some way. What I’m describing above is, I think, the least intrusive way of constraining head tracking, and by far the lesser of two evils.
So the bottom line is this: to properly handle positional head tracking, game engines (and other VR software) must not treat head tracking as controlling the camera, but must treat it as controlling the player avatar, just like other input devices. Unfortunately, this implies that games need to be written for head tracking from the ground up; after-market adapters like injection drivers will no longer work, because they can influence the game only at the rendering stage, not in the game logic / collision detection stage.
There is one theoretical way to make that work, however: an injection driver could create a fake input device that translates head movement into controller input. But for that to happen, the game engine would have to have built-in support for “absolute” input devices that translate 1:1 to player motion, and I am not aware of any that do. Joysticks, keyboards, etc. are all “relative” input devices: pushing forward on a joystick, for example, continuously moves the player avatar at a fixed velocity, not once by a fixed displacement.
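The relative/absolute distinction boils down to whether the device reports a velocity or a displacement. A tiny Python sketch (device parameters are hypothetical, chosen only to illustrate the contrast):

```python
def relative_device_displacement(deflection, speed, time_step, num_frames):
    """A 'relative' device like a joystick: a held deflection produces a
    constant velocity, so displacement accumulates frame after frame."""
    return deflection * speed * time_step * num_frames

def absolute_device_displacement(new_pos, last_pos):
    """An 'absolute' device like a head tracker: each reading maps 1:1
    to a displacement, no matter how long the pose is held."""
    return new_pos - last_pos

# Joystick pushed fully forward for 2 s at 60 fps, avatar speed 2 m/s:
joystick_move = relative_device_displacement(1.0, 2.0, 1.0 / 60.0, 120)

# Head moved 0.3 m between two consecutive tracker readings:
head_move = absolute_device_displacement(0.3, 0.0)
```

Holding the joystick keeps adding displacement every frame (4 meters over those 2 seconds), while holding your head still adds nothing after the initial 0.3 meters — which is precisely why an engine built around the first kind of device cannot simply be fed the second.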