I finally managed to get the Oculus Rift DK2 fully supported in my Vrui VR toolkit, and while there are still some serious issues, such as getting the lens distortion formulas and internal HMD geometry exactly right, I’ve already noticed something really neat.
I have a bunch of graphically simple applications that run at ridiculous frame rates (some get several thousand fps on an Nvidia GeForce 770 GTX), and with some new rendering configuration options in Vrui 4.0 I can disable vsync, and render directly into the display window’s front buffer. In other words, I can let these applications “race the beam.”
There are two main results of disabling vsync and rendering into the front buffer: For one, the CPU and graphics card get really hot (so this is not something you want to do this naively). But second, let’s assume that some application can render 1,000 fps. This means, every millisecond, a new complete video frame is rendered into video scan-out memory, where it gets picked up by the video controller and sent across the video link immediately. In other words, almost every line of the Rift’s display gets a “fresh” image, based on most up-to-date tracking data, and flashes this image to the user’s retina without further delay. Or in other words, total motion-to-photon latency for the entire screen is now down to around 1ms. And the result of that is by far the most solid VR I’ve ever seen.
Since Microsoft’s Build 2015 conference, and increasingly since Microsoft’s showing at E3, everybody (including me) has been talking about HoloLens, and its limited field of view (FoV) has been a contentious topic. The main points being argued (fought) about are:
What exactly is the HoloLens’ FoV?
Why is it as big (or small) as it is, and will it improve for the released product?
How does the size of the FoV affect the HoloLens’ usability and effectiveness?
Were Microsoft’s released videos and live footage of stage demos misleading?
How can one visualize the HoloLens’ FoV in order to give people who have not tried it an idea what it’s like?
Measuring Field of View
Initially, there was little agreement among those who experienced HoloLens regarding its field of view. That’s probably due to two reasons: one, it’s actually quite difficult to measure the FoV of a headmounted display; and two, nobody was allowed to bring any tools or devices into the demonstration rooms. In principle, to measure see-through FoV, one has to hold some object, say a ruler, at a known distance from one’s eyes, and then mark down where the apparent left and right edges of the display area fall on the object. Knowing the distance X between the left/right markers and the distance Y between the eyes and the object, FoV is calculated via simple trigonometry: FoV = 2×tan-1(X / (Y×2)) (see Figure 1).
Figure 1: Calculating field of view by measuring the horizontal extent of the apparent screen area at a known distance from the eyes. (In this diagram, FoV is 2×tan-1(6″ / (6″×2)) = 53.13°.)
Last Friday I made a trek down to the San Francisco peninsula, to visit and chat with a couple of other VR folks: Cyberith, SVVR, and AltspaceVR. In the process, I also had the chance to try a couple of VR devices I hadn’t seen before.
Virtual locomotion, and its nasty side effect, simulator sickness, are a pretty persistent problem and timely topic with the arrival of consumer VR just around the corner. Many enthusiasts want to use VR to explore large virtual worlds, as in taking a stroll through the frozen tundra of Skyrim or the irradiated wasteland of Fallout, but as it turns out, that’s one of the hardest things to do right in VR.
Figure 1: Cyberith Virtualizer, driven by an experienced user (Tuncay Cakmak). Yes, you can jump and run, with some practice.
I attended the Augmented World Expo (AWE) once before, in 2013 when I took along an Augmented Reality Sandbox. This time, AWE partnered with UploadVR to include a significant VR subsection. I’m going to split my coverage, focusing on that VR component here, while covering the AR offering in another post.
eMagin 2k×2k VR HMD
eMagin’s (yet to be named) new head-mounted display was the primary reason I went to AWE in the first place. I had seen it announced here and there, but I was skeptical it would be able to provide the advertised field of view of 80°×80°. Unlike Oculus Rift, HTC/Valve Vive, or other post-renaissance HMDs, eMagin’s is based on OLED microdisplays (unsurprisingly, with microdisplay manufacture being eMagin’s core business). Previous microdisplay-based HMDs, including eMagin’s own Z800 3DVisor, were very limited in the FoV department, usually topping out around 40°. Magnifying a display that measures around 1cm2 to a large solid angle requires much more complex optics than doing the same for a screen that’s several inches across.
Figure 1: eMagin’s unnamed 2k x 2k, 80×80 degree FoV, VR HMD with flip-up optics.
Yesterday, I attended the second annual Silicon Valley Virtual Reality Conference & Expo in San Jose’s convention center. This year’s event was more than three times bigger than last year’s, with around 1,400 attendees and a large number of exhibitors.
Unfortunately, I did not have as much time as I would have liked to visit and try all the exhibits. There was a printing problem at the registration desk in the morning, and as a result the keynote and first panel were pushed back by 45 minutes, overlapping the expo time; additionally, I had to spend some time preparing for and participating in my own panel on “VR Input” from 3pm-4pm.
The panel was great: we had Richard Marks from Sony (Playstation Move, Project Morpheus), Danny Woodall from Sixense (STEM), Yasser Malaika from Valve (HTC Vive, Lighthouse), Tristan Dai from Noitom (Perception Neuron), and Jason Jerald as moderator. There was lively discussion of questions posed by Jason and the audience. Here’s a recording of the entire panel:
One correction: when I said I had been following Tactical Haptics‘ progress for 2.5 years, I meant to say 1.5 years, since the first SVVR meet-up I attended. Brainfart. Continue reading →
I have briefly mentioned HoloLens, Microsoft’s upcoming see-through Augmented Reality headset, in a previous post, but today I got the chance to try it for myself at Microsoft’s “Build 2015” developers’ conference. Before we get into the nitty-gritty, a disclosure: Microsoft invited me to attend Build 2015, meaning they waived my registration fee, and they gave me, like all other attendees, a free HP Spectre x360 notebook (from which I’m typing right now because my vintage 2008 MacBook Pro finally kicked the bucket). On the downside, I had to take Amtrak and Bart to downtown San Francisco twice, because I wasn’t able to get a one-on-one demo slot on the first day, and got today’s 10am slot after some finagling and calling in of favors. I guess that makes us even. 😛
Microsoft just announced HoloLens, which “brings high-definition holograms to life in your world.” A little while ago, Google invested heavily in Magic Leap, who, in their own words, “bring magic back into the world.” A bit longer ago, CastAR promised “a magical experience of a 3D, holographic world.” Earlier than that, zSpace started selling displays they used to call “virtual holographic 3D.” Then there is the current trailblazer in mainstream virtual reality, the Oculus Rift, and other, older, VR systems such as CAVEs.
Figure 1: A real person next to two “holograms,” in a CAVE holographic display.
While these things are quite different from a technical point of view, from a user’s point of view, they have a large number of things in common. Wouldn’t it be nice to have a short, handy term that covers them all, has a well-matching connotation in the minds of the “person on the street,” and distinguishes these things from other things that might be similar technically, but have a very different user experience?
In the previous part of this ongoing series of posts, I described how the Oculus Rift DK2’s tracking LEDs can be identified in the video stream from the tracking camera via their unique blinking patterns, which spell out 10-bit binary numbers. In this post, I will describe how that information can be used to estimate the 3D position and orientation of the headset relative to the camera; the first important step in full positional head tracking.
Figure 1: Still frame from pose estimation video, showing a 3D model of the DK2’s headset (the purple wireframe) projected onto a raw 2D video frame from the tracking camera based on reconstructed position and orientation.
3D pose estimation, or the problem of reconstructing the 3D position and orientation of a known object relative to a single 2D camera, also known as the perspective-n-point problem, is a well-researched topic in computer vision. In the special case of the Oculus Rift DK2, it is the foundation of positional head tracking. As I tried to explain in this video, an inertial measurement unit (IMU) by itself cannot track an object’s absolute position over time, because positional drift builds up rapidly and cannot be controlled without an external 3D reference frame. 3D pose estimation via an external camera provides exactly such a reference frame. Continue reading →
The final update/edit to my previous post was to report that I had managed to synchronize the DK2’s tracking LEDs to its camera’s video stream by following pH5’s ouvrt code, and that I was able to extract 5-bit IDs for each LED by observing changes in that LED’s brightness over time. Unfortunately I’ll have to start off right away by admitting that I made a bad mistake.
Understanding the DK2’s camera
Once I started looking more closely, I realized that the camera was only capturing 30 frames per second when locked to the DK2’s synchronization cable, instead of the expected 60. After downloading the data sheet for the camera’s imaging sensor, the Aptina MT9V034, and poring over the documentation, I realized that I had set a wrong vertical blanking interval. Instead of using a value of 5, as the official run-time and pH5’s code, I was using a value of 57, because that was the original value I found in the vertical blanking register before I started messing with the sensor. As it turns out, a camera — or at least this camera — captures video in the same way as a monitor displays it: padded with a horizontal and vertical blanking period. By leaving the vertical blanking period too large, I had extended the time it takes the camera to capture and send a frame across its host interface. Extended by how much? Well, the camera has a usable frame size of 752×480 pixels, a horizontal blanking interval of 94 pixels, and a (fixed) pixel clock of 26.66MHz. Using a vertical blanking interval of 5 lines, the total frame time is ((752+94)*(480+5)+4)/26.66MHz = 15.391ms (in case you’re wondering where the “+4” comes from, so am I. It’s part of the formula in the data sheet). Using 57 as vertical blanking interval, the total frame time becomes ((752+94)*(480+57)+4)/26.66MHz = 17.041ms. Notice something? 17.041ms is longer than the synchronization pulse interval of 16.666ms. Oops. The exposure trigger for an odd frame arrives at a time when the camera is still busy processing the preceding even frame, and is therefore ignored, resulting in the camera skipping every odd frame and capturing at 30Hz. Lesson learned.
Figure 1: First result from LED identification algorithm, showing wrong ID numbers due to the camera dropping video frames all over the place.
Over the weekend, a bunch of people from all over got together on reddit to try and figure out how the Oculus Rift DK2’s optical tracking system works. This was triggered by a call for help to develop an independent SDK from redditor /u/jherico, in response to the lack of an official SDK that works under Linux. That thread became quite unwieldy quickly, with lots of speculation, experimentation, and outright wrong information being thrown around, and then later corrected, but with the corrections nowhere near the wrong bits, etc. etc.
To get some order into things, I want to summarize what we have learned over the weekend, to serve as a starting point for further investigation. In a nutshell, we now know:
How to turn on the tracking LEDs integrated into the DK2.
How to extract the 3D positions and maximum emission directions of the tracking LEDs, and the position of the DK2’s inertial measurement unit in the same coordinate system.
How to get proper video from the DK2’s tracking camera.
Here’s what we still don’t know:
How to properly control the tracking LEDs and synchronize them with the camera. Update: We got that.
How to extract lens distortion and intrinsic camera parameters for the DK2’s tracking camera. Update: Yup, we got that, too. Well, sort of.
And, the big one, how to put it all together to calculate a camera-relative position and orientation of the DK2. 🙂 Update: Aaaaand, we got that, too.