Hacking the Oculus Rift DK2

Note: This is part 1 of a four-part series. [Part 2] [Part 3] [Part 4]

Over the weekend, a bunch of people from all over got together on reddit to try to figure out how the Oculus Rift DK2’s optical tracking system works. This was triggered by a call for help to develop an independent SDK from redditor /u/jherico, in response to the lack of an official SDK that works under Linux. That thread quickly became unwieldy, with lots of speculation, experimentation, and outright wrong information being thrown around and only later corrected, with the corrections ending up nowhere near the wrong bits, etc. etc.

To get some order into things, I want to summarize what we have learned over the weekend, to serve as a starting point for further investigation. In a nutshell, we now know:

  • How to turn on the tracking LEDs integrated into the DK2.
  • How to extract the 3D positions and maximum emission directions of the tracking LEDs, and the position of the DK2’s inertial measurement unit in the same coordinate system.
  • How to get proper video from the DK2’s tracking camera.

Here’s what we still don’t know:

  • How to properly control the tracking LEDs and synchronize them with the camera. Update: We got that.
  • How to extract lens distortion and intrinsic camera parameters for the DK2’s tracking camera. Update: Yup, we got that, too. Well, sort of.
  • And, the big one, how to put it all together to calculate a camera-relative position and orientation of the DK2. šŸ™‚ Update: Aaaaand, we got that, too.

Let’s talk about all these points in a bit more detail.

Turning on the tracking LEDs

/u/jherico, based on HID communication traces he captured from the Windows SDK, figured out that there is a HID feature report (0x0C) that turns on the LEDs. It has a built-in timeout of 10 seconds, meaning the report must be resent at regular intervals by the driver software or the LEDs turn off again. The following is one report sequence to turn on the LEDs:

0x0c 0x00 0x00 0x00 0x01 0x00 0x5E 0x01 0x1A 0x41 0x00 0x00 0x7F

After sending this feature report, the LEDs will turn on for 10 seconds, and they will show up in the tracking camera image very brightly, but they will flicker in an odd way. More on that later.
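For reference, here is how that report could be assembled in code. This is a sketch, not a definitive implementation: bytes 1–2, 5, and 10–12 have unknown meaning and are simply copied from the captured trace, and the field positions (pattern selector, flag bits, timing values) are the ones worked out by experimentation later in this post.

```python
import struct

def build_led_report(pattern=0x00, flags=0x01,
                     active_time_us=350, frame_interval_us=16666):
    """Assemble HID feature report 0x0C as observed in SDK traces.

    Bytes 1-2, 5, and 10-12 have unknown meaning and are copied
    verbatim from the captured report; the parameterized fields
    follow the experiments described later in this post.
    """
    report = bytearray(13)
    report[0] = 0x0C                                     # report ID
    report[3] = pattern                                  # LED pattern selector
    report[4] = flags                                    # bit 0 = enable LEDs
    report[6:8] = struct.pack('<H', active_time_us)      # bytes 6-7
    report[8:10] = struct.pack('<H', frame_interval_us)  # bytes 8-9
    report[12] = 0x7F                                    # unknown, always 0x7F
    return bytes(report)

# Defaults reproduce the captured byte sequence exactly:
assert build_led_report() == bytes([0x0C, 0x00, 0x00, 0x00, 0x01, 0x00,
                                    0x5E, 0x01, 0x1A, 0x41, 0x00, 0x00, 0x7F])
```

Actually sending the report requires a HID library such as hidapi, and it has to be resent at least every 10 seconds because of the built-in timeout.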

Figure 1: Oculus Rift DK2 with tracking LEDs turned on, seen from a (rectified fisheye) camera without infrared filter.

Extracting 3D LED positions

LED positions are extracted via HID feature report 0x0F. The process to do so is interesting; each feature report contains only data for a single LED (or the built-in IMU), and every time the driver requests the same report, it gets a different reply. The DK2’s firmware inserts into the report packet the total number of different reports, and the index of the report it just returned. In order to query all positions, the driver asks for a 0x0F feature report, and remembers the first index it receives. It then keeps asking for more reports until it sees the first index again, at which point it has extracted the position of all LEDs, and the IMU. Here is the structure of feature report 0x0F:

0x0F 0x00 0x00 flag X1 X2 X3 X4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4 ...
                    [  pos X  ] [  pos Y  ] [  pos Z  ]

DX1 DX2 DY1 DY2 DZ1 DZ2 0x00 0x00 index 0x00 num 0x00 0x00 0x00
[dir X] [dir Y] [dir Z]

pos X, pos Y, pos Z are 32-bit little-endian signed integers (we have to thank /u/duschendestroyer for that insight) measuring position in micrometers, and dir X, dir Y, dir Z are 16-bit little-endian signed integers defining a direction vector (credit for that goes to /u/duschendestroyer as well), whose magnitude has unknown meaning. Flag is either 0x02 for a report defining an LED, or 0x01 for the report defining the IMU. Additionally, the IMU report has all zeros as direction vector. Index is the index of this report, and num is the total number of reports (41 in the DK2’s case).
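A decoder for this layout might look like the following sketch. The byte offsets are read straight off the diagram above; whether index and num are 8-bit values or 16-bit little-endian values is moot here, since both fit in a single byte (num is 41).

```python
import struct

def parse_position_report(data):
    """Decode one HID feature report 0x0F (one LED or the IMU per report).

    Offsets follow the layout sketched above: positions are 32-bit
    little-endian signed integers in micrometers, directions are
    16-bit little-endian signed integers of unknown magnitude.
    """
    assert data[0] == 0x0F and len(data) >= 30
    flag = data[3]                                # 0x02 = LED, 0x01 = IMU
    x, y, z = struct.unpack_from('<3i', data, 4)  # pos X, Y, Z (micrometers)
    dx, dy, dz = struct.unpack_from('<3h', data, 16)  # dir X, Y, Z
    index, num = data[24], data[26]
    return {'type': 'imu' if flag == 0x01 else 'led',
            'pos_um': (x, y, z), 'dir': (dx, dy, dz),
            'index': index, 'num': num}
```

The driver-side loop then just keeps requesting report 0x0F and stops once a report’s index equals the first index it saw.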

When the extracted 3D positions are displayed in 3D (see Figure 2) or dumped to a file, it turns out that they are not symmetric. This hints that the positions are not the LED positions as designed, but as measured during factory calibration. We have not yet determined if there are differences from device to device, i.e., whether each DK2 is truly factory-calibrated, or if all devices contain the same position data.

Figure 2: 3D model of the Oculus Rift DK2 for optical tracking purposes. Red spheres are LEDs, green lines are directions of maximum emissions, blue sphere is built-in inertial measurement unit.

Receiving video from the DK2’s tracking camera

The tracking camera is just a UVC webcam with an infrared filter, and is handled by the standard Linux UVC driver. However, there’s a bug in the camera’s firmware: it advertises itself as having a resolution of 376×480 pixels and a YUYV pixel format (downsampled and interleaved luminance/chroma channels, typical for webcams). See Figure 3 for what that looks like.

Figure 3: Video from the Oculus Rift DK2’s tracking camera, without correcting for firmware bug.

In reality, the camera has a resolution of 752×480 pixels, and uses a simple Y8 greyscale pixel format. After writing a custom video frame unpacker, the camera image looks correct (see Figures 4a and 4b).
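Since a YUYV line of 376 pixels and a Y8 line of 752 pixels are both exactly 752 bytes, the work-around is pure reinterpretation; no bytes move. A minimal sketch, assuming the raw (pre-RGB-conversion) frame buffer is available:

```python
def unpack_dk2_frame(raw, adv_width=376, height=480):
    """Reinterpret a frame advertised as adv_width x height YUYV
    (2 bytes/pixel) as (2*adv_width) x height 8-bit greyscale.

    Each advertised YUYV row and the real Y8 row have the same byte
    length, so the buffer is simply re-read one byte per pixel.
    """
    width = 2 * adv_width                      # 752 for the DK2 camera
    assert len(raw) == width * height
    return [raw[row * width:(row + 1) * width] for row in range(height)]

frame = unpack_dk2_frame(bytes(range(256)) * (752 * 480 // 256))
assert len(frame) == 480 and len(frame[0]) == 752
```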

Figure 4a: Video from the Oculus Rift DK2’s tracking camera, after work-around for firmware bug is in place.

Figure 4b: Same as Figure 4a, with blob extractor enabled and LED blobs (and reflections off table) highlighted in green.

On to the open questions…

Controlling the tracking LEDs

When using the LED control feature report 0x0C as indicated above, the LEDs will appear to flash on and off in the tracking camera image. The ratio between on and off changes very slowly over time, and when viewed through another camera (see Figure 1), the LEDs do not appear to flash at all.

This indicates that the LEDs are strobed at a frequency similar, but not identical to, the tracking camera’s frame rate. The result is a beat frequency that manifests as irregular flashing (see Figure 5).

Figure 5: A small frequency difference between a camera’s frame rate and the strobe rate of an LED causes low-frequency flicker in the captured video stream.
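To put rough numbers on the beat effect: if the LEDs strobe every 16666 µs while the camera exposes every 16667 µs (the nominal 60 Hz interval; these exact values are an illustration, not measured), the strobe slips 1 µs per frame relative to the shutter. With the 350 µs LED active time identified later in this post, the pulse then takes about 350 frames, roughly 5.8 s, to drift across a short exposure window, consistent with the slow on/off flicker.

```python
# Hypothetical numbers, for illustration only.
led_interval_us = 16666   # bytes 8-9 of feature report 0x0C
cam_interval_us = 16667   # nominal 60 Hz camera frame interval

drift_per_frame_us = abs(cam_interval_us - led_interval_us)  # 1 us per frame
led_active_us = 350       # bytes 6-7 of feature report 0x0C

# Frames until the relative phase wraps around once (full beat period),
# and frames during which a 350 us pulse overlaps a short exposure:
frames_per_cycle = led_interval_us // drift_per_frame_us
frames_visible = led_active_us // drift_per_frame_us
seconds_visible = frames_visible * cam_interval_us / 1e6     # about 5.8 s
```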

This is most probably not an intentional effect; after all, there’s a synchronization cable between the camera and the HMD. So the issue here seems to be that synchronization is not enabled, either in the camera, or in the HMD. It is not yet known which of the two is responsible for synchronization, and which one is the source of whatever synchronization signal is sent across the cable (does the HMD trigger the camera, or the other way around?). I’m hoping that the HMD is in charge, because we already know how to send many commands to it; it is not clear how to send commands to the tracking camera short of writing a custom kernel driver.

A closer look at feature report 0x0C, and some experimentation via random bit twiddling, revealed the following. Remember the byte sequence 0x1A 0x41 from above? Interpreted as a little-endian 16-bit unsigned integer, that translates to 16666, which just so happens to be the frame interval of the 60Hz tracking camera, in microseconds (well, 60Hz would be 16667 microseconds, but let’s be generous). And indeed, twiddling with that value changes the apparent LED flicker in the camera image, but I have yet to find a setting where flicker entirely disappears.

Furthermore, looking at the HID feature report dumps redditor /u/jherico captured from the Windows SDK talking to the DK2, there are several variations of feature report 0x0C, differing in bytes 3 and 4. Experimentation with those revealed the following:

  • The SDK sets byte 3 to either 0x00 or 0xFF, but there is no obvious difference in LED behavior. Update: this byte is a LED pattern selector. See below.
  • Bit 0 of byte 4 is a toggle to enable/disable the LEDs.
  • Bit 1 of byte 4, if set, changes the LEDs to a rapidly flickering mode. Update: when set, this bit automatically cycles through a set of LED patterns. See below.
  • Bit 2 of byte 4, if set, slightly changes the brightness of the LEDs when observed with another camera. This is probably a side effect of switching them on/off at a frequency much higher than 60Hz, resulting in an amplitude-modulated IR signal like that used in IR remote controls.
  • Bit 3 of byte 4, if set, turns the LEDs off no matter the value of the other bits.

After this, I decided to throw caution to the wind and experiment with the other bytes as well. Bytes 6 and 7, read as a little-endian 16-bit unsigned integer, start out with the value 0x5E 0x01, or 350 in decimal. Changing this value changes the apparent flicker, and also the brightness of the LEDs when observed with a different camera. This means it is probably the active interval of the LEDs, and indeed, once it is set to a value larger than the LED frame interval, flicker disappears. That confirms that bytes 6 and 7 are the LED active time in microseconds.

With active time = frame interval, trying the flag bits in byte 4 again, it appears that bit 2 of byte 4 indeed controls modulation of some sort: with the bit off, the LEDs appear to change in brightness rapidly, with the bit on, they pulse at a low frequency, but at least they stay at >90% brightness at all times.

Still, I was not able to synchronize the LEDs’ frame interval to the camera’s. There is either another feature report that does that, or some unknown byte in some report I already looked at, or maybe it is done by the camera driver (in which case, tough luck).

Update: Here’s an animated GIF showing the LED patterns through which the DK2 cycles to help identify LEDs. It’s still not synched to the camera; I’m pretty certain that proper synchronization would differentiate them by more than just a slight difference in brightness (see Figure 6).

Figure 6: Slow-motion video of Oculus Rift DK2 cycling through LED patterns.

Update: Thanks to the intrepid work of github user pH5 reverse-engineering the DK2 camera itself, we now know how to synchronize the camera to the LEDs, and what the flashing LEDs mean. And it’s clever.

First about the camera. I speculated on reddit that the source of synchronization might be the headset, and that was first supported by plugging the camera end of the cable into an oscilloscope, and then confirmed by pH5’s work. By writing directly to registers in the camera’s imaging sensor, the sensor can be switched from ā€œmaster modeā€ to ā€œsnapshot mode,ā€ in which it opens its shutter in direct response to the sync pulse arriving over the cable. The rest of the registers are useful as well; Linux’s UVC camera driver exposes some controls (such as brightness and contrast), but those don’t do anything. Writing directly to the camera registers gives us full control over exposure, gain, black level, and image flipping. See Figure 7 for a quick-and-dirty camera control GUI that I slapped together in Vrui’s GLMotif toolkit.

Figure 7: A simple control GUI for the Oculus Rift DK2’s tracking camera.

So now about those flashing LEDs… Once the camera is synchronized, everything becomes clear. Each of the LEDs can be driven at two brightness levels. Byte 3 of feature report 0x0C selects one from a set of pre-defined LED patterns where some LEDs are brighter than others. And bit 1 of byte 4 tells the headset to automatically cycle through those pre-made patterns on every camera exposure. And this is clever because it allows the camera to identify LEDs based on their blinking patterns. Update: The rest of this paragraph after the following sentence is all wrong, due to a dumb mistake. Please see the next post in the series. From looking at a lot of camera frames, I figured out that the patterns of all LEDs repeat after five frames, meaning there is a 5-bit code blinked out by each LED. As there are 32 5-bit codes, and 40 LEDs, some LEDs share the same code, but still, LEDs can now be identified in only 5 frames of video. For example, the LED in the center of the headset’s front plate blinks out the pattern “01001”, or 9. The one in the top-right corner (when wearing it) blinks out “10011” or 19, and the one in the top-left corner “11001” or 25.

It’s now going to be pretty straightforward to add LED identification to my blob extraction algorithm.

Extracting lens distortion and intrinsic camera parameters

Before the DK2’s tracking camera can be used for pose estimation, we need to find its intrinsic parameters, including its focal length, principal point, pixel sizes, and skew factor. Given the camera’s relatively large field of view, it will probably suffer from some lens distortion, and unless that is corrected in the camera itself, we will also need correction parameters for that.

I looked at the other feature reports sent by the device, and I found a candidate for intrinsic parameters. Feature report 0x0E contains what look like twelve little-endian 32-bit signed integers, and when arranged as a 3×4 matrix, they look somewhat like a 3D-to-2D projection matrix:

/ 20065     0     0 -5764 \
|     0 20068     0 -8717 |
\     0     0 22211 15360 /

I haven’t investigated this further, but it’s a lead. Finding lens distortion parameters, if there are any, will be much harder.

Update: I was able to take pictures of my standard calibration target through the DK2’s tracking camera after prying off the IR filter. I now have non-linear lens distortion parameters; compare Figures 8a and 8b. My lens distortion correction algorithm uses the common Brown-Conrady lens distortion formula with three radial and two tangential coefficients. For my specific camera, the distortion center point is (377.4200432774, 226.9465071677), the radial coefficients are K0 = 0.0977900699, K1 = 0.0253642335, and K2 = -0.0000323837, and the tangential coefficients are P0 = 0.0000049341 and P1 = 0.2917013568.
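For the curious, here is what a Brown-Conrady forward model looks like with those numbers plugged in. Note this is a generic sketch: the exact conventions used here (radius normalization, the precise tangential form, the roles of P0 and P1) are not spelled out in this post, so the normalization by half the image width is an assumption; only the shape of the formula is the point. The distortion center is a fixed point under any of these conventions.

```python
# Generic Brown-Conrady forward distortion, coefficient names as above.
# The radius normalization (half the 752-pixel image width) is an
# assumption; the post does not state its convention.
CX, CY = 377.4200432774, 226.9465071677          # distortion center (pixels)
K = (0.0977900699, 0.0253642335, -0.0000323837)  # radial K0..K2
P = (0.0000049341, 0.2917013568)                 # tangential P0, P1
NORM = 752.0 / 2.0                               # assumed radius normalization

def distort(px, py):
    """Map an ideal pixel position to its lens-distorted position."""
    x, y = (px - CX) / NORM, (py - CY) / NORM
    r2 = x * x + y * y
    radial = 1.0 + K[0] * r2 + K[1] * r2 ** 2 + K[2] * r2 ** 3
    # One common tangential form; P0/P1 roles may differ in the
    # convention actually used for these coefficients.
    xt = 2.0 * P[0] * x * y + P[1] * (r2 + 2.0 * x * x)
    yt = P[0] * (r2 + 2.0 * y * y) + 2.0 * P[1] * x * y
    return (x * radial + xt) * NORM + CX, (y * radial + yt) * NORM + CY

# The distortion center maps to itself regardless of convention:
assert distort(CX, CY) == (CX, CY)
```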

After calculating lens correction, I ran the same target images through a homography estimator to calculate intrinsic camera parameters. Here is the result matrix:

/ 656.964 0.140451 405.806 \
|     0.0  657.569 223.097 |
\     0.0      0.0     1.0 /

I would have preferred to get camera parameters directly from the firmware, and maybe they are stored in the imaging sensor’s EEPROM (pH5’s work hints at that), but at least we have something.
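With an intrinsic matrix of that form, projecting a camera-space 3D point to pixel coordinates is a matrix multiply followed by a perspective divide; a minimal sketch using the numbers estimated above:

```python
# Intrinsic matrix estimated above: fx, skew, cx / 0, fy, cy / 0, 0, 1.
K = [[656.964, 0.140451, 405.806],
     [0.0,     657.569,  223.097],
     [0.0,     0.0,        1.0]]

def project(X, Y, Z):
    """Project a camera-space point (Z > 0, in front of camera) to pixels."""
    u = K[0][0] * X + K[0][1] * Y + K[0][2] * Z
    v = K[1][1] * Y + K[1][2] * Z
    w = Z
    return u / w, v / w

# A point on the optical axis lands on the principal point:
assert project(0.0, 0.0, 1.0) == (405.806, 223.097)
```

Points on the optical axis (0, 0, Z) land on the principal point at any depth, which is a quick sanity check on the matrix.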

Figure 8a: Image of calibration target seen by Oculus Rift DK2’s tracking camera, before lens distortion correction.

Figure 8b: Image of calibration target seen by Oculus Rift DK2’s tracking camera, after lens distortion correction.

3D pose reconstruction

That one will take more than a long Friday night… but there’ll be people working on it. Stay tuned!

Update: Here’s a very first step towards pose calculation: tracing IR blobs in image space through time to establish an initial association. In Figure 9, the small rectangles on the left and right ends of the box are the camera image, and time moves from left to right. Curves of identical color connect LEDs associated from frame to frame.

Figure 9: Traces of identified LED blobs in camera image space through time.

Update: It’s done and working.

42 thoughts on ā€œHacking the Oculus Rift DK2ā€

  1. Great work with the LED control. I think our best bet is to check when the Windows SDK sends which values. Maybe we’ll find hints towards the synchronization and how the LEDs are meant to be used to identify themselves. This projection matrix looks promising, but I think we should just measure the lens distortion ourselves. I will try to work on that this week.

    • Hm, could the delay between the cable and the IR pulse be used to limit the exposure time in some way, or is the camera not capable of controlling the sensor with that precision?

        • I don’t think so. To do time-of-flight at the accuracy you would get from regular optical pose estimation, you’d need significantly more advanced hardware than a simple photodiode. Basically, you’d have to do it based on phase difference instead of actual flight time, like the Kinect does, but that requires a light signal modulated at a very high frequency, not the kind of frequency we’re seeing with the DK2’s LEDs.

    • I did not, but that’s another clue. I get a feeling it’s redundant, though. Nice confirmation in one of the linked posts, where he talks about seeing a timing signal with an active period of 360 us. That matches the 350 us setting I think I found in feature report 0x0C.

  2. An Oculus engineer said in June that there is a way to get DK2 camera intrinsics from the Oculus SDK: http://www.mtbs3d.com/phpbb/viewtopic.php?f=120&t=19684

    Good luck with the bundle adjustment (positional tracking from points). It’s nice that Google has bundle adjustment in their open source Ceres-Solver: http://ceres-solver.org/tutorial.html#bundle-adjustment.

    Perhaps the IR receiver was an attempt to get rid of the sync cable (“eh we’ll just put it into a firmware update once we can get it working…”).

    The LED flashing pattern is almost definitely another way of trying to ID them; it should take less than a second to identify every LED on the headset (even if only a few are showing) with the flashing technique. Technically, only two LEDs need to be seen to acquire position if you just use IMU data from the last known orientation.

    It’s a shame the DK2’s PenTile screen bins the pixels in hardware; it would have been neat to attempt subpixel rendering (Ć  la ClearType).

  3. IIRC, the LED dimming is the LEDs being modulated bright-dim in a unique manner to identify them (there’s 40-odd LEDs, so a 6-bit code, probably 7-bit to provide regular all-bright as a benchmark). This gives you unique LED identification within a bit over 1/10 second regardless of current orientation, without losing track as with a light-one-LED-at-a-time system.

  4. I’m following progress while waiting for my DK2 to finally arrive šŸ˜‰ Good to see all the community effort and the parts fitting together. When the DK2’s use of active IR LEDs + IR cam was revealed, it struck me that it resembled the tracking mechanism in the ā€œclassicā€ P5 glove from ~2005. It controlled sequential blinking of LEDs through the sync cable – 2 sets of dual ā€œlinear sensorsā€ to measure each single LED at 120Hz instead of a ā€œregularā€ 2D sensor. I initially thought that the DK2’s camera couldn’t be fast enough to get usable rates or resolution, but it appears they managed to pull this off somehow with this smart combination of LED patterns. Can’t lose the thought that, like the P5, ā€˜cheap’ linear sensors and embedded processing HW could crank up such a system to 1000+ LED measurements per second – and high-fps pose out of USB directly (just like those expensive professional tracking systems out there using linear sensors, e.g. PhaseSpace).

    Check out the great P5 teardown here: http://www.mts.net/~kbagnall/p5/p5%20dissassembly.html

    • I have a P5 glove sitting in my closet of dead technologyā„¢, from all the way back then. I could *never* get it to work. Calculating a 3D ray equation by having two orthogonal slits cast shadows on one of the two ends of a large photodiode per slit was definitely a clever engineering approach to the challenge. I just wish they had used a bit less cheap plastic, so the slits would have been in the proper positions. šŸ™‚

    • It’s a bit tricky because there’s a lot of intelligence in those LEDs. I haven’t updated this post yet, but I found out how to identify individual LEDs in the tracking video stream based on their blinking patterns, and any additional after-market LEDs would have to blink in synch with the original ones, with their own patterns, to work. That would require pretty deep modifications to the DK2’s firmware, or a completely separate circuit board with almost a full embedded CPU to generate patterns for the additional LEDs in synch with the others by snooping on the synch cable. Either way it’s a pretty big undertaking; way above my pay grade, at least.

    • The camera can only take pictures as fast as its internal video timings allow. If a trigger pulse occurs while the camera is still scanning out a frame, it will ignore that trigger. That’s the reason for the mistake I made here (follow-up post forthcoming). At the timings set up by the Oculus run-time, the internal frame interval of the camera is 15.5ms, meaning it can run at 64 fps max. Since the camera’s pixel clock is fixed at 26.66MHz, and horizontal and vertical blanking periods are already quite narrow, the only way to make it go faster is to reduce the window size from 752×480 to something much smaller, or to bin pixels (the camera supports 2×2 and 4×4 bins). At 367×320 (binned or windowed) you could run the camera above 120Hz.

      The imaging sensor is an Aptina MT9V034. You can find the data sheet online for all the dirty timing and programming details.

  5. Pingback: Hacking the Oculus Rift DK2, part II | Doc-Ok.org

  6. Pingback: Hacking the Oculus Rift DK2, part III | Doc-Ok.org

  7. Pingback: Hacking the Oculus Rift DK2 | Doc-Ok.org | Vrla...

    • Yes, but there’s no direct support to record the raw data. You’d have to intercept it inside the Oculus Rift tracking module for VRDeviceDaemon and dump it to a file.

      It is easy to record the combined position/orientation measurements as sent to VR applications, but that’s probably not what you’re looking for.

  8. Thank you for your reply! Yes, this sounds right, although I did not intend to use Vrui. Could this be similarly achieved by accessing OculusSDK > LibOVR > Src > Tracking?

    • Yes, apparently raw sensor data is communicated from the run-time to the SDK along with derived tracking data. I don’t know how to access the relevant data elements through the SDK, though.

  9. Pingback: Examining the Valve/HTC Vive Ecosystem: Basic Sensors and Processing | Metaversing

  10. Hi Oliver, the MT9V034 can work at 60Hz; does that mean the optical latency is less than 16ms? Or does the frequency not correspond to the latency, i.e., does it take the image sensor longer to process the optical information? In my research, at 60Hz the latency rises up to 100ms with the same CMOS sensor. Have you got any thoughts about this? Thanks in advance

  11. Hi,

    I have to say your work here is impressive though most of it is beyond my understanding. I was wondering how you managed to fix the camera display though. I’m currently trying to use the DK2 camera to track a TrackIR clip using Opentrack (https://github.com/opentrack/opentrack) on Windows but I’m stuck with the 376×480 resolution and green tint you have pictured above. Is there an easy way I can fix this?

    • The image format is a driver problem, and on Linux my software can simply ignore what the driver tells it and do the right thing.

      The same might work on Windows, too. When you receive a 376×480 image in your code, do you receive an RGB image, or a YUYV image? Is there a way to ask the driver for raw YUYV instead of RGB? If so, you can just treat the 376 YUYV pixels you get per line, with two bytes per pixel, as 752 greyscale pixels, with one byte per pixel, and you’ll get the correct image. Once the image is converted to RGB, that information is lost.

      • That’s the issue: I don’t actually have any code so can’t really do anything to fix it. Ideally I just need a driver I can install to make the camera display correctly in the Opentrack software. Unfortunately I’m not really proficient enough with programming to know what I’m doing. I’ve only really written basic drivers to control displays and other components with microcontrollers like arduinos and PICs and have no idea what I’m doing when it comes to OS drivers.
