Over the weekend, a bunch of people from all over got together on reddit to try to figure out how the Oculus Rift DK2’s optical tracking system works. This was triggered by a call for help from redditor /u/jherico to develop an independent SDK, in response to the lack of an official SDK that works under Linux. That thread quickly became unwieldy, with lots of speculation, experimentation, and outright wrong information being thrown around and only later corrected, with the corrections often nowhere near the wrong bits.
To get some order into things, I want to summarize what we have learned over the weekend, to serve as a starting point for further investigation. In a nutshell, we now know:
- How to turn on the tracking LEDs integrated into the DK2.
- How to extract the 3D positions and maximum emission directions of the tracking LEDs, and the position of the DK2’s inertial measurement unit in the same coordinate system.
- How to get proper video from the DK2’s tracking camera.
Here’s what we still don’t know:
- How to properly control the tracking LEDs and synchronize them with the camera. Update: We got that.
- How to extract lens distortion and intrinsic camera parameters for the DK2’s tracking camera. Update: Yup, we got that, too. Well, sort of.
- And, the big one, how to put it all together to calculate a camera-relative position and orientation of the DK2. 🙂 Update: Aaaaand, we got that, too.
Let’s talk about all these points in a bit more detail.
Turning on the tracking LEDs
/u/jherico, based on HID communication traces he captured from the Windows SDK, figured out that there is a HID feature report (0x0C) that turns on the LEDs. It has a built-in timeout of 10 seconds, meaning the report must be resent at regular intervals by the driver software or the LEDs turn off again. The following is one report sequence to turn on the LEDs:
0x0c 0x00 0x00 0x00 0x01 0x00 0x5E 0x01 0x1A 0x41 0x00 0x00 0x7F
After sending this feature report, the LEDs will turn on for 10 seconds, and they will show up in the tracking camera image very brightly, but they will flicker in an odd way. More on that later.
Extracting 3D LED positions
LED positions are extracted via HID feature report 0x0F. The process to do so is interesting; each feature report contains only data for a single LED (or the built-in IMU), and every time the driver requests the same report, it gets a different reply. The DK2’s firmware inserts into the report packet the total number of different reports, and the index of the report it just returned. In order to query all positions, the driver asks for a 0x0F feature report, and remembers the first index it receives. It then keeps asking for more reports until it sees the first index again, at which point it has extracted the position of all LEDs, and the IMU. Here is the structure of feature report 0x0F:
0x0F 0x00 0x00 flag [ X1 X2 X3 X4 = pos X ] [ Y1 Y2 Y3 Y4 = pos Y ] [ Z1 Z2 Z3 Z4 = pos Z ] [ DX1 DX2 = dir X ] [ DY1 DY2 = dir Y ] [ DZ1 DZ2 = dir Z ] 0x00 0x00 index 0x00 num 0x00 0x00 0x00
pos X, pos Y, pos Z are 32-bit little-endian signed integers (we have to thank /u/duschendestroyer for that insight) measuring position in micrometers, and dir X, dir Y, dir Z are 16-bit little-endian signed integers defining a direction vector (credit for that goes to /u/duschendestroyer as well), whose magnitude has unknown meaning. Flag is either 0x02 for a report defining an LED, or 0x01 for the report defining the IMU. Additionally, the IMU report has all zeros as direction vector. Index is the index of this report, and num is the total number of reports (41 in the DK2’s case).
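The decoding described above can be sketched in Python, assuming the byte offsets implied by the structure diagram (report ID at offset 0, flag at offset 3, positions at offsets 4–15, directions at 16–21, index at 24, num at 26):

```python
import struct

def parse_position_report(report: bytes) -> dict:
    """Decode one 0x0F feature report, following the layout described
    above: 32-bit little-endian signed positions in micrometers, 16-bit
    little-endian signed direction vector, plus flag, index, and num."""
    flag = report[3]
    pos = struct.unpack_from('<3i', report, 4)        # pos X, Y, Z (micrometers)
    direction = struct.unpack_from('<3h', report, 16)  # dir X, Y, Z
    return {
        'type': 'IMU' if flag == 0x01 else 'LED',
        'pos_um': pos,
        'dir': direction,
        'index': report[24],
        'num': report[26],
    }
```

Querying all 41 entries would then be a loop that re-requests report 0x0F, remembers the first index it sees, and stops once that index comes around again.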
When the extracted 3D positions are displayed in 3D (see Figure 2) or dumped to a file, it turns out that they are not symmetric. This hints that the positions are not the LED positions as designed, but as measured during factory calibration. We have not yet determined if there are differences from device to device, i.e., whether each DK2 is truly factory-calibrated, or if all devices contain the same position data.
Receiving video from the DK2’s tracking camera
The tracking camera is just a UVC webcam with an infrared filter, and is handled by the standard Linux UVC driver. However, there’s a bug in the camera’s firmware: it advertises itself as having a resolution of 376×480 pixels and a YUYV pixel format (downsampled and interleaved luminance/chroma channels, typical for webcams). See Figure 3 for what that looks like.
In reality, the camera has a resolution of 752×480 pixels, and uses a simple Y8 greyscale pixel format. After writing a custom video frame unpacker, the camera image looks correct (see Figures 4a and 4b).
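Conveniently, 376×480 pixels at 2 bytes per pixel (YUYV) is exactly the same buffer size as 752×480 at 1 byte per pixel (Y8), so the “unpacker” amounts to reinterpreting the raw frame. A minimal sketch:

```python
def reinterpret_frame(raw: bytes, width: int = 752, height: int = 480):
    """Treat a frame the camera mislabels as 376x480 YUYV as what it
    really is: a row-major 752x480 8-bit greyscale image. Returns a
    list of rows, each a bytes object of length `width`."""
    expected = width * height  # same byte count as 376 * 480 * 2
    if len(raw) != expected:
        raise ValueError(f'expected {expected} bytes, got {len(raw)}')
    return [raw[y * width:(y + 1) * width] for y in range(height)]
```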
On to the open questions…
Controlling the tracking LEDs
When using the LED control feature report 0x0C as indicated above, the LEDs will appear to flash on and off in the tracking camera image. The ratio between on and off changes very slowly over time, and when viewed through another camera (see Figure 1), the LEDs do not appear to flash at all.
This indicates that the LEDs are strobed at a frequency similar, but not identical to, the tracking camera’s frame rate. The result is a beat frequency that manifests as irregular flashing (see Figure 5).
This is most probably not an intentional effect; after all, there’s a synchronization cable between the camera and the HMD. So the issue seems to be that synchronization is not enabled, either in the camera or in the HMD. It is not yet known which of the two is responsible for synchronization, and which is the source of whatever synchronization signal is sent across the cable (does the HMD trigger the camera, or the other way around?). I’m hoping that the HMD is in charge, because we already know how to send many commands to it; it is not clear how to send commands to the tracking camera short of writing a custom kernel driver.
A closer look at feature report 0x0C, and some experimentation via random bit twiddling, revealed the following. Remember the byte sequence 0x1A 0x41 from above? Interpreted as a little-endian 16-bit unsigned integer, that translates to 16666, which just so happens to be the frame interval of the 60Hz tracking camera, in microseconds (well, 60Hz would be 16667 microseconds, but let’s be generous). And indeed, twiddling with that value changes the apparent LED flicker in the camera image, but I have yet to find a setting where flicker entirely disappears.
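A quick bit of arithmetic confirms the beat-frequency interpretation: a 16666 µs strobe interval is about 60.0024 Hz, so against a nominal 60 Hz camera the relative phase drifts with a beat period of roughly seven minutes, consistent with the very slowly changing on/off ratio. A sketch:

```python
def beat_period_s(led_interval_us: float, camera_hz: float = 60.0) -> float:
    """Beat period in seconds between the LED strobe rate (derived from
    its interval in microseconds) and the camera frame rate."""
    led_hz = 1e6 / led_interval_us
    return 1.0 / abs(led_hz - camera_hz)

led_hz = 1e6 / 16666  # ~60.0024 Hz
```

In practice the camera’s own rate is not exactly 60 Hz either, so the observed beat period will differ somewhat from this idealized number.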
Furthermore, looking at the HID feature report dumps redditor /u/jherico captured from the Windows SDK talking to the DK2, there are several variations of feature report 0x0C, differing in bytes 3 and 4. Experimentation with those revealed the following:
- The SDK sets byte 3 to either 0x00 or 0xFF, but there is no obvious difference in LED behavior. Update: this byte is a LED pattern selector. See below.
- Bit 0 of byte 4 is a toggle to enable/disable the LEDs.
- Bit 1 of byte 4, if set, changes the LEDs to a rapidly flickering mode. Update: when set, this bit automatically cycles through a set of LED patterns. See below.
- Bit 2 of byte 4, if set, slightly changes the brightness of the LEDs when observed with another camera. This is probably a side effect of switching them on/off at a frequency much higher than 60Hz, resulting in an amplitude-modulated IR signal like that used in IR remote controls.
- Bit 3 of byte 4, if set, turns the LEDs off no matter the value of the other bits.
After this, I decided to throw caution to the wind and experiment with the other bytes as well. Bytes 6 and 7, read as a little-endian 16-bit unsigned integer, start out at 0x5E 0x01, or 350 in decimal. Changing this value changes the apparent flicker, and also the brightness of the LEDs when observed with a different camera. This suggests it is the active interval of the LEDs, and indeed, once it is set to a value larger than the LED frame interval, the flicker disappears. That confirms that bytes 6 and 7 are the LED active time in microseconds.
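Putting the fields identified so far together, the 13-byte 0x0C report can be assembled like this (byte 3 = pattern selector, byte 4 = flag bits, bytes 6–7 = active time in µs, bytes 8–9 = frame interval in µs; bytes 1–2, 5, 10–12 are still unknown and left at the values seen in the captured traces):

```python
import struct

def build_led_report(pattern: int = 0x00, flags: int = 0x01,
                     active_us: int = 350, interval_us: int = 16666) -> bytes:
    """Assemble HID feature report 0x0C from the fields identified so
    far. The zero bytes and the trailing 0x7F are copied verbatim from
    the captured traces; their meaning is unknown."""
    return struct.pack('<BBBBBBHHHB',
                       0x0C,         # report ID
                       0x00, 0x00,   # unknown
                       pattern,      # LED pattern selector
                       flags,        # bit 0: enable, bit 1: auto-cycle patterns
                       0x00,         # unknown
                       active_us,    # LED active time (us), little-endian
                       interval_us,  # LED frame interval (us), little-endian
                       0x0000,       # unknown
                       0x7F)         # unknown, as captured
```

With the default values this reproduces the exact byte sequence shown earlier.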
With active time = frame interval, trying the flag bits in byte 4 again, it appears that bit 2 of byte 4 indeed controls modulation of some sort: with the bit off, the LEDs appear to change in brightness rapidly, with the bit on, they pulse at a low frequency, but at least they stay at >90% brightness at all times.
Still, I was not able to synchronize the LED’s frame interval to the camera’s. There is either another feature report that does that, or some unknown byte in some report I already looked at, or maybe it is done by the camera driver (in which case, tough luck).
Update: Here’s an animated GIF showing the LED patterns through which the DK2 cycles to help identify LEDs. It’s still not synced to the camera; I’m pretty certain that proper synchronization would differentiate them by more than just a slight difference in brightness (see Figure 6).
Update: Thanks to the intrepid work of github user pH5 reverse-engineering the DK2 camera itself, we now know how to synchronize the camera to the LEDs, and what the flashing LEDs mean. And it’s clever.
First about the camera. I speculated on reddit that the source of synchronization might be the headset, and that was first supported by plugging the camera end of the cable into an oscilloscope, and then confirmed by pH5’s work. By writing directly to registers in the camera’s imaging sensor, the sensor can be switched from “master mode” to “snapshot mode,” in which it opens its shutter in direct response to the sync pulse arriving over the cable. The rest of the registers are useful as well; Linux’s UVC camera driver exposes some controls (such as brightness and contrast), but those don’t do anything. Writing directly to the camera registers gives us full control over exposure, gain, black level, and image flipping. See Figure 7 for a quick-and-dirty camera control GUI that I slapped together in Vrui’s GLMotif toolkit.
So now about those flashing LEDs… Once the camera is synchronized, everything becomes clear. Each of the LEDs can be driven at two brightness levels. Byte 3 of feature report 0x0C selects one from a set of pre-defined LED patterns where some LEDs are brighter than others. And bit 1 of byte 4 tells the headset to automatically cycle through those pre-made patterns on every camera exposure. And this is clever because it allows the camera to identify LEDs based on their blinking patterns. Update: The rest of this paragraph after the following sentence is all wrong, due to a dumb mistake. Please see the next post in the series. From looking at a lot of camera frames, I figured out that the patterns of all LEDs repeat after five frames, meaning there is a 5-bit code blinked out by each LED. As there are 32 5-bit codes, and 40 LEDs, some LEDs share the same code, but still, LEDs can now be identified in only 5 frames of video. For example, the LED in the center of the headset’s front plate blinks out the pattern “01001”, or 9. The one in the top-right corner (when wearing it) blinks out “10011” or 19, and the one in the top-left corner “11001” or 25.
It’s now going to be pretty straightforward to add LED identification to my blob extraction algorithm.
Extracting lens distortion and intrinsic camera parameters
Before the DK2’s tracking camera can be used for pose estimation, we need to find its intrinsic parameters, including its focal length, principal point, pixel sizes, and skew factor. Given the camera’s relatively large field of view, it will probably suffer from some lens distortion, and unless that is corrected in the camera itself, we will also need correction parameters for that.
I looked at the other feature reports sent by the device, and I found a candidate for intrinsic parameters. Feature report 0x0E contains what look like twelve little-endian 32-bit signed integers, and when arranged as a 3×4 matrix, they look somewhat like a 3D-to-2D projection matrix:
/ 20065     0     0  -5764 \
|     0 20068     0  -8717 |
\     0     0 22211  15360 /
I haven’t investigated this further, but it’s a lead. Finding lens distortion parameters, if there are any, will be much harder.
Update: I was able to take pictures of my standard calibration target through the DK2’s tracking camera after prying off the IR filter. I now have non-linear lens distortion parameters; compare Figures 8a and 8b. My lens distortion correction algorithm uses the common Brown-Conrady lens distortion formula with three radial and two tangential coefficients. For my specific camera, the distortion center point is (377.4200432774, 226.9465071677), the radial coefficients are K0 = 0.0977900699, K1 = 0.0253642335, and K2 = -0.0000323837, and the tangential coefficients are P0 = 0.0000049341 and P1 = 0.2917013568.
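For reference, here is the common Brown-Conrady model with three radial and two tangential terms, plugged with the coefficients above. The exact conventions are assumptions on my part: the mapping of P0/P1 onto the two tangential terms, and the coordinate normalization relative to the distortion center, are not taken from my calibration code, so treat the scaling as hypothetical:

```python
def brown_conrady(x, y,
                  K=(0.0977900699, 0.0253642335, -0.0000323837),
                  P=(0.0000049341, 0.2917013568)):
    """Apply Brown-Conrady distortion with three radial (K) and two
    tangential (P) coefficients to a point (x, y) given relative to the
    distortion center, in normalized (not pixel) units."""
    r2 = x * x + y * y
    radial = 1.0 + K[0] * r2 + K[1] * r2 * r2 + K[2] * r2 * r2 * r2
    xd = x * radial + 2.0 * P[0] * x * y + P[1] * (r2 + 2.0 * x * x)
    yd = y * radial + P[0] * (r2 + 2.0 * y * y) + 2.0 * P[1] * x * y
    return xd, yd
```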
After calculating lens correction, I ran the same target images through a homography estimator to calculate intrinsic camera parameters. Here is the result matrix:
656.964  0.140451  405.806
  0.0    657.569   223.097
  0.0      0.0       1.0
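As a sanity check on such a matrix, projecting a camera-space point is a matrix multiply followed by perspective division; a point straight ahead on the optical axis should land exactly on the principal point (405.806, 223.097):

```python
def project(K, point):
    """Project a 3D camera-space point (X, Y, Z) to pixel coordinates
    using a 3x3 intrinsic matrix K given as row-major nested lists."""
    X, Y, Z = point
    u = K[0][0] * X + K[0][1] * Y + K[0][2] * Z
    v = K[1][0] * X + K[1][1] * Y + K[1][2] * Z
    w = K[2][0] * X + K[2][1] * Y + K[2][2] * Z
    return u / w, v / w

K = [[656.964, 0.140451, 405.806],
     [0.0,     657.569,  223.097],
     [0.0,     0.0,      1.0]]
```

Note that the result is depth-independent for on-axis points: (0, 0, 1) and (0, 0, 2) project to the same pixel, as they should.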
I would have preferred to get camera parameters directly from the firmware, and maybe they are stored in the imaging sensor’s EEPROM (pH5’s work hints at that), but at least we have something.
3D pose reconstruction
That one will take more than a long Friday night… but there’ll be people working on it. Stay tuned!
Update: Here’s a very first step towards pose calculation: tracing IR blobs in image space through time to establish an initial association. In Figure 9, the small rectangles on the left and right ends of the box are the camera image, and time moves from left to right. Curves of identical color connect LEDs associated from frame to frame.
Update: It’s done and working.