On the road for VR: Microsoft HoloLens at Build 2015, San Francisco

I have briefly mentioned HoloLens, Microsoft’s upcoming see-through Augmented Reality headset, in a previous post, but today I got the chance to try it for myself at Microsoft’s “Build 2015” developers’ conference. Before we get into the nitty-gritty, a disclosure: Microsoft invited me to attend Build 2015, meaning they waived my registration fee, and they gave me, like all other attendees, a free HP Spectre x360 notebook (from which I’m typing right now because my vintage 2008 MacBook Pro finally kicked the bucket). On the downside, I had to take Amtrak and BART to downtown San Francisco twice, because I wasn’t able to get a one-on-one demo slot on the first day, and got today’s 10am slot after some finagling and calling in of favors. I guess that makes us even. 😛

So, on to the big question: is HoloLens real? Given Microsoft’s track record with product announcements (see 2009’s Project Natal trailer and especially the infamous Milo “demo”), there was some well-deserved skepticism regarding the HoloLens teaser released in January, and even the on-stage demo that was part of the Build 2015 keynote.

The short answer is: yes, it’s real, but…

The long answer is, well, long. To tackle it, we have to first detail what Microsoft is promising: at the most basic level, to “add holograms to your world” (for the sake of peace, let’s keep the discussion of what is holographic and what isn’t elsewhere). On a slightly more technical level, this means HoloLens is promised as an untethered see-through Augmented Reality (AR) headset that seamlessly merges virtual three-dimensional objects into the real world, and allows users to interact with those objects via gaze, gesture, and voice (“GGV,” as it was called during the demos). On a deeper technical level, this means HoloLens has to have the following features:

  • A stereoscopic see-through display, ideally one with high resolution, high brightness and contrast, large field-of-view, and a way to remove real-world objects that are supposed to be occluded by virtual objects.
  • Low-latency 6-DOF (positional and orientational) tracking, to render virtual 3D objects from the user’s correct point-of-view, and track the user’s gaze direction for interaction.
  • A real-time scanning system that creates a 3D model of the user’s real environment, to have virtual objects interact with that environment (such as placing a virtual object onto a table, or hanging a virtual picture on a real wall).
  • A reliable, low-latency, and accurate hand tracker to detect gestures, and allow direct manipulation of virtual objects (such as grabbing an object with one’s hand, and moving it to a different position).
  • A reliable speech recognition engine.

Display

Let’s start with the display. The biggest question going in was: how big is the field of view? And the answer is: small. As I was stripped of all devices and gadgets before being allowed into the demo room, I had to guesstimeasure it by covering the visible screen with my hands (fingers splayed) at arm’s length, ending up with 1 3/4 hands horizontally, and 1 hand vertically (in other words, roughly a 16:9 screen aspect ratio; see Figure 1). In non-Doc-Ok units, that comes out to about 30° by 17.5° (for comparison, the Oculus Rift DK2’s field of view is about 100° by 100°). From a practical point of view, this means that virtual objects only appear in the center of the viewer’s field of view, which turns out to be very distracting and annoying. Interestingly, this is compounded by the visor’s much larger physical field of view: on the plus side, the user doesn’t get tunnel vision (it feels a bit like wearing lab safety glasses), but on the other hand, there is no visible frame (such as spectacle frames) to explain why virtual objects get cut off; they simply vanish into thin air at the edges of the screen.

Figure 1: Measuring the HoloLens’ augmentation field of view by covering it with splayed hands at outstretched arms. The brighter rectangle in the center of the image corresponds to the area in which virtual objects (“holograms”) can appear. It is approximately 30° x 17.5°.
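For readers who want to redo the hand-span estimate with their own measurements, here is a minimal sketch of the trigonometry. The hand width (20 cm) and the eye-to-hand distance (65 cm) are assumptions chosen for illustration, not measurements taken at the demo; with them, the result lands close to the 30° by 17.5° quoted above.

```python
import math

def angular_size_deg(width_m, distance_m):
    """Angle subtended by a flat object of the given width, centered on the line of sight."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))

# Assumed body measurements (hypothetical, for illustration only)
HAND_SPAN_M = 0.20   # width of one splayed hand
ARM_LENGTH_M = 0.65  # eye-to-hand distance at arm's length

# 1 3/4 hands horizontally, 1 hand vertically, as estimated in the text
fov_h = angular_size_deg(1.75 * HAND_SPAN_M, ARM_LENGTH_M)
fov_v = angular_size_deg(1.00 * HAND_SPAN_M, ARM_LENGTH_M)
print(f"horizontal: {fov_h:.1f} deg, vertical: {fov_v:.1f} deg")
```

Plugging in your own hand width and arm length changes the absolute numbers, but the aspect ratio of the estimate stays tied to the 1.75:1 hand count.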

Another big question was: can HoloLens present opaque virtual objects? As the screen is transparent, it by itself can only add light to the user’s view, which would give virtual objects a ghostly appearance. There was some speculation whether HoloLens has an additional display layer that can turn the screen opaque pixel-by-pixel, but it turns out it does not. The screen is bright enough that, in a controlled environment like the darkened demo rooms, the background is effectively erased by virtual objects, but when viewing objects against a bright background (I used a table lamp), they become barely visible.
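A toy model shows why an additive display behaves this way. The luminance values below are arbitrary numbers picked for illustration (they are not measurements of HoloLens or of the demo room); the point is only that the display can add light to the background but never subtract it, so the same virtual pixel has high contrast against a dark wall and almost none against a lamp.

```python
def additive_blend(background, emitted):
    """See-through display: emitted light is added to the background, never subtracted."""
    return background + emitted

def contrast(object_luminance, surround_luminance):
    """Weber contrast of the virtual object against the real background around it."""
    return (object_luminance - surround_luminance) / surround_luminance

# Luminance in arbitrary units; all three values are assumptions.
DISPLAY = 100.0      # light emitted by a bright virtual pixel
DARK_ROOM = 5.0      # wall in a darkened demo room
TABLE_LAMP = 2000.0  # bright lamp behind the virtual object

for name, bg in (("dark room", DARK_ROOM), ("table lamp", TABLE_LAMP)):
    seen = additive_blend(bg, DISPLAY)
    print(f"{name}: contrast {contrast(seen, bg):.2f}")
```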

And the third, and final, big display question was: does HoloLens provide accommodation cues, i.e., does it present virtual objects at the proper focal distance, like a real hologram or a light field display? This one I can’t answer definitively. I was going to test it by moving very close to a virtual object and comparing the object’s focus against my hand right next to it, but it turns out the HoloLens’ near plane is set at about 60cm, meaning objects can’t be viewed up close. As HoloLens is supposed to augment human-sized environments, it can assume that virtual objects only appear between 60cm (near plane) and a few meters distance, and could get away with a fixed focal distance somewhere in the middle, which I think is exactly what it does. In practice, virtual objects looked sharp and in focus throughout the range that was available. It will be interesting to see how this changes with applications that extend the real space by virtually punching holes into the real environment. According to attendees of the “holographic academy,” one of the demo applications there did just that, but I did not find any comments regarding focus.

Aside from these points, the display’s overall quality was very good. It had high resolution (individual pixels were barely visible), good brightness and contrast (at least in the darkened demo room), and good sharpness across the entire screen. If I were to guess — and it’s only a guess — I’d say it’s a 1280×720 display. For comparison, the Oculus Rift DK2 has 960×1080 pixels per eye, spread over a 100°x100° field of view, and looks comparatively blocky. Stereoscopic presentation was also very good (the demo team measured and entered my IPD prior to the demo); objects were embedded into the real environment at apparently proper depth and scale. I would have liked to test this thoroughly by moving very close to an object, but alas, the 60cm near plane prevented that.
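The difference in perceived sharpness follows from angular pixel density. Here is a rough sketch using the numbers above; keep in mind that the 1280 pixel width is my guess and the 30° is my hand-measured estimate, and that dividing pixels by field of view ignores lens distortion, so this is an average only.

```python
def pixels_per_degree(horizontal_pixels, horizontal_fov_deg):
    """Average angular pixel density, ignoring lens distortion."""
    return horizontal_pixels / horizontal_fov_deg

# HoloLens: guessed resolution and estimated field of view (both assumptions)
hololens_ppd = pixels_per_degree(1280, 30.0)
# Oculus Rift DK2: 960 pixels per eye over roughly 100 degrees
dk2_ppd = pixels_per_degree(960, 100.0)

print(f"HoloLens: {hololens_ppd:.1f} px/deg, DK2: {dk2_ppd:.1f} px/deg")
print(f"ratio: {hololens_ppd / dk2_ppd:.1f}x")
```

If the guess is anywhere near right, HoloLens packs roughly four times as many pixels into each degree as the DK2, which matches the impression that individual pixels were barely visible.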

6-DOF Tracking

On to tracking. I still can’t say with confidence what tracking method the HoloLens uses. It sports four cameras, one (stereo?) pair each on either side of the visor, facing approximately 45° forward-left and forward-right. Until more technical details are released, I am assuming that tracking is based on sensor fusion of inertial dead reckoning and real-time simultaneous localization and mapping (SLAM), which could be the purview of the HoloLens’ mysterious “holographic processing unit.” I tried tripping up the tracking system by covering both camera pairs with my hands, but tracking didn’t break down. This could be due to me getting my hands in the wrong place, or there being additional cameras behind transparent covers, or due to tracking using an entirely different mechanism. It’s pure speculation at this point.

But whatever tracking method is used, it works very well. There is no noticeable tracking noise while holding one’s head still, and there is only a little jitter while moving or rotating one’s head slowly. There is noticeable tracking latency, such that virtual objects get visibly dragged along with gaze direction changes. But unlike in head-mounted Virtual Reality, where such lag can quickly lead to simulator sickness, in Augmented Reality lag manifests as virtual objects becoming “untethered,” and wobbling around their intended positions. This is visible, and can be a minor distraction, but doesn’t cause problems as the user’s visual and vestibular systems always stay locked to the real environment.
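To get a feeling for how latency turns into visible “dragging,” here is a back-of-the-envelope sketch. I did not measure the HoloLens’ latency; the 20 ms and the 100°/s head rotation below are purely hypothetical inputs.

```python
import math

def misregistration_deg(head_speed_deg_s, latency_s):
    """Angle by which a virtual object lags behind its intended position during a head turn."""
    return head_speed_deg_s * latency_s

def offset_at_distance_m(error_deg, distance_m):
    """Apparent displacement of an object at the given distance for a given angular error."""
    return distance_m * math.tan(math.radians(error_deg))

# Hypothetical inputs: moderate head turn, assumed end-to-end latency
error = misregistration_deg(head_speed_deg_s=100.0, latency_s=0.020)
shift = offset_at_distance_m(error, distance_m=2.0)
print(f"angular error: {error:.1f} deg, shift at 2 m: {shift * 100:.1f} cm")
```

The error scales linearly with head speed, which is consistent with objects looking rock-solid when the head is still and wobbling only during gaze changes.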

Environment Scanning

The HoloLens’ environment scanner was a prominently featured part of the demo. As part of initial setup, I was asked to look around the demo room to capture it into a 3D model. This process was visualized in a neat way, by drawing the evolving triangle mesh as a semi-transparent virtual object at 1:1 scale with the real environment. The mesh included complex objects such as a potted plant, at a resolution comparable to scanning an environment with a Kinect camera and the KinectFusion software. That said, I did get the impression that the room had been scanned beforehand and was already loaded up as a 3D model, because I only got to scan the forward portion of the room (the demo was very rushed, with a pushy host), but later on, I could hang virtual objects on the side walls.

A big scanning-related question going in was whether HoloLens correctly embeds virtual objects into the real environment, meaning, whether virtual objects are occluded by real objects in front of them. There are two parts to this: static environment occlusion, and hand/arm occlusion. Surprisingly, there was no environment occlusion. I dragged a virtual object onto a part of the side wall jutting out from the main wall, and then walked into the alcove behind it. To my surprise, I could still see the backside of the object on the wall (and yes, the wall was part of what I had scanned during setup). To my dismay, I cannot say with certainty whether body occlusion works as expected. Due to the limited gesture interface (more on that later), I forgot to try and occlude virtual objects with my hands. But as I did not notice any artifacts when using gestures later, I am assuming for now that there is no body occlusion, in other words, virtual objects will appear overlaid onto the user’s hands. The very limited vertical field of view helps here: the “sweet spot” for gesture recognition is somewhat in front of the user’s chest, which usually ends up completely underneath the display area.
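What I was expecting amounts to a per-pixel depth test against the scanned room model. The sketch below is a generic illustration of that test on a single row of pixels, not a description of how HoloLens renders; the scene and all distances are made up. With the test switched off, the result matches what I saw: the object stays visible even where a real wall is in front of it.

```python
def composite_scanline(real_depth, virtual_depth, occlusion=True):
    """Return per-pixel visibility flags for a virtual object.

    real_depth: distance to the scanned environment per pixel (meters)
    virtual_depth: distance to the virtual object per pixel, or None where it is absent
    """
    visible = []
    for real, virt in zip(real_depth, virtual_depth):
        if virt is None:
            visible.append(False)        # no virtual object at this pixel
        elif occlusion:
            visible.append(virt < real)  # draw only if closer than the real surface
        else:
            visible.append(True)         # overlay regardless of real geometry
    return visible

# Hypothetical scene: a jutting wall at 1.5 m covers the left three pixels,
# the open room (4 m) fills the right three.
real = [1.5, 1.5, 1.5, 4.0, 4.0, 4.0]
# Virtual panel 3 m away, spanning the middle four pixels
virt = [None, 3.0, 3.0, 3.0, 3.0, None]

with_occlusion = composite_scanline(real, virt, occlusion=True)
without_occlusion = composite_scanline(real, virt, occlusion=False)
print(with_occlusion)
print(without_occlusion)
```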

A somewhat related question is how deeply virtual objects interact with the environment. Specific object types such as panels or toolboxes snapped to walls, but other objects did not seem to have collision detection. Also, there was no apparent interaction between the room’s real lighting and illumination of virtual objects. In the demo I saw, all objects were rendered in a cartoony style, with very diffuse lighting. There was no visible correspondence to real-world lighting in the room (which, as said, was darkened).

Gaze, Gesture, and Voice

In the demo application I saw, object selection was completely based on head position and orientation, in other words, HoloLens does not provide, or the application did not use, eye/gaze tracking (as an aside, the demo host in charge of group introduction dismissed gaze tracking as an input method). Feedback was provided by drawing a glyph at the intersection point between the view ray and a virtual object, or at some distance along the ray in case it missed all objects. The selection ray exhibited jitter more obviously than the virtual objects themselves, which is expected, but made selecting some of the smaller objects a bit tricky.
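The cursor behavior described above can be sketched as a ray cast from the head pose. This is my own minimal illustration, with virtual objects approximated as spheres and a made-up fallback distance of 2 m; I have no knowledge of how the demo application actually implements it.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance along a normalized ray to the nearest sphere intersection in front of the
    origin, or None. Rays starting inside a sphere are treated as misses for simplicity."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t >= 0.0 else None

def cursor_position(origin, direction, spheres, default_distance=2.0):
    """Place the cursor glyph on the nearest hit object, or at a fixed distance on a miss."""
    hits = [ray_sphere_hit(origin, direction, c, r) for c, r in spheres]
    hits = [t for t in hits if t is not None]
    t = min(hits) if hits else default_distance
    return tuple(o + t * d for o, d in zip(origin, direction))

head = (0.0, 0.0, 0.0)
forward = (0.0, 0.0, -1.0)  # view direction derived from head orientation
on_target = cursor_position(head, forward, [((0.0, 0.0, -3.0), 0.5)])
off_target = cursor_position(head, forward, [((5.0, 0.0, -3.0), 0.5)])
print(on_target, off_target)
```

Because the ray direction comes straight from head orientation, any angular jitter is magnified by distance at the hit point, which is one plausible reason the cursor looked shakier than the objects themselves.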

Gestures, sadly, were very limited. The only gesture recognized by the demos is the now famous “air tap,” which is detected reliably, but the position of the user’s hand is either not tracked, or not used. In other words, HoloLens treats two hands and ten fingers as nothing but a single button. I can only hope that this was a limitation of current pre-release technology, and will be fixed soon.

On the other hand, voice recognition worked reliably. The demo application understood commands from a limited vocabulary (“movement!” “rotation!” “rescale!” “undo!” etc.), but had some problems with my accent (it didn’t understand me saying “movement”). I’m assuming that some user-specific training would have fixed that.

Comfort and Battery Life

There isn’t much to say about that. The HoloLens attaches to the user’s head via a rigid brace that is tightened via a thumbscrew in the back, and the visor hangs out in front but does not rest on the user’s nose. The device is not noticeably heavy, and was comfortable to wear during the short demo time. It stayed in place even while I was moving around, and I was able to adjust it to see the entire screen (the small screen, that is) quickly. It’s impossible to judge battery life from a 15-minute demo, but as it so happened, mine ran out of juice about halfway through the demo, and I had to start over with a second one (the world I had already created did not persist). Make of that what you will.

Conclusion

So, how does the real HoloLens compare to what was shown during the keynote presentation? Pretty well, actually, with the glaring discrepancy being field-of-view. In the presentation, the view through a HoloLens was simulated by attaching a 6-DOF tracker (hopefully the same one as used by the real HoloLens) to a camera, and then rendering virtual objects for the camera’s point-of-view, and compositing them into the camera’s video stream in real time — in other words, exactly what a standard AR application for a smartphone or tablet does. This created the impression that the presenters on stage were completely surrounded by “holograms,” which is only true from a certain point of view. Yes, they were surrounded, but they could only see the “holograms” when looking directly at them. Most blatant misrepresentation: at some point during the demo, the presenter snapped a video player window to a wall, and enlarged it to fill the entire wall, to simulate a very big screen TV. In reality, the presenter would only have been able to see a small part of the video at a time, and would have had to move his head around to see other parts. As the parts outside the field of view would have been completely invisible, it would not have been possible to watch a video like that.
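To put a number on the wall-sized TV example, here is a quick sketch of how much of a wall fits into the estimated field of view. The wall size (4 m by 2.5 m) and the viewing distance (2.5 m) are assumptions for illustration; the field of view is my hand-measured estimate from above.

```python
import math

def visible_extent_m(fov_deg, distance_m):
    """Width (or height) of a flat wall that fits inside the given field of view."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)

# Assumed room geometry (hypothetical)
WALL_WIDTH_M, WALL_HEIGHT_M = 4.0, 2.5
VIEW_DISTANCE_M = 2.5

seen_w = visible_extent_m(30.0, VIEW_DISTANCE_M)
seen_h = visible_extent_m(17.5, VIEW_DISTANCE_M)
fraction = (seen_w * seen_h) / (WALL_WIDTH_M * WALL_HEIGHT_M)
print(f"visible patch: {seen_w:.2f} m x {seen_h:.2f} m ({fraction:.0%} of the wall)")
```

Under these assumptions, only about a tenth of the wall is visible at any one time, which is why the keynote’s big-screen scenario does not survive contact with the real device.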

But aside from the very limiting field of view, HoloLens works. It embeds virtual 3D objects into a real environment convincingly (as long as they are not occluded by real objects), and allows the user to interact with them (in a somewhat limited form at this point). If there was no sneaky trickery in the demo (such as external tracking), then Microsoft is close to a releasable product that I would use. Increasing the field of view is probably going to be very difficult, but it should be possible to address the other outstanding issues (no room occlusion, no hand interaction besides a single non-localized click gesture).

In summary, here are the main take-home points:

  • Very small field of view (estimated 30°x17.5°).
  • High resolution, good brightness and contrast, good image quality.
  • Purely additive display, meaning virtual objects appear translucent, especially when in front of a bright background.
  • Supports all main depth cues except accommodation, making it a holographic display in my book.
  • Virtual objects are not occluded by the real environment and/or the user’s arms or hands.
  • Usable gaze-based selection, but no hand tracking and only a single “click” gesture.
  • Reliable voice recognition.
  • Completely untethered, light, and comfortable to wear.
  • Probably short battery life.

Would I buy it? Not as it is now, but with an improved field of view and reliable hand tracking and more detected gestures, it would be a very good match for most of our applications. I really want to do this with HoloLens, but with the current prototype it’s not yet possible:

23 thoughts on “On the road for VR: Microsoft HoloLens at Build 2015, San Francisco”

  1. Oliver, you are a godsend! So glad you were able to try one out in RL and to give the community this great analysis. I know how you are with new devices (lots of poking and prodding and testing of limits) and am sure you gave it a thorough test.

    Did they say anything about the way we would go about coding up AR experiences? Are those 3D models they show .OBJs or WebGL? (I am hoping the latter).

    Also, I am assuming there must be a way to attach behavioural code to these AR objects.

    Does the device have a GPS in it?

    I seem to recall in the talk they gave that Hololens does not require an internet connection but I just don’t see how this is possible inasmuch as they *must* be doing some thorough image recognition and are using Cortana for the voice recognition (both of which somewhat necessitate a server component).

    shannon

    • Coding: It’s all about the new unified Windows API that’s part of Windows 10. Regular 2D applications work out-of-the-box, by being presented in a 2D window placed in the 3D environment (like my old VNC video). 3D applications could be written using Direct3D, or any higher-level 3D engines, I suppose. There’s probably going to be a WebGL viewer at launch.

      They claim everything is done in-device. I assume the “holographic processing unit” is some custom silicon or a repurposed GPU specifically to do real-time environment scanning and tracking, and the rendering is just regular old 3D. Don’t know about voice recognition, but given that it’s command recognition with a limited vocabulary, it’s the easy version of that problem. It worked reliably on my 486 under OS/2 back in ’95, so I’m guessing whatever powers HoloLens can handle it.

  2. Really highlights some of the problems Magic Leap will need to overcome if they plan on delivering what they are promising.

  3. This is a fantastic analysis. Thank you for remembering to estimate FoV with your hands!

    Follow up question: you mentioned that you suspect the room had already been pre-depth-scanned. Were you mostly standing in place and rotating your head? Or were you able to walk about and test the tracking resiliency on location change? (e.g. walking forward, etc.)

  4. Thank you for this thorough and enjoyable analysis of the device.
    Any idea what the optics/display was? I.e., waveguide or some other projection assembly?

    I was hoping the FoV was wider. But yeah, that whole earlier stage demo was a bit of a stretch. I guess it will be a while before the HoloLens can be used in LDRs :)
    http://dirrogate.com/digital-surrogates-tele-travel-the-future-of-long-distance-relationships-ldrs/

    Although, their NASA demo shows where this aspect of AR is heading.

    • The consensus of speculation right now is holographic waveguide. It matches with the estimated FOV, and would mean that the “HoloLens” name is literally true.

  5. By the way, during environment scanning when the room was turned into a mesh, did it seem as though it was happening in bursts / low FPS, or was it continuously refining the mesh?
    Also you didn’t explicitly say whether you think there is a Kinect on-board the HoloLens. Is this still a fair assumption though given the 3D environment scanning?
    Thanks

    • Hard to tell, but it looked similar to KinectFusion, where the overall 3D model is updated / refined incrementally depending on where the camera is currently looking.

  6. I always look forward to your reviews/analysis of VR devices; I love how you know what you’re talking about and focus on important technical stuff.
