I went to zCon 2013, the zSpace developers conference, held at the Computer History Museum in Mountain View yesterday and today. As I mentioned in my previous post about the zSpace holographic display, my interest in it is as an alternative to our current line of low-cost holographic displays, which require assembly and careful calibration by the end user before they can be used. The zSpace, on the other hand, is completely plug&play: its optical trackers (more on them below) are integrated into the display screen itself, so they can be calibrated at the factory and work out of the box.
So I drove around the bay to get a close look at the zSpace, to determine its viability for my purpose. Bottom line, it will work (with some issues, more on that below). My primary concerns were threefold: head tracking precision and latency, stylus tracking precision and latency, and stereo quality (i.e., amount of crosstalk between the eyes).
Head tracking
The zSpace’s head tracker uses two active-illumination infrared cameras integrated into the display’s bezel, and five retro-reflective markers on the glasses. According to what little information I could squeeze out of the zSpace reps (and boy, were there a lot of zSpace reps there — it appeared they outfitted all their staff, from janitor to CEO, with baby blue zSpace shirts and let them run demos), the 3D reconstruction and model fitting for head tracking are done on a CPU directly inside the display, and the resulting calibrated coordinates are sent to the host computer via USB.
I’m happy to report that head tracking works very well; using proper software (more on that below), virtual objects floating above the screen appear solid under head movements, like they should. Latency is low; it appears that the IR cameras record at 60 Hz at least, and that processing only adds a few ms of latency on top of camera read-out time. Tracking is robust, as long as viewers keep their heads in the designated tracking zone (i.e., as long as they don’t intentionally try to break the system). There were a few issues with direct sunlight (the exhibit hall had large windows and skylights, not ideal), and a larger problem when multiple users tried to view the display at once. That is not just due to the obvious fact that only one person gets a perfect view, but also to zSpace’s decision to outfit all glasses with reflective tracking markers. As soon as multiple pairs of glasses are in the cameras’ field of view, tracking breaks down entirely. I suggested shipping the zSpace with one pair of tracked and several pairs of non-tracked glasses, but the reps said the company had decided against it to remove the potential for eye strain for multiple users. I do see their point but disagree with them; sometimes you just have to have multiple people look at the screen, and in that case you take the viewpoint problems as a fact of life. It works out very well in our CAVE in practice. But in the end, it’s nothing that masking tape or a knife can’t fix.
Stylus tracking
The stylus is tracked using a hybrid inertial/optical system, with a MEMS accelerometer/gyro combo in the stylus itself, two active LEDs on the stylus (one on the tip, one on the end), and a separate set of two IR cameras in the display’s bezel. Having two camera pairs is an interesting solution to the problem that the tracking spaces for the stylus and the glasses don’t overlap much: the glasses are where the viewer’s head is, high above the screen, whereas the stylus is close(r) to the screen. Accordingly, the stylus tracking cameras are angled inwards, to cover the screen surface itself — the stylus can even be used directly on the screen, like on a touchscreen. I’m also guessing that the stylus cameras use a different “color” of infrared, to reduce confusion between the glasses and stylus.
Like head tracking, stylus tracking is precise and robust, although positional tracking was considerably better than orientational tracking. The virtual selection “laser” ray that forms the basis for most interactions (more on that below) is not always aligned well with the stylus’ physical orientation, and sometimes it points in an entirely different direction. I’m assuming there are some stability problems with the current sensor fusion algorithm. The practical upshot is that the selection ray has to be drawn at all times to provide visual feedback, which I find annoying. Apart from that, the inertial tracking component can carry the stylus over short breaks in optical tracking, such as during 6-DOF interactions when the user’s hand might briefly occlude the stylus LEDs. Latency is higher than I would have expected, but well within the usable range. I asked about plans to use motion prediction in the sensor fusion algorithm to reduce apparent stylus latency, but it appears zSpace hadn’t considered it yet. Motion prediction works well with our IS-900 tracking systems, which are based on similar principles. As with head tracking, 3D reconstruction and sensor fusion are done directly in the display, and calibrated coordinates are sent over USB.
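To illustrate the kind of motion prediction I have in mind: if the fusion filter already maintains a velocity estimate, extrapolating the last fused pose forward by the expected end-to-end latency hides much of that latency. A minimal sketch, position only, with illustrative names (nothing here is zSpace’s or Vrui’s API):

```cpp
#include <cstddef>

// One fused stylus sample; a real filter would also carry orientation and
// angular velocity, which would be extrapolated the same way.
struct StylusSample {
    double pos[3]; // last fused position estimate (display coordinates)
    double vel[3]; // linear velocity estimate from the fusion filter
};

// Extrapolate the sample forward by dt seconds, where dt is the expected
// total latency (camera exposure + processing + transmission + rendering).
void predictPosition(const StylusSample& s, double dt, double out[3]) {
    for (std::size_t i = 0; i < 3; ++i)
        out[i] = s.pos[i] + s.vel[i] * dt;
}
```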
As an aside, I applaud zSpace’s decision to set the screen up like a drafting table, at an angle of about 30°. My first VR environment, a responsive workbench, was exactly like that, and to this day it’s only surpassed in usability by the full CAVE. This setup allows a user to rest their elbows while working, significantly reducing the danger of gorilla arm.
Screen and stereo quality
The zSpace uses passive (circularly polarized) glasses and a combined active/passive stereo system, where time-interleaved stereo frames at full display resolution (1920×1080) are polarized left or right via a switched full-frame polarization sheet. As a result, the glasses are very light and comfortable to wear; for users with prescription glasses, they also come in a clip-on style. Stereo quality is surprisingly good, with relatively little crosstalk between the eyes. Most demos I saw used low-contrast images to hide any remaining crosstalk, but even those demos whose developers didn’t get the memo and used white lines on a black background worked.
Because the screen is comparatively small (24″ diagonal), the display’s pixel density is very high. Small text is easily readable, and even small 3D models can show a large amount of detail. One has to keep in mind, though, that the display’s resolution is no higher than that of a large-screen 3D TV. While those have fewer pixels per inch, they have more display area, so models can be shown larger and present the same amount of detail. What it boils down to is that the zSpace has the exact same number of pixels as (1080p) 3D TVs, and that’s what counts.
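To put rough numbers on that: a 1080p panel has √(1920² + 1080²) ≈ 2203 pixels along its diagonal, which works out to roughly 2203 / 24 ≈ 92 ppi on the zSpace’s 24″ screen versus roughly 2203 / 55 ≈ 40 ppi on a 55″ 3D TV; either way, the display pushes the same 1920 × 1080 ≈ 2.07 million pixels per eye.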
Intermission: 3D data visualization on holographic displays
Just watch the video to take a break, and then read on:
SDK and applications
Even with great hardware, there are still many ways holographic displays can fail. The first step is to generate proper stereoscopic imagery, and the second step is to use proper 6-DOF interaction methods. Fortunately, the zSpace SDK does stereo right. Virtual objects appear solid, and I saw no distortion (i.e., spheres showed up as spheres, and right angles as right angles). The SDK properly uses the calibration data stored in the display to create a 1:1 holographic space. Accordingly, all demos based on the SDK, or on the Unity3D binding, which is in turn based on the SDK, worked correctly.
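For readers wondering what “doing stereo right” actually entails: each eye needs an off-axis (asymmetric) frustum whose apex sits at the tracked eye position and whose window is the physical screen rectangle, all in real-world units. Below is a minimal sketch of that standard construction (essentially Kooima’s generalized perspective projection); the function name and the corner/eye parameters are mine for illustration, not the zSpace SDK’s API, and a real implementation would take the screen corners from the display’s factory calibration:

```cpp
#include <cmath>
#include <GL/gl.h>

/* Tiny vector helpers: */
static void sub(const float a[3], const float b[3], float r[3])
  { for(int i = 0; i < 3; ++i) r[i] = a[i] - b[i]; }
static float dot(const float a[3], const float b[3])
  { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static void cross(const float a[3], const float b[3], float r[3])
  { r[0] = a[1]*b[2] - a[2]*b[1]; r[1] = a[2]*b[0] - a[0]*b[2]; r[2] = a[0]*b[1] - a[1]*b[0]; }
static void normalize(float v[3])
  { float l = std::sqrt(dot(v, v)); for(int i = 0; i < 3; ++i) v[i] /= l; }

/* Set up an off-axis projection for one eye.
   pa, pb, pc: the screen's lower-left, lower-right, and upper-left corners;
   pe: the tracked eye position; all in the same physical coordinate system. */
void screenFrustum(const float pa[3], const float pb[3], const float pc[3],
                   const float pe[3], float nearDist, float farDist)
{
  float vr[3], vu[3], vn[3];     // screen right, up, and normal vectors
  sub(pb, pa, vr); normalize(vr);
  sub(pc, pa, vu); normalize(vu);
  cross(vr, vu, vn); normalize(vn);

  float va[3], vb[3], vc[3];     // vectors from the eye to the screen corners
  sub(pa, pe, va); sub(pb, pe, vb); sub(pc, pe, vc);

  float d = -dot(va, vn);                // eye-to-screen-plane distance
  float l = dot(vr, va) * nearDist / d;  // frustum extents on the near plane
  float r = dot(vr, vb) * nearDist / d;
  float b = dot(vu, va) * nearDist / d;
  float t = dot(vu, vc) * nearDist / d;

  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  glFrustum(l, r, b, t, nearDist, farDist);

  // Rotate the frustum into the screen's orientation and put its apex at the eye:
  float M[16] = { vr[0], vu[0], vn[0], 0.0f,    // column 0
                  vr[1], vu[1], vn[1], 0.0f,    // column 1
                  vr[2], vu[2], vn[2], 0.0f,    // column 2
                  0.0f,  0.0f,  0.0f,  1.0f };  // column 3
  glMultMatrixf(M);
  glTranslatef(-pe[0], -pe[1], -pe[2]);
}
```

Called once per eye, with the left and right eye positions derived from the tracked head pose, this is what produces the undistorted 1:1 imagery described above.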
Unfortunately, there were several demos which were not based on the SDK. The zSpace SDK contains a tracking server compatible with trackd and vrpn, so pre-existing software can get tracking and button data via those interfaces. As it turns out, all non-SDK demos did stereo wrong, to varying degrees of badness. This is an absolute shame, and I strongly recommend that zSpace, like Apple with its app store, set up some kind of QA program for external applications, withdraw support from those that present broken stereo, and ideally educate their developers.
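As an aside on the trackd/vrpn route: the client side is simple. Here is a minimal sketch of a vrpn tracker client that just prints incoming poses; the server name “Tracker0@localhost” and the sensor assignment are assumptions that would have to be checked against the zSpace tracking server’s actual configuration:

```cpp
#include <cstdio>
#include <vrpn_Tracker.h>

// Called by vrpn for every tracker report (one report per sensor update).
static void VRPN_CALLBACK handleTracker(void*, const vrpn_TRACKERCB t)
{
  // Sensor numbering (e.g., 0 = head, 1 = stylus) is an assumption; check the server's setup.
  std::printf("sensor %d: pos (%.3f, %.3f, %.3f), quat (%.3f, %.3f, %.3f, %.3f)\n",
              (int)t.sensor, t.pos[0], t.pos[1], t.pos[2],
              t.quat[0], t.quat[1], t.quat[2], t.quat[3]);
}

int main()
{
  vrpn_Tracker_Remote tracker("Tracker0@localhost"); // hypothetical server name
  tracker.register_change_handler(0, handleTracker);
  for(;;)
    tracker.mainloop(); // poll the connection for new reports (a real client would sleep a bit here)
}
```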
Why is this such a big deal? The zSpace is meant to bring holographic displays to a large audience, and the general population has no idea what a holographic display is supposed to look like (hint: virtual objects are supposed to look and behave exactly like real objects). I witnessed several conference attendees politely praising the 3D effect in those broken demos, probably thinking that it’s supposed to look like crap, and quietly wondering what the big deal with 3D is supposed to be. That is not good at all. I even got into a small argument with a company rep who didn’t believe me when I told him that his 3D was broken: a 3D model showed up like a flat image, floating somewhat behind the screen. I’m assuming they either squashed z during rendering, or used very small eye separation and made up for it by shifting the left/right eye images for display. That’s not how it works, people (the fact that they want to charge $36,000 for their software was added hilarity). If zSpace wants to succeed, they have to work very hard to teach developers how 3D works, so that this doesn’t happen.
From an interaction point of view, there’s good news and slightly non-optimal news. The most positive thing as far as I’m concerned is the default 3D dragging method used by the SDK, and by extension all SDK-based applications: it’s exactly the model-on-a-stick method that has been Vrui’s default since its inception. After having been told for a decade, by VR experts, that it’s a bad method, I’m glad to see that someone else not only believes in it, but made it the default for a device aimed at a mainstream audience.
The thing I didn’t like so much was the ray-based picking method. Essentially, ray-based picking encourages grabbing virtual objects from far away, which then leads to problems for dragging because the user has a very bad lever arm; small jitter in translations is exaggerated, and rotations don’t work well at all because the user has to sweep large arcs with their hand to rotate an object around itself. A compounding problem is that close-up picking, like sticking the stylus into an object, didn’t work reliably. In fact, most users used the stylus like a remote control, sometimes even pulling their arms back or twisting their bodies to grab objects from as far away as possible. I suggest changing the default picking method to gently nudge users towards directly grabbing objects with the stylus tip, maybe simply by making the selection laser “fade” as it gets longer. I believe users will naturally drift towards close-up interactions then. Here’s a video showing direct “grabbing” interactions in a large-scale holographic display system (a CAVE):
Another non-optimal thing was the interface’s reliance on a single button. The zSpace stylus has three buttons, but most applications used only one. This means most interaction changes are modal, particularly the change from 3D dragging to 3D scaling. We’ve found that seamlessly combining dragging and scaling is a very powerful way to navigate through or work with large and/or complex 3D models. Having to move the stylus from the object to a (smallish) button to switch to scaling mode, then scale, and go to another smallish button to go back to dragging mode will discourage users from using scaling, and have them miss out on a very powerful tool. I suggest adding a second mode, where scaling can be enabled while in dragging mode by pressing a secondary button in addition to the primary button. That way the secondary button by itself can still be used for whatever else the application wants. That’s how Vrui does scaling in the CAVE and on the desktop. Give your users some credit; I believe the kinds of people who would be using a holographic display in the first place, i.e., those doing 3D work, would be able to deal with having two buttons.
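To make the suggestion concrete, the chord logic is trivial; the Mode type and button names below are purely illustrative, not any actual zSpace or Vrui API:

```cpp
// The primary button drags; holding the secondary button *while dragging*
// switches to scaling. The secondary button alone never enters this state
// machine, so it stays free for other bindings.
enum class Mode { Idle, Dragging, Scaling };

struct StylusButtons { bool primary; bool secondary; };

Mode update(Mode mode, const StylusButtons& b) {
    switch (mode) {
    case Mode::Idle:
        return b.primary ? Mode::Dragging : Mode::Idle;
    case Mode::Dragging:
        if (!b.primary) return Mode::Idle;              // released: back to idle
        return b.secondary ? Mode::Scaling : Mode::Dragging;
    case Mode::Scaling:
        if (!b.primary) return Mode::Idle;              // released: back to idle
        return b.secondary ? Mode::Scaling : Mode::Dragging;
    }
    return mode;
}
```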
Speaking of interaction and applications, here is the most impressive thing I saw. I have long been talking about how desktop-based applications are very hard to port to VR because they are so deeply entrenched in the 2D desktop paradigm, and how that is a major problem for mainstream acceptance of VR in general and holographic displays in particular. And here zSpace was demoing a holographic version of Maya, done completely through Maya’s plug-in framework. I had a long chat with the developer who did this, and he talked about running into exactly those problems I’ve pointed out, and how hard it was to work around some basic design decisions, such as an ill-fitting camera model, a purely 2D user interface, or squashing 3D models to the frustum’s front or back plane for special effects. zSpace’s Maya binding is not perfect (yet), but I am very impressed by how well he got it to work.
Does it play games?
That question is going to come up eventually. Unity3D had a demo set up, showing a game running in holographic mode. Depending on the style of game, that can work very well. This was a top-down third-person game, in the Diablo/Starcraft vein, and it worked. The player character showed up as a miniature (about 2″ tall) in a miniature, but very detailed, environment. On the other hand, the display will not work for first-person games or similar applications like architectural walk-throughs, because in order to show 1:1 environments and life-size characters, holographic displays require life-size screens. In other words, the largest 3D object that can be presented on the zSpace is about 24″ in diameter; anything larger has to be scaled down.
Practical considerations
Bottom line: is the zSpace a viable candidate for a turn-key holographic display system to run Vrui applications in a plug&play fashion, so that we can direct users of our software to go and buy it? Unfortunately, not right now. For one, there’s the price. The zSpace display itself, including the head and stylus trackers, has a suggested retail price of $4,000. On top of that you need a computer running it, and that computer requires a “professional” graphics card, such as an Nvidia Quadro. For no real reason, I might add. The zSpace could just as well accept HDMI 1.4a stereo video modes; the standard (but optional) frame-packed 60Hz 1080p stereo mode has exactly the same resolution, frame rate, and even video timings as the quad-buffered mode currently used by the display. And those HDMI 1.4a stereo modes can be generated by consumer-level graphics cards — that’s what we’re doing on our 3D TV systems.
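If I recall the CEA-861 timings correctly, the numbers even line up exactly: quad-buffered 1080p stereo at a 120 Hz input rate runs at 2200 × 1125 total pixels per frame × 120 Hz = 297 MHz pixel clock, while HDMI 1.4a frame packing stacks two 1080-line frames (plus the intervening blanking interval) into a single 2205-line active, 2250-line total picture at 60 Hz, which is again 2200 × 2250 × 60 = 297 MHz. Same signal, different label.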
Right now, the total cost for a zSpace plus a computer with a “real” Quadro card, such as a Quadro 6000, is around $4,000 + $1,000 + $3,700 = $8,700. That’s around $2,000 more than our line of low-cost VR displays, which have a larger (55″-70″) screen and a larger tracking space and otherwise work exactly like the zSpace, but require inconvenient amounts of end-user assembly. On top of the base cost, UNIX-based software like Vrui would require either an additional Windows PC to run the zSpace SDK and pump tracking data into the Linux PC via vrpn, or a Windows license to run the SDK inside a virtual machine. Either way, extra $$ and an ugly workaround. I did not get an answer about a Linux (or Mac OS X) SDK beyond “we’re thinking about the possibility.”
I had talked to zSpace (the company) several months ago about writing a native Vrui binding for the zSpace (the device), by talking directly to the hardware via its USB protocol. Unfortunately, the powers-that-be nixed that proposal; maybe because zSpace wants to license the SDK for $1,500 per developer per year, and a free and open source alternative SDK would cut into those profits? Who knows. Either way, no help forthcoming. I might have to get one and break out the old USB analyzer. Correction: I misunderstood one of the company reps about the SDK cost. The SDK itself is free; the $1,500 per year is a subscription to gain access to the hardware (need to find out what exactly they mean by that), and to future display models.
Re: ray-based picking, I guess one difference with a CAVE is that some of the objects the user wants to grab are behind the screen, so it may not always be possible to use the stylus tip position to grab the object of interest…