Serial entrepreneur Elon Musk posted this double whammy of cryptic messages to his Twitter account on August 23rd:
@elonmusk: We figured out how to design rocket parts just w hand movements through the air (seriously). Now need a high frame rate holograph generator.
@elonmusk: Will post video next week of designing a rocket part with hand gestures & then immediately printing it in titanium
As there are no further details, and the video is now slightly delayed (per Twitter as of September 2nd: @elonmusk: Video was done last week, but needs more work. Aiming to publish link in 3 to 4 days.), it's time to speculate! I was hoping to have seen the video by now, but oh well. A deadline is a deadline.
First of all: what’s he talking about? My best guess is a free-hand, direct-manipulation, 6-DOF user interface for a 3D computer-aided design (CAD) program. In other words, something roughly like this (just take away the hand-held devices and substitute NURBS surfaces and rocket parts for atoms and molecules, but leave the interaction method and everything else the same):
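To make "direct manipulation" concrete, here is a rough sketch of the core interaction: while the hand "grabs" an object, the hand's incremental 6-DOF motion is applied to the object's transform. All names and the tracking API are hypothetical (this is not actual SpaceX, Leap, or Vrui code), and poses are assumed to arrive as 4x4 homogeneous matrices in world space:

```python
# Minimal sketch of direct 6-DOF manipulation: while an object is grabbed,
# the hand's incremental motion (rotation + translation) since the last
# frame is applied to the object's transform. Names are hypothetical.
import numpy as np

class GrabDragger:
    def __init__(self):
        self.grabbed = False
        self.prev_hand = None  # 4x4 hand pose at the previous frame

    def update(self, hand_pose, object_pose, grabbing):
        """hand_pose, object_pose: 4x4 homogeneous transforms (world space)."""
        if grabbing and self.grabbed and self.prev_hand is not None:
            # Incremental hand motion since the last frame, in world space:
            delta = hand_pose @ np.linalg.inv(self.prev_hand)
            object_pose = delta @ object_pose  # drag the object along
        self.grabbed = grabbing
        self.prev_hand = hand_pose.copy()
        return object_pose
```

The point of this one-to-one mapping is that the object moves exactly as the hand does, which is what makes the interaction feel direct instead of mediated.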
If you ask me, that would be a very effective tool to design rocket parts, or really anything three-dimensional. And, according to Twitter, Elon Musk agrees (emphasis mine):
@elonmusk: And, uhh no (zillionth person who asked), I am not going to make an IM suit, however design by hand-manipulated hologram is actually useful.
This is great. I’ve been saying for a long time that, if nothing else, VR is a useful and effective method for interacting with 3D data, but if Elon Musk says it, people will take notice. Side note: why did a zillion people ask him about an IM (Iron Man) suit? It appears Elon Musk has taken on the persona of a real-life Tony Stark (director Jon Favreau confirmed that Elon Musk was indeed the inspiration for Stark’s portrayal in the movies), and now with talk of holograms and holographic design programs, the boundaries between real person and movie character are starting to get somewhat blurry. The functional similarities between a design program as alluded to in the pair of Twitter posts and the movie version of same are surely not coincidental:
As I’ve written before, the remarkable thing about this sequence (and other similar ones from the Iron Man movies) is that — ignoring the fact that real holograms don’t work as depicted here — this interface makes sense, and would actually work as shown (unlike the equally famous Minority Report interface, zing). Instead of holograms it would use VR or AR as a display medium, but that’s merely a technical detail.
Now, then, what will the video actually show once it hits the Internets? In principle, any proper VR system could be used. But, given the posts' emphasis on hand gestures, and that augmented reality (AR) is generally considered "cooler" than virtual reality (VR), I'm guessing we'll see an integrated system of a pair of AR goggles and a 3D camera for hand tracking and gesture recognition. By sheer not-coincidence, such a device currently exists: the meta. I saw a prototype myself at the 2013 Augmented World Expo, demonstrated by Steve Mann. The meta consists of a SoftKinetic time-of-flight 3D camera directly attached to a pair of AR glasses. This combination enables 3D interfaces, where users can manipulate the virtual 3D objects shown by the AR glasses using their bare hands.
I am very curious to see how well this will work in practice. The meta’s tiny field-of-view (23° per eye) will probably not be an issue, but optical hand/finger tracking might be. Look carefully at this video:
You can see how the red dots indicating the fingertips are somewhat jittery, and sometimes align not with the tip but with the first joint or other parts of the finger, due to occlusion. That's a fundamental problem in optical hand tracking. For the kinds of precise interaction required to effectively design in 3D, it's definitely an issue.
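To illustrate the jitter half of the problem: the usual band-aid is a low-pass filter on the reported fingertip positions, which trades jitter for lag and does nothing against occlusion-induced jumps to the wrong joint. A minimal sketch, with a made-up smoothing factor:

```python
# Exponential low-pass filter on reported fingertip positions (a sketch).
# More smoothing means less jitter but more lag; no filter can fix the
# occasional jump to the first joint caused by occlusion.
import numpy as np

def smooth(previous, measured, alpha=0.3):
    """Blend the new fingertip measurement into the running estimate.

    alpha close to 0: heavy smoothing, noticeable lag.
    alpha close to 1: raw, jittery data passed through almost unchanged.
    """
    previous = np.asarray(previous, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return (1.0 - alpha) * previous + alpha * measured
```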
A second problem is how to use hand tracking to trigger events. To go beyond simple sculpting, the user needs to be able to make something happen (create a vertex, drag a face, split an edge, …) at a well-defined 3D location. How do you do that? Complex hand gestures, such as sign language phonemes, are out because they require moving the hands away from the intended action point to perform the gesture. Simple gestures like finger pinches don’t, and are more intuitive to boot, but finger pinches are hard to detect in depth images. It’s a subtle problem: how do you tell from depth images, suffering from occlusion, whether two fingers are actually pinched, or just lightly brushing against each other? If the user and the system do not exactly agree when a pinch occurred, it will lead to missed or spurious events, and user frustration. I’m using pinches as an example here, but other simple gestures have the same problem. This is where old-fashioned push buttons and their direct physical feedback really shine.
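To make the pinch-as-button problem concrete, here is a minimal sketch of distance-based pinch detection with hysteresis; the threshold values are entirely made up. A single threshold on the thumb-index distance flickers whenever the fingers hover near it, producing exactly the spurious or missed events described above. Hysteresis helps, but the distance itself is still estimated from occluded depth data, so the ambiguity does not go away:

```python
# Sketch of pinch detection with hysteresis: "press" fires only when the
# thumb-index distance drops below a tight threshold, "release" only when
# it rises above a looser one. Threshold values are hypothetical.
import numpy as np

class PinchDetector:
    def __init__(self, press_dist=0.015, release_dist=0.030):
        self.press_dist = press_dist      # e.g. 1.5 cm
        self.release_dist = release_dist  # e.g. 3.0 cm
        self.pinched = False

    def update(self, thumb_tip, index_tip):
        """Return 'press', 'release', or None for this frame."""
        d = np.linalg.norm(np.asarray(thumb_tip) - np.asarray(index_tip))
        if not self.pinched and d < self.press_dist:
            self.pinched = True
            return "press"
        if self.pinched and d > self.release_dist:
            self.pinched = False
            return "release"
        return None
```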
To recap: the question is not whether direct-manipulation 6-DOF user interfaces work (do they ever!), but whether optical hand tracking is the ideal way to implement them. The problems are jitter, tracking jumps due to occlusion and joint mislabeling, and flaky event detection. Hand-held 6-DOF devices with buttons are not nearly as futuristic and cool, but they are reliable, predictable, and cause little user frustration. See this video, starting at 6:35, where my Hydra's left handle malfunctions and generates spurious events. It's extremely annoying and derails my workflow. This happened because my Hydra was broken; in optical hand tracking, it's the normal state of affairs unless one takes extra precautions. This is why I'm very curious to see the promised video. And I'm hoping the "video was done last week, but needs more work" quote doesn't mean they need more time to edit around the bad bits. I really want to see this work. KeckCAVES' users are clamoring for free-hand interactions, but so far nothing has worked reliably enough.
Because if it does work, it’s going to be good for VR as a whole. If someone like Elon Musk comes out and says that VR/AR/3D user interfaces are useful, it will have a huge impact. After that, I’m just hoping that people don’t forget that this kind of thing is not entirely new, but has already existed — and been used successfully — for a long time. We, at least, have been building things with our hands in air since 1998.
What do you mean the Minority Report tech doesn't work as shown? The guy behind it even demonstrated it live on stage at a TED talk ( https://www.youtube.com/watch?v=b6YTQJVzwlI )
Yes, let me clarify what I meant. The MR UI is definitely technically feasible. The problem is, it doesn’t work from an ergonomics/UI perspective — in other words, it shouldn’t be used. I talk about it in greater detail in this old post, but the gist is that all the interactions shown in the movie are 2D: sliding images around, zooming, etc.
Using a hands-in-the-air 3D interface for 2D interactions is a terrible idea, especially when you have to hold your hands up the entire time like Tom Cruise did in the movie. It will be very painful after about five minutes. A standard touch screen, angled at about 30 degrees like a drafting table, would have been a much better interface for the same application (but not nearly as futuristic). In a nutshell: if your interaction is 2D, don't use a 3D interface. It's the same problem the Leap Motion people are running into headlong.
The reason I'm complaining about it is that people are rushing to implement MR-style interfaces because it's so sexy in the movie, totally ignoring usability, and then unleashing painful interfaces on their users. That's no good.
Btw, have you been following the development of CastAR?
No, this is the first time I've heard of it. I knew that ex-Valve employees were doing something with AR, but that was the extent of it. Thanks for pointing it out. I need to look into it in more detail, specifically how exactly the reflective surface they mention works with the projectors so that there are no visible screen borders (sounds odd).
Perhaps the projectors they use are capable of actually not emitting any light on arbitrary pixels? (sorta like those pseudo-holograms they use in “live” concerts with virtual singers)
I found an article about the CastAR on Tested, and they were comparing it to the zSpace, saying that with zSpace the display is limited to the screen (which is true, of course). They then say that with CastAR the display is not limited to a screen, but later they mention that the CastAR uses a special reflective screen with tracking markers to show the images. So there is some contradictory information, and I haven't yet been able to confirm what's really going on. The Tested article also says that CastAR and zSpace are basically the same, but forgets that zSpace adds a 6-DOF tracked input device to the mix. CastAR doesn't seem to have anything in terms of input.
For the stuff it is described to do, it must have 6-DOF tracking and some sort of depth sensing on the glasses… But then again, people have been embellishing the capabilities of VR/AR technology for ages, so without actual specs or a decent demo video I'm not sure what to believe…
Yeah, I won’t say more about it until I get some reliable information.
They just launched a Kickstarter ( http://www.kickstarter.com/projects/technicalillusions/castar-the-most-versatile-ar-and-vr-system ), and Jeri also did a video talking about it with a few backstory tidbits: https://www.youtube.com/watch?v=cc2NQVQK69A
Yup, just saw that. I also spoke to a colleague a week or so ago who had tried the CastAR at Maker Faire, and he was able to clear up the confusion. It's using a screen made of retro-reflective material, so that each viewer's projected image is only reflected back to that viewer. This makes it possible to have several people watch the same screen at the same time and see different images, or different projections of the same 3D scene. It's a slick idea.
Now that he has released his video, basically using a Leap Motion and the Rift, what are your thoughts?
I need to dig myself out from under a pile of work before I can post a detailed follow-up, but the gist of my opinion is that the video doesn’t show what’s promised, and fails at making a convincing case for using 6-DOF interfaces.
I was obviously wrong about the hardware that would be used, but that’s a detail. The big issue is that the only thing shown is rotating a model in 3D using a Leap Motion, and adding cutting planes. The former is an interaction that works just as well with a mouse, as any orientation can be achieved by rotation around two fixed orthogonal axes, and the latter can be done easily by cutting in the screen plane (that’s how 3D Visualizer does it in desktop mode).
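For the record, here is a sketch of what I mean by "two fixed orthogonal axes": horizontal mouse motion rotates the model about the vertical axis, vertical motion rotates it about the screen-horizontal axis, and composing the two reaches any orientation. The gain and the particular axis choices are illustrative only, not how any specific program does it:

```python
# Sketch: mapping mouse deltas to rotations about two fixed orthogonal axes.
import numpy as np

def axis_rotation(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def rotate_by_mouse(model_rotation, dx, dy, gain=0.01):
    """Apply mouse deltas (pixels) as rotations about two fixed axes."""
    yaw = axis_rotation([0.0, 1.0, 0.0], dx * gain)    # horizontal drag
    pitch = axis_rotation([1.0, 0.0, 0.0], dy * gain)  # vertical drag
    return pitch @ yaw @ model_rotation
```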
This has already led to a lot of people complaining that the interface shown, while cool, is useless. And regarding what’s shown, I must agree. It’s a pity there is nothing about the really interesting stuff, i.e., 3D modeling, where 6-DOF interfaces really shine.
This is in response to your recent video "Projection Properties of and Scale Issues with Oculus Rift"; posting here 'cause I don't wanna be forced to be assimilated by G+ (please let me know if there is a better place).
What happens when you model the eyeball optics if you use the center of the eyeball, instead of the pupil, so that each ray is always entering the pupil head-on?
A new blog post is up; let’s take further discussion there.
But to answer your question: Unfortunately it doesn’t work using the eyeball’s center as the projection center. Light enters the eye through the pupil, where (in theory) all entering rays must converge to a single point and then spread out again. In 3D rendering, the only point that has the property of all projection lines passing through it is the camera position. This means that camera position in 3D rendering must equal pupil position in reality. I wish it were otherwise.
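As a small sketch of that constraint (with hypothetical names and transform conventions): the per-eye rendering camera has to sit at the pupil position, because that is the one point all projection lines pass through.

```python
# Sketch: place the per-eye camera at the pupil, not the eyeball center,
# since in a pinhole projection every projection line passes through the
# center of projection. Transforms and offsets below are hypothetical.
import numpy as np

def eye_camera_position(head_to_world, pupil_offset_in_head):
    """Return the world-space center of projection for one eye.

    head_to_world: 4x4 transform from head/tracker space to world space.
    pupil_offset_in_head: pupil position in head space, e.g. [+-IPD/2, 0, -z].
    """
    p = np.append(np.asarray(pupil_offset_in_head, dtype=float), 1.0)
    return (head_to_world @ p)[:3]  # the camera must be placed here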
Hey there. I tried to find an email address to contact you at, but I didn't have much luck, so I figured I would ask on here. I am doing my thesis (at Chico) on education in virtual reality, specifically working with the Oculus Rift and using it to teach the spatial structure of molecules. Given my lack of programming experience (my background is cognitive psychology), would you have a recommendation for a basic program that I could use to view molecules with? Thank you!
The only Rift-enabled molecular visualization packages I know are my own, Nanotech Construction Kit, VR ProtoShop, and MD Visualizer. They are rather special-purpose and Linux-only.
The VMD people are usually quick to put in support for emerging 3D display technology. Have you checked with them?
Thanks for the reply! I was under the impression that your program was the Nanotech Construction Kit; is that not the case? By the way, the Nanotech Construction Kit video with the Hydra was in part what initially inspired me to go down this avenue of research, so thank you.
VR ProtoShop is not exactly what I am looking for. I was hoping for something that displays the individual atoms of a molecule, as I am studying how people encode the spatial structure of an object inside immersive virtual reality.
I am not sure about VMD being used with the Rift, but I did send them an email inquiring about it, so I will keep my fingers crossed. Again, thanks so much for your help and recommendations, they are greatly appreciated. I will definitely be coming back to keep up with your blog.
Those three I listed are my own packages. That sentence structure was bad. Edit to read: “The only Rift-enabled molecular visualization packages I know are my own, namely Nanotech Construction Kit, VR ProtoShop, and MD Visualizer.”
VR ProtoShop has a van-der-Waals sphere rendering mode that shows individual atoms, but the interaction is still based on larger rigid or flexible assemblies. NCK is probably the closest to what you’re looking for.