One of the mysteries of the modern age is the existence of two distinct lines of graphics cards by the two big manufacturers, Nvidia and ATI/AMD. There are gamer-level cards, and professional-level cards. What are their differences? Obviously, gamer-level cards are cheap, because the companies face stiff competition from each other, and want to sell as many of them as possible to make a profit. So, why are professional-level cards so much more expensive? For comparison, an “entry-level” $700 Quadro 4000 is significantly slower than a $530 high-end GeForce GTX 680, at least according to my measurements using several Vrui applications, and the closest performance-equivalent to a GeForce GTX 680 I could find was a Quadro 6000 for a whopping $3660. Granted, the Quadro 6000 has 6GB of video RAM to the GeForce’s 2GB, but that doesn’t explain the difference.
So, is the Quadro a lot faster than the GeForce? According to web lore, it is, by a factor of 10 or so (oh, how I wish I could find a link to a benchmark right now, but please read on). But wait: the quoted performance difference is for “professional” or “workstation” applications. What are those? Well, CAD, obviously. What’s common in CAD? Wireframe rendering and double-sided surfaces. Could it be that these two features were specifically targeted for crippling in the GeForce driver, because they are so common in “workstation” applications and so rarely used in games, to justify the Quadro’s price tag? No, right?
Well, I don’t have a CAD application at hand, but I have 3D Visualizer, which uses double-sided surfaces to render contour surfaces or isosurfaces in 3D volumetric data. From way back, on a piddly GeForce 3, I know that the frame rate drops precipitously when double-sided surfaces are enabled (for those who don’t know: double-sided surfaces are not backface-culled, and are illuminated from both sides, often with different material properties on either side). I don’t recall the exact numbers, but the difference was significant without being outrageous, say a factor of 2-3. That makes sense, considering that OpenGL’s lighting engine has to do twice the amount of work. On a Quadro, the performance difference used to be, and still is, negligible. That makes sense as well: on special “professional” silicon, the two lighting calculations would be run in parallel.
But a couple of years ago, I got a rude awakening. On a GeForce GTX 285, 3D Visualizer’s isosurfaces ran perfectly OK. Then an external user upgraded to a GeForce GTX 480, and the bottom fell out. Isosurfaces that had rendered at a sprightly 60 or so FPS on the 285 rendered at a sluggish 15 FPS on the 480. That makes no sense, since everything else was significantly faster on the 480. What makes even less sense is that double-sided surfaces were a factor of 13 slower than single-sided surfaces (remember the claimed 10x Quadro speed-up I mentioned above?).
Now think about that. One way of simulating double-sided surfaces is to render single-sided surfaces twice, with triangle orientations and normal vectors flipped. That would obviously take about twice as long. So where does the ginormous factor of 13 come from? There must be some feature specific to the GeForce’s hardware that makes double-sided surfaces slow. Turns out, there isn’t.
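The render-twice approach can be sketched in a few lines. This is a minimal Python illustration, not 3D Visualizer code: it assumes a hypothetical indexed triangle mesh stored as plain lists, and the function names are made up for the example. It relies on OpenGL’s default convention (glFrontFace(GL_CCW)) that triangles whose window-space vertices run counter-clockwise are front-facing.

```python
# Hypothetical indexed triangle mesh: per-vertex normals as (x, y, z) tuples,
# triangles as (i, j, k) index triples into the vertex arrays.

def flipped_copy(normals, triangles):
    """Build the 'back side' of a mesh for two-pass double-sided rendering:
    negate every normal and reverse every triangle's winding order, so the
    faces that were back-facing become front-facing in the second pass."""
    back_normals = [(-nx, -ny, -nz) for (nx, ny, nz) in normals]
    back_triangles = [(a, c, b) for (a, b, c) in triangles]
    return back_normals, back_triangles


def is_front_facing(p0, p1, p2):
    """OpenGL's default rule (GL_CCW): a triangle is front-facing if its
    window-space vertices are in counter-clockwise order, i.e. its signed
    area is positive."""
    signed_area2 = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    return signed_area2 > 0
```

With back-face culling enabled, the original mesh is drawn with the front material and the flipped copy with the back material. Each pass culls exactly the triangles the other pass draws, so the vertex work doubles while each triangle is still rasterized only once.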
Double-sided surfaces are slow when rendered in the “standard” OpenGL way, i.e., by calling glLightModeli(GL_LIGHT_MODEL_TWO_SIDE,GL_TRUE). That’s how legacy CAD software would do it. However, if an application implements, in a shader program, the exact same formulas that fixed-function OpenGL uses for double-sided lighting, the difference evaporates. Suddenly, the GeForce is just as fast as the Quadro. I didn’t expect this, and didn’t believe it at first. I wrote a double-sided surface shader hoping to go from a 13x penalty to a 2x penalty (remember, twice as many calculations), but got only a 1.1x-1.3x penalty, depending on overdraw. To restate this clearly: if implemented via a shader, double-sided surfaces on a GeForce are exactly as fast as on a Quadro; using fixed-function OpenGL, they are 13 times slower.
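For reference, the core of what fixed-function two-sided lighting computes can be modeled as below. This is a simplified Python sketch, assuming unit vectors, a single directional light, and only the diffuse (Lambert) term; the real fixed-function model also includes ambient, specular, and emissive terms, computes front and back colors per vertex, and then picks one per polygon based on its facing. In a GLSL fragment shader, the front_facing flag would come from the built-in gl_FrontFacing variable; the function names here are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lambert(normal, to_light, diffuse):
    """Diffuse (Lambert) term: material color scaled by max(N.L, 0)."""
    d = max(0.0, dot(normal, to_light))
    return tuple(d * c for c in diffuse)


def two_sided_diffuse(normal, to_light, front_facing, front_diffuse, back_diffuse):
    """Two-sided lighting in the spirit of GL_LIGHT_MODEL_TWO_SIDE:
    front faces use the front material and the stored normal; back faces
    use the back material and the negated normal."""
    if front_facing:
        return lambert(normal, to_light, front_diffuse)
    flipped = tuple(-c for c in normal)
    return lambert(flipped, to_light, back_diffuse)
```

For example, a back face whose stored normal points away from the light is lit with the back material, because the negated normal now faces the light.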
But that’s surely an accident, right? Double-sided surfaces are extremely uncommon in the GeForce’s target applications, so Nvidia simply wouldn’t have spent any effort optimizing that code path, right? But that would only explain a 2x penalty, which would come from the laziest, most backwards implementation, i.e., rendering everything twice (I had actually planned to implement that via a geometry shader if the more direct approach hadn’t worked). One wonders.
Let’s just, playing Devil’s advocate, assume for a moment that double-sided surface performance is intentionally crippled. Then wouldn’t the shader-based equivalent be crippled as well? To paraphrase Cave Johnson, they’re not banging rocks together over there! Well, it turns out that the GLSL shading language is practically Turing-complete. Oh, how I love to apply my background in theoretical computer science! Turing-completeness, roughly speaking, means that it is impossible in general for software to determine what a piece of code does, i.e., to detect its intent. This means there is no way the GeForce driver can look at a shader program and say “Hey! Wait! It’s calculating double-sided illumination, let me slow it way down!” So there you go.
But anyway, why am I griping about this now? Because of the Quadro’s other selling point, frame-sequential stereo, which is obviously close to my heart. Legacy professional applications use quad-buffered stereo, if they use stereo at all, so clearly that’s a feature not supported by GeForces (exactly how GeForces enable stereo for games but disable it for “serious applications” in Nvidia 3D Vision under Windows is yet another funny story).
But we’ve been using GeForces for years to generate stereo content for 3D TVs. Instead of frame-sequential stereo, 3D TVs use HDMI-standardized stereo modes, and those can be “faked” by a cleverly configured GeForce. Specifically, none of the HDMI modes require the extra stereo synchronization cable, for which Quadros have a connector but GeForces don’t. I’ve recently tried to convince manufacturers of desktop 3D monitors to support HDMI stereo as well, to reduce the total cost of such systems. The typical reaction is “well, we’re aiming for professional applications, and for those Quadros are 10x faster, so…”
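The post doesn’t say which HDMI modes or GeForce settings were used, so the following is only a sketch of how an application could place the two eye views into two common HDMI 1.4a 3D layouts. Frame packing stacks the left and right images vertically with a blank band between them (45 lines for 1080p, giving a 1920×2205 frame; 30 lines for 720p, giving 1280×1470), while side-by-side (half) squeezes each eye into half the width. The helper names are hypothetical, and the video mode itself would still have to be set up on the GPU with matching timings.

```python
def frame_packing_layout(eye_width, eye_height, gap_lines):
    """Frame packing: left-eye image on top, a blank band of gap_lines
    lines, right-eye image below. Returns (left_rect, right_rect,
    frame_size); rects are (x, y, w, h) with y counted from the TOP."""
    left = (0, 0, eye_width, eye_height)
    right = (0, eye_height + gap_lines, eye_width, eye_height)
    frame = (eye_width, 2 * eye_height + gap_lines)
    return left, right, frame


def side_by_side_half_layout(frame_width, frame_height):
    """Side-by-side (half): each eye gets half the horizontal resolution."""
    half = frame_width // 2
    return (0, 0, half, frame_height), (half, 0, half, frame_height)


def to_gl_viewport(rect, frame_height):
    """Convert a top-origin rect into glViewport's bottom-left-origin form."""
    x, y, w, h = rect
    return (x, frame_height - y - h, w, h)
```

The to_gl_viewport helper only exists because glViewport measures y from the bottom of the window, whereas the layouts are easier to describe from the top.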
And here’s the punchline: on a brand-new GeForce GTX 680, the fixed-function performance difference between single-sided and double-sided surfaces is 33x. Using a shader, it’s somewhere around 1.2x. Surely, you can’t be serious?
Finally, to sprinkle some much-needed relevance onto this post: the upcoming Oculus Rift HMD will expect stereo frames in side-by-side format, meaning it will work with GeForces without any problems.
A side note: on ATI cards (HD 5750), two-sided lighting costs nothing. Tested with a big 4-million-triangle object.
Thank you for the information! I rarely work with ATI cards, because last time I checked, their Linux driver support was still worse than Nvidia’s, even given the latter’s shenanigans. What operating system was this under? Windows, Linux, Mac OS X?
Tested under Windows XP.
Besides the graphics driver problem, which I have also seen some time ago under Linux, I think ATI currently makes the better cards, especially with regard to price/performance.
I haven’t done a thorough review of the cards for some time now, but the benefit of Quadro cards for the engineering/science community used to be that they support double-precision floating-point operations, while GeForces only allow single precision.
Yes, I didn’t mention that. There is a clear performance difference in general-purpose GPU computing using CUDA. GeForces do support double-precision arithmetic, but its performance appears to be artificially capped at 1/8 of single-precision performance, whereas Quadros get 1/2 of single-precision performance, as one would expect. Disclaimer: this is second-hand knowledge; I don’t do CUDA myself. In OpenGL, on the other hand, both Quadros and GeForces only use single-precision arithmetic. This is based on observing our CAVE, where the Quadros running the render nodes show exactly the same rounding problems when working with large-extent model data as regular GeForces do. Fortunately, there are workarounds in place.
I’ve noticed floating-point rounding errors once my 3D models get, say, 5000 m away from the origin. There’s some kind of triangular effect, where the graphics move in a triangular pattern when I move slowly through the scenery.
If this rings a bell, what workarounds did you use? I know of storing everything in doubles and moving nearby scenery towards the origin, but that seems very non-trivial to do.
Hard to explain in brief, but the main idea is to render everything in physical coordinates, which are a VR-specific thing that more or less corresponds to eye coordinates. Object coordinates are all relative to their own origins, and most of the transformation math is done on the CPU in double precision. For very large objects, such as our Crusta virtual globe, the globe itself is broken up into huge numbers of tiny chunks, all around their own origins. Once you store vertex positions with large coordinate components, you’ve already lost.
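The “chunks around their own origins” idea can be illustrated by comparing two ways of storing the same Earth-scale points. This is a hypothetical Python sketch, not Crusta’s actual data layout: single precision is emulated by rounding through the standard struct module, the chunk origin is simply the centroid, and the coordinates are made-up values about 6,371 km from the origin.

```python
import struct


def f32(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def store_absolute(points):
    """Naive storage: absolute coordinates as float32."""
    return [tuple(f32(c) for c in p) for p in points]


def store_chunk(points):
    """Chunked storage: a double-precision chunk origin (here the centroid,
    kept on the CPU) plus small float32 offsets suitable for the GPU."""
    n = len(points)
    origin = tuple(sum(axis) / n for axis in zip(*points))
    offsets = [tuple(f32(c - o) for c, o in zip(p, origin)) for p in points]
    return origin, offsets


def max_error(points, recovered):
    """Largest per-coordinate difference between two point lists."""
    return max(abs(a - b) for p, q in zip(points, recovered) for a, b in zip(p, q))


points = [(6371000.1, 10.2, 20.3), (6371000.7, 10.9, 20.1), (6371001.3, 11.4, 21.0)]
absolute_err = max_error(points, store_absolute(points))
origin, offsets = store_chunk(points)
chunk_err = max_error(points, [tuple(o + d for o, d in zip(origin, off)) for off in offsets])
```

With these numbers, absolute float32 storage moves points by about 0.2 m, while the chunked version stays far below a millimeter. In a renderer, the double-precision origin would not be added back on the GPU; it would instead be folded into the transformation matrices on the CPU.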
Interesting, I didn’t realize that this was an issue with the Quadros. If you don’t mind me asking, what sort of problems do you see visually (if that’s what you’re referring to) when it’s only single precision? We are also running Quadros in our graphics cluster for my lab’s CAVE, and I’d be interested in exploring this. Thanks for any comment.
The problem with IEEE floats is that absolute precision decreases as you move away from zero, due to their semi-exponential representation (a fixed number of mantissa bits scaled by an exponent). Imagine that you’re visualizing global earthquakes (like in this movie: http://www.youtube.com/watch?v=OSUPAjrBUrk), where the Earth is centered around the model space origin. Now you zoom in (a lot) to a quake that’s close to the surface, so around 6000 km away from the origin. Once you’re zoomed in, you start rotating the view around the quake you’re looking at. The problem is that while 4×4 matrices can rotate around any point you choose, once you multiply them out, they always rotate around the origin first and then apply a translation second. So to rotate around a point that’s not the origin, the matrix basically has to represent a translation of that point to the origin, a rotation, and then a translation back to the (now rotated) point. With infinite-precision math, the operations cancel out, so the center point of your rotation remains exactly in the same place, but with finite precision, you get a round-off error that displaces your center point.
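A small Python experiment makes both effects concrete. It emulates single precision by rounding every intermediate result to float32 through the standard struct module (a real GPU may fuse or reorder operations, so the exact numbers will differ), and it works in 2D for brevity; the center, vertex, and function names are illustrative assumptions. The float32 spacing is about 0.49 mm near 5,000 m, but 0.5 m near 6,000 km, and a rotation matrix built as T(center) * R * T(-center) inherits rounding errors of that size.

```python
import math
import struct


def f32(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def float32_spacing(x):
    """Gap between a positive float32 value x and the next larger float32."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    here = struct.unpack('<f', struct.pack('<I', bits))[0]
    above = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return above - here


def rotate_about(center, angle, v, rnd):
    """Rotate 2D point v by angle around center using the composed matrix
    T(center) * R(angle) * T(-center), rounding every intermediate with rnd."""
    cx, cy = center
    c, s = rnd(math.cos(angle)), rnd(math.sin(angle))
    # Composition: translation column t = center - R * center
    tx = rnd(cx - rnd(rnd(c * cx) - rnd(s * cy)))
    ty = rnd(cy - rnd(rnd(s * cx) + rnd(c * cy)))
    # Application: R * v + t
    x = rnd(rnd(rnd(c * v[0]) - rnd(s * v[1])) + tx)
    y = rnd(rnd(rnd(s * v[0]) + rnd(c * v[1])) + ty)
    return x, y


center = (6371000.0, 0.0)        # rotation center ~6,371 km from the model origin
vertex = (6371000.5, f32(0.2))   # nearby vertex, already representable in float32
worst = 0.0
for deg in range(360):
    a = math.radians(deg)
    exact = rotate_about(center, a, vertex, lambda x: x)   # double precision
    single = rotate_about(center, a, vertex, f32)          # emulated float32
    worst = max(worst, math.dist(exact, single))
```

At 90 degrees, for example, the vertex lands about 0.2 m from its double-precision position, and the error changes from one angle to the next, which is the kind of jitter that makes the model jump around while rotating.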
The practical upshot is that, when you’re zoomed into a point away from the origin and rotate around that point, the entire model randomly jumps around the ideal point. This doesn’t happen when you translate, because then the rotation component of the matrix doesn’t change. If you’re looking at something on the surface of the Earth at 1:1 scale, the surface jumps around so much that the visualization basically becomes unusable. The work-around is to represent vertex positions around a local object origin, and then do most of the matrix calculations on the CPU in double precision and only upload the final matrices to OpenGL. It’s how Crusta allows 1:1 work on the surface, even though it’s using a whole Earth model. I should write a post explaining that in detail.
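The workaround can be sketched the same way. In this hypothetical 2D Python sketch (float32 again emulated with struct, a rotation in the view matrix, and made-up coordinates and function names), the naive path stores the vertex in absolute float32 coordinates and builds the view matrix in float32, while the workaround stores a small offset from the object’s origin and composes the modelview matrix on the CPU in double precision, rounding to float32 only once, as if just before uploading it to OpenGL.

```python
import math
import struct


def f32(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def apply(c, s, t, v, rnd):
    """What the GPU does with an uploaded 2D matrix [R | t]: R * v + t,
    with every intermediate result rounded by rnd."""
    x = rnd(rnd(rnd(c * v[0]) - rnd(s * v[1])) + t[0])
    y = rnd(rnd(rnd(s * v[0]) + rnd(c * v[1])) + t[1])
    return x, y


def eye_coords_naive(vertex_abs, eye, angle):
    """Absolute float32 vertex; view matrix R * T(-eye) built in float32."""
    c, s = f32(math.cos(angle)), f32(math.sin(angle))
    ex, ey = f32(eye[0]), f32(eye[1])
    t = (f32(-f32(f32(c * ex) - f32(s * ey))), f32(-f32(f32(s * ex) + f32(c * ey))))
    v = (f32(vertex_abs[0]), f32(vertex_abs[1]))
    return apply(c, s, t, v, f32)


def eye_coords_local(origin, offset, eye, angle):
    """Vertex stored as a float32 offset from its object's origin; the
    modelview R * T(-eye) * T(origin) is composed in double on the CPU
    and rounded to float32 only once."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = origin[0] - eye[0], origin[1] - eye[1]   # small difference, computed in double
    t = (f32(c * dx - s * dy), f32(s * dx + c * dy))
    v = (f32(offset[0]), f32(offset[1]))
    return apply(f32(c), f32(s), t, v, f32)


def eye_coords_exact(origin, offset, eye, angle):
    """Double-precision reference."""
    c, s = math.cos(angle), math.sin(angle)
    px = origin[0] + offset[0] - eye[0]
    py = origin[1] + offset[1] - eye[1]
    return c * px - s * py, s * px + c * py


origin = (6371000.0, 0.0)   # hypothetical object origin ~6,371 km out
offset = (0.3, 0.7)         # vertex position relative to that origin
eye = (6371002.0, 1.5)      # viewer a couple of meters away
worst_naive = worst_local = 0.0
for deg in range(0, 360, 5):
    a = math.radians(deg)
    ref = eye_coords_exact(origin, offset, eye, a)
    vertex_abs = (origin[0] + offset[0], origin[1] + offset[1])
    worst_naive = max(worst_naive, math.dist(ref, eye_coords_naive(vertex_abs, eye, a)))
    worst_local = max(worst_local, math.dist(ref, eye_coords_local(origin, offset, eye, a)))
```

The naive path is already about 0.2 m off at a rotation angle of 0, because the absolute vertex cannot even be stored exactly; the local path stays within a small fraction of a millimeter because every float32 value it touches is only a few meters in size.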
Anyway, when doing what I described on a Quadro, the effect you see is exactly what you’d expect from single-precision arithmetic, so I’m sure Quadros do OpenGL in single-precision.
Thanks for taking the time to explain this, it is much appreciated. Not much else to say than “that makes sense”. – I also think this would make a great blog post. I’ve worked with graphics for some time now, and have not dealt with this. Now when it comes up, I will know what’s going on, and how to approach fixing it. Thanks again Oliver.
That post about doing matrices in doubles would definitely be interesting. The last I read about this subject is to keep all 3D models close to the origin, i.e. moving 3D objects on the fly so that the camera is always close to the origin. However, that’s quite non-trivial and gives all kinds of issues when trying to trace into the scene etc.
I never said it’s trivial.
One thing I’m not sure of: if you hack a GeForce board to present itself as a Quadro (as is explained here and there on the web), does it also do stereo?
I can imagine it won’t do double-precision floating point, since that sounds very much like a hardware thing. But stereo seems simple to implement, especially these days when everything is an FBO anyway (with triple or higher buffering, for example, you already have multiple back buffers).
I’d wager that stereo crippling on GeForces is purely a driver thing, so those hacks you’re referring to will probably do the job. It’s not something I’ll try myself, though.
The few Quadros I do use are in our CAVE, where they have to be frame-locked to each other to provide seamless stereo across four screens. That’s definitely something a GeForce can’t do, if for no other reason than the lack of the required connector on the board.