GPU performance: Nvidia Quadro vs Nvidia GeForce

One of the mysteries of the modern age is the existence of two distinct lines of graphics cards by the two big manufacturers, Nvidia and ATI/AMD. There are gamer-level cards, and professional-level cards. What are their differences? Obviously, gamer-level cards are cheap, because the companies face stiff competition from each other, and want to sell as many of them as possible to make a profit. So, why are professional-level cards so much more expensive? For comparison, an “entry-level” $700 Quadro 4000 is significantly slower than a $530 high-end GeForce GTX 680, at least according to my measurements using several Vrui applications, and the closest performance-equivalent to a GeForce GTX 680 I could find was a Quadro 6000 for a whopping $3660. Granted, the Quadro 6000 has 6GB of video RAM to the GeForce’s 2GB, but that doesn’t explain the difference.

So, is the Quadro a lot faster than the GeForce? According to web lore, it is, by a large factor of 10x or so (oh how I wish I could find a link to a benchmark right now, but please read on). But wait: the quoted performance difference is for “professional” or “workstation” applications. What are those? Well, CAD, obviously. What’s common in CAD? Wireframe rendering and double-sided surfaces. Could it be that these two features were specifically targeted for crippling in the GeForce driver, because they are so common to “workstation” applications, and so rarely used in games, to justify the Quadro’s price tags? No, right?

Well, I don’t have a CAD application at hand, but I have 3D Visualizer, which uses double-sided surfaces to render contour- or iso-surfaces in 3D volumetric data. From way back, on a piddly GeForce 3, I know that frame rate drops precipitously when enabling double-sided surfaces (for those who don’t know: double-sided surfaces are not backface-culled, and are illuminated from both sides, often with different material properties on either side). I don’t recall exactly, but the difference was significant without being outrageous, say a factor of 2-3. Makes sense, considering that OpenGL’s lighting engine has to do twice the amount of work. On a Quadro, the performance difference used to be, and still is, negligible. Makes sense as well; on special “professional” silicon, the two lighting calculations would be run in parallel.

But a couple of years ago, I got a rude awakening. On a GeForce 285, 3D Visualizer’s isosurfaces ran perfectly OK. Then an external user upgraded to a GeForce 480, and the bottom fell out. Isosurfaces that rendered on the 285 at a sprightly 60 or so FPS, rendered on the 480 at a sluggish 15 FPS. That makes no sense, since everything else was significantly faster on a 480. What makes even less sense is that double-sided surfaces were a factor of 13 slower than single-sided surfaces (remember the claimed 10x Quadro speed-up I mentioned above?).

Now think about that. One way of simulating double-sided surfaces is to render single-sided surfaces twice, with triangle orientations and normal vectors flipped. That would obviously take about twice as long. So where does the ginormous factor of 13 come from? There must be some feature specific to the GeForce’s hardware that makes double-sided surfaces slow. Turns out, there isn’t.

Double-sided surfaces are slow when rendered in the “standard” OpenGL way, i.e., by calling glLightModeli(GL_LIGHT_MODEL_TWO_SIDE,GL_TRUE). That’s how legacy CAD software would do it. However, if an application implements the exact same formulas used for double-sided lighting by fixed-function OpenGL in a shader program, the difference evaporates. Suddenly, the GeForce is just as fast as the Quadro. I didn’t believe this myself, nor did I expect it. I created a double-sided surface shader hoping to go from a 13x penalty to a 2x penalty (remember, twice as many calculations), but I ended up with a penalty of only 1.1x-1.3x, depending on overdraw. To restate this clearly: if implemented via a shader, double-sided surfaces on a GeForce are exactly as fast as on a Quadro; using fixed-function OpenGL, they are 13 times slower.
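
As a rough illustration of what such a shader has to do per fragment, here is a minimal sketch in Python of the diffuse part of two-sided lighting: flip the normal and switch materials for back-facing fragments, then evaluate the usual Lambert term. This is not the shader used in 3D Visualizer; the function name, the parameters, and the restriction to a single directional light without ambient or specular terms are all assumptions made for brevity.

```python
def two_sided_diffuse(normal, light_dir, front_facing, front_color, back_color):
    """Diffuse term of two-sided lighting for one fragment (illustrative only).

    normal       -- unit surface normal as stored with the geometry
    light_dir    -- unit vector pointing from the surface toward the light
    front_facing -- True if the fragment belongs to a front-facing triangle
    front_color, back_color -- RGB diffuse materials for the two sides
    """
    if front_facing:
        n, material = normal, front_color
    else:
        # Back side: use the flipped normal and the back material.
        n, material = tuple(-c for c in normal), back_color

    # Lambert term, clamped so that surfaces facing away from the light stay dark.
    n_dot_l = sum(a * b for a, b in zip(n, light_dir))
    intensity = max(n_dot_l, 0.0)
    return tuple(intensity * c for c in material)
```

In this per-fragment form, only one side is ever evaluated for a given fragment, which may be part of why the measured penalty stays well below 2x.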

But that’s surely an accident, right? Double-sided surfaces are extremely uncommon in the GeForce’s target applications, so Nvidia simply wouldn’t have spent any effort optimizing that code path, right? But that would only explain a 2x penalty, which would come from the laziest most backwards implementation, i.e., rendering everything twice (I actually thought about implementing that via a geometry shader, if the more direct approach hadn’t worked). One wonders.

Let’s just, playing Devil’s advocate, assume for a moment that double-sided surface performance is intentionally crippled. Then wouldn’t the shader-based equivalent be crippled, as well? To paraphrase Cave Johnson, they’re not banging rocks together over there! Well, turns out, GLSL is practically Turing-complete. Oh, how I love to apply my background in theoretical information science! Turing-completeness, roughly speaking, means that it is impossible for software to detect, in general, the intent of a piece of code. Meaning there is no way the GeForce driver can look at a shader program and say “Hey! Wait! It’s calculating double-sided illumination, let me slow it way down!” So there you go.

But anyway, why am I griping about this now? Because of the Quadro’s other selling point, frame-sequential stereo, which is obviously close to my heart. Legacy professional applications use quad-buffered stereo, if any stereo at all, so clearly that’s a feature not supported by GeForces (exactly how GeForces enable stereo for games, but disable it for “serious applications” in Nvidia 3D Vision under Windows, is yet another funny story).

But we’ve been using GeForces for years to generate stereo content for 3D TVs. Instead of frame-sequential stereo, 3D TVs use HDMI-standardized stereo modes, and those can be “faked” by a cleverly configured GeForce. Specifically, none of the HDMI modes require the extra stereo synchronization cable that Quadros have, but GeForces don’t. I’ve recently tried convincing manufacturers of desktop 3D monitors to support HDMI stereo as well, to reduce the total cost of such systems. And the reaction to my concern is typically “well, we’re aiming for professional applications, and for those Quadros are 10x faster, so…”

And here’s the punchline: on a brand-new GeForce GTX 680, the fixed-functionality performance difference between single-sided and double-sided surfaces is 33x. Using a shader, it’s somewhere around 1.2x. Surely, you can’t be serious?

Finally, to sprinkle some highly needed relevance onto this post: the upcoming Oculus Rift HMD will expect stereo frames in side-by-side format, meaning it will work with GeForces without any problems.

27 thoughts on “GPU performance: Nvidia Quadro vs Nvidia GeForce”

    • Thank you for the information! I rarely work with ATI cards, because last time I checked, their Linux driver support was still worse than Nvidia’s, even given the latter’s shenanigans. What operating system was this under? Windows, Linux, Mac OS X?

      • Tested with Windows XP.

        Besides the graphics driver problem, which I also saw some time ago under Linux, I think ATI makes the better cards at the moment, especially with regard to price/performance.

  1. I haven’t done a thorough review of the cards for some time now, but the benefit of a Quadro card used to be to the engineering/science community because they support double precision floating point operations, while GeForces only allow for single precision.

    • Yes, I didn’t mention that. There is a clear performance difference in general-purpose GPU computing using CUDA. While GeForces do support double-precision arithmetic, their performance appears to be artificially capped at 1/8 single-precision performance, whereas Quadros get 1/2 of single-precision performance, as one would expect. Disclaimer: this is second-hand knowledge, I don’t do CUDA myself. In OpenGL, on the other hand, both Quadros and GeForces only use single-precision arithmetic. This is based on observing our CAVE, where the Quadros running the render nodes show exactly the same rounding problems when working with large-extent model data as regular GeForces do. Fortunately, there are workarounds in place.

      • I’ve noticed floating point rounding errors, as my 3D models get to, say, 5000m away from the origin. There’s some kind of triangular effect, where the graphics move in a triangular pattern when moving slowly through the scenery.
        If this rings a bell, what workarounds did you use? I know of storing everything in doubles, and moving nearby scenery towards the origin, but it seems very non-trivial to do this.

        • Hard to explain in brief, but the main idea is to render everything in physical coordinates, which are a VR-specific thing that more or less corresponds to eye coordinates. Object coordinates are all relative to their own origins, and most of the transformation math is done on the CPU in double precision. For very large objects, such as our Crusta virtual globe, the globe itself is broken up into huge numbers of tiny chunks, all around their own origins. Once you store vertex positions with large coordinate components, you’ve already lost.
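
The point about large coordinate components can be made concrete with a small one-dimensional sketch. It rounds values to single precision with Python's struct module and compares two ways of getting a vertex into camera-relative coordinates. This is only an illustration, not the code used in Vrui or Crusta; the function names and the sample numbers are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def eye_space_naive(vertex, camera):
    """Store absolute positions in float32, then subtract on the 'GPU'."""
    return to_float32(to_float32(vertex) - to_float32(camera))

def eye_space_local(vertex, camera, origin):
    """Store the vertex relative to a nearby origin; subtract big numbers in double."""
    local = to_float32(vertex - origin)      # small value, stored once as float32
    offset = to_float32(origin - camera)     # computed in double on the CPU, then uploaded
    return to_float32(local + offset)        # single-precision add of two small values

# Hypothetical numbers: a vertex about 6000 km from the model origin, in meters.
vertex, camera, origin = 6000000.123, 5999998.5, 6000000.0
exact = 1.623
naive_error = abs(eye_space_naive(vertex, camera) - exact)
local_error = abs(eye_space_local(vertex, camera, origin) - exact)
```

With these numbers the naive path loses the 12.3 cm fraction as soon as the vertex is stored, while the local-origin path stays accurate to well under a micrometer.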

  2. Interesting, I didn’t realize that this was an issue with the Quadros. If you don’t mind me asking, what sort of problems do you see visually (if that’s what you’re referring to) when it’s only single precision? We are also running Quadros in our graphics cluster for my lab’s CAVE, and I’d be interested in exploring this. Thanks for any comment.

    • The problem with IEEE floats is loss of precision as you move away from zero, due to the semi-exponential representation. Imagine that you’re visualizing global earthquakes (like in this movie: http://www.youtube.com/watch?v=OSUPAjrBUrk), where the Earth is centered around the model space origin. Now you zoom in (a lot) to a quake that’s close to the surface, so around 6000km away from the origin. Once you’re zoomed in, you start rotating the view around the quake you’re looking at. The problem is that while 4×4 matrices can rotate around any point you choose, once you multiply them out, they always rotate around the origin first and then apply a translation second. So to rotate around a point that’s not the origin, the matrix basically has to represent a translation of that point to the origin, a rotation, and then a translation back to the (now rotated) point. With infinite-precision math, the operations cancel out, so the center point of your rotation remains exactly in the same place, but with finite-precision, you get a round-off error that displaces your center point.

      The practical upshot is that, when you’re zoomed into a point away from the origin and rotate around that point, the entire model randomly jumps around the ideal point. This doesn’t happen when you translate, because then the rotation component of the matrix doesn’t change. If you’re looking at something on the surface of the Earth at 1:1 scale, the surface jumps around so much that the visualization basically becomes unusable. The work-around is to represent vertex positions around a local object origin, and then do most of the matrix calculations on the CPU in double precision and only upload the final matrices to OpenGL. It’s how Crusta allows 1:1 work on the surface, even though it’s using a whole Earth model. I should write a post explaining that in detail.

      Anyway, when doing what I described on a Quadro, the effect you see is exactly what you’d expect from single-precision arithmetic, so I’m sure Quadros do OpenGL in single-precision.
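
To put a number on the loss of precision away from zero described in this thread, the following sketch computes the gap between neighboring single-precision values at a given magnitude. It assumes model units of meters, which the comments do not actually state, and the function name is made up for this example.

```python
import struct

def float32_spacing(x):
    """Gap between |x| (rounded to float32) and the next larger float32.

    Intended for finite inputs; works by incrementing the raw bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    lower = struct.unpack("<f", struct.pack("<I", bits))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - lower

spacing_at_earth_surface = float32_spacing(6.0e6)   # about 6000 km from the origin
spacing_at_5_km = float32_spacing(5000.0)           # 5000 m from the origin
```

Under that assumption, float32_spacing(6.0e6) is 0.5, so positions 6000 km from the origin can only be stored in half-meter steps, while at 5000 m the step is about half a millimeter.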

      • Thanks for taking the time to explain this, it is much appreciated. Not much else to say than “that makes sense”. – I also think this would make a great blog post. I’ve worked with graphics for some time now, and have not dealt with this. Now when it comes up, I will know what’s going on, and how to approach fixing it. Thanks again Oliver.

      • That post about doing matrices in doubles would definitely be interesting. The last I read about this subject was to keep all 3D models close to the origin, i.e. moving 3D objects on the fly so that the camera is always close to the origin. However, that’s quite non-trivial and gives all kinds of issues when trying to trace into the scene etc.

  3. One thing that I’m not sure of is whether, if you hack the graphics board to present itself as a Quadro (as is explained here & there on the web), it also does stereo.
    I can imagine it won’t do double precision floating point, since that sounds much like a hardware thing. But stereo seems simple to implement, especially these days when everything is an FBO anyway (and with triple+ buffering, for example, you really have a multitude of backbuffers).

    • I’d wager that stereo crippling on GeForces is purely a driver thing, so those hacks you’re referring to will probably do the job. It’s not something I’ll try myself, though.

      The few Quadros I do use are in our CAVE, where they have to be frame-locked to each other to provide seamless stereo across four screens. That’s definitely something that the GeForce can’t do, if for nothing else but a lack of the required connector on the board.

  4. So after all this, for working with 3ds Max for example, would a Quadro be recommended or a pro gaming GTX card? Because it seems GTX cards get crippled just so they won’t outperform Quadros, but still, in benchmarks, and even according to people around me, GTX cards outperform Quadros except in the antialiasing area…

    • I have never worked with 3ds max, so I can’t answer that. It depends on how the software drives the graphics card. If 3ds max uses fixed-functionality OpenGL to do most of its rendering, I expect you’ll get a performance hit on GeForces. If everything is done in custom shaders, you shouldn’t.

  5. NVIDIA actually added OpenGL Quad Buffer Stereo (QBS) support for one monitor only to the GeForce Windows driver in 314.07 on Feb 18, 2013. QBS in three monitors on Windows doesn’t work. QBS is also working on at least one monitor with the Geforce Linux driver.

    http://www.mtbs3d.com/phpBB/viewtopic.php?f=105&t=16849
    https://forums.geforce.com/default/topic/556297/3d-vision/thanks-for-the-partial-geforce-opengl-quad-buffer-support-but-not-on-zalman-etc-/1

    Who knows why NVIDIA relented. Maybe because DirectX 11.1 introduced stereo support for all consumer GPUs last year.

    • Thank you, that is very useful information. I’ll look into it more closely; it might change the way we run low-cost VR environments.

      The reason Nvidia relented is because I exposed their shady practices on this here blog. No, I’m not really that delusional ;) It’s probably a response to DirectX, as you say.

  6. We use a variety of Quadros and FirePros in our lab and we looked very carefully into the differences with the consumer versions. The following is a more or less complete list of the Quadro 6000 features compared to the Geforce 470 GTX, which is the closest consumer GPU in terms of performance (but not amount of memory). Some of them involve actual hardware differences, while others seem to be artificial limitations that NVIDIA uses for market differentiation.

    * A lot of memory (6GB) + ECC support. The new K6000 has 12GB with ECC.
    * 64x antialiasing with 4×4 supersampling, 128x with Quadro SLI. The Geforce is limited to 32x, but supersampling is used only in certain 8x and 16x modes.
    * Display synchronization across GPUs and across computers with the optional GSync card (new version is called Quadro Sync)
    * Support for SDI video interface, for broadcasting applications
    * GPU affinity so that multiple GPUs can be accessed individually in OpenGL. This feature is available on AMD Radeon but not on Geforce.
    * No artificial limits on rendering performance with very large meshes or computation with double precision
    * Support for quad-buffered stereo in OpenGL
    * Accelerated read-back with OpenGL. There are also dual copy engines so that 2 memory copy operations can run at the same time as rendering/computation. However, this is tricky to use.
    * Accelerated memory copies between GPUs (tricky to implement)
    * Very robust Mosaic mode where all the monitors connected to the computer are abstracted as a single large desktop. Unlike AMD’s Eyefinity, Mosaic works even across multiple GPUs (up to 4 GPUs / 16 displays with Kepler Quadros) and the performance is very good.

    Certain NVIDIA technologies are only licensed on the Quadros. For example, using the new video encoder in the Kepler GPUs for non-whitelisted/licensed applications works only on the Quadro 2000 and up. Out-of-core raytracing with NVIDIA OptiX is currently only available on the Quadros, although it seems that it will be enabled for Geforce GTX as well at some point. NVIDIA IRAY development also worked on the Quadros only the last time I checked.

    Finally, direct access to the engineering and driver teams can be priceless.

    • Thanks for the detailed write-up, but could you elaborate on this:

      No artificial limits on rendering performance with very large meshes

      I haven’t noticed any differences between GeForce and Quadro with large meshes, but I haven’t tried anything larger than around 5-10 million triangles. This is using indexed triangle sets in VBOs. At what mesh size do you see the slowdown?

      I did discover an interesting problem the other day, though. Rendering 3D video (from Kinect) was extremely slow on our Quadros, compared to same-generation GeForces (Quadro FX 5800 vs GeForce GTX 280). The 3D video renderer uploads triangle meshes at a rate of 30 Hz as they come in from the camera, using indexed triangle sets in VBOs in GL_DYNAMIC storage mode. On the GeForce, the driver would upload the new mesh to GPU memory on the first rendering call for good results, but on the Quadro it would keep the VBO in main memory for the first 5 or so rendering passes, not only slowing down rendering to a crawl, but also completely stalling the computer’s memory interface. Overall, when rendering from 3 Kinects, a GTX 280 could do about 200 FPS, and an FX 5800 could do about 10 FPS. Very strange.

  7. What if you are only interested in rendering for movies or commercials? So not a viewport. I am interested in GPU rendering such as Cycles, Mental Ray, and VRay. That is all I care about. The CPU is too slow. So what I need are CUDA cores, right? Is the GeForce crippled when creating a still image?

    • I think that depends on whether you can get away with single-precision arithmetic or require double-precision. Double-precision arithmetic is gimped on GeForces. I don’t have hard numbers and don’t do CUDA, but I think the performance drop compared to single-precision is a factor of four to eight. On a Quadro it’s two or less. For most ray casting operations and typical scenes single-precision is sufficient, but the software might default to double-precision regardless, and then you have a problem.

      The general rule is: if you’re mostly interested in a particular application, try that application on a variety of hardware and pick what works best.

    • When you say that the CPU is slow, do you mean that two 10-core Xeon processors are slow? Then you might want to try multiple cards: I saw a rendering machine at someone’s place with 3 video cards made to work together. If I remember correctly, he had a Quadro, a GTX, and a Titan, so tons of CUDA. Still, I’m just speculating; I use the CPU because I like VRay. By the way, VRay uses only the processor; maybe iray or others use the GPU.

  8. Hi, one question…

    I have an old Quadro FX 3500 that was fine for work. The problem is that not everything is black and white, work or play, because I do both: I’m continuing my training in CAD applications and 3ds Max, and I also play Call of Duty and the like.

    My question is about today’s graphics cards:

    Is the GTX 650 2GB DDR5 2GD5 better than my old Quadro 3500 for work (CAD, 3ds Max, After Effects)?

    If I buy the GTX 650, how much more or less performance, as a percentage, would I get for CAD?

    Any recommendations covering both expectations?

  9. QUESTION: if the Titan is a gaming GPU, then why is there a Quadro / Titan driver available? I’m so frustrated with this back and forth regarding the Titan.
    I’m very soon going to build a 3D workstation.
    I bought a GTX 760, to no benefit except a rough performance ceiling of 4 million polys with 30 lights (before it becomes unworkable in shaded mode).
    I’m about to shell out $5-6000 on this build and I don’t want to make the same mistake.
    Simple question: would a Titan Black out-render 2 x Quadro K4000 in SLI?

  10. (Ok. Delete my previous posts, way too crazy autocorrection)

    Funny how all professional 3D people rant about how Quadro is way better, simply because they spend 10 times as much cash on them… Being a gamer, a 3D modeler and part-time hacker, I can only say GeForce is 1:1 with Quadro both in 3ds Max and Adobe Mercury. You just need to hack the driver (for Mercury, you actually only need to delete the gfx card whitelist file), but most modern CUDA-enabled applications work fine with GTX cards with stock drivers, such as element3d for AE, vray gpu, mental ray gpu and realistic viewport shading in 3ds Max 2013+… I guess software devs in time have given Nvidia the finger, but why would Nvidia make their own gpu shader (mental ray) just as powerful on gtx