Zero-latency Rendering

I finally managed to get the Oculus Rift DK2 fully supported in my Vrui VR toolkit, and while there are still some serious issues, such as getting the lens distortion formulas and internal HMD geometry exactly right, I’ve already noticed something really neat.

I have a bunch of graphically simple applications that run at ridiculous frame rates (some get several thousand fps on an Nvidia GeForce GTX 770), and with some new rendering configuration options in Vrui 4.0 I can disable vsync and render directly into the display window’s front buffer. In other words, I can let these applications “race the beam.”

Disabling vsync and rendering into the front buffer has two main results. For one, the CPU and graphics card get really hot (so this is not something you want to do naively). But second, let’s assume that some application can render 1,000 fps. This means that every millisecond, a new complete video frame is rendered into video scan-out memory, where it gets picked up by the video controller and sent across the video link immediately. In other words, almost every line of the Rift’s display gets a “fresh” image, based on the most up-to-date tracking data, and flashes this image to the user’s retina without further delay. Total motion-to-photon latency for the entire screen is now down to around 1ms. And the result of that is by far the most solid VR I’ve ever seen.
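To put rough numbers on that argument, here is a minimal sketch of a deliberately simplified model. It assumes each frame samples tracking when rendering starts, takes one full frame period to render, and lands in scan-out memory all at once. The 1920 scan lines are an assumption about the DK2 panel's scan direction, and `line_data_age_ms` is a made-up helper name, not part of Vrui.

```python
import math

TOTAL_LINES = 1920   # assumption: DK2 panel scanned as 1920 lines
REFRESH_HZ = 75.0    # DK2 refresh rate from the post


def line_data_age_ms(line, total_lines, refresh_hz, render_fps):
    """Age (ms) of the tracking data shown on one scan line.

    Simplified model: frame k samples tracking at k/render_fps, finishes
    one frame period later, and appears in scan-out memory instantly.
    Vblank is ignored.
    """
    frame_period = 1.0 / render_fps
    # Time at which the video controller reads this line.
    scanout_time = (line / total_lines) / refresh_hz
    # Newest frame that has already finished rendering at that time.
    newest_done = math.floor(scanout_time / frame_period) - 1
    sample_time = newest_done * frame_period
    return (scanout_time - sample_time) * 1000.0


if __name__ == "__main__":
    for fps in (75.0, 400.0, 1000.0):
        ages = [line_data_age_ms(l, TOTAL_LINES, REFRESH_HZ, fps)
                for l in range(TOTAL_LINES)]
        print(f"{fps:6.0f} fps: data age {min(ages):.2f} to {max(ages):.2f} ms")
```

Under this pessimistic model, every line shows data between one and two render periods old, so at 1,000 fps the whole screen stays in the low-millisecond range instead of the bottom lagging a full refresh behind the top.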

Not entirely useful, but pretty cool nonetheless.

22 thoughts on “Zero-latency Rendering”

  1. Does it send the pixels whole, or does it send red, green and blue sequentially? Is 1ms fast enough to send a new frame per subpixel?

    Btw, can you make stuff render one pixel at a time, instead of wasting a whole screen’s worth of pixels to update just one? Or is there no way to guarantee sync with the “beam”, so no point in trying to render just the current pixel?

    • I assume the scan-out hardware grabs at least 32 bits per read, so it would only update all components of a pixel.

      There’s no way I found to synchronize to the level of a scan line, let alone a pixel. I have a mode that does lens undistortion immediately after vsync to race the beam even when an app has normal frame rates, but there’s always tearing because the rendering thread doesn’t wake up fast enough. This would need kernel and graphics driver support. Apparently, the state of vsync-related OpenGL extensions is abysmal.

      • Is the refresh rate consistent enough that you can preemptively tell it to start rendering before the next cycle begins, based on how long it has been since the current cycle began, in order to have it render in sync with the scan? Or is the issue actually the consistency of the computer’s clock?

        • Clock accuracy is one issue, but the main problem I’ve had is OS scheduler granularity. If I put a thread to sleep and tell the OS to wake it up exactly at time T, it actually wakes up at T+dt, where dt can be up to several ms. I could get around that by active waiting, but that would peg the CPU at full utilization and lock out background processes, so it’s not ideal.
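A common compromise between the two options in the reply above (not something the post says Vrui does) is to sleep for most of the interval and actively wait only for the last stretch. The sketch below assumes a hypothetical `spin_margin` of 2 ms, chosen to roughly cover the scheduler overshoot described above. It still burns CPU during that margin and cannot prevent the thread from being preempted.

```python
import time


def wait_until(deadline, spin_margin=0.002):
    """Wait until `deadline` (in time.perf_counter() seconds).

    Sleeps until `spin_margin` seconds before the deadline, then
    busy-waits the remainder. Returns the overshoot in seconds.
    """
    remaining = deadline - time.perf_counter()
    if remaining > spin_margin:
        # Coarse wait: the OS may wake us late, hence the margin.
        time.sleep(remaining - spin_margin)
    # Fine wait: poll the clock until the deadline has passed.
    while time.perf_counter() < deadline:
        pass
    return time.perf_counter() - deadline


if __name__ == "__main__":
    target = time.perf_counter() + 0.010   # wake up 10 ms from now
    overshoot = wait_until(target)
    print(f"woke up {overshoot * 1e6:.1f} microseconds after the deadline")
```

If the sleep itself overshoots by more than the margin, the function simply returns late, so the margin has to be tuned per system.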

  2. Thanks for sharing your information! A few questions:
    * “directly into the display window’s front buffer”: how do you enable this? I looked into the current version of Gameworks VR, but it doesn’t support the front buffer rendering part yet.
    * Your camera view updates come from the head orientation. Correct?
    * For the 1000 fps example, you upload a full frame all the time or just some areas that have changed and are relevant for the current scanout?
    * Does the scanout take 16 ms or does it show all pixels at once, updating every 16 ms?

      1. It’s a low-level GLX thing. You can select single- or double-buffered visuals when creating an OpenGL window in the X window system.
      2. Orientation and position, and world animation (if there is any). That’s this “method’s” only benefit over time warp.
      3. Full frame every time. I didn’t find a reliable way to query where the beam is, and even then, you’d need to know where it is before you render, and by the time you’re done rendering, the beam is already elsewhere.
      4. Not sure I understand your question correctly. Total scan-out still takes a little less than 13ms (75 Hz minus vblank period), but now the contents of the frame that are being scanned out are updated while scan-out is happening.
  3. I read everything. This is very very good work. “Fixing” these fundamental workings in the rendering pipeline sets the foundation for VR in the future. Thank you!

  4. This is how I reduce the latency in iRacing: just vsync off and the lowest settings possible to get 300 to 400 fps. Still, 400 fps is not enough; the tearing is too visible.

    • Someone should look into that. I noticed zero tearing with apps running at 1,000+ fps, but a lot of horrific tearing when running close to 75 fps (well, duh). I could take a high-fps application and cap the frame rate (fortunately Vrui has a setting for that) and see if there’s some threshold, or if it’s just a gradual decline in quality until it becomes unusable.
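One way to size the effect before running that experiment: with vsync off, each buffer update during scan-out produces a tear line, and the visible jump at that line is however far the image moved between two consecutive rendered frames. The sketch below is back-of-the-envelope arithmetic only. The head speed of 100 degrees per second and the 10 pixels per degree are assumed values for illustration, and `tear_stats` is a hypothetical helper. It says nothing about where the perceptual threshold actually lies.

```python
def tear_stats(render_fps, refresh_hz, head_speed_deg_s, pixels_per_degree):
    """Return (tear lines per refresh, image jump in pixels at each tear)."""
    tears_per_refresh = render_fps / refresh_hz
    # Image shift between two consecutive rendered frames.
    offset_px = head_speed_deg_s / render_fps * pixels_per_degree
    return tears_per_refresh, offset_px


if __name__ == "__main__":
    HEAD_SPEED = 100.0   # assumption: degrees per second
    PPD = 10.0           # assumption: display pixels per degree
    for fps in (75.0, 400.0, 1000.0):
        tears, offset = tear_stats(fps, 75.0, HEAD_SPEED, PPD)
        print(f"{fps:6.0f} fps: {tears:5.1f} tears/refresh, "
              f"{offset:5.2f} px jump per tear")
```

Higher frame rates mean more tear lines but a proportionally smaller jump at each one, which is at least consistent with tearing fading out somewhere between 400 and 1,000 fps.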

  5. YES!

    What are you using for tracking and what is the update rate?

    I once got end-to-end latency down to under 8ms using just-in-time scanline rendering at about 20,000 Hz onto an old CRT for fast response (when OLEDs weren’t accessible). Unfortunately tracking was limited to about 1500 Hz. The scene was updated every few scanlines.
    http://www.cs.unc.edu/xcms/wpfiles/dissertations/jerald.pdf

    It would be interesting to experiment with what it would look like to pitch the head up or down really quickly when doing just-in-time pixels. If you pitched the head up at the same rate that pixels scanned out in the vertical direction then theoretically the vertical field of view would go to a single scanline. If you pitched the head down then the vertical field of view would expand. But who knows how the brain would perceive that.

    Jason

    • I’m using the standard Rift head tracker (via Oculus’ own tracking service), so the update rate is 1000Hz, or 500Hz if the Rift’s firmware still packs two IMU updates into each USB packet. Tracking latency should be fractions of a millisecond in the first case, and 1ms+epsilon in the second.

  6. Start “where” and render “there”- sorry does not compute.

    Latency remains the same due to the rasterizers work. And the environment already does revolve around the “camera”.

    If you prerender every possible position you still need to relay the tracking in order to move the pixels to the framebuffer ?

    It remains to be 2.5 D. Even if the bus system does not care for “where” the control bits are added to change which rasterized parts are displayed and even when the pitch and yaw speeds and moving speed does narrow down the pre-rasterize stack? Stash?.

    To determine “when” I’ll look at something somehow is hard to pass by. And nature supplies us with a comfortable latency and synchronizing set already… considered sound and light don’t travel the same speed and reading lips still makes sense.

    Sandwich the graphics chip with a game controller port?

    Resource please.

    • Line-by-line. See the latency timing diagram in this post. If the DK2 buffered one frame, it would have a display latency of at least 1/75s=13ms (time it takes to send a full frame over HDMI at 75Hz refresh rate).

      • So is there a rolling shutter effect if you render at only 75fps (where the bottom of the screen is 1/75 s more delayed than the top of the screen)?
        Also do you happen to know if the HTC vive and oculus CV1 also update line by line?

        I can’t see a diagram in the post above.

        • That is correct. Oculus’ SDK accounts for that by time-warping the bottom of the screen farther into the future than the top of the screen (where the top is actually on the right of the headset, and the bottom on the left).

          Based on my latency measurements, the Vive has a globally-exposing screen. I don’t know about the Rift CV1, but I’ve heard rumors that it has global exposure as well.

          “I can’t see a diagram in the post above.”

          Scroll down to Figure 2.
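The rolling-shutter replies above boil down to each scan line being shown a little later than the one before it. A minimal sketch of that per-line delay follows. It is not the Oculus SDK's implementation. The 1920 lines and the `blank_fraction` parameter (the share of each refresh period spent in vblank) are assumptions for illustration, and `line_scanout_delay_ms` is a hypothetical helper.

```python
def line_scanout_delay_ms(line, total_lines, refresh_hz, blank_fraction=0.0):
    """Delay (ms) between the start of scan-out and the given line.

    `blank_fraction` is the assumed share of each refresh period spent
    in vertical blanking, during which no lines are sent.
    """
    active_time = (1.0 - blank_fraction) / refresh_hz
    return line / total_lines * active_time * 1000.0


if __name__ == "__main__":
    TOTAL_LINES = 1920   # assumption: DK2 panel scanned as 1920 lines
    for line in (0, TOTAL_LINES // 2, TOTAL_LINES):
        delay = line_scanout_delay_ms(line, TOTAL_LINES, 75.0,
                                      blank_fraction=0.05)
        print(f"line {line:4d}: shown {delay:5.2f} ms after scan-out starts")
```

A time warp that accounts for the rolling display would predict the head pose for the scan-out start time plus this delay, so the last lines get predicted further into the future than the first ones.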
