Why does everything in my VR headset look so pixelated? It’s supposed to be using a 2160×1200 screen, but my 1080p desktop monitor looks so much sharper!
This is yet another fundamental question about VR that pops up over and over again, and like the others I have addressed previously, it leads to interesting deeper observations. So, why do current-generation head-mounted displays appear so low-resolution?
Here’s the short answer: In VR headsets, the screen is blown up to cover a much larger area of the user’s field of vision than in desktop settings. What counts is not the total number of pixels, and especially not the display’s resolution in pixels per inch, but the resolution of the projected virtual image in pixels per degree, as measured from the viewer’s eyes. A 20″ desktop screen, when viewed from a typical distance of 30″, covers 37° of the viewer’s field of vision, diagonally. The screen (or screens) inside a modern VR headset cover a much larger area. For example, I measured the per-eye field of view of the HTC Vive as around 110°x113° under ideal conditions, or around 130° diagonally (it’s complicated), or three and a half times as much as that of the 20″ desktop monitor. Because a smaller number of pixels (1080×1200 per eye) is spread out over a much larger area, each pixel appears much bigger to the viewer.
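As a quick check of the numbers above, the angle covered by a flat screen follows from basic trigonometry. This is a minimal sketch; the function name is made up for illustration, and the 130° figure is simply the measured per-eye diagonal quoted above.

```python
import math

def diagonal_fov_deg(diagonal, distance):
    """Angle subtended by a flat screen's diagonal, seen from a point centered in front of it."""
    return math.degrees(2.0 * math.atan((diagonal / 2.0) / distance))

desktop_fov = diagonal_fov_deg(20.0, 30.0)  # 20" screen viewed from 30"
vive_fov = 130.0                            # approximate per-eye diagonal FoV quoted above
ratio = vive_fov / desktop_fov

print(f"desktop: {desktop_fov:.1f} deg, ratio: {ratio:.2f}")
# prints "desktop: 36.9 deg, ratio: 3.53"
```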
Now for the long answer.
Well, actually, the short answer is fine. Current VR headsets look low-resolution because they are low-resolution, in terms of pixels per degree. What’s interesting is a related question: how can one measure the effective resolution of VR headsets, so that one could evaluate different trade-offs between price, resolution, field of view, comfort, image quality, etc., or compare different headsets? Or, so that an application developer could decide at what size important virtual objects should appear so that they can reliably be detected by users?
As I already pointed out, using the screens’ (easily measured) resolution in pixels per inch (ppi) is meaningless, because the screen is not viewed directly, but magnified by one or more lenses. For example, if a microdisplay with an insanely high ppi number is blown up to fill HxV degrees of field of vision, and a full-size panel with a moderate ppi number but the same total number of pixels is magnified to the same HxV degrees, their apparent resolutions will be more or less the same.
Pixels per degree (ppd) is a better measure, because it takes lens magnification into account, and is also compatible with how the resolution of our eyes is measured. But the problem is how to measure a headset’s ppd. It’s not as simple as dividing the total number of pixels by the total field of view. Due to the screens being flat, and due to distortion from the magnifying lenses, the ppd number changes from the center of the screen towards its edges, and not in straightforward ways. Take a look at the pictures I took in the article I already linked above. The green lines are straight horizontal and vertical lines on the headset’s screen, equal numbers of pixels apart, and the purple circles show the angle to the center of the observing camera, in increments of 5°. Notice how the relationship between green lines and purple circles changes.
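The flat-screen part of this effect can be illustrated without any lenses. The sketch below uses hypothetical numbers (a 24″-wide, 3840-pixel-wide panel viewed from 30″) and ignores lens distortion entirely, so it only shows the geometric contribution, not what happens inside a real headset.

```python
import math

def flat_screen_ppd(pixels_per_inch, distance_in, angle_deg):
    """Local pixels per degree at a given angle off the screen center.

    A point at angle a sits at x = distance * tan(a) on the screen, so one
    degree covers distance * sec^2(a) * (pi / 180) inches at that point.
    """
    a = math.radians(angle_deg)
    inches_per_degree = distance_in * (1.0 / math.cos(a)) ** 2 * math.pi / 180.0
    return pixels_per_inch * inches_per_degree

# Hypothetical panel, chosen only for illustration.
width_in, width_px, distance_in = 24.0, 3840, 30.0
ppi = width_px / width_in
half_fov = math.degrees(math.atan((width_in / 2.0) / distance_in))

naive_ppd = width_px / (2.0 * half_fov)                 # total pixels / total field of view
center_ppd = flat_screen_ppd(ppi, distance_in, 0.0)     # lowest value, at the center
edge_ppd = flat_screen_ppd(ppi, distance_in, half_fov)  # highest value, at the edge

print(f"naive {naive_ppd:.1f}, center {center_ppd:.1f}, edge {edge_ppd:.1f}")
```

The naive average lands between the center and edge values, and so overstates the resolution in the center of the screen, which is where viewers mostly look.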
To improve the measure, one could expend more effort and measure the angular resolution at the center of the screen and use that as the single quantifier, but even that doesn’t tell the whole story. It ignores aliasing, the effects of super- and/or multi-sampling, reconstruction filter quality during the lens distortion correction warping step, remaining uncorrected chromatic aberration, glare, Fresnel lens artifacts, how many of the screen’s pixels actually end up visible to the user, and an increase in perceived resolution due to tiny involuntary head movements (“temporal super-sampling,” more on that later).
What I propose, instead, is to measure a headset’s effective resolution by the visual acuity it provides, or the size of the smallest details that can reliably be resolved by a normal-sighted viewer. After all, that’s what really counts in the end. Users want to know how well they can see small or far-away objects, read gauges in virtual cockpits, or read text. Other resolution measures are just steps on the way there, so why not directly measure the end result?
Visual Acuity Measurement Procedure
It’s a good idea on paper, but the tricky bit is that it tries to measure not what a display objectively displays, but what a viewer subjectively sees. As you can’t just shove a ruler inside someone’s eyeball, measurement will have to rely on user feedback. Fortunately, there’s an established procedure for that: the standard vision test chart (“Snellen chart”) used in optometric eye exams. The chart is set up at a fixed distance from the viewer (6m in metric countries, 20′ elsewhere), and it displays glyphs or so-called optotypes of decreasing sizes, calculated to subtend specific angles of the viewer’s field of vision. Most prominently, the central line in a Snellen chart features optotypes exactly five minutes of arc (1 minute of arc = 1/60°) in diameter, with important features such as character stems exactly one minute of arc across, defining (somewhat arbitrarily) “normal” or 20/20 visual acuity (the twenty in the numerator refers to the chart’s distance of 20′), or 6/6 visual acuity in metric countries. Other acuities such as 20/10 (6/3 metric) refer to optotypes half that size, or 20/40 (6/12 metric) to optotypes double that size. Expressed in terms of angular resolution, 20/x visual acuity means that a viewer can reliably detect features that are x/20 minutes of arc across.
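The relationship between a 20/x acuity, angular feature size, and physical optotype size can be written down in a few lines. This is a minimal sketch with made-up function names; it assumes the optotype is exactly five times as wide as its critical feature, as described above.

```python
import math

def feature_arcmin(denominator, numerator=20.0):
    """Critical feature size (stroke or gap width) in minutes of arc for 20/x acuity."""
    return denominator / numerator

def optotype_size(denominator, distance, numerator=20.0):
    """Physical optotype diameter, in the same unit as `distance`.

    The optotype spans five times the angle of its critical feature.
    """
    angle_deg = 5.0 * feature_arcmin(denominator, numerator) / 60.0
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)

size_20_20_mm = optotype_size(20.0, 6000.0)  # 20/20 line on a chart 6 m away
size_20_40_mm = optotype_size(40.0, 6000.0)  # 20/40 line, twice as large

print(f"20/20: {size_20_20_mm:.2f} mm, 20/40: {size_20_40_mm:.2f} mm")
```

At 6m, the 20/20 optotypes come out a little under 9mm tall, with strokes and gaps a fifth of that.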
The process of measuring visual acuity in a repeatable manner, and giving the viewer minimal opportunity to cheat, is best demonstrated in a video (see Figure 2). Note that the measuring utility uses a specific optotype, called “Landolt C” after its inventor, that is more easily randomized and provides exactly one critical feature, a gap that is precisely one minute of arc across at 20/20 size (see Figure 1).
Figure 2: Repeatable visual acuity measurement using feedback-driven test charts, using the standardized “Landolt C” optotype. Test performed on a 28″ 3840×2160 LCD monitor, viewed from a distance of 31.5″, using 16x multisampling.
The following are results from visual acuity tests I performed myself. By themselves they are neither scientific nor indicative, due to the sample size of one. To make meaningful judgments about the resolution provided by headsets, or the improvements gained from changing parameters such as super-sampling, the measurements would have to be repeated by a large number of test subjects. Nonetheless, these give some first indications.
28″ 2160p LCD Desktop Display
These are measurements I took using my desktop monitor, an Asus PB287Q. I set up rendering parameters such that the displayed size of optotypes precisely matched the sizes of the same optotypes on a real test chart, by carefully measuring the size of the monitor’s display area, and the distance from my eyes to the screen surface.
Interestingly, while I was recording the video in Figure 2 (which generated the result in table row 2), I speculated that the result was probably limited as much by the monitor as by my eyes. While reviewing the footage, I realized I was wrong: when “cheating” and leaning in closer, or magnifying the footage, I could see the optotypes more clearly than I could during the test, and could easily notice all the mistakes I made. Meaning, my eyes were the limiting factor here, and are not as good as I thought. 🙁 I need to have a talk with my optometrist. I estimate that the visual acuity my monitor actually provides is closer to 20/10.
The real resolution of my screen, under the viewing conditions used in the test, is 86.6 pixels per degree, or 0.7 minutes of arc per pixel, which, if nothing else mattered, would correspond to a visual acuity of 20/13.8. That closely matches my result of 20/14, but looking at the test images up close, it is possible to resolve optotypes smaller than that. This can primarily be chalked up to multi-sampling.
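The conversion from pixels per degree to an equivalent Snellen fraction is simple arithmetic. In the sketch below, the function names are illustrative, and the geometry-based estimate assumes a nominal 28″ diagonal at 16:9 rather than the carefully measured display area used for the actual test, so it lands close to, but not exactly on, 86.6.

```python
import math

def center_ppd(width_px, width_in, distance_in):
    """Pixels per degree at the center of a flat panel viewed head-on."""
    pixel_pitch_in = width_in / width_px
    degrees_per_pixel = math.degrees(math.atan(pixel_pitch_in / distance_in))
    return 1.0 / degrees_per_pixel

def snellen_denominator(ppd):
    """x in 20/x if a single pixel were the smallest resolvable feature."""
    arcmin_per_pixel = 60.0 / ppd
    return 20.0 * arcmin_per_pixel

# Assumption: nominal panel size, not the measured display area.
nominal_width_in = 28.0 * 16.0 / math.hypot(16.0, 9.0)
nominal_ppd = center_ppd(3840, nominal_width_in, 31.5)

quoted = snellen_denominator(86.6)  # roughly 13.86, i.e. about 20/13.8 to 20/14

print(f"nominal ppd: {nominal_ppd:.1f}, acuity at 86.6 ppd: 20/{quoted:.1f}")
```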
This is, of course, where it gets interesting. I performed a bunch of tests (again, only on myself), using different rendering settings. My VR software, Vrui, supports several different ways to tweak image quality. The simplest is super-sampling, which exploits the fact that the 3D environment is first rendered to an intermediate frame buffer, and then resampled to the screen’s real pixels during lens distortion correction. This intermediate buffer can be as large or small as desired. For the HTC Vive, the default intermediate size is 1512×1680 pixels, which corresponds to a 1:1 scale in the center of the screen after lens distortion. Super-sampling increases or decreases the intermediate size by some factor, which gives the lens correction step more pixels to work with, and could therefore lead to less aliasing and better clarity.
The second method is multi-sampling, which follows the same approach of rendering more (sub-)pixels to reduce aliasing, but does it under the hood, so that it even works when directly rendering to a display window (such as in my desktop test above). In Vrui, the intermediate frame buffer can use multi-sampling, which has effects similar to super-sampling, but is “cheaper” in some ways, and could still, in theory, offer superior results due to better averaging filters when generating the final pixels.
The third method is an additional tweak to super-sampling. If the chosen super-sampling factor is too large, the method backfires: There are now many pixels to work with, but the reconstruction filter used during lens distortion correction is typically a simple bilinear filter, which is now the limiter on visual quality because it only uses four source pixels when calculating a final pixel color. This would degrade image quality at super-sampling factors close to and above 2.0x. To experiment with this, Vrui has the option of running a bicubic reconstruction filter during lens distortion correction, which uses 16 source pixels to calculate one final image pixel (this is the same filter used in Photoshop or The Gimp when “high quality” or “cubic” image resizing is selected).
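To make the difference between the two reconstruction filters concrete, here is a minimal pure-Python sketch of both on a grayscale image stored as a list of rows. It is not Vrui's code: the helper names are made up, edges are handled by simple clamping, and the cubic kernel is assumed to be Catmull-Rom, which is one common choice and not necessarily the one Vrui, Photoshop, or The Gimp use.

```python
import math

def texel(img, x, y):
    """Fetch a pixel, clamping coordinates to the image edges."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def bilinear(img, x, y):
    """Sample at (x, y) from the 2x2 neighborhood: 4 source pixels."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * texel(img, x0, y0) + fx * texel(img, x0 + 1, y0)
    bottom = (1 - fx) * texel(img, x0, y0 + 1) + fx * texel(img, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

def catmull_rom_weights(t):
    """Cubic weights for the four taps at offsets -1, 0, 1, 2 (assumed kernel)."""
    return (
        0.5 * (-t**3 + 2 * t**2 - t),
        0.5 * (3 * t**3 - 5 * t**2 + 2),
        0.5 * (-3 * t**3 + 4 * t**2 + t),
        0.5 * (t**3 - t**2),
    )

def bicubic(img, x, y):
    """Sample at (x, y) from the 4x4 neighborhood: 16 source pixels."""
    x0, y0 = math.floor(x), math.floor(y)
    wx = catmull_rom_weights(x - x0)
    wy = catmull_rom_weights(y - y0)
    total = 0.0
    for j in range(4):
        for i in range(4):
            total += wy[j] * wx[i] * texel(img, x0 + i - 1, y0 + j - 1)
    return total

# A horizontal ramp: both filters should reproduce it exactly away from the edges.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
print(bilinear(ramp, 3.25, 3.5), bicubic(ramp, 3.25, 3.5))
```

When the intermediate buffer is much larger than the screen, each final pixel covers many source pixels, and the wider 4×4 footprint lets the filter take more of them into account than the 2×2 footprint can.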
The following table is a non-exhaustive (well, to me, it was!) series of tests playing with those three parameters:
Besides the raw numbers, performing these tests was interesting in itself. At the base settings (1.0xSS, 1xMS, bilinear), everything looked pixelated and jaggy, but I could still resolve shapes smaller than the actual pixel grid, by continuously moving my head side-to-side. Due to head tracking, this caused the fixed pixel grid to move relative to the fixed optotypes, which meant that on subsequent frames, different points of those optotypes were sampled into the final image, which let my brain form a hyper-resolution virtual image. I know this is not a new discovery, but my Google-fu is failing me and I cannot find a reference or an agreed-upon name for this phenomenon right now, so I’m going to call it “temporal super-sampling” for the time being.
The difference going from the base settings to 16x multi-sampling was a vast visual improvement — no more jaggies, beautiful straight and uniformly wide screen protector lines — but, somehow, this improvement caused a decrease in visual acuity. This might just be an outlier, but one possible explanation is that the smoothness of multi-sampling interferes with temporal super-sampling. Because the optotypes are sampled evenly in every frame, their images don’t change appreciably under head movement.
Going from 1.0x to 2.0x super-sampling (and back to 1x multi-sampling) brought the jaggies back, but apparently it also brought temporal super-sampling back. Things looked subjectively worse than in the previous test, but the resulting acuity of 20/35 is surprisingly high.
Adding 16x multi-sampling to 2.0x super-sampling yielded another improvement, removing jaggies without a concomitant loss in visual acuity. The question is whether this setting is realistic for non-trivial VR applications, given that the graphics card would have to render 10 megapixels (or 160 mega-sub-pixels) per eye per frame, 90 times per second. The simple vision test utility, at least, didn’t break a sweat, comfortably coasting along at 480 fps.
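The rendering load quoted above follows directly from the default intermediate buffer size, with the super-sampling factor applied per axis. This is a small sketch with an illustrative function name; the 90 Hz default is the refresh rate mentioned above.

```python
BASE_W, BASE_H = 1512, 1680  # default HTC Vive intermediate buffer, per eye

def render_load(super_sampling, multi_sampling, refresh_hz=90):
    """Per-eye pixel and sub-pixel counts for given quality settings."""
    w = round(BASE_W * super_sampling)  # factor scales each axis
    h = round(BASE_H * super_sampling)
    pixels = w * h
    samples = pixels * multi_sampling
    return w, h, pixels, samples, samples * refresh_hz

w, h, pixels, samples, samples_per_second = render_load(2.0, 16)
print(f"{w}x{h}: {pixels / 1e6:.1f} MP, {samples / 1e6:.0f} M sub-pixels per frame")
# prints "3024x3360: 10.2 MP, 163 M sub-pixels per frame"
```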
At 3.0x super-sampling without multi-sampling, aliasing came back big time, even when using a bicubic reconstruction filter, dropping subjective quality below 2.0x levels. But the acuity results remained high.
In general, it appears that bicubic filtering during lens distortion correction at high super-sampling factors reduces visual acuity. I don’t have a good explanation for this, and need to look at it more closely.
I believe, and I think my preliminary tests show, that it is possible to use subjective visual acuity, measured via a feedback-driven automated system, as a reliable way to judge the visual clarity, or effective resolution, of head-mounted or regular displays. The measure takes effects besides raw display panel resolution and lens magnification into account, and even applies to future head-mounted displays that might use foveated rendering, near-eye light fields, or holography as their display technologies. In a practical sense, it also allows evaluation of different display settings, and gives users and developers a way to predict required minimum sizes for user interface elements, depending on what display hardware and rendering settings are used.
On another note, I need to admit a mistake. Before running these tests, I repeatedly claimed that the visual acuity provided by current VR headsets, specifically the HTC Vive, would be on the order of 20/60. This was based on an earlier implementation of the vision test utility, which was not yet automated and relied on self-assessment by the user. My results there were worse, because at first glance, optotypes below 20/60 looked unresolvable. But the automated method shows that I was able to identify optotypes at a rate that would be hard to explain by guessing. If my math is right, the probability of correctly identifying four or five out of five optotypes, under the assumption that the user is picking orientations randomly, would be 0.11%.
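The guessing probability is a plain binomial tail. The sketch below assumes eight possible gap orientations for the Landolt C, which is not stated above but is the value that reproduces the 0.11% figure; with only four orientations the result would be about 1.6% instead.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption: 8 equally likely orientations, so a random guess is right 1 time in 8.
p_guess = prob_at_least(4, 5, 1 / 8)
print(f"{100 * p_guess:.2f}%")  # prints "0.11%"
```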
This does not mean that it was easy identifying those optotypes, or that it would be practical or even possible to read text at those sizes. It took me several seconds each, and a lot of squinting, to identify them during the later stages of each test. Nevertheless, it appears that my old 20/60 estimate was too pessimistic.