Chromium (the browser) and DRI3

I got a note on IRC a week ago that Chromium was crashing with DRI3.

The Google team working on Chromium eventually sent me a link to the bug report. That's secret Google stuff, so you won't be able to follow the link, even though it's a bug in a free software application when running on free software drivers.

There's a bug report in the freedesktop bugzilla which looks the same to me.

In both cases, the recommended “fix” was to switch from DRI3 back to DRI2. That's not exactly a great plan, given that DRI3 offers better security between GPU-using applications, which seems like a pretty nice thing to have when you're running random GL applications from the web.

Chromium Sandboxing

I'm not entirely sure how it works, but Chromium creates a process separate from the main browser engine to talk to the GPU. That process has very limited access to the operating system via some fancy library adventures. Presumably, the hope is that security bugs in the GL driver would be harder to leverage into a remote system exploit.

Debugging in this environment is a bit tricky as you can't simply run chromium under gdb and expect to be able to set breakpoints in the GL driver. Instead, you have to run chromium with a magic flag which causes the GPU process to pause before loading the driver so you can connect to it with gdb and debug from there, along with a flag that lets you see crashes within the gpu process and the usual flag that causes chromium to ignore the GPU black list which seems to always include the Intel driver for one reason or another:

$ chromium --gpu-startup-dialog --disable-gpu-watchdog --ignore-gpu-blacklist

Once Chromium starts up, it will print out a message telling you to attach gdb to the GPU process and send that process a SIGUSR1 to continue it. Now you can happily debug and get a stack trace when the crash occurs.

Locating the Bug

The bug manifested with a segfault at the first access to a DRI3-allocated buffer within the application. We've seen this problem in the past; whenever buffer allocation fails for some reason, the driver ignores the problem and attempts to de-reference through the (NULL) buffer pointer, causing a segfault. In this case, Chromium called glClear, which tried (and failed) to allocate a back buffer causing the i965 driver to subsequently segfault.

We should probably go fix the i965 driver to not segfault when buffer allocation fails, but that wouldn't provide a lot of additional information. What I have done is add some error messages in the DRI3 buffer allocation path which at least tell you why the buffer allocation failed. That patch has been merged to Mesa master, and should also get merged to the Mesa stable branch for the next stable release.

Once I had added the error messages, it was pretty easy to see what happened:

$ chromium --ignore-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted

The first two errors were just the sandbox preventing Mesa from using my GL configuration file. I'm not sure how that's a security problem, but it shouldn't harm the driver much.

The last error is where the problem lies. In Mesa, the DRI3 implementation uses a chunk of shared memory to hold a fence object that lets Mesa know when buffers are idle without using the X connection. That shared memory segment is allocated by creating a temporary file using the O_TMPFILE flag:

fd = open("/dev/shm", O_TMPFILE|O_RDWR|O_CLOEXEC|O_EXCL, 0666);

This call “cannot fail” as /dev/shm is used by glibc for shared memory objects, and must therefore be world writable on any glibc system. However, with the Chromium sandbox enabled, it returns EPERM.

Running Without a Sandbox

Now that the bug appears to be in the sandboxing code, we can re-test with the GPU sandbox disabled:

$ chromium --ignore-gpu-blacklist --disable-gpu-sandbox

And, indeed, without the sandbox getting in the way of allocating a shared memory segment, Chromium appears happy to use the Intel driver with DRI3.

Final Thoughts

I looked briefly at the Chromium sandbox code. It looks like it needs to know intimate details of the OpenGL implementation for every possible driver it runs on; it seems to contain a fixed list of all possible files and modes that the driver will pass to open(2). That seems incredibly fragile to me, especially when used in a general Linux desktop environment. Minor changes in how the GL driver operates can easily cause the browser to stop working.