Thoughts about DRI.Next
On the way to the X Developer's Conference in Nuremberg, Eric and I chatted about how the DRI2 extension wasn't really doing what we wanted. We came up with some fairly rough ideas and even held an informal “presentation” about it.
We didn't have slides that day, having come up with the content for the presentation in the hours just before the conference started. This article is my attempt to capture both that discussion and further conversations held over roast pork dinners that week.
A brief overview of DRI2
Here's a list of the three things that DRI2 currently offers.
Authentication.
The current kernel DRM authentication mechanism restricts access to the GPU to applications connected to the DRM master. DRI2 implements this by having the application request the DRM cookie from the X server, which can then be passed to the kernel to gain access to the device.
This is fairly important because once given access to the GPU, an application can access any flink'd global buffers in the system. Given that the application sends screen data to the X server using flink'd buffers, that means all screen data is visible to any GPU-accessing application. This bypasses any GPU hardware access controls.
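The cookie handshake described above can be modeled as follows. This is a toy sketch, not real kernel code: the class and method names are invented stand-ins for the kernel's drmGetMagic()/drmAuthMagic() pair and the X server's role as DRM master.

```python
# Toy model of DRI2's magic-cookie authentication. Hypothetical names;
# the real flow uses drmGetMagic(), the DRI2Authenticate request, and
# drmAuthMagic() called by the X server as DRM master.
import os

class KernelDRM:
    """Stands in for the kernel DRM device."""
    def __init__(self):
        self._pending = {}           # magic -> client fd, issued but not yet approved
        self._authenticated = set()

    def get_magic(self, client_fd):
        # Like drmGetMagic(): hand the client a one-time cookie.
        magic = int.from_bytes(os.urandom(4), "little")
        self._pending[magic] = client_fd
        return magic

    def auth_magic(self, magic):
        # Like drmAuthMagic(): only the DRM master (the X server) may call this.
        fd = self._pending.pop(magic, None)
        if fd is None:
            return False
        self._authenticated.add(fd)
        return True

    def can_use_gpu(self, client_fd):
        return client_fd in self._authenticated

# The client asks the kernel for a cookie, passes it to the X server,
# and the server validates it with the kernel on the client's behalf.
drm = KernelDRM()
client_fd = 42                       # pretend file descriptor
cookie = drm.get_magic(client_fd)
assert not drm.can_use_gpu(client_fd)
drm.auth_magic(cookie)               # X server, as DRM master, approves
assert drm.can_use_gpu(client_fd)
```

Note that in this model the cookie is single-use: once the server has approved it, presenting it again does nothing, which matches the one-shot nature of the real handshake.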
Allocating buffers.
DRI2 defines a set of 'attachment points' for buffers which can be associated with an X drawable. An application needing a specific set of buffers for a particular rendering operation makes a request of the X server, which allocates the buffers and passes back their flink names.
The server automatically allocates new buffers when window sizes change, sending an event to the application so that it knows to request the new buffers at some point in the future.
Presenting data to the user.
The original DRI2 protocol defined only the DRI2CopyRegion request, which copied data between the allocated buffers. SwapBuffers was implemented by simply copying data from the back buffer to the front buffer. This didn't provide any explicit control over frame synchronization, so a new request, DRI2SwapBuffers, was added to expose controls for that. This new request only deals with the front and back buffers, and either copies from back to front or exchanges those two buffers.
Along with DRI2SwapBuffers, there are new requests that wait for various frame counters and expose those to GL applications through the OML_sync_control extension.
What's wrong with DRI2?
DRI2 fixed a lot of the problems present with the original DRI extension, and made reliable 3D graphics on the Linux desktop possible. However, in the four years since it was designed, we've learned a lot, and the graphics environment has become more complex. Here's a short list of some DRI2 issues that we'd like to see fixed.
InvalidateBuffers events. When the X window size changes, the buffers created by the X server for rendering must change size to match. The problem is that the client is presumably drawing to the old buffers when the new ones are allocated. Delivering an event to the client is supposed to make it possible for the client to keep up, but the reality is that the event is delivered at some random time to some random thread within the application. This leads to general confusion within the application, and often results in a damaged frame on the screen. Fortunately, applications tend to draw their contents often, so the damaged frame only appears briefly.
No information about new back buffer contents. When a buffer swap happens and the client learns about the new back buffer, the back buffer contents are always undefined. For most applications, this isn't a big deal as they're going to draw the whole window. However, compositing managers really want to reduce rendering by only repainting small damaged areas of the window. Knowing what previous frame contents are present in the back buffer allows the compositing manager to repaint just the affected area.
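The repaint-reduction idea above can be made concrete with a small sketch. This is illustrative only (the names are invented, not DRI2 protocol): if the compositor knew how many swaps ago the contents of its back buffer were drawn, it could repaint just the union of the damage from the frames the buffer has missed.

```python
# Sketch of damage tracking when the age of a reused back buffer is known.
# All names are invented for illustration; nothing here is DRI2 protocol.

def region_to_repaint(damage_history, buffer_age):
    """damage_history[0] is the damage of the frame now being drawn,
    damage_history[i] the damage from i swaps ago (each a set of tiles).
    A buffer of age N is missing the damage of the last N frames, so the
    compositor must repaint their union. Age 0 means unknown contents."""
    if buffer_age == 0 or buffer_age > len(damage_history):
        return {"full-window"}      # contents undefined: repaint everything
    missing = set()
    for frame_damage in damage_history[:buffer_age]:
        missing |= frame_damage
    return missing

history = [{"titlebar"}, {"cursor"}, {"titlebar", "clock"}]
print(region_to_repaint(history, 1))   # only the current frame's damage
print(region_to_repaint(history, 0))   # unknown contents: full repaint
```

With DRI2's always-undefined back buffers, every swap behaves like the age-0 case; knowing the previous contents is what lets the cheaper path apply.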
Un-purgable stale buffers. Between the X server finishing with a buffer and the client picking it up for a future frame, we don't need to save the buffer contents and should mark the buffer as purgable. With the current DRI2 protocols, this can't be done, which leaves all of those buffers hanging around in memory.
Driver-specific buffers. The DRI2 buffer handles are device specific, and so we can't use buffers from other devices on the screen. External video decoders/cameras/encoders can't be used with the DRI2 extension.
GEM flink has lots of issues. The flink names are global, allowing anyone with access to the device to access the flink data contents. There is also no reference to the underlying object, so the X server and client must carefully hold references to GEM objects during various operations.
Proposed changes for DRI.Next
Given the three basic DRI2 operations (authentication, allocation, presentation), how can those be improved?
Eliminate DRI/DRM magic-cookie based authentication
Kristian Høgsberg, Martin Peres, Timothée Ravier & Daniel Vetter gave a talk on DRM2 authentication at XDC this year that outlined the problems with the current DRM access control model and proposed some fairly simple solutions, including using separate device nodes: one for access to the GPU execution environment, and a separate, more tightly controlled one for access to the display engine.
Combine that with the elimination of flink for communicating data between applications, and there isn't a need for the current magic-cookie based authentication mechanism; simple file permissions should suffice to control access to the GPU.
Of course, this ignores the whole memory protection issue when running on a GPU that doesn't provide access control, but we already have that problem today, and this doesn't change that, other than to eliminate the global uncontrolled flink namespace.
Allocate all buffers in the application
DRI2 does buffer allocation in the X server. This ensures that multiple (presumably cooperating) applications drawing to the same window will see the same buffers, as is required by the GLX extension. We suspected that this wasn't all that necessary, and it turns out to have been broken several years ago. This is the traditional way in X to phase out undesirable code, and provides an excellent opportunity to revisit the original design.
Doing buffer allocations within the client has several benefits:
No longer need DRI2 additions to manage new GL buffers. Adding HiZ to the intel driver required new DRI2 code in the X server, even though X wasn't doing anything with those buffers at all.
Eliminate some X round trips currently required for GL buffer allocation.
Knowing what's in each buffer. Because the client allocates each buffer, it can track the contents of them.
Size tracking is trivial. The application tells GL the size of each viewport, and the union of all viewports should match the size of the window (otherwise there will be undefined contents on the screen). The driver can use the viewport information to size the buffers and ensure that every frame on the screen is complete.
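The viewport-union sizing described in the last item is a one-liner in practice. A minimal sketch, with an invented helper name:

```python
# Sketch: size the back buffer from the union of the client's viewports.
# Illustrative only; the function name is invented.

def buffer_size(viewports):
    """Each viewport is (x, y, width, height); returns the (width, height)
    of the bounding box of all viewports, anchored at the origin."""
    w = max((x + vw for (x, y, vw, vh) in viewports), default=0)
    h = max((y + vh for (x, y, vw, vh) in viewports), default=0)
    return (w, h)

# Two viewports tiling a 1024x768 window side by side:
print(buffer_size([(0, 0, 512, 768), (512, 0, 512, 768)]))  # (1024, 768)
```

If the union comes up short of the window, the driver can't make the frame complete, which is exactly the undefined-contents case noted above.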
Present buffers through DMA-buf
The new DMA-buf infrastructure provides a cross-driver/cross-process mechanism for sharing blobs of data. DMA-buf provides a way to take a chunk of memory used by one driver and pass it to another. It also allows applications to create file descriptors that reference these objects.
For our purposes, it's the file descriptor which is immediately useful. This provides a reliable and secure way to pass a reference to an underlying graphics buffer from the client to the X server by sending the file descriptor over the local X socket.
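Passing file descriptors over a local socket is ordinary SCM_RIGHTS machinery, and can be demonstrated without any graphics hardware. In this sketch an ordinary temporary file stands in for a dma-buf, and the `b"swap"` message is an invented placeholder for a presentation request:

```python
# Sketch of passing a file descriptor over a local socket, the same
# mechanism (SCM_RIGHTS) that would carry a DMA-buf fd from client to
# X server. A temp file stands in for a real dma-buf. Requires Linux
# and Python 3.9+ for socket.send_fds/recv_fds.
import os
import socket
import tempfile

client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "Client": create a buffer and send its fd alongside a request.
buf = tempfile.TemporaryFile()
buf.write(b"frame contents")
buf.flush()
socket.send_fds(client, [b"swap"], [buf.fileno()])

# "Server": receive the request along with the fd; the received fd is a
# fresh descriptor referring to the same underlying buffer.
msg, fds, flags, addr = socket.recv_fds(server, 16, 1)
os.lseek(fds[0], 0, os.SEEK_SET)
print(os.read(fds[0], 64))   # b'frame contents'
```

Because the kernel duplicates the descriptor for the receiver, both sides hold independent references to the same buffer; there is no global name for a third party to guess, which is the security win over flink.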
An additional benefit is that we get automatic integration of data from other devices in the system, like video decoders or non-primary GPUs. The 'Prime' support added in DRI2 version 2.8 hacks around this by sticking a driver identifier in the driverType value.
Once the buffer is available to the X server, we can create a request much like the current DRI2SwapBuffers request, except instead of implicitly naming the back and front buffers, we can pass an arbitrary buffer and have those contents copied or swapped to the drawable.
We also need a way to copy a region into the drawable. I don't know if that needs the same level of swap control, but it seems like it would be nice. Perhaps the new SwapBuffers request could take a region and offset as well, copying data when swapping isn't possible.
Managing buffer allocations
One trivial way to use this new buffer allocation mechanism would be to have applications allocate a buffer, pass it to the X server and then simply drop their reference to it. The X server would keep a reference until the buffer was no longer in use, at which point the buffer memory would be reclaimed.
However, this would eliminate a key optimization in current drivers: the ability to re-use buffers instead of freeing and allocating new ones. Re-using buffers takes advantage of the work necessary to set up the buffer, including constructing page tables, allocating GPU memory space and flushing caches.
Notifying the application of idle buffers
Once the X server is finished using a buffer, it needs to notify the application so that the buffer can be re-used. We could send these notifications in X events, but that ends up in the twisty mess of X client event handling which has already caused so much pain with InvalidateBuffers events. The obvious alternative is to send them back in a reply. That nicely controls where the data are delivered, but causes the application to block waiting for the X server to send the reply.
Fortunately, applications already want to block when swapping buffers so that they get throttled to the swap buffers rate. That is currently done by having them wait for the DRI2SwapBuffers reply. This provides a nice place to stick the idle buffer data. We can simply list buffers which have become idle since the last SwapBuffers reply was delivered.
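The whole cycle — client allocates, server holds, idle buffers come back in the swap reply and get reused — can be sketched as a toy model. Every name here is invented; the "server" arbitrarily keeps the two most recent buffers busy, standing in for the one being scanned out and the one queued for display:

```python
# Toy model of the proposed flow: the client allocates buffers, hands
# them to the server with a SwapBuffers-like request, and the reply
# lists buffers that have become idle for reuse. All names invented.
from collections import deque

class Server:
    def __init__(self):
        self._busy = deque()            # buffers still referenced for display
    def swap(self, new_buffer):
        self._busy.append(new_buffer)
        idle = []
        # Pretend the display holds only the two most recent buffers
        # (one scanned out, one queued); older ones become idle.
        while len(self._busy) > 2:
            idle.append(self._busy.popleft())
        return idle                     # delivered in the SwapBuffers reply

class Client:
    def __init__(self, server):
        self._server = server
        self._free = []                 # reuse pool
        self._next_id = 0
    def _allocate(self):
        self._next_id += 1
        return f"buffer-{self._next_id}"
    def draw_frame(self):
        buf = self._free.pop() if self._free else self._allocate()
        idle = self._server.swap(buf)   # blocking here throttles the client
        self._free.extend(idle)
        return buf

server = Server()
client = Client(server)
frames = [client.draw_frame() for _ in range(5)]
print(frames)   # after a short warm-up, three buffers cycle forever
```

The steady state is a fixed set of buffers rotating through the pool, preserving the setup work (page tables, GPU address space, cache flushes) mentioned earlier instead of redoing it every frame.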
Releasing buffer memory
Applications which update only infrequently end up with a back buffer allocated after their last frame which can't be freed by the system. The fix for this is to mark the buffer purgable, but that can only be done after all users of the buffer are finished with it.
With this new buffer management model, the application effectively passes ownership of its buffers to the X server, and the X server knows when all uses of the buffer are finished. It could mark buffers as purgable at that point. When the buffer was sent back in the SwapBuffers reply, the application would be able to ask the kernel to mark it un-purgable again.
A new extension? Or just a new DRI2 version?
If we eliminate the authentication model and replace the buffer allocation and presentation interfaces, what of the existing DRI2 protocol remains useful? The only remaining bits are the other synchronization requests: DRI2GetMSC, DRI2WaitMSC, DRI2WaitSBC and DRI2SwapInterval.
Given this, does it make more sense to leave DRI2 as it is and plan on deprecating, and eventually eliminating, it?
Doing so would place a support burden on existing applications, as they'd need to have code to use the right extension for the common requests. They'll already need to support two separate buffer management versions though, so perhaps this burden isn't that onerous?