DRI3K — First Steps
Here's an update on DRI3000. I'll start by describing what I've managed to get working and then summarize discussions that happened on the xorg-devel mailing list.
Private Back Buffers
One of the big goals for DRI3000 is to finish the job of moving buffer management out of the X server and into applications. The only things still allocated by DRI2 in the X server are back buffers; everything else has moved to the client side. Yes, I know, this breaks the GLX requirement for sharing buffers between applications, but we just don't care anymore.
As a quick hack, I figured out how to do this with DRI2 today: allocate our back buffers separately by creating X pixmaps for them, and then use the existing DRI2GetBuffersWithFormat request to get a GEM handle for each one.
Of course, now that all I've got is a pixmap, I can't use the existing DRI2 swap buffer support, so for now I'm just using CopyArea to get stuff on the screen. But, that works fine, as long as you don't care about synchronization.
Handling Window Resize
The biggest pain in DRI2 has been dealing with window resize. When the window resizes in the X server, a new back buffer is allocated and the old one discarded. An event is delivered to 'invalidate' the old back buffer, but anything done between the time the back buffer is discarded and when the application responds to the event is lost.
You can easily see this with any GL application today — resize the window and you'll see occasional black frames.
By allocating the back buffer in the application, the application handles the resize within GL; at some point in the rendering process the resize is discovered, and GL creates a new buffer, copies the existing data over, and continues rendering. So, the rendered data are never lost, and every frame gets displayed on the screen (although, perhaps at the wrong size).
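The copy-on-resize step can be illustrated with a toy model. This is only a sketch: the buffer is a plain list of rows rather than GPU memory, and `resize_buffer` is a hypothetical name, not anything from Mesa or the X server.

```python
def resize_buffer(old, new_width, new_height, fill=0):
    """Allocate a buffer at the new size and copy over whatever overlaps."""
    new = [[fill] * new_width for _ in range(new_height)]
    for y in range(min(len(old), new_height)):
        row = old[y]
        n = min(len(row), new_width)
        # Preserve the pixels already rendered in the overlapping region.
        new[y][:n] = row[:n]
    return new


old = [[1, 2, 3],
       [4, 5, 6]]
grown = resize_buffer(old, 4, 3)
shrunk = resize_buffer(old, 2, 1)
```

Growing keeps every rendered pixel and pads the new area; shrinking crops. Either way the frame in progress survives, which is the property DRI2 could not offer.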
The puzzle here was how to tell that the window was resized. Ideally, we'd have the application tell us when it received the X configure notify event and was drawing the frame at the new size. We thought of a cute hack that might do this; track GL calls to change the viewport and make sure the back buffer could hold the viewport contents. In theory, the application would receive the X configure notify event, change the viewport and render at the new size.
Tracking the viewport settings for an entire frame and constructing their bounding box should describe the size of the window; at least it should describe the intended size of the window.
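A minimal sketch of that bounding-box idea, assuming the library can intercept every viewport change. `ViewportTracker` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


class ViewportTracker:
    """Accumulates the bounding box of all viewports set during one frame."""

    def __init__(self):
        self.bounds = None

    def viewport(self, x, y, width, height):
        # Would be called wherever the library sees a glViewport call.
        if self.bounds is None:
            self.bounds = Rect(x, y, width, height)
            return
        b = self.bounds
        left = min(b.x, x)
        bottom = min(b.y, y)
        right = max(b.x + b.width, x + width)
        top = max(b.y + b.height, y + height)
        self.bounds = Rect(left, bottom, right - left, top - bottom)

    def end_frame(self):
        # Hand back this frame's bounding box and reset for the next frame.
        bounds, self.bounds = self.bounds, None
        return bounds


tracker = ViewportTracker()
tracker.viewport(0, 0, 400, 300)    # left half of a split view
tracker.viewport(400, 0, 400, 300)  # right half
frame_bounds = tracker.end_frame()
```

Here the two half-window viewports combine into an 800x300 box, which would be taken as the intended window size for that frame.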
There's at least one serious problem with this plan — applications may well call glClear before calling glViewport, and as glClear does not use the current viewport, instead clearing the "whole" window, we couldn't use the viewport as an indication of the current window size.
However, what this exercise did lead us to realize was that we don't care what size the window actually is, we only care what size the application thinks it is. More accurately, the GL library just needs to be aware of any window configuration changes before the application is, so that it will construct a buffer that is not older than the application's knowledge of the window size.
I came up with two possible mechanisms here; the first was to construct a shared memory block between application and X server where the X server would store window configuration changes and signal the application by incrementing a sequence number in the shared page; the GL library would simply look at the sequence number and reallocate buffers when it changed.
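Here is a rough model of that first mechanism. The page layout (sequence, width, height) and all names are assumptions made for the sketch; a `bytearray` stands in for the shared mapping, and a real implementation would also need to make sure the reader sees a consistent snapshot.

```python
import struct

PAGE_FORMAT = "<III"  # assumed layout: sequence number, width, height


class SharedPage:
    """Stand-in for a memory block shared by the X server and the application."""

    def __init__(self):
        self.memory = bytearray(struct.calcsize(PAGE_FORMAT))

    def server_configure(self, width, height):
        # Server side: store the new configuration and bump the sequence.
        seq, _, _ = struct.unpack_from(PAGE_FORMAT, self.memory)
        struct.pack_into(PAGE_FORMAT, self.memory, 0, seq + 1, width, height)

    def read(self):
        return struct.unpack_from(PAGE_FORMAT, self.memory)


class GLDrawable:
    """Client side: reallocates only when the sequence number has moved."""

    def __init__(self, page):
        self.page = page
        self.seen_sequence = 0
        self.buffer_size = None
        self.reallocations = 0

    def before_draw(self):
        seq, width, height = self.page.read()
        if seq != self.seen_sequence:
            self.seen_sequence = seq
            self.buffer_size = (width, height)
            self.reallocations += 1


page = SharedPage()
drawable = GLDrawable(page)
page.server_configure(640, 480)
drawable.before_draw()
drawable.before_draw()          # nothing changed, so no reallocation
page.server_configure(800, 600)
drawable.before_draw()
```

The check on the fast path is a single integer comparison, which is what made this plan attractive before the network case ruled it out.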
The problem with the shared memory plan was that it wouldn't work across the network, and we have a future project in mind to replace GLX indirect rendering with local direct rendering and PutImage which still needs accurate window size tracking. More about that project in a future post though...
X Events to the Rescue
So, I decided to just have the X server send me events when the window size changed. I could simply use the existing X configure notify events, but that would require a huge infrastructure change in the application so that my GL library could get those events and have the application also see them. Not knowing what the application is up to, we'd have to track every ChangeWindowAttributes call and make sure the event_mask included the right bits. Ick.
Fortunately, there's another reason to use a new event — we need more information than is provided in the ConfigureNotify event; as you know, the Swap extension wants to have applications draw their content within a larger buffer that can have the window decorations placed around it to avoid a copy from back buffer to window buffer. So, our new ConfigureNotify event would also contain that information.
Making sure that the new ConfigureNotify event is delivered before the core ConfigureNotify event ensures that the GL library always learns about window size changes before the application does.
Splitting the XCB Event Stream
Ok, so I've got these new events coming from the X server. I don't want the application to have to receive them and hand them down to the GL library; that would mean changing every application on the planet, something which doesn't seem very likely at all.
Xlib does this kind of thing by allowing applications to stick themselves into the middle of the event processing code with a callback to filter out the events they're interested in before they hit the main event queue. That's how DRI2 captures Invalidate events, and it "works", but using callbacks from the middle of the X event processing code creates all kinds of locking nightmares.
As discussed above, I don't care when GL sees the configure events, as long as it gets them before the application finds out about the window size change. So, we don't need to synchronously handle these events, we just need to be able to know they've arrived and then handle them on the next call to a GL drawing function.
What I've created as a prototype is the ability to identify specific events and place them in a separate event queue, and when events are placed in that event queue, to bump a 'sequence number' so that the application can quickly identify that there's something to process.
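The shape of that prototype can be modelled in a few lines. This is a toy, not the libxcb API: `Connection`, `SpecialQueue`, `register_special` and the event type string `"DRI3ConfigureNotify"` are all hypothetical stand-ins.

```python
from collections import deque


class SpecialQueue:
    def __init__(self):
        self.events = deque()
        self.sequence = 0  # bumped each time an event lands in this queue


class Connection:
    """Toy stand-in for the connection's event reader."""

    def __init__(self):
        self.main_queue = deque()
        self.special = []  # (predicate, SpecialQueue) pairs

    def register_special(self, predicate):
        queue = SpecialQueue()
        self.special.append((predicate, queue))
        return queue

    def deliver(self, event):
        # Matching events are diverted; everything else goes to the application.
        for predicate, queue in self.special:
            if predicate(event):
                queue.events.append(event)
                queue.sequence += 1
                return
        self.main_queue.append(event)


class GLLibrary:
    def __init__(self, conn):
        self.queue = conn.register_special(
            lambda ev: ev["type"] == "DRI3ConfigureNotify")
        self.seen = 0
        self.size = None

    def draw(self):
        # Cheap check first; drain the queue only when something arrived.
        if self.queue.sequence != self.seen:
            self.seen = self.queue.sequence
            while self.queue.events:
                ev = self.queue.events.popleft()
                self.size = (ev["width"], ev["height"])


conn = Connection()
gl = GLLibrary(conn)
conn.deliver({"type": "DRI3ConfigureNotify", "width": 800, "height": 600})
conn.deliver({"type": "ConfigureNotify", "width": 800, "height": 600})
gl.draw()
```

The application still finds its core ConfigureNotify in the main queue, while the GL library picks up its own event lazily on the next draw call, with no callback running inside the event reader.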
Making the Event Mask Per-API Instead of Per-Client
The problem described above about using the core ConfigureNotify events made me think about how to manage multiple APIs all wanting to track window configuration. For core events, the selection of which events to receive is all based on the client; each client has a single event mask, and each client receives one copy of each event.
Monolithic applications work fine with this model; there's one place in the application selecting for events and one place processing them. However, modern applications end up using different APIs for 3D, 2D and media. Getting those libraries to cooperate and use a common API for event management seems pretty intractable. Making the X server treat each API as a separate entity seemed a whole lot easier; if two APIs want events, just have them register separately and deliver two events flagged for the separate APIs.
So, the new DRI3 configure notify events are created with their own XID to identify the client-side owner of the event. Within the X server, this required a tiny change; we already needed to allocate an XID for each event selection so that it could be automatically cleaned up when the client exited, so the only change was to use the one provided by the client instead of allocating one in the server.
On the wire, the event includes this new XID so that the library can use it to sort out which event queue to stick the event in using the new XCB event stream splitting code.
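A small model of per-API selection, assuming each library registers its own XID for the same window. The class names, the `eid` field and the numeric IDs are invented for the sketch and are not the wire format.

```python
from collections import defaultdict, deque


class Server:
    """Toy server: one event selection per client-supplied XID."""

    def __init__(self):
        self.selections = defaultdict(list)  # window -> list of selection XIDs

    def select_input(self, eid, window):
        # The XID comes from the client instead of being allocated here.
        self.selections[window].append(eid)

    def configure_window(self, window, width, height):
        # One copy of the event per selection, each tagged with its XID.
        return [{"eid": eid, "window": window, "width": width, "height": height}
                for eid in self.selections[window]]


class ClientDemux:
    """Client side: sorts incoming events into per-API queues by XID."""

    def __init__(self):
        self.queues = {}

    def add_queue(self, eid):
        self.queues[eid] = deque()
        return self.queues[eid]

    def dispatch(self, event):
        self.queues[event["eid"]].append(event)


server = Server()
demux = ClientDemux()
WINDOW, GL_EID, MEDIA_EID = 0x400001, 0x400010, 0x400011  # made-up IDs
gl_queue = demux.add_queue(GL_EID)
media_queue = demux.add_queue(MEDIA_EID)
server.select_input(GL_EID, WINDOW)
server.select_input(MEDIA_EID, WINDOW)
for event in server.configure_window(WINDOW, 1024, 768):
    demux.dispatch(event)
```

Each library sees exactly one event for the resize and never has to know that the other one exists.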
Current Status
The above section describes the work that I've got running; with it, I can run GL applications and have them correctly track window size changes without losing a frame. It's all available on the 'dri3' branches of my various repositories for xcb proto, libxcb, dri3proto and the X server.
Future Directions
The first obvious change needed is to move the configuration events from the DRI3 extension to the as-yet-unspecified new 'Swap' extension (which I may rename as 'Present', as in 'please present this pixmap in this window'). That's because they aren't related to direct rendering, but rather to tracking window sizes for off-screen rendering, whether direct, indirect, or even done by the CPU into memory.
DRI3 and Fences
Right now, I'm not synchronizing the direct rendering with the CopyArea call; that means the X server will end up with essentially random contents as the application may be mid-way through the next frame before it processes the CopyArea. A simple XSync call would suffice to fix that, but I want a more efficient way of doing this.
With the current Linux DRI kernel APIs, it is sufficient to serialize calls that post rendering requests to the kernel to ensure that the rendering requests are themselves serialized. So, all I need to do is have the application wait until the X server has sent the CopyArea request down to the kernel.
I could do that by having the X server send me an X event, but I think there's a better way that will extend to systems that don't offer the kernel serialization guarantee. James Jones and Aaron Plattner put together a proposal to add Fences to the X Sync extension. In the X world, those offer a method to serialize rendering between two X applications, but of course the real goal is to expose those fences to GL applications through the various GL sync extensions (including GL_ARB_sync and GL_NV_fence).
With the current Linux DRI implementation, I think it would be pretty easy to implement these fences using pthread semaphores in a block of memory shared between the server and application. That would be DRI-specific; other direct rendering interfaces would use alternate means to share the fences between X server and application.
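To make the idea concrete, here is a sketch of a semaphore-backed fence. It uses a thread as a stand-in for the X server and a `threading.Semaphore` in place of a semaphore living in shared memory; `Fence`, `server` and `run_frames` are hypothetical names.

```python
import queue
import threading


class Fence:
    """A one-shot fence built on a semaphore."""

    def __init__(self):
        self._sem = threading.Semaphore(0)

    def trigger(self):
        self._sem.release()

    def wait(self):
        self._sem.acquire()


def server(requests, back, presented):
    # Stand-in for the X server processing CopyArea requests in order.
    while True:
        fence = requests.get()
        if fence is None:
            return
        presented.append(back[0])  # the "copy" reads the back buffer
        fence.trigger()            # the back buffer may now be reused


def run_frames(count):
    requests = queue.Queue()
    back = [None]
    presented = []
    thread = threading.Thread(target=server, args=(requests, back, presented))
    thread.start()
    for frame in range(count):
        back[0] = frame            # render the frame
        fence = Fence()
        requests.put(fence)        # ask the server to copy it
        fence.wait()               # don't start the next frame too early
    requests.put(None)
    thread.join()
    return presented


presented = run_frames(5)
```

Because the application waits on the fence before touching the back buffer again, the server always copies the frame it was asked to copy rather than a half-drawn successor.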
Swap/Present — The Second Extension
By simply using CopyArea for my application presentation step, I think I've neatly split this problem into manageable pieces. Once I've got the DRI3 piece working, I'll move on to fixing the presentation issue.
By making that depend solely on existing core Pixmap objects as the source of data to present, I can develop that without any reference to DRI. This will make the extension useful to existing X applications that currently have only CopyArea for this operation.
Presentation of application contents occurs in two phases; the first is to identify which objects are involved in the presentation. The second is to perform the presentation operation, either using CopyArea, or by swapping pages or the entire frame buffer. For offscreen objects, these can occur at the same time. For onscreen, the presentation will likely be synchronized with the scanout engine.
The second form will mean that the Fences that mark when the presentation has occurred will need to be signaled only once the operation completes.
A CopyArea operation means that the source pixmap is "ready" immediately after the Copy has completed. Doing the presentation by using the source pixmap as the new front buffer means that the source pixmap doesn't become "ready" until after the next swap completes.
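The difference in when a pixmap becomes reusable can be modelled like this. It is a sketch under simplifying assumptions: presentation is treated as instantaneous, and `Presenter` with its `"copy"` and `"flip"` modes is a hypothetical construct, not part of any extension.

```python
class Presenter:
    def __init__(self, mode):
        self.mode = mode    # "copy" or "flip"
        self.front = None   # pixmap currently acting as the front buffer
        self.ready = []     # pixmaps released back to the application, in order

    def present(self, pixmap):
        if self.mode == "copy":
            # Contents were copied out, so the source is reusable right away.
            self.ready.append(pixmap)
        else:
            # The pixmap becomes the front buffer; only the previous front
            # buffer is released, and only now that it has been replaced.
            previous, self.front = self.front, pixmap
            if previous is not None:
                self.ready.append(previous)


copier = Presenter("copy")
copier.present("A")
copier.present("B")

flipper = Presenter("flip")
flipper.present("A")
ready_after_first_flip = list(flipper.ready)
flipper.present("B")
```

In flip mode, pixmap "A" stays busy until "B" replaces it, so an application with a single back buffer would have nothing to draw into in the meantime.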
What I don't know now is whether we'll need to report up-front whether the presentation will involve a copy or a swap. At this point, I don't think so — the application will need two back buffers in all cases to avoid blocking between the presentation request and the presentation execution. Yes, it could use a fence for this, but that still sticks a bubble in the 3D hardware where it's blocked waiting for vblank instead of starting on the next frame immediately.
Plan of Attack
Right now, I'm working on finishing up the DRI3 piece:
Replace the DRI2 buffer allocation kludge with actual local buffer allocation, mapping them into pixmaps using FD passing.
Replace the DRI2 authentication scheme with having the X server open the DRI object, preparing it for rendering and passing it back to the application.
Work on the XCB pieces to get the split event-queue stuff landed upstream.
Implement the Fencing stuff to correctly serialize access to the pixmap.
The first three seem fairly straightforward. The fencing stuff will involve working with James and Aaron to integrate their XSync changes into the server.
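The FD passing mentioned in the first item relies on a standard Unix-domain socket facility. The sketch below shows only that mechanism, assuming Python 3.9 or later on a Unix system; a temporary file stands in for a GPU buffer, and the `b"pixmap"` message is a made-up payload rather than a real protocol request.

```python
import os
import socket
import tempfile


def pass_buffer_fd():
    app_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as buffer_file:
            buffer_file.write(b"pixels")
            buffer_file.flush()
            # Application side: send the descriptor alongside a small message.
            socket.send_fds(app_sock, [b"pixmap"], [buffer_file.fileno()])
            # Server side: receive the message and a new descriptor that
            # refers to the same underlying file.
            message, fds, _flags, _addr = socket.recv_fds(server_sock, 1024, 1)
        received_fd = fds[0]
        try:
            # The sender has closed its copy, but ours keeps the file alive.
            os.lseek(received_fd, 0, os.SEEK_SET)
            contents = os.read(received_fd, 6)
        finally:
            os.close(received_fd)
    finally:
        app_sock.close()
        server_sock.close()
    return message, contents


message, contents = pass_buffer_fd()
```

The receiver ends up with its own descriptor for the same object, so no global name or authentication token has to be shared to hand a buffer across.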
After that, I'll start working on the presentation piece. Foremost there is figuring out the right name for this new extension; I started with the name 'Swap' as that's the GL call it implements. However, 'Swap' is quite misleading as to the actual functionality; a name more like 'Present' might provide a better indication of what it actually does. Of course, 'Present' is both a verb and a noun, with very different connotations. Suggestions on this most complicated part of the project are welcome!