Present Extension Redirection

Multi-buffered applications have always behaved poorly in the presence of the Composite extension:

  • Updating involves multiple copies of the window contents, first from back buffer to composite redirect buffer, thence from the composite buffer to the screen back buffer, and then from the screen back buffer to the scanout buffer.

    The last copy is amenable to page flipping, but that would require the EGL buffer age extension so that the compositor could reduce the cost of presenting subsequent frames by being able to track the contents of the back buffers.

  • The application is not informed about when the actual screen presentation occurs.

Owen Taylor suggested that Present should offer a way to 'redirect' operations to the compositing manager as a way to solve these problems. This posting is my attempt to make that idea a bit more concrete given the current Present design.

Design Goals

Here's a list of features I think we should try to provide:

  1. Provide accurate information to applications about when presentation to the screen actually occurs. In particular, GLX applications using GLX_OML_sync_control should receive the correct information in terms of UST and MSC for each Swap Buffers request.

  2. Ensure that applications still receive correct information as to the contents of their buffers; in particular, we want to be able to implement EGL_EXT_buffer_age in a useful manner.

  3. Avoid needing to "un-redirect" full-screen windows to get page flipping behavior.

  4. Eliminate all extra copies. A windowed application may still perform one copy from back buffer to scanout buffer, but there shouldn't be any reason to copy contents to the composite redirection buffer or the compositing manager back buffer.

Simple Present Redirection

With those goals in mind, here's what I see as the sequence of events for a simple windowed application doing a new full-window update without any translucency or window transformation in effect:

  1. Application creates back buffer, draws new frame to it.

  2. Application executes PresentRegion. In this example, the 'valid' and 'update' parameters are 'None', indicating that the full window should be redrawn.

  3. The server captures the PresentRegion request and constructs a PresentRedirectNotify event containing sufficient information for the compositor to place that image correctly on the screen:

    • target window for the presentation
    • source pixmap containing the new image
    • idle fence to notify when the source pixmap is no longer in use.
    • serial number from the request.
    • target MSC for the presentation. This should probably just be the computed absolute MSC value, and not the original target/divisor/remainder values provided by the application.

  4. The compositing manager receives this event and constructs a new PresentRegion request using the provided source pixmap, but this time targeting the root window, and constructing a 'valid' region which clips the pixmap to the shape of the window on the screen.

    This request would use the original application's idle fence value so that when complete, the application would get notified.

    This request would need to also include the original target window and serial number so that a suitable PresentCompleteNotify event can be constructed and delivered when the final presentation is complete.

  5. The server executes this new PresentRegion operation. When complete, it delivers PresentCompleteNotify events to both the compositing manager and the application.

  6. Once the source pixmap is no longer in use (either the copy has completed, or the screen has flipped away from this pixmap), the server triggers the idle fence.
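
The compositing manager's part of this sequence (step 4) can be sketched in a few lines. This is only a toy model in Python, not real X bindings: the class names, the field subset, the function redirect_to_root, and the XIDs are all made up for illustration, and it assumes the window's visible shape is already known as a list of rectangles.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class RedirectNotify:
    """Illustrative subset of the PresentRedirectNotify fields listed above."""
    window: int
    pixmap: int
    idle_fence: int
    serial: int
    target_msc: int


@dataclass
class PresentRegion:
    """Illustrative model of the request the compositing manager sends back."""
    window: int
    pixmap: int
    valid_area: List[Rect]
    x_off: int
    y_off: int
    idle_fence: int
    target_msc: int
    notifies: List[Tuple[int, int]] = field(default_factory=list)


def redirect_to_root(event: RedirectNotify, root: int,
                     window_pos: Tuple[int, int],
                     visible_shape: List[Rect]) -> PresentRegion:
    """Step 4: re-target the application's pixmap at the root window."""
    return PresentRegion(
        window=root,
        pixmap=event.pixmap,             # same pixmap, no intermediate copy
        valid_area=list(visible_shape),  # clip to the window's on-screen shape
        x_off=window_pos[0],             # place the pixmap where the window sits
        y_off=window_pos[1],
        idle_fence=event.idle_fence,     # the application's own fence is reused
        target_msc=event.target_msc,
        notifies=[(event.window, event.serial)],  # who else hears about completion
    )


# Hypothetical XIDs, chosen only for the example.
event = RedirectNotify(window=0x400001, pixmap=0x400010,
                       idle_fence=0x400020, serial=7, target_msc=1234)
request = redirect_to_root(event, root=0x1D5, window_pos=(50, 80),
                           visible_shape=[(0, 0, 640, 480)])
```

The point of the sketch is that nothing is copied: the pixmap, fence and target MSC pass straight through, and only the target window, offset and clip change.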

Multiple Application Redirection

If multiple applications perform PresentRegion operations within the same frame, then the compositing manager will receive multiple PresentRedirectNotify events, and can simply construct multiple new PresentRegion requests. If these are all queued to the same global MSC, they will execute at the same frame boundary. No inter-operation dependency exists here.

Complex Presentations

Ok, so the simple case looks like it's pretty darn simple to implement, and satisfies the design goals nicely. Let's look at three more complicated cases in common usage: the first is with translucency, the second with scaling application images down to thumbnails, and the third with partial application window updates.

Redirection with Translucency

If the compositing manager discovers that a portion of the updated region overlays or is overlaid by something translucent (either another window, or drop shadows, or something else), then a composite image for that area must be constructed before the whole can be presented. Starting when the compositing manager receives the event, we have:

  1. The compositing manager receives this event. Using the new pixmap, along with pixmaps for the other involved windows and graphical elements, the compositing manager constructs an updated view for the affected portion of the screen in a back buffer pixmap. Once complete, a PresentRegion operation that uses this back buffer pixmap is sent to the X server.

    Again, the original target window and serial number are also delivered to the server so that a suitable PresentCompleteNotify event can be delivered to the application.

  2. The server executes this new PresentRegion operation; PresentCompleteNotify events are delivered, and idle fences triggered as appropriate.

Redirection with Transformation

Transformation of the window contents means that we cannot always update a portion of the back buffer directly from the provided application pixmap as that will not contain the window border. Contents generated from a region that includes both application pixels and window border pixels must be sourced from a single pixmap containing both sets of pixels.

One option that I've discussed in the past to solve this would be to have the original application allocate the pixmap large enough to hold both the application contents and the surrounding window border. Have it draw the application contents at the correct offset within this pixmap, and then have the window manager contents drawn around that; either automatically by the X server, or even manually by the compositing manager.

That would be mighty convenient for the compositing manager, but would require significant additional infrastructure throughout the X server and—even harder—the drawing system (OpenGL or some other system). There's another reason to want this though, and that's for sub-frame buffer scanout page swapping.

The second option would be for the compositing manager to combine these images itself; there's a nice pixmap already containing the window manager image—the composite redirect buffer. Taking the provided source pixmap and copying it directly to the target window will construct our composite image, just as if we had no Present redirection in place.

This will cost an additional copy though, which we've promised to avoid. Of course, as it's just for thumbnailing or other visual effects, perhaps the compositing manager could perform this operation at a reduced frame rate, so that overall system performance didn't suffer.

Retaining Access to the Application Buffer

Above, I discussed having the idle fence from the redirected PresentRegion operation be sent along with the replacement PresentRegion operation. This ignores the fact that the compositing manager may well need the contents of that application frame again in the future, when displaying changes for other applications that involve the same region of the screen.

With the goal of making sure the idle fences are triggered as soon as possible so that applications can re-use or free the buffers quickly, let's think about when the triggering can occur.

  1. Full-screen flipped applications. In this case, the application's idle fence can be triggered once the application provides a new frame and the X server has flipped to that new frame, or some other scanout buffer.

  2. Windowed, copied applications. In this case, the application's idle fence can be triggered once the application provides a new frame to the compositing manager, and the X server doesn't have any presentations queued.

In both cases, we require that both the X server and the compositing manager be 'finished' with the buffer before the application's idle fence should be triggered.
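
The "both must be finished" rule amounts to a tiny piece of bookkeeping per application buffer. The sketch below is a hypothetical model (the class BufferUse and the holder names are invented here), and says nothing about whether that bookkeeping lives in the server or in the compositing manager.

```python
class BufferUse:
    """Tracks who still needs one application buffer (hypothetical model)."""

    def __init__(self) -> None:
        # Both parties hold the buffer once it has been redirected.
        self.holders = {"server", "compositor"}
        self.idle_fence_triggered = False

    def release(self, holder: str) -> bool:
        """Record that `holder` is finished; trigger the fence once nobody is left."""
        self.holders.discard(holder)
        if not self.holders:
            self.idle_fence_triggered = True
        return self.idle_fence_triggered


buf = BufferUse()
after_server = buf.release("server")          # copy finished or flipped away
after_compositor = buf.release("compositor")  # a newer frame has replaced this one
```

Releasing in the opposite order gives the same result; the fence only fires on the last release.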

One easy way to get this behavior is for the compositing manager to create a new idle fence for its operations. When that is triggered, it would receive an X event and then trigger the application's idle fence as appropriate. This would add considerable latency to the application's idle fence: a round trip through the compositing manager.

The alternative would be to construct some additional protocol to make the application's idle fence 'dependent' on the Present operation and some additional state provided by the compositing manager.

Some experimentation here is warranted, but my experience with latency in this area is that it causes applications to end up allocating another back buffer as the idle notification arrives just after a buffer allocation request comes down to the rendering library. Definitely sub-optimal.

An Aside on Media Stream Counters

The GLX_OML_sync_control extension defines the Media Stream Counter (MSC) as a counter unique to the graphics subsystem which is incremented each time a vertical retrace occurs. That would be trivial if we had only one vertical retrace sequence in the world. However, we actually have N+1 such counters, one for each of the N active monitors in the system and a separate fake counter to be used when none of the other counters is available.

In the current Present implementation, windows transition among the various Media Stream Counter domains as they move between the various monitors, and those monitors get turned on and off. As they move between these counter domains, Present tracks a global offset from their original domain. This offset ensures that the MSC value remains monotonically increasing as seen by each window. What it does not ensure is that all windows have comparable MSC sequence values; two windows on the same monitor may well have different MSC values for the same physical retrace event.

And, even moving a window from one MSC domain to another and back won't make it return to the original MSC sequence values due to differences in refresh rates between the monitors.

Internally, Present asserts that each CRTC in the system identifies a unique MSC domain, and it has a driver API which identifies which CRTC a particular window should be associated with. Once a particular CRTC has been identified for a window, client-relative MSC values and CRTC-relative MSC values can be exchanged using an offset between that CRTC MSC domain and the window MSC domain.
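
The offset arithmetic is easy to see in a toy model. This is a sketch of the idea only, not the server's code; the class WindowMscDomain and its method names are invented for the example, and the MSC values are arbitrary.

```python
class WindowMscDomain:
    """Per-window offset between the window MSC and its current CRTC MSC."""

    def __init__(self) -> None:
        self.offset = 0  # window MSC minus CRTC MSC

    def to_window(self, crtc_msc: int) -> int:
        return crtc_msc + self.offset

    def to_crtc(self, window_msc: int) -> int:
        return window_msc - self.offset

    def change_crtc(self, old_crtc_msc: int, new_crtc_msc: int) -> None:
        """Pick a new offset so the window MSC continues from where it was."""
        window_msc = self.to_window(old_crtc_msc)
        self.offset = window_msc - new_crtc_msc


# A window dragged from a CRTC at MSC 1000 to one that is only at MSC 40.
moved = WindowMscDomain()
moved.change_crtc(old_crtc_msc=1000, new_crtc_msc=40)

# A window that has always lived on the second CRTC.
native = WindowMscDomain()

moved_msc = moved.to_window(41)    # next retrace on the second CRTC
native_msc = native.to_window(41)  # same physical retrace, different value
```

Both windows see the same retrace, yet report different MSC values (1001 and 41 here), which is exactly the non-comparability described above.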

The Intel driver assigns CRTCs to windows by picking the CRTC showing the greatest number of pixels for a particular window. When two CRTCs show the same number of pixels, the Intel driver picks the first in the list.
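
That selection rule can be written down directly. The sketch below is not the driver's code; the function names are hypothetical, windows and CRTCs are simplified to axis-aligned rectangles, and returning None for a window that is on no CRTC stands in for falling back to the fake counter.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def overlap_area(a: Rect, b: Rect) -> int:
    """Number of pixels shared by two rectangles."""
    width = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    height = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def pick_crtc(window: Rect, crtcs: List[Rect]) -> Optional[int]:
    """Index of the CRTC showing the most window pixels; first one wins a tie."""
    best_index: Optional[int] = None
    best_area = 0
    for index, crtc in enumerate(crtcs):
        area = overlap_area(window, crtc)
        if area > best_area:  # strict comparison keeps the earlier CRTC on ties
            best_index, best_area = index, area
    return best_index


crtcs = [(0, 0, 1920, 1080), (1920, 0, 1920, 1080)]
straddling = pick_crtc((1820, 100, 200, 200), crtcs)   # 100 pixels wide on each
mostly_right = pick_crtc((1900, 0, 400, 400), crtcs)
offscreen = pick_crtc((5000, 5000, 10, 10), crtcs)
```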

Vblank Synchronization and Multiple Monitors

Ok, so each window lives in a particular MSC domain, clocked by the MSC of the CRTC the driver has associated it with. In an un-composited world, this makes picking when to update the screen pretty simple; Present updates the screen when vblank happens in the CRTC associated with the window.

In the composite redirected case, it's a bit harder -- all of the PresentRegion operations are going to target the root window, and yet we want updates for each window to be synchronized with the monitor containing that window. Of course, the root window belongs to a single MSC domain (likely the largest monitor, using the selection algorithm described above from the Intel driver). So, any PresentRegion requests will be timed relative to that single monitor.

I think what is required here is for the PresentRegion request to take an optional CRTC argument, which would then be used as the MSC domain instead of the window MSC domain. All of the timing arguments would be interpreted relative to that CRTC MSC domain.

The PresentRedirectNotify event would then contain the relevant CRTC and the MSC value would be relative to that CRTC.

A clever compositing manager could then decompose a global PresentRegion operation into per-CRTC PresentRegion operations and ensure that multiple monitors were all synchronized correctly.
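
As a rough illustration of that decomposition, here is a sketch that splits one update rectangle into per-CRTC pieces. It assumes rectangular CRTCs in root-window coordinates; the names intersect and split_by_crtc and the CRTC labels are hypothetical. Each resulting piece would then go into its own PresentRegion with the matching target-crtc.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two rectangles, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)


def split_by_crtc(update: Rect, crtcs: Dict[str, Rect]) -> Dict[str, Rect]:
    """Map each CRTC touched by `update` to the part of the update it shows."""
    pieces: Dict[str, Rect] = {}
    for name, area in crtcs.items():
        piece = intersect(update, area)
        if piece is not None:
            pieces[name] = piece
    return pieces


crtcs = {"CRTC-A": (0, 0, 1920, 1080), "CRTC-B": (1920, 0, 1920, 1080)}
pieces = split_by_crtc((1800, 0, 400, 300), crtcs)
```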

We could take this even further and have the PresentRegion capable of passing a smaller CRTC-sized pixmap down to the kernel, effectively providing per-CRTC pixmaps with no visible explicit protocol...

Other Composite Users

Ok, so the above discussion is clearly focused on getting the correct contents onto the screen with minimal copies along the way. However, what I've ignored is how to deal with other applications, also using Composite at the same time. They're all going to expect that the composite redirect buffers will contain correct window contents at all times, and yet we've just spent a bunch of time making that not be the case to avoid copying data into those buffers and instead copying directly to the compositing manager back or front buffers.

Obviously the X server is aware of when this happens; the compositing manager will have selected for manual redirection on all top-level windows, while our other application will have only been able to select for automatic redirection.

So, we've got two pretty clear choices here:

  1. Have the X server change how Present redirection works when some other application selects for Automatic redirection on a window. It would copy the source pixmap into the window buffer and then send (a modified?) PresentRedirectNotify event to the compositing manager.

  2. Include a flag in the PresentRedirectNotify event that the composite redirect buffer needs to eventually get the contents of the source pixmap, and then expect the compositing manager to figure out what to do.

Development Plans

As usual, I'm going to pick the path of least resistance for all of the above options and see how things look; where the easy thing works, we can keep using it. Where the easy thing fails, I'll try something else. The changes required for this are pretty minimal.

The PresentRegion request needs to gain a list of window/serial pairs that are also to be notified when the operation completes:

PRESENTNOTIFY {
    window: WINDOW
    serial: CARD32
    }

┌───
    PresentRegion
    window: WINDOW
    pixmap: PIXMAP
    serial: CARD32
    valid-area: REGION or None
    update-area: REGION or None
    x-off, y-off: INT16
    idle-fence: FENCE
    target-crtc: CRTC or None
    target-msc: CARD64
    divisor: CARD64
    remainder: CARD64
    notifies: LISTofPRESENTNOTIFY
└───
    Errors: Window, Pixmap, Match

The 'target-crtc' parameter explicitly identifies a CRTC MSC domain. If None, then this request implicitly uses the window MSC domain.

'notifies' provides a list of windows that will also receive PresentCompleteNotify events with the associated serial number when this PresentRegion operation completes.
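
To make the intent of 'notifies' concrete, here is a sketch of the fan-out the server would perform on completion. It is a model only: the CompleteNotify tuple carries just the fields needed for the example, and the function name, XIDs, UST and MSC values are made up.

```python
from typing import List, NamedTuple, Tuple


class CompleteNotify(NamedTuple):
    """Illustrative subset of a PresentCompleteNotify event."""
    window: int
    serial: int
    ust: int
    msc: int


def completion_events(window: int, serial: int,
                      notifies: List[Tuple[int, int]],
                      ust: int, msc: int) -> List[CompleteNotify]:
    """One event for the request's own window, plus one per 'notifies' entry."""
    targets = [(window, serial)] + list(notifies)
    return [CompleteNotify(w, s, ust, msc) for w, s in targets]


# Compositor presented to the root (serial 3) on behalf of an application
# window whose original request carried serial 7.
events = completion_events(window=0x1D5, serial=3,
                           notifies=[(0x400001, 7)],
                           ust=5_000_000, msc=1234)
```

Every recipient sees the same UST and MSC, but each gets back the serial number it originally supplied.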

┌───
    PresentRedirectNotify
    type: CARD8             XGE event type (35)
    extension: CARD8        Present extension request number
    length: CARD16          2
    evtype: CARD16          Present_RedirectNotify
    eventID: PRESENTEVENTID
    event-window: WINDOW
    window: WINDOW
    pixmap: PIXMAP
    serial: CARD32
    valid-area: REGION
    valid-rect: RECTANGLE
    update-area: REGION
    update-rect: RECTANGLE
    x-off, y-off: INT16
    target-crtc: CRTC
    target-msc: CARD64
    idle-fence: FENCE
    update-window: BOOL
└───

The 'target-crtc' identifies which CRTC MSC domain the 'target-msc' value relates to.

'divisor' and 'remainder' have been removed as the target-msc value has been adjusted using the application values.
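
For illustration, folding the application's three values into one absolute MSC could look like the sketch below. It assumes the rule mirrors the swap semantics of GLX_OML_sync_control: a target beyond the current MSC is used as-is; otherwise the next MSC whose value modulo 'divisor' equals 'remainder' is chosen, and a zero divisor means the next frame. The function name is hypothetical and the server's exact rounding may differ.

```python
def absolute_target_msc(current_msc: int, target_msc: int,
                        divisor: int, remainder: int) -> int:
    """Collapse target/divisor/remainder into a single absolute MSC (sketch)."""
    if target_msc > current_msc:
        return target_msc          # the requested frame is still in the future
    if divisor == 0:
        return current_msc + 1     # no periodic constraint: next frame
    # First MSC after current_msc with msc % divisor == remainder.
    candidate = current_msc - (current_msc % divisor) + remainder
    if candidate <= current_msc:
        candidate += divisor
    return candidate


future = absolute_target_msc(current_msc=100, target_msc=120, divisor=0, remainder=0)
asap = absolute_target_msc(current_msc=100, target_msc=0, divisor=0, remainder=0)
every_third = absolute_target_msc(current_msc=100, target_msc=50, divisor=3, remainder=1)
every_fourth = absolute_target_msc(current_msc=100, target_msc=50, divisor=4, remainder=3)
```

With the value already resolved, the compositing manager only has to carry one number (plus the CRTC it is relative to) into its own PresentRegion request.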

If 'update-window' is True, then the recipient of this event is instructed to provide reasonably up-to-date contents directly to the window by copying the contents of 'pixmap' to the window manually.

Beyond these two protocol changes, the compositing manager is expected to receive Sync events when the idle-fence is triggered and then manually perform a Sync operation to trigger the client's idle-fence when appropriate.

I'm planning to work on these changes, and then go re-work xcompmgr (or perhaps unagi, which certainly looks less messy) to incorporate support for Present redirection. The goal is to have something to demonstrate at GUADEC, which doesn't seem impossible, aside from leaving on vacation in four days...