
AltOS 1.2.1 — TeleBT support, bug fixes and new AltosUI features

Bdale and I are pleased to announce the release of AltOS version 1.2.1.

AltOS is the core of the software for all of the Altus Metrum products. It consists of cc1111-based micro-controller firmware and Java-based ground station software.

The biggest new feature for AltOS is the addition of support for TeleBT, our ground station designed to operate with Android phones and tablets. In addition, there’s a change in the TeleDongle radio configuration that should improve range, along with some minor bug fixes and new features in AltosUI.

AltOS Firmware — Features and fixes

There are bug fixes in both ground station and flight software, so you should plan on re-flashing both units at some point. However, there aren’t any incompatible changes, so you don’t have to do it all at once.

New features:

  • TeleBT support.

  • Improved radio sensitivity. The TeleDongle receiver parameters have been tweaked to provide better reception.

  • TeleMini now completely resets all radio parameters in recovery mode (with the two outer debug pins connected) — 434.550MHz, N0CALL, factory radio cal.

Bug fixes:

  • USB device fixes. This improves operation with Windows, avoiding hangs and errors in many cases.

  • Correct the Kalman filter error covariance matrix; the old parameters were built assuming continuous measurements.

AltosUI — Easier to use

AltosUI has also seen quite a bit of work for the 1.2.1 release. It’s got several fun new features and a few bug fixes.

New Graph UI features:

  • Show tool-tips with the value near the cursor.

  • Make the set of displayed values configurable. Add all of the available data values just in case you want to see them.

  • Added a Map tab showing the ground track of the whole flight.

  • The flight summary tab now includes the final GPS position. This lets you figure out where your rocket landed without replaying the whole flight.

Other new AltosUI features:

  • TeleBT support, including Bluetooth connections (Linux-only, at present).

  • Shows the callsign in the Monitor Idle and other command-mode windows so that you can tell what callsign is being used.

  • Show the block number when downloading flight data. This lets you see something happen even for longer flights.

  • Make the initial position of the AltosUI configurable so that you can position it out of the way of the rest of your desktop.

  • Distribute the Mac OS X package in .dmg format (Mac OS Disk Image); this means you don’t need to explicitly unpack the bits.

Bug fixes:

  • Deal with broken networking while downloading map tiles. Tiles are now always downloaded asynchronously so that the UI doesn’t freeze when the network is slow.

Posted Tue May 21 17:47:09 2013 Tags:

Shared Memory Fences

In our last adventure, dri3k first steps, one of the ‘future work’ items was to deal with synchronization between the direct rendering application and the X server. DRI2 “handles” this by performing a round trip each time the application starts using a buffer that was being used by the X server.

As DRI3 manages buffer allocation within the application, there’s really no reason to talk to the server, so this implicit serialization point just isn’t available to us. As I mentioned last time, James Jones and Aaron Plattner added an explicit GPU serialization system to the Sync extension. These SyncFences serialize rendering between two X clients, but within the server there are hooks provided for the driver to use hardware-specific serialization primitives.

The existing Linux DRM interfaces queue rendering to the GPU in the order requests are made to the kernel, so we don’t need the ability to serialize within the GPU, we just need to serialize requests to the kernel. Simple CPU-based serialization gating access to the GPU will suffice here, at least for the current set of drivers. GPU access which is not mediated by the kernel will presumably require serialization that involves the GPU itself. We’ll leave that for a future adventure though; the goal today is to build something that works with the current Linux DRM interfaces.

SyncFence Semantics

The semantics required of SyncFences are that multiple clients can block on a fence which a single client then triggers. All of the blocked clients start executing requests immediately after the trigger fires.

There are four basic operations on SyncFences:

  • Trigger. Mark the fence as ready and wake up all waiting clients.

  • Await. Block until the fence is ready.

  • Query. Retrieve the current state of the fence.

  • Reset. Unset the fence; future Await requests will block.

SyncFences are the same as Events as provided by Python and other systems. Of course all of the names have been changed to keep things interesting. I’ll call them Fences here, to be consistent with the current X usage.

Using Pthread Primitives

One fact about pthreads that I recently learned is that the synchronization primitives (mutexes, barriers and semaphores) are actually supposed to work across process boundaries, if those objects are in shared memory mapped by each process. That seemed like a great simplification for this project; allocate a page of shared memory, map into the X server and direct rendering application and use the existing pthreads APIs.

Alas, the pthread objects are architecture specific. I’m pretty sure that when that spec was written, no-one ever thought of running multiple architectures within the same memory space. I went and looked at the code to check, and found that each of these objects has a different size and structure on x86 and x86_64 architectures. That makes it pretty hard to use this API within X as we often have both 32- and 64- bit applications talking to the same (presumably 64-bit) X server.
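One way to see the mismatch for yourself is to build the same size report twice, once with -m32 and once with -m64, and compare the numbers. The following is only a sketch; the struct and function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper type: sizes of the pthread objects for whichever
 * ABI this file is compiled for. */
struct pthread_sizes {
    size_t mutex;
    size_t barrier;
    size_t semaphore;
};

/* Report the sizes; build once with -m32 and once with -m64 and compare. */
struct pthread_sizes pthread_object_sizes(void)
{
    struct pthread_sizes s = {
        .mutex = sizeof(pthread_mutex_t),
        .barrier = sizeof(pthread_barrier_t),
        .semaphore = sizeof(sem_t),
    };
    return s;
}
```

If any of the reported sizes differ between the two builds, a 32-bit client and a 64-bit server cannot safely share that object in one mapping.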

As a last resort, I read through a bunch of articles on using futexes directly within applications and decided that it was probably possible to implement what I needed in an architecture-independent fashion.

Futexes

Linux Futexes live in this strange limbo of being a not-quite-public kernel interface. Glibc uses them internally to implement locking primitives, but it doesn’t export any direct interface to the system call. Certainly they’re easy to use incorrectly, but it’s unusual in the Linux space to have our fundamental tools locked away ‘for our own safety’.

Fortunately, we can still get at futexes by creating our own syscall wrappers.

static inline long sys_futex(void *addr1, int op, int val1,
                 struct timespec *timeout, void *addr2, int val3)
{
    return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}

For this little exercise, I created two simple wrappers, one to block on a futex:

static inline int futex_wait(int32_t *addr, int32_t value) {
    return sys_futex(addr, FUTEX_WAIT, value, NULL, NULL, 0);
}

and one to wake up all futex waiters:

static inline int futex_wake(int32_t *addr) {
    return sys_futex(addr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
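For reference, here is a self-contained sketch of the same wrappers with the headers they need on Linux, using INT_MAX from <limits.h> as the wake count. The two helper functions at the end are hypothetical and exist only to show the behaviour the fence code relies on: FUTEX_WAIT returns immediately with EAGAIN (the same value as EWOULDBLOCK on Linux) when the word does not hold the expected value, and FUTEX_WAKE reports how many waiters it woke.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_futex(void *addr1, int op, int val1,
                      struct timespec *timeout, void *addr2, int val3)
{
    return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}

/* Block while *addr still holds 'value'. */
static int futex_wait(int32_t *addr, int32_t value)
{
    return (int) sys_futex(addr, FUTEX_WAIT, value, NULL, NULL, 0);
}

/* Wake every thread waiting on addr; returns the number woken. */
static int futex_wake(int32_t *addr)
{
    return (int) sys_futex(addr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Hypothetical helper: errno seen when the word does not match the
 * expected value, so the kernel refuses to block. */
int futex_wait_mismatch_errno(void)
{
    int32_t word = 1;

    errno = 0;
    if (futex_wait(&word, -1) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: waking a word nobody waits on wakes zero threads. */
int futex_wake_nobody(void)
{
    int32_t word = 0;

    return futex_wake(&word);
}
```

Note that these calls deliberately omit FUTEX_PRIVATE_FLAG, since the fence word is meant to live in memory shared between processes.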

Atomic Memory Operations

I need atomic memory operations to keep separate cores from seeing different values of the fence. GCC defines a few such primitives, and I picked __sync_bool_compare_and_swap and __sync_val_compare_and_swap. I also need fetch and store operations that the compiler won’t shuffle around:

#define barrier() __asm__ __volatile__("": : :"memory")

static inline void atomic_store(int32_t *f, int32_t v)
{
    barrier();
    *f = v;
    barrier();
}

static inline int32_t atomic_fetch(int32_t *a)
{
    int32_t v;
    barrier();
    v = *a;
    barrier();
    return v;
}

If your machine doesn’t make these two operations atomic, then you would redefine these as needed.
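As one possible redefinition, GCC 4.7 and later (and Clang) provide __atomic builtins that emit whatever hardware barriers the target needs. This is a sketch of that alternative, not the code used above; the function names are invented here to avoid clashing with the originals.

```c
#include <stdint.h>

/* Sequentially consistent store: acts as both a compiler and a
 * hardware barrier on targets that need one. */
static inline void atomic_store_seq(int32_t *f, int32_t v)
{
    __atomic_store_n(f, v, __ATOMIC_SEQ_CST);
}

/* Sequentially consistent load of the fence word. */
static inline int32_t atomic_fetch_seq(int32_t *a)
{
    return __atomic_load_n(a, __ATOMIC_SEQ_CST);
}
```

On x86 an aligned 32-bit load or store is already atomic, so the main difference there is the ordering guarantee attached to the store.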

Futex-based Fences

The wake-all semantics of Fences greatly simplify reasoning about the operation, as there’s no need to ensure that only a single thread runs past Await; the only requirement is that no thread passes the Await operation until the fence is triggered.

A Fence is defined by a single 32-bit integer which can take one of three values:

  • 0 - The fence is not triggered, and there are no waiters.
  • 1 - The fence is triggered (there can be no waiters at this point).
  • -1 - The fence is not triggered, and there are waiters (one or more).

With those, I built the fence operations as follows. Here’s Await:

int fence_await(int32_t *f)
{
    while (__sync_val_compare_and_swap(f, 0, -1) != 1) {
        if (futex_wait(f, -1)) {
            if (errno != EWOULDBLOCK)
                return -1;
        }
    }
    return 0;
}

The basic requirement that the thread not run until the fence is triggered is met by fetching the current value of the fence and comparing it with 1. Until it is signaled, that comparison will return false.

The compare-and-swap operation makes sure the fence is -1 before the thread calls futex_wait: either it was already -1 because there were other waiters, or it was 0 and is now -1 because there were no waiters before. This needs to be an atomic operation so that the trigger operation will see the fence value as -1 if there are any threads in the syscall.

The futex_wait call blocks only if the fence still holds -1 when the kernel checks it, so the thread won’t block if the trigger occurs between the swap and the syscall. Otherwise the thread sleeps until the trigger calls futex_wake.

Here’s the Trigger function:

int fence_trigger(int32_t *f)
{
    if (__sync_val_compare_and_swap(f, 0, 1) == -1) {
        atomic_store(f, 1);
        if (futex_wake(f) < 0)
            return -1;
    }
    return 0;
}

The atomic compare-and-swap operation will make sure that no Await thread swaps the 0 for a -1 while the trigger is changing the value from 0 to 1; either the Await switches from 0 to -1 or the Trigger switches from 0 to 1.

If the value before the compare-and-swap was -1, then there may be threads waiting on the Fence. An atomic store, constructed with two memory barriers and a regular store operation, to mark the Fence triggered is followed by the futex_wake call to unblock all Awaiting threads.

The Query function is just an atomic fetch:

int fence_query(int32_t *f)
{
    return atomic_fetch(f) == 1;
}

Reset requires a compare-and-swap so that it doesn’t disturb things if the fence has already been reset and there are threads waiting on it:

void fence_reset(int32_t *f)
{
    __sync_bool_compare_and_swap(f, 1, 0);
}
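To try the fence across a real process boundary, the sketch below puts the 32-bit word in an anonymous shared mapping, forks a child that Awaits, and Triggers from the parent. The Await and Trigger bodies follow the logic above; fence_futex and fence_demo are hypothetical names, and the short sleep is only there to make it likely that the child is already blocked, not because correctness depends on it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static long fence_futex(int32_t *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static int fence_await(int32_t *f)
{
    while (__sync_val_compare_and_swap(f, 0, -1) != 1) {
        if (fence_futex(f, FUTEX_WAIT, -1) && errno != EWOULDBLOCK)
            return -1;
    }
    return 0;
}

static int fence_trigger(int32_t *f)
{
    if (__sync_val_compare_and_swap(f, 0, 1) == -1) {
        __asm__ __volatile__("" : : : "memory");
        *f = 1;
        __asm__ __volatile__("" : : : "memory");
        if (fence_futex(f, FUTEX_WAKE, INT_MAX) < 0)
            return -1;
    }
    return 0;
}

/* Hypothetical demo: the child blocks in Await until the parent
 * Triggers. Returns 0 if the child was released cleanly. */
int fence_demo(void)
{
    int status = 0;
    pid_t pid;
    int32_t *f = mmap(NULL, sizeof *f, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (f == MAP_FAILED)
        return -1;
    *f = 0;                         /* not triggered, no waiters */

    pid = fork();
    if (pid < 0) {
        munmap(f, sizeof *f);
        return -1;
    }
    if (pid == 0)
        _exit(fence_await(f) == 0 ? 0 : 1);

    usleep(10000);                  /* let the child reach futex_wait */
    if (fence_trigger(f) < 0)
        kill(pid, SIGKILL);         /* don't leave the child blocked */
    if (waitpid(pid, &status, 0) < 0)
        status = -1;
    munmap(f, sizeof *f);
    return (status != -1 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the fence is a plain int32_t, the same mapping would have the same layout for 32-bit and 64-bit processes, which is the property the pthread objects lacked.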

A Request for Review

Ok, so we’ve all tried to create synchronization primitives only to find that our ‘obvious’ implementations were full of holes. I’d love to hear from you if you’ve identified any problems in the above code, or if you can figure out how to use the existing glibc primitives for this operation.

Posted Fri Apr 26 16:28:11 2013 Tags:

DRI3K — First Steps

Here’s an update on DRI3000. I’ll start by describing what I’ve managed to get working and then summarize discussions that happened on the xorg-devel mailing list.

Private Back Buffers

One of the big goals for DRI3000 is to finish the job of moving buffer management out of the X server and into applications. The only things still allocated by DRI2 in the X server are back buffers; everything else has moved to the client side. Yes, I know, this breaks the GLX requirement for sharing buffers between applications, but we just don’t care anymore.

As a quick hack, I figured out how to do this with DRI2 today: allocate our back buffers separately by creating X pixmaps for them, and then use the existing DRI2GetBuffersWithFormat request to get a GEM handle for each.

Of course, now that all I’ve got is a pixmap, I can’t use the existing DRI2 swap buffer support, so for now I’m just using CopyArea to get stuff on the screen. But, that works fine, as long as you don’t care about synchronization.

Handling Window Resize

The biggest pain in DRI2 has been dealing with window resize. When the window resizes in the X server, a new back buffer is allocated and the old one discarded. An event is delivered to ‘invalidate’ the old back buffer, but anything done between the time the back buffer is discarded and when the application responds to the event is lost.

You can easily see this with any GL application today — resize the window and you’ll see occasional black frames.

By allocating the back buffer in the application, the application handles the resize within GL; at some point in the rendering process the resize is discovered, and GL creates a new buffer, copies the existing data over, and continues rendering. So, the rendered data are never lost, and every frame gets displayed on the screen (although, perhaps at the wrong size).

The puzzle here was how to tell that the window was resized. Ideally, we’d have the application tell us when it received the X configure notify event and was drawing the frame at the new size. We thought of a cute hack that might do this; track GL calls to change the viewport and make sure the back buffer could hold the viewport contents. In theory, the application would receive the X configure notify event, change the viewport and render at the new size.

Tracking the viewport settings for an entire frame and constructing their bounding box should describe the size of the window; at least it should describe the intended size of the window.

There’s at least one serious problem with this plan — applications may well call glClear before calling glViewport, and as glClear does not use the current viewport, instead clearing the “whole” window, we couldn’t use the viewport as an indication of the current window size.

However, what this exercise did lead us to realize was that we don’t care what size the window actually is, we only care what size the application thinks it is. More accurately, the GL library just needs to be aware of any window configuration changes before the application, so that it will construct a buffer that is not older than the application knowledge of the window size.

I came up with two possible mechanisms here; the first was to construct a shared memory block between application and X server where the X server would store window configuration changes and signal the application by incrementing a sequence number in the shared page; the GL library would simply look at the sequence number and reallocate buffers when it changed.

The problem with the shared memory plan was that it wouldn’t work across the network, and we have a future project in mind to replace GLX indirect rendering with local direct rendering and PutImage which still needs accurate window size tracking. More about that project in a future post though…

X Events to the Rescue

So, I decided to just have the X server send me events when the window size changed. I could simply use the existing X configure notify events, but that would require a huge infrastructure change in the application so that my GL library could get those events and have the application also see them. Not knowing what the application is up to, we’d have to track every ChangeWindowAttributes call and make sure the event_mask included the right bits. Ick.

Fortunately, there’s another reason to use a new event — we need more information than is provided in the ConfigureNotify event; as you know, the Swap extension wants to have applications draw their content within a larger buffer that can have the window decorations placed around it to avoid a copy from back buffer to window buffer. So, our new ConfigureNotify event would also contain that information.

Making sure that the new ConfigureNotify event is delivered before the core ConfigureNotify event ensures that the GL library always knows about window size changes before the application does.

Splitting the XCB Event Stream

Ok, so I’ve got these new events coming from the X server. I don’t want the application to have to receive them and hand them down to the GL library; that would mean changing every application on the planet, something which doesn’t seem very likely at all.

Xlib does this kind of thing by allowing applications to stick themselves into the middle of the event processing code with a callback to filter out the events they’re interested in before they hit the main event queue. That’s how DRI2 captures Invalidate events, and it “works”, but using callbacks from the middle of the X event processing code creates all kinds of locking nightmares.

As discussed above, I don’t care when GL sees the configure events, as long as it gets them before the application finds out about the window size change. So, we don’t need to synchronously handle these events, we just need to be able to know they’ve arrived and then handle them on the next call to a GL drawing function.

What I’ve created as a prototype is the ability to identify specific events and place them in a separate event queue, and when events are placed in that event queue, to bump a ‘sequence number’ so that the application can quickly identify that there’s something to process.
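The idea can be modelled in a few lines of C. This is a single-threaded sketch with entirely hypothetical names; it is not the XCB interface, just an illustration of how a sequence number lets the GL library skip the queue on the fast path and catch up on its next drawing call.

```c
#include <stdint.h>

/* Hypothetical model of a split-off event queue. */
struct config_event {
    uint16_t width, height;
};

#define SPECIAL_QUEUE_MAX 16

struct special_queue {
    struct config_event events[SPECIAL_QUEUE_MAX];
    unsigned count;
    uint32_t sequence;          /* bumped every time an event is queued */
};

/* Called by the event reader when a matching event arrives. */
int special_queue_push(struct special_queue *q, struct config_event ev)
{
    if (q->count == SPECIAL_QUEUE_MAX)
        return -1;
    q->events[q->count++] = ev;
    q->sequence++;
    return 0;
}

/* Hypothetical per-drawable state kept by the GL library. */
struct gl_drawable {
    uint32_t seen_sequence;
    uint16_t width, height;
};

/* Called at the start of a GL drawing function. Returns 1 if the
 * drawable changed size and its buffers need to be reallocated. */
int gl_drawable_update(struct gl_drawable *d, struct special_queue *q)
{
    int resized = 0;
    unsigned i;

    if (d->seen_sequence == q->sequence)
        return 0;               /* fast path: nothing new queued */

    for (i = 0; i < q->count; i++) {
        if (q->events[i].width != d->width ||
            q->events[i].height != d->height)
            resized = 1;
        d->width = q->events[i].width;
        d->height = q->events[i].height;
    }
    q->count = 0;
    d->seen_sequence = q->sequence;
    return resized;
}
```

In a real library the push happens on whichever thread reads the X connection, so the counter and queue would need the usual locking or atomic access.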

Making the Event Mask Per-API Instead of Per-Client

The problem described above about using the core ConfigureNotify events made me think about how to manage multiple APIs all wanting to track window configuration. For core events, the selection of which events to receive is all based on the client; each client has a single event mask, and each client receives one copy of each event.

Monolithic applications work fine with this model; there’s one place in the application selecting for events and one place processing them. However, modern applications end up using different APIs for 3D, 2D and media. Getting those libraries to cooperate and use a common API for event management seems pretty intractable. Making the X server treat each API as a separate entity seemed a whole lot easier; if two APIs want events, just have them register separately and deliver two events flagged for the separate APIs.

So, the new DRI3 configure notify events are created with their own XID to identify the client-side owner of the event. Within the X server, this required a tiny change; we already needed to allocate an XID for each event selection so that it could be automatically cleaned up when the client exited, so the only change was to use the one provided by the client instead of allocating one in the server.

On the wire, the event includes this new XID so that the library can use it to sort out which event queue to stick the event in using the new XCB event stream splitting code.
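A sketch of that sorting step, again with made-up names rather than the real XCB code: each library registers a queue under the XID it handed to the server, and an incoming event is routed by looking that XID up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-API queue, keyed by the XID the library supplied
 * when it selected for events. */
struct api_queue {
    uint32_t eid;
    unsigned delivered;         /* events routed to this queue so far */
};

static struct api_queue *find_queue(struct api_queue *queues, size_t n,
                                    uint32_t eid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (queues[i].eid == eid)
            return &queues[i];
    return NULL;
}

/* Returns 1 if the event went to a per-API queue, 0 if no library
 * claimed that XID and it should fall through to the main queue. */
int dispatch_event(struct api_queue *queues, size_t n, uint32_t eid)
{
    struct api_queue *q = find_queue(queues, n, eid);

    if (q == NULL)
        return 0;
    q->delivered++;
    return 1;
}
```

Two libraries watching the same window would register two different XIDs and each receive their own copy of the event.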

Current Status

The above section describes the work that I’ve got running; with it, I can run GL applications and have them correctly track window size changes without losing a frame. It’s all available on the ‘dri3’ branches of my various repositories for xcb proto, libxcb, dri3proto and the X server.

Future Directions

The first obvious change needed is to move the configuration events from the DRI3 extension to the as-yet-unspecified new ‘Swap’ extension (which I may rename as ‘Present’, as in ‘please present this pixmap in this window’). That’s because they aren’t related to direct rendering, but rather to tracking window sizes for off-screen rendering, either direct, indirect or even with the CPU to memory.

DRI3 and Fences

Right now, I’m not synchronizing the direct rendering with the CopyArea call; that means the X server will end up with essentially random contents as the application may be mid-way through the next frame before it processes the CopyArea. A simple XSync call would suffice to fix that, but I want a more efficient way of doing this.

With the current Linux DRI kernel APIs, it is sufficient to serialize calls that post rendering requests to the kernel to ensure that the rendering requests are themselves serialized. So, all I need to do is have the application wait until the X server has sent the CopyArea request down to the kernel.

I could do that by having the X server send me an X event, but I think there’s a better way that will extend to systems that don’t offer the kernel serialization guarantee. James Jones and Aaron Plattner put together a proposal to add Fences to the X Sync extension. In the X world, those offer a method to serialize rendering between two X applications, but of course the real goal is to expose those fences to GL applications through the various GL sync extensions (including GL_ARB_sync and GL_NV_fence).

With the current Linux DRI implementation, I think it would be pretty easy to implement these fences using pthread semaphores in a block of memory shared between the server and application. That would be DRI-specific; other direct rendering interfaces would use alternate means to share the fences between X server and application.
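As a rough illustration of that idea, here is a process-shared POSIX semaphore living in an anonymous shared mapping, with a forked child standing in for the application and the parent standing in for the server. The function name is hypothetical, and this sketch assumes both processes use the same ABI, since sem_t has no guaranteed cross-architecture layout. A semaphore also releases one waiter per post, so it only approximates a fence with a single waiter.

```c
#define _GNU_SOURCE
#include <semaphore.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if the child was released by the post. */
int shared_semaphore_demo(void)
{
    int status = 0;
    pid_t pid;
    sem_t *sem = mmap(NULL, sizeof *sem, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (sem == MAP_FAILED)
        return -1;
    /* pshared = 1: usable from any process that maps this memory */
    if (sem_init(sem, 1, 0) < 0) {
        munmap(sem, sizeof *sem);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        sem_destroy(sem);
        munmap(sem, sizeof *sem);
        return -1;
    }
    if (pid == 0)                   /* "application": wait for the server */
        _exit(sem_wait(sem) == 0 ? 0 : 1);

    if (sem_post(sem) < 0)          /* "server": request has been queued */
        kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) < 0)
        status = -1;

    sem_destroy(sem);
    munmap(sem, sizeof *sem);
    return (status != -1 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Depending on the C library version, this may need to be built with -pthread.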

Swap/Present — The Second Extension

By simply using CopyArea for my application presentation step, I think I’ve neatly split this problem into manageable pieces. Once I’ve got the DRI3 piece working, I’ll move on to fixing the presentation issue.

By making that depend solely on existing core Pixmap objects as the source of data to present, I can develop that without any reference to DRI. This will make the extension useful to existing X applications that currently have only CopyArea for this operation.

Presentation of application contents occurs in two phases; the first is to identify which objects are involved in the presentation. The second is to perform the presentation operation, either using CopyArea, or by swapping pages or the entire frame buffer. For offscreen objects, these can occur at the same time. For onscreen, the presentation will likely be synchronized with the scanout engine.

The second form will mean that the Fences that mark when the presentation has occurred will need to be signaled only once the operation completes.

A CopyArea operation means that the source pixmap is “ready” immediately after the Copy has completed. Doing the presentation by using the source pixmap as the new front buffer means that the source pixmap doesn’t become “ready” until after the next swap completes.

What I don’t know now is whether we’ll need to report up-front whether the presentation will involve a copy or a swap. At this point, I don’t think so — the application will need two back buffers in all cases to avoid blocking between the presentation request and the presentation execution. Yes, it could use a fence for this, but that still sticks a bubble in the 3D hardware where it’s blocked waiting for vblank instead of starting on the next frame immediately.

Plan of Attack

Right now, I’m working on finishing up the DRI3 piece:

  • Replace the DRI2 buffer allocation kludge with actual local buffer allocation, mapping them into pixmaps using FD passing.

  • Replace the DRI2 authentication scheme with having the X server open the DRI object, preparing it for rendering and passing it back to the application.

  • Working on the XCB pieces to get the split event-queue stuff landed upstream.

  • Implementing the Fencing stuff to correctly serialize access to the pixmap.

The first three seem fairly straightforward. The fencing stuff will involve working with James and Aaron to integrate their XSync changes into the server.

After that, I’ll start working on the presentation piece. Foremost there is figuring out the right name for this new extension; I started with the name ‘Swap’ as that’s the GL call it implements. However, ‘Swap’ is quite misleading as to the actual functionality; a name more like ‘Present’ might provide a better indication of what it actually does. Of course, ‘Present’ is both a verb and a noun, with very different connotations. Suggestions on this most complicated part of the project are welcome!

Posted Fri Apr 12 15:40:05 2013 Tags:

Composite and Swap — Getting it Right

Where the author tries to make sure DRI3000 is going to do what we want now and in the future

DRI3000

The basic DRI3000 plan seems pretty straightforward:

  1. Have applications allocate buffers full of new window contents, attach pixmap IDs to those buffers and pass them to the X server to get them onto the screen.

  2. Provide a mechanism to let applications know when those pixmaps are idle so that they can reuse them instead of creating new ones for every frame.

  3. Finally, allow the actual presentation of the contents to be scheduled for a suitable time in the future, generally synchronized with the monitor. Let the client know when this has happened in case they want to synchronize themselves to vblank.

The DRI3 extension provides a way to associate pixmap IDs and buffers, and given the MIT-SHM prototype I’ve already implemented, I think we can safely mark this part as demonstrably implementable.

That leaves us with a smaller problem, that of taking pixmap contents and presenting them on the screen at a suitable time and telling applications about the progress of that activity.

In the absence of compositing, I’m pretty sure the initial Swap extension design would do this job just fine, and should resolve some of the known DRI2 limitations related to buffer management. And, I think that goal is sufficient motivation to go and implement that. However, I wanted to write up some further ideas to see if the DRI3000 plan can be made to do precisely what we want in a composited world.

The Composited Goal

To make sure we’re all on the same page, here’s what I expect from the Swap extension in a composited world:

  1. Application calls Swap with new window pixmap

  2. Compositor hears about the new pixmap and uses that to construct a new screen pixmap

  3. Compositor calls Swap with new screen pixmap

  4. Vertical retrace happens, executing the pending swap operation

  5. Compositor hears about the swap completion for the screen

  6. Application hears about the swap completion for its window

In particular, applications should not hear that their swap operations are complete until the contents appear on the screen. This allows for applications to throttle themselves to the screen rate, either doing double or triple buffering as they choose.

I didn’t add steps here indicating buffers going idle or being allocated, because I think that should all happen ‘behind the scenes’ from the application’s perspective. Many applications won’t care about the swap completion notification either, but some will and so that needs to be visible.

Redirected Swaps?

Owen Taylor suggested that one way of getting the compositor involved would be to have it somehow ‘redirect’ Swap operations, much like we do with window management operations today. I think that idea may be a good direction to try:

  1. Application calls Swap with new window pixmap

  2. Swap is redirected to compositor, passing along the new window pixmap

  3. Compositor constructs a new screen pixmap using the new window pixmap

  4. Compositor calls Swap on the screen and the window, passing the new screen pixmap and the new window pixmap. When the screen update occurs, the screen and the window both receive swap completion events.

This has the added benefit that the X server knows when the compositor is expecting window pixmaps to change like this — the compositor has to explicitly request Swap redirection.

Window Pixmap Names and GEM Buffer Handles

One issue that swapping window pixmaps around like this brings up is how to manage existing names for the window pixmap. Right now, applications expect that window pixmaps will only change when the window is resized. If the Swap extension is going to actually replace the window pixmap when running with a suitable compositor, then we need to figure out what the old names will reference.

Are there non-compositor applications using NameWindowPixmap that matter to us? How about non-compositor applications using TextureFromPixmap to get a GEM handle for a window pixmap? For now, I’m very tempted to just break stuff and see who complains, but knowing what we’re breaking might be nice beforehand.

Idling Pixmaps

When an application is done drawing to a window pixmap and has passed it off to the X server for presentation, we’d like for that pixmap to be automatically marked as discardable as soon as possible. This way, when memory is tight, the kernel can come steal those pages for something critical. Of course, applications may not want to let the server mark the pixmap as idle after being used, so a flag to the Swap call would be needed.

Ideally, the pixmap would become idle immediately after the pixmap contents have been extracted. In the absence of a compositor, that would probably be when the Swap operation completes. With a compositor running, we’d need explicit instruction from the compositor telling us that the window pixmap was now ‘idle’:

┌───
    SwapIdle
    drawable: Drawable
    pixmap: Pixmap
      ▶
└───

Furthermore, the application needs to know that the pixmap is in fact idle. I think that we’ll need a synchronous X request that marks a buffer as ‘no longer idle’ and have that return whether the buffer was discarded while idle. It doesn’t seem sufficient to use events here as the application will need to completely reconstruct the pixmap contents in this case. This reply could also contain information about precisely what contents the pixmap does contain.

┌───
    SwapReuse
    drawable: Drawable
    pixmap: Pixmap
      ▶
    valid: BOOL
    swap-hi: CARD32
    swap-lo: CARD32
└───
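The request above doesn’t spell out what swap-hi and swap-lo carry. Assuming they are the upper and lower halves of a 64-bit swap counter identifying the pixmap’s contents (the usual way X protocol conveys 64-bit values as two CARD32s), a client might decode the reply along these lines. The struct and function names are invented for this sketch.

```c
#include <stdint.h>

/* Hypothetical client-side view of a SwapReuse reply. */
struct swap_reuse_reply {
    uint8_t  valid;     /* BOOL: contents survived while idle */
    uint32_t swap_hi;   /* assumed: upper 32 bits of a swap count */
    uint32_t swap_lo;   /* assumed: lower 32 bits of a swap count */
};

/* Combine the two CARD32 halves into one 64-bit value. */
uint64_t swap_reuse_count(const struct swap_reuse_reply *r)
{
    return ((uint64_t) r->swap_hi << 32) | r->swap_lo;
}

/* The application must redraw everything if the buffer was discarded. */
int swap_reuse_needs_full_redraw(const struct swap_reuse_reply *r)
{
    return !r->valid;
}
```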

Pixmap Lifetimes and Triple Buffered Applications

If we redirect the Swap operation and send the original application window pixmap ID to the compositor, what happens when the application frees that pixmap before the compositor gets around to using the contents?

Surely the Compositor must handle such cases, and not just crash. However, I’m fine with requiring that the application not free the pixmap until told by the compositor.

Posted Wed Mar 6 14:28:16 2013 Tags:

x-on-resize: a simple display configuration daemon

I like things to be automated as much as possible, and having abandoned Gnome to their own fate and switched to xfce, I missed the automatic display reconfiguration stuff. I decided to write something as simple as possible that did just what I needed. I did this a few months ago, and when Carl Worth asked what I was using, I decided to pack it up and make it available.

Automatic configuration with a shell script

I’ve had a shell script around that I used to bind to a key press which I’d hit when I plugged or unplugged a monitor. So, all I really need to do is get this script run when something happens.

The missing tool here was something to wait for a change to happen and automatically invoke the script I’d already written.

Resize vs Configure

The first version of x-on-resize just listened for ConfigureNotify events on the root window. These get sent every time anything happens with the screen configuration, from hot-plug to notification when someone runs xrandr. That was as simple as possible; the application was a few lines of code to select for ConfigureNotify events, and invoke a program provided on the command line.

However, it was a bit too simple as it would also respond to manual invocations of xrandr and call the script then as well. So, as long as I was content to accept whatever the script did, things were fine. And, with a laptop that had a DisplayPort connector for my external desktop monitor, and a separate VGA connector for projectors at conferences, the script always did something useful.

Then I got this silly laptop that has only DisplayPort, and for which a dongle is required to get to VGA for projectors. I probably could write something fancy to figure out the difference between a desktop DisplayPort monitor and DisplayPort to VGA dongle, but I decided that solving the simpler problem of only invoking the script on actual hotplug events would be better.

So, I left the current invoke-on-resize behavior intact and added new code that watched the list of available outputs and invoked a new ‘config’ script when that set changed.
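The hotplug check boils down to comparing two sets of output names. Here is a minimal sketch of that comparison, assuming the names within each list are unique; the function names are hypothetical and this is not taken from the x-on-resize source.

```c
#include <stddef.h>
#include <string.h>

static int has_output(const char *const *set, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the set of available outputs differs between the two
 * lists, ignoring order. Assumes names are unique within each list. */
int outputs_changed(const char *const *old, size_t n_old,
                    const char *const *cur, size_t n_cur)
{
    size_t i;

    if (n_old != n_cur)
        return 1;
    for (i = 0; i < n_cur; i++)
        if (!has_output(old, n_old, cur[i]))
            return 1;
    return 0;
}
```

A daemon built on this would rerun the ‘config’ script only when outputs_changed returns 1, leaving plain resize events to the original script.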

The final program, x-on-resize, is available via git at

git://people.freedesktop.org/~keithp/x-on-resize

I even wrote a manual page. Enjoy!

Posted Thu Feb 28 15:42:49 2013 Tags:

DRI3000 — Even Better Direct Rendering

This all started with the presentation that Eric Anholt and I gave at the 2012 X developers conference, which I subsequently wrote about in my DRI-Next posting. That discussion sketched out the goals of changing the existing DRI2-based direct rendering infrastructure.

Last month, I gave a more detailed presentation at Linux.conf.au 2013 (the best free software conference in the world). That presentation was recorded, so you can watch it online. Or, you can read Nathan Willis’ summary at lwn.net. That presentation contained a lot more detail about the specific techniques that will be used to implement the new system; in particular, it included some initial indications of the kind of performance benefits the overall system might be able to produce.

I sat down today and wrote down an initial protocol definition for two new extensions (because two extensions are always better than one). Together, these are designed to provide complete support for direct rendering APIs like OpenGL and offer a better alternative to DRI2.

The DRI3 extension

Dave Airlie and Eric Anholt refused to let me call either actual extension DRI3000, so the new direct rendering extension is called DRI3. It uses POSIX file descriptor passing to share kernel objects between the X server and the application. DRI3 is a very small extension with just three requests:

  1. Open. Returns a file descriptor for a direct rendering device along with the name of the driver for a particular API (OpenGL, Video, etc).

  2. PixmapFromBuffer. Takes a kernel buffer object (Linux uses DMA-BUF) and creates a pixmap that references it. Any place a Pixmap can be used in the X protocol, you can now talk about a DMA-BUF object. This allows an application to do direct rendering, and then pass a reference to those results directly to the X server.

  3. BufferFromPixmap. This takes an existing pixmap and returns a file descriptor for the underlying kernel buffer object. This is needed for the GL Texture from Pixmap extension.

For OpenGL, the plan is to create all of the buffer objects on the client side, then pass the back buffer to the X server for display on the screen. By creating pixmaps, we avoid needing new object types in the X server and can use existing X APIs that take pixmaps for these objects.

The Swap extension

Once you’ve got direct rendered content in a Pixmap, you’ll want to display it on the screen. You could simply use CopyArea from the pixmap to a window, but that isn’t synchronized to the vertical retrace signal. And, the semantics of the CopyArea operation preclude us from swapping the underlying buffers around, making it more expensive than strictly necessary.

The Swap extension fills those needs. Because the DRI3 extension provides an X pixmap reference to the direct rendered content, the Swap extension doesn’t need any new object types for its operation. Instead, it talks strictly about core X objects, using X pixmaps as the source of the new data and X drawables as the destination.

The core of the Swap extension is one request — SwapRegion. This request moves pixels from a pixmap to a drawable. It uses an X fixes Region object to specify the area of the destination being painted, and an offset within the source pixmap to align the two areas.

A bunch of data are included in the reply from the SwapRegion request. First, you get a 64-bit sequence number identifying the swap itself. Then, you get a suggested geometry for the next source pixmap. Using the suggested geometry may result in performance improvements from the techniques described in the LCA talk above.

The last bit of data included in the SwapRegion reply is a list of pixmaps which were used as source operands to earlier SwapRegion requests to the same drawable. Each pixmap is listed along with the 64-bit sequence number associated with an earlier SwapRegion operation which resulted in the contents which the pixmap now contains. Ok, so that sounds really confusing. Some examples are probably necessary.

  • If the SwapRegion operation was implemented by copying data out of the source pixmap into the destination drawable, then the idle swap count will be equal to the swap count from this SwapRegion operation.

  • If the SwapRegion operation was implemented by swapping the destination contents with the source contents, then the idle swap count will be equal to the previous swap count on the destination drawable.

I’m hoping you’ll be able to tell that in both cases, the idle swap count tries to name the swap sequence at which time the destination drawable contained the contents currently in the pixmap.
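The two cases above can be written down as a tiny C model. This is only an illustration of the rule as described here; the enum and function names are hypothetical and are not part of the Swap protocol.

```c
#include <stdint.h>

/* How the server chose to implement a particular SwapRegion (hypothetical names). */
enum swap_method {
    SWAP_BY_COPY,       /* pixels copied from source pixmap to destination */
    SWAP_BY_EXCHANGE    /* source and destination buffers traded places */
};

/*
 * Return the idle swap count reported for the source pixmap:
 * the swap sequence at which the destination drawable held the
 * contents that the pixmap holds now.
 */
uint64_t
idle_swap_count(enum swap_method method,
                uint64_t this_swap, uint64_t previous_swap)
{
    switch (method) {
    case SWAP_BY_COPY:
        /* The pixmap still holds what was just displayed. */
        return this_swap;
    case SWAP_BY_EXCHANGE:
        /* The pixmap now holds what the drawable showed before. */
        return previous_swap;
    }
    return this_swap;
}
```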

Note that even if the SwapRegion is implemented as a Copy operation, the provided source pixmap may not be included in the idle list as the copy may be delayed to meet the synchronization requirements specified by the client.

Finally, if you want to throttle rendering based upon when frames appear on the screen, Swap offers an event that can be delivered to the drawable after the operation actually takes place.

Because the Swap extension needs to supply all of the OpenGL SwapBuffers semantics (including a multiplicity of OpenGL extensions related to that), I’ve stolen a handful of DRI2 requests to provide the necessary bits for that:

  1. SwapGetMSC
  2. SwapWaitMSC
  3. SwapWaitSBC

These work just like the DRI2 requests of the same names.

Current State of the Extensions

Both of these extensions have an initial protocol specification written down and stored in git:

  1. DRI3 protocol

  2. Swap protocol

Posted Tue Feb 19 16:30:44 2013 Tags:

MicroPeak USB Interface now available.

Altus Metrum is pleased to announce the immediate availability of the MicroPeak USB interface.

MicroPeak USB Interface

MicroPeak and the MicroPeak USB Interface

MicroPeak is fun to use all by itself, providing a quick way to know how high your rocket has flown. But, for those people itching for more data, MicroPeakUSB offers a way to download raw flight data and analyze that on your computer.

MicroPeakUSB doesn’t require any changes to the MicroPeak hardware—new MicroPeak firmware transmits the entire flight log through the on-board LED to a phototransistor on the MicroPeakUSB Interface and then to the USB port on your computer.

Existing MicroPeak owners can contact us for a special deal on the MicroPeak USB interface and upgrading the MicroPeak firmware.

Posted Fri Feb 8 13:31:14 2013 Tags:

MicroPeak Serial Interface — Flight Logging for MicroPeak

MicroPeak was originally designed as a simple peak-recording altimeter. It displays the maximum height of the last flight by blinking out numbers on the LED.

Peak recording is fun and easy, but you need a log across apogee to check for unexpected bumps in baro data caused by ejection events. NAR also requires a flight log for altitude records. So, we wondered what could be done with the existing MicroPeak hardware to turn it into a flight logging altimeter.

Logging the data

The 8-bit ATtiny85 used in MicroPeak has 8kB of flash to store the executable code, but it also has 512B (yes, B as in “bytes”) of eeprom storage for configuration data. Unlike the code flash, the little eeprom can be rewritten 100,000 times, so it should last for a lifetime of rocketry.

The original MicroPeak firmware already used that to store the average ground pressure and minimum pressure (in pascals) seen during flight; those are used to compute the maximum height that is shown on the LED. If we store just the two low-order bytes of each pressure sample, we’d have room left for 251 data points. Two bytes cover a range of 65,536Pa, so the full value can be recovered as long as consecutive samples differ by less than half of that. That means capturing data at least every 32kPa, which is about 3km at sea level.
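Recovering a full pressure value from the two stored bytes can be sketched as follows. This is a minimal sketch, not the MicroPeak or AltosUI code: the function name is hypothetical, and it assumes decoding starts from a known full-precision value (such as the stored ground pressure) and that each sample is within 32kPa of the one before it.

```c
#include <stdint.h>

/*
 * Given the previous full pressure value (in Pa) and the low 16 bits
 * of the next sample, return the full value for the next sample by
 * choosing the candidate closest to the previous one.
 */
uint32_t
unwrap_pressure(uint32_t prev, uint16_t low)
{
    /* Difference of the low 16 bits, in the range -65535..65535 */
    int32_t delta = (int32_t) low - (int32_t) (prev & 0xffff);

    /* Fold into -32768..32767: the smallest step that matches 'low' */
    if (delta > 32767)
        delta -= 65536;
    else if (delta < -32768)
        delta += 65536;

    return (uint32_t) ((int32_t) prev + delta);
}
```

Calling this once per logged sample, feeding each result back in as prev, rebuilds the whole pressure curve.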

251 points isn’t a whole lot of storage, but we really only need to capture the ascent and arc across apogee, which generally occurs within the first few seconds of flight.

MicroPeak samples air pressure once every 96ms. If we record half of those samples, we’ll have data every 192ms. 251 samples every 192ms captures 48 seconds of flight. A flight longer than that will just see the first 48 seconds. Of course, if apogee occurs after that limit, MicroPeak will still correctly record that value; it just won’t have a continuous log.

Downloading the data

Having MicroPeak record data to the internal eeprom is pretty easy, but it’s not a lot of use if you can’t get the data into your computer. However, there aren’t a whole lot of interfaces available on MicroPeak. We’ve only got:

  • The 6-pin AVR programming header. This is how we load firmware onto MicroPeak during manufacturing. It’s not locked (of course), and the hardware supports reading and writing of flash, ram and eeprom.

  • The LED. We already use this to display the maximum height of the previous flight. Can we blink it faster and then get the computer to read it out?

First implementation

I changed the MicroPeak firmware to capture data to eeprom and made a ‘test flight’ using my calibrated barometric chamber (a large syringe). I was able to read out the flight data using the AVR programming pins and got the flight logging code working that way.

The plots I created looked great, but using an AVR programmer to read the data looked daunting for most people as it requires:

  • An AVR programmer. Adafruit sells the surprisingly useful USBtinyISP programmer. There are Windows drivers available for this, and you can get the necessary avrdude binaries from the usbtiny page above. The programmer itself comes in kit form, so you have to solder it together. There are other programmers available for a bit more that come pre-assembled, but all of them require that you wander around the net finding the necessary drivers and programming software.

  • A custom MicroPeak programming jig. We have these for sale in the Altus Metrum web store but, because they need special pogo pins and a pile of custom circuit boards, they’re not cheap to make.

With the hardware running at least $120 retail, and requiring a pile of software installed from various places around the net, this approach didn’t seem like a great way to let people easily capture flight data from their tiny altimeter.

The Blinking LED

The only other interface available is the MicroPeak LED. It’s a nice LED, bright and orange and low power. But, it’s still just a single LED. However, it seemed like it might be possible to have it blink out the data and create a device to watch the LED and connect that to a USB port.

The simplest idea I had was to just blink out the data in asynchronous serial form; a start bit, 8 data bits and a stop bit. On the host side, I could use a regular FTDI FT230 USB to serial converter chip. Those even have a 3.3V regulator and can supply a bit of current to other components on the board, eliminating the need for an external power supply.

To ‘see’ the LED blink, I needed a photo-transistor that actually responds to the LED’s wavelength. Most photo-transistors are designed to work with infrared light, which nicely makes the whole setup invisible. There are a few photo-transistors available which do respond in the visible range, and ROHM RPM-075PT actually has its peak sensitivity right in the same range as the LED.

In between the photo-transistor and the FT230, I needed a detector circuit which would send a ‘1’ when the light was present and a ‘0’ when it wasn’t. To me, that called for a simple comparator made from an op-amp. Set the voltage on the negative input to somewhere between ‘light’ and ‘dark’ and then drive the positive input from the photo-transistor; the output would swing from rail to rail.

Bit-banging async

The ATtiny85 has only a single ‘serial port’, which is used on MicroPeak to talk to the barometric sensor in SPI mode. So, sending data out the LED requires that it be bit-banged — directly modulated with the CPU.

I wanted the data transmission to go reasonably fast, so I picked a rate of 9600 baud as a target. That means sending one bit every 104µs. As the MicroPeak CPU is clocked at only 250kHz, that leaves only about 26 cycles per bit. I need all of the bits to go at exactly the same speed, so I pack the start bit, 8 data bits and stop bit into a single 16-bit value and then start sending.

Of course, every pass around the loop would need to take exactly the same number of cycles, so I carefully avoided any conditional code. With that, 14 of the 26 cycles were required to just get the LED set to the right value. I padded the loop with 12 nops to make up the remaining time.

At 26 cycles per bit, it’s actually sending data at a bit over 9600 baud (250kHz / 26 is about 9615 baud), but the FT230 doesn’t seem to mind.
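The framing and the timing arithmetic can be sketched in portable C. This is a sketch of conventional 8N1 framing (start bit 0, data sent least-significant bit first, stop bit 1), not the MicroPeak firmware; the function names are hypothetical, and whether the real code uses the same bit order or LED polarity is an assumption.

```c
#include <stdint.h>

#define CPU_HZ          250000UL    /* MicroPeak CPU clock, from the text */
#define CYCLES_PER_BIT  26          /* 14 cycles of work + 12 nops */

/*
 * Pack one byte into a 10-bit async frame held in a 16-bit value.
 * Bit 0 is the start bit (0), bits 1-8 are the data bits and
 * bit 9 is the stop bit (1). Bits are sent starting from bit 0.
 */
uint16_t
async_frame(uint8_t byte)
{
    return (uint16_t) (((uint16_t) byte << 1) | (1u << 9));
}

/* Value (0 or 1) of the n-th bit to send, for n = 0..9. */
int
async_frame_bit(uint16_t frame, int n)
{
    return (frame >> n) & 1;
}

/* Bit rate that results from the fixed loop length, rounded down. */
unsigned long
actual_baud(void)
{
    return CPU_HZ / CYCLES_PER_BIT;
}
```

Building the whole frame up front means the send loop only has to shift and output, which keeps every pass through the loop the same length.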

A bit of output structure

I was a bit worried about the serial converter seeing other light as random data, so I prefixed the data transmission with ‘MP’; that made it easy to ignore anything before those two characters as probably noise.

Next, I decided to checksum the whole transmission. A simple 16-bit CRC would catch most small errors; it’s easy enough to re-try the operation if it fails after all.

Finally, instead of sending the data in binary, I displayed each byte as two hex digits, and sent some newlines along to keep the line lengths short. This makes it easy to ship flight logs in email or whatever.

Here’s a sample of the final data format:

MP
dc880100fec000006800f56d8f63b059
73516447273fa93728301927d91b7712
730bbf0491fe88f7c5ee8ee896e3fadc
9dd9d3d502d1afcea2cbafc6b4c34ec1
bfbfcabf10c03dc05dc070c084c08fc0
9cc0abc0b9c0c1c0ccc0dcc020c152c4
71c9a6cf45d623db7de05ee758edd9f2
b4f9fd00aa074311631a9221c4291330
c035873b2943084bbb52695c0c67eb6b
d26ee5707472fb74a4781f7dee802b84
09860a87e786ad868a866e8659865186
4e8643863e863986368638862e862d86
2f862d86298628862a86268629862686
28862886258625862486
d925
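On the host side, turning that text back into bytes is straightforward. The following is a minimal sketch, not the AltosUI code: the function names are hypothetical, it skips everything before the ‘MP’ marker, ignores whitespace, and decodes pairs of hex digits. It does not verify the checksum, because the CRC polynomial isn't given here; the final two decoded bytes are simply returned along with the rest.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it isn't one. */
static int
hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode the hex text following the first "MP" marker into out[].
 * Returns the number of bytes decoded, or -1 if the marker is
 * missing, a non-hex character appears, the digit count is odd,
 * or more than 'max' bytes are present.
 */
long
mp_decode(const char *text, uint8_t *out, size_t max)
{
    const char *p = strstr(text, "MP");
    size_t n = 0;
    int hi = -1;        /* pending high nibble, -1 when none */

    if (!p)
        return -1;
    for (p += 2; *p; p++) {
        int v;

        if (isspace((unsigned char) *p))
            continue;
        v = hex_value(*p);
        if (v < 0)
            return -1;
        if (hi < 0) {
            hi = v;
        } else {
            if (n >= max)
                return -1;
            out[n++] = (uint8_t) ((hi << 4) | v);
            hi = -1;
        }
    }
    return hi < 0 ? (long) n : -1;
}
```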

Making the photo-transistor go fast enough

The photo-transistor acts as one half of a voltage divider on the positive op-amp terminal, with a resistor making the other half. However, the photo-transistor acts a bit like a capacitor, so when I initially chose a fairly large value for the resistor, it actually took too long to switch between on and off — the transistor would spend a bunch of time charging and discharging. I had to reduce the resistor to 1kΩ for the circuit to work.

Remaining hardware design

I prototyped the circuit on a breadboard using a through-hole op-amp that my daughter designed into her ultrasonic guided robot and a prefabricated FTDI Friend board. I wanted to use the target photo-transistor, so I soldered a couple of short pieces of wire onto the SMT pads and stuck that into the breadboard.

Once I had that working, I copied the schematic to gschem, designed a board and had three made at OSHPark for the phenomenal sum of $1.35.

Aside from goofing up on the FT230 USB data pins (swapping D+ and D-), the board worked perfectly.

The final hardware design includes an LED connected to the output of the comparator that makes it easier to know when things are lined up correctly. Otherwise, it will be essentially the same.

Host software

Our AltosUI code has taught us a lot about delivering code that runs on Linux, Mac OS X and Windows, so I’m busy developing something based on the same underlying Java bits to support MicroPeak. Here’s a sample of the graph results so far:

Production plans

I’ve ordered a couple dozen raw boards from OSH Park, and once those are here, I’ll build them and make them available for sale in a couple of weeks. The current plan is to charge $35 for the MicroPeak serial interface board, or sell it bundled with MicroPeak for $75.

Posted Sun Dec 30 06:34:25 2012 Tags:

MicroPeak — tiny peak-recording altimeter now available

MicroPeak is a miniature peak-recording altimeter. About the same size and weight as a US dime (with battery), MicroPeak offers fabulous accuracy (20cm or 8in at sea level) and wide range (up to 31km or 101k’).

  • Uses the Measurement Specialties MS5607 barometric sensor.

  • Includes built-in battery holder for easily replaceable CR1025 lithium battery

  • Compact design is only 18mm x 14mm or 0.7” x 0.56”. Weighs 1.9g including the battery.

  • Low power design lasts for over 40 hours in flight.

  • Auto-poweroff on landing.

  • Learn more at the Altus Metrum web site

  • Buy these at the gag.com web store for Altus Metrum products

The size of the board was driven by the premise that the battery had to be mounted on it, to avoid having wiring running between the altimeter and a separate battery. We found some small lithium coin-cell battery holders for the CR1025 battery. These battery holders are rated to hold the battery securely at up to 150g.

We’d already started playing with the Measurement Specialties MS5607 pressure sensor which offers amazing accuracy while using very little power. Taking full-precision measurements every 96ms consumes about 0.2mA on average. Once on the ground, we stop taking measurements entirely, dropping the power use to around 1µA. It’s also pretty small, measuring only 5mm x 3mm.

For a CPU, this little project didn’t need much. The 8-bit ATtiny85 comes in a 20qwfn package which is only 4mm x 4mm. When run at full speed (8MHz), it consumes a couple of mA. Reduce the clock to a pokey 250kHz and the chip has enough processing power to track altitude while consuming less than 0.2mA on average.

To avoid losing the battery, we didn’t want to have to remove it whenever the board wasn’t in use. So, we added a little power switch to the board. The one we found is good to at least 50g.

Finally, we wanted to find a nice bright LED to show the state of the device and to blink out the final altitude. The OSRAM LO T67K are bright-orange surface-mount LEDs that run happily on 2mA.

We used OSHPark.com to create prototype circuit boards for this project. Because of the small size of the board, each prototype run cost only $2 for three boards. It takes a couple of weeks to get boards, but it’s really hard to beat the price.

All of the schematic and circuit board artwork are published under the TAPR Open Hardware License and are available via git.

All of the source code is published under the GPLv2 and is included in the main AltOS source repository.

Posted Thu Nov 22 17:12:27 2012 Tags:

FD passing for DRI.Next

Using the DMA-BUF interfaces to pass DRI objects between the client and server, as discussed in my previous blog posting on DRI-Next, requires that we successfully pass file descriptors over the X protocol socket.

Rumor has it that this has been tried and found to be difficult, and so I decided to do a bit of experimentation to see how this could be made to work within the existing X implementation.

(All of the examples shown here are licensed under the GPL, version 2 and are available from git://keithp.com/git/fdpassing)

Basics of FD passing

The kernel internals that support FD passing are actually quite simple: POSIX already requires that two processes be able to share the same underlying reference to a file because of the semantics of the fork(2) call. Adding some ability to share arbitrary file descriptors between two processes then is far more about how you ask the kernel than the actual file descriptor sharing operation.

In Linux, file descriptors can be passed through local (AF_LOCAL, also known as Unix-domain) sockets. The sender constructs a mystic-looking sendmsg(2) call, placing the file descriptor in the control field of that operation. The kernel pulls the file descriptor out of the control field, allocates a file descriptor in the target process which references the same file object and then sticks the file descriptor in a queue for the receiving process to fetch.

The receiver then constructs a matching call to recvmsg that provides a place for the kernel to stick the new file descriptor.

A helper API for testing

I first wrote a stand-alone program that created a socketpair, forked and then passed an fd from the parent to the child. Once that was working, I decided that some short helper functions would make further testing a whole lot easier.

Here’s a function that writes some data and an optional file descriptor:

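/* Headers needed by the examples in this post */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

ssize_t
sock_fd_write(int sock, void *buf, ssize_t buflen, int fd)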
{
    ssize_t     size;
    struct msghdr   msg;
    struct iovec    iov;
    union {
        struct cmsghdr  cmsghdr;
        char        control[CMSG_SPACE(sizeof (int))];
    } cmsgu;
    struct cmsghdr  *cmsg;

    iov.iov_base = buf;
    iov.iov_len = buflen;

    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (fd != -1) {
        msg.msg_control = cmsgu.control;
        msg.msg_controllen = sizeof(cmsgu.control);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_len = CMSG_LEN(sizeof (int));
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;

        printf ("passing fd %d\n", fd);
        *((int *) CMSG_DATA(cmsg)) = fd;
    } else {
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
        printf ("not passing fd\n");
    }

    size = sendmsg(sock, &msg, 0);

    if (size < 0)
        perror ("sendmsg");
    return size;
}

And here’s the matching receiver function:

ssize_t
sock_fd_read(int sock, void *buf, ssize_t bufsize, int *fd)
{
    ssize_t     size;

    if (fd) {
        struct msghdr   msg;
        struct iovec    iov;
        union {
            struct cmsghdr  cmsghdr;
            char        control[CMSG_SPACE(sizeof (int))];
        } cmsgu;
        struct cmsghdr  *cmsg;

        iov.iov_base = buf;
        iov.iov_len = bufsize;

        msg.msg_name = NULL;
        msg.msg_namelen = 0;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cmsgu.control;
        msg.msg_controllen = sizeof(cmsgu.control);
        size = recvmsg (sock, &msg, 0);
        if (size < 0) {
            perror ("recvmsg");
            exit(1);
        }
        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            if (cmsg->cmsg_level != SOL_SOCKET) {
                fprintf (stderr, "invalid cmsg_level %d\n",
                     cmsg->cmsg_level);
                exit(1);
            }
            if (cmsg->cmsg_type != SCM_RIGHTS) {
                fprintf (stderr, "invalid cmsg_type %d\n",
                     cmsg->cmsg_type);
                exit(1);
            }

            *fd = *((int *) CMSG_DATA(cmsg));
            printf ("received fd %d\n", *fd);
        } else
            *fd = -1;
    } else {
        size = read (sock, buf, bufsize);
        if (size < 0) {
            perror("read");
            exit(1);
        }
    }
    return size;
}

With these two functions, I rewrote the simple example as follows:

void
child(int sock)
{
    int fd;
    char    buf[16];
    ssize_t size;

    sleep(1);
    for (;;) {
        size = sock_fd_read(sock, buf, sizeof(buf), &fd);
        if (size <= 0)
            break;
        printf ("read %zd\n", size);
        if (fd != -1) {
            write(fd, "hello, world\n", 13);
            close(fd);
        }
    }
}

void
parent(int sock)
{
    ssize_t size;
    int fd;

    fd = 1;     /* pass our stdout to the child */
    size = sock_fd_write(sock, "1", 1, fd);
    printf ("wrote %zd\n", size);
}

int
main(int argc, char **argv)
{
    int sv[2];
    int pid;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        exit(1);
    }
    switch ((pid = fork())) {
    case 0:
        close(sv[0]);
        child(sv[1]);
        break;
    case -1:
        perror("fork");
        exit(1);
    default:
        close(sv[1]);
        parent(sv[0]);
        break;
    }
    return 0;
}

Experimenting with multiple writes

I wanted to know what would happen if multiple writes were made, some with file descriptors and some without. So I changed the simple example parent function to look like:

void
parent(int sock)
{
    ssize_t size;
    int fd;

    fd = 1;     /* pass our stdout with the middle write only */
    size = sock_fd_write(sock, "1", 1, -1);
    printf ("wrote %zd without fd\n", size);
    size = sock_fd_write(sock, "1", 1, fd);
    printf ("wrote %zd with fd\n", size);
    size = sock_fd_write(sock, "1", 1, -1);
    printf ("wrote %zd without fd\n", size);
}

When run, this shows that the reader gets two bytes in the first read, along with a file descriptor, followed by one byte in a second read, without a file descriptor. In other words, a file descriptor message forms a barrier within the socket; multiple messages will be merged together, but not past a message containing a file descriptor.

Reading without accepting a file descriptor

What happens when the reader isn’t expecting a file descriptor? Does it just get lost? Does the reader not get the message until it asks for the file descriptor? What about the boundary issue described above?

Here’s my test case:

void
child(int sock)
{
    int fd;
    char    buf[16];
    ssize_t size;

    sleep(1);
    size = sock_fd_read(sock, buf, sizeof(buf), NULL);
    if (size <= 0)
        return;
    printf ("read %zd\n", size);
    size = sock_fd_read(sock, buf, sizeof(buf), &fd);
    if (size <= 0)
        return;
    printf ("read %zd\n", size);
    if (fd != -1) {
        write(fd, "hello, world\n", 13);
        close(fd);
    }
}

void
parent(int sock)
{
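    ssize_t size;
    int fd;

    fd = 1;     /* stdout: consumed, and lost, by the child's plain read */
    size = sock_fd_write(sock, "1", 1, fd);
    printf ("wrote %zd with fd\n", size);
    fd = 2;     /* stderr: picked up by the child's recvmsg */
    size = sock_fd_write(sock, "1", 1, fd);
    printf ("wrote %zd with fd\n", size);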
}

This shows that the first passed file descriptor is consumed by the first sock_fd_read call but, because that call uses a plain read, the file descriptor is closed rather than delivered. The second file descriptor passed is picked up by the second sock_fd_read call.

Zero-length writes

Can a file descriptor be passed without sending any data?

void
parent(int sock)
{
    ssize_t size;
    int fd;

    fd = 1;     /* try to pass our stdout with a zero-length write */
    size = sock_fd_write(sock, "1", 1, -1);
    printf ("wrote %zd without fd\n", size);
    size = sock_fd_write(sock, NULL, 0, fd);
    printf ("wrote %zd with fd\n", size);
    size = sock_fd_write(sock, "1", 1, -1);
    printf ("wrote %zd without fd\n", size);
}

And the answer is clearly “no” — the file descriptor is not passed when no data are included in the write.

A summary of results

  1. read and recvmsg don’t merge data across a file descriptor message boundary.

  2. failing to accept an fd in the receiver results in the fd being closed by the kernel.

  3. a file descriptor must be accompanied by some data.

Make X pass file descriptors

I’d like to get X to pass a file descriptor without completely rewriting the internals of both the library and the X server. Ideally, without making any changes to the existing code paths for regular request processing at all.

On the sending side, this seems pretty straightforward: we just need to get the X connection file descriptor and call sendmsg directly, passing the desired file descriptor along. In XCB, this could be done by using the xcb_take_socket interface to temporarily hijack the protocol as Xlib does.

It’s the receiving side where things are messier. Because a bare read will discard any delivered file descriptor, we must make sure to use recvmsg whenever we want to actually capture the file descriptor.

Kludge X server fd receiving

Because a passed fd creates a barrier in the bytestream, when the X server reads requests from a client, the read will stop returning data after the message with the file descriptor is consumed.

Of course, this process consumes the passed file descriptor, and if that call isn’t made with recvmsg set up to receive it, the fd will be lost.

As a simple kludge, if we pass a meaningless fd with the X request and then the ‘real’ fd with a following XNoOperation request, the existing request reading code will get the request, discard the meaningless fd and then stop reading at that point due to the barrier. Once into the request processing code, recvmsg can be called to get the real file descriptor and the associated XNoOperation request.

I wrote a test for this that demonstrates how this works:

static void
child(int sock)
{
    uint8_t xreq[1024];
    uint8_t xnop[4];
    uint8_t req;
    int i, reqlen;
    ssize_t size, fdsize;
    int fd = -1, *fdp;
    int j;

    sleep (1);
    for (j = 0;; j++) {
        size = sock_fd_read(sock, xreq, sizeof (xreq), NULL);
        printf ("got %zd\n", size);
        if (size == 0)
            break;
        i = 0;
        while (i < size) {
            req = xreq[i];
            reqlen = xreq[i+1];
            i += reqlen;
            switch (req) {
            case 0:
                break;
            case 1:
                if (i != size) {
                    fprintf (stderr, "Got fd req, but not at end of input %d < %zd\n",
                         i, size);
                }
                fdsize = sock_fd_read(sock, xnop, sizeof (xnop), &fd);
                if (fd == -1) {
                    fprintf (stderr, "no fd received\n");
                } else {
                    FILE    *f = fdopen (fd, "w");
                    if (f) {
                        fprintf(f, "hello %d\n", j);
                        fclose(f);  /* flushes, and closes fd as well */
                    } else
                        close(fd);
                    fd = -1;
                }
                break;
            case 2:
                fprintf (stderr, "Unexpected FD passing req\n");
                break;
            }
        }
    }
}

int
tmp_file(int j) {
    char    name[64];

    snprintf (name, sizeof (name), "tmp-file-%d", j);
    return creat(name, 0666);
}

static void
parent(int sock)
{
    uint8_t xreq[32];
    uint8_t xnop[4];
    int i, j;
    int fd;

    for (j = 0; j < 4; j++) {
        /* Write a bunch of regular requests */
        for (i = 0; i < 8; i++) {
            xreq[0] = 0;
            xreq[1] = sizeof (xreq);
            sock_fd_write(sock, xreq, sizeof (xreq), -1);
        }

        /* Write our 'pass an fd' request with a 'useless' FD to block the receiver */
        xreq[0] = 1;
        xreq[1] = sizeof(xreq);
        sock_fd_write(sock, xreq, sizeof (xreq), 1);

        /* Pass an fd */
        xnop[0] = 2;
        xnop[1] = sizeof (xnop);
        fd = tmp_file(j);
        sock_fd_write(sock, xnop, sizeof (xnop), fd);
        close(fd);
    }
}

Fixing XCB to receive file descriptors

Multiple threads may be trying to get replies and events back from the X server at the same time, which means the kludge of having the real fd follow the message will likely lead to the wrong thread getting the file descriptor.

Instead, I suspect the best plan will be to fix XCB to internally capture passed file descriptors and save them with the associated reply. Because the file descriptor message will form a barrier in the read stream, XCB can associate any received file descriptor with the last reply in the read data. The X server would then send the reply with an explicit sendmsg call to pass both reply and file descriptor together.
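A read path that always uses recvmsg, and hands back whatever descriptors arrive with the data, might look something like the following. This is only a sketch of the idea, not XCB code: the function name and the MAX_PASSED_FDS limit are hypothetical, and a real library would also need to queue the descriptors alongside the reply they belong to.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_PASSED_FDS 8    /* arbitrary limit for this sketch */

/*
 * Read up to buflen bytes from sock, capturing any file descriptors
 * that arrive with them. fds[] must have room for MAX_PASSED_FDS
 * entries; *nfds is set to the number of descriptors stored.
 * Returns the byte count from recvmsg (negative on error).
 */
ssize_t
recv_with_fds(int sock, void *buf, size_t buflen, int *fds, int *nfds)
{
    struct msghdr   msg;
    struct iovec    iov;
    union {
        struct cmsghdr  align;      /* forces correct alignment */
        char        control[CMSG_SPACE(MAX_PASSED_FDS * sizeof (int))];
    } cmsgu;
    struct cmsghdr  *cmsg;
    ssize_t     size;

    *nfds = 0;
    iov.iov_base = buf;
    iov.iov_len = buflen;

    memset(&msg, 0, sizeof (msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsgu.control;
    msg.msg_controllen = sizeof (cmsgu.control);

    size = recvmsg(sock, &msg, 0);
    if (size < 0)
        return size;

    /* Walk every control message, collecting SCM_RIGHTS payloads */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS) {
            int n = (int) ((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof (int));

            if (*nfds + n > MAX_PASSED_FDS)
                n = MAX_PASSED_FDS - *nfds;
            memcpy(fds + *nfds, CMSG_DATA(cmsg), n * sizeof (int));
            *nfds += n;
        }
    }
    return size;
}
```

Because the descriptor message forms a barrier, any descriptors returned by one call belong to the last message in the bytes that call returned.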

Next steps

The next thing to do is code up a simple fd passing extension and try to get it working, passing descriptors back and forth to the X server. Once that works, design of the rest of the DRI-Next extension should be pretty straightforward.

Posted Fri Oct 5 13:30:16 2012 Tags:
