Keithp.com/ blog
kernel-mode-drivers

I mentioned in passing during my Linux.Conf.Au talk that we were looking at moving portions of the video drivers into the kernel. Others, including Alan Coopersmith, have started to chime in on this topic, and I thought I should write a bit more about what I was thinking.

At least a few of the goals are as follows:

  1. BIOS-free Suspend/Resume support
  2. Text-mode panic
  3. Flicker-free graphical boot
  4. Fancy animated user switching

A significant non-goal here is an in-kernel unified API for graphics acceleration. Our current high-level architecture for rendering is in good shape, with user-mode filling a ring buffer with hardware-specific commands and the kernel just dispatching buffers.

Ok, so as you can see from the list above, I’m just talking about device detection, configuration and video mode selection.

Why this sudden desire to move video mode selection into the kernel? Well, for years, we were told by several video card vendors that there was “no way” we could ever figure out how to do mode setting without using the BIOS. If you believe you must use the BIOS, it’s difficult to see how that can be done from within the kernel; executing the BIOS requires either an x86 emulator or vm86 support, neither of which belongs in the kernel. For a long time, I believed the video chip vendors. Silly me.

Several people, including Luc Verhaegen, had been moving BIOS-based drivers towards native mode setting. While the BIOS was a nice crutch, real support requires us to follow other operating systems and program the hardware directly. It’s the only way to get at the full range of hardware capabilities, although it does require a lot more code. And, some machines will not quite work right until magic tweaks are added. On balance, the number of machines fixed is greater than the number of machines broken (by a huge margin, given that the BIOS in many laptops cannot set the panel’s native mode).

Now, if you have to have BIOS support for video mode selection, you have to wait for usermode to wake up before you can program the video card. There’s not a lot you can do early in the kernel, without some joyous adventure involving initrd devices that include a complete X server. Ick.

Once you embrace native mode setting, suddenly the range of options opens up and you can think about what might be possible if this code were moved down into the kernel. Of course, the first thought is how little you can move.

So, with that, we can start enumerating what capabilities must be in kernel mode to solve the list of problems above.

First, to get BIOS-free suspend/resume support working, we already require the ability to save and restore the entire graphics state, including synchronizing with applications performing graphical operations. For video systems involving external chips (most, these days), this also means saving and restoring the state of those chips.

While we may mock Windows and the BSOD, many people would be far happier with that than the current state of an Xorg system where kernel panics are not announced at all; instead, the screen simply freezes and the user has no idea what has happened. Even the ability to escape from this state is limited to those with the secret handshake knowledge. Getting back to a simple text mode and displaying the panic message requires the ability to program a fixed mode from within the kernel, and the ability to lock out other users of the graphics device (perhaps waiting for the hardware pipeline to drain, if necessary).

Eliminating the screen flashing during boot-up will require the ability to set the desired final graphical mode as soon as the kernel is running (or, even from the boot loader?). I suggest that the precise target mode can be computed during system installation time (and changed for next boot by the user session), so this only requires the ability to read the desired mode configuration from somewhere at boot time. Kristian Høgsberg has been working on an early switch to the final graphical mode, but hasn’t managed to eliminate the flashing.

I envision fancy user switching being implemented by a transition from one X server to another rather than trying to get two user sessions running inside the same X server. Right now, we switch users with a VT switch where the first session switches back to text mode and the second switches back to graphical mode. Flash, Flash. Having the kernel recognise that the two X servers were using the same mode will eliminate the flashing nicely; the only remaining issue is how to animate from one session to the next. I think that’s largely a matter of being able to allocate multiple front buffers and creating an animation client that uses two X server front buffer images to construct the intermediate representations.

Finally, building a kernel API for all of this has become possible because we’ve all come to recognise that there are commonalities in mode selection across video hardware. The Intel driver had some separation between CRTC and output when we started working on it. Luc’s Via driver has been moving this direction for some time. While the Radeon driver uses a different approach (it sets everything on the card every time you touch anything), it too recognises the distinction between CRTC and Output. Matthew Tippett suggested that even the closed-source ATI driver works this way; he and Kevin Martin redesigned RandR 1.2 to work the same way.

Because we now have a reasonably common abstraction across a wide range of drivers, it now seems tractable to produce a common API. We’ve already started this inside the X server itself; the hope is that working on the common layer there will inform choices about how the kernel API should look. This will include:

  1. Mode setting primitives
  2. GPU ring buffer management
  3. Memory management of some kind

Yes, this makes modesetting just an addition to the existing DRM drivers; they’re responsible for the GPU and memory management stuff, and the modesetting driver needs both of those bits to perform the operations listed above, so it cannot be ‘underneath’ the DRM driver. We welcome our new DRM overlords.

Posted Mon Feb 5 22:17:36 2007
randr 1.2 update

We’ve been working busily at RandR 1.2, both cleaning up the extension specification and trying to build an infrastructure that will help people get support into all of the video drivers. I thought I’d let people know where things stand and what remains to be done.

First off, things that are fairly stable now:

One piece that needs a few new features is the xrandr application. I just spent a day cleaning things up a bit (it could use more cleanup, especially splitting the source across multiple files). I added the ability to position outputs relative to one another and also a global ‘--auto’ mode which turns on all connected outputs and turns off all disconnected ones. With that, I hooked up a new global hotkey in metacity to run ‘xrandr --auto’ so I can just plug in a new monitor, hit the key and expect it to light up. There are still a few more tasks to take care of here:

On the driver front, we’re doing development in an unusual fashion (and we may regret it if we’re not careful). As our goal is to produce a single driver binary that will run against both 7.2 and newer X servers, we cannot depend on having any new functions in the server binary.

To avoid server dependencies, we’re building a bunch of new driver-independent functionality as if it were part of the server binary and linking that into the Intel driver. As much of this code started life as driver-dependent explorations of how to make RandR 1.2 work, it isn’t quite as independent as we’d like. We’ve done some piecemeal attempts to make it look a little better, but the result is actually fairly ugly at present with a mish-mash of function naming schemes and remnants of driver-dependencies in odd places.

Dave Airlie has been copying this code into the nouveau driver and hooking up RandR 1.2 support there. That’s great as it gives us a chance to make sure these new interfaces are right for more than one driver. I’m hoping someone will also take a look at how this will work in the radeon driver; with that, we’ll have a reasonably broad experience with the new interfaces and should be able to avoid nasty surprises down the road. Of course, drivers will still be able to completely bypass this layer within the server, so at least we won’t make fixing it impossible, only painful.

I’ve also recently started on some xorg.conf support for output configuration. Right now, this consists of the ability to associate a Monitor section in the config file with each output of the device. From the monitor section, you can add new mode lines, specify DPMS support, override sync ranges and set a preferred mode.

We’ve also started adding some monitor ‘quirks’ to the EDID detection code; I got a trio of monitors from France that all had various incorrect data in their EDID blocks, including one monitor which reported a preferred mode of 640x350. I’d like to keep adding more quirks as we find broken monitors; that lets everyone share the same fixes. Of course, with the xorg.conf support now available, you can override most of the EDID data and work around things at run-time, but if you do have to do this, please submit a bug report and attach the broken EDID data (xrandr --prop will print it out for you).

Aside from general infrastructure cleanup, we’ve still got some features missing from the implementation that we’d like to play with:

And, of course, it would be fun to see some applications starting to use this; in particular, KDE and GNOME both have screen size setting applets which could see some significant enhancements now.

Posted Sun Dec 31 23:53:05 2006
Startup RandR Configuration

Well, RandR 1.2 work is progressing apace; I can now reconfigure the X server in some fairly dramatic ways. I now regularly use the 1600x1200 monitor at home as extra desktop space for my laptop, growing the X root window to cover both monitors.

But, what I’ve lost in the process is any ability to configure the system at startup time. This is rather unusual; the system is now far more flexible through the RandR protocol than through the configuration file. Getting things set right at startup time seems important, as that will avoid flashing monitors as they change modes and other possible issues.

I started the process of allowing startup-time configuration by making the RandR 1.2 code permit object creation before the Screen objects were created. This allows the driver to create the necessary RandR 1.2 structures and use those to control the configuration process. This would work, except that I’d like the same driver to work in the absence of RandR 1.2 in the core server.

Back to the drawing board.

What I’m doing now is creating some new structures that map to the RandR 1.2 structures but which are hw/xfree86 specific and which don’t depend on RandR 1.2 in the core server. With the goal of eventually moving these into the hw/xfree86 portion of the server, these structures provide all of the RandR 1.2 semantics using smaller driver-specific methods. The code for this new work can then use these new data structures to configure the server at startup time, as well as on-the-fly at runtime using RandR 1.2.

The two primary data structures are the xf86CrtcRec and the xf86OutputRec:

struct _xf86Crtc {
    /**
     * Associated ScrnInfo
     */
    ScrnInfoPtr     scrn;

    /**
     * Active state of this CRTC
     *
     * Set when this CRTC is driving one or more outputs 
     */
    Bool        enabled;

    /**
     * Position on screen
     *
     * Locates this CRTC within the frame buffer
     */
    int         x, y;

    /** Track whether cursor is within CRTC range  */
    Bool        cursorInRange;

    /** Track state of cursor associated with this CRTC */
    Bool        cursorShown;

    /**
     * Active mode
     *
     * This reflects the mode as set in the CRTC currently
     * It will be cleared when the VT is not active or
     * during server startup
     */
    DisplayModeRec  curMode;

    /**
     * Desired mode
     *
     * This is set to the requested mode, independent of
     * whether the VT is active. In particular, it receives
     * the startup configured mode and saves the active mode
     * on VT switch.
     */
    DisplayModeRec  desiredMode;

    /** crtc-specific functions */
    const xf86CrtcFuncsRec *funcs;

    /**
     * Driver private
     *
     * Holds driver-private information
     */
    void        *driver_private;

#ifdef RANDR_12_INTERFACE
    /**
     * RandR crtc
     *
     * When RandR 1.2 is available, this
     * points at the associated crtc object
     */
    RRCrtcPtr       randr_crtc;
#else
    void        *randr_crtc;
#endif
};


struct _xf86Output {
    /**
     * Associated ScrnInfo
     */
    ScrnInfoPtr     scrn;
    /**
     * Currently connected crtc (if any)
     *
     * If this output is not in use, this field will be NULL.
     */
    xf86CrtcPtr     crtc;
    /**
     * List of available modes on this output.
     *
     * This should be the list from get_modes(), plus perhaps additional
     * compatible modes added later.
     */
    DisplayModePtr  probed_modes;

    /** EDID monitor information */
    xf86MonPtr      MonInfo;

    /** Physical size of the currently attached output device. */
    int         mm_width, mm_height;

    /** Output name */
    char        *name;

    /** output-specific functions */
    const xf86OutputFuncsRec *funcs;

    /** driver private information */
    void        *driver_private;

#ifdef RANDR_12_INTERFACE
    /**
     * RandR 1.2 output structure.
     *
     * When RandR 1.2 is available, this points at the associated
     * RandR output structure and is created when this output is created
     */
    RROutputPtr     randr_output;
#else
    void        *randr_output;
#endif
};

The hardware is manipulated through driver-specific functions contained in the xf86CrtcFuncsRec and xf86OutputFuncsRec:

typedef struct _xf86CrtcFuncs {
    /**
     * Turns the crtc on/off, or sets intermediate power levels if available.
     *
     * Unsupported intermediate modes drop to the lower power setting.  If the
     * mode is DPMSModeOff, the crtc must be disabled, as the DPLL may be
     * disabled afterwards.
     */
    void
    (*dpms)(xf86CrtcPtr     crtc,
            int             mode);

    /**
     * Saves the crtc's state for restoration on VT switch.
     */
    void
    (*save)(xf86CrtcPtr     crtc);

    /**
     * Restores the crtc's state at VT switch.
     */
    void
    (*restore)(xf86CrtcPtr      crtc);

    /**
     * Clean up driver-specific bits of the crtc
     */
    void
    (*destroy) (xf86CrtcPtr crtc);
} xf86CrtcFuncsRec, *xf86CrtcFuncsPtr;


typedef struct _xf86OutputFuncs {
    /**
     * Turns the output on/off, or sets intermediate power levels if available.
     *
     * Unsupported intermediate modes drop to the lower power setting.  If the
     * mode is DPMSModeOff, the output must be disabled, as the DPLL may be
     * disabled afterwards.
     */
    void
    (*dpms)(xf86OutputPtr   output,
        int         mode);

    /**
     * Saves the output's state for restoration on VT switch.
     */
    void
    (*save)(xf86OutputPtr       output);

    /**
     * Restores the output's state at VT switch.
     */
    void
    (*restore)(xf86OutputPtr    output);

    /**
     * Callback for testing a video mode for a given output.
     *
     * This function should only check for cases where a mode can't be supported
     * on the pipe specifically, and not represent generic CRTC limitations.
     *
     * \return MODE_OK if the mode is valid, or another MODE_* otherwise.
     */
    int
    (*mode_valid)(xf86OutputPtr     output,
          DisplayModePtr    pMode);

    /**
     * Callback for setting up a video mode before any crtc/dpll changes.
     *
     * \param pMode the mode that will be set, or NULL if the mode to be set is
     * unknown (such as the restore path of VT switching).
     */
    void
    (*pre_set_mode)(xf86OutputPtr   output,
            DisplayModePtr  pMode);

    /**
     * Callback for setting up a video mode after the DPLL update but before
     * the plane is enabled.
     */
    void
    (*post_set_mode)(xf86OutputPtr  output,
             DisplayModePtr pMode);

    /**
     * Probe for a connected output, and return detect_status.
     */
    enum detect_status
    (*detect)(xf86OutputPtr output);

    /**
     * Query the device for the modes it provides.
     *
     * This function may also update MonInfo, mm_width, and mm_height.
     *
     * \return singly-linked list of modes or NULL if no modes found.
     */
    DisplayModePtr
    (*get_modes)(xf86OutputPtr  output);

    /**
     * Clean up driver-specific bits of the output
     */
    void
    (*destroy) (xf86OutputPtr   output);
} xf86OutputFuncsRec, *xf86OutputFuncsPtr;

Right now, I’ve just hacked up the Intel driver internals using these new structures and have left the implementation a tangled mess with driver-specific, randr-specific and other code all tied together. Obviously this is not a long-term plan, but I want to change the data structures first, then split the code apart.

At this point, I’ve managed to get the Intel driver to compile with the new data structures, so I’ll first get it working then start cleaning up the implementation so that driver-independent code is clearly separated and performing as much of the work as possible.

Once driver-independent code is in place, I will add startup configuration to the driver-independent code. I’ll probably use the existing Radeon configuration file options as much as possible, and probably accept the Intel configuration file options as well. While those do not expose the full capabilities of the RandR 1.2 extension, I suspect they’re sufficient for most users.

It’s a long way around to get startup configuration for the new capabilities in the Intel driver, but I’m hoping it will allow us to create a common configuration language for all of the drivers and also remove the current dependence on RandR 1.2 from the Intel driver for much of this functionality.

Posted Sun Nov 26 17:28:51 2006
Autoconf in a RandR World

While busily rewriting the RandR extension and the Intel driver to match, we decided to tackle another major issue: what to do when the driver isn’t given any instruction in the config file about how to light up the screens. And, beyond that, how to make that configurable in reasonable ways while making it driver-independent.

The current scheme is a mish-mash of crufty ancient code and magic driver-specific hacks. The ancient driver-independent code doesn’t understand that a single video card can have multiple monitors, so the normal configuration mechanisms are mostly harmful. For the Intel driver, the mode that you specify in the configuration file is almost, but not entirely, ignored.

In the bad old days of BIOS-based mode selection, it would just use the specified mode to try and match some mode present in the BIOS. In the brave new world of native mode setting, we can at least use the mode provided directly, assuming the output is capable of using it. In either case, all screens were programmed with the same mode (more or less). Not very useful when you have a 1024x768 internal panel and a 1600x1200 external monitor.

Almost all of the i830 and later Intel chips have two “pipes”. Each pipe can be connected to a variety of “outputs” (where an output is effectively a connector, like a local LCD panel, or an external VGA connector). The “pipes” are important here because it takes a pipe to hold a specific mode; the pipe fetches data from the frame buffer and sends it to the outputs with the timing specified by the mode.

Now, the weird thing is you can sometimes connect multiple outputs to a single pipe. But, when you do that, each output gets exactly the same mode and sees exactly the same pixels out of the frame buffer. Plus, there are other restrictions, like you can’t share a pipe with the local LCD panel or the TV output on the 945. Whatever. We mostly ignore this at present because it’s not that useful, and it’s a pain to think about. Of course RandR supports it and will expose it to the user when it can work, but that’s not often.

Ok, with that brief diversion into the oddities of the Intel graphics chip, let’s get back to configuration.

To allow the user to customize how modes were set, there were three parameters in the config file:

Option  "Clone"     "yes"
Option  "CloneRefresh"  "60"
Option  "MonitorLayout" "LFP,CRT"

The “Clone” option directed the driver to turn on two of the outputs. The “CloneRefresh” option specified the vertical sync rate for the “other” monitor. “MonitorLayout” gave the user precise control over which outputs are connected to which pipes.

The current BIOS-based driver on the master branch adds a bunch more:

Option  "MergeFB"   "yes"
Option  "MetaModes" "1024x768"
Option  "SecondHSync"   "80-130"
Option  "SecondVRefresh" "50-75"
Option  "SecondPosition" "RightOf"
Option  "MergedXinerama" "yes"

As you might guess, these all combine to let you place the second monitor somewhere other than right on top of the first monitor. Useful when you have two monitors on your desktop.

To confuse you further, the Radeon driver uses a different set of options to perform exactly the same function. Cool, huh? Even more fun is that these two drivers have completely different semantics of how to interpret the lack of these options. Makes building a configuration tool fairly challenging, and makes the chances that the user will get “random” results high.

So, the first thing to realize is that RandR 1.2 makes all of these things entirely configurable. But, not until you have the server running and can connect an X client. Bummer. With that, you’d have to put the desired configuration into a startup utility and you’d watch the screen flash a couple of times as you were logging in. Fun for some, but annoying for most.

Given that RandR has sufficient power to configure things after the server has started, and that this is expressed in a driver-independent fashion, it seems sensible to figure out how to use that information at startup time to make better choices and ease customization.

The first piece I’ve done is to replace the existing default mode selection logic with something a bit fancier and (I hope) more generally useful. After that, I’ll write up some replacement configuration options and use those to mutate this configuration.

For the initial default configuration (used when no options are present in the configuration file), I made some simplifying assumptions:

Given that I wanted to present the same data on every monitor, I started by picking a single monitor to control the size of the screen. For this, the code first looks for a monitor with a preferred mode; those are usually either the laptop LCD panel or an external DVI-connected LCD monitor. If no such monitor is present, the code picks a random monitor and selects a mode that will present data at about 96dpi. I think this makes more sense than picking the highest supported resolution as CRTs often advertise support for incredibly high resolutions that end up fuzzy and dim. Better to just pick a reasonable size by default and let the user change it after login.

Once the first mode is selected, all of the other monitors are set to modes that are close to that size.

Finally, the list of monitors is used to compute the maximum screen size that should be permitted. Yes, we’re still stuck with allocating the frame buffer at server init time. This will, eventually, go away, but that’s related to rendering infrastructure which we’re ignoring this week. In any case, the maximum size is computed by figuring out how much space is needed to place all of the known monitors side-by-side. For outputs which don’t have any monitor connected, we just pretend that they’ll max out at 1600x1200. The end result is that there shouldn’t be any limits on which mode combinations can be used in clone or mergefb mode.

Right now, all of this code is down inside the Intel driver. I will pull the DIX-level functions up to the RandR code. What remains is to decide how to make the remaining code driver independent and (eventually) move it to the xf86 common layer where it can be shared across multiple drivers.

All of this work is available on the modesetting branch of the Intel driver when built against the randr-1.2-for-server-1.2 branch of the X server.

Posted Thu Nov 16 22:32:14 2006
Hacking 965 modesetting

What a pleasant weekend I’ve had; no nasty meetings or politics, just some good clean hacking fun.

I spent a few hours each day poking at the modesetting branch and getting it working on my shiny new 965-based desktop system. Eric had been working on the SDVO support and had gotten that working, so I figured I’d at least get the CRT output working, which seemed like an easy enough task.

The BIOS-based modesetting code was already working, so I knew the hardware worked correctly. But, our existing CRT modesetting code was producing a nice black screen. I love modesetting code—the most common error indication is just ‘sadness’; the monitor remains black and indicates that there is no signal present on the wire.

I poked around looking at what the BIOS did and how that differed from what the modesetting driver did and made a small bit of progress thanks to an accident. I left the video clock programmed for 1600x1200 and then asked the server to display a 640x480 mode. One would expect this would leave the video mode running far too fast. But, to my surprise, the monitor happily locked onto this and reported that it was running at 85Hz. Weird. A bit of math and I discovered that somehow the clock was getting divided by 4 somewhere. Sure enough, simply multiplying the real clock by 4 left me with stable modes across a wide range of sizes. Unfortunately, not including the high-resolution modes loved by our users; those were now out of the reach of the programmable clock. But, it made the question of where the problem was a bit clearer—something was wrong with the clock register programming.

The next accidental discovery was that in pure clone mode, with both CRT and DVI connected to the same pipe, I also managed to see a working mode for the lower resolutions (640x480 and 800x600). Higher resolutions still failed. It’s important to note that the DVI connector is reached through the SDVO port, which must run at high frequencies. Low frequency modes are padded with junk and clocked faster to keep the bus stable. For 640x480 and 800x600 modes, the clock is multiplied by 4.

While I had looked at the register results for the BIOS mode setting, I hadn’t seen it in action. Fortunately, action shots were available.

Dave Airlie hacked up Matthew Garrett’s vbetest program and left it on here. This fine piece of work executes the video bios and monitors all of the video device register accesses it performs. Watching it live allowed me to see precisely which registers it thought were related to clock timing. I noticed that it set the DPLLAMD register when programming a pure CRT mode, something which seemed a bit odd to me as that register has rather vague documentation about UDI and SDVO outputs. But, one small sentence did mention CRT multipliers of some sort, so I figured I might as well give it a try. Stealing the same setting that it used, the CRT now locked nicely using the normal un-multiplied clock frequency, and worked across the whole range of modes.

Thanks Dave, thanks Matthew.

The final adventure for the weekend was to discover why my screen image was getting corrupted when I used a large frame buffer. The effect was quite mystical: contents written to one location in memory would be duplicated to many locations on the screen. I thought it might be a FIFO size issue, but exploration with an application window and a window manager demonstrated that the corruption was not just on the screen, but actually visible to the GPU and CPU as well, and appeared to be caused by multiple GATT entries mapping to the same physical page. I eventually discovered that just skipping the first 256K of video memory and not using it fixed the problem. I haven’t looked into this in more detail, but it seems likely that this area is actually mapped to the GATT table itself, and using it for other things caused the symptoms observed above. For now, I’ve just made the driver skip over that amount of memory; it fixes the problem I had.

I also spent a bunch of time shrinking the driver to eliminate a bunch of redundant state. We’re planning on moving all of the initial frame buffer configuration to common RandR code shared across drivers, so I went ahead and disabled the Intel-specific code in the driver. Yes, this means that there’s no way to configure screen layout when you start the server, but you can use the RandR extension afterwards to make it do whatever you like. It’s temporary, eventually we’ll get the common code working. Probably the first time X has had this kind of state which can only be set through the protocol and not in the config file.

As Eric made a merge that broke things before he took off for the weekend (strong work, Eric), I’ve placed my work on a separate branch for merging this week sometime. Everything here is on the modesetting-keithp branch in the xf86-video-intel repository.

Posted Sun Nov 5 20:02:27 2006
Nokia 6131 Synchronization

While in Shanghai a few weeks ago, I picked up a Nokia 6131 telephone. Prices there are quite reasonable, and I was fed up with my Motorola Razr (the worst phone ever invented, as far as I can tell). Friends familiar with Nokia phones suggested I might prefer a Series 40 phone, which, while less feature-rich than the Symbian models, tends to run quite a bit faster. Shopping for telephones was quite easy; every model I’d ever heard of was available from multiple vendors. Someday maybe the US will rediscover the simple joy of providing what the customer wants.

In any case, once we had switched the 6131 from Chinese to English, I slipped my SIM card into place and was happily conversing and taking pictures with the new toy.

Of course, one of the big goals in moving from the Razr to the Series 40 was to get synchronization between my Evolution contacts and calendar and the applications on the telephone. Not since my Treo 600 had I been able to see my schedule and access my whole phone book from my cell phone.

The 6131 putatively supports SyncML, but attempts to get the opensync SyncML plugin working ended in failure after much gnashing of teeth. The plugin would load, but the synchronization process would just hang without transferring a single phone number. Sigh. Clearly this phone doesn’t quite follow the same interpretation of the SyncML standard as the opensync plugin.

Finally, I started looking around for alternatives when I discovered that the Gnokii folks had created a gnokii opensync plugin using their Nokia-specific backend. Shockingly, there was no Debian package, so I downloaded the latest source and built it. Surprisingly enough, it appears to work fine.

Well, almost fine. I’ve got ‘a few’ contacts in my address book, and the simplistic gnokii code for locating a free address book slot was reading every address book entry looking for a free spot. Oddly, it spent a lot of time re-reading address book entries. That was easy to fix at least.

Next, I discovered that the gnokii sync code wasn’t dealing with finite repeating events, events which repeat for a while and then stop. Every repeating event would go on forever. I use repeating events for conferences by setting them to repeat every day for the length of the conferences. I go to a few conferences each year, so with 10 years of conference history, I had several repeating events occurring ‘today’. I tried to make this work correctly; the phone appears to have a notion of ‘occurrences’, which I was guessing meant a count of repeat events. I didn’t manage to get this working, so I kludged around it and set these events to non-repeating, which at least removes them from view for the moment.

Of course, hacked versions of gnokii and gnokii-sync are available from my git repository.

Ah, life with a functioning telephone again. We’ll see how long this lasts.

Posted Tue Oct 31 00:54:45 2006
Tyrannical SCM selection

Ok, so one thing I haven’t blogged about is how the X.org SCM selection was made. Some of you may know the story, but it may surprise others to learn that there was no democratic process involved. Usually, X.org leans heavily on consensus or at least voting when making global choices about the direction of the project; I’ve been subjected to some of them and accept the consequences when things don’t go my way.

However, when selecting an SCM, I decided (already the tyrant) that it really couldn’t involve even a substantial minority of the project developers. Learning enough about the available SCMs takes a lot of time; I spent about a year looking at options and trying things out. During that time, I downloaded SCM source code, built repositories, converted bits of the X.org tree and looked at the results.

Finally, last January, I was fortunate to be at LCA along with key developers for Bzr, Mercurial and Git. Taking advantage of the situation, I sat down with each of them and talked about their system architecture and overall goals. After the week was over, it was clear to me that the right choice for X.org was Git. The chances of getting a dozen key X.org developers to spend that kind of time doing this research are slim; I happened to enjoy significant latitude in my activities between OLS and LCA in 2005 as I moved from HP to Intel.

Any choice which involves forcing dozens or hundreds of people to study abstruse details about a system in which they have no fundamental interest is doomed to failure: most of the group will just not bother, and will end up choosing essentially randomly, with a slight bias towards whatever is most familiar, on the assumption that familiar choices are less likely to be really bad.

Hence Subversion; it sounds safe because it’s advertised as ‘just like CVS, except fixed a bit’. After only a few months, it was clear that SVN would be a really poor fit for X.org, and X.org developers were already rumbling about moving to SVN as the obvious upgrade from CVS. Clearly democracy was going to fail here.

So, I took matters into my own hands and pre-emptively switched a significant, if fairly stable, piece of the X.org infrastructure from CVS to Git. The switch was perhaps not handled in the most politic fashion, and the result was a reasonably animated discussion. Then the discussion suddenly ended; people discovered that Git wasn’t that frightening and that I was reasonably serious about keeping at least the pieces I owned under Git control. Discussions about how to complete the migration from CVS to Git ensued shortly thereafter, producing a complete migration plan that spanned a few months.

Yes, the developers were forced to come up to speed on a new SCM if they wanted to contribute. But, making the switch without a lot of sturm und drang meant we could focus more resources on helping developers figure out how to use the new system and fixing the various problems uncovered by the migration. I’m happy to say that now the transition is complete and development is proceeding apace. And, we’ve even gained contributions from some users who were already used to Git (thanks, Greg K-H) but refused to use CVS.

Posted Sun Oct 22 20:44:23 2006
Repository Formats Matter

I’ve seen many posts recently about SCM user interfaces and how one system is easier to learn, is more powerful than another or better supports a particular development style. I submit that these arguments fail to capture the most salient feature of any source code management system: how the system manages the actual source code. This fundamental underpinning of the system, the repository structure, limits the kind of information the system can capture, determines the robustness and reliability of the data and, to a great extent, constrains the kinds of repository interactions possible.

A few days ago, Havoc made a push for Subversion as a reasonable choice for projects. His complaints focus on the Git user interface, while again making the mistaken assumption that Git forces users to engage in distributed development.

I agree with Havoc that few projects are large enough in scale to require the kind of hierarchy seen in the Linux kernel. In fact, most projects have fewer than 10 developers working on them, and with close coordination, rarely see the need for any branching and merging at all.

However, as far as I know, none of the SCMs that provide distributed development insist that developers hide their work on long-lived branches and send patches up to a master maintainer. The distributed SCMs all allow either centralized or distributed development; it all depends on the conventions used within a project and individual developer style.

At X.org, we migrated from CVS to Git and yet have retained our largely centralized development model. There are few people publishing alternate trees, and we grant direct repository access to the same set of developers who used to have CVS access.

For really bizarre stuff that is experimental in nature, we occasionally publish a temporary alternate repository as a way to distance work from the mainline further than a branch within the master repository would; we allow developers to publish such trees on a public server that is visible through the same web interface as the master repositories, so there remains a single central location to discover what work is going on within a given module.

Git provides us with three principal functional advantages:

  1. Offline repository access. Until you’ve used it, it’s hard to understand just how often one can commit changes to a repository if the operation takes mere seconds. Havoc himself likes to save editor state every few minutes; with Git, he would be free to commit that state to the repository without significant additional delay.

    The ability to make very fine grained changes to the code encourages people to separate work into small comprehensible pieces. Both proactive review and reactive debugging benefit substantially from this kind of detail, allowing people to highlight significant small changes which would otherwise be lost in large functionally-neutral restructuring.

    Offline repository access is not the same as distributed development; changes are still pushed to a single shared public repository and included in a single line of development. Of course, simultaneous offline development often results in conflicts, but we’ve had that with CVS forever, and Git provides better merge-resolution tools than CVS ever did.

  2. Private branches. For those of us with ultra-secret hardware plans, we develop drivers for unreleased hardware in parallel with the development of the public project. Git makes this supremely easy by allowing us to keep the ultra-secret new hardware changes in a private repository while still tracking the public repository. When we’re allowed to release the source code for the new hardware, we simply merge the private branch to the upstream master and push that to the public repository. All of the development history for the new hardware then becomes a part of the public source repository.

  3. Distributed backups. Even given freedesktop.org’s reasonably reliable RAID disk array and daily tape backups, it’s nice to know that around the world there are hundreds of people with complete backups of our source code repositories. If freedesktop.org is destroyed by earthquake, fire, flood or volcano, we can be confident that somewhere on the planet there will be complete and recent backups.

    Alternatively, if the freedesktop.org administration becomes evil and starts to manipulate source code to subvert users’ machines, the distributed nature of our system means that the external developers will detect such changes and can easily repair them.

That’s nice for us, but none of these may be compelling for people new to the distributed revision control world. Similarly, Git provides some nice tools to view and manage the repository (gitk, git-bisect, etc.); again, useful but not compelling.

I would like to argue that none of the user-interface and high-level functional details are nearly as important as the fundamental repository structure. When evaluating source code management systems, I primarily researched the repository structures and essentially ignored the user interface details. We can fix the user interface over time and even add features. We cannot, however, fix a broken repository structure without all of the pain inherent in changing systems.

Given this argument, it should be clear that I think git’s repository structure is better than others, at least for X.org’s usage model. It has several interesting properties:

  1. Files containing object data are never modified. Once written, every file is read-only from that point forward.

  2. Compression is done off-line and can be delayed until after the primary objects are saved to backup media. This method provides better compression than any incremental approach, allowing data to be re-ordered on disk to match usage patterns.

  3. Object data is inherently self-checking; you cannot modify an object in the repository and escape detection the first time the object is referenced.
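Properties 1 and 3 follow from Git naming every object by a hash of its contents. For a blob, the name is the SHA-1 of a header `blob <size>` plus a NUL byte, followed by the file contents, and loose objects are stored zlib-compressed. The toy store below is a simplified, in-memory sketch of that scheme; `ToyObjectStore` is a hypothetical class, not Git code. Objects are written once and never modified, and every read recomputes the hash, so a tampered object is caught the first time it is referenced.

```python
import hashlib
import zlib


def blob_id(data: bytes) -> str:
    """Git-style blob ID: SHA-1 over b'blob <size>\\0' followed by the content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a directory of loose objects."""

    def __init__(self) -> None:
        self._objects = {}  # object ID -> zlib-compressed header + content

    def write(self, data: bytes) -> str:
        oid = blob_id(data)
        # Write-once: an object that already exists is never rewritten.
        if oid not in self._objects:
            self._objects[oid] = zlib.compress(b"blob %d\x00" % len(data) + data)
        return oid

    def read(self, oid: str) -> bytes:
        raw = zlib.decompress(self._objects[oid])
        # Self-checking: the stored bytes must still hash to their own name.
        if hashlib.sha1(raw).hexdigest() != oid:
            raise ValueError("object %s is corrupt" % oid)
        _header, content = raw.split(b"\x00", 1)
        return content
```

As a sanity check, `blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same ID `git hash-object` reports for an empty file.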

Many people have complained about git’s off-line compression strategy, seeing the system’s inability to handle this automatically as a weakness. Admittedly, automatic is always nice, but in this case, the off-line process gains significant performance advantages (all objects, independent of original source file name, are grouped into a single compressed file), as well as reliability benefits (original objects can be backed-up before being removed from the server). From measurements made on a wide variety of repositories, git’s compression techniques are far and away the most successful in reducing the total size of the repository. The reduced size benefits both download times and overall repository performance, as fewer pages must be mapped to operate on objects within a Git repository than within any other repository structure.
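Grouping pays off because revisions of a file share most of their bytes. The sketch below illustrates only that effect, using plain zlib over a synthetic, hypothetical file history; it is not Git’s pack format, which stores objects as deltas against similar objects before compressing them.

```python
import zlib
from typing import List


def compressed_separately(objects: List[bytes]) -> int:
    """Total size when each object is compressed on its own, like loose objects."""
    return sum(len(zlib.compress(obj, 9)) for obj in objects)


def compressed_together(objects: List[bytes]) -> int:
    """Size when all objects are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(objects), 9))


def make_revisions(count: int) -> List[bytes]:
    """Synthetic history: revision n is a fixed base plus n appended comment lines."""
    base = b"".join(b"static int value_%d = %d;\n" % (i, i) for i in range(40))
    return [
        base + b"".join(b"/* change %d */\n" % k for k in range(n))
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    revisions = make_revisions(20)
    print("separately:", compressed_separately(revisions))
    print("together:  ", compressed_together(revisions))
```

Compressed together, each revision after the first costs little more than its new lines, because zlib can refer back to identical text in earlier revisions (within its 32 KB window); compressed separately, every revision pays for the shared text again.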

Subversion appears to me to have the worst repository structure of all; worse even than CVS. It supports multiple backends, with two available in open source and one (by Google) in closed source. The old Berkeley DB-based backend has been deprecated as unstable and subject to corruption, so we will ignore that as obviously unsuitable. The new FSFS backend uses simple file-based storage and is more reliable, if somewhat slower in some cases.

The FSFS backend places one file per revision in a single directory; a test import of Mozilla generated hundreds of thousands of files in this directory, causing performance to plummet as more revisions were imported. I’m not sure what each file contains, but it seems like revisions are written as deltas to an existing revision, making damage to one file propagate down through generations. Lack of strong error detection means such errors will go undetected by the repository. CVS used to suffer badly from this when NFS would randomly zero out blocks of files.

The Mozilla CVS repository was 2.7GB, imported to Subversion it grew to 8.2GB. Under Git, it shrunk to 450MB. Given that a Mozilla checkout is around 350MB, it’s fairly nice to have the whole project history (from 1998) in only slightly more space.

Mercurial uses a truncated forward delta scheme where file revisions are appended to the repository file as a string of deltas, with occasional complete copies of the file (to provide a time bound on operations). This suffers from two possible problems. The first is fairly obvious: corrupted writes of new revisions can affect old revisions of the file. The second is more subtle: system failure during commit will leave the file contents half written. Mercurial has recovery techniques to detect this, but they involve truncating existing files, a piece of the Linux kernel which has constantly suffered from race conditions and other adventures.
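The forward-delta scheme can be modeled in a few lines. This is a simplified sketch under stated assumptions: real Mercurial uses its own binary diff and decides when to store a full copy roughly by how large the delta chain has grown, while this toy uses a fixed interval and a single-hunk prefix/suffix delta. `ToyRevlog`, `make_delta` and `apply_delta` are hypothetical names. It also shows how a damaged entry spoils every revision reconstructed through it, up to the next full copy.

```python
from typing import List, Tuple, Union

Delta = Tuple[int, int, bytes]  # replace old[start:end] with the given bytes


def make_delta(old: bytes, new: bytes) -> Delta:
    """Single-hunk delta: keep the common prefix and suffix, replace the middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in either string.
    while suffix < limit - prefix and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]:
        suffix += 1
    return (prefix, len(old) - suffix, new[prefix:len(new) - suffix])


def apply_delta(old: bytes, delta: Delta) -> bytes:
    start, end, replacement = delta
    return old[:start] + replacement + old[end:]


class ToyRevlog:
    """Append-only entries: a full text every `snapshot_every` revisions,
    forward deltas against the previous revision in between."""

    def __init__(self, snapshot_every: int = 4) -> None:
        self.snapshot_every = snapshot_every
        self.entries: List[Tuple[str, Union[bytes, Delta]]] = []

    def append(self, text: bytes) -> int:
        rev = len(self.entries)
        if rev % self.snapshot_every == 0:
            self.entries.append(("full", text))
        else:
            self.entries.append(("delta", make_delta(self.revision(rev - 1), text)))
        return rev

    def revision(self, rev: int) -> bytes:
        # Walk back to the nearest full copy, then replay deltas forward;
        # at most snapshot_every - 1 deltas are ever applied.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        text = self.entries[base][1]
        for i in range(base + 1, rev + 1):
            text = apply_delta(text, self.entries[i][1])
        return text
```

Overwriting the first full copy in such a log changes revisions 1 through 3 on reconstruction, while revision 4, stored as a fresh full copy, comes back intact.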

I was looking seriously at Mercurial for X.org development, and was fortunate to spend a week last January with key developers from both Mercurial and Git. Discussions with both groups led me to understand that Git provided more of what X.org needed in terms of repository flexibility and stability than Mercurial did. The key detractor for Git was (and remains) the steep learning curve for the native Git interface, ameliorated for some users by alternate interfaces (such as Cogito), but not for core developers.

The other killer Git feature is speed. We’ve all gotten very spoiled by Git; many operations which take minutes under CVS now complete fast enough to leave you wondering if anything happened at all. This alone should be enough to convince anyone leaning towards Subversion or Bzr; fine-grained commits are only reasonable if the commit operation takes almost no time.

We were not particularly interested in the kind of massive distributed development model seen in the kernel, but the ability to work off-line (some of us spend an inordinate amount of time on airplanes) and still provide fine-grained detail about our work makes a purely central model less than ideal. Plus, the powerful merge operations that Git provides for the kernel developers are still useful in our environment, if not as heavily exercised.

I know Git suffers from its association with the wild and wooly kernel developers, but they’ve pushed this tool to the limits and it continues to shine. Right now, there’s nothing even close in performance, reliability and functionality. Yes, the user interface continues to need improvements. Small incremental changes have been made which make the tools more consistent, and I hope to see those discussions continue. Mostly, the developers respond to cogent requests (with code) from the user community; if you find the UI intolerable, fix it. But, know that while the UI improves, the underlying repository remains fast, stable and reliable.

And yes, Havoc, anyone seriously entertaining moving to SVN should have their heads examined.

Posted Sun Oct 22 19:58:44 2006
Happiness at Gnome Summit

Ah, the always enjoyable redeye from SFO; of course, filled with people who fly way more than I, so I got stuck back in coach. I’m used to it, and slept fitfully for the whole flight.

Then off to the Kendall hotel for a shower and breakfast. Tea. Lots of tea.

Jdub and Blizzard entertained the gathered masses for a few minutes, of course with jdub’s signature white-text-on-black-background presentation style. Someday he will change. We hope. It’s affected jdub, and we’re tired of Larry’s talks too.

Except for several complaints about the lack of coffee, it appears that the gnome summit is functioning normally. Good work guys. The best part is that the networking ‘just worked’; thanks MIT. One wishes Guadec would be so well piped.

Posted Sat Oct 7 08:42:35 2006
Jeremy - XFree86VidMode will soon be obsolete

Jeremy writes that he’s having trouble generating XCB descriptions for the XFree86VidMode extension. I believe we have a fine solution: “don’t bother”. Before applications get around to using XCB and XFree86VidMode together, I’m hoping we’ll have the hotpluggy sweetness version of RandR widely available instead, which should supplant this apparently difficult-to-XCB-ify extension.

In other RandR news today, Eric has started restructuring the Intel driver to expose the full capabilities of the hardware and make it possible for RandR++ to achieve its full potential. I’ll be demonstrating whatever we’ve got working this weekend in Boston at the Gnome Summit, so if you’re coming, prepare to be stunned by our fabulous new CLI-based UI for monitor reconfiguration.

Posted Wed Oct 4 23:47:33 2006
