RSS Add a new post titled:

Chromium (the browser) and DRI3

I got a note on IRC a week ago that Chromium was crashing with DRI3.

The Google team working on Chromium eventually sent me a link to the bug report. That's secret Google stuff, so you won't be able to follow the link, even though it's a bug in a free software application when running on free software drivers.

There's a bug report in the freedesktop bugzilla which looks the same to me.

In both cases, the recommended “fix” was to switch from DRI3 back to DRI2. That's not exactly a great plan, given that DRI3 offers better security between GPU-using applications, which seems like a pretty nice thing to have when you're running random GL applications from the web.

Chromium Sandboxing

I'm not entirely sure how it works, but Chromium creates a process separate from the main browser engine to talk to the GPU. That process has very limited access to the operating system via some fancy library adventures. Presumably, the hope is that security bugs in the GL driver would be harder to leverage into a remote system exploit.

Debugging in this environment is a bit tricky as you can't simply run chromium under gdb and expect to be able to set breakpoints in the GL driver. Instead, you have to run chromium with a magic flag which causes the GPU process to pause before loading the driver so you can connect to it with gdb and debug from there, along with a flag that lets you see crashes within the gpu process and the usual flag that causes chromium to ignore the GPU black list which seems to always include the Intel driver for one reason or another:

$ chromium --gpu-startup-dialog --disable-gpu-watchdog --ignore-gpu-blacklist

Once Chromium starts up, it will print out a message telling you to attach gdb to the GPU process and send that process a SIGUSR1 to continue it. Now you can happily debug and get a stack trace when the crash occurs.

Locating the Bug

The bug manifested with a segfault at the first access to a DRI3-allocated buffer within the application. We've seen this problem in the past; whenever buffer allocation fails for some reason, the driver ignores the problem and attempts to de-reference through the (NULL) buffer pointer, causing a segfault. In this case, Chromium called glClear, which tried (and failed) to allocate a back buffer causing the i965 driver to subsequently segfault.

We should probably go fix the i965 driver to not segfault when buffer allocation fails, but that wouldn't provide a lot of additional information. What I have done is add some error messages in the DRI3 buffer allocation path which at least tell you why the buffer allocation failed. That patch has been merged to Mesa master, and should also get merged to the Mesa stable branch for the next stable release.

Once I had added the error messages, it was pretty easy to see what happened:

$ chromium --ignore-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted

The first two errors were just the sandbox preventing Mesa from using my GL configuration file. I'm not sure how that's a security problem, but it shouldn't harm the driver much.

The last error is where the problem lies. In Mesa, the DRI3 implementation uses a chunk of shared memory to hold a fence object that lets Mesa know when buffers are idle without using the X connection. That shared memory segment is allocated by creating a temporary file using the O_TMPFILE flag:

fd = open("/dev/shm", O_TMPFILE|O_RDWR|O_CLOEXEC|O_EXCL, 0666);

This call “cannot fail” as /dev/shm is used by glibc for shared memory objects, and must therefore be world writable on any glibc system. However, with the Chromium sandbox enabled, it returns EPERM.

Running Without a Sandbox

Now that the bug appears to be in the sandboxing code, we can re-test with the GPU sandbox disabled:

$ chromium --ignore-gpu-blacklist --disable-gpu-sandbox

And, indeed, without the sandbox getting in the way of allocating a shared memory segment, Chromium appears happy to use the Intel driver with DRI3.

Final Thoughts

I looked briefly at the Chromium sandbox code. It looks like it needs to know intimate details of the OpenGL implementation for every possible driver it runs on; it seems to contain a fixed list of all possible files and modes that the driver will pass to open(2). That seems incredibly fragile to me, especially when used in a general Linux desktop environment. Minor changes in how the GL driver operates can easily cause the browser to stop working.

Posted Tue Sep 30 23:51:25 2014 Tags:

Neil Anderson Flies EasyMega to 118k' At BALLS 23

Altus Metrum would like to congratulate Neil Anderson and Steve Cutonilli on the success the two stage rocket, “A Money Pit”, which flew on Saturday the 20th of September on an N5800 booster followed by an N1560 sustainer.

“A Money Pit” used two Altus Metrum EasyMega flight computers in the sustainer, each one configured to light the sustainer motor and deploy the drogue and main parachutes.

Safely Staged After a 7 Second Coast

After the booster burned out, the rocket coasted for 7 seconds to 250m/s, at which point EasyMega was programmed to light the sustainer. As a back-up, a timer was set to light the sustainer 8 seconds after the booster burn-out. In both cases, the sustainer ignition would have been inhibited if the rocket had tilted more than 20° from vertical. During the coast, the rocket flew from 736m to 3151m, with speed going from 422m/s down to 250m/s.

This long coast, made safe by EasyMega's quaternion-based tilt sensor, allowed this flight to reach a spectacular altitude.

Apogee Determined by Accelerometer

Above 100k', the MS5607 barometric sensor is out of range. However, as you can see from the graph, the barometric sensor continued to return useful data. EasyMega doesn't expect that to work, and automatically switched to accelerometer-only apogee determination mode.

Because off-vertical flight will under-estimate the time to apogee when using only an accelerometer, the EasyMega boards were programmed to wait for 10 seconds after apogee before deploying the drogue parachute. That turned out to be just about right; the graph shows the barometric data leveling off right as the apogee charges fired.

Fast Descent in Thin Air

Even with the drogue safely fired at apogee, the descent rate rose to over 200m/s in the rarefied air of the upper atmosphere. With increasing air density, the airframe slowed to 30m/s when the main parachute charge fired at 2000m. The larger main chute slowed the descent further to about 16m/s for landing.

Posted Mon Sep 22 21:33:01 2014 Tags:

A Forest of X Server Changes

We've got about another month left in the X server merge window for 1.17 and I've written a small set of fixes which haven't been reviewed yet for merging. I thought I'd advertise them a bit and see if I couldn't encourage a few of you to take a look and see if they're useful, correct and complete.

All of these are in my personal X server repository:

git://people.freedesktop.org/~keithp/xserver.git

Cleaning up the X Registry

Branch: registry-fixes

I'll bet most of you don't even know about this code. It serves as a database mapping various X enumerations to strings to aid in diagnostics. For the security extensions, SECURITY and XSELinux, it holds names for all of the request, event and errors in the core protocol and all registered extensions. For X-Resource, it has the names of the registered resource types.

The X registry gets the request, event and error data from a file, "protocol.txt", which is installed in /usr/lib/xorg/protocol.txt on my machine. It gets the resource names as a part of resource type allocation.

So, what's wrong with this? Three basic things:

  1. A simple bug -- protocol.txt is left open while the server runs. This consumes a file descriptor for no good reason.

  2. protocol.txt is read and parsed even if the security extensions aren't available. This wastes time and memory.

  3. The resource names are kept even if X-Resource isn't in use.

The fixes remove the configure options for including the registry code; these functions are only used by the above extensions, so we can tell whether to include the code based solely on whether the extensions are being built.

Getting rid of the TCP listener by default

Branch: listen-fixes

We've had the '-nolisten' option for a while now to disable inbound TCP connections. It's useful for security reasons, but we've never enabled this by default. This patch sequence provides configure options for each of the listen sockets (tcp, unix and local), leaves unix and local enabled by default and disables tcp by default.

A new option, '-listen', is added which allows the user to override the -nolisten defaults in case they actually want to use TCP connections to X.

Glamor bug fixes

branch: glamor-fixes

This branch fixes two bugs:

  1. Scale a large pixmap down to a small pixmap. This happens when you display enormous images in a web page. Iceweasel sends the whole huge image to X and uses Render to scale it to the screen. If the image is larger than a single texture, the X server splits it up into tiles, but the code which tries to perform the merged scale is just broken. Five patches fix this.

  2. Shader-based trapezoids. This code uses area coverage to compute trapezoids. That violates the Render spec, which requires point sampling. Further, the performance of these trapezoids is lower than software (by a lot). This one patch removes the code.

Present bug fixes

branch: present-fixes

A selection of small bug fixes:

  1. Clear pending flips at CloseScreen. This removes a reference to any pending flip pixmap, allowing it to be freed. Otherwise, we'll leak memory across server reset.

  2. Add support for PresentOptionCopy. This has been in the protocol spec for a while, and was completely trivial to implement. However, it never got done. One tiny little patch.

  3. Expose the Present API to drivers via sdksyms.sh. Until now, the present extension APIs have only been available inside the X server. This exposes them to drivers. This took a few cleanup patches first.

Use Present for Glamor XV

branch: glamor-present-xv

Painting XV to the screen should be done at vblank time to avoid tearing. Present offers vblank synchronized operations. Hooking those two together required a few new present APIs to expose the vblank functionality outside of the present code, then a bit of glamor code to hook up that new API to the XV bits.

Switching Glamor to a GL core profile context

branch: glamor-core-profile

This patch set is still in progress, but demonstrates how close we are. We'll be requiring OpenGL 3.3 for this so that we get texture swizzling, which is required for our single channel objects.

The changes present on the branch are:

  1. Switch single channel surfaces from GL_ALPHA to GL_RED.

  2. Use vertex array objects.

  3. Switch ephyr over to using a core 3.3 profile.

Still left to do is

  1. Switch Render code to VBOs

The core code uses VBOs everywhere, but the Render code doesn't. This means that all Render drawing fails, which makes the resulting server not very useful.

My main objective for getting this done is to reduce memory usage by about 16MB, which is the space allocated for software rendering in Mesa in case someone does something which the hardware doesn't handle, and that can only with some legacy OpenGL APIs.

Please help out!

All of these friendly little patches are looking for a bit of review so that they can get merged before the 1.17 window closes.

Posted Mon Sep 15 10:14:16 2014 Tags:

AltOS 1.5 — EasyMega support, features and bug fixes

Bdale and I are pleased to announce the release of AltOS version 1.5.

AltOS is the core of the software for all of the Altus Metrum products. It consists of firmware for our cc1111, STM32L151, LPC11U14 and ATtiny85 based electronics and Java-based ground station software.

This is a major release of AltOS, including support for our new EasyMega board and a host of new features and bug fixes

AltOS Firmware — EasyMega added, new features and fixes

Our new flight computer, EasyMega, is a TeleMega without any radios:

  • 9 DoF IMU (3 axis accelerometer, 3 axis gyroscope, 3 axis compass).

  • Orientation tracking using the gyroscopes (and quaternions, which are lots of fun!)

  • Four fully-programmable pyro channels, in addition to the usual apogee and main channels.

AltOS Changes

We've made a few improvements in the firmware:

  • The APRS secondary station identifier (SSID) is now configurable by the user. By default, it is set to the last digit of the serial number.

  • Continuity of the four programmable pyro channels on EasyMega and TeleMega is now indicated via the beeper. Four tones are sent out after the continuity indication for the apogee and main channels with high tones indicating continuity and low tones indicating an open circuit.

  • Configurable telemetry data rates. You can now select among 38400 (the previous value, and still the default), 9600 or 2400 bps. To take advantage of this, you'll need to reflash your TeleDongle or TeleBT.

AltOS Bug Fixes

We also fixed a few bugs in the firmware:

  • TeleGPS had separate flight logs, one for each time the unit was turned on. Turning the unit on to test stuff and turning it back off would consume one of the flight log 'slots' on the board; once all of the slots were full, no further logging would take place. Now, TeleGPS appends new data to an existing single log.

  • Increase the maximum computed altitude from 32767m to 2147483647m. Back when TeleMetrum v1.0 was designed, we never dreamed we'd be flying to 100k' or more. Now that's surprisingly common, and so we've increased the size of the altitude data values to fit modern rocketry needs.

  • Continuously evaluate pyro firing condition during delay period. The previous firmware would evaluate the pyro firing requirements, and once met, would delay by the indicated amount and then fire the channel. If the conditions had changed state, the channel would still fire. Now, the conditions are continuously evaluated during the delay period and if they change state, the event is suppressed.

  • Allow negative values in the pyro configuration. Now you can select a negative speed to indicate a descent rate or a negative acceleration value to indicate acceleration towards the ground.

AltosUI and TeleGPS — EasyMega support, OS integration and more

The AltosUI and TeleGPS applications have a few changes for this release:

  • EasyMega support. That was a simple matter of adapting the existing TeleMega support.

  • Added icons for our file types, and hooked up the file manager so that AltosUI, TeleGPS and/or MicroPeak are used to view any of our data files.

  • Configuration support for APRS SSIDs, and telemetry data rates.

Posted Sat Sep 13 11:47:42 2014 Tags:

Reworking Intel Glamor

The original Intel driver Glamor support was based on the notion that it would be better to have the Intel driver capture any fall backs and try to make them faster than Glamor could do internally. Now that Glamor has reasonably complete acceleration, and its fall backs aren't terrible, this isn't as useful as it once was, and because this uses Glamor in a weird way, we're making the Glamor code harder to maintain.

Fixing the Intel driver to not use Glamor in this way took a bit of effort; the UXA support is all tied into the overall operation of the driver.

Separating out UXA functions

The first task was to just identify which functions were UXA-specific by adding "_uxa" to their names. A couple dozen sed runs and now a bunch of the driver is looking better.

Next, a pile of UXA-specific functions were actually inside the non-UXA parts of the code. Those got moved out, and a new 'intel_uxa.h" file was created to hold all of the definitions.

Finally, a few non UXA-specific functions were actually in the uxa files; those got moved over to the generic code.

Removing the Glamor paths in UXA

Each one of the UXA functions had a little piece of code at the top like:

if (uxa_screen->info->flags & UXA_USE_GLAMOR) {
    int ok = 0;

    if (uxa_prepare_access(pDrawable, UXA_GLAMOR_ACCESS_RW)) {
        ok = glamor_fill_spans_nf(pDrawable,
                      pGC, n, ppt, pwidth, fSorted);
        uxa_finish_access(pDrawable, UXA_GLAMOR_ACCESS_RW);
    }

    if (!ok)
        goto fallback;

    return;
}

Pulling those out shrank the UXA code by quite a bit.

Selecting Acceleration (or not)

The intel driver only supported UXA before; Glamor was really just a slightly different mode for UXA. I switched the driver from using a bit in the UXA flags to having an 'accel' variable which could be one of three options:

  • ACCEL_GLAMOR.
  • ACCEL_UXA.
  • ACCEL_NONE

I added ACCEL_NONE to give us a dumb frame buffer mode. That actually supports DRI3 so that we can bring up Mesa and run it under X before we have any acceleration code ready; avoiding a dependency loop when doing new hardware. All that it requires is a kernel that offers mode setting and buffer allocation.

Initializing Glamor

With UXA no longer supporting Glamor, it was time to plug the Glamor support into the top of the driver. That meant changing a bunch of the entry points to select appropriate Glamor or UXA functionality, instead of just calling into UXA. So, now we've got lots of places that look like:

        switch (intel->accel) {
#if USE_GLAMOR
        case ACCEL_GLAMOR:
                if (!intel_glamor_create_screen_resources(screen))
                        return FALSE;
                break;
#endif
#if USE_UXA
        case ACCEL_UXA:
                if (!intel_uxa_create_screen_resources(screen))
                        return FALSE;
        break;
#endif
        case ACCEL_NONE:
                if (!intel_none_create_screen_resources(screen))
                        return FALSE;
                break;
        }

Using a switch means that we can easily elide code that isn't wanted in a particular build. Of course 'accel' is an enum, so places which are missing one of the required paths will cause a compiler warning.

It's not all perfectly clean yet; there are piles of UXA-only paths still.

Making It Build Without UXA

The final trick was to make the driver build without UXA turned on; that took several iterations before I had the symbols sorted out appropriately.

I built the driver with various acceleration options and then tried to count the lines of source code. What I did was just list the source files named in the driver binary itself. This skips all of the header files and the render program source code, and ignores the fact that there are a bunch of #ifdef's in the uxa directory selecting between uxa, glamor and none.

    Accel                    Lines          Size(B)
    -----------             ------          -------
    none                      7143            73039
    glamor                    7397            76540
    uxa                      25979           283777
    sna                     118832          1303904

    none legacy              14449           152480
    glamor legacy            14703           156125
    uxa legacy               33285           350685
    sna legacy              126138          1395231

The 'legacy' addition supports i810-class hardware, which is needed for a complete driver.

Along The Way, Enable Tiling for the Front Buffer

While hacking the code, I discovered that the initial frame buffer allocated for the screen was created without tiling (!) because a few parameters that depend on the GTT size were not initialized until after that frame buffer was allocated. I haven't analyzed what effect this has on performance.

Page Flipping and Resize

Page flipping (or just flipping) means switching the entire display from one frame buffer to another. It's generally the fastest way of updating the screen as you don't have to copy any bits.

The trick with flipping is that a client hands you a random pixmap and you need to stuff that into the KMS API. With UXA, that's pretty easy as all pixmaps are managed through the UXA API which knows which underlying kernel BO is tied with each pixmap. Using Glamor, only the underlying GL driver knows the mapping. Fortunately (?), we have the EGL Image extension, which lets us take a random GL texture and turn it into a file descriptor for a DMA-BUF kernel object. So, we have this cute little dance:

fd = glamor_fd_from_pixmap(screen,
                               pixmap,
                               &stride,
                               &size);


bo = drm_intel_bo_gem_create_from_prime(intel->bufmgr, fd, size);
    close(fd);
    intel_glamor_get_pixmap(pixmap)->bo = bo;

That last bit remembers the bo in some local memory so we don't have to do this more than once for each pixmap. glamor_fd_from_pixmap ends up calling eglCreateImageKHR followed by gbm_bo_import and then a kernel ioctl to convert a prime handle into an fd. It's all quite round-about, but it does seem to work just fine.

After I'd gotten Glamor mostly working, I tried a few OpenGL applications and discovered flipping wasn't working. That turned out to have an unexpected consequence -- all full-screen applications would run flat-out, and not be limited to frame rate. Present 'recovers' from a failed flip queue operation by immediately performing a CopyArea; not waiting for vblank. This needs to get fixed in Present by having it re-queued the CopyArea for the right time. What I did in the intel driver was to add a bunch more checks for tiling mode, pixmap stride and other things to catch pixmaps that were going to fail before the operation was queued and forcing them to fall back to CopyArea at the right time.

The second adventure was with XRandR. Glamor has an API to fix up the screen pixmap for a new frame buffer, but that pulls the size of the frame buffer out of the pixmap instead of out of the screen. XRandR leaves the pixmap size set to the old screen size during this call; fixing that just meant getting the pixmap size set correctly before calling into glamor. I think glamor should get fixed to use the screen size rather than the pixmap size.

Painting Root before Mode set

The X server has generally done initialization in one order:

  1. Create root pixmap
  2. Set video modes
  3. Paint root window

Recently, we've added a '-background none' option to the X server which causes it to set the root window background to none and have the driver fill in that pixmap with whatever contents were on the screen before the X server started.

In a pre-Glamor world, that was done by hacking the video driver to copy the frame buffer console contents to the root pixmap as it was created. The trouble here is that the root pixmap is created long before the upper layers of the X server are ready for drawing, so you can't use the core rendering paths. Instead, UXA had kludges to call directly into the acceleration functions.

What we really want though is to change the order of operations:

  1. Create root pixmap
  2. Paint root window
  3. Set video mode

That way, the normal root window painting operation will take care of getting the image ready before that pixmap is ever used for scanout. I can use regular core X rendering to get the original frame buffer contents into the root window, and even if we're not using -background none and are instead painting the root with some other pattern (like the root weave), I get that presented without an intervening black flash.

That turned out to be really easy -- just delay the call to I830EnterVT (which sets the modes) until the server is actually running. That required one additional kludge -- I needed to tell the DIX level RandR functions about the new modes; the mode setting operation used during server init doesn't call up into RandR as RandR lists the current configuration after the screen has been initialized, which is when the modes used to be set.

Calling xf86RandR12CreateScreenResources does the trick nicely. Getting the root window bits from fbcon, setting video modes and updating the RandR/Xinerama DIX info is now all done from the BlockHandler the first time it is called.

Performance

I ran the current glamor version of the intel driver with the master branch of the X server and there were not any huge differences since my last Glamor performance evaluation aside from GetImage. The reason is that UXA/Glamor never called Glamor's image functions, and the UXA GetImage is pretty slow. Using Mesa's image download turns out to have a huge performance benefit:

1. UXA/Glamor from April
2. Glamor from today

       1                 2                 Operation
------------   -------------------------   -------------------------
     50700.0        56300.0 (     1.110)   ShmGetImage 10x10 square 
     12600.0        26200.0 (     2.079)   ShmGetImage 100x100 square 
      1840.0         4250.0 (     2.310)   ShmGetImage 500x500 square 
      3290.0          202.0 (     0.061)   ShmGetImage XY 10x10 square 
        36.5          170.0 (     4.658)   ShmGetImage XY 100x100 square 
         1.5           56.4 (    37.600)   ShmGetImage XY 500x500 square 
     49800.0        50200.0 (     1.008)   GetImage 10x10 square 
      5690.0        19300.0 (     3.392)   GetImage 100x100 square 
       609.0         1360.0 (     2.233)   GetImage 500x500 square 
      3100.0          206.0 (     0.066)   GetImage XY 10x10 square 
        36.4          183.0 (     5.027)   GetImage XY 100x100 square 
         1.5           55.4 (    36.933)   GetImage XY 500x500 square 

Running UXA from today the situation is even more dire; I suspect that enabling tiling has made CPU reads through the GTT even worse than before?

1: UXA today
2: Glamor today

       1                 2                 Operation
------------   -------------------------   -------------------------
     43200.0        56300.0 (     1.303)   ShmGetImage 10x10 square 
      2600.0        26200.0 (    10.077)   ShmGetImage 100x100 square 
       130.0         4250.0 (    32.692)   ShmGetImage 500x500 square 
      3260.0          202.0 (     0.062)   ShmGetImage XY 10x10 square 
        36.7          170.0 (     4.632)   ShmGetImage XY 100x100 square 
         1.5           56.4 (    37.600)   ShmGetImage XY 500x500 square 
     41700.0        50200.0 (     1.204)   GetImage 10x10 square 
      2520.0        19300.0 (     7.659)   GetImage 100x100 square 
       125.0         1360.0 (    10.880)   GetImage 500x500 square 
      3150.0          206.0 (     0.065)   GetImage XY 10x10 square 
        36.1          183.0 (     5.069)   GetImage XY 100x100 square 
         1.5           55.4 (    36.933)   GetImage XY 500x500 square 

Of course, this is all just x11perf, which doesn't represent real applications at all well. However, there are applications which end up doing more GetImage than would seem reasonable, and it's nice to have this kind of speed up.

Status

I'm running this on my crash box to get some performance numbers and continue testing it. I'll switch my desktop over when I feel a bit more comfortable with how it's working. But, I think it's feature complete at this point.

Where's the Code

As usual, the code is in my personal repository. It's on the 'glamor' branch.

git://people.freedesktop.org/~keithp/xf86-video-intel  glamor
Posted Mon Jul 21 00:39:04 2014 Tags:

AltOS 1.4.1 — Fix ups for 1.4

Bdale and I are pleased to announce the release of AltOS version 1.4.1.

AltOS is the core of the software for all of the Altus Metrum products. It consists of firmware for our cc1111, STM32L151, LPC11U14 and ATtiny85 based electronics and Java-based ground station software.

This is a minor release of AltOS, incorporating a small handful of build and install issues. No new features have been added, and the only firmware change was to make sure that updated TeleMetrum v2.0 firmware is included in this release.

AltOS — TeleMetrum v2.0 firmware included

AltOS version 1.4 shipped without updated firmware for TeleMetrum v2.0. There are a couple of useful new features and bug fixes in that version, so if you have a TeleMetrum v2.0 board with older firmware, you should download this release and update it.

AltosUI and TeleGPS — Signed Windows Drivers, faster maps downloading

We finally figured out how to get our Windows drivers signed making it easier for Windows 7 and 8 users to install our software and use our devices.

Also for Windows users, we've fixed the Java version detection so that if you have Java 8 already installed, AltOS and TeleGPS won't try to download Java 7 and install that. We also fixed the Java download path so that if you have no Java installed, we'll download a working version of Java 6 instead of using an invalid Java 7 download URL.

Finally, for everyone, we fixed maps downloading to use the authorized Google API key method for getting map tiles. This makes map downloading faster and more reliable.

Thanks for flying with Altus Metrum!

Posted Tue Jun 24 22:35:08 2014 Tags:

TeleGPS Battery Life

I charged up one of the "160mAh" batteries that we sell. (The ones we've got now are labeled 200mAh; the 160mAh rating is something like a minimum that we expect to be able to ever get at that size.)

I connected the battery to a TeleGPS board, hooked up a telemetry monitoring setup on my laptop and set the device in the window of my office. This let me watch the battery voltage through the day without interrupting my other work. Of course, because the telemetry was logged to a file, I've now got a complete plot of the voltage data:

It looks like a pretty typical lithium polymer discharge graph; slightly faster drop from the 4.1V full charge voltage down to about 3.9V, then a gradual drop to 3.65 at which point it starts to dive as the battery is nearly discharged.

Because we run the electronics at 3.3V, and the LDO has a dropout of about 100mV, it's best if the battery stays above 3.4V. That occurred at around 21500 seconds of run time, or almost exactly six hours.

We also have an "850mAh" battery in the shop; I'd expect that to last a bit more than four times as long, or about a day. Maybe I'll get bored enough at some point to hook one up and verify that guess.

Posted Tue Jun 17 19:23:49 2014 Tags:

AltOS 1.4 — TeleGPS support, features and bug fixes

Bdale and I are pleased to announce the release of AltOS version 1.4.

AltOS is the core of the software for all of the Altus Metrum products. It consists of firmware for our cc1111, STM32L151, LPC11U14 and ATtiny85 based electronics and Java-based ground station software.

This is a major release of AltOS, including support for our new TeleGPS board and a host of new features and bug fixes

AltOS Firmware — TeleGPS added, new features and fixes

Our new tracker, TeleGPS, works quite differently than a flight computer

  • Starts tracking and logging at power-on

  • Disables RF and logging only when connected to USB

  • Doesn't log position when it isn't moving for a long time.

TeleGPS transmits our digital telemetry protocol, APRS and radio direction finding beacons.

For TeleMega, we've made the firing time for the additional pyro channels (A-D) configurable, in case the default (50ms) isn't long enough.

AltOS Beeping Changes

The three-beep startup tones have been replaced with a report of the current battery voltage. This is nice on all of the board, but particularly useful with EasyMini which doesn't have the benefit of telemetry reporting its state.

We also changed the other state tones to "Farnsworth" spacing. This makes them all faster, and easier to distinguish from the numeric reports of voltage and altitude.

Finally, we've added the ability to change the frequency of the beeper tones. This is nice when you have two Altus Metrum flight computers in the same ebay and want to be able to tell the beeps apart.

AltOS Bug Fixes

Fixed a bug which prevented you from using TeleMega's extra pyro channel 'Flight State After' configuration value.

AltOS 1.3.2 on TeleMetrum v2.0 and TeleMega would reset the flight number to 2 after erasing flights; that's been fixed.

AltosUI — New Maps, igniter tab and a few fixes

With TeleGPS tracks now potentially ranging over a much wider area than a typical rocket flight, the Maps interface has been updated to include zooming and multiple map styles. It also now uses less memory, which should make it work on a wider range of systems.

For TeleMega, we've added an 'Igniter' tab to the flight monitor interface so you can check voltages on the extra pyro channels before pushing the button.

We're hoping that the new Maps interface will load and run on machines with limited memory for Java applications; please let us know if this changes anything for you.

TeleGPS — All new application just for TeleGPS

While TeleGPS shares the same telemetry and data logging capabilities as all of the Altus Metrum flight computers, its use as a tracker is expected to be both broader and simpler than the rocketry-specific systems. We've build a custom TeleGPS application that incorporates the mapping and data visualization aspects of AltosUI, but eliminates all of the rocketry-specific flight state tracking.

Posted Sun Jun 15 19:48:15 2014 Tags:

Current Glamor Performance

I finally managed to get a Gigabyte Brix set up running Debian so that I could do some more reasonable performance characterization of Glamor in its current state. I wanted to use this particular machine because it has enough cooling to keep from thermally throttling the CPU/GPU package.

This is running my glamor-server branch of the X server, which completes the core operation rework and then has some core X server performance improvements as well for filled and outlined arcs.

Changes in X11perf

First off, I did some analysis of what x11perf was doing and found that it wasn't quite measuring what we thought. I'm not interested in competing on x11perf numbers absolutely, I'm only interested in doing relative measurements of useful operations, so fixing the tool to measure what I want seems reasonable to me.

When x11perf was first written, it drew 100x100 rectangles tight against one another without any gap. And, it filled the window with them, drawing a 6x6 grid of 100x100 rectangles in a 600x600 window. To better exercise the rectangle code and check edge conditions better, we added a one pixel gap between the rectangles. However, we didn't reduce the number of rectangles drawn, so we ended up drawing 11 of the 36 rectangles on top of the first set of 25. Simple region computations would allow the X server to draw only 25 most of the time, skipping the redundant rectangles.

The vertical and horizontal line tests were added a while after the first set of tests, and were done without regard to how an X server might optimize for them. x11perf draws these lines packed tightly together, creating a single square of pixels for the result.

EXA, UXA and SNA all take vertical and horizontal lines and convert them to rectangles, then take the rectangles and clip them against the window clip list by computing a region from them and intersecting that with the GC composite clip. It's a completely reasonable plan, however, when you take what x11perf was drawing and run it through this code, you end up with a single solid rectangle. Which is surprisingly fast, compared with drawing individual lines.

I "fixed" the overlapping rectangle case by reducing the number of boxes drawn from 36 to 25, and I fixed the vertical and horizontal line case by spacing the lines a pixel apart.

I've pushed out these changes to my x11perf repository on freedesktop.org.

What's Fast

Things that match GL's capabilities are fast, things which don't are slow. No surprises there. What's interesting is precisely what matches GL

Patterns For Free

Because GL makes it easy to program fill patterns into the GPU, there are essentially no performance differences between solid and patterned operations.

GL Lines

Glamor uses GL lines, which can be programmed to match X semantics, to quite good effect. The only trick required was to deal with cap styles. GL never draws the final pixel in a line, while X does unless the cap style is CapNotLast. The solution was to draw an extra line segment containing a single pixel at the end of every joined set of lines for this case.

The other implicit requirement is that all zero width lines look the same. Right now, I've solved that for fill styles and raster ops as they're all drawn with the same GL operations. However, for plane masks, we're currently falling back to software, which may draw something different. Fixing that isn't impossible, it's just tedious.

Text

Pushing all of the work of drawing core text into glamor wasn't terribly difficult, and the results are pretty spectacular.

What's Slow

We've still got room for improvement in Glamor, but there aren't any obvious show-stoppers to getting great performance for reasonable X applications anymore.

Wide Lines and Arcs

One of the speed-ups I've made in my glamor branch is to merge all of drawing of multiple filled and zero-width arcs into a single giant GL request. That turned out to both improve performance and save a bit of code. Right now, drawing wide lines and wide arcs doesn't do this, and so we suffer from submitting many smaller requests to GL. It's hard to get excited about speeding any of this up as all of the wide primitives are essentially unused these days.

Filled Polygons

Because X only lets applications draw a single polygon in each request, Glamor can't really gain any efficiency from batching work unless we start looking ahead in the X protocol stream to see if the next request is another polygon. Alternatively, we could leave the span operation pending to see if more spans were coming before the X serve went idle. Neither of these is all that exciting though; X polygons just aren't that useful.

Render Operations

These are still not structured to best fit modern GL; some work here would help a bunch. We've got a gsoc student ready to go at this though, so I expect we'll have much better numbers in a few months.

Window Operations

You wouldn't think that moving and resizing windows would be so limited by drawing performance, but x11perf tests these with tiny little windows, and each operation draws or copies only a couple of little rectangles, which makes GL quite expensive. Working on speeding up GL for small numbers of operations would help a bunch here.

Unexpected Results

Solid rectangles are actually running slower than patterned rectangles, and I really have no idea why. The CPU is close to idle during the 500x500 solid rectangle test (as you'd expect, given the workload), the vertex and fragment shaders look correct out of the compiler, and yet solid rectangles run at only 0.80 of the performance of the patterned rectangles.

GL semantics for copying data essentially preclude overlapping blts of any form. There's the NV_texture_barrier extension which at least lets us do blts within the same object, but even that doesn't define how overlapping blts work. So, we have to create a temporary copy for this operation to make it work right. I was worried that this would slow things down, but the Iris Pro 3D engine is enough faster than the 2D engine that even with the extra copy, large scrolls and copies within the same object are actually faster.

Results

Here's a giant image showing the ratio of Glamor to both UXA and SNA running on the same machine, with all of the same software; the only change between runs was to switch the configured acceleration architecture.

Posted Sun Apr 27 01:46:50 2014 Tags:

Java Sound on Linux

I'm often in the position of having my favorite Java program (AltosUI) unable to make any sounds. Here's a history of the various adventures I've had.

Java and PulseAudio ALSA support

When we started playing with Java a few years ago, we discovered that if PulseAudio were enabled, Java wouldn't make any sound. Presumably, that was because the ALSA emulation layer offered by PulseAudio wasn't capable of supporting Java.

The fix for that was to make sure pulseaudio would never run. That's harder than it seems; pulseaudio is like the living dead; rising from the grave every time you kill it. As it's nearly impossible to install any desktop applications without gaining a bogus dependency on pulseaudio, the solution that works best is to make sure dpkg never manages to actually install the program with dpkg-divert:

# dpkg-divert --rename /usr/bin/pulseaudio

With this in place, Java was a happy camper for a long time.

Java and PulseAudio Native support

More recently, Java has apparently gained some native PulseAudio support in some fashion. Of course, I couldn't actually get it to work, even after running the PulseAudio daemon but some kind Debian developer decided that sound should be broken by default for all Java applications and selected the PulseAudio back-end in the Java audio configuration file.

Fixing that involved learning about said Java audio configuration file and then applying patch to revert the Debian packaging damage.

$ cat /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sound.properties
...
#javax.sound.sampled.Clip=org.classpath.icedtea.pulseaudio.PulseAudioMixerProvider
#javax.sound.sampled.Port=org.classpath.icedtea.pulseaudio.PulseAudioMixerProvider
#javax.sound.sampled.SourceDataLine=org.classpath.icedtea.pulseaudio.PulseAudioMixerProvider
#javax.sound.sampled.TargetDataLine=org.classpath.icedtea.pulseaudio.PulseAudioMixerProvider

javax.sound.sampled.Clip=com.sun.media.sound.DirectAudioDeviceProvider
javax.sound.sampled.Port=com.sun.media.sound.PortMixerProvider
javax.sound.sampled.SourceDataLine=com.sun.media.sound.DirectAudioDeviceProvider
javax.sound.sampled.TargetDataLine=com.sun.media.sound.DirectAudioDeviceProvider

You can see the PulseAudio mistakes at the top of that listing, with the corrected native interface settings at the bottom.

Java and single-open ALSA drivers

It used to be that ALSA drivers could support multiple applications having the device open at the same time. Those with hardware mixing would use that to merge the streams together; those without hardware mixing might do that in the kernel itself. While the latter is probably not a great plan, it did make ALSA a lot more friendly to users.

My new laptop is not friendly, and returns EBUSY when you try to open the PCM device more than once.

After downloading the jdk and alsa library sources, I figured out that Java was trying to open the PCM device multiple times when using the standard Java sound API in the simplest possible way. I thought I was going to have to fix Java, when I figured out that ALSA provides user-space mixing with the 'dmix' plugin. I enabled that on my machine and now all was well.

$ cat /etc/asound.conf
pcm.!default {
    type plug
    slave.pcm "dmixer"
}

pcm.dmixer  {
    type dmix
    ipc_key 1024
    slave {
        pcm "hw:1,0"
        period_time 0
        period_size 1024
        buffer_size 4096
        rate 44100
    }
    bindings {
        0 0
        1 1
    }
}

ctl.dmixer {
    type hw
    card 1
}

ctl.!default {
    type hw
    card 1
}

As you can see, my sound card is not number 0, it's number 1, so if your card is a different number, you'll have to adapt as necessary.

Posted Sat Apr 5 22:30:55 2014 Tags:

All Entries