With our channel-interleaving mostly sorted out, Eric and I spent a short time figuring out what to do about it all. The important thing we learned is that the hardware has two modes: linear and tiled. The CPU always accesses memory in linear mode, while the GPU can either use linear or tiled mode. What is important here is that 'linear' mode uses the same interleaving, whether from the GPU or CPU. So, we can only get into trouble if we use tiled mode from the GPU and linear mode from the CPU.

We came up with a fairly simple plan to resolve this issue:

  • Figure out how to automatically detect the precise interleaving configuration of the system. There is an MCH BAR holding this data, and so far most (but not all) machines report what we expect.

  • Add an IOCTL to propose tiling to the kernel, letting the kernel reject the proposal. This gives us a hook to pass back whatever channel interleaving is necessary. When the memory configuration cannot be supported in tiling mode, the call fails and user mode reverts to linear allocations. This will reduce performance, so we want to do it as little as possible.

    A side benefit here is that we get a reliable way of saving tiling data for each buffer -- until now, we've stored that in the sarea for front/back/depth, but hadn't any general plan.

  • For now, fail tiling requests on hardware that uses bit 17 in linear mode, or hardware with an “L”-shaped memory configuration. Of course, it would be nice if we could find an “L”-shaped machine to test with.

  • Add return data to this ioctl to report which address bits to stir into the channel-select (bit 6) value, so user space needn't change when the hardware configuration does.

  • Provide a way to map buffers through the GTT, using the fence registers to de-tile them so that tiled buffers appear linear to the CPU. This is required if we want to tile X pixmaps; otherwise we'd need to use wfb instead of fb for software rendering. We've wanted to do this for a while, and having the option of GTT mappings is nice in any case -- writes are less hassle, as the WC mapping doesn't require explicit cache flushing. Giving applications the option seems like the best way forward. Of course, when the GTT fills, these mappings will go away; when the application next touches a page, it will fault the object back into the GTT.

  • Maybe someday provide enough information to user space to deal with page-level interleaving. For bit-17 configurations, we'd need to report the physical bit-17 value for each page. For “L”-shaped configurations, we'd have to report whether each page was interleaved at all. Making this work with paging seems really hard, though -- you'd need some kind of atomic section in which user space could read the interleave information, compute pixel addresses, and access the data. Icky.

On Wednesday and Thursday, Jesse Barnes, Zou Nan Hai and I got to attend the 2008 Intel Gfxcon. Put on by Intel GDG (no, I don't know what that means, but it's the Intel integrated graphics group), it was held in Folsom, CA. The trip down on the Intel shuttle was uneventful, but on arrival I found the 42° air full of smoke from wild fires. My plan to bicycle from the hotel to Intel suddenly seemed like it would be a lot less fun.

The conference was huge fun. As usual, meeting people in person beat email or even a teleconference. After nearly three years at Intel, I'm starting to feel a bit more a part of the organization and less of an outsider. Linux continues to gain mindshare within the company as it gains visibility in our customers' products.

I got to present our current GEM work to a fair-sized crowd, including several people from our Windows driver development team. I was interested to hear what they thought about the architecture and was pleased to learn that a lot of what we're doing is similar to how the Vista driver works. While we can't share source code, it is at least nice that we can share ideas about how best to drive the hardware at the lowest levels of the system.

That evening, Jesse, Nan Hai and I managed to find decent steak-and-potatoes, but our attempt to locate gelato ended in near-failure -- Google led us to a mini-mall in a neighboring town. Failing to find the expected shop, we asked at an Italian restaurant, where we were first told that the gelato place had closed five years earlier, then assured that they could provide gelato themselves. Served freezer-burnt ice cream, we quickly left for our respective lodgings.

The next evening we found acceptable California-style Mexican food. As usual, the portions were large enough to push any thought of dessert from our minds. We lumbered back to Nan Hai's hotel room and hacked for several hours, although the free wifi there left a lot to be desired. Then we chatted about how to get rid of tearing in textured video.

For vblank-synchronized textured video, I'm hoping we'll be able to queue the update to the kernel and have it perform the necessary blts. That would mean interrupting the current command stream and switching to a separate one. It probably also means using separate hardware contexts, which would be a good thing in any case, as we could eliminate the per-batch-buffer configuration that the 3D driver currently performs. Work on this will need to start with multiple hardware context support, then move on to interrupting the ring, and finally figure out how to manage the blts along with clip-list changes and the like.