SAMD21 USB vs Windows 7

I'm used to USB being really hard to get working on a new SoC, with everything from generating a stable 48MHz clock to diving through thousands of register definitions just to get the device programmed to receive that first SETUP packet. But I'm also used to that being the hardest part of the work: once the first SETUP packet has been received and answered successfully, it's usually downhill from there.

Not this time.

I've written about Snek on the SAMD21G18A before, and this is about the same board. USB on this device is medium-complicated: the controller supports both host and device modes and has a range of 'optimizations', which always make simple operation harder. It took a few hours of hacking to get SETUP packets flowing, but after that (at least when talking to Linux and Mac OS X), the rest of the USB driver was pretty simple.

Enter Windows 7

I'm pushing towards a Snek 1.0 release and was testing snekde on Windows 7. It works great with the classic Arduino Duemilanove, but when I plugged in the Metro M0 board, it got stuck after I typed one character. "That's odd," I thought.

I figured it'd be a simple matter of a stuck interrupt or other minor mistake in the SAMD21 USB driver that I wrote. So, I broke out my trusty Beagle USB analyzer to see where the USB link was getting stuck.

IN-NAK ... IN DATAx ...

USB is an odd protocol; data from the device to the host has to sit in the device waiting for the host to come and ask for it. When the device is in use, the host polls for data by sending an IN packet. When there's no data to send back, the device sends a NAK reply. When there is data, the device sends a DATAx packet (DATA0 or DATA1, alternating with each successful transfer) and the host replies with an ACK packet.
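The device side of that polling handshake can be modeled as a tiny state machine. This is a minimal sketch, not code from the Snek SAMD21 driver: `struct in_endpoint`, `ep_queue`, `ep_on_in_token` and `ep_on_ack` are hypothetical names, and on real hardware much of this is typically handled by the USB peripheral itself. The DATA0, DATA1 and ACK byte values match the ones that show up in the Beagle trace later in this post.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* PID bytes as they appear on the wire (high nibble = complement of low). */
enum usb_pid {
    PID_ACK   = 0xD2,
    PID_NAK   = 0x5A,
    PID_DATA0 = 0xC3,
    PID_DATA1 = 0x4B,
};

/* Hypothetical model of one bulk IN endpoint. */
struct in_endpoint {
    uint8_t buf[64];
    size_t  len;
    bool    pending;  /* data queued, waiting for the host's IN */
    bool    toggle;   /* false -> DATA0, true -> DATA1 */
};

/* Device queues data; fails if a packet is already waiting for the host. */
static bool ep_queue(struct in_endpoint *ep, const uint8_t *data, size_t len)
{
    if (ep->pending || len > sizeof ep->buf)
        return false;
    memcpy(ep->buf, data, len);
    ep->len = len;
    ep->pending = true;
    return true;
}

/* Host sent IN: NAK if nothing is queued, else DATA0/DATA1 per the toggle. */
static uint8_t ep_on_in_token(const struct in_endpoint *ep)
{
    if (!ep->pending)
        return PID_NAK;
    return ep->toggle ? PID_DATA1 : PID_DATA0;
}

/* Host ACKed the data: free the buffer and flip the toggle. */
static void ep_on_ack(struct in_endpoint *ep)
{
    ep->pending = false;
    ep->len = 0;
    ep->toggle = !ep->toggle;
}
```

Starting from a zeroed endpoint, an IN gets NAK; after queueing 'a' the next IN gets DATA0, and once the host ACKs, the following queued byte goes out as DATA1. If the host sends another IN before ACKing, the same DATAx is resent; the toggle only flips on ACK, which is how the host recognizes a duplicate when an ACK gets lost.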

In my case, the host sends thousands of IN packets waiting for data, and the device responds with an equally huge number of NAK packets. The first time data was queued from the device to the host, the device responded to the IN packet with a DATAx packet and the host ACK'd that. After that, the host never sent another IN packet. It would happily send its own data using OUT packets, the device would receive that data, and of course the usual stream of SOF (start of frame) packets kept flowing. But not a single IN packet was to be seen.

Differential Debugging

Well, I've got a lot of USB devices around here, so I hooked up one of our TeleBT-v3.0 devices. That worked just fine, which was good, as we've sold hundreds of those and it would kinda suck to discover that some Windows boxes weren't compatible.

A visual examination of the traces captured by the Beagle analyzer didn't show anything obvious. But it's often the little details that break things.

So, I hacked up the SAMD21 board to appear to be the same device as the TeleBT -- same VID/PID, same names, same serial number. Everything.

Now Windows can't seem to tell the difference; at least, it uses the same COM port for both.

I devised a simple test: plug in the device, start PuTTY and then type the character 'a' (0x61) twice. Because both devices echo whatever you send to them, I should get two characters back. Because they're typed separately, those two characters will be sent in separate OUT transactions, and the echoes should come back in two IN transactions.

I captured traces from both devices:

TeleBT-v3.0 (STM32L151):

Metro M0 (SAMD21G18A):

The 'trimmed' versions elide timing and packet sequence information, which can't easily be replicated exactly between the two tests; that "can't" matter, at least according to my understanding of USB. With those versions, I can do a text diff of the packet traces and find that, aside from a different number of SOF and IN-NAK transactions, the only difference appears at the end:

$ diff -u stm32l.trim samd21.trim | tail +231
 0  1 B  01 04 OUT txn 61   
 1  3 B  01 04    OUT packet E1 01 BA   
 1  4 B  01 04    DATA0 packet C3 61 81 57   
 1  1 B  01 04    ACK packet D2   
-0  1 B  01 05 IN txn   [57536 POLL] 61   
-1    01 05    [57536 IN-NAK]    
+0  1 B  01 05 IN txn   [50387 POLL] 61   
+1    01 05    [50387 IN-NAK]    
 1  3 B  01 05    IN packet 69 81 0A   
 1  4 B  01 05    DATA0 packet C3 61 81 57   
 1  1 B  01 05    ACK packet D2   
-0      [1004 SOF]  [Frames: 853 - 1856]   
+0      [2000 SOF]  [Frames: 138 - 89] [Periodic Timeout]  
+0      [2000 SOF]  [Frames: 90 - 41] [Periodic Timeout]  
+0      [572 SOF]  [Frames: 42 - 613]   
 0  1 B  01 04 OUT txn 61   
 1  3 B  01 04    OUT packet E1 01 BA   
 1  4 B  01 04    DATA1 packet 4B 61 81 57   
 1  1 B  01 04    ACK packet D2   
-0  1 B  01 05 IN txn   [83901 POLL] 61   
-1    01 05    [83901 IN-NAK]    
-1  3 B  01 05    IN packet 69 81 0A   
-1  4 B  01 05    DATA1 packet 4B 61 81 57   
-1  1 B  01 05    ACK packet D2   
-0    01 01 [16 IN-NAK]  [Periodic Timeout]  
-0    01 05 [178185 IN-NAK]  [Periodic Timeout]  
-0      [2000 SOF]  [Frames: 1857 - 1808] [Periodic Timeout]  
-0    01 01 [16 IN-NAK]  [Periodic Timeout]  
-0    01 05 [147487 IN-NAK]  [Periodic Timeout]  
-0      [2000 SOF]  [Frames: 1809 - 1760] [Periodic Timeout]  
-0      [474 SOF]  [Frames: 1761 - 186]   
-0    01 05 [34876 IN-NAK]    
-0   ! 01 05 [1 ORPHANED]    
-1   U 01 05    [1 IN]    
-0    01 01 [16 IN-NAK]    
-0      Capture stopped  [Sun 31 Mar 2019 02:25:32 PM PDT]  
+0      [2000 SOF]  [Frames: 614 - 565] [Periodic Timeout]  
+0      [1163 SOF]  [Frames: 566 - 1728]   
+0      Capture stopped  [Sun 31 Mar 2019 02:36:23 PM PDT]  

You can see both boards receiving the first 'a' character and then sending it back. Then both boards receive the second 'a' character, but only the STM32L gets an IN packet, to which it responds with a DATAx packet containing the 'a' character. The SAMD21 board gets only SOF packets.
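To read the raw bytes in the trace by hand: every USB packet starts with a PID byte whose high nibble is the one's complement of the low nibble, and token packets (IN, OUT, SETUP) then carry a 7-bit device address and a 4-bit endpoint number, sent least significant bit first, followed by a CRC5. SOF packets carry an 11-bit frame number and go out once per 1 ms frame at full speed, so the frame ranges in the trace wrap at 2048. The helpers below (`pid_name`, `decode_token`, `sof_count`) are hypothetical, written only to decode those fields; the CRC5 bits are extracted but not checked.

```c
#include <stdint.h>

/* Name a PID byte; the high nibble must be the complement of the low one. */
static const char *pid_name(uint8_t pid_byte)
{
    uint8_t code  = pid_byte & 0x0F;
    uint8_t check = pid_byte >> 4;

    if (check != (uint8_t)(~code & 0x0F))
        return "invalid";
    switch (code) {
    case 0x1: return "OUT";
    case 0x9: return "IN";
    case 0x5: return "SOF";
    case 0xD: return "SETUP";
    case 0x3: return "DATA0";
    case 0xB: return "DATA1";
    case 0x2: return "ACK";
    case 0xA: return "NAK";
    case 0xE: return "STALL";
    default:  return "other";
    }
}

struct token {
    uint8_t addr;   /* 7-bit device address */
    uint8_t endp;   /* 4-bit endpoint number */
    uint8_t crc5;   /* raw CRC5 bits, not verified here */
};

/* Decode the two bytes following an IN/OUT/SETUP PID (LSB first on the wire). */
static struct token decode_token(uint8_t b1, uint8_t b2)
{
    uint16_t field = (uint16_t)(b1 | (b2 << 8));
    struct token t = {
        .addr = field & 0x7F,
        .endp = (field >> 7) & 0x0F,
        .crc5 = field >> 11,
    };
    return t;
}

/* SOFs from frame `first` to `last` inclusive; frame numbers are 11 bits. */
static unsigned sof_count(unsigned first, unsigned last)
{
    return ((last - first) & 0x7FF) + 1;
}
```

Applied to `OUT packet E1 01 BA` and `IN packet 69 81 0A`, `decode_token` gives device 1 with endpoints 4 and 5, matching the `01 04` and `01 05` columns, and `pid_name` turns `C3`, `4B` and `D2` into DATA0, DATA1 and ACK. `sof_count(138, 89)` is 2000, so each `[2000 SOF]` line covers about two seconds of bus time.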

Next Steps?

I'm heading out of town on Tuesday to help with the NASA Student Launch, so I think I'll let this sit until I get back. Maybe I'll come up with a new debugging idea, or maybe I'll hear about a fancier USB monitoring device that might capture details that I'm missing.

Anyone with suggestions or comments is welcome to send them along; I'd like to get this bug squashed and finish the rest of the Snek 1.0 release process.