As I’ve written, for the last few weeks I’ve been working (mostly part time) with a small team on developing a Linux driver for the Gadgetlabs GL824 sound card. Linux audio is a new world for me but with the help of a few others in the group, I’ve learned a lot about how modules are set up and loaded in the 2.6 kernel and generally how things all hang together in the Linux world.
I’ve put most of my focus into the sound engine, as the performance challenge of shifting up to 48000 samples per second @ 4 bytes per sample * 8 channels * 2 (in/out) = about 3 megabytes per second really intrigued me.
Given the GL uses an Altera in its sound engine, it brought back a lot of fond memories of the projects I’d been involved with at the Institute for Telecommunications Research, particularly with high speed satellite modems.
Optimistically, I set things up so data is moved in just 64 sample blocks per channel, meaning a theoretical latency of only 1.33ms!
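As a rough check of those figures, here is a small C sketch of the arithmetic. The function names are made up for illustration and are not from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second that have to be moved for a given configuration. */
static uint32_t gl_bytes_per_second(uint32_t rate_hz, uint32_t bytes_per_sample,
                                    uint32_t channels, uint32_t directions)
{
    return rate_hz * bytes_per_sample * channels * directions;
}

/* Duration of one block in microseconds, rounded down. */
static uint32_t gl_block_latency_us(uint32_t frames_per_block, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return (uint32_t)(((uint64_t)frames_per_block * 1000000u) / rate_hz);
}
```

With 48000 Hz, 4 bytes, 8 channels and 2 directions this gives 3072000 bytes per second. A 64 sample block lasts 1333 microseconds at 48kHz and 1451 microseconds at 44.1kHz.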
One of the challenges of the project is that ALSA is optimised for use with hardware which can DMA into buffers that the driver sets up. The GL doesn’t work this way; it has its own hardware buffer (mapped into IO space) and ALSA’s promised support for this configuration hasn’t worked as we expected. I’ve ended up doing things by having the interrupt handler simulate DMA in and out of the ALSA buffer. For this to work, ALSA has to be using a buffer configuration that easily maps to the GL’s fixed buffer.
Now the tricky part. I wanted the driver to be able to play 16 bit 44.1kHz stereo as well as 24 bit content, and allow multiple clients to use a subset of the channels from the card simultaneously. Given that each ALSA substream uses one buffer for all the channels being asked of it, vs. the GL’s buffer-per-channel approach, it was difficult coming up with a single ALSA buffer configuration that worked in all the modes.
A fixed 24 bit 8 channel mode would have been trivial, had we wanted to give up on that flexibility.
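To make the buffer mismatch concrete, here is a minimal user-space sketch of the "simulated DMA" idea for capture: one block from each per-channel hardware buffer is interleaved into a single ring buffer, wrapping at the end. It is plain C with no ALSA calls. The function name, the 32 bit sample type and the ring layout are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_BLOCK_FRAMES 64  /* samples per channel per interrupt, as in the post */

/*
 * Copy one block of capture data from per-channel hardware buffers into an
 * interleaved ring buffer, wrapping at the end.
 *   hw          hw[ch] points at HW_BLOCK_FRAMES samples for channel ch
 *   ring        interleaved buffer holding ring_frames * channels samples
 *   pos         current write position, in frames
 * Returns the new write position, in frames.
 */
static size_t sim_dma_capture(const int32_t *const hw[], size_t channels,
                              int32_t *ring, size_t ring_frames, size_t pos)
{
    assert(pos < ring_frames);
    for (size_t f = 0; f < HW_BLOCK_FRAMES; f++) {
        int32_t *frame = ring + pos * channels;
        for (size_t ch = 0; ch < channels; ch++)
            frame[ch] = hw[ch][f];
        if (++pos == ring_frames)
            pos = 0;  /* wrap around to the start of the ring */
    }
    return pos;
}
```

Playback would be the same loop with source and destination swapped. The copy is simplest when the ring size is a whole number of hardware blocks, which fits the point above that ALSA needs a buffer configuration that maps easily onto the GL's fixed buffer.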
I ended up coming up with a solution where the driver defines multiple Linux “subdevices” for the card. These can be opened in any combination until all the channels are used up. The different subdevices support configurations including stereo 16 bit, stereo 24 bit, 8 channel 16 bit and of course 8 channel 24 bit. I also included support for 32 bit samples since Jack seems to not care about 24 bit samples.
As long as all the substreams for the card are set to the same sample rate (there’s only one sample clock), things can be very flexible. It does mean the user has to think a bit in order to pick the right subdevice for a given configuration.
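The two rules described here (channels can be claimed until they run out, and everything shares one sample clock) could be tracked along these lines. This is only a sketch for one direction of the card: the struct, the function names and the simple channel count are hypothetical, and the real driver's bookkeeping may look quite different.

```c
#include <assert.h>
#include <stdbool.h>

#define CARD_CHANNELS 8  /* channels per direction, as in the post */

struct card_state {
    unsigned channels_in_use;  /* channels claimed by open subdevices */
    unsigned rate_hz;          /* shared sample clock; 0 while nothing is open */
};

/* Try to claim `channels` at `rate_hz`. Returns true on success. */
static bool subdevice_open(struct card_state *card, unsigned channels,
                           unsigned rate_hz)
{
    assert(channels > 0 && rate_hz > 0);
    if (card->channels_in_use + channels > CARD_CHANNELS)
        return false;  /* not enough channels left */
    if (card->channels_in_use > 0 && card->rate_hz != rate_hz)
        return false;  /* there is only one sample clock */
    card->channels_in_use += channels;
    card->rate_hz = rate_hz;
    return true;
}

/* Release channels claimed earlier by subdevice_open(). */
static void subdevice_close(struct card_state *card, unsigned channels)
{
    assert(channels <= card->channels_in_use);
    card->channels_in_use -= channels;
    if (card->channels_in_use == 0)
        card->rate_hz = 0;  /* the clock is free to change again */
}
```

For example, a stereo client at 44.1kHz can share the card with a 6 channel client at 44.1kHz, but a second client asking for 48kHz is refused until everything has been closed.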
Given the lack of hardware DMA, shifting the data becomes very interesting. I started by writing x86 assembly for the transfer routines (a first for me) but later changed it to C, in the spirit of being cross platform. Because of the card’s architecture, you can only transfer data in short bursts, after which you need to do a read-back. I ended up using unrolled loops for maximum speed/pipelineability, an old trick I’ve used in the past. This gives pages and pages of gibberish.
The various combinations of sample format and interleaving mean quite a few combinations of these unrolled loops.
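For readers who haven't seen the technique, here is a small illustration of an unrolled burst copy with a read-back after each burst. It is not the driver's code: the burst length of 8, the function name and the use of plain `volatile` pointers are all assumptions made so the sketch runs in user space. A real kernel driver would presumably go through the kernel's I/O accessors instead.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SAMPLES 64  /* samples per channel per interrupt, as in the post */
#define BURST_SAMPLES 8   /* assumed burst length; the card's real limit is not stated */

/*
 * Copy one channel's block to the card buffer in unrolled bursts.
 * `dst` and `readback` are volatile so the compiler keeps every access
 * and does not reorder them. Returns the number of read-backs performed.
 */
static unsigned copy_block_unrolled(volatile int32_t *dst, const int32_t *src,
                                    const volatile int32_t *readback)
{
    unsigned readbacks = 0;

    for (unsigned i = 0; i < BLOCK_SAMPLES; i += BURST_SAMPLES) {
        /* One burst, written out by hand instead of as an inner loop. */
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];

        /* Read-back after the burst; the value itself is not used. */
        int32_t sink = *readback;
        (void)sink;
        readbacks++;
    }
    return readbacks;
}
```

Each sample format and interleaving layout needs its own version of the burst body, which is where the page count comes from.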
My system seems to cope well with the interrupt rate. The basic data pumping only uses about 1% kernel CPU. However, we’ve found that the higher level audio layers (Jack) seem to require more buffers than I’d hoped (at least 10 for me). I’m sure it’s because I haven’t been able to enable the realtime kernel option in the Jack layer (something isn’t installed), so the Linux scheduler gets in the way.
Apart from the buffers issue, my card is working well with aplay/arecord, Audacious (Winamp clone), Ardour (8 channel record/playback at 24 bits), Hydrogen (drum machine) and even, with a bit of persuasion, as the default Linux sound device. The apps seem to work better going straight to the card than through Jack; I need to get my realtime kernel issue sorted out.
I’m currently using Fedora Core 8, with the CCRMA realtime kernel patches. This weekend I gave 64Studio a try, but couldn’t get the kernel module for the card to install, even though it compiled fine with the kernel’s headers. There seems to be an inconsistency between the published kernel headers and the core sound modules being used. It’s possibly because I’m not building within the ALSA framework. Given the age of the kernel in 64Studio, I’ve decided to go back to CCRMA as it’s really on the bleeding edge and I’m getting used to Fedora, even after being a long time Debian user.
So… basically, it’s working! Still waiting for more people to confirm performance on different machines. I’m starting to think that 1.4ms latency is overkill and I might halve the interrupt rate. I’ll see how things go with the realtime kernel mode before making that decision.
There’s still some way to go before we’re finished. The card has a 16550 UART for MIDI. It’s working but we’ve currently got it running off the audio interrupt in polled mode, which isn’t ideal. We need more validation against incorrect modes for the subdevices. Controls need to be defined for the card’s mute/level/monitoring switches. Then there is the possibility of syncing multiple cards (they can be physically linked to share a clock) and maybe support for the 4 channel variants of the GL824.
Once we’re done, we intend to roll it into the ALSA library, and hopefully have it accepted.
I must mention that one of the reasons the Linux driver came together so quickly is that we had access to the source of the Win32 driver developed by Waldemar, aka “Mostek”. His redesigned Altera code made things a lot more straightforward than with the original firmware that the cards shipped with, notwithstanding the very different ways of doing things in the Linux ALSA and the MS Windows WDM/ASIO audio models.