Playing the arp

I thought it would be interesting to have a small Linux VM running on the ION HTPC. Nothing heavy, just an SSH server. I installed VMWare Player from my VMWare Workstation 7.1 install and copied across my Linux VM.

It booted, but there was a bizarre networking problem. The guest (VM) could ping every other machine on my network except its host, which came back as destination unreachable. The host could ping the guest, and I discovered that after the host pinged the guest, the guest could ping the host, for a while. Then it became “unreachable” again. It wasn’t a host firewall problem.

The same VM was fine on my main PC; both setups used bridged networking. The only difference was that the ION connects over a wireless network.

Some searching found this very useful thread:

It turns out that the guest has trouble resolving addresses at the Ethernet level: it does not know how to reach the host, which shares the same network adapter. The guest sends out an ARP request, but the response it gets back for the host’s IP address doesn’t give it a working entry, probably because host and guest are bridged on the same device on a wireless network.

The solution is simple: in rc.local, add an ‘arp’ command to create a static association between the host’s IP address and its adapter’s MAC.

This worked for me:
arp -s xx.xx.xx.xx yy:yy:yy:yy:yy:yy
where xx.xx.xx.xx is the IP address of the host and yy:yy:yy:yy:yy:yy is the MAC address of the host.

Interestingly, if I used the MAC address of the guest, it also worked! Not surprising, since host and guest are bridged. But pings were about 150us quicker if I used the host’s MAC address.

Speaking of speed: the ION is no speed demon and has no hardware VM support, so the VM is not exactly fast. It feels like the speed I used to get with VMs on a PIII; good enough for lightweight Linux stuff and data rates < 2MB/second.

VMWare under Fedora FC8 + CCRMA Realtime Kernel

Getting VMWare running under a realtime kernel:

Ensure you have gcc and gcc-c++ installed.

Download VMWare player from VMWare’s site. I downloaded ‘VMware-player-2.0.2-59824.i386.rpm’.

From here download the patch

# yum install VMware-player-2.0.2-59824.i386.rpm

Extract the any-any patch and run it. It will prompt to run the configuration script. All the defaults worked for me, and a link for vmplayer got created in the Applications->System Tools menu.

I was running the VM off an NTFS mount. To get it to allocate memory properly, I had to add this setting to the vmx file:
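One setting commonly suggested for VMs stored on filesystems like NTFS (offered here as an assumption, not necessarily the exact line used) tells VMWare not to back guest memory with a named file on the VM’s datastore:

```
mainMem.useNamedFile = "FALSE"
```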


Replaced Matrox Parhelia with NVidia

My Matrox Parhelia 128 had been a reasonable 2D card under Windows but was becoming more of a struggle to keep going under Linux. Matrox’s little-supported Linux drivers did not support the key feature of the card that I wanted – hardware zoom – and I couldn’t get Compiz Fusion (which has a nice software zoom) to work with it.

I bought an NVidia 7600GT to replace it; hardly the latest graphics powerhouse but I thought it would be more than good enough for running the Compiz desktop under Linux and maybe even the odd game.

The NVidia XP drivers support hardware zoom under Windows, so that feature was covered. Under Linux I eventually got Compiz going and its software zoom and other shiny graphical features are nice.

Hardware-wise, the first thing I noticed was that the big bright fan heatsink sat amazingly close to my GL824 card in the adjacent slot. I wedged a small plastic separator between the cards to make sure they don’t bump.

The second thing I noticed is that the heatsink gets hot even when I’m not running the 3D desktop or doing anything graphics-intensive. On a warm day it sat around 76C. I got it down to 72C by underclocking the card (see below). I’ve since bumped up the rear case fan speed (making the machine noisier), and the temperature now sits in the 60s.

Guess I was too used to the power (and performance) miserly Matrox.

One thing has me stumped: when X isn’t running, the card goes crazy heat-wise, getting to about 85C! What is it doing when X isn’t running? Is this a bug? I’ll have to try it under Windows while it’s sitting at a full-screen CMD window to see.

Over/Under Clocking The 7600GT Under Linux

I found out how to enable clock adjustment in the card. In my xorg.conf I have:

Section "Device"
    Identifier "Videocard0"
    Driver "nvidia"
    Option "AddARGBGLXVisuals" "True"
    Option "Coolbits" "1"
EndSection

Then I’ve set this up to execute when my X session starts:

nvidia-settings -a GPUOverclockingState=1
nvidia-settings -a GPU2DClockFreqs=150,200
nvidia-settings -a GPU3DClockFreqs=150,200

This change brings the temperature down about 4 degrees. I must say I am a bit disappointed that so much heat (and power) is being wasted by the card when I’m not using the 3D capabilities. The world of graphics GPUs that use more power than the main CPU is very new and strange to me.

A Working Gadgetlabs GL824 Under Linux

As I’ve written, for the last few weeks I’ve been working (mostly part time) with a small team on developing a Linux driver for the Gadgetlabs GL824 sound card. Linux audio is a new world for me, but with the help of a few others in the group, I’ve learned a lot about how modules are set up and loaded in the 2.6 kernel and how things generally hang together in the Linux world.

I’ve put most of my focus into the sound engine, as the performance challenge of shifting up to 48000 samples per second * 4 bytes per sample * 8 channels * 2 (in/out) = about 3 megabytes per second really intrigued me.

Given the GL uses an Altera in its sound engine, it brought back a lot of fond memories of the projects I’d been involved with at the Institute for Telecommunications Research, particularly with high speed satellite modems.

Optimistically, I set things up so data is moved in blocks of just 64 samples per channel, giving a theoretical latency of only 1.33ms!

One of the challenges of the project is that ALSA is optimised for use with hardware which can DMA into buffers that the driver sets up. The GL doesn’t work this way; it has its own hardware buffer (mapped into IO space) and ALSA’s promised support for this configuration hasn’t worked as we expected. I’ve ended up doing things by having the interrupt handler simulate DMA in and out of the ALSA buffer. For this to work, ALSA has to be using a buffer configuration that easily maps to the GL’s fixed buffer.

Now the tricky part. I wanted the driver to be able to play 16bit 44.1K stereo as well as 24 bit content, and allow multiple clients to use a subset of the channels from the card simultaneously. Given that each ALSA substream uses one buffer for all the channels being asked of it, vs. the GL’s buffer-per-channel approach, it was difficult coming up with a single ALSA buffer configuration that worked in all the modes.

A fixed 24 bit, 8 channel mode would have been trivial, if we’d wanted to give up.

I ended up with a solution where the driver defines multiple ALSA “subdevices” for the card. These can be opened in any combination until all the channels are used up. The different subdevices support configurations including stereo 16 bit, stereo 24 bit, 8 channel 16 bit and, of course, 8 channel 24 bit. I also included support for 32 bit samples, since Jack doesn’t seem to care for 24 bit samples.

As long as all the substreams for the card are set to the same sample rate (there’s only one sample clock), things can be very flexible. It does make the user think a bit, though, to pick the right subdevice for a given desired configuration.

Given the lack of hardware DMA, shifting the data becomes very interesting. I started by writing x86 assembly for the transfer routines (a first for me) but later changed it to C, in the spirit of being cross-platform. Because of the card’s architecture, you can only transfer data in short bursts, after which you need to do a read-back. I ended up using unrolled loops for maximum speed/pipelineability, an old trick I’ve used in the past. This gives pages and pages of gibberish like:

*(c++)=*(b++)<<8; *(c++)=*(b++)<<8;
*(c++)=*(b++)<<8; *(c++)=*(b++)<<8;
*(c++)=*(b++)<<8; *(c++)=*(b++)<<8;

The various combinations of sample format and interleaving mean quite a few combinations of these unrolled loops.

My system seems to cope well with the interrupt rate. The basic data pumping only uses about 1% kernel CPU. However, we’ve found that the higher level audio layers (Jack) seem to require more buffers than I’d hoped (at least 10 for me). I’m sure it’s because I haven’t been able to enable the realtime kernel option in the Jack layer – something isn’t installed – so the Linux scheduler gets in the way.

Apart from the buffers issue, my card is working well with aplay/arecord, Audacious (Winamp clone), Ardour (8 channel record/playback at 24 bits), Hydrogen (drum machine) and even, with a bit of persuasion, as the default Linux sound device. The apps seem to work better going straight to the card than through Jack – I need to get my realtime kernel issue sorted out.

I’m currently using Fedora Core 8, with the CCRMA realtime kernel patches. This weekend I gave 64Studio a try, but couldn’t get the kernel module for the card to install, even though it compiled fine with the kernel’s headers. There seems to be an inconsistency between the published kernel headers and the core sound modules being used. It’s possibly because I’m not building within the ALSA framework. Given the age of the kernel in 64Studio, I’ve decided to go back to CCRMA as it’s really on the bleeding edge, and I’m getting used to Fedora, even after being a long-time Debian user.

So… basically, it’s working! Still waiting for more people to confirm performance on different machines. I’m starting to think that 1.4ms latency is overkill and I might halve the interrupt rate. I’ll see how things go with the realtime kernel mode before making that decision.

There’s still some way to go before we’re finished. The card has a 16550 UART for MIDI. It’s working, but we’ve currently got it running off the audio interrupt in polled mode, which isn’t ideal. We need more validation against incorrect modes for the subdevices. Controls need to be defined for the card’s mute/level/monitoring switches. Then there is the possibility of syncing multiple cards (they can be physically linked to share a clock) and maybe support for the 4 channel variants of the GL824.

Once we’re done, we intend to roll it into the ALSA library, and hopefully have it accepted.

I must mention that one of the reasons the Linux driver came together so quickly is that we had access to the source of the Win32 driver developed by Waldemar, aka “Mostek”. His redesigned Altera code made things a lot more straightforward than with the original firmware the cards shipped with, notwithstanding the very different ways of doing things in the Linux ALSA and MSWindows WDM/ASIO audio models.

Linux driver for the Gadgetlabs GL824

Through my writings on this blog, I was approached by a team working on a Linux driver for the Gadgetlabs GL8x24 8 channel pro audio sound card. I’ve got one in my production PC and the lack of a Linux driver has held me back from switching the host OS for my VMs to Linux.

I joined the project at the end of January and am contributing to developing and testing the driver. In working with a couple of the guys from the team, I’ve learned how the card’s realtime sound engine works, how Linux audio drivers are structured and how to write x86 assembly code (high-speed sample moving/conversion), which is my first foray back into ASM since the Apple ][ days. My past experience with Alteras, UARTs and realtime systems is all coming back to be useful again.

We’ve got the driver doing 8 channel full duplex analog loop back at the interrupt level, with 1.4ms latency and MIDI record/playback is now also working. I’ll post more details as we approach an alpha driver release.

It’s a testament to the quality and functionality of these cards that, even 7 years after the company that made them went under, there is a group of loyal users and dedicated developers who have kept the card going – first by writing new WDM drivers (& Altera firmware) so it would work well in XP, and now a Linux driver effort. I’m really proud to be one of them.