Saturday, 1 November 2008

Using NAudio to Replace the BabySmash Audio Stack

After adding MIDI input support to BabySmash, the next obvious step was to replace the audio playback mechanism with NAudio too, which allows better mixing of sounds and cool things like controlling the volume and pan of each sound. It also gives me a chance to write some more example documentation for NAudio.

To be able to play one of the embedded WAV files, we need to create a WaveStream-derived class that can read from an embedded resource and convert the audio into a common format ready for mixing. One problem with BabySmash is that the current embedded audio files represent a whole smorgasbord of formats:

babygigl2.wav, scooby2.wav                 MP3 11025Hz mono
babylaugh.wav, ccgiggle.wav, giggle.wav    PCM 11kHz mono 8 bit
EditedJackPlaysBabySmash.wav               PCM 22kHz mono 16 bit
falling.wav, rising.wav                    PCM 8kHz mono 16 bit
laughingmice.wav                           PCM 11127Hz (!) mono 8 bit
smallbumblebee                             PCM 22kHz stereo 16 bit

So I created WavResourceStream, whose job is to take an embedded resource and output a 32-bit IEEE floating point stereo stream at 44.1kHz. I could equally have chosen 22kHz, which would reduce the amount of data that needs to be passed around. The choice of floating point audio is important for the mixing phase, as it gives us plenty of headroom.
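To see why the headroom matters, consider summing two loud samples. In 16-bit arithmetic the result can overflow and wrap around; as floating point it simply goes above 1.0 and can be attenuated or clipped gracefully later. A quick illustration (not BabySmash code, just the arithmetic):

short a = 30000;
short b = 25000;

// summing in 16 bits overflows: 55000 wraps around to -10536,
// which is heard as harsh distortion
short sum16 = (short)(a + b);

// summing as floats preserves the true value (about 1.678),
// leaving us free to scale it back down later in the chain
float sum32 = (a / 32768f) + (b / 32768f);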

The constructor is the most interesting part of this class. It takes the resource name and uses a WaveFileReader to read from that resource (n.b. the WaveFileReader constructor that takes a Stream has only recently been checked in to NAudio; there were other ways of doing this in the past, but in the latest code I am trying to clean things up a bit).

The next step is to convert to PCM if the audio is not PCM already (two of the files actually contain MP3 data, even though they are packaged as WAV files). The WaveFormatConversionStream looks for an ACM codec that can perform the requested format conversion, and the BlockAlignReductionStream helps us ensure we call the ACM functions with sensible buffer sizes.
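The alignment issue boils down to the fact that ACM codecs work in whole blocks, so reads from a converted stream must be a multiple of the stream's BlockAlign. The rounding that BlockAlignReductionStream takes care of internally is essentially this (hypothetical helper, not NAudio code):

// round a requested read size down to a whole number of blocks,
// which is what ACM codecs require
static int RoundDownToBlockBoundary(int requestedBytes, int blockAlign)
{
    return requestedBytes - (requestedBytes % blockAlign);
}

For example, with a 417 byte block align (common for 128kbps MP3 frames), a request for 5000 bytes would be trimmed to 4587 bytes, i.e. 11 whole blocks.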

The step after that is to get to 44.1kHz, since you can't mix audio streams unless they are all at the same sample rate. Again we use a BlockAlignReductionStream to help with the buffer sizing. Finally we go into a WaveChannel32 stream, which converts us to a 32-bit floating point stereo stream and allows us to set volume and pan if required. So the audio graph is already potentially six streams deep. It may seem confusing at first, but once you get the hang of it, chaining WaveStreams together is quite simple.
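For the worst case input (MP3 data at the wrong sample rate inside a WAV file), the chain looks like this:

WaveFileReader
  -> WaveFormatConversionStream (MP3 to PCM)
  -> BlockAlignReductionStream
  -> WaveFormatConversionStream (resample to 44.1kHz 16 bit)
  -> BlockAlignReductionStream
  -> WaveChannel32 (32 bit floating point stereo, volume and pan)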

class WavResourceStream : WaveStream
{
    WaveStream sourceStream;

    public WavResourceStream(string resourceName)
    {
        // use the assembly name as the namespace prefix for the resource
        string strNameSpace = Assembly.GetExecutingAssembly().GetName().Name;

        // get the resource into a stream and read it as a WAV file
        Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(strNameSpace + resourceName);
        sourceStream = new WaveFileReader(stream);
        // the target format for the PCM stage of the chain
        var format = new WaveFormat(44100, 16, sourceStream.WaveFormat.Channels);

        // decompress to PCM first if necessary (some files contain MP3 data)
        if (sourceStream.WaveFormat.Encoding != WaveFormatEncoding.Pcm)
        {
            sourceStream = WaveFormatConversionStream.CreatePcmStream(sourceStream);
            sourceStream = new BlockAlignReductionStream(sourceStream);
        }
        // then resample to 44.1kHz 16 bit if necessary
        if (sourceStream.WaveFormat.SampleRate != 44100 ||
            sourceStream.WaveFormat.BitsPerSample != 16)
        {
            sourceStream = new WaveFormatConversionStream(format, sourceStream);
            sourceStream = new BlockAlignReductionStream(sourceStream);
        }

        // finally convert to 32 bit floating point stereo, with volume and pan
        sourceStream = new WaveChannel32(sourceStream);
    }

The rest of WavResourceStream simply implements the WaveStream abstract class members by calling into the source stream we constructed:

    public override WaveFormat WaveFormat
    {
        get { return sourceStream.WaveFormat; }
    }

    public override long Length
    {
        get { return sourceStream.Length; }
    }

    public override long Position
    {
        get { return sourceStream.Position; }
        set { sourceStream.Position = value; }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return sourceStream.Read(buffer, offset, count);
    }

    protected override void Dispose(bool disposing)
    {
        if (sourceStream != null)
        {
            sourceStream.Dispose();
            sourceStream = null;
        }
        base.Dispose(disposing);
    }
}

Now we need to create a mixer that can take multiple WavResourceStreams and mix their contents together into a single stream. NAudio includes WaveMixer32Stream, but it is designed more for sequencer use, where you want exact sample-level control over the positioning of the source streams and the ability to reposition. BabySmash's needs are simpler: we just want to play sounds once through. So I created a simpler MixerStream, which may find its way in modified form into NAudio in the future.

The first part of the code simply allows us to add inputs to the mixer. The mixer doesn't really care what the sample rate is, as long as all inputs share the same one. The PlayResource function simply creates one of our WavResourceStream instances and adds it to our list of inputs. Currently I have no upper limit on how many inputs can be added, but it would make sense to limit it in some way, probably by throwing away some existing inputs rather than failing to play new ones (see the sketch after the code below).

public class MixerStream : WaveStream
{
    private List<WaveStream> inputStreams;
    private WaveFormat waveFormat;
    private int bytesPerSample;

    public MixerStream()
    {
        this.waveFormat = WaveFormat.CreateIeeeFloatWaveFormat(44100, 2);
        this.bytesPerSample = 4;
        this.inputStreams = new List<WaveStream>();
    }

    public void PlayResource(string resourceName)
    {
        WaveStream stream = new WavResourceStream(resourceName);
        AddInputStream(stream);
    }

    public void AddInputStream(WaveStream waveStream)
    {
        if (waveStream.WaveFormat.Encoding != WaveFormatEncoding.IeeeFloat)
            throw new ArgumentException("Must be IEEE floating point", "waveStream");
        if (waveStream.WaveFormat.BitsPerSample != 32)
            throw new ArgumentException("Only 32 bit audio currently supported", "waveStream");

        if (inputStreams.Count == 0)
        {
            // first one - set the format
            int sampleRate = waveStream.WaveFormat.SampleRate;
            int channels = waveStream.WaveFormat.Channels;
            this.waveFormat = WaveFormat.CreateIeeeFloatWaveFormat(sampleRate, channels);
        }
        else
        {
            if (!waveStream.WaveFormat.Equals(waveFormat))
                throw new ArgumentException("All incoming streams must have the same format", "waveStream");
        }

        lock (this)
        {
            this.inputStreams.Add(waveStream);
        }
    }
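As an example of the kind of limiting I have in mind, AddInputStream could discard the oldest input once a cap is reached. This is just a sketch; the maxInputs constant is hypothetical and not in the current code:

private const int maxInputs = 16; // hypothetical cap

public void AddInputStream(WaveStream waveStream)
{
    // ... format checks as above ...
    lock (this)
    {
        // throw away the oldest sound rather than refusing the new one
        if (inputStreams.Count >= maxInputs)
        {
            WaveStream oldest = inputStreams[0];
            inputStreams.RemoveAt(0);
            oldest.Dispose();
        }
        this.inputStreams.Add(waveStream);
    }
}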

The real work of MixerStream is done in the Read method. Here we loop through all the inputs and mix them together, and when an input stream has finished playing, we dispose it and remove it from our list of inputs. The mixing is performed using unsafe code, so you need to set the unsafe flag on the project (Allow unsafe code in the build properties) for it to compile. One important note is that we always return a full buffer, even if it is pure silence because we have no inputs, since our IWavePlayer expects full reads if it is to keep playing.

public override int Read(byte[] buffer, int offset, int count)
{
    if (count % bytesPerSample != 0)
        throw new ArgumentException("Must read a whole number of samples", "count");

    // blank the buffer
    Array.Clear(buffer, offset, count);
    int bytesRead = 0;

    // sum the channels in
    byte[] readBuffer = new byte[count];
    lock (this)
    {
        for (int index = 0; index < inputStreams.Count; index++)
        {
            WaveStream inputStream = inputStreams[index];

            int readFromThisStream = inputStream.Read(readBuffer, 0, count);
            System.Diagnostics.Debug.Assert(readFromThisStream == count, "A mixer input stream did not provide the requested amount of data");
            bytesRead = Math.Max(bytesRead, readFromThisStream);
            if (readFromThisStream > 0)
            {
                Sum32BitAudio(buffer, offset, readBuffer, readFromThisStream);
            }
            else
            {
                inputStream.Dispose();
                inputStreams.RemoveAt(index);
                index--;
            }
        }
    }
    return count;
}

static unsafe void Sum32BitAudio(byte[] destBuffer, int offset, byte[] sourceBuffer, int bytesRead)
{
    fixed (byte* pDestBuffer = &destBuffer[offset],
              pSourceBuffer = &sourceBuffer[0])
    {
        float* pfDestBuffer = (float*)pDestBuffer;
        float* pfReadBuffer = (float*)pSourceBuffer;
        int samplesRead = bytesRead / 4;
        for (int n = 0; n < samplesRead; n++)
        {
            pfDestBuffer[n] += pfReadBuffer[n];
        }
    }
}
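Note that Sum32BitAudio leaves the summed value unclamped; that is fine because the floating point format tolerates values beyond 1.0, and any clipping happens later when the stream is converted for the soundcard. If that clipping ever sounds harsh, one option would be a variant that clamps as it sums (hypothetical, not in the current code):

// hypothetical variant that hard-clamps the mix to [-1.0, 1.0]
static unsafe void SumAndClamp32BitAudio(byte[] destBuffer, int offset, byte[] sourceBuffer, int bytesRead)
{
    fixed (byte* pDestBuffer = &destBuffer[offset],
              pSourceBuffer = &sourceBuffer[0])
    {
        float* pfDestBuffer = (float*)pDestBuffer;
        float* pfReadBuffer = (float*)pSourceBuffer;
        int samplesRead = bytesRead / 4;
        for (int n = 0; n < samplesRead; n++)
        {
            float sample = pfDestBuffer[n] + pfReadBuffer[n];
            if (sample > 1.0f) sample = 1.0f;
            else if (sample < -1.0f) sample = -1.0f;
            pfDestBuffer[n] = sample;
        }
    }
}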

The remaining functions of the MixerStream are quite simple. We don't need to report position or length as BabySmash simply plays a continuous stream.

public override long Length
{
    get { return 0; }
}

public override long Position
{
    get { return 0; }
    set 
    {
        throw new NotSupportedException("This mixer is not repositionable");
    }
}

public override WaveFormat WaveFormat
{
    get { return waveFormat; }
}

protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        if (inputStreams != null)
        {
            foreach (WaveStream inputStream in inputStreams)
            {
                inputStream.Dispose();
            }
            inputStreams = null;
        }
    }
    else
    {
        System.Diagnostics.Debug.Assert(false, "MixerStream was not disposed");
    }
    base.Dispose(disposing);
}

Finally we are ready to set up BabySmash to play its audio using MixerStream. We add two new members to the Controller class:

private MixerStream mainOutputStream;
private IWavePlayer wavePlayer;

And then in the Controller.Launch method, we create a new MixerStream and use WaveOut to play it. We have chosen a read-ahead of 300 milliseconds, which should mean we don't get stuttering on a reasonably powerful modern PC. We need to pass the window handle to WaveOut because I have found that some laptop chipsets (most notably SoundMAX) have issues with running managed code in their callback functions. We don't have to use WaveOut if we don't want to: WASAPI works as well if you have Vista (although I found it stuttered a bit for me).

IntPtr windowHandle = new WindowInteropHelper(Application.Current.MainWindow).Handle;
wavePlayer = new WaveOut(0, 300, windowHandle);
//wavePlayer = new WasapiOut(AudioClientShareMode.Shared, 300);
mainOutputStream = new MixerStream();
wavePlayer.Init(mainOutputStream);
wavePlayer.Play();

The call to wavePlayer.Play means we start making calls into the Read method of the MixerStream, although initially it will just be silence. When we are ready to make a sound, we simply call the MixerStream.PlayResource method instead of the old PlayWavResourceYield method:

//audio.PlayWavResourceYield(".Resources.Sounds." + "rising.wav");
mainOutputStream.PlayResource(".Resources.Sounds." + "rising.wav");
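When the application shuts down, we should also stop playback and dispose what we created. The Controller's shutdown code isn't shown here, but assuming there is a suitable teardown point, it would be something like:

// on shutdown (wherever the Controller tears down):
// stop the player before disposing the stream it reads from
wavePlayer.Stop();
wavePlayer.Dispose();
wavePlayer = null;
mainOutputStream.Dispose();
mainOutputStream = null;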

So what does it sound like? Well, I haven't tested it too much, but it certainly is possible to have a large number of simultaneous laughs on my PC. "Cacophony" would sum it up pretty well. The next step of course would be to implement the "Play notes from songs on keypress" feature request. Another obvious enhancement would be to cache the converted audio so that we don't need to continually pass the same files through ACM again and again.
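That cache could be as simple as reading each WavResourceStream to a byte array the first time it is requested, keyed by resource name. A rough sketch of the idea (the dictionary and the CachedSoundStream wrapper are hypothetical):

// hypothetical cache so each file only goes through ACM once
private readonly Dictionary<string, byte[]> audioCache = new Dictionary<string, byte[]>();

public void PlayResource(string resourceName)
{
    byte[] audio;
    if (!audioCache.TryGetValue(resourceName, out audio))
    {
        // run the full conversion chain once and capture its output;
        // this relies on the stream returning 0 at the end, just as
        // the mixer's Read loop does
        using (WaveStream stream = new WavResourceStream(resourceName))
        using (MemoryStream memoryStream = new MemoryStream())
        {
            byte[] readBuffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = stream.Read(readBuffer, 0, readBuffer.Length)) > 0)
            {
                memoryStream.Write(readBuffer, 0, bytesRead);
            }
            audio = memoryStream.ToArray();
        }
        audioCache[resourceName] = audio;
    }
    // CachedSoundStream would be a trivial WaveStream that serves the
    // cached bytes in the mixer's 32 bit floating point format
    AddInputStream(new CachedSoundStream(audio, waveFormat));
}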
