Tuesday 3 February 2015

How to Encode MP3s with NAudio MediaFoundationEncoder

In this post I'll explain how the NAudio MediaFoundationEncoder class can be used to convert WAV files into other formats such as WMA, AAC and MP3. To do so, I'll walk you through a real-world example of some code I wrote recently that uses it.

The application is my new Skype Voice Changer utility, and I wanted to allow users to save their Skype conversations in a variety of different formats. The WaveFormat that Skype uses is 16kHz 16 bit mono PCM, and I capture the audio directly in this format, before converting it to the target format when the call finishes.

The input to the MediaFoundationEncoder doesn't actually have to be a WAV file. It can be any IWaveProvider, so there is no need to create a temporary WAV file before the encoding takes place.

Initialising Media Foundation

The first step is to make sure Media Foundation is initialised. This requires a call to MediaFoundationApi.Startup(). You only need to do this once in your application, but it doesn’t matter if you call it more than once. Note that Media Foundation is only supported on Windows Vista and above, so if you need to support Windows XP, you will not be able to use it.
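As a minimal sketch of where that call might live, assuming a simple console-style entry point (the Program class here is just for illustration), you could start Media Foundation up front and shut it down again when the application exits:

```csharp
using NAudio.MediaFoundation;

static class Program
{
    static void Main()
    {
        // Initialise Media Foundation once, before using any encoder or resampler.
        MediaFoundationApi.Startup();
        try
        {
            // ... select media types and encode audio here ...
        }
        finally
        {
            // Release Media Foundation when the application has finished with it.
            MediaFoundationApi.Shutdown();
        }
    }
}
```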

Determining Codec Availability

Since I planned to make use of whatever encoders are available on the user’s machine, I don't need to ship any codecs with my application. However, not all codecs are present on all versions of Windows. The Windows Media Audio and Windows Media Voice codecs are, unsurprisingly, present on all desktop editions of Windows from Vista onwards. Windows 7 introduced an AAC encoder, and it was only with Windows 8 that we finally got an MP3 encoder (although MP3 decoding has been present in Windows for a long time). There are rumours that a FLAC encoder will be present in Windows 10.

For server versions of Windows, the story is a bit more complicated. Basically, you may find you have to install the "Desktop Experience" before you have any codecs available.

But the best way to find out whether the codec you want is available is simply to ask the Media Foundation APIs whether there are any encoders that can target your desired format for the given input format.

MediaFoundationEncoder includes a useful helper function called SelectMediaType which can help you do this. You pass in the MediaSubtype (basically a GUID indicating whether you want AAC, WMA, MP3 etc), the input PCM format, and a desired bitrate. NAudio will return the "MediaType" that most closely matches your bitrate. This is because many of these codecs offer you a choice of bitrates so you can choose your own trade-off between file size and audio quality. For the lowest bitrate available, just pass in 0. For the highest bitrate, pass in a suitably large number.

So for example, if I wanted to see if I can encode to WMA, I would pass in an audio subtype of WMAudioV8 (this selects the right encoder), a WaveFormat that matches my input format (this is important as it includes what sample rate my input audio is at - encoders don't always support all sample rates), and my desired bitrate. I passed in 16kbps to get a nice compact file size.

var mediaType = MediaFoundationEncoder.SelectMediaType(
                    AudioSubtypes.MFAudioFormat_WMAudioV8, 
                    new WaveFormat(16000, 1), 
                    16000); 

if (mediaType != null) // we can encode… 
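If you would rather see which bitrates are on offer before choosing one, MediaFoundationEncoder also has a GetEncodeBitrates method. Here is a rough sketch, assuming the same 16kHz mono input as above (the BitrateLister class is a hypothetical name for this example), that lists the bitrates the WMA encoder reports:

```csharp
using System;
using NAudio.MediaFoundation;
using NAudio.Wave;

static class BitrateLister
{
    static void Main()
    {
        MediaFoundationApi.Startup();

        // Bitrates (in bits per second) the WMA encoder offers for 16kHz mono audio.
        int[] bitrates = MediaFoundationEncoder.GetEncodeBitrates(
            AudioSubtypes.MFAudioFormat_WMAudioV8, 16000, 1);

        if (bitrates.Length == 0)
        {
            Console.WriteLine("No WMA encoder found for 16kHz mono audio.");
        }
        foreach (int bitrate in bitrates)
        {
            Console.WriteLine("{0} bps", bitrate);
        }

        MediaFoundationApi.Shutdown();
    }
}
```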

What about MP3 and AAC? Well you might think that the code would be the same. Just pass in MFAudioFormat_MP3 or MFAudioFormat_AAC as the first parameter. The trouble is, if we do this, we get no media type returned, even on Windows 8 which has both an MP3 encoder and an AAC encoder. Why is this? Well it's because the MP3 and AAC encoders supplied with Windows don't support 16kHz as an input sample rate. So we will need to upsample to 44.1kHz before passing it into the encoder. So now let's ask Windows if there is an MP3 encoder available that can encode mono 44.1kHz audio, and just request the lowest bitrate available:

mediaType = MediaFoundationEncoder.SelectMediaType(
            AudioSubtypes.MFAudioFormat_MP3, 
            new WaveFormat(44100,1), 
            0); 

Now (on Windows 8 at least) we do get back a media type, and it has a bitrate of 48kbps. The same applies to AAC - we need to upsample to 44.1kHz first, and the AAC encoder provided with Windows 7 and above has a minimum bitrate of 96kbps.
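Putting those checks together, here is a sketch of how an application might work out which formats it can offer on the current machine. The CodecProbe class and CanEncode helper are hypothetical names for this example; the only NAudio call involved is SelectMediaType, which returns null when nothing matches.

```csharp
using System;
using NAudio.MediaFoundation;
using NAudio.Wave;

static class CodecProbe
{
    // True if an installed encoder can produce the given subtype from this input format.
    static bool CanEncode(Guid audioSubtype, WaveFormat inputFormat)
    {
        // A desired bitrate of 0 asks for the lowest bitrate available.
        return MediaFoundationEncoder.SelectMediaType(audioSubtype, inputFormat, 0) != null;
    }

    static void Main()
    {
        MediaFoundationApi.Startup();

        var skypeFormat = new WaveFormat(16000, 1); // 16kHz 16 bit mono
        var upsampled = new WaveFormat(44100, 1);   // 44.1kHz 16 bit mono

        Console.WriteLine("WMA: {0}", CanEncode(AudioSubtypes.MFAudioFormat_WMAudioV8, skypeFormat));
        Console.WriteLine("MP3: {0}", CanEncode(AudioSubtypes.MFAudioFormat_MP3, upsampled));
        Console.WriteLine("AAC: {0}", CanEncode(AudioSubtypes.MFAudioFormat_AAC, upsampled));

        MediaFoundationApi.Shutdown();
    }
}
```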

Performing the Encoding

So, assuming that we've successfully got a MediaType, how do we go about the encoding? Well thankfully, that's the easy bit. So for example, if we had selected a WMA media type, we could encode to a file like this:

using (var enc = new MediaFoundationEncoder(mediaType)) 
{ 
    enc.Encode("output.wma", myWaveProvider); 
} 

In fact, to make things even simpler, MediaFoundationEncoder includes some helper methods for encoding to WMA, AAC and MP3 in a single line. You specify the wave provider, the output filename and the desired bitrate:

MediaFoundationEncoder.EncodeToMp3(myWaveProvider, 
                        "output.mp3", 48000); 

 

Creating your Pipeline

But of course the bit I haven't explained is how to set up the input stream to the encoder. This will need to be PCM (or IEEE float), and as we mentioned, it should be at a sample rate that the encoder supports. Here's an example of encoding a WAV file to MP3, but remember that the input WAV file will need to be 44.1kHz or 48kHz for this to work.

using (var reader = new WaveFileReader("input.wav")) 
{ 
    MediaFoundationEncoder.EncodeToMp3(reader, 
            "output.mp3", 48000); 
} 
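If the input file might be at some other sample rate, one option is to check the reader's WaveFormat and only resample when it's needed. This is a minimal sketch, assuming the WAV file contains PCM audio and that Media Foundation has already been started; Mp3Converter and ConvertWavToMp3 are hypothetical names, not part of NAudio.

```csharp
using NAudio.Wave;

static class Mp3Converter
{
    public static void ConvertWavToMp3(string inputPath, string outputPath, int bitRate)
    {
        using (var reader = new WaveFileReader(inputPath))
        {
            int sampleRate = reader.WaveFormat.SampleRate;
            if (sampleRate == 44100 || sampleRate == 48000)
            {
                // Already at a rate the MP3 encoder accepts, so encode directly.
                MediaFoundationEncoder.EncodeToMp3(reader, outputPath, bitRate);
                return;
            }

            // Otherwise resample to 44.1kHz, keeping the bit depth and channel count.
            // Assumption: the source is PCM, so a PCM output format is appropriate.
            var targetFormat = new WaveFormat(44100,
                reader.WaveFormat.BitsPerSample, reader.WaveFormat.Channels);
            using (var resampler = new MediaFoundationResampler(reader, targetFormat))
            {
                MediaFoundationEncoder.EncodeToMp3(resampler, outputPath, bitRate);
            }
        }
    }
}
```

It could then be called with something like Mp3Converter.ConvertWavToMp3("input.wav", "output.mp3", 48000).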

But that was a trivial example. In a real-world example, such as my Skype Voice Changer application, we have a more complicated setup. First, we open the inbound and outbound recording files with WaveFileReader. Then we mix them together using a MixingSampleProvider. Then, since I limit unregistered users to 30 seconds of recording, we optionally need to truncate the length of the file (I do this with the OffsetSampleProvider, using the Take property). Next, I go back down to 16 bit using a SampleToWaveProvider16 (although this is not strictly necessary for most Media Foundation encoders). And finally, if they selected MP3 or AAC, we need to resample up to 44.1kHz before encoding. Since we’re already working with Media Foundation, we’ll use the MediaFoundationResampler for this.

// open the separate recordings (the using blocks close the files when we're done) 
using (var incoming = new WaveFileReader("incoming.wav")) 
using (var outgoing = new WaveFileReader("outgoing.wav")) 
{ 
    // create a mixer (for 16kHz mono) 
    var mixer = new MixingSampleProvider(
                    WaveFormat.CreateIeeeFloatWaveFormat(16000, 1)); 

    // add the inputs - they will automatically be turned into ISampleProviders 
    mixer.AddMixerInput(incoming); 
    mixer.AddMixerInput(outgoing); 

    // optionally truncate to 30 seconds for unlicensed users 
    var truncated = truncateAudio ? 
                    new OffsetSampleProvider(mixer) 
                        { Take = TimeSpan.FromSeconds(30) } : 
                    (ISampleProvider) mixer; 

    // go back down to 16 bit PCM 
    var converted16Bit = new SampleToWaveProvider16(truncated); 

    // now for MP3, we need to upsample to 44.1kHz. Use MediaFoundationResampler 
    using (var resampled = new MediaFoundationResampler(
                converted16Bit, new WaveFormat(44100, 1))) 
    { 
        var desiredBitRate = 0; // ask for lowest available bitrate 
        MediaFoundationEncoder.EncodeToMp3(resampled, 
                        "mixed.mp3", desiredBitRate); 
    } 
} 
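The example above always encodes to MP3. As a rough sketch of how the last stage could switch on the user's chosen format, something like the following might work. The RecordingFormat enum and RecordingEncoder class are hypothetical names invented for this example, and it assumes the source is 16kHz PCM (such as the converted16Bit provider above).

```csharp
using NAudio.Wave;

enum RecordingFormat { Wma, Mp3, Aac } // hypothetical enum for the user's choice

static class RecordingEncoder
{
    // Assumption: source is 16kHz PCM audio.
    public static void Save(IWaveProvider source, RecordingFormat format, string path)
    {
        if (format == RecordingFormat.Wma)
        {
            // WMA can take the 16kHz audio directly; ask for roughly 16kbps.
            MediaFoundationEncoder.EncodeToWma(source, path, 16000);
            return;
        }

        // MP3 and AAC need a higher input sample rate, so upsample to 44.1kHz first.
        var upsampledFormat = new WaveFormat(44100,
            source.WaveFormat.BitsPerSample, source.WaveFormat.Channels);
        using (var resampled = new MediaFoundationResampler(source, upsampledFormat))
        {
            // A bitrate of 0 asks for the lowest bitrate the encoder offers.
            if (format == RecordingFormat.Mp3)
                MediaFoundationEncoder.EncodeToMp3(resampled, path, 0);
            else
                MediaFoundationEncoder.EncodeToAac(resampled, path, 0);
        }
    }
}
```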

Hopefully that gives you a feel for the power of chaining together IWaveProviders and ISampleProviders in NAudio to construct complex and interesting signal chains. You should now be able to encode your audio with any Media Foundation encoder present on the user’s system.

Footnote: Encoding to Streams

One question you may have is "can I encode to a stream?" Unfortunately, this is a little tricky to do, since NAudio takes advantage of various "sink writers" that Media Foundation provides, which know how to correctly create various audio container file formats such as WMA, MP3 and AAC. This means that, for simplicity, the MediaFoundationEncoder class only offers encoding to file. To encode to a stream, you'd need to work at a lower level with Media Foundation transforms directly, which is quite a complicated and involved process. Hopefully this is something we can add support for in a future version of NAudio.
