Sunday, 7 November 2010

State of MP3 Playback Support in NAudio

The MP3 playback support in NAudio was always rather experimental. The ACM conversion code I had written assumed CBR (constant bit rate) and constant block sizes. With MP3s this is not always the case, since there are VBR files with variable block sizes, and even in CBR MP3 files, padding means you can get frames of different sizes. However, despite these issues I did manage to get MP3 more or less playing back, which was cool, but not 100% reliable. People ran into issues like the occasional error while repositioning a stream, or the more irritating fact that Mp3FileReader was not good at calculating the duration of a file.

In this post I will go over some of the key challenges to getting good MP3 playback support, with details of some recent changes I have checked in, along with some ideas for the future.

1. Correctly parsing MP3 frame headers

To work effectively with MP3 files you need to be able to parse frame headers correctly and determine their exact size. If this cannot be done, we have no choice but to pass blocks of MP3 file directly to the decoder without knowing whether we are giving it whole frames or not.

Finding out how to properly parse MP3 frame headers was a much harder challenge than it seemed. Googling for info on MP3 frames reveals some articles that look authoritative but simply failed to correctly parse everything I threw at them. Mono or low sample rate MP3s got their frame size calculation wrong. However, eventually I found this article, whose source code had the final missing piece of information that allowed me to get it reliable.

2. A single frame decoder

Once we can calculate frame sizes reliably, we are able to decode them one by one. The WaveFormatConversionStream assumes everywhere it is working with CBR, so I have removed it from the equation. Now a simple MP3 Frame Decoder class (final name and interface to be decided) is used to decode MP3 frames one at a time using ACM. Alternative frame decoders could easily be plugged in if required in the future (e.g. using DMO or NLayer).

The really big change is that this means that the MP3FileReader no longer returns MP3 data in its Read method, but emits PCM. This makes life so much easier downstream and simplifies the playback graph considerably (no more BlockAlignReductionStream). I’ve made the ReadFrame method public so if you have a pressing need to get the compressed data out instead there is nothing stopping you.

3. Accurate length reporting

Accurate length reporting was never possible before, since it relied on an estimate of the bitrate. But now we can parse MP3 frames, we have accurate knowledge of exactly how many samples each frame will decompress into, and the TotalTime property (and CurrentTime property) of MP3FileReader should be entirely accurate. n.b. I think that it may be possible that the first frame in a VBR MP3 file decompresses to zero samples (although I already exclude the Xing frame so maybe there is another similar meta-data type frame), so we might actually very slightly over-report the length – I’ll need to look into that.

4. Repositioning to frame granularity

When you reposition with the MP3FileReader.Position property, there is of course every possibility that you will ask for it to reposition to a place midway through a frame. We now automatically move you to the start of the frame that contains the position you asked for.

An earlier NAudio contributor had done some cool stuff with a BinarySearch to speed this up. I needed to drop this temporarily as I was making changes to the table of contents generation. However, there is no reason why this could not be reinstated now, to speed up performance further (although repositioning perf doesn’t seem to be a major issue with the tests I have done on 1 hour long MP3s).

5. Repositioning with sample granularity

Obviously, it would be even nicer to support MP3 repositioning with sample granularity. This would involve us decompressing a frame during the seek process, so when the next Read occurs we can read from part-way through that frame. The framework to do that is already in place (we keep track of “leftovers”, so this could be a feature I add in the not too distant future).

6. Forward only

One thing I haven’t got round to doing yet, is making it possible to use MP3FileReader from an input stream that doesn’t support repositioning. Obviously, it would not be able to work out its Length, and there would be no real need for a TOC. Proper support for forward only streams would be useful to people wanting to do network streamed playback, which seems to be one of the most common queries I get.

7. Changes of Sample Rate and Number of Channels

It is theoretically possible within an MP3 file for the sample rate and number of channels to change from frame to frame. However, I have no immediate plans to support this since I’ve never seen an MP3 file that does this. The only scenario in which I could imaging this would be if someone was attempting to concatenate two MP3 files by simply copying frames from one into the other.

8. ID3 Tag Support

I have no plans to introduce ID3 tag reading or writing, since there are other open source libraries out there that do this perfectly well.

Give me feedback

Please grab the latest NAudio code from Codeplex and let me know how you get on with it. As always, the best way to give it a run-through is to use the NAudioDemo app that is included in the solution. Load it up, select WAV playback and try it with whatever MP3s you have lying around on your hard disk. It would be great to have robust MP3 playback as a headline feature for NAudio 1.4.

Post a Comment