Wednesday, 10 September 2014

Creating RF64 and BWF WAV files with NAudio

There are two extensions to the standard WAV file format which you may sometimes want to make use of. The first is the RF64 extension, which overcomes the inherent limitation that WAV files cannot be larger than 4GB. The second is the Broadcast Wave Format (BWF) which builds on the existing WAV format and specifies various extra chunks containing metadata.

In this post, I’ll explain how you can make a class to create Broadcast Wave Files using NAudio that supports large file sizes using the RF64 extension, and includes the “bext” chunk from the BWF specification.

File Header

First of all, a WAV file starts with the byte sequence ‘RIFF’ and then has a four byte size value, which is the number of bytes following in the entire file. However, for large files, instead of ‘RIFF’, ‘RF64’ is used, and the following four byte integer for the RIFF size is then ignored (it should be set to -1).

Then we have the ‘WAVE’ identifier (another 4 bytes), and following that in a normal WAV file we would usually expect the format chunk (with the ‘fmt ‘ 4 byte identifier). But to support RF64, we add a “JUNK” chunk. This is of size 28 bytes, and initially is all set to zeroes. If the overall size of the entire WAV file grows to over 4GB, then we will turn this “JUNK” chunk into a ‘ds64’ chunk. If the overall file size does not grow beyond 4GB, then the junk chunk is simply left in place, and media players will just ignore it.

The ds64 Chunk

A ds64 chunk consists of three 8 byte integers and a four byte integer. These are the RIFF size, which is the size of the entire file minus 8 bytes, the data size, which is the number of bytes of sample data in the ‘data’ chunk and the ‘sampleCount’ which is the number of samples. The sample count is optional really, as it corresponds to the sample count found in the ‘fact’ chunk of a standard WAV file. This chunk is usually only present for non-PCM audio formats as it is trivial to calculate the sample count for PCM from the byte count. Finally, a ds64 chunk can have a table containing the sizes of any other huge chunks, but usually this would be unused since it is likely only the ‘data’ chunk that will grow larger than 4GB. So the final four bytes of a typical ds64 chunk are 0s, indicating no table entries.

The bext chunk

Following the ds64 chunk, we have the bext chunk from the BWF specification. This has space for a textual description of the file as well as timestamps, and newer versions of the bext chunk also allow you to specify various bits of loudness information. The algorithms for calculating this are hard to track down, so I tend to use version 1 of bext and ignore them.

The fmt chunk

Then we have the standard ‘fmt ‘ chunk, which works just the same way it does in a standard WAV file, containing a WAVEFORMATEX structure with information about sample rate, bit depth, encoding and number of channels. RF64 files are almost always either PCM or IEEE floating point samples, since it is only with uncompressed audio that you typically end up creating files larger than 4GB.

The data chunk

Finally, we have the ‘data’ chunk, containing the actual audio data. Again this is used in exactly the same way as it is in a regular WAV file, except that the chunk data length only needs to be filled in the file is less than 4GB. If it is a RF64 file, the four byte length for the data chunk is ignored (set it to –1), and the size from the ds64 chunk is used instead.

The code

Here’s a simple implementation of a BWF writer class, that creates BWF files with a simple bext chunk and turns them into RF64 files if necessary. I plan to clean this code up a little and import it into NAudio in the future (either as its own class or upgrade WaveFileWriter to support RF64 and BWF – let me know your preference in the comments).

Post a Comment