Wednesday, 27 March 2013

How to convert byte[] to short[] or float[] arrays in C#

One of the challenges that frequently arises when writing audio code in C# is that you get a byte array containing raw audio that would be better presented as a short (Int16) array, or a float (Single) array. (There are other formats too – some audio is 32 bit int, some is 64 bit floating point, and then there is the ever-annoying 24 bit audio). In C/C++ the solution is simple, cast the address of the byte array to a short * or a float * and access each sample directly.

Unfortunately, in .NET casting byte arrays into another type is not allowed:

byte[] buffer = new byte[1000];
short[] samples = (short[])buffer; // compile error!

This means that, for example, in NAudio, when the WaveIn class returns a byte[] in its DataAvailable event, you usually need to convert it manually into 16 bit samples (assuming you are recording 16 bit audio). There are several ways of doing this. I’ll run through five approaches, and finish up with some performance measurements.


Perhaps the simplest conceptually is to use the System.BitConverter class. This allows you to convert a pair of bytes at any position in a byte array into an Int16. To do this you call BitConverter.ToInt16. Here’s how you read through each sample in a 16 buffer:

byte[] buffer = ...;
for(int n = 0; n < buffer.Length; n+=2)
   short sample = BitConverter.ToInt16(buffer, n);

For byte arrays containing IEEE float audio, the principle is similar, except you call BitConverter.ToSingle. 24 bit audio can be dealt with by copying three bytes into a temporary four byte array and using ToInt32.

BitConverter also includes a GetBytes method to do the reverse conversion, but you must then manually copy the return of that method into your buffer.

Bit Manipulation

Those who are more comfortable with bit manipulation may prefer to use bit shift and or to convert each pair of bytes into a sample:

byte[] buffer = ...;
for (int n = 0; n < buffer.Length; n+=2)
   short sample = (short)(buffer[n] | buffer[n+1] << 8);

This technique can be extended for 32 bit integers, and is very useful for 24 bit, where none of the other techniques work very well. However, for IEEE float, it is a bit more tricky, and one of the other techniques should be preferred.

For the reverse conversion, you need to write more bit manipulation code.


Another option is to copy the whole buffer into an array of the correct type. Buffer.BlockCopy can be used for this purpose:

byte[] buffer = ...;
short[] samples = new short[buffer.Length];

Now the samples array contains the samples in easy to access form. If you are using this technique, try to reuse the samples buffer to avoid making extra work for the garbage collector.

For the reverse conversion, you can do another Buffer.BlockCopy.


One cool trick NAudio has up its sleeve (thanks to Alexandre Mutel) is the “WaveBuffer” class. This uses the StructLayout=LayoutKind.Explicit attribute to effectively create a union of a byte[], a short[], an int[] and a float[]. This allows you to “trick” C# into letting you access a byte array as though it was a short array. You can read more about how this works here. If you’re worried about its stability, NAudio has been successfully using it with no issues for many years. (The only gotcha is that you probably shouldn’t pass it into anything that uses reflection, as underneath, .NET knows that it is still a byte[], even if it has been passed as a float[]. So for example don’t use it with Array.Copy or Array.Clear). WaveBuffer can allocate its own backing memory, or bind to an existing byte array, as shown here:

byte[] buffer = ...;
var waveBuffer = new WaveBuffer(buffer);
// now you can access the samples using waveBuffer.ShortBuffer, e.g.:
var sample = waveBuffer.ShortBuffer[sampleIndex];

This technique works just fine with IEEE float, accessed through the FloatBuffer property. It doesn’t help with 24 bit audio though.

One big advantage is that no reverse conversion is needed. Just write into the ShortBuffer, and the modified samples are already in the byte[].

Unsafe Code

Finally, there is a way in C# that you can work with pointers as though you were using C++. This requires that you set your project assembly to allow “unsafe” code. "Unsafe” means that you could corrupt memory if you are not careful, but so long as you stay in bounds, there is nothing unsafe at all about this technique. Unsafe code must be in an unsafe context – so you can use an unsafe block, or mark your method as unsafe.

byte[] buffer = ...;
    fixed (byte* pBuffer = buffer)
        short* pSample = (short*)buffer;
        // now we can access samples via pSample e.g.:
        var sample = pSample[sampleIndex];

This technique can easily be used for IEEE float as well. It also can be used with 24 bit if you use int pointers and then bit manipulation to blank out the fourth byte.

As with WaveBuffer, there is no need for reverse conversion. You can use the pointer to write sample values directly into the memory for your byte array.


So which of these methods performs the best? I had my suspicions, but as always, the best way to optimize code is to measure it. I set up a simple test application which went through a four minute MP3 file, converting it to WAV and finding the peak sample values over periods of a few hundred milliseconds at a time. This is the type of code you would use for waveform drawing. I measured how long each one took to go through a whole file (I excluded the time taken to read and decode MP3). I was careful to write code that avoided creating work for the garbage collector.

Each technique was quite consistent in its timings:

  Debug Build Release Build
BitConverter 263,265,264 166,167,167
Bit Manipulation 254,243,250 104,104,103
Buffer.BlockCopy 205,206,204 104,103,103
WaveBuffer 239.264.263 97,97,97
Unsafe 173.172.162 98,98,98

As can be seen, BitConverter is the slowest approach, and should probably be avoided. Buffer.BlockCopy was the biggest surprise for me  - the additional copy was so quick that it paid for iteself very quickly. WaveBuffer was surprisingly slow in debug build – but very good in Release build. It is especially impressive given that it doesn’t need to pin its buffers like the unsafe code does, so it may well be the quickest possible technique in the long-run as it doesn’t hinder the garbage collector from compacting memory. As expected the unsafe code gave very fast performance. The other takeaway is that you really should be using Release build if you are doing audio processing.

Anyone know an even faster way? Let me know in the comments.


javO said...

Nice post. Is there any way for the WaveBuffer aproach to work with a 32bit sample?

Unknown said...

@javO, yes, WaveBuffer deals with int and float, both of which are 32 bit

Riley L said...

This was super useful, thanks for the post!