Thursday 5 June 2008

The NAudio WaveStream in Depth

Continuing my series of posts documenting the NAudio open source .NET audio library, I will look in a bit more detail today at the WaveStream class, which is the base class for several others in NAudio, and can be overridden to add functionality to the NAudio engine. For a higher level look at where the WaveStream fits into the big picture of Wave mixing, see this post.

Object Hierarchy

WaveStream is an abstract class that inherits from System.IO.Stream. I chose to do this because I wanted it to be easily interoperable with other streaming APIs. With hindsight, however, this has not proved to be a crucial feature, and a WaveStream to Stream adapter class could easily be created. A future version of NAudio may therefore drop the inheritance, which would allow greater flexibility in the interface design of WaveStream.

Abstract Members

These members of WaveStream must be implemented by any concrete class that derives from WaveStream. They represent the main functionality of WaveStream. This section serves as a guide to the things you must do if you create a custom WaveStream.

WaveFormat Property

Each WaveStream must report the WaveFormat that it uses (i.e. the sample rate, bit depth, number of channels etc). Some WaveStreams can simply pass on the WaveFormat of their source stream, while others (e.g. the WaveFormatConversionStream) change the format of their source stream, so must provide the output format.

long Stream.Position Property

The Position property is measured in bytes. I thought for a while about whether it should be measured in sample frames instead, and this may be an option for a future version of NAudio. The advantage of using bytes is that keeping track of the position is less error prone, and it ties in with the return value of the Read function. There are some helper methods (see below) that allow you to use TimeSpans to position the stream, which is useful for media player applications. Position 0 represents the start of the stream.

When setting the position, you should lock the WaveStream, because a Read may well be happening at the same time. In audio playback applications, the callback to Read often happens on a different thread from the main GUI thread.

Often when you are setting the position, you need to reposition the underlying WaveStream(s). This will often involve a calculation as the input and output WaveFormats may not be the same, or there may be an offset in start positions. If this is the case, be careful to ensure that the underlying stream is never set to a non block-aligned position. If it is, the resulting audio will be garbage.
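To make the block alignment point concrete, here is a minimal sketch of the kind of calculation involved. It assumes the only difference between the two streams is a start offset measured in bytes; a format conversion would add a scaling step before the rounding. The class and method names are made up for this example and are not part of NAudio.

```csharp
using System;

// Hypothetical helper for illustration only; not part of NAudio.
public static class PositionMapping
{
    // Maps a position in the outer stream to a position in the source stream,
    // allowing for a start offset, then rounds down to a whole block so the
    // source is never left part-way through a sample frame.
    public static long ToSourcePosition(long position, long startOffset, int sourceBlockAlign)
    {
        if (sourceBlockAlign <= 0)
        {
            throw new ArgumentOutOfRangeException("sourceBlockAlign");
        }

        // Positions before the offset map to the start of the source.
        long sourcePosition = Math.Max(0, position - startOffset);

        // Discard any partial block.
        return sourcePosition - (sourcePosition % sourceBlockAlign);
    }
}
```

Rounding down rather than up is just one choice; the important thing is that the result is always a multiple of the source stream's block align.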

long Stream.Length Property

This read-only property returns the length of the stream in bytes. This is useful because it allows playback devices to know when the overall playback has ended. The WaveFileWriter also uses this to know how much data to write to the Wave file. Streams that have no definite end position should return long.MaxValue. Sometimes an effect will result in a longer output than the input stream (for example the tail of a reverb effect). In this case, this should be reflected by increasing the Length.

void Stream.Dispose Method

WaveStreams should override the Dispose(bool disposing) method to free any associated resources. The approach taken by NAudio is that WaveStreams will dispose any input WaveStreams when they are disposed, but obviously you can take a different approach if required. By contrast, the WaveStream rendering classes (e.g. WaveOut, WaveFileWriter) do not dispose of their input streams when they are disposed.

int Read(byte[] destBuffer, int offset, int numBytes) Method

This is where the real work of a WaveStream happens. This method will generate the required number of bytes of audio, often by reading from one or more source streams and processing the data in some way. There are a number of key considerations for the implementation of this function.

  • The code should be highly optimised. This code must run several times a second if playing audio at low latencies. Avoid creating any new objects in this method if possible.
  • A lock should be taken to prevent the stream being repositioned during a read, which can cause unpredictable behaviour.
  • Thought should be given to what to do if insufficient data is available to fulfil the request. With some types of WaveStream it is sufficient to return fewer than numBytes bytes, but any WaveStream that will be connected directly to an audio output device will need to continue returning data past its own end position. Otherwise there will be nothing to fill the audio device buffers with and playback will stop (or stutter). Obviously if your source is a Wave file you can read ahead, but if your source is an audio capture device, there is no way of reading into the future.
  • This function should make sure that the Position property is updated. It is simplest to have a private position member variable that is updated with the number of bytes read each time.
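The considerations above can be sketched roughly as follows. To keep the example compilable without NAudio, the class derives directly from System.IO.Stream rather than WaveStream, and it reads from an in-memory byte array; the class name and the choice to pad with silence (zeros) are assumptions made for illustration, not a description of any particular NAudio class.

```csharp
using System;
using System.IO;

// Illustrative only: a hypothetical class deriving from System.IO.Stream
// (not NAudio's WaveStream) so that the sketch stands on its own.
public class ZeroPaddedAudioStream : Stream
{
    private readonly byte[] source;                 // pre-loaded audio data
    private readonly int blockAlign;                // bytes per sample frame
    private readonly object lockObject = new object();
    private long position;                          // updated on every Read

    public ZeroPaddedAudioStream(byte[] source, int blockAlign)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (blockAlign <= 0) throw new ArgumentOutOfRangeException("blockAlign");
        this.source = source;
        this.blockAlign = blockAlign;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return source.Length; } }

    public override long Position
    {
        get { lock (lockObject) { return position; } }
        set
        {
            // Same lock as Read, so a reposition cannot happen mid-read.
            lock (lockObject)
            {
                long clamped = Math.Max(0, value);
                position = clamped - (clamped % blockAlign); // keep block-aligned
            }
        }
    }

    public override int Read(byte[] destBuffer, int offset, int numBytes)
    {
        lock (lockObject)
        {
            // No allocations here: only copies into the caller's buffer.
            long remaining = Math.Max(0, source.Length - position);
            int available = (int)Math.Min(numBytes, remaining);
            if (available > 0)
            {
                Array.Copy(source, (int)position, destBuffer, offset, available);
            }
            // Past the end of the source, fill the rest with silence.
            Array.Clear(destBuffer, offset + available, numBytes - available);
            position += numBytes;
            return numBytes;
        }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        if (origin == SeekOrigin.Begin) Position = offset;
        else if (origin == SeekOrigin.Current) Position += offset;
        else Position = Length + offset;
        return Position;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Because this sketch always returns numBytes, Position can run past Length. That suits a stream feeding an output device directly, but a stream intended for a file writer would more likely return only the bytes actually available.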

Inheriting from System.IO.Stream has meant that we are tied to a Read method signature that is perhaps not ideal in all circumstances. Some WaveStream derived classes would benefit from a Read method that takes an array of floats or shorts, while others would benefit from using an IntPtr or unsafe byte, short or float pointer. This is part of the reason I wish Microsoft would let us cast more freely between struct types. I have some ideas for how the WaveStream base class could be made more flexible in a future version, but for now, you must always write your WaveStream output into the byte array provided.

Another possible limitation of WaveStream is that it currently offers no way of reporting any latency it introduces. Some types of DSP will introduce a delay. When you are playing back pre-recorded data, it is possible to compensate for that delay. Again, this is something I may look to introduce in a future version of NAudio, particularly if I ever get round to writing a WaveStream that can host a VST effect, which has been a long-term intention of mine.

Other WaveStream Members

These members of WaveStream are implemented in the base class. You may wish to override some of them, but on the whole they can be left alone.

bool Stream.CanRead Property

Implemented by the WaveStream base class and returns true.

bool Stream.CanSeek Property

Implemented by the WaveStream base class and returns true.

bool Stream.CanWrite Property

Implemented by the WaveStream base class and returns false.

Stream.Flush Method

Implemented by the WaveStream base class. Does not do anything.

Stream.Seek Method

This is implemented by the WaveStream base class. It simply translates the call into a call to the Position property setter.

Stream.SetLength Method

Throws a NotSupportedException.

Stream.Write Method

Throws a NotSupportedException.

int GetReadSize(int milliseconds)

Helper function that returns a recommended read size given a desired number of milliseconds of audio. The base class implementation calculates this using the WaveFormat.AverageBytesPerSecond property. This does not typically need to be overridden in derived WaveStream classes.
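As a rough sketch of that calculation, the following standalone function takes the average bytes per second and block align as plain integers instead of reading them from a WaveFormat. Rounding the result up to a whole block is my assumption about what a sensible read size looks like here; the class name is hypothetical.

```csharp
using System;

// Hypothetical standalone version for illustration; not the NAudio source.
public static class ReadSizeSketch
{
    public static int GetReadSize(int averageBytesPerSecond, int blockAlign, int milliseconds)
    {
        // Bytes needed for the requested duration (long avoids int overflow).
        int bytes = (int)((long)averageBytesPerSecond * milliseconds / 1000);

        // Assumption: round up so the read size is a whole number of blocks.
        int remainder = bytes % blockAlign;
        if (remainder != 0)
        {
            bytes += blockAlign - remainder;
        }
        return bytes;
    }
}
```

For 44.1kHz 16 bit stereo audio (176400 bytes per second, block align of 4), 100 milliseconds works out at 17640 bytes, while 25 milliseconds comes to 4410 bytes and is rounded up to 4412.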

int BlockAlign Property

The base implementation simply gives quick access to the WaveFormat.BlockAlign property.

void Skip(int seconds) Method

This is a helper method that advances (or rewinds if seconds is negative) by the specified duration.

TimeSpan CurrentTime Property

This helper property allows you to access the current position in terms of a TimeSpan. You can also set the position using this property. The base class implementation uses the WaveFormat.AverageBytesPerSecond property to do its calculation.
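The conversion between byte positions and times amounts to a division or multiplication by the average bytes per second. Here is a minimal sketch using plain integers in place of a WaveFormat; the names are hypothetical, and rounding down to a block boundary when converting a time to a position is my assumption, in keeping with the earlier advice on block alignment.

```csharp
using System;

// Hypothetical helpers for illustration; not the NAudio source.
public static class TimePositionSketch
{
    // Byte position to elapsed time.
    public static TimeSpan PositionToTime(long position, int averageBytesPerSecond)
    {
        return TimeSpan.FromSeconds((double)position / averageBytesPerSecond);
    }

    // Time to byte position, rounded down to a whole block (an assumption).
    public static long TimeToPosition(TimeSpan time, int averageBytesPerSecond, int blockAlign)
    {
        long position = (long)(time.TotalSeconds * averageBytesPerSecond);
        return position - (position % blockAlign);
    }
}
```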

TimeSpan TotalTime Property

This returns the length of the stream expressed as a TimeSpan.

bool HasData(int count) Method

This function is intended to help optimise the mixing of streams. You can ask whether the stream has any non-zero data in the next count bytes. The default implementation returns true if the current Position is less than the Length. Overriding this in derived WaveStream classes allows the mixer to skip over this stream if it has no data, thus speeding things up. However, HasData should always be quick to run.

3 comments:

jibin said...

HI,
HOw we can use this waestream class for reading data from networkstream and the play it to the speakers directly using NAudio..
Need to implement a VOIP sample.
Any examples or links helpful to do this
Thanks
jibin
jibin.mn@gmail.com

Unknown said...

you need to create a custom wavestream or WaveProvider that supplies the data you read from the network from the Read method. You'll need to report the correct WaveFormat, and implement buffering. See the BufferedWaveProvider in the latest code for a good starting point.

Anonymous said...

Hi. I am new in NAudio. i am using wave stream in my program. where i produce two wavestream from two different/similar audio. How can i compare the percentage of similarity between the two wavestream?