Monday, 2 June 2008

What's up with WASAPI?

Windows Vista introduced a new series of APIs for audio, called CoreAudio. This includes WASAPI (Windows Audio Session API), a brand new API for capturing and rendering audio.

Decisions, Decisions

Of course, there were always ways of dealing with audio in previous versions of Windows, and they are all still available in Vista, so there is no need to move if you don't want to. Here are the main choices:

  • WinMM - these are the APIs that have been around for ages (e.g. the waveOut... and waveIn... functions). Their main limitation is poor latency (it's hard to go much below 50ms without dropouts).
  • DirectSound - has its uses, particularly for game development, but it appears Microsoft is phasing this out too.
  • Kernel Streaming - this is what most of the pro audio applications use to talk directly to the WDM driver with minimal latency.
  • ASIO - this is Steinberg's audio interface model, used by virtually all pro audio applications, and is usually the best way to work at very low latencies. Pro audio sound card manufacturers provide ASIO drivers. Its one weakness is that you can only use one ASIO driver at a time. That has the potential to cause issues in the future, as more and more studio equipment such as microphones, sound modules and monitors (that's what speakers are called in the world of pro audio) comes with its own USB interface, rather than the older model of plugging all your ins and outs into a single audio interface.

Why Yet Another Audio API?

So why has Microsoft added WASAPI to the list?

  • First, Vista has a completely new audio mixing engine, so WASAPI gives you the chance to plug directly into it rather than going through a layer of abstraction. The reasons for the new audio engine are:
    • A move to 32 bit floating point rather than 16 bit, which greatly improves audio quality when dealing with multiple audio streams or effects.
    • A move from kernel mode into user mode in a bid to increase system stability (bad drivers can't take the system down).
    • The concept of endpoints rather than audio devices - making it easier for Windows users to send sounds to "headphones" or record sound from "microphone", rather than requiring them to know technical details about the soundcards installed on their system.
    • Grouping audio streams. In Vista, you can group together all the audio streams from a single application and control their volume separately - in other words, a per-application volume control. This is a bit more involved than it might at first seem, because some applications such as IE host all kinds of processes and plugins that all play sound in their own way.
  • Second, the intention was to support pro audio applications which needed to be as close to the metal as possible, and keep latency to a bare minimum. (see Larry Osterman's Where does WASAPI fit in the big multimedia API picture?)
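To see why the move to 32 bit floating point matters, here's a minimal sketch (plain C++, nothing WASAPI-specific, and not the actual engine code): summing two loud 16 bit streams has to hard clip, while the float version keeps the headroom, so the engine can attenuate later without having already lost information.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mix two 16-bit PCM streams sample by sample; the sum must be
// hard-clipped back into the 16-bit range, destroying information.
std::vector<int16_t> mix16(const std::vector<int16_t>& a,
                           const std::vector<int16_t>& b)
{
    std::vector<int16_t> out(a.size());
    for (size_t i = 0; i < a.size(); ++i) {
        int32_t sum = int32_t(a[i]) + int32_t(b[i]);
        out[i] = int16_t(std::clamp<int32_t>(sum, -32768, 32767)); // hard clip
    }
    return out;
}

// Mix the same streams as 32-bit floats: values beyond +/-1.0 are
// preserved, so a later gain stage can bring them back in range cleanly.
std::vector<float> mixFloat(const std::vector<float>& a,
                            const std::vector<float>& b)
{
    std::vector<float> out(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i]; // no clipping; headroom is retained
    return out;
}
```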

Learning WASAPI

Documentation on WASAPI is fairly sparse, despite Vista having been out for well over a year now.

Who's Using It?

So is anyone actually using WASAPI? Well, I use a variety of pro audio applications, each of which offers the user a choice of audio APIs, and yet none of them have added WASAPI to the list. Even Cakewalk, who seem very loyal to Microsoft, have stuck with kernel streaming to access the new WaveRT driver model rather than using WASAPI.

The trouble seems to be that WASAPI doesn't offer anything that WDM Kernel Streaming doesn't already, and since WASAPI is Vista-only, there is no incentive to switch. And with Windows XP not looking like it will fall out of common usage for a very long time, writing a new application based on WASAPI doesn't make much sense.

But what about slightly less pro audio applications? What if someone is writing a new application that needs to deal in some way with individual audio samples, and is able to target just Vista and above? Could they choose WASAPI? It all depends on how much work they are willing to do...

Rendering Modes

WASAPI gives two options for audio rendering: shared mode and exclusive mode. In exclusive mode, you are the only application talking to the audio endpoint in question - no other application can make any noise. This gives the best performance possible, so it would be the choice of pro audio applications like Cubase, SONAR, REAPER, Pro Tools etc. But as I have already said, they are not using WASAPI; they are using ASIO or Kernel Streaming.

Which leaves us with shared mode. This allows you to share the endpoint with other applications - in other words, you can still hear your Windows sounds and so on. Of course, when you share an endpoint, one application might want to play sound at 48kHz 24 bit stereo, while another wants to play at 22kHz 16 bit mono. With the WinMM APIs, this is no problem - built-in converters will convert each audio stream to the format of the Windows mixing engine.

The Missing Feature

Now I have been creating a set of .NET wrappers for WASAPI as part of my NAudio open source audio library. After the pain of writing the mountains of COM interop required to get .NET talking to WASAPI, I hit a brick wall: WASAPI does not offer sample rate conversion. In other words, to use shared mode, you must either hope that the Vista machine's audio engine is set to the exact sample rate of your audio, or you must write your own sample rate converter. And sample rate conversion is by no means trivial - especially if your criteria are that it must not degrade the audio quality and it must be as fast as possible.
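To give a feel for the mechanics (and why doing it properly is hard), here's the crudest possible resampler - linear interpolation. It shows the basic position-stepping idea, but a production SRC needs a proper low-pass / polyphase filter on top of this to avoid aliasing and imaging artifacts, which is exactly the part that is difficult to get both fast and clean.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive sample rate converter using linear interpolation between the two
// nearest input samples. Illustrative only: no anti-aliasing filter, so
// quality suffers badly, particularly when downsampling.
std::vector<float> resampleLinear(const std::vector<float>& in,
                                  double inRate, double outRate)
{
    if (in.empty()) return {};
    double ratio = inRate / outRate;           // input samples per output sample
    size_t outLen = size_t(std::floor((in.size() - 1) / ratio)) + 1;
    std::vector<float> out(outLen);
    for (size_t i = 0; i < outLen; ++i) {
        double pos = i * ratio;                // fractional position in input
        size_t idx = size_t(pos);
        double frac = pos - idx;
        float next = (idx + 1 < in.size()) ? in[idx + 1] : in[idx];
        out[i] = float(in[idx] * (1.0 - frac) + next * frac);
    }
    return out;
}
```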

Why on earth could Microsoft not have given us SRC with WASAPI? Sure, latency will be affected (or to be more accurate, processor load will increase, making low latencies harder to achieve). But this is taken for granted with shared mode: we know that when we are sharing the endpoint, SRC must occur somewhere. Microsoft have already done the R&D to create a configurable, performant sample rate converter. Why not let WASAPI plug it in automatically (or at least give us a flag to ask WASAPI to do SRC for us)?

The upshot of this is that for me to continue with my WASAPI .NET interop I now have to wrap the Audio Resampler DSP DMO (DirectX Media Object), which is a whole new can of worms (perhaps another blog post later).

The Future of WASAPI

WASAPI is here to stay, but if it is to be used more widely, it needs to be made more developer-friendly. I guess there is the possibility that pro audio companies will look to it as we move to 64 bit Windows and more people move to Vista, but it will only really pick up traction if it becomes easier to use in shared mode. Here are my two suggestions:

  • Build in sample rate conversion in shared mode (and exclusive mode for those using sample rates not supported by the device)
  • Build a low-level audio API in .NET as part of the next version of the .NET framework, using a simple model of Wave Streams that can be plugged together and connected to WASAPI endpoint devices (or other driver models or file types). More on this in a future blogpost though.
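To illustrate that second suggestion, here's a rough sketch of what a pluggable wave stream model could look like (in C++ here for brevity; the names WaveStream, BufferStream and GainStream are made up for this example, not any real API). Each stage pulls samples from the stage before it, so sources, effects and an output sink chain together naturally.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal wave stream interface: anything that can supply
// audio samples on demand.
struct WaveStream {
    virtual ~WaveStream() = default;
    // Fill 'buffer' with up to 'count' samples; return the number written.
    virtual size_t Read(float* buffer, size_t count) = 0;
};

// A source stream that plays back a fixed buffer of samples.
struct BufferStream : WaveStream {
    std::vector<float> data;
    size_t pos = 0;
    explicit BufferStream(std::vector<float> d) : data(std::move(d)) {}
    size_t Read(float* buffer, size_t count) override {
        size_t n = 0;
        while (n < count && pos < data.size())
            buffer[n++] = data[pos++];
        return n;
    }
};

// An effect stream that wraps another stream and scales its output,
// showing how stages plug together.
struct GainStream : WaveStream {
    WaveStream& source;
    float gain;
    GainStream(WaveStream& s, float g) : source(s), gain(g) {}
    size_t Read(float* buffer, size_t count) override {
        size_t n = source.Read(buffer, count);
        for (size_t i = 0; i < n; ++i)
            buffer[i] *= gain;
        return n;
    }
};
```

An endpoint device (or a file writer) would then just be the final consumer pulling on the end of the chain.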