Friday 28 February 2014

Should NAudio use SharpDX for MediaFoundation support?

In NAudio 1.7 I introduced Media Foundation support. The main classes are MediaFoundationReader, MediaFoundationEncoder, MediaFoundationResampler and MediaFoundationTransform. Those four classes cover the bulk of what most people would want to do with Media Foundation, but they required a huge amount of interop, and a few areas are still incomplete and proving rather tricky.

But I noticed a while ago that SharpDX already has the vast majority of Media Foundation interop available in its SharpDX.MediaFoundation assembly. It got me thinking: what if NAudio simply relied on the work done in SharpDX rather than creating its own interop wrappers? There are several advantages and disadvantages to taking this approach.

Advantages...

Completeness
A large part of SharpDX is auto-generated, which means that it contains pretty much every interface, every API call, every enumeration, and every Guid you could ever need. By contrast, the NAudio wrappers are done by hand, so there are a number of bits missing. In particular, SharpDX has done the hard work of implementing reading from and encoding to standard .NET streams, which is something I've really wanted to add to NAudio but which has proved very complicated to implement. SharpDX also includes interop wrappers that allow access to the "presentation attributes", which would be a nice enhancement.

Cross-Threading
SharpDX uses a fancy post-compile trick which allows COM objects created on one thread to be used on another. NAudio needs something like this, as the current workaround I use is a bit of a hack (basically recreate the MF source reader on the new thread). However, I will confess to not fully understanding how the SharpDX technique works, which is why I haven’t yet borrowed the technique for NAudio.

Closing
SharpDX has a really nice approach to creating wrappers for COM objects, which makes them disposable objects. By contrast, I have simply been trying to call Marshal.ReleaseComObject in all the right places in NAudio. I try to hide this as much as possible from users of NAudio, but there are places where it does leak out, leaving the potential for a memory leak.
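
To illustrate the difference, here is a minimal sketch of the general "disposable wrapper" idea. It is not SharpDX's implementation and it is not NAudio code: ComHandle<T> is a hypothetical helper invented for this example, and the only real APIs it relies on are IDisposable, Marshal.IsComObject and Marshal.ReleaseComObject.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of NAudio or SharpDX): ties the lifetime of a
// COM object to a using block instead of scattered ReleaseComObject calls.
public sealed class ComHandle<T> : IDisposable where T : class
{
    private T instance;

    public ComHandle(T comObject)
    {
        if (comObject == null) throw new ArgumentNullException("comObject");
        instance = comObject;
    }

    // The wrapped object; throws once the handle has been disposed.
    public T Value
    {
        get
        {
            if (instance == null) throw new ObjectDisposedException("ComHandle");
            return instance;
        }
    }

    public void Dispose()
    {
        if (instance == null) return; // safe to call Dispose more than once

        // ReleaseComObject throws for ordinary managed objects, so check first.
        if (Marshal.IsComObject(instance))
        {
            Marshal.ReleaseComObject(instance);
        }
        instance = null;
    }
}
```

With something like this, the caller wraps the COM object in a using statement and reads it through Value, so the release happens in one predictable place even if an exception is thrown part way through.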

Collaboration
I always find it slightly depressing that so many open source projects decide to compete rather than collaborate with each other. If we spent less time re-inventing the wheel and more time building upon what others have already created, I’m sure we could make some amazing software.

So I could just decide to let SharpDX do what it does well (wrap COM-based Windows APIs), and then NAudio could focus on providing helpful ways to construct your audio graph (which for me is the interesting bit). What's more, if NAudio used SharpDX, it could result in enhancements being submitted to SharpDX (I've already made a few contributions as part of my experimentation).

Disadvantages...

So there’s a lot in favour of using SharpDX, but there are some disadvantages that need to be weighed up.

Dependencies
We'd need to make NAudio depend on two additional DLLs – SharpDX.dll and SharpDX.MediaFoundation.dll. As well as increasing the overall size of the dependencies, this could also cause some namespace confusion as NAudio and SharpDX both have classes with the same name (WaveFormat is a prime example). There would also be potential for versioning conflicts if NAudio was introduced into a project that was using a different version of SharpDX.

Control
By depending on an external library like SharpDX, we'd lose a measure of control over all the interop. Currently NAudio's interop is hand-crafted to be exactly how I want it. But with SharpDX, I'd need to submit pull requests for all the tweaks I want. I'd also be dependent on the release schedule of SharpDX: a new version of NAudio would need to depend on an official release of SharpDX, not a special build. Having said that, Alexandre Mutel has been quick to accept my pull requests so far and there isn’t a pressing need for any more to be made.

Nearly There?
It may be that I'm not that far off at all. If I get read and write from a stream working, and can solve the threading issue, then the main motivation for building on SharpDX would disappear. But I can't really tell how close I am to getting it all working just how I want it.

Trying it Out

The good news is that I can trial all of this without needing to change NAudio at all. I've made a new library called NAudio.SharpMediaFoundation that depends on NAudio and SharpDX, and offers alternative implementations of NAudio's four Media Foundation classes. I've called them SharpMediaFoundationReader, SharpMediaFoundationEncoder, SharpMediaFoundationResampler and SharpMediaFoundationTransform. The library is on GitHub.

Once a version of SharpDX is released containing my customisations, I can release this as its own NuGet package. That would allow people who want or need the features SharpDX can offer to take advantage of them without any disruption to the existing NAudio library. So maybe I can have the best of both worlds.

Do let me know if you have any feedback on this approach to MediaFoundation. I’ll announce on my blog when an official release of NAudio.SharpMediaFoundation is available.

3 comments:

Lucian Wischik said...

How did this work out?

I'm curious, does sharpdx have any additional dependencies that naudio doesn't? Eg. would this move prevent naudio working on PhoneSilverlight or some other platforms?

Unknown said...

Not that I know of. You can in fact use my alternative implementation alongside NAudio, so I may go with that solution.

I'm hoping to look at getting a version of NAudio for Windows Phone, maybe when 8.1 comes out, as that opens the door to more shared code.

Jon said...

I know this is not what the audio library is for, but any chance you know of a CoreAudio set that is based on SharpDX? It didn't look like it was part of the main set of functionality.