One of the occasionally annoying things about developer communities is their tendency to present certain technologies as unqualified 'magic', as if one should be thankful to be developing for platform x because only it has technology y. GCD is sometimes presented this way, when it's mostly just a thread pool with a nice API.
Which is why articles like this are so great. When I was first dipping my toes into development, Mike Ash's articles were a great help in cutting through the marketing hype and realising that these are mostly things you could have implemented yourself given the time and inclination.
While I quite agree with your general point, I think GCD is often underappreciated. Characterizing GCD as "mostly just a thread pool with a nice API" glosses over the key feature, which Mike Ash specifically points out is not in this implementation: "the number of threads in the global pool scales up and down with the amount of work to be done and the CPU utilization of the system." Making the pool aware of the utilization of the system as a whole is the key feature of GCD. This also gets at why something like this works best when it is created and promoted by the system vendor: the more GCD is used by the system software and third-party applications, the better the system as a whole will perform.
You clearly didn't read the article, which repeatedly states that GCD is not just a thread pool with a nice API.
The whole point of it is that GCD is a lot more efficient than that because it is optimized across the whole system. The author intentionally chooses to omit this because he is trying to present a simplified example for educational purposes.
Yes indeed! Oddly in the code on GitHub it's correct. I'm not sure how the mismatch happened. I corrected it in the article as well. Thanks for pointing that out.
I used to have a lot of problems with that, due to my crappy home-grown blog software, but it's pretty solid now. I think what may have happened is I had it backwards in the code originally, noticed the problem when pasting the code into the article, then fixed the code but forgot to fix the article. Maybe. It was late and beer was involved.
Nice! dispatch is a lovely API. Concurrent queues are nice to have, but the real magic is in serial queues.
For example, we can wrap a hash table in a dispatch queue: get() uses dispatch_sync, set() uses dispatch_async, and boom: our hash table rehashes in the background, instead of forcing set() to wait.
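To make the idea concrete for anyone who doesn't use GCD, here is a rough Python analogue, not GCD itself. The names SerialQueue, QueuedTable, dispatch_sync and dispatch_async are made up for this sketch; a single worker thread stands in for the serial queue, and a plain dict stands in for the hash table.

```python
import queue
import threading
import traceback


class SerialQueue:
    """Hypothetical stand-in for a serial dispatch queue: one worker, FIFO order."""

    def __init__(self):
        self._tasks = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            except Exception:
                traceback.print_exc()  # keep the worker alive after a bad task

    def dispatch_async(self, fn):
        # Enqueue the work and return immediately.
        self._tasks.put(fn)

    def dispatch_sync(self, fn):
        # Enqueue the work, then block until the worker has run it.
        done = threading.Event()
        box = {}

        def task():
            try:
                box["value"] = fn()
            except Exception as exc:
                box["error"] = exc
            finally:
                done.set()

        self._tasks.put(task)
        done.wait()
        if "error" in box:
            raise box["error"]
        return box["value"]


class QueuedTable:
    """All access to the dict goes through the serial queue, so no lock is needed."""

    def __init__(self):
        self._data = {}
        self._queue = SerialQueue()

    def set(self, key, value):
        # The caller does not wait; any resizing happens on the worker thread.
        self._queue.dispatch_async(lambda: self._data.__setitem__(key, value))

    def get(self, key, default=None):
        # Runs after every set() queued before it, so reads see earlier writes.
        return self._queue.dispatch_sync(lambda: self._data.get(key, default))


table = QueuedTable()
table.set("answer", 42)
print(table.get("answer"))
```

Because the queue is FIFO, a get() always observes every set() submitted before it. As with real serial queues, calling dispatch_sync from a task already running on the same queue would deadlock.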
What are some other APIs that enable you to build a hash table like that?
For others who don't know the details of the Cocoa API and are wondering about it: [NSCondition wait] unlocks the condition's lock and blocks the thread until the condition is signaled, at which point it reacquires the lock and resumes.
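The same unlock, block, reacquire pattern exists in most threading libraries. As a small illustration in Python rather than Cocoa, threading.Condition.wait() behaves the same way; the items list and the consumer and producer functions are invented for this sketch.

```python
import threading

cond = threading.Condition()
items = []  # shared state protected by cond's lock


def consumer(out):
    with cond:                 # acquire the lock
        while not items:       # always re-check the predicate after waking
            cond.wait()        # releases the lock, blocks, reacquires on wakeup
        out.append(items.pop(0))


def producer(value):
    with cond:
        items.append(value)
        cond.notify()          # wake one waiting thread


received = []
worker = threading.Thread(target=consumer, args=(received,))
worker.start()
producer("job")
worker.join()
print(received)
```

The while loop matters: the waiter can wake up without the predicate being true, so the condition is re-checked while holding the lock.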
You're welcome! I'm glad you liked it. These posts are fun to write too. It's nice to start with an extremely well defined goal and implementation plan and then just go straight for it.
Mike Ash is a well-known developer with a lot of great things to say about Apple technologies (I believe his day job is with Rogue Amoeba). He regularly breaks down notable parts of the Apple software stack and reimplements them as a learning exercise. His entire blog is worth a read.
Rogue Amoeba does a lot of neat things that, honestly, should have been built into OS X and other Apple products from day one, such as SoundFlower (an audio router using fake devices; they inherited its stewardship from Cycling '74) and LineIn (which just plays an input to an output with no magic).
AirFoil is also really neat, but I wish there were an AirFoil to AirFoil zero-latency PCM mode that could plug into SoundFlower and just send my audio to my Windows desktop with no other magic... AirFoil via the AirPlay protocol (or anything via the AirPlay protocol, even one Apple product to another) has at least two seconds of lag, and also wastes time encoding the audio (which is probably the source of the latency). Literally, the latency is the only thing preventing me from adopting the product.
As noted in a sibling comment I'm not with Rogue Amoeba anymore, but I'll still weigh in.
Airfoil is limited by the decision to stick to Apple's protocol. Obviously, this is necessary to talk to Apple's devices. It's not strictly necessary when going from Airfoil to Airfoil Speakers, but it makes life much simpler to just use the same protocol for everything.
AirTunes (what Apple used to call the audio version of AirPlay, which is totally unrelated to the video version of AirPlay... confused yet?) isn't really suited to low latencies. The transmitter basically sends audio data, and then separately sends commands every so often saying, "At time T you should be playing sample number X in the audio stream." This allows sending to multiple outputs and having them all be synchronized. But for low latency you really just want "play this instantly," and to make sure you get audio data to it in a timely fashion. If it's fast enough, then you don't need to synchronize multiple outputs, since they'll be synchronized with "real time."
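To illustrate the "at time T you should be playing sample number X" idea, here is a toy calculation in Python. It is only a sketch of the arithmetic a receiver could do with such a command, not the actual protocol; the function name and the assumption that sender and receiver share a clock are mine.

```python
SAMPLE_RATE = 44100  # samples per second, assumed for this sketch


def sample_due_at(now, anchor_time, anchor_sample, sample_rate=SAMPLE_RATE):
    """Return the stream sample that should be playing at time `now`.

    anchor_time and anchor_sample are the (T, X) pair from the sync command;
    times are in seconds on a clock shared by sender and receiver.
    """
    return anchor_sample + round((now - anchor_time) * sample_rate)


# Half a second after the anchor, playback should be 22050 samples further on.
print(sample_due_at(now=10.5, anchor_time=10.0, anchor_sample=1000))
```

Every receiver that gets the same (T, X) pair lands on the same sample at the same moment, which is what keeps multiple outputs in sync. It also means audio has to be buffered ahead of T, which is where the latency comes from.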
The major challenge is dealing with packet loss. A typical WiFi network might see 1% packet loss, which you need to recover from somehow. For non-realtime operations, it's easy. You detect the loss, retransmit the data, and carry on. For realtime stuff it's harder. With video you can drop frames. It's not ideal, but it's not too bad. Dropping audio sounds way worse than dropping video looks, though, so that's not really an option for something like Airfoil.
One option I've always wanted to explore was to use forward error correction on the transmitted data. Send out enough redundancy that the receiver can put the original data together even with some loss in the middle. I never have tried it out, but I think it could really cut down on latency.
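For anyone unfamiliar with the idea, the simplest form of forward error correction is one XOR parity packet per group of equal-sized packets, which lets the receiver rebuild any single lost packet in that group. The Python below is a minimal sketch of that scheme with invented function names; it is not what Airfoil or AirTunes does, and real systems typically use stronger codes.

```python
def xor_parity(packets):
    """Build one parity packet for a group of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can only repair exactly one lost packet")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    repaired = list(received)
    repaired[missing[0]] = bytes(rebuilt)
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)
print(recover([group[0], None, group[2]], parity) == group)
```

The cost is one extra packet per group, and the receiver cannot repair a loss until the rest of the group and the parity packet have arrived. Two losses in the same group are not recoverable with this scheme.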
Without that, you're left with detecting loss and retransmitting, which eats up valuable time. Combine it with code written to do synchronized output with a substantial buffer, and you just can't do realtime transmission.
I don't think the encoding/decoding step (AirTunes uses Apple Lossless encoding for the audio, for those unaware) adds all that much latency to the process, but I will admit that I never went in and measured the bits on the millisecond level, since we were looking at 2000ms end-to-end latency.
If you want to experiment, there is a hidden setting in there somewhere that lets you adjust the latency Airfoil tells the receiver to use. It might be in the window that pops up when you hold option while launching the apps, or you might need to track down a hidden defaults key, I don't quite recall anymore. In my experience you probably won't be able to get it below about 200ms without lots of audible problems, and that's still plenty for a highly perceptible delay.
OK, I'd better stop this before it turns into something long enough to be a blog post on its own!
Forward error correction implies extra latency, so of course it has to make up for enough packet loss to provide a net reduction in latency to be useful.
Is there a generic UDP-based FEC transport library that one could use to wrap a simple Opus or FLAC stream and test this out? Edit: I found http://openfec.org/ but it seems to implement something other than packet loss recovery
Yep. Fortunately audio is relatively low-bandwidth compared to a typical LAN so you can pile on a lot of redundancy. If you want to go crazy you could transmit everything twice from the start.
AirTunes uses 352-frame packets. At 44100Hz, that's about 8ms per packet. So you definitely do not want to do FEC over more than a fairly small number of packets, or else you'll hit audible latency. (In my testing, if two sets of speakers are playing identical audio but with different delays, latency differences can be heard down to about 10ms with the right audio. Under 50ms is usually OK.)
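The arithmetic is easy to check. The short Python sketch below uses the 352-frame and 44100Hz figures from above; the function names are made up, and group_span_ms is only a rough bound on how long a receiver might wait before it has a whole FEC group in hand.

```python
def packet_ms(frames=352, sample_rate=44100):
    """Duration of audio carried by one packet, in milliseconds."""
    return frames / sample_rate * 1000


def group_span_ms(packets_in_group, frames=352, sample_rate=44100):
    """Audio time covered by one group of packets, in milliseconds."""
    return packets_in_group * packet_ms(frames, sample_rate)


print(round(packet_ms(), 2))       # just under 8 ms per packet
print(round(group_span_ms(5), 1))  # a 5-packet group spans about 40 ms
```

So a group of five or six packets already uses up most of a 50ms budget.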
Of course, if you're doing your own protocol you could use whatever size you want for the packets, keeping in mind that overhead will start to dominate if you make them really tiny.
Another problematic question is, what causes the packet loss? If it's just random, that's one thing, but what if it's outside interference? If some outside transmitter blasts your network for, say, 200ms and causes all data to be lost in that time, then a 2s buffer can recover, but a low-latency scheme is screwed no matter what. I don't know what the answer is here.
Wireless networks usually exhibit bursty packet loss on top of a base packet loss rate, which poses challenges for streaming (and everything else). QUIC uses forward error correction in some cases to mitigate this: https://www.chromium.org/quic
Mike, could you do a breakdown of how to write a simple version of Soundflower on OS X, say if I wanted to apply global, system-wide compression to my audio?
Apple used to have some sample code but I can no longer find it.
I'm afraid I've never touched the Soundflower code or done any related development with it. It is open source though, so you can download the code and do what you will with it: