libtask isn't using the ucontext syscall (normally), just the ucontext_t struct. If you look at the source code, it provides local assembly versions of the context swapping code for the most common architectures.
That said, your library is quite nice in that it just provides the co-routine wrappers and nothing else, which I appreciate. Also, your switch code has less instructions (are there cases it doesn't handle?), but if you look at libtask's code, it probably not going to be 100x faster. :-)
But it looks like you're right, context.c is actually defining its own version of swapcontext to replace the system function that can use other backends. I'm sorry for the mistake, thanks for correcting me.
> are there cases it doesn't handle?
No, it conforms to the platform ABIs. I even back up the xmm6-15 registers that Microsoft made non-volatile on their amd64 ABI (compare to how much lighter the SystemV amd64 ABI's switch routine is; the pre-assembled versions are in the doc/ folder.)
Their own fibers library even has a switch to choose whether to back those up or not, which I think is quite dangerous.
Still, nothing can top SPARC's register windows for being outright hostile to the idea of context switching.
...
Oh, and it also supports thread-local storage for safe use with multiple real threads.
> it probably not going to be 100x faster.
No, definitely not compared to his ASM versions. Even with a less efficient swap, once you get past the syscall, most of the overhead is simply in the cache impact of swapping out the stack pointer, so his will likely be very close. In fact, even I had to sacrifice a tiny bit of speed to wrap the fastcall calling convention and execute non-inline assembly code (to avoid dependencies on specific compilers/assemblers.)
I do still strongly favor jmpbuf over ucontext for the final fallback, but with x86, ppc, arm, mips and sparc on Windows, OS X, Linux and BSD, you've pretty much covered 99.999% of hardware you'd ever use this on. That and libtask lacks Windows and OS X support.
That said, your library is quite nice in that it just provides the co-routine wrappers and nothing else, which I appreciate. Also, your switch code has less instructions (are there cases it doesn't handle?), but if you look at libtask's code, it probably not going to be 100x faster. :-)