The main differences are the potential delay before adding threads and the use of ML. The "injector" adds threads immediately, at the time of task submission, if none are spare (and there is permitted headroom). They are then _pruned_ as they are seen to be wasting cycles.
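Here is a minimal sketch of that submission-time policy under a simple model. `InjectorPolicy`, its `prune_after` threshold, and the "prune one spare thread after N consecutive idle checks" rule are hypothetical stand-ins for illustration, not the injector's actual code. It only tracks counts and starts no real threads.

```python
from dataclasses import dataclass


@dataclass
class InjectorPolicy:
    """Hypothetical count-only model of 'add on submit, prune when idle'."""

    max_threads: int        # permitted headroom: cap on live threads
    prune_after: int = 3    # assumed: consecutive idle checks before pruning a thread
    live: int = 0           # threads currently alive
    spare: int = 0          # live threads with nothing to do
    idle_streak: int = 0    # consecutive checks that found a spare thread

    def submit(self) -> bool:
        """Called at task submission; returns True if a thread was started for it."""
        if self.spare > 0:
            self.spare -= 1         # hand the task to an idle thread
            return False
        if self.live < self.max_threads:
            self.live += 1          # none spare and headroom left: add one right away
            return True
        return False                # at the cap: the task waits in the queue

    def task_done(self) -> None:
        """A worker finished and found nothing else queued, so it is now spare."""
        self.spare += 1

    def check(self) -> None:
        """Periodic check: prune one spare thread once idleness has persisted."""
        if self.spare == 0:
            self.idle_streak = 0
            return
        self.idle_streak += 1
        if self.idle_streak >= self.prune_after:
            self.spare -= 1
            self.live -= 1
            self.idle_streak = 0


policy = InjectorPolicy(max_threads=4)
started = [policy.submit() for _ in range(6)]
print(started)                    # [True, True, True, True, False, False]
policy.task_done()                # two workers go idle
policy.task_done()
for _ in range(3):
    policy.check()
print(policy.live, policy.spare)  # 3 1
```

In this model the first four submissions get threads with no delay at all. The only gradual part is shrinking back down.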
Admittedly, the injector was designed only to minimise the overhead of thread state management, whereas the CLR approach can in principle also respond to harmful competition for resources between tasks, assuming relatively consistent behaviour.
The downside of the CLR approach, of course, is a potentially very slow ramp-up when the workload suddenly calls for more threads.
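To make the ramp-up cost concrete, here is a deliberately simplified hill climber. It moves the thread count by one per measurement interval and reverses when throughput stops improving. It is not the CLR's actual algorithm. The `throughput` curve and its optimum of 20 threads are made-up assumptions.

```python
def throughput(threads: int, optimum: int = 20) -> float:
    """Hypothetical workload: throughput rises linearly up to `optimum`, then degrades."""
    if threads <= optimum:
        return float(threads)
    return optimum - 0.5 * (threads - optimum)


def hill_climb(start: int, intervals: int) -> list[int]:
    """Move one thread per interval; reverse direction when throughput doesn't improve."""
    threads, direction = start, +1
    last = throughput(threads)
    history = [threads]
    for _ in range(intervals):
        threads = max(1, threads + direction)   # never drop below one thread
        now = throughput(threads)
        if now <= last:
            direction = -direction              # no gain: head back the other way
        last = now
        history.append(threads)
    return history


history = hill_climb(start=1, intervals=30)
print(history.index(20))  # 19: intervals needed to reach the optimum from 1 thread
```

Under this toy model, the climber spends 19 intervals getting to the optimum and then keeps probing between 19 and 21 threads. A submit-time policy would instead add threads as tasks arrived with no spare thread available.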
We had a system like this at my workplace, but it is gradually being replaced with cooperative threads. I don't remember the specific issues we had with it, but I believe slow adaptation to periodic workloads was one of them. Fundamentally, why hill climb when you can know with certainty exactly how many threads in your system are blocked?
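The alternative suggested here can be sketched as a sizing rule that counts blocked threads exactly instead of searching for a good count. `BlockAwareSizer` and its `on_block`/`on_unblock` hooks are hypothetical. A runtime would have to call them around every blocking operation. That is typically easy with cooperative threads, because blocking already goes through the scheduler.

```python
class BlockAwareSizer:
    """Hypothetical rule: keep `target` threads runnable by counting blocked ones."""

    def __init__(self, target: int) -> None:
        self.target = target   # desired number of threads actually running
        self.blocked = 0       # threads currently parked in a blocking call

    def on_block(self) -> None:
        self.blocked += 1

    def on_unblock(self) -> None:
        self.blocked -= 1

    def desired_threads(self) -> int:
        # No measurement or search: one replacement thread per blocked thread.
        return self.target + self.blocked


sizer = BlockAwareSizer(target=8)
for _ in range(3):
    sizer.on_block()
print(sizer.desired_threads())  # 11
sizer.on_unblock()
print(sizer.desired_threads())  # 10
```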
From [1]: "in the case of blocking workloads, it’s extremely difficult to determine the number of threads that optimizes overall throughput because it’s hard to determine when a request will be completed." Their argument, then, is that reacting directly to the number of blocked threads can over-correct: you add many threads to the pool just as the blocked ones are about to unblock. That reduces throughput because you now pay for extra context switches and poor cache locality.
This is only a problem when you're not scheduling cooperatively. If you are scheduling cooperatively, then you don't ever have to experience unnecessary context switches.
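Here is a toy illustration of the cooperative side, using Python generators as stand-ins for "cooperative threads". Each task `yield`s where it would otherwise block, and a single-threaded round-robin loop resumes the next task. Switching tasks here just resumes a generator on the same OS thread, so no kernel context switch occurs. Real runtimes park a blocked task until its I/O or lock is ready, rather than requeuing it immediately as this sketch does.

```python
from collections import deque
from typing import Generator, Iterable

Task = Generator[None, None, None]


def run(tasks: Iterable[Task]) -> None:
    """Round-robin scheduler: each task runs until it voluntarily yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield point
        except StopIteration:
            continue            # task finished: drop it
        ready.append(task)      # it yielded (would have blocked): requeue it


log: list[str] = []


def worker(name: str, steps: int) -> Task:
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                   # a point where the task would otherwise block


run([worker("a", 2), worker("b", 3)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```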
The CLR thread pool is a pretty reliable workhorse. For throughput-sensitive work, though, you can benefit from a more finely tuned approach to threading: each job queued to the CLR thread pool involves at least two allocations in most cases, so you pay additional GC costs on top of the scheduling and context-switch costs.
When you know you will be running a very large number of jobs on your thread pool, you can improve throughput by assigning each thread a specific job type, so that it can blast through many jobs of the same type with better locality and less overhead. You can also use the underlying platform thread pool to spin up the per-type workers and let each run for a while, which saves you from managing thread counts yourself.
To a degree, an existing parallel framework may do this for you, but at least on the CLR, options like Parallel.For aren't great: they carry a lot of abstraction overhead that shows up in profiles.
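Here is a rough sketch of the per-job-type idea. It assumes jobs arrive as `(type, payload)` pairs and uses Python's `ThreadPoolExecutor` as a stand-in for the platform thread pool. `run_by_type`, the `batch` size, and the job shape are hypothetical. Each pool task drains a whole batch of one type, so submission overhead is paid per batch rather than per job. In CPython the GIL limits any speed-up for CPU-bound handlers, so the structure is the point here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Job = Tuple[str, int]  # hypothetical job shape: (job type, payload)


def run_by_type(jobs: List[Job],
                handlers: Dict[str, Callable[[int], int]],
                batch: int = 1024,
                max_workers: int = 4) -> Dict[str, List[int]]:
    """Group jobs by type and give each pool task a whole batch of one type."""
    by_type: Dict[str, List[int]] = defaultdict(list)
    for kind, payload in jobs:
        by_type[kind].append(payload)

    def drain(kind: str, payloads: List[int]) -> List[int]:
        handler = handlers[kind]  # one lookup per batch, not per job
        return [handler(p) for p in payloads]

    results: Dict[str, List[int]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            (kind, start): pool.submit(drain, kind, payloads[start:start + batch])
            for kind, payloads in by_type.items()
            for start in range(0, len(payloads), batch)
        }
        # Sort by (type, offset) so each type's results come back in submission order.
        for (kind, _start), fut in sorted(futures.items(), key=lambda kv: kv[0]):
            results.setdefault(kind, []).extend(fut.result())
    return results


jobs = [("square", 3), ("neg", 5), ("square", 4), ("neg", 1)]
handlers = {"square": lambda x: x * x, "neg": lambda x: -x}
out = run_by_type(jobs, handlers, batch=1)
print(out)  # {'neg': [-5, -1], 'square': [9, 16]}
```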
I have a prototype that uses a hill climber to optimize a caching policy. It shifts between favoring recency and favoring frequency by adapting the partition between the two regions, and it has worked really well in early experiments.
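A climber of this kind might take roughly the following shape, assuming the hit rate is sampled once per period and the boundary moves in fixed steps. `adapt_window`, the 5-slot step, and `toy_hit_rate` (a made-up workload that peaks with a 20-slot recency region) are illustrative assumptions, not the prototype's code.

```python
from typing import Callable


def adapt_window(hit_rate: Callable[[int], float], capacity: int = 100,
                 window: int = 50, step: int = 5, rounds: int = 40) -> int:
    """Hill-climb how many of `capacity` slots go to the recency region.

    `hit_rate(window)` stands in for the hit rate measured over one sample period;
    the remaining `capacity - window` slots form the frequency region.
    """
    direction = -1                  # arbitrary first guess: shrink the recency region
    last = hit_rate(window)
    for _ in range(rounds):
        window = min(capacity, max(0, window + direction * step))
        now = hit_rate(window)
        if now <= last:
            direction = -direction  # no improvement: move the boundary back
        last = now
    return window


def toy_hit_rate(window: int) -> float:
    # Hypothetical workload whose hit rate peaks with a 20-slot recency region.
    return 0.9 - abs(window - 20) / 200


print(adapt_window(toy_hit_rate))  # 20
```

With a fixed step it never settles for good. It keeps probing between 15 and 25 slots around the best split, and that ongoing probing is what lets it follow a workload that shifts between recency-biased and frequency-biased phases.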
You don't get truly parallel threads in Python because of the GIL: in CPython, only one thread at a time can execute Python bytecode.
You can use processes to get around this, but it's often not as performant or convenient. Computer hardware shifted to multi-core a decade ago and Python was left behind because of that little bit of technical debt.
It used to be my favorite language, but I've since moved on to Go, both for the better performance and multi-core story, and for the static typing.
http://belliottsmith.com/injector/