Using DTrace to measure Erlang dirty scheduler overhead (medium.com/jlouis666)
87 points by bcantrill on Aug 29, 2015 | 18 comments


Here is an overview of how BEAM is implemented for people who want a quick refresher: http://www.erlang.org/euc/08/euc_smp.pdf

And a comparison to the JVM here: http://ds.cs.ut.ee/courses/course-files/To303nis%20Pool%20.p...


I'll just note that the JVM part of that comparison is wrong. The main difference between BEAM and the JVM is that BEAM implements a much larger part of the language's functionality in the runtime; at least for many JVM languages, that is not the case with the JVM. The JVM -- like the CPU+OS -- directly offers a rather general programming model (shared memory, kernel threads, etc.), only with the addition of an optimizing JIT and a GC. BEAM, OTOH, operates at a much higher level, much closer to the Erlang language: it offers a very specific form of GC, a very specific form of shared memory, and a very specific scheduler. All of these -- just as BEAM implements them on top of the CPU+OS -- can be implemented on top of the JVM, which offers a lower level of abstraction than BEAM.

The comparison, however, compares BEAM to a programming model (offered by the Java language) that is very close to the JVM's native, low-level, abstraction. That is a lot like comparing Erlang and C, namely comparing two things that are aimed at completely different levels of abstraction. And just like Erlang can be (and is) implemented in C -- which is a lower level language -- so too it can be implemented in Java.

Its preemptive lightweight processes can be implemented in Java, its scheduler can be implemented in Java (both have been, in fact), and even its per-process GC can be implemented in Java (although that's probably unnecessary given new Java GCs).

The reason BEAM is implemented that way is not that it results in a better Erlang runtime, but that a very specific, high-level VM can yield good(ish) results -- BEAM is a very slow VM compared to HotSpot or V8 -- at relatively little effort, because the high-level constraints imposed by the language are used to restrict the scope of the runtime. The JVM, by contrast, has required a much bigger investment to provide superb results across a wide variety of languages (HotSpot with its next-gen JIT is comparable to V8 at running JavaScript and to PyPy at running Python, and not too far behind gcc at running C). The price BEAM pays for that decision is that going beyond the very narrow execution profile it supports well requires implementing the code in C. That is why most large Erlang applications are mixed Erlang/C applications (Erlang for the control plane, C for the data plane), while JVM applications and libraries require virtually no native code (aside from the runtime itself, which is also moving more and more functionality to Java -- the next-gen JIT is written entirely in Java).

The difference between the JVM (at least HotSpot; there are lots of JVMs) and BEAM is that BEAM is a reasonable, Erlang-specific (or languages with similar semantics to Erlang) VM, while HotSpot is a state-of-the-art, general purpose(ish) VM, with many, many man-centuries behind it.


Another very large difference between BEAM and the JVM is that BEAM operates on 'reductions': when a process has exhausted its fair share of reductions, another process is scheduled. So even though the scheduler is not preemptive (interrupt based), the effect is very much the same as if it were, with less overhead.


But that's an implementation detail of high-level user-mode threads implemented on top of kernel threads, and something you can implement on the JVM (I know I have) at the language level, just like BEAM implements it in C.
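To make the claim concrete, here is a minimal sketch (all names invented, not anyone's actual implementation) of BEAM-style reduction counting done at the language level in Java: each "process" gets a budget of reductions, and when it exhausts the budget it is re-queued and the next runnable process gets the scheduler. The preemption is cooperative, but with a fine-grained budget the effect approximates BEAM's fair scheduling.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ReductionScheduler {
    // A "process" consumes up to `reductions` work units per turn and
    // reports whether it still has work left afterwards.
    interface Proc {
        boolean run(int reductions);
    }

    private final Deque<Proc> runQueue = new ArrayDeque<>();
    private final int budget;

    ReductionScheduler(int budget) { this.budget = budget; }

    void spawn(Proc p) { runQueue.addLast(p); }

    void runAll() {
        while (!runQueue.isEmpty()) {
            Proc p = runQueue.removeFirst();
            if (p.run(budget)) runQueue.addLast(p); // not done: reschedule
        }
    }

    public static void main(String[] args) {
        ReductionScheduler sched = new ReductionScheduler(1000);
        for (int id = 0; id < 2; id++) {
            final int procId = id;
            sched.spawn(new Proc() {
                int remaining = 2500; // total "work units" for this process
                public boolean run(int reductions) {
                    int step = Math.min(reductions, remaining);
                    remaining -= step;
                    System.out.println("proc " + procId + " did " + step
                            + " reductions, " + remaining + " left");
                    return remaining > 0;
                }
            });
        }
        sched.runAll(); // the two processes interleave fairly, 1000 at a time
    }
}
```

The real scheduling decisions in BEAM are of course richer (priorities, multiple scheduler threads, reduction costs per BIF), but the core round-robin-on-budget-exhaustion loop is no more than this.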


The JVM could have had like 99% of all the VM market by now had Sun just opted to fix two things back in the day:

* GC intrinsics, so you could implement functional languages easily.

* Tail calls, so you could implement functional languages easily.

I note LLVM made the same mistake :)
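The hoop-jumping the lack of tail calls forces is well known: a functional-language compiler targeting the JVM has to trampoline. Here is a small illustrative sketch (my own, not from any particular compiler) where each tail call returns a thunk instead of calling directly, and a driver loop bounces until a result appears -- a direct tail-recursive version of the same function would overflow the Java stack.

```java
import java.util.function.Supplier;

public class Trampoline {
    // A computation step: either a finished value or a deferred next call.
    interface Step<T> { }
    static final class Done<T> implements Step<T> {
        final T value;
        Done(T v) { value = v; }
    }
    static final class More<T> implements Step<T> {
        final Supplier<Step<T>> next;
        More(Supplier<Step<T>> n) { next = n; }
    }

    // The driver loop: bounce on the trampoline until the value appears.
    @SuppressWarnings("unchecked")
    static <T> T run(Step<T> step) {
        while (step instanceof More) {
            step = ((More<T>) step).next.get();
        }
        return ((Done<T>) step).value;
    }

    // Tail-recursive sum of 1..n in accumulator style; each "tail call"
    // is a heap-allocated thunk instead of a stack frame.
    static Step<Long> sum(long n, long acc) {
        if (n == 0) return new Done<>(acc);
        return new More<>(() -> sum(n - 1, acc + n));
    }

    public static void main(String[] args) {
        // A million recursive steps, constant stack depth.
        System.out.println(run(sum(1_000_000, 0))); // prints 500000500000
    }
}
```

With VM-supported tail calls, `sum` could simply call itself and the whole `Step`/`run` apparatus (and its per-call allocation) would disappear.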


> I note LLVM made the same mistake :)

The tail call problem is a problem of interoperability with C ABIs, not a "mistake" in LLVM. If LLVM hadn't made the decisions that it did, it wouldn't have gotten any traction at all, since you wouldn't be able to link LLVM code with any other libraries (including the system ones).


The JVM has nearly 99% of the non-Windows server VM market (you can't beat MS on Windows). And tail calls are coming once they matter enough to the users.

What do you mean by GC intrinsics?


I'm talking about getting academia on board back in 1996 here. If you look at the industrial market, then sure; but I'm looking at what it takes to get languages that don't look like the normal imperative piece of crap to run on the JVM, and without tail calls that is just hoop-jumping.

As for the GC, the interplay is the ability to tell the runtime where your pointers are. This is one of the places that usually falls short, because many VMs assume a particular calling convention, like Java's, say. The JVM is a bit better here since it uses a stack-based engine, which is somewhat simpler to handle.

Functional compilers rarely, if ever, use the standard conventions for handling this, especially if they want to avoid boxing polymorphic parameters, and expand them for speed.
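The boxing of polymorphic parameters being referred to is visible even from plain Java: a generic function type must box every primitive argument on each call, while a hand-specialized variant works on raw values. A compiler for a polymorphic functional language either eats this cost or generates specialized variants itself (the JVM's planned value types are aimed at removing the need for this). A minimal illustration:

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

public class BoxingDemo {
    // Polymorphic path: each call boxes the int argument into an Integer
    // object and unboxes the Integer result.
    static int applyGeneric(Function<Integer, Integer> f, int x) {
        return f.apply(x);
    }

    // Specialized path: the JDK provides IntUnaryOperator precisely so the
    // int can flow through without any allocation.
    static int applySpecialized(IntUnaryOperator f, int x) {
        return f.applyAsInt(x);
    }

    public static void main(String[] args) {
        System.out.println(applyGeneric(n -> n + 1, 41));     // 42, with boxing
        System.out.println(applySpecialized(n -> n + 1, 41)); // 42, without
    }
}
```

HotSpot's escape analysis can often eliminate the box in hot paths, but a compiler cannot rely on it, which is why specialization (or custom calling conventions) shows up in functional-language backends.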

We could have had far better client-side and academic penetration of the JVM by now, had Sun played their cards differently. But alas, they didn't, and we are stuck with a Server VM market only.


When you say academia you mean academic PL research, which makes up a very small portion of CS academia. Most CS people care about FP just as much as the industry does, and the JVM is quite popular in the algorithmic fields (maybe not as much as C, but more than any other managed runtime).

Aside from tail-calls (which will come once people really ask for them), the JVM is about to have pretty much everything functional languages can ever need (value types and excellent box-elision), and the new JIT (Graal) is the biggest breakthrough in compiler technology in the last decade or so. PL academics are drooling over it. ECOOP had a full-day workshop dedicated just to Graal.

And you don't want to change calling conventions, because language interoperability is one of the JVM's greatest strengths. Here it is running JS, R and C, in the same REPL with Graal: https://dl.dropboxusercontent.com/u/292832/useR_multilang_de...


While I'm sympathetic to bashing non-functional stuff, the JVM holds the record for hosted languages. The CLR got the tail-call thing partially right (AFAIK they didn't work well and weren't guaranteed), but due to the Windows-only course MS chose, it got almost no third-party language support.

The client-side issue had much more to do with the terrible UI stuff and the insistence on cross-platform (hint: actual end users rarely care about cross-platform, since they tend to run only one platform at any time). I believe MS was particularly enamored with Java, but Sun slapped them away.

Client side was not lost due to lack of functional support. This should be trivially verifiable by the fact that fp is still only lightly used in industry.


Not sure why this JVM comparison gets pulled out all the time; to anyone with a little JVM experience it is clear the author does not know the JVM well. Please don't.

Sent the author of the paper some mails, but sadly got no reply.


Do you have a better article to point to? I'm eager to read about it!


No, sorry, I haven't found one. Most articles are written either by Erlang or by Java evangelists. I would wish for a comparison from someone with experience in both and without an agenda.


The 3-5 usec overhead, with VM optimization flag, for dirty schedulers is pretty good.

What's the overhead of just passing the data through to a regular NIF? Probably gets buried in the jitter caused by cache and memory access times...

(For others: if you don't know about the Erlang VM and didn't understand the first couple of paragraphs, dirty schedulers are a new feature that solves the problem of running user-created, long-running C extension code inside the Erlang VM without blocking the rest of the VM.)


With jitter, the call time for a constant, in a dynamically linked library, is around 1200ns:

	  enacl_nif:crypto_box_ZEROBYTES/0                  
	           value  ------------- Distribution ------------- count    
	            1000 |                                         0        
	            1100 |@@@@                                     110903   
	            1200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      879765   
	            1300 |                                         7131     
	            1400 |                                         689      
	            1500 |                                         218      
	            1600 |                                         45       
	            1700 |                                         29
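For what it's worth, a rough in-process analogue of that DTrace histogram can be collected on any runtime by timing calls with a monotonic clock and bucketing the deltas. This Java sketch (purely illustrative; `System.nanoTime` itself costs tens of nanoseconds and the JIT warms up over the run, so it is far cruder than DTrace's view) shows the shape of such a measurement:

```java
public class CallHistogram {
    static final int BINS = 20;
    static final int BIN_WIDTH = 100; // 100 ns buckets, like the DTrace output

    // Stands in for a trivial call returning a constant (like a tiny NIF).
    static int constant() { return 16; }

    // Time `iterations` calls and count them into 100 ns buckets.
    static long[] measure(int iterations) {
        long[] counts = new long[BINS];
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            sink += constant();
            long dt = System.nanoTime() - t0;
            counts[(int) Math.min(dt / BIN_WIDTH, BINS - 1)]++;
        }
        if (sink == -1) System.out.println(); // defeat dead-code elimination
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = measure(1_000_000);
        for (int b = 0; b < BINS; b++) {
            if (counts[b] > 0)
                System.out.printf("%6d ns  %d%n", b * BIN_WIDTH, counts[b]);
        }
    }
}
```

The advantage of DTrace over this kind of harness is that it observes the running system from outside, with no instrumentation code perturbing the loop being measured.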


Thanks. That is a bit higher than I would expect but not too bad.


Do Java Native Interfaces suffer the same sort of problem?


Java maps its threads to native OS threads one-to-one, so if you execute in a Java thread and then jump into C, you'll block that particular thread. Erlang implements N:M concurrency, where N Erlang processes map to M schedulers (a scheduler here is a separate OS thread), and N is usually much greater than M. So you might have N=100K processes mapped to M=16 threads; if one of those processes makes a call to a C module, it might block a large number (say 10K) of other processes from executing.
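The same hazard appears on the JVM as soon as you multiplex many tasks over few carrier threads, e.g. with an `ExecutorService`. A small sketch (illustrative only) with a single "scheduler" thread shows one long blocking call, standing in for a long-running NIF, stalling a trivial task queued behind it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingDemo {
    public static void main(String[] args) throws Exception {
        // One carrier thread, like a single BEAM scheduler.
        ExecutorService sched = Executors.newFixedThreadPool(1);

        // Stands in for a long-running C call: occupies the only thread.
        sched.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        });

        long queued = System.nanoTime();
        Future<?> victim = sched.submit(() -> { }); // trivial "process"
        victim.get();
        long waitedMs = (System.nanoTime() - queued) / 1_000_000;

        // The trivial task waited roughly the full 200 ms of the blocker.
        System.out.println("trivial task waited " + waitedMs + " ms");
        sched.shutdown();
    }
}
```

Dirty schedulers solve exactly this for BEAM by shunting the blocking call onto a separate pool of OS threads, leaving the normal schedulers free to keep running the other N processes.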



