BLOG ON CAMLCITY.ORG: Ocamlnet
Ocaml and multicore programming - by Gerd Stolpmann, 2011-04-12
The approach of Netmulticore is unusual, but AFAIK the only one that works without modifying the Ocaml runtime and/or compiler. Instead of using kernel threads and implicitly sharing memory, Netmulticore forks full-fledged processes as thread replacements, and allocates explicit shared memory that is accessible by all worker processes. By doing this allocation directly at program startup, it is ensured that all processes see the shared memory block at the same address (which would not be the case if the processes mapped the block individually). The shared block is now treated in a very special way - shared memory has to cope with some difficulties that normally do not exist. All in all, this creates a setup where each process has its normal Ocaml heap (which is not shared with other processes) plus access to the shared block. This is not a bad situation provided we can make the access to the shared block as convenient as possible, and this is what Netmulticore is really about. If we can achieve a nice programming API for dealing with shared data, the multicore issue is solved, and probably even in a more scalable way than in other runtimes where all threads have to share a single heap, and constantly step on each others' feet.
So what we want to have is that we can store normal Ocaml values into the shared block - including all regular data representations such as tuples, variants, records. Also, we want to allow modifications of values - immutable shared memory is not worth the fun. Netmulticore solves both issues - it provides direct read access to Ocaml values stored in the shared block, and it allows modifications (but the programmer has to follow a special API).
The initially allocated shared block is broken down in smaller units called shared heaps. The heaps do not have a fixed size but can grow and shrink (just like the regular Ocaml heap). The heaps are now the containers for the Ocaml values: When creating a heap, one can copy an initial Ocaml value into it, and by following the special rules for mutation it is possible to put more values into it (or remove values). The shared heaps are structured similarly to the normal Ocaml heap, and contain the value blocks densely packed one after the other. Free areas are managed with a free list. When the shared heap fills up, a specially implemented garbage collector tries to find unreachable values and reclaims the space (using the mark-and-sweep design). Shared heaps have a lock which synchronizes accesses to it, and, unfortunately, limits the degree of concurrency. Especially, only one process can write to a heap at a time. The programmer can, if necessary, work around this limitation by using several heaps - each heap has its own lock, so tricky application designs can avoid lock contention.
This sounds like a nice idea, but there are some traps. In the next two paragraphs I'll try to give an impression of the difficulties. The essence is, after all, that managing shared heaps requires a disciplined programmer, or the memory gets corrupted. This is the downside of the Netmulticore approach - it is quite easy to crash the program by not following one of the programming rules. (This is, however, also true for "normal" multi-threaded programming.)
The first difficulty is that this design requires that the shared heaps are self-contained. This means that no pointer must ever exist that points from a shared heap to the normal process-local Ocaml heap, or from a shared heap to a different shared heap. The first kind of pointer would cause invalid memory accesses if a second process dereferenced such a pointer. The second kind of pointer confuses the garbage collector. What is still allowed, of course, are pointers from process-local memory to the shared heaps. (The garbage collector built into the Ocaml runtime fortunately does not follow such pointers.) However, one should be careful: the garbage collector cleaning the shared heap from time to time will not see that such "external" pointers exist, and will not keep the referenced data alive. It is left to the programmer to do something about it.
The second difficulty is how to actually do mutation in shared heaps. If you have every written a memory manager, you'll probably know the problem: Each allocation can cause a GC run, and this can invalidate what you've just put into the heap but is not yet considered as reachable by the GC. The solution is to keep a set of further roots, i.e. pointers the GC must also consider although they are not in the memory region the GC manages. I omit here the details - the point is that mutation requires a special procedure so that such additional roots can be managed. This is a bit like declaring arguments of wrapper functions with the CAMLparam macros. A similar convention exists for Netmulticore, only that it has to be done on the Ocaml level.
In some sense, these modules are "ports" of the corresponding data structures that are provided by the standard library. During porting I had especially to change the way the data is mutated so the mentioned programming rules are followed. (This is the main reason why these ports exist - you cannot put a normal Hashtbl into shared memory, because the normal mutation breaks the rules for shared heaps. There is no such problem when read-only data structures are copied to shared heaps.)
You'll probably have to check out the whole tree to build it.
The skeptical reader is very encouraged to look into the examples, just to see how nice the code using Netmulticore looks. It is the first time you can use shared memory from Ocaml without having to deal with data marshalling issues (because we don't use this technique). Actually, the code looks very much like multi-threaded programming, only that here and there different primitives need to be used.
It is very important to create sample programs that use a library like Netmulticore, because problems only show up in practice, and are hard to predict. There are already three non-trivial examples, and I've plans to write a few more. Expect also blog articles about this, and how the performance is (and the numbers I got so far are promising).
The Netmulticore implementation has reached some "beta quality", at least. We need a few improvements here and there, but generally the code exists and works.
It is generally a good idea to watch out how to make Netmulticore programming safer. As pointed out before, it is right now easy to crash the program (with a segfault) when missing one of the special programming rules. The OCaml compiler could perhaps help here and prevent some mistakes. What I've especially in mind here is a typing annotation whether a value is in a shared heap. This annotation would normally be invisible to the user (like the polarity annotation), but jump in at compile time when a reference to a non-shared value is stored into a shared value. However, this annotation would probably require a modification of the compiler.
If anybody is interested in testing Netmulticore, please write me. Optimizing programs for multicore is tricky, and getting more experience here would allow us to make a good step forward.