BLOG ON CAMLCITY.ORG: Plasma

Plasma: Map/Reduce for Ocaml

Plasma MapReduce and PlasmaFS - by Gerd Stolpmann, 2010-06-25

I'm very proud to announce the public availability of Plasma MapReduce, a map/reduce compute framework, and PlasmaFS, the underlying distributed filesystem. All of this is written in Ocaml, which now makes it possible to develop map/reduce programs in a functional programming language.

Plasma MapReduce is a distributed implementation of the map/reduce algorithm scheme. In a sentence, map/reduce performs a parallel List.map on an input file, sorts and splits the output by some criterion into partitions, and runs a List.fold_left on each partition. The difference is that it does not do this sequentially, but in a distributed way, chunk by chunk. Because of this, Plasma MapReduce can process very large files, and if it runs on enough computers, it does so in reasonable time. Of course, map and reduce are Ocaml functions here.
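To make the one-sentence description concrete, here is a minimal sequential sketch of the scheme using a word count as the job. It is not the Plasma MapReduce API: the names map_line, partition_of, reduce_partition and map_reduce are hypothetical, the input is an in-memory list of lines rather than a file in PlasmaFS, and hashing the key is just one assumed choice of partitioning criterion.

```ocaml
(* Sequential sketch of the map/reduce scheme. Not the Plasma API;
   all function names here are hypothetical. *)

(* map step: turn one input record (a line) into key/value pairs *)
let map_line (line : string) : (string * int) list =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* partitioning criterion (assumed): hash of the key modulo the
   number of partitions *)
let partition_of ~(partitions : int) (key : string) : int =
  Hashtbl.hash key mod partitions

(* reduce step: List.fold_left over one sorted partition, summing the
   counts of adjacent pairs that share a key *)
let reduce_partition (pairs : (string * int) list) : (string * int) list =
  List.fold_left
    (fun acc (k, v) ->
       match acc with
       | (k', total) :: rest when k' = k -> (k', total + v) :: rest
       | _ -> (k, v) :: acc)
    [] pairs
  |> List.rev

(* map, sort, split into partitions, then reduce each partition *)
let map_reduce ~(partitions : int) (input : string list)
  : (string * int) list list =
  let mapped = List.concat (List.map map_line input) in
  let sorted = List.sort compare mapped in
  List.init partitions (fun p ->
    List.filter (fun (k, _) -> partition_of ~partitions k = p) sorted
    |> reduce_partition)
```

Calling map_reduce ~partitions:2 ["a b a"; "b c"] yields two partitions that together contain one (word, count) pair per distinct word. A distributed implementation has to do the same steps across machines and on chunks of a file that does not fit into memory, which this sketch does not attempt.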

This all works on top of a distributed filesystem, PlasmaFS. This is a user-space filesystem that is primarily accessed over RPC (but it can also be mounted as an NFS volume). Actually, most of the development effort went into this part. PlasmaFS focuses on reliability and on speed for big block sizes. To achieve this, it implements ACID transactions, replicates data and metadata with two-phase commit, uses a shared memory data channel if possible, and monitors itself. Unlike other filesystems for map/reduce, PlasmaFS implements the complete set of usual file operations, including random reads and writes. It can also be used as a general-purpose global filesystem.

Both pieces of software are bundled together in one download. Here is the project page.

This is an early alpha release. A lot of things already work, and you can run map/reduce jobs. However, it is in no way complete.

Gerd Stolpmann works as an O'Caml consultant
This web site is published by Informatikbüro Gerd Stolpmann