BLOG ON CAMLCITY.ORG: WasiCaml
The portability story behind WasiCaml - by Gerd Stolpmann, 2021-07-15
As the name suggests, WebAssembly provides a fairly low-level virtual machine for running the code. The instructions are comparable to the ones you find in a CPU, e.g. load, store, arithmetic. The code is structured into functions which take a fixed number of parameters and return a single result. The functions can have local variables that can be read and written by the code. The parameters and variables can have one of four numeric types (i32, i64, f32, and f64).
For example, this is a WebAssembly module with just one function that increments a 32 bit number at a memory location by one, and returns the value:
(module
  (import "env" "memory" (memory $memory 1))
  (func $incr (export "incr") (param $x i32) (result i32)
    (local.get $x)
    (i32.load)
    (i32.const 1)
    (i32.add)
    (return)
  )
)
Here, the code is given in the textual format known as WAT. For running it, you first need to convert it to the binary format (WASM), e.g. with a tool like wat2wasm.
Also note that there is an operand stack: local.get pushes its result onto this stack, and i32.load loads the number from the address found on the stack and also pushes the result onto the stack. This stack is mainly meant to express the code in a very compact way. The engine running the code normally translates the stack operations into a more efficient form before starting up.
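To make the role of the operand stack concrete, here is a minimal sketch in Python that steps through the body of $incr by hand. It is not how a real engine works internally; the names memory and run_incr are made up for this illustration, and the only assumptions are little-endian 32 bit loads and wrap-around addition, as in WebAssembly.

```python
import struct

PAGE_SIZE = 65536
memory = bytearray(PAGE_SIZE)  # one page, matching (memory $memory 1)


def run_incr(x):
    """Evaluate the body of $incr on an explicit operand stack."""
    stack = []
    stack.append(x)                     # local.get $x: push the address
    addr = stack.pop()                  # i32.load: pop the address ...
    value = struct.unpack_from("<I", memory, addr)[0]
    stack.append(value)                 # ... and push the loaded number
    stack.append(1)                     # i32.const 1
    rhs = stack.pop()
    lhs = stack.pop()
    stack.append((lhs + rhs) & 0xFFFFFFFF)  # i32.add wraps to 32 bits
    return stack.pop()                  # return: the result is on top


# Put the number 41 at address 16 and run the function on that address.
struct.pack_into("<I", memory, 16, 41)
print(run_incr(16))  # prints 42
```

Since the function body contains no store instruction, the memory cell itself still holds 41 afterwards; only the returned value is incremented.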
A WebAssembly VM is equipped with linear memory, i.e. the memory addresses go from 0 to a maximum address, without fragmentation, and without address ranges supporting special semantics like mapped files. The memory is only used for data - the running code is inaccessible (i.e. the VM has a Harvard architecture), and this also includes the call stack and other parts of the VM (e.g. you cannot iterate over the local variables of the functions). In order to also support indirect jumps, there is a way to reference functions by numeric IDs.
While the WebAssembly standard defines how to express the code and how to run it, there is still the question of how to use it with popular languages like C and Rust. The WASI standard is an ABI that answers a lot of these questions. As an ABI it defines calling conventions, but it is not limited to that. In particular, there is a version of libc that defines a Unix-like set of base functionality the language-specific runtime can use. Also, WASI defines a set of host functions that play a role comparable to system calls in the WebAssembly world, and that allow access to files, the process environment, and the current time. With the help of WASI you can compile many C or Rust libraries to WebAssembly, and the porting effort is low.
WASI is a multi-lingual environment, and you can in particular link code written in different languages into the same executable. This is possible because the language-specific runtimes have a common foundation (libc), and e.g. memory allocated from one language also counts as "taken" within the other language.
WASI is still in an early stage. While developing with it I discovered a couple of bugs, but the functionality is already impressive and usable for many purposes.
So now, what is WasiCaml, and how can I use it?
Let's assume you have a bytecode executable created by something like
ocamlc -o myexecutable mycode.ml
Now, you can further translate the bytecode executable to WebAssembly:
wasicaml -o mywasm.wasm myexecutable
If you want to run this executable, you need a specially configured WebAssembly engine which can be found in ~/.wasicaml/js after installation:
node ~/.wasicaml/js/main.js ./mywasm.wasm ./mywasm.wasm arg ...
The mywasm.wasm binary is portable.
For simplicity, wasicaml can also generate a wrapper that hides the node invocation, and this is triggered by just omitting the .wasm suffix:
wasicaml -o mywasm myexecutable
Now you can run the program simply with ./mywasm (but note that the wrapper is not portable).
Another option is to link in C libraries, e.g.:
wasicaml -o mywasm.wasm myexecutable -cclib ~/.wasicaml/lib/ocaml/libunix.a
Of course, the C library must also be WASI-compatible.
Note that WasiCaml-produced code cannot yet be run with wasmtime or wasmer, in particular because these engines have no machinery for exception handling. Browsers are fully supported, though.
WebAssembly is still a very new technology and information about it is scarce. For example, it took a while until I understood that LLVM includes a full-featured assembler for WebAssembly, i.e. you can feed it a code.s file, and you get a file back with partially linked WebAssembly code. This is documented nowhere, and I could only figure out some parts of the assembler syntax by reading the source code of LLVM.
The very first task was then to get the OCaml bytecode interpreter working in a WASI (plus EH) environment.
Essentially, this means that I wanted to (1) clone the OCaml source, (2) configure it, and (3) build the bytecode interpreter (and the whole OCaml bytecode toolchain). The C compiler comes from the WASI SDK, and it compiles directly to WebAssembly. Now, if you just set the CC variable to this C compiler, configure will consider the target as a cross-compile target. Such targets are still very tricky, and - because we actually can run the code somehow - I thought it better to avoid cross-compilation altogether, and to add some tooling so that binaries are directly runnable. Instead of pointing CC directly to the C compiler of the WASI SDK, there is now a wrapper script, wasi_cc.
The main purpose of this script is to reshape the WebAssembly executables so that they are directly runnable on the host system. This is accomplished by prepending a starter to the WebAssembly code. The starter runs the right driver script, and extracts the WebAssembly code from the executable file. For example, if you do
wasi_cc -o ex code.c
the resulting file ex can be run directly.
With this trick, configure now "thinks" that the target is a native target of the operating system. This way, configure could also run the tests on the existence of the various libc library functions the OCaml runtime needs, and figured out a lot of that stuff correctly. Nevertheless, not everything was working, and I had to fork the OCaml sources in order to disable functions that are not available.
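The idea of prepending a starter to the WebAssembly code can be sketched in a few lines of Python. This is only an illustration under assumptions: the actual file layout produced by wasi_cc is not described here, so the shell-script starter, the function names, and the way the module is located (by searching for the WASM magic bytes) are all hypothetical. The only hard fact used is that every WASM binary begins with the bytes 00 61 73 6D followed by the version 01 00 00 00.

```python
WASM_MAGIC = b"\x00asm"               # first four bytes of every WASM binary
WASM_VERSION_1 = b"\x01\x00\x00\x00"  # version field of the binary format


def prepend_starter(starter: bytes, wasm: bytes) -> bytes:
    """Build a combined file: host-runnable starter first, module after it."""
    if not wasm.startswith(WASM_MAGIC + WASM_VERSION_1):
        raise ValueError("not a WebAssembly binary")
    return starter + wasm


def extract_wasm(executable: bytes) -> bytes:
    """Recover the module by looking for the WASM header (assumed layout)."""
    pos = executable.find(WASM_MAGIC + WASM_VERSION_1)
    if pos < 0:
        raise ValueError("no embedded WebAssembly module found")
    return executable[pos:]


# Hypothetical starter: a real one would launch the engine with a driver
# script and pass it the extracted module.
starter = b"#!/bin/sh\n# starter placeholder\nexit 0\n"
module = WASM_MAGIC + WASM_VERSION_1  # header only: the smallest valid module
combined = prepend_starter(starter, module)
print(extract_wasm(combined) == module)
```

The combined file starts with a "#!" line, so the host system treats it like any other script, while the bytes after the starter remain an unmodified WebAssembly module.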
A final difficulty was that function pointers in WebAssembly are typed
- which is a logical consequence of the fact that functions are typed.
OCaml generates a file
prims.c that initializes the list
of FFI functions, and initially LLVM did not like this file, because
it could not infer the types of the function pointers. The solution
was not to generate WebAssembly for this single file but
to leave it as LLVM IR ("bitcode"). In this format function pointers
can remain untyped, and the LLVM linker is smart enough to fix up
the problem at link time, and to convert LLVM IR to WebAssembly when
the types of the FFI functions are known.
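Why typed function pointers matter can be illustrated with a small Python model of an indirect call. In WebAssembly, an indirect call names the expected function type, and the engine traps if the function found in the table has a different type. The classes and names below (FuncType, Func, Trap, call_indirect as a Python function) are invented for this sketch and only mimic that check.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class FuncType:
    params: Tuple[str, ...]
    results: Tuple[str, ...]


@dataclass
class Func:
    ftype: FuncType
    body: Callable


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


def call_indirect(table, index, expected, *args):
    """Call table[index], but only if its type matches the expected one."""
    if not 0 <= index < len(table):
        raise Trap("undefined element")
    func = table[index]
    if func.ftype != expected:
        raise Trap("indirect call type mismatch")
    return func.body(*args)


I32_TO_I32 = FuncType(("i32",), ("i32",))
I32_I32_TO_I32 = FuncType(("i32", "i32"), ("i32",))

# Functions are referenced by their numeric index in the table.
table = [
    Func(I32_TO_I32, lambda x: x + 1),
    Func(I32_I32_TO_I32, lambda a, b: a + b),
]

print(call_indirect(table, 0, I32_TO_I32, 41))  # prints 42
```

A caller that does not know the real signature of a primitive cannot pick the right expected type, which is the situation the compiler faced with prims.c before the types became known at link time.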
With this trick, everything worked fine! The bytecode interpreter did not slow down much in WebAssembly, which was very encouraging.
After the bytecode interpreter was running, the second step was to directly generate WebAssembly code from OCaml. Actually, there were two choices: either to pick up one of the internal formats of OCaml (e.g. "Lambda" or "C--") and to change the OCaml compiler directly, or to take the bytecode as the starting point. I preferred the latter because WasiCaml is then an add-on processor that can be easily added to existing OCaml projects, and because some difficulties could be avoided (e.g. incremental compilation, and many many fixups through the whole toolchain). Also, I hoped that the resulting speed would still be "good enough" (at least for the purposes of the DSL compiler we wanted to run with WebAssembly).
Bytecode also made it a lot easier for me to get started. There were really a lot of unanswered questions: what does the function call mechanism look like? How do we get around the problem that OCaml code typically requires tail calls to be working but there aren't tail calls in WebAssembly (yet)? What does the code look like to allocate a block of memory? How do we emulate exceptions? Picking bytecode meant that I could focus on these questions, while the bytecode instructions could initially be translated in a naive way, e.g. by translating each bytecode instruction separately to a fixed block of WebAssembly instructions (like instantiating a template). (Note that the current WasiCaml compiler is already a lot better than that.)
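The naive, template-based translation can be sketched as follows. This is a toy illustration, not WasiCaml's code: the three instructions (CONST, PUSH, ADD) are a made-up miniature instruction set with an accumulator and a stack in memory, and the WAT fragments, including the locals $accu and $sp, are assumptions chosen for the example.

```python
# Each toy instruction maps to a fixed block of WAT lines.
# "{arg}" is filled in with the instruction's operand, if it has one.
TEMPLATES = {
    "CONST": [
        "(local.set $accu (i32.const {arg}))",
    ],
    "PUSH": [
        "(local.set $sp (i32.sub (local.get $sp) (i32.const 4)))",
        "(i32.store (local.get $sp) (local.get $accu))",
    ],
    "ADD": [
        "(local.set $accu (i32.add (local.get $accu) (i32.load (local.get $sp))))",
        "(local.set $sp (i32.add (local.get $sp) (i32.const 4)))",
    ],
}


def translate(program):
    """Translate instruction by instruction, with no cross-instruction analysis."""
    lines = []
    for opcode, arg in program:
        for template in TEMPLATES[opcode]:
            lines.append(template.format(arg=arg))
    return lines


# accu := 1; push accu; accu := 2; accu := accu + pop
program = [("CONST", 1), ("PUSH", None), ("CONST", 2), ("ADD", None)]
for line in translate(program):
    print(line)
```

Every value takes a round trip through memory in this scheme, which is why it is only a starting point: a better translator keeps values in local variables for as long as that is safe.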
Picking bytecode also meant that WasiCaml inherits the bytecode stack. This is actually not a bad thing - because of OCaml's memory management the stack must reside in addressable memory, and the bytecode stack could serve as what the WebAssembly community calls a shadow stack. (Even for the C language there is a shadow stack - and the alternative would have been to also use the shadow stack of the C language.) So we got the shadow stack for OCaml code practically for free.
The stack is important because the garbage collector must be able to run over all locations where OCaml values are stored. As already mentioned, the locations WebAssembly natively supports (like local and global variables) cannot be traversed, and hence it is crucial to put OCaml values into memory whenever there is the chance of a garbage collector run.
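The following Python sketch shows the constraint in miniature. It is a deliberately simplified model, not the OCaml garbage collector: the heap is a dictionary, blocks do not reference each other, and the shadow stack is the only root set. All names (heap, shadow_stack, alloc, collect) are invented for the illustration. Python local variables play the role of WebAssembly locals, which the collector cannot see.

```python
heap = {}          # address -> payload; stands in for heap blocks
shadow_stack = []  # lives in addressable memory, so the collector can scan it
next_addr = 8


def alloc(payload):
    """Allocate a block and return its address."""
    global next_addr
    addr = next_addr
    next_addr += 8
    heap[addr] = payload
    return addr


def collect():
    """Free every block that is not referenced from the shadow stack."""
    live = set(shadow_stack)
    for addr in list(heap):
        if addr not in live:
            del heap[addr]


def careless():
    v = alloc("kept only in a local")  # invisible to collect()
    collect()
    return v in heap                   # the block is gone


def careful():
    v = alloc("spilled before the GC point")
    shadow_stack.append(v)             # spill to the shadow stack
    collect()
    v = shadow_stack.pop()             # reload after the call
    return v in heap                   # the block survived


print(careless(), careful())  # prints False True
```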
Note that the native OCaml compiler is not much different in this respect - only that the native stack of the operating system can be used for storing values because it resides in memory. The details are different, though. When a value is moved temporarily to the stack, this is usually called "register spilling", and this is done because (1) there is only a limited number of registers, but another register is needed, or (2) you don't know which register remains untouched when you call a function, or (3) you call some code that may run the garbage collector. Now, in WebAssembly, reason (1) is never the case because there can be any number of local variables (which take over the role of registers), and the details of (3) are very different, because in a native environment the registers are global stores, permitting some time-saving tricks that are unavailable in WebAssembly.
So, for developing the WasiCaml code emitter, this meant that it had to follow constraints so that OCaml values end up on the stack at the right moment. Actually, these constraints mainly shaped the layout of the WasiCaml code.
Once WasiCaml was working, we got back to the DSL compiler we originally wanted to make cross-platform. And we actually got it running! There was one remaining problem, though: WebAssembly is a 32 bit environment. As you may know, OCaml suffers from some limitations in this case. Most annoyingly, strings can only be 16 MB in size at most.
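Where the 16 MB come from can be checked with a little arithmetic, written here as a Python snippet. It assumes the usual block header layout of 32 bit OCaml (8 tag bits, 2 colour bits, and the remaining 22 bits for the block size in words), and that a string always uses at least one byte of its last word for padding.

```python
WORD_BYTES = 4                   # word size on a 32 bit platform
HEADER_BITS = 32
TAG_BITS = 8
COLOUR_BITS = 2
SIZE_BITS = HEADER_BITS - TAG_BITS - COLOUR_BITS  # 22 bits for the size in words

max_words = 2**SIZE_BITS - 1                      # largest block, in words
max_string_length = max_words * WORD_BYTES - 1    # minus the padding byte

print(max_string_length)                          # prints 16777211
print(max_string_length / 2**20)                  # just under 16 (MB)
```

The result matches the value reported by Sys.max_string_length on 32 bit OCaml systems.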
Fortunately, this problem occurred only here and there, mostly in the code emitter. Here, we could switch to ropes as alternate representation - and, lucky as we were, it turned out that this change did not eat much performance.
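For readers who have not met ropes before, here is a minimal sketch in Python. It is not the implementation used in the DSL compiler, which is not shown here; the class and function names are made up. The point is that concatenation only builds a small tree node, so no single string ever has to hold the whole output, and the text can be written out piece by piece.

```python
class Leaf:
    """A rope node holding one ordinary (small) string."""

    def __init__(self, text):
        self.text = text
        self.length = len(text)


class Concat:
    """A rope node joining two ropes without copying their contents."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length


def chunks(rope):
    """Yield the leaf strings from left to right, without recursion."""
    pending = [rope]
    while pending:
        node = pending.pop()
        if isinstance(node, Leaf):
            yield node.text
        else:
            pending.append(node.right)  # visited after the left subtree
            pending.append(node.left)


# Emit code fragment by fragment; each fragment stays far below any size limit.
rope = Concat(Concat(Leaf("(module "), Leaf("(func $f)")), Leaf(")"))
print(rope.length)            # prints 18
print("".join(chunks(rope)))  # prints (module (func $f))
```

In a real emitter the chunks would be written to the output channel one after another instead of being joined into one string.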
The DSL compiler is quite big, and the WebAssembly version takes around 3 seconds to start up. This is longer than usual, but for our application we could hide the startup time, and are now quite happy with the product.
PS. Interested in WebAssembly and you know OCaml (or another functional language like Elm, Scala, Haskell, ...)? We might have a job for you (July 2021).