BLOG ON CAMLCITY.ORG: OMake
Faster builds with omake, part 1: Overview - by Gerd Stolpmann, 2015-06-16
OMake is not only a build system (like e.g. ocamlbuild), but it also includes extensions that are important for controlling and customizing builds. There is an interpreter for a simple dynamically typed functional language. There is a command shell implementing utilities like "rm" or "cp" which is in particular important on non-Unix systems. There are system interfaces for watching files and restarting the build whenever source code is saved in the editor. In short, OMake is very feature-rich, but also, and this is the downside, it is also quite complex: around 130 modules and 80k lines of code. Obviously, it is easy to overlook performance problems when so much code is involved. For me as the developer seeing the sources for the first time the size was also a challenge, namely for identifying possible problems and for finding solutions.
This is what I got for omake-0.9.8.6 (note that a different computer was used for Windows, so you cannot compare Linux with Windows):
|Size n||Parallelism j||Number of modules (n^4)||Runtime Linux||Runtime Windows|
This clearly shows that there is something wrong, in particular for Linux as OS: For the n=8 number of 4096 modules, which is around 1.7 times of the 2401 modules for n=7, omake needs around three times longer (for a single-threaded build). For Windows, the numbers are slightly better: the n=8 build takes 2.5 of the time of the n=7 build. Nevertheless, this is quite far away from the optimum.
Note that this is not good, but it is also not a catastrophe. The latter shows up if you try to use ocamlbuild. I couldn't manage to build the n=7 test case at all: after 30 minutes ocamlbuild slowed down to a crawl, and progressed only with a speed of around one module per second. Apparently, there are much worse problems than with OMake. (Btw, it would be nice to hear how other build systems compete.)
|Size n||Parallelism j||Number of modules (n^4)||Runtime Linux
|n=7||j=1||2401||169 (3.8)||317 (1.1)|
|n=7||j=4||2401||59 (3.6)||163 (1.1)|
|n=8||j=1||4096||363 (5.3)||661 (1.3)|
|n=8||j=4||4096||144 (4.2)||330 (1.0)|
As you can see, there is a huge improvement for Linux and a slight one for Windows. It turns out that the Linux version ran into a Unix-specific issue of starting commands from a big process (the OMake main process reaches around 450MB). OMake used the conventional fork/exec combination for doing so, but it is a known problem that this does not work well for big process images. We'll come to the details of this later. The Windows version never suffered from this problem.
The scalability is now somewhat better, but still not great. For both Windows and Linux, the n=8 runs take now around 2.1 times longer than the n=7 runs.
Another aspect of the performance impression is how long a typical incremental build takes after changing a single file. At least for OMake, a good measure for this is the zero rebuild time: how long OMake takes to figure out that nothing has changed, i.e. the time for the second omake run in "omake ; omake":
|Parameters||Runtime Linux omake-0.9.8.6||Runtime Linux 2015-05-18
|n=7, j=1||16.8||8.4 (2.0)|
|n=8, j=1||39.2||15.6 (2.5)|
The time roughly halves. Note that you get a similar effect under Windows as OMake doesn't start any commands for a zero rebuild. Actually, most time is spent for constructing the internal data structures and for computing digests (not only for files but also for commands, which turns out to be the more expensive action).
As OCaml supports gprof instrumentation I also tried this but
without success. The problem is simply that gprof looks at the wrong
metrics, namely only at the runtimes of the two innermost function
invocations in the call stack. In OCaml this is usually something like
String.sub, i.e. at both
levels there are general-purpose functions. This is useless
information. We need more context for the analysis (i.e. more levels
in the call stack), but it depends very much from where the function
Another problem of gprof was that you do not see kernel time. For analyzing a utility like OMake whose purpose is to start external commands this is crucial information, though.
For measuring the size of OCaml values I used objsize.
Summarized, the following improvements were done:
There is still one problem I could not yet address, and this problem is mainly responsible for the long startup time of OMake for large builds. Unlike other build systems, OMake creates a dependency from the rule to the command of the rule, as if every rule looked like:
i.e. when the command changes the rule "fires" and is executed. This is an automatic addition, and it is very useful: When you start a build after changing parameters (e.g. include paths) OMake automatically detects which commands have changed because of this, and reruns these.
target: source1 ... sourceN :value: $(command)
However, there is a price to pay. For checking whether a rule is out of date it is required to expand the command and compute the digest. For a full build the time for this is negligible (and you need the commands anyway for starting them), but for a "zero rebuild" the commands are finally not needed, and OMake expands them only for the out-of-date check. As you might guess, this is the main reason why a zero rebuild is so slow.
It is probably possible to speed up the out-of-date check by doing a static analysis of the command expansions. Most expansions just depend on a small number of variables, and only if these variables change the command can expand to something different. With that knowledge it is possible to compile a quick check whether the expansion is actually needed. As any expression of the OMake language can be used for the commands, developing such a compiler is non-trivial, and it was so far not possible to do in my time budget.