
BLOG ON CAMLCITY.ORG: Ocamlnet

Camlcity.org gets a shared cache

Ocaml and multicore programming - by Gerd Stolpmann, 2011-05-07

The website camlcity.org (where this blog is published) runs special server software written in Ocaml. Recently I made it faster by introducing a cache that is directly shared by several worker processes. The cache module consists of only a few lines of code, and makes use of the new Netmulticore library.

Netmulticore is a part of Ocamlnet, and because the camlcity.org software is also developed with Ocamlnet, it was quite natural and simple to use this shared memory interface.

But let's step back first and look at the special problem that was solved. Camlcity.org is not a simple web server - it is a cascade of a front-end server and several back-end servers. The front-end, more or less, mixes the data coming from the back-ends and transforms it into a presentable form using a template engine. The back-ends are also HTTP servers. This is shown in this picture:

The nice aspect of this architecture is that the back-ends can be deployed individually, and can run on different machines than the front-end.

So, for example, if you view this blog post on camlcity.org, the text of the blog article comes from a back-end server, and the front-end creates the frame with the navigation elements. If you view this blog with an RSS reader, the front-end just wraps the back-end text differently so an RSS file is generated instead of a web page.

This architecture creates a little performance problem, though: processing a user request requires quite a number of accesses to the back-end servers. Not only does the article text need to be fetched, but also the required templates, and for generating the navigation elements some neighbor texts (parents and siblings) need to be requested from the back-ends as well. This can add up to a dozen or more requests, which was the reason why using camlcity.org often felt a bit sluggish.

Note that only multi-processing is used: there is a master process starting as many worker processes as needed (the workers can be of different types, here front-ends or back-ends). The workers can run in parallel, and can even take advantage of several processor cores. Also, the workers are fully separated from each other, so that a malfunction (including a crash) of one worker does not affect the other workers. A good feature for software running 24/7.

However, multi-processing makes it difficult to share data between workers. The workers have only their own process-local memory, and normally cannot make data available to others. Well, Netmulticore changes the game at this point.

The improved architecture introduces a cache on the front-end side. This cache stores all back-end responses that are suspected to be requested again soon:

This cache resides in explicitly allocated shared memory (using the POSIX interface shm_open). The Netmulticore library is used to manage this block of shared memory. Among other data structures, there is also Netmcore_hashtbl, an adaptation of the well-known Hashtbl module of the standard library for use in shared memory.

As this code is really short and nice, I just show it here:

type cache_obj =
    [ `Fields of Json_type.t * string option 
    | `Refs of Json_type.t
    | `Html of Nethtml.document list
    ]

type cache_hdr =
    { mutable lock : Netmcore_mutex.mutex;
      mutable next_gc : float
    }

type cache =
  (string, float * string * string * cache_obj, cache_hdr) Netmcore_hashtbl.t

The values of type cache_obj are the payload data stored in the cache. As you see, we can store not only strings, but also structured Ocaml values (with limitations, though). The shared hashtable features a so-called header, which exists once per hashtable. Here, cache_hdr includes a mutex (to ensure that only one process can write at a time) and the field next_gc, which is the point in time when the hashtable will next be checked for elements exceeding their lifetime. The cache, finally, maps URLs (given as strings) to tuples (timeout, path, url, object). Here, timeout is the point in time when the element needs to be evicted from the cache, and the path and URL are further metadata.

The cache lookup is as easy as:

let cache_lookup path =
  let cache = get_cache() in
  let (t_out, real_path, real_url, obj) = Netmcore_hashtbl.find_c cache path in
  if Unix.time() >= t_out then raise Not_found;
  (real_path, real_url, obj)

Note that get_cache is left out here because it involves a bit of application-specific management code.
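
For readers who want to play with the expiry logic without Ocamlnet, here is a purely process-local stand-in, a sketch using the standard Hashtbl and a fake clock instead of Unix.time. The names clock, store and lookup are mine, not part of the real code; there is no sharing and no locking here, only the same (timeout, path, url, object) tuples and the same expiry check:

```ocaml
(* Process-local model of the cache logic: no shared memory, no
   locking, just the tuple layout and the expiry check. [clock]
   stands in for Unix.time so the example is deterministic. *)
let clock = ref 0.0

let model : (string, float * string * string * string) Hashtbl.t =
  Hashtbl.create 64

let store path real_path real_url obj ~timeout =
  Hashtbl.replace model path (!clock +. timeout, real_path, real_url, obj)

let lookup path =
  let (t_out, real_path, real_url, obj) = Hashtbl.find model path in
  if !clock >= t_out then raise Not_found;  (* expired entries count as misses *)
  (real_path, real_url, obj)
```

Advancing the fake clock past an entry's timeout makes lookup raise Not_found, just as cache_lookup does once Unix.time() passes t_out.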

The function Netmcore_hashtbl.find_c creates a copy of the values found in the shared cache. This is required because we cannot allow pointers to shared values to escape the scope of this module - such pointers need special treatment (there are some programming rules to be followed). The copy is put into normal process-local memory, so these rules no longer apply.
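
The effect of such a copy can be mimicked in plain OCaml with a Marshal round-trip. To be clear, find_c copies between heaps directly and does not use Marshal; this sketch only illustrates the resulting semantics - the copy is structurally equal to the original but physically distinct:

```ocaml
(* Copy-on-read, illustrated: round-tripping a value through Marshal
   yields a structurally equal but physically distinct copy, similar
   in effect to copying a value out of the shared heap. *)
let copy (v : 'a) : 'a =
  Marshal.from_string (Marshal.to_string v []) 0

let original = ["alpha"; "beta"]
let copied = copy original
```

After the copy, copied = original holds (structural equality) while copied == original does not (the copy lives at a different address), so the copied value can be used freely, without any shared-memory rules.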

For storing a value in the cache we have:

let cache_store path real_path real_url obj =
  try
    let cache = get_cache() in
    let now = Unix.time() in
    let t_out = now +. float !cache_default_timeout  in
    let hdr = Netmcore_hashtbl.header cache in
    Netmcore_mutex.lock hdr.lock;
    ( try
        if now >= hdr.next_gc then (
          let l = ref [] in
          Netmcore_hashtbl.iter
            (fun p (t,_,_,_) ->
               (* Warning: p, t are in shared mem *)
               if now >= t then l := p :: !l
            )
            cache;
          List.iter
            (fun p ->
               (* Warning: p is in shared mem *)
               Netmcore_hashtbl.remove cache p
            )
            !l;
          (* Floats are boxed! *)
          Netmcore_heap.modify
            (Netmcore_hashtbl.heap cache)
            (fun mut ->
               hdr.next_gc <- Netmcore_heap.add mut t_out
            );
        );
        if Netmcore_hashtbl.length cache < !cache_limit then
          Netmcore_hashtbl.replace
            cache path (t_out, real_path, real_url, obj);
        Netmcore_mutex.unlock hdr.lock;
      with
        | error -> 
            Netmcore_mutex.unlock hdr.lock;
            raise error
    )
  with
    | Netmcore_mempool.Out_of_pool_memory ->
        Netlog.logf `Warning "Shared cache: Out of pool memory"

We use a lock to ensure that only one process can write at a time. This lock is managed with Netmulticore's Netmcore_mutex module. Essentially, the lock guarantees that all modifications done at write time happen atomically, and thus consistency is preserved.
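
The try ... with dance around the critical section in cache_store is the usual release-on-both-paths pattern: the lock must be released whether the body returns normally or raises. It can be factored into a small helper; this is a sketch using plain functions so it runs without any thread library, where acquire and release are stand-ins for locking and unlocking a real (shared) mutex:

```ocaml
(* The unlock-on-both-paths pattern from cache_store, factored into a
   reusable helper. [lock]/[unlock] are any paired acquire/release
   actions, e.g. locking and unlocking a Netmcore_mutex.mutex. *)
let with_lock ~lock ~unlock f =
  lock ();
  match f () with
  | result -> unlock (); result
  | exception e -> unlock (); raise e

(* Demonstration with a flag instead of a real mutex: *)
let held = ref false
let acquire () = held := true
let release () = held := false
```

With this helper, the body of cache_store's critical section could be passed as f, and the lock could never leak on an exception path.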

As you see, we now and then throw out all elements exceeding their lifetime. For simplicity, this is done by iterating over the whole hashtable and checking each element. The keys of the found elements are gathered up in l, and are removed in a second step. Note that the iteration gives us direct pointers into shared memory, e.g. p is a string residing in shared memory. One has to be very careful with such values, because Netmulticore provides fewer guarantees about how long such values exist than Ocaml programmers are used to. For example, once a key p is removed from the table, the string counts as no longer referenced and can be deleted by Netmulticore's internal memory manager - even if we still have the p variable (because Netmulticore cannot cooperate with Ocaml's memory manager for this purpose).
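
The two-phase sweep (collect the expired keys first, remove them afterwards) is needed because removing bindings while iterating over a hash table is unsafe; the same restriction holds for the standard Hashtbl. A minimal stdlib sketch of the pattern, with made-up example data:

```ocaml
(* Two-phase expiry sweep, as in cache_store above: first collect the
   expired keys during iteration, then remove them in a second pass,
   because removing bindings while iterating is unsafe. *)
let table : (string, float) Hashtbl.t = Hashtbl.create 16

let sweep now =
  let expired = ref [] in
  Hashtbl.iter (fun k t -> if now >= t then expired := k :: !expired) table;
  List.iter (Hashtbl.remove table) !expired
```

In the shared-memory version the extra wrinkle is that the collected keys point into shared memory, so they must be used before the removal invalidates them.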

Another strange thing is the Netmcore_heap.modify function. It is required for modifying shared values in-place, here next_gc. The value t_out is a float stored in normal process-local memory. Assigning it directly to next_gc would create an illegal pointer from shared memory to local memory (resulting in a crash). By using the "write protocol" as shown here, the float is copied to shared memory before doing the assignment.
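
The reason next_gc cannot simply be assigned is that OCaml floats are boxed (except inside float arrays and all-float records): a float value is a pointer to a heap block. The unsafe Obj module can make this visible; this is purely an illustration, not something to use in production code:

```ocaml
(* Floats are boxed: a float value is a pointer to a heap block with
   tag Obj.double_tag. This is why assigning a process-local float to
   hdr.next_gc directly would plant a pointer from shared memory into
   local memory. Ints, by contrast, are immediate (unboxed) values. *)
let f = 3.14
let () = assert (Obj.tag (Obj.repr f) = Obj.double_tag)  (* f is a heap block *)
let () = assert (Obj.is_int (Obj.repr 42))               (* 42 is immediate *)
```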

The solution is surprisingly short. It was never so simple to profit from shared memory in Ocaml programs. The reason is that we do not need to deal with serialization formats to translate values to strings - we just store values directly! It should also be noted that there are additional dangers resulting from shared memory. The worker processes are no longer completely isolated from each other - we made an exception by sharing memory for the cache. If a worker fails to comply with all the programming rules required for accessing shared memory, not only this worker will crash, but all workers. Another risk is the shared locks. Imagine what happens when a worker is terminated in the middle of the cache_store function (e.g. by a signal sent from outside). The lock will never be released again, and the other workers will wait forever for it.

Anyway, these risks are manageable, and are roughly equivalent in severity to the risks multi-threaded programming is also exposed to. In summary, Netmulticore solves some of the problems arising from multi-processing, and is definitely worth considering.

Gerd Stolpmann works as O'Caml consultant
This web site is published by Informatikbüro Gerd Stolpmann