BLOG ON CAMLCITY.ORG: Ocamlnet 3

Stranger in a strange land

What's new in Ocamlnet 3: The Win32 port - by Gerd Stolpmann, 2009-10-20

Ocamlnet is being renovated, and there will soon be a first testing version of Ocamlnet 3. The author, Gerd Stolpmann, explains in a series of articles what is new, and why Ocamlnet is the best networking platform ever seen. One of the fundamental and intriguing improvements is the port to Win32 - after all, Win32 isn't well-known as a good platform for asynchronous programming, but Ocamlnet is based on this programming style. A number of hard problems had to be tackled.

In the POSIX world (let me use "POSIX" as a general term for Unix/Linux/BSD) the select() system call is known as the linchpin for dispatching file descriptor events. Generally, a program using select() looks like

(* Event loop: *)
while <something to do> do
  <find out interesting descriptors>
  select();
  <interpret events>
done

and the crucial point is that all kinds of descriptors can be passed as input to select(). This makes it possible to wait simultaneously for very different events, like socket events, pipeline events, or events for devices. There is no such universal select() in Win32. The select() call provided by Win32 only works for sockets. There are other mechanisms for event handling, but they are less systematic, and one faces the problem that different ways of watching for events need to be integrated into the single event loop.
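On a POSIX system the loop outlined above can be written directly with Unix.select from the Ocaml standard distribution. The following is only a minimal sketch, not code from Ocamlnet: the function name drain_with_select is made up for this illustration, and it assumes a current Ocaml with the Bytes module and the unix library linked in. It watches a single pipe, but any mix of descriptors could be put into the list.

```ocaml
(* Sketch of a POSIX select() loop; needs the unix library.
   drain_with_select is a hypothetical name used only here. *)
let drain_with_select () =
  let rd, wr = Unix.pipe () in
  (* Put some data into the pipe, then close the write end so that
     the reader eventually sees end-of-file. *)
  ignore (Unix.write_substring wr "hello" 0 5);
  Unix.close wr;
  let buf = Bytes.create 1024 in
  let out = Buffer.create 16 in
  let eof = ref false in
  while not !eof do                        (* <something to do> *)
    let watched = [ rd ] in                (* <find out interesting descriptors> *)
    (* A negative timeout means: wait without time limit *)
    let readable, _, _ = Unix.select watched [] [] (-1.0) in
    List.iter                              (* <interpret events> *)
      (fun fd ->
         let n = Unix.read fd buf 0 (Bytes.length buf) in
         if n = 0 then eof := true
         else Buffer.add_subbytes out buf 0 n)
      readable
  done;
  Unix.close rd;
  Buffer.contents out
```

The point of the sketch is that the read is only attempted after select() has reported the descriptor as readable, so the loop never blocks inside Unix.read.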

Since version 3.11, Ocaml includes a "fancy" version of select() in its standard library, which actually tries to emulate the POSIX semantics by combining several Win32 event-handling approaches in a quite tricky way. The reader might ask what the whole point for Ocamlnet is then - one could simply have relied on this emulation. However, there are a number of drawbacks. First, some of the emulation techniques used are incredibly simplistic. For example, if one of the descriptors references the output side of a pipe, the implementation falls back to a form of busy waiting (wasting CPU time). The input side of a pipe always signals that the pipe has space for new data (which may block the program). There is no special support for named pipes, although Win32 supports them much better than anonymous pipes. The second problem is that the emulation is limited to 64 descriptors only - far too few for servers [please see update below]. For all these reasons, Ocamlnet does not make use of this select() emulation, but follows its own, more ambitious approach. Actually, the basic problem of this emulation is that select() is not the right level of abstraction for combining multiple ways of waiting for events into a single operation.

Update: I've got a message from Sylvain Le Gall, the implementor of this emulation code. He explained that actually 4096 handles can be waited for (which is true, sorry for my mistake). Also, he points out that he does not know how to handle anonymous pipes better, and that even Cygwin uses the same technique as his code. The reason for not treating named pipes specially is that the Ocaml standard library does not support them anyway. - I think his implementation decisions are perfectly rational for a general-purpose and drop-in select() emulation, and the standard library is now way better than before his contribution. However, Ocamlnet has special needs (like using named pipes as substitute for Unix Domain sockets), and by switching to pollsets (see below) as the API for handling events the system resources can be better managed.

Overlapped I/O

If you asked Microsoft whether Windows supported asynchronous programming well, they would point you proudly to overlapped I/O. Actually, overlapped I/O is a form of asynchronous I/O, but it is nevertheless very different from select()-style I/O. It works a lot like a non-blocking TCP connect: One has to start the I/O operation, and it is signaled to the caller when the operation is completed. The signal can be a callback (APC), or it can be an event variable that is set to signaled state. The difference from select() is that the latter indicates in advance whether an I/O operation would be possible (and non-blocking) before the operation is started, whereas overlapped I/O requires that the operation is actually initiated, and one can only wait asynchronously for its completion. (Look here for how Microsoft explains overlapped I/O.)

This article is not about which style is better, but how to port programs using Ocamlnet that assume select() loops as their basic construction principle. So the question here is: Can we get some kind of emulation of select() for overlapped I/O?

Before answering, let us look at what we would need overlapped I/O for. Essentially, one can use it for files, sockets, and a few IPC mechanisms. Ocamlnet does not have support for reading or writing files asynchronously anyway, and for sockets it is easy to use the Win32 call WSAWaitForMultipleEvents() in order to combine waiting for sockets with other events. Actually, Ocamlnet is mostly interested in overlapped I/O for named pipes, because named pipes are a good replacement for the otherwise missing Unix Domain sockets and socketpairs. (Note that Win32 named pipes are a different IPC mechanism than POSIX named pipes, and support connection multiplexing like TCP sockets.)

The Win32 functions ReadFile() and WriteFile() can be used to start an overlapped I/O request by passing an OVERLAPPED struct to them (for a non-overlapped request the argument would remain NULL). This causes the function to return immediately, and the completion of the operation is signaled to the caller by an IPC mechanism. In Ocamlnet Win32 events are used for that purpose. A Win32 event is a synchronization primitive that works similarly to a condition variable: It can be in unsignaled or in signaled state, and it is possible to suspend the program until it enters signaled state. Win32 events have the big advantage that they provide the required level of genericity: Many Win32 objects are actually either subtypes of Win32 events, and implement the event interface directly, or they can be connected to Win32 events. For example, a process handle is also an event, and it is signaled when the referenced process is terminated - so waiting for the process handle as event means waiting until the process is finished. It is also possible to wait until one of several events enters signaled state (WaitForMultipleObjects()). When an event is put into the OVERLAPPED struct, Windows notifies the user about the completion of the operation by signalling the event.

Collecting events and waiting until one of them is signaled already sounds a lot like select(). Still, the problem remains that one has to start the operation before one can wait for it. There is no way around it on the Win32 level. The only chance for porting Ocamlnet was to change the level of abstraction the user code sees. Actually, it was possible to provide an emulation, but the price is that the user code must no longer invoke the generic read/write operations, but special wrappers that do the required impedance transformation. So Unix.read and Unix.write are forbidden when dealing with named pipes, and instead the special wrapper functions Netsys_win32.pipe_read and Netsys_win32.pipe_write have to be called. Ocamlnet provides buffers for input and output, so that pipe_read only reads from the input buffer (and raises EWOULDBLOCK if the buffer is empty), and that pipe_write only writes to the output buffer (and raises EWOULDBLOCK if the buffer is full). In addition to that, Ocamlnet arranges for overlapped I/O operations to be started in the background when data needs to be pumped from the named pipe to the input buffer, or from the output buffer to the named pipe. This way, the overlapped operation is hidden from the user, and Ocamlnet can provide a view so that an event signals when a pipe_read or a pipe_write can actually process data.
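The buffering idea for the read direction can be modelled in a few lines of plain Ocaml. This is a toy model under stated assumptions and not the Netsys_win32 implementation: the names in_buffer, pump_in, buffered_read and the exception Would_block are invented for the illustration, the Win32 event is reduced to a boolean flag, and the completion of an overlapped read is simulated by calling pump_in by hand.

```ocaml
(* Toy model of the input side; all names are hypothetical. *)
exception Would_block   (* stands in for Unix.EWOULDBLOCK *)

type in_buffer = {
  data : Buffer.t;            (* bytes already pumped in from the pipe *)
  mutable rd_event : bool;    (* models the Win32 event: true = signaled *)
}

let create_in_buffer () = { data = Buffer.create 64; rd_event = false }

(* Simulates the completion of a background (overlapped) read:
   the received bytes are appended and the event is signaled. *)
let pump_in b s =
  Buffer.add_string b.data s;
  b.rd_event <- Buffer.length b.data > 0

(* What the user-visible read wrapper does: it never touches the pipe
   itself, it only takes data out of the buffer. *)
let buffered_read b dst pos len =
  let avail = Buffer.length b.data in
  if avail = 0 then raise Would_block;
  let n = min len avail in
  Buffer.blit b.data 0 dst pos n;
  (* Keep the unread rest in the buffer *)
  let rest = Buffer.sub b.data n (avail - n) in
  Buffer.clear b.data;
  Buffer.add_string b.data rest;
  (* The event stays signaled only while there is still data *)
  b.rd_event <- Buffer.length b.data > 0;
  n
```

The write direction would be symmetrical: a bounded output buffer, a write wrapper that raises the exception when the buffer is full, and a background operation that drains the buffer into the pipe.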

Here is a small subset of the named pipe API provided by Ocamlnet in the module Netsys_win32:

type w32_event   (* Win32 event objects *)
type w32_pipe    (* A pipe endpoint *)
type pipe_mode = Pipe_in | Pipe_out | Pipe_duplex

val pipe_pair : pipe_mode -> w32_pipe * w32_pipe    
  (* like socketpair *)

val pipe_read : w32_pipe -> string -> int -> int -> int
  (* like Unix.read *)

val pipe_write : w32_pipe -> string -> int -> int -> int
  (* like Unix.write *)

val pipe_shutdown : w32_pipe -> unit
  (* like Unix.shutdown *)

val pipe_rd_event : w32_pipe -> w32_event
val pipe_wr_event : w32_pipe -> w32_event
  (* get the events notifying about read/write possibility *)

val wsa_wait_for_multiple_events : 
      w32_event array -> int -> int option
  (* wait for a number of events, or until a timer times out *)

For instance, this code reads from two named pipes p1 and p2 simultaneously, and outputs the data to stdout:

let s = String.create 1024

let try_read p =
  try 
    let n = Netsys_win32.pipe_read p s 0 1024 in
    if n=0 then raise Exit;   (* deal somehow with eof *)
    print_string (String.sub s 0 n)
  with Unix.Unix_error(Unix.EWOULDBLOCK,_,_) -> ()

let loop() =
  try
    let e1 = Netsys_win32.pipe_rd_event p1 in
    let e2 = Netsys_win32.pipe_rd_event p2 in
    while true do
      match Netsys_win32.wsa_wait_for_multiple_events [| e1; e2 |] (-1) with
        | None -> ()
        | Some _ ->
            try_read p1;  (* always try both for simplicity of the example *)
            try_read p2
    done
  with
    | Exit -> ()

Note that p1 and p2 have type Netsys_win32.w32_pipe, and not Unix.file_descr.

This example has the shape of the select() loop outlined at the beginning of the article. There are still differences, though, to the POSIX way of doing it: The file handle provided by the OS is hidden by the Netsys_win32 layer, and cannot be directly used by the program (because this could break the abstraction). Also, one first has to create event objects (here by calling pipe_rd_event) in order to set up waiting. Last but not least, the emulation itself is also not free of subtle artefacts introduced by the Netsys_win32 layer. In particular, there is no way of cancelling the overlapped I/O operations performed under the hood of the emulation (one can only close/disconnect the pipe to stop them). This can be an issue when the file descriptor is passed on to other processes. (N.B. Windows Vista promises to solve the cancellation issue, but I have not yet had a chance to test it.)

Of course, this approach only works when the watched Win32 file object implements overlapped I/O. If not, one can only read and write synchronously, and Ocamlnet provides special helper threads for dealing with this issue. This is discussed in more detail below. First let's look at how to generalize select() so it can also be backed by the Win32 call WSAWaitForMultipleEvents().

The pollset class type

The select() call is reflected by the Ocaml standard library as a function

val Unix.select :
  file_descr list -> file_descr list -> file_descr list -> float ->
    file_descr list * file_descr list * file_descr list

This interface has a few disadvantages. First, in every round of waiting one has to pass all descriptors to select(). This is time-consuming, and the reason for the bad reputation of select() with regard to performance (although in reality it is not as bad as some bloggers pretend). Second, there is no way to cancel an already started select() from a different thread. This is important for multi-threaded programs, because a second thread may want to change the list of descriptors the first thread is watching.

Ocamlnet now uses a different interface for polling descriptors, so-called pollsets:

class type pollset =
object
  method find : Unix.file_descr -> poll_req_events
  method add : Unix.file_descr -> poll_req_events -> unit
  method remove : Unix.file_descr -> unit
  method wait : float -> 
                ( Unix.file_descr * 
                  poll_req_events * 
                  poll_act_events ) list
  method dispose : unit -> unit
  method cancel_wait : bool -> unit
end

These sets are used in this way: Descriptors may be added and removed from the set, and for each descriptor one can specify which events to watch for (reading or writing). When the set is ready, the user can invoke wait to start waiting for the specified events. The function returns the events that are actually signalled by the OS. It is possible to cancel waiting at any time by calling cancel_wait true.
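To make the usage pattern concrete, here is a deliberately simplified pollset that is backed by Unix.select. It is a sketch for illustration and not one of the Ocamlnet classes: the record types req and act replace poll_req_events and poll_act_events, the class name select_pollset is invented, and dispose and cancel_wait are left out. The unix library is assumed.

```ocaml
(* Simplified stand-ins for poll_req_events / poll_act_events *)
type req = { want_rd : bool; want_wr : bool }
type act = { can_rd : bool; can_wr : bool }

(* Hypothetical pollset built on Unix.select *)
class select_pollset = object
  (* The set persists between calls of wait *)
  val tbl : (Unix.file_descr, req) Hashtbl.t = Hashtbl.create 16

  method find fd = Hashtbl.find tbl fd
  method add fd r = Hashtbl.replace tbl fd r
  method remove fd = Hashtbl.remove tbl fd

  (* Wait up to tmo seconds; return only descriptors with events *)
  method wait tmo =
    let rds =
      Hashtbl.fold
        (fun fd r acc -> if r.want_rd then fd :: acc else acc) tbl [] in
    let wrs =
      Hashtbl.fold
        (fun fd r acc -> if r.want_wr then fd :: acc else acc) tbl [] in
    let rd_ok, wr_ok, _ = Unix.select rds wrs [] tmo in
    Hashtbl.fold
      (fun fd r acc ->
         let a = { can_rd = List.mem fd rd_ok; can_wr = List.mem fd wr_ok } in
         if a.can_rd || a.can_wr then (fd, r, a) :: acc else acc)
      tbl []
end
```

A caller would add a descriptor once with ps#add fd { want_rd = true; want_wr = false } and then call ps#wait repeatedly. This sketch still rebuilds the descriptor lists on every call, so it only shows the interface, not the performance benefit a real backend can get from a persistent set.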

I had not only the Win32 port in mind when designing the pollsets, but also POSIX-type OS. For example on Linux there is the epoll API that operates on a similar data structure, and that can easily back a pollset implementation.

The Win32 implementation of pollsets is done in two layers. The basic class Netsys_pollset_win32.pollset already supports all kinds of descriptors Ocamlnet needs, but is restricted to watching at most 64 Win32 event objects (corresponding to 63 sockets, or 31 named pipes). This restriction is lifted by Netsys_pollset_win32.threaded_pollset. However, the latter class requires that the program is multi-threaded.

Essentially, the implementation works by inspecting the descriptors to be watched, and by looking up the required helper objects (like calling Netsys_win32.pipe_rd_event to get the Win32 event object reflecting the read status of a pipe). After that, WSAWaitForMultipleEvents() is invoked to start waiting, and when events happen, they are mapped back from the signaled event objects to the connected file descriptors. The cancel_wait feature is supported by always adding an additional Win32 event object to the set of watched events which is set to signaled state when cancel_wait is called.

Of course, this is only a rough sketch of the algorithm. It is quite complicated which helper objects are actually needed, and how they affect the central WSAWaitForMultipleEvents() call. Of course, this depends very much on the type of the descriptors put into the pollset, and it would go too far to fully present these details in this article.
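The trick of always watching one extra object that exists only for cancellation has a well-known POSIX counterpart, the so-called self-pipe trick. The following sketch shows that counterpart with Unix.select; it is not the Win32 code, where an event object plays the role of the pipe. The names canceller, cancel and wait_cancellable are made up, and unlike cancel_wait the cancellation here is one-shot (it is consumed by the next wait).

```ocaml
(* Hypothetical POSIX analogue of the extra "cancel" event *)
type canceller = { wake_rd : Unix.file_descr; wake_wr : Unix.file_descr }

type outcome =
  | Cancelled
  | Ready of Unix.file_descr list

let create_canceller () =
  let r, w = Unix.pipe () in
  { wake_rd = r; wake_wr = w }

(* May be called from another thread: makes wake_rd readable *)
let cancel c =
  ignore (Unix.write_substring c.wake_wr "x" 0 1)

(* Wait for readability of fds, but also watch the wake-up descriptor *)
let wait_cancellable c fds tmo =
  let ready, _, _ = Unix.select (c.wake_rd :: fds) [] [] tmo in
  if List.mem c.wake_rd ready then begin
    (* Consume the wake-up byte so that the next wait blocks again *)
    ignore (Unix.read c.wake_rd (Bytes.create 1) 0 1);
    Cancelled
  end
  else Ready ready
```

The design choice is the same on both platforms: cancellation is turned into just another event source, so the waiting call itself needs no special support for being interrupted.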

However, one thing should not remain "magic" to the reader: In the above paragraphs, I pointed out that the representation of Win32 objects like named pipes is complex (e.g. it includes buffers, OVERLAPPED structs, and Win32 event objects), and that an opaque type like Netsys_win32.w32_pipe needs to hide the details of the representation from the user. Also, I mentioned that using the Unix.file_descr of the named pipe handle would break the abstraction, and that the handle is made unavailable to user code for this reason. However, pollsets nevertheless use file descriptors for passing system objects around. How does this fit together?

Ocamlnet does not give up on Unix.file_descr as the central type for referencing system objects - switching to a different type for this purpose would break tons of user code. Instead, a tricky mechanism has been added allowing us to keep Unix.file_descr but also to attach further management objects to such descriptors. This is explained in detail below. The crucial idea is that Ocamlnet introduces artificial descriptors that are only used for identifying system objects but that cannot be used for actually performing I/O. So the descriptor handed out to user code for a named pipe is not the Win32 handle for the named pipe (which would allow user code to do I/O directly and thus to break the abstraction), but it is an additionally allocated handle that only exists for the purpose of identifying the system object. This handle, from now on called proxy descriptor, is the value passed to pollsets and other interfaces assuming Unix.file_descr as the type for referencing system objects.

Handling I/O with helper threads

Before looking at proxy descriptors, I should briefly present how Ocamlnet deals with other file handles than named pipes.

For sockets everything is very easy. As mentioned, the pollset implementation is based upon WSAWaitForMultipleEvents() which is actually a Winsock function. It supports sockets directly - no tricky emulation layers are required.

Win32 distinguishes anonymous pipes as returned by Unix.pipe from named pipes. Anonymous pipes do not support overlapped I/O. As this kind of pipe is important for starting subprocesses, Ocamlnet nevertheless tries to provide an asynchronous API for them. Because only synchronous I/O is possible, helper threads need to be created which implement buffers in much the same way as it is done for overlapped I/O: The helper threads pump data from the buffer to the pipe, or from the pipe to the buffer (depending on the direction of I/O). The user code only accesses the buffer in a non-blocking way, and Win32 event objects are used to signal the state of the buffer (empty or full). The resulting API looks very much like the API for named pipes, and it is also required that the special read and write functions of the API are called by user code instead of Unix.read and Unix.write, and there are also proxy descriptors. As the implementation is done by helper threads, there is the difficulty of how to stop these threads when there is no more interest in watching the descriptors. Unfortunately, this is not possible in the general case - when the pipe "hangs" the thread will also hang, and there is no means to interrupt it (there are no signals (software interrupts) in Win32, and thread cancellation is a hot issue). As anonymous pipes are mostly used for driving external processes this seems to be acceptable (there is always the fallback solution to kill the process).

The Win32 consoles are supported in the same way as anonymous pipes.

Even processes can be waited for. Although there is no direct data flow (neither read nor write make sense in any way), processes are referenced by means of file handles. When the handle is set to signaled state, this means that the process has terminated. So process handles can be added to pollsets, and this makes it easy to wait for the termination of a subprocess in parallel to managing the I/O over the pipes that are connected with the process.

For other types of file handles there is no good support yet (unless one creates the mentioned helper threads). Of course, adding support would be easy for all handles where Win32 allows overlapped I/O. However, this seems not to be urgent.

Proxy descriptors

Back to the trick Ocamlnet uses to keep Unix.file_descr while having complex management objects for controlling asynchronous I/O. For example, one can get a proxy descriptor for a named pipe by calling:

val pipe_descr : w32_pipe -> Unix.file_descr

The returned descriptor cannot be used for anything except for looking up the attached named pipe:

val lookup_pipe : Unix.file_descr -> w32_pipe

(There is also a slightly more general lookup function that can be used for any type of Win32 object using proxy descriptors.)

The proxy descriptors are backed by real file handles (otherwise it could happen that the next open() returns the same handle, and the proxy descriptor would no longer be identifiable as such), but a cheap kind of handle was chosen to avoid too much resource consumption. There is a hidden global table that maps proxy descriptors to the referenced complex objects, and by GC trickery it is ensured that the table shrinks when proxy descriptors are freed by GC runs (note that Unix.file_descr is a heap-allocated value for Win32, so we can add finalisers).

Of course, user code has to close the proxy descriptors when they are no longer needed (but only when they were actually requested). This means they have the same "lifetime" as normal file descriptors which also need to be closed after use.
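The bookkeeping behind proxy descriptors can be pictured as a global table from descriptors to management objects. The sketch below is a toy model in plain Ocaml, not the Ocamlnet code: the type proxy stands in for Unix.file_descr, pipe_state stands in for w32_pipe, the function names are prefixed with toy_ to mark them as invented, and the entry is removed by an explicit close instead of a GC finaliser.

```ocaml
(* Stand-in for the complex management object (w32_pipe) *)
type pipe_state = { name : string; inbuf : Buffer.t }

(* Stand-in for Unix.file_descr: identifies an object, allows no I/O *)
type proxy = Proxy of int

(* The hidden global table *)
let registry : (proxy, pipe_state) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Like pipe_descr: allocate a fresh proxy and remember the object *)
let toy_pipe_descr p =
  incr next_id;
  let d = Proxy !next_id in
  Hashtbl.replace registry d p;
  d

(* Like lookup_pipe: map the proxy back to the management object *)
let toy_lookup_pipe d =
  try Hashtbl.find registry d
  with Not_found -> failwith "toy_lookup_pipe: not a proxy descriptor"

(* Closing the proxy removes the table entry *)
let toy_close d = Hashtbl.remove registry d
```

In this model a fresh integer guarantees that a proxy is never confused with another descriptor; in the real implementation the same guarantee comes from backing every proxy with a real, cheap file handle.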

Some kind of generic API

The consequence of the chosen emulation approach is that for each kind of file object a different set of I/O functions needs to be called. This may be acceptable for special operations like connect where a generic approach is hard to get right, but is totally impractical for simple reading and writing. It would be required to call different functions that have very similar signatures, e.g. (read case) pipe_read for named pipes, input_thread_read for objects managed by helper threads, and of course the well-known Unix.recv for sockets and Unix.read for normal files.

In the Netsys module a simple generic approach of handling read and writes is available. There is a function inspecting the kind of file descriptor, and a set of generic functions for actually performing read/write:

type fd_style
  (* indicates the kind of descriptor (details omitted here) *)

val get_fd_style : Unix.file_descr -> fd_style
  (* get the file descriptor style *)

val gread : fd_style -> Unix.file_descr -> string -> int -> int -> int
  (* generic read: call the right implementation function depending
     on the fd style *)

val blocking_gread : fd_style -> Unix.file_descr -> string -> int -> int -> int
  (* similar to gread, but it is blocked until at least one byte can
     be read *)

val really_gread : fd_style -> Unix.file_descr -> string -> int -> int -> unit
  (* similar to gread, but it is blocked until exactly the passed number
     of bytes are read *)

(* similar functions are available for writing, for shutting down, and
   for closing
*)

If the descriptors are proxy descriptors, these functions automatically look up the underlying complex management object and invoke the right I/O function. If the descriptors are sockets, they call socket functions like Unix.recv. Otherwise, they fall back to Unix.read or Unix.write.
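The dispatch principle can be shown with a two-case version for POSIX systems. This is an illustrative sketch only: the type style and the functions style_of, generic_read and really_read are invented names, proxy descriptors are not modelled at all, and sockets are recognized via Unix.fstat, which is an assumption that holds on POSIX but is not how Netsys does it on Win32. The unix library is assumed.

```ocaml
(* Simplified stand-in for fd_style: only two cases *)
type style = Plain | Socket

(* Inspect the descriptor once; assumes a POSIX system *)
let style_of fd =
  match (Unix.fstat fd).Unix.st_kind with
  | Unix.S_SOCK -> Socket
  | _ -> Plain

(* Generic read: pick the implementation depending on the style *)
let generic_read style fd buf pos len =
  match style with
  | Socket -> Unix.recv fd buf pos len []
  | Plain -> Unix.read fd buf pos len

(* Read exactly len bytes, or raise End_of_file if the data ends early.
   Assumes a blocking descriptor. *)
let really_read style fd buf pos len =
  let got = ref 0 in
  while !got < len do
    let n = generic_read style fd buf (pos + !got) (len - !got) in
    if n = 0 then raise End_of_file;
    got := !got + n
  done
```

Passing the style as an explicit argument, as the signatures of gread and its relatives also do, means the descriptor has to be inspected only once and not on every read.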

Large parts of Ocamlnet have been ported so they use this generic layer instead of directly calling Unix.read or Unix.write. For example, the class Netchannels.input_descr wraps a netchannel object around a file descriptor, and it has been changed so it can now also deal with all kinds of descriptors supported by gread.

High-level I/O

Many users don't want to see all these implementation details I have reported so far. They want to just use the high-level I/O functions like Http_client. The question is what can be supported on Win32.

Fortunately, the answer is - thanks to dealing with these details carefully - that almost everything works! Although not every module has been fully tested yet, the difficult modules could be ported, and there is now the conviction that the simpler ones are not in any way problematic.

The most difficult case was Netplex. Of course, there is no way to support multi-processing as there is no fork() equivalent in Win32. However, multi-threading works well. The socketpairs connecting the containers with the controller have been replaced by pairs of connected named pipes. For Unix Domain sockets there is the possibility of using either named pipes, or Internet sockets bound to localhost.

As Netplex uses SunRPC as base library, it was of course also possible to port this Ocamlnet feature. SunRPC can be used not only on sockets, but also on named pipes.

Another difficult beast was the Shell library for starting and managing external processes. It is now as easy to create complex pipelines of interconnected subprocesses for Win32 as it used to be for POSIX.

The Nethttpd web server library could also be verified to be working, even in conjunction with Netplex.

Where to get Ocamlnet 3

There is no official release yet, not even an alpha release for developers. In order to get it, one has to check out the Subversion repository (use the svn command on this URL, or click on it to view it with your web browser - most of the discussed code lives in src/netsys).

The Win32 port of Ocamlnet requires the MinGW port of Ocaml. Also, the same set of base libraries is needed as for POSIX, especially PCRE. The simplest way to install them is to use GODI, which also supports MinGW.

Gerd Stolpmann works as O'Caml consultant

This web site is published by Informatikbüro Gerd Stolpmann
