BLOG ON CAMLCITY.ORG: Real Life
The Story Behind Hydro - by Gerd Stolpmann, 2008-03-14
For two years now I have been doing consulting work for Wink Technologies, which started out as a user-powered search engine but later switched to people search. Actually, people search began as an experiment, and was originally developed in O'Caml with some Java standard components. It turned out that O'Caml was very well suited for crawling and parsing, and our solution was convincing enough that we could go on with O'Caml as the implementation language. It was also a big plus that it was already possible to develop server clusters in O'Caml with Ocamlnet's SunRPC implementation - we only had to add a highly available directory and configuration service. However, the problem with SunRPC is that there is no good C++ implementation (interestingly, there is an acceptable one for Java, RemoteTea). Too bad, since SunRPC is simple and robust, and its type system matches O'Caml's quite well (there are records, arrays, variants, and option types).
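To make the type-system match concrete: XDR, the data description language of SunRPC, offers structs, arrays, discriminated unions, and optional data, and each of these has a direct O'Caml counterpart. The following is only a sketch with an invented data model (the types page and fetch_status and the function describe are hypothetical, and this is not the output of Ocamlnet's code generator):

```ocaml
(* Hypothetical data model, invented for illustration only. *)

(* Variant: the counterpart of an XDR discriminated union. *)
type fetch_status =
  | Fetched
  | Redirected of string   (* new location *)
  | Failed of int          (* error code *)

(* Record: the counterpart of an XDR struct. *)
type page = {
  url : string;
  title : string option;   (* option: counterpart of XDR optional data *)
  links : string array;    (* array: counterpart of an XDR array *)
  status : fetch_status;
}

(* Pattern matching covers every case of the variant and the option. *)
let describe (p : page) : string =
  let title = match p.title with Some t -> t | None -> "(untitled)" in
  let status =
    match p.status with
    | Fetched -> "ok"
    | Redirected target -> "redirected to " ^ target
    | Failed code -> "failed with code " ^ string_of_int code
  in
  Printf.sprintf "%s [%s]: %d links, %s"
    title p.url (Array.length p.links) status
```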
Looking for an alternative, somebody came up with ICE ("Internet Communications Engine"). This is a commercial product by ZeroC which is dual-licensed under the GPL (like e.g. MySQL). There are implementations for a number of languages, including C++, Java, Python, and PHP, but unfortunately not for O'Caml. Well, this is no surprise, but at least the other languages used in the company are covered. So we looked closer at ICE. Does it match our needs?
ICE follows the object-oriented paradigm. For an RPC middleware this means that a remote call is seen as sending a message to a remote object. Of course, it is possible to have several such objects of the same type, and creating instances is made possible by a class construct. Such a design is very acceptable, but unfortunately object orientation has often meant in the past that the rest of the type system was crippled. To some degree, this also happened to ICE - in particular, there are no variants and no option types. Well, not optimal, but there are at least clean "design patterns" for emulating these missing features with classes, and for pure OO languages like Java the ICE approach simplifies the language mapping.
Unlike CORBA, there is a fixed protocol the components have to use to talk to each other. That means a client in language X can directly contact a server in language Y, and there is no need for an intermediate instance to translate the protocol. Basically, this means you can use ICE without any infrastructure - no central server you are dependent upon. For developing massively parallel cluster services this is an essential requirement, because such central servers don't scale well enough, and are single points of failure.
For using ICE in a cluster context, there is the IceGrid add-on. Basically, this is a highly available directory and configuration service, and it serves a purpose similar to that of the service we had developed for SunRPC before. Clients ask IceGrid where to find their servers in the network, and IceGrid replies with a suggestion of TCP ports. This can be used for load-balancing and for high availability.
After ICE was found to be good enough, we needed an implementation for O'Caml. Well, this was my field - I had already developed the SunRPC support in Ocamlnet years ago, and this made me an expert in this type of work. It took only about 3 weeks until it was possible to generate client code, and about another week until server support was ready. However, it was still challenging work, because the ICE type system needed to be mapped to O'Caml's type system. Furthermore, the ICE reference manual was full of errors, and everything had to be checked against ZeroC's implementation.
The difficulty of the type mapping is that ICE demands that objects and exceptions can be downcast. O'Caml, however, does not support this operation, because there is no efficient implementation of downcasting for a type system like O'Caml's that includes structural subtyping. Nevertheless, downcasting is a reasonable operation in the context of RPC, and it is hardly possible to get around it.
Maybe an example demonstrates this best. In Slice, the IDL for ICE, one can easily define a hierarchy of classes (the syntax resembles Java's):
// sequence type used by bandMembers below
sequence<string> stringSeq;

class SearchResult {
    string url;
    string title;
};

class PeopleSearchResult extends SearchResult {
    string name;
};

class BandSearchResult extends SearchResult {
    string bandName;
    stringSeq bandMembers;
};
When the search engine returns a SearchResult item, it can also be one of the descendants of this class. Of course, a client of the search engine that simply wants to display the result needs to know all details, and thus downcasts SearchResult to the real subclass.
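Seen from plain O'Caml, the situation can be sketched as follows. This is a hand-written illustration, not Hydro output; the classes search_result and people_search_result and the function title_of are made up for the example. Upcasting with :> is checked statically and always succeeds, but there is no inverse operation that would recover the subclass from the upcast value.

```ocaml
(* Hand-written stand-ins for two of the Slice classes (illustration only). *)
class search_result (url : string) (title : string) = object
  method url = url
  method title = title
end

class people_search_result url title (name : string) = object
  inherit search_result url title
  method name = name
end

(* Accepts anything that has been coerced to the base type. *)
let title_of (r : search_result) : string = r#title

let () =
  let p = new people_search_result "http://example.org/jane" "Jane Doe" "Jane" in
  (* Upcast: the static type now forgets the method name. *)
  let r = (p :> search_result) in
  print_endline (title_of r)
  (* r#name would be rejected by the type checker, and the language has
     no operator that turns r back into a people_search_result. *)
```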
In a normal OO program one can get rid of this downcast by adding an operation for displaying the result:
class SearchResult {
    string url;
    string title;
    string display();
};
In an RPC context such an addition might be difficult, however, or may
break some other principle of the RPC design. Basically, RPC is about
marshalling data, and that means getting data out of the context of
one server and forcing them into the context of another server. The
"unity of data and operations", one of the OO principles, is
intentionally given up.
Note that ICE allows operations to be defined for classes, and operations are always executed in the context of the data. In this example, it would indeed be possible to define display in a reasonable way, and to avoid the downcast. However, display then becomes part of the protocol, although it is rather a detail of the client. Anyway, one quickly faces situations where downcasting is unavoidable.
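For comparison, this is what the display approach looks like inside a single O'Caml program, where no marshalling is involved. It is only a sketch: the class names, the formatting chosen for display, and the helper render are invented for the example. Every subclass supplies its own display, so the caller never has to know the real subclass.

```ocaml
(* Local, non-RPC illustration: dynamic dispatch replaces the downcast. *)
class virtual search_result (url : string) (title : string) = object
  method url = url
  method title = title
  method virtual display : string
end

class people_search_result url title (name : string) = object
  inherit search_result url title
  method display = name ^ " <" ^ url ^ ">"
end

class band_search_result url title (band_name : string)
    (members : string list) = object
  inherit search_result url title
  method display = band_name ^ ": " ^ String.concat ", " members
end

(* The caller only sees the base type and calls display on each item. *)
let render (results : search_result list) : string list =
  List.map (fun r -> r#display) results

let () =
  let results = [
    (new people_search_result "http://example.org/jane" "Jane" "Jane Doe"
     :> search_result);
    (new band_search_result "http://example.org/band" "Band" "The Examples"
       ["Ann"; "Bob"]
     :> search_result);
  ] in
  List.iter print_endline (render results)
```

The price is the one described above: once the objects travel over the wire, display has to be declared in the interface definition, and so it becomes part of the protocol.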
In the O'Caml mapping generated by Hydro, these three classes would appear like
class type o_SearchResult =
  object
    inherit o_Ice_Object
    method url : string ref
    method title : string ref
  end

class type o_PeopleSearchResult =
  object
    inherit o_SearchResult
    method name : string ref
  end

class type o_BandSearchResult =
  object
    inherit o_SearchResult
    method bandName : string ref
    method bandMembers : string array ref
  end

val as_SearchResult :
  #Hydro_lm.object_base -> o_SearchResult
val as_PeopleSearchResult :
  #Hydro_lm.object_base -> o_PeopleSearchResult
val as_BandSearchResult :
  #Hydro_lm.object_base -> o_BandSearchResult
This is a bit simplified, but shows the idea. The ICE classes are mapped to O'Caml classes with some hidden machinery. The data members appear as O'Caml methods returning references - the most direct translation of this concept. The class hierarchy corresponds to the hierarchy in ICE, so the O'Caml operator for upcasting, :>, can be directly used. The hidden machinery comes into play by inheriting from o_Ice_Object, the root of the ICE hierarchy, and by using object_base, an even smaller ancestor that defines the marshalling core.
The downcast operation is emulated by defining conversion functions for every class type: as_PeopleSearchResult checks whether the argument is in reality a PeopleSearchResult, and if so, casts it to this class type. If not, an exception is raised.
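One way such conversion functions can be written in plain O'Caml is sketched below. This is an assumption about a possible technique, not Hydro's actual machinery: here every object carries a runtime view of itself in an extensible variant (which requires O'Caml 4.02 or later), and the conversion function inspects it. The names runtime_view, Invalid_cast, and the implementation classes are hypothetical; object_base is a local stand-in for Hydro_lm.object_base, and the class types are reduced versions of the ones shown above.

```ocaml
(* Extensible variant: each mapped class contributes one constructor. *)
type runtime_view = ..

(* Local stand-in for Hydro_lm.object_base (assumption, not the real type). *)
class type object_base = object
  method runtime_view : runtime_view
end

class type o_SearchResult = object
  inherit object_base
  method url : string ref
  method title : string ref
end

class type o_PeopleSearchResult = object
  inherit o_SearchResult
  method name : string ref
end

type runtime_view +=
  | Search_result of o_SearchResult
  | People_search_result of o_PeopleSearchResult

exception Invalid_cast of string

(* Hypothetical implementations; each one reports its most specific view. *)
class search_result_impl (u : string) (t : string) = object (self)
  val url = ref u
  val title = ref t
  method url = url
  method title = title
  method runtime_view = Search_result (self :> o_SearchResult)
end

class people_search_result_impl (u : string) (t : string) (n : string) =
object (self)
  inherit search_result_impl u t
  val name = ref n
  method name = name
  method! runtime_view = People_search_result (self :> o_PeopleSearchResult)
end

(* Emulated downcast: succeeds only if the object really is of that class. *)
let as_PeopleSearchResult (o : #object_base) : o_PeopleSearchResult =
  match o#runtime_view with
  | People_search_result p -> p
  | _ -> raise (Invalid_cast "PeopleSearchResult")

let () =
  let p = new people_search_result_impl "http://example.org/jane" "Jane" "Jane Doe" in
  let r = (p :> o_SearchResult) in          (* static type forgets name *)
  print_endline !((as_PeopleSearchResult r)#name)
```

Calling as_PeopleSearchResult on a plain search_result_impl raises Invalid_cast in this sketch, which corresponds to the failure case described above.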
Of course, this emulation is a bit inconvenient, but this is mostly a problem of generating good code. From a user's perspective, there is not much difference between calling a generated conversion function and using a built-in language operation. It does, however, make the whole generated code a lot more difficult to understand.
In the people search context we now use both SunRPC and ICE. The former is arguably better when only O'Caml components have to talk with each other, and the latter is for crossing the language boundaries.
Links: