BLOG ON CAMLCITY.ORG: GODI
PXP-1.2.1 with a new reference manual - by Gerd Stolpmann, 2009-02-03
Writing documentation is something programmers do not like very much, and in the case of PXP, too, the code was far ahead of any description of it. There was a "User's Guide", but it took an oldish approach to explaining things that I no longer like. It was also very incomplete. Last year, I got some funding from a company to improve the PXP documentation, so I faced the task of reorganizing it completely and adding everything that was missing.
If you would like to take a look at the result, here it is: The PXP Reference.
The old "User's Guide" was written as a docbook document. This is a good general-purpose text format that allows one to structure a large text into chapters, sections, etc., and to generate viewable and printable output from it (in particular, one can convert it into a set of HTML pages and into PDF). However, there is one difficulty: it does not integrate well with ocamldoc-style interface references.
The "User's Guide" predates ocamldoc - when I wrote the first version of the PXP documentation I had no choice but to use some third-party tool to process it. Now what to do? Stick with docbook, and somehow include ocamldoc in the processing chain? The docbook format clearly has more features for formatting text, e.g. one can easily include pictures. However, ocamldoc cannot produce output that could be converted to docbook with little effort, which made this route infeasible.
I decided to switch completely to ocamldoc. Not only the module interfaces should be documented with it, but also the various introductory chapters explaining concepts spanning several modules. Since O'Caml 3.09, ocamldoc understands the file suffix *.txt and takes these input files as pure documentation. One can still use all formatting directives like {2 headings} or {!Hyperlinks} pointing to code elements. However, there was still the difficulty of missing features.
So I looked at developing a custom HTML generator (I am mostly interested in outputting HTML). It is possible to load an add-on into ocamldoc that modifies its behaviour. One just has to write a class that inherits from Odoc_html.html, and overrides its methods:
    class chtml =
      object(self)
        inherit Odoc_html.html as super
        method private html_of_<foo> ... = ...
      end

    let chtml = new chtml
    let _ =
      Odoc_args.set_doc_generator
        (Some chtml :> Odoc_args.doc_generator option)
Of course, the question remained whether my features could be added this way (without rewriting half of the generator class). Yes, they can, and it took only about 160 lines of code. I must admit it took quite a long time to develop this code, since I had to dig into the internals of ocamldoc to understand it better. But in the end, ocamldoc turns out to be a customizable utility.
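The inherit-and-override pattern can be tried out without ocamldoc at all. The following is only a minimal sketch: base_html, custom_html and html_of_tag are hypothetical stand-ins invented for illustration, not the real Odoc_html.html class or its methods. It shows a subclass that handles one extra tag itself and hands everything else to the inherited implementation via super.

```ocaml
(* Toy stand-in for a generator base class. This is NOT Odoc_html.html;
   the class and method names are made up for this sketch. *)
class base_html = object
  method html_of_tag (name : string) (body : string) : string =
    Printf.sprintf "<span class=\"%s\">%s</span>" name body
end

(* Custom generator: treat one additional tag specially and defer
   every other tag to the inherited method. *)
class custom_html = object
  inherit base_html as super
  method! html_of_tag name body =
    match name with
    | "picture" -> Printf.sprintf "<img src=\"%s\">" body
    | _ -> super#html_of_tag name body
end

let gen = new custom_html

let () =
  print_endline (gen#html_of_tag "picture" "tree.png");
  print_endline (gen#html_of_tag "b" "bold")
```

The point of the design is that the subclass only contains the differences; everything it does not override keeps the behaviour of the base class.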
What I added in particular:
- A {picture} tag for including pictures.
- Direct display of include Module in interfaces, so that the included interface is shown instead of only the include statement as such. For clarity, the included interface is indented and has a grey background. This change can be turned on and off with a {directinclude} tag.
- The include change requires another feature to look really good. All references (hyperlinks and plain occurrences) pointing to the included module should be rewritten so that they point to the including module instead. That means that if module N uses include M, we want all references M.x to be changed into N.x. The intention is that M is no longer referenced, and that the duplication of definitions in two modules cannot confuse readers (especially those who are unfamiliar with the module system). I added that feature for my specific case, and the ocamldoc tag {fixpxpcoretypes} enables that rewriting. (It changes Pxp_core_types.[S|I] into Pxp_types.)
- A further addition to the include feature. With {knowntype} and {knownclass} one can add identifiers to the lists of known types and classes, so that the generator will emit hyperlinks to them, although there is no such definition in reality. It turned out that many identifiers were already pointing to the including module, but because there is no definition in the mli file, ocamldoc does not make these identifiers clickable. With {knowntype} and {knownclass} one can change that on a case-by-case basis.
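The reference rewriting described in the list (M.x becomes N.x) can be illustrated on plain strings. This is a sketch under assumptions, not the code from chtml.ml: the real generator works on ocamldoc's internal data, while the hypothetical helpers has_prefix, rewrite_ref and fix_pxp below only operate on dotted names, and the identifier x is a placeholder.

```ocaml
(* [has_prefix p s] is true when [s] starts with [p]. *)
let has_prefix (p : string) (s : string) : bool =
  String.length s >= String.length p
  && String.sub s 0 (String.length p) = p

(* Rewrite a qualified name so that it points to the including module.
   [included] lists the module paths that are pulled in with "include";
   the first path that matches as a prefix is replaced by [including].
   Names that match no path are returned unchanged. *)
let rewrite_ref ~(included : string list) ~(including : string)
    (name : string) : string =
  let rec go = function
    | [] -> name
    | m :: rest ->
      let prefix = m ^ "." in
      if has_prefix prefix name then
        let n = String.length prefix in
        including ^ "." ^ String.sub name n (String.length name - n)
      else go rest
  in
  go included

(* The specific case from the text: Pxp_core_types.[S|I] -> Pxp_types. *)
let fix_pxp =
  rewrite_ref
    ~included:["Pxp_core_types.S"; "Pxp_core_types.I"]
    ~including:"Pxp_types"

let () =
  print_endline (fix_pxp "Pxp_core_types.S.x");  (* prints Pxp_types.x *)
  print_endline (fix_pxp "List.map")             (* prints List.map *)
```

Matching on the path plus a trailing dot avoids rewriting unrelated modules whose names merely start with the same characters.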
The full source code of the custom generator class can be studied here: chtml.ml. The module with the mentioned include directive is Pxp_types. Look here how nice the generated page is.
XML is a cute and simple text format, right? Many people think so, but given that many XML parsers are either feature-rich and slow, or feature-poor and fast, there must be some complexity in the XML definition. Recently, I read the article "XML fever" (by Erik Wilde and Robert J. Glushko, Communications of the ACM, issue 7, 2008), where the authors point out a number of deficiencies in the definition of XML that can lead to delusion about XML, and finally to "fever". After years of maintaining this XML parser, I can only second the authors. Clearly, there are problems even in the fundamental XML specification.
I do not want to complain about this - XML is widely used, and many of the standards are practically unfixable without breaking large numbers of programs. For me, the problem was how to explain all that. For example, there is the question of what is to be considered the root node of an XML tree. This is a conceptual question, and the explanation should not be hidden in the interface description of a PXP module. For that reason, I had to add a number of chapters to the manual that explain concepts and generally introduce the PXP world. All the Intro_* chapters are like this.
The nice thing now is that I can add direct links from introductory chapters to interface references and vice versa, since all documentation is processed with the same utility, ocamldoc. When some complicated issue arises in a function description, it is possible to point to the section in the introduction where this issue is explained in detail, and conversely, I can point to the definition in the interface when a function or type is used in an intro chapter.
I must admit that my interest in XML has not grown in recent years, to put it politely. XML is most often used as a base technology for HTML, or as a data exchange format. Many of the advanced XML standards like XSLT or XQuery have not found their way into the daily life of us programmers. The hype is over.
Nevertheless, I promise that I will still maintain PXP, and now and then add another feature. For example, there is a nice XPath evaluator in the development pipeline - again, I do not find time to finish it, but hey, there are still many years for doing it. (By the way, if you want to accelerate that and have some money, we will find a way to quickly finish XPath.)
In August 2009, PXP turns 10 years old (counted from the first mention on the O'Caml mailing list). This is already a long time for a software library and an open source project. I am quite confident it will now also reach its 20th birthday!
Links: