Report from the First International "Semantic Web with Perl" Hackathon

Held at the London Hackspace from the 27th to the 30th of March 2011.

See also the invitation.

Participants and what they focused on

Dmitry Tsarkov and Toby Inkster worked mostly on adding reasoning capabilities to the Perl stack.

Gregory Todd Williams, Mischa Tuffield and Robert Barta worked mostly on query engines.

Florian Ragwitz made a great contribution to the above by providing his XS expertise, which was instrumental in getting the Perl layer working with the C layer.

Chris Prather and Espen Borgen made a major refactoring of RDF::Helper to resurrect it from the dead.

Finally, Kjetil Kjernsmo worked on configuration systems and testing, and generally roamed around coordinating and organizing the hackathon.

Areas of development

Reasoning

The main goal of this activity was to enable reasoning engines to be used from Perl. To this end, we wanted a pure-Perl reasoner, now known as RDF::Closure, as well as the possibility to call the C++-based reasoner FaCT++ from Perl. The latter was achieved by giving FaCT++ a C API that can be used through an XS bridge. Further, an implementation of OWL Functional Syntax was made on both sides to achieve integration. The new module OWL::DirectSemantics has been released with more than 2000 lines of code and 3000 lines of documentation.
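As a rough sketch of where this is heading, computing an RDFS closure with the pure-Perl reasoner might look like the following; the RDF::Closure::Engine constructor arguments are an assumption about the new module's API, not a confirmed interface:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use RDF::Trine qw(statement);
    use RDF::Trine::Namespace qw(rdf rdfs);
    use RDF::Closure::Engine;

    # A tiny model: ex:Dog rdfs:subClassOf ex:Animal; ex:fido a ex:Dog.
    my $ex    = RDF::Trine::Namespace->new('http://example.org/');
    my $model = RDF::Trine::Model->temporary_model;
    $model->add_statement(statement($ex->Dog,  $rdfs->subClassOf, $ex->Animal));
    $model->add_statement(statement($ex->fido, $rdf->type,        $ex->Dog));

    # Compute the RDFS closure; the engine materializes entailed triples
    # (e.g. ex:fido rdf:type ex:Animal) back into the model.
    # NB: the constructor signature below is an assumption.
    my $engine = RDF::Closure::Engine->new('rdfs', $model);
    $engine->closure;

    printf "%d statements after closure\n", $model->count_statements;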

High-level API

RDF::Helper was written by Kip Hampton (who couldn't make it) in 2004-2006 to provide a consistent and more Perlish API on top of the various toolsets that existed at the time. In the meantime, RDF::Trine has solved half of the problems that RDF::Helper set out to solve.

It was decided to drop the goal of integration across toolsets and instead concentrate the effort on making RDF::Helper a high-level API on top of RDF::Trine. The RDF::Redland backend was nevertheless retained so that we could confirm that all tests still passed after the major refactoring.

Moose was used in the refactoring, and RDF::Helper itself is now an interesting example of how to refactor an old module to use Moose while retaining the old API.

RDF::Helper is now in a releasable state, pending upload to CPAN. It now uses Dist::Zilla for the packaging.
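To give a flavour of the high-level API, here is a small usage sketch; the option and method names follow RDF::Helper's documented style, but details may differ in the refactored release:

    use RDF::Helper;

    # A Perlish view over an RDF::Trine model; QNames in assertions are
    # expanded using the declared namespaces.
    my $rdf = RDF::Helper->new(
        BaseInterface => 'RDF::Trine',
        namespaces    => {
            dc   => 'http://purl.org/dc/terms/',
            foaf => 'http://xmlns.com/foaf/0.1/',
        },
        ExpandQNames  => 1,
    );

    my $uri = 'http://example.org/report';
    $rdf->assert_literal($uri, 'dc:title', 'Hackathon report');
    $rdf->assert_resource($uri, 'foaf:maker', 'http://example.org/people/kjetil');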

Finally, work was started to bring Class::OWL onto the new RDF::Helper. Class::MOP had changed underneath it, but the necessary changes were made to bring it up to date. It was also decided that more work on OWL Direct Semantics is needed to advance it further.

Store integration work

A major goal of the hackathon was to integrate RDF::Trine and RDF::Query with other quad store backends. The emphasis was on 4store and AllegroGraph, since they were represented by people close to the code.

Originally, RDF::Query decomposed an incoming SPARQL query into an algebra object, which was passed to a query planner for execution using RDF::Trine's API. It was decided that it should be possible for a backend store to signal to RDF::Trine that it is willing to handle the whole query execution process itself, in which case RDF::Query would not attempt to parse the query, but rather send it whole down to the backend. The functionality provided by RDF::Trine should nevertheless remain supported.

Supporting both the bypass mechanism and the RDF::Trine API gives the user the choice between the backend's native SPARQL implementation and RDF::Query's implementation, which is leading in terms of functionality and is likely to be among the first to support SPARQL 1.1 fully.
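As a rough illustration of the bypass idea (all names here are hypothetical sketches, not the finalized RDF::Trine interface):

    # Hypothetical sketch: a store advertises that it can execute SPARQL
    # natively, and the query layer hands over the untouched query string
    # instead of planning the execution itself.
    package My::BypassingStore;
    use strict;
    use warnings;
    use parent 'RDF::Trine::Store';

    sub supports {
        my ($self, $feature) = @_;
        # Signal native SPARQL execution to RDF::Query ('sparql_execution'
        # is an illustrative feature name, not a finalized identifier).
        return 1 if $feature eq 'sparql_execution';
        return 0;
    }

    sub get_sparql {
        my ($self, $sparql) = @_;
        # Send the whole query to the backend (4store, AllegroGraph, ...)
        # and return an RDF::Trine::Iterator over the results.
        die 'backend call omitted in this sketch';
    }

    1;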

AllegroGraph

AllegroGraph takes advantage of this bypass behaviour of RDF::Trine, and a new RDF::Trine::Store::AllegroGraph is now ready for release, pending the release of RDF::Trine. It should be considered a public beta, as some more work is required before it is ready for mainstream use.
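Connecting to an AllegroGraph server could then look something like this; the constructor arguments are illustrative assumptions, since the final interface had not been settled:

    use RDF::Trine;
    use RDF::Trine::Store::AllegroGraph;

    # Hypothetical connection details for a local AllegroGraph server.
    my $store = RDF::Trine::Store::AllegroGraph->new(
        url        => 'http://localhost:10035',
        repository => 'test',
    );
    my $model = RDF::Trine::Model->new($store);
    print $model->size, " statements in the store\n";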

4store

A lot of work was done on 4store internals to make integration possible. Like AllegroGraph, 4store takes advantage of the bypass behaviour. 4store is now packaged using autoconf and exposes a C API, both of which were created during the hackathon. Code was also written to use this C API from RDF::Trine. Getting this into a releasable state requires more coordination with the core 4store developers, but the remaining programming time is down to a few hours. When this is done, and 4store has gone through the Garlik QA process, a new 4store release is expected, followed by a new RDF::Trine::Store::FourStore release.

Testing

Code to test stores has previously been confined to individual test cases, but since the tests are common to many different stores, they will be made available in the RDF::Trine distribution as a new module, Test::RDF::Trine::Store, which can be called as old-style functions to test any store and give a fair impression of whether the driver works as expected.
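A store test might then be as short as the following; the calling convention is a sketch of the planned old-style function interface:

    use Test::More;
    use RDF::Trine;
    use Test::RDF::Trine::Store qw(all_store_tests);

    # Run the shared store test suite against an in-memory store; the
    # same call should work for any RDF::Trine::Store implementation.
    my $store = RDF::Trine::Store::Memory->new;
    all_store_tests($store);   # may also accept a test-data fixture
    done_testing();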

Configuration system evaluation

The configuration system has two issues we wanted to resolve. First, field keys were inconsistent; e.g., the base URI was denoted base in some applications and base_uri in others. In addition, some users had expressed concerns about the long dependency chain of Config::JFDI, which is currently used.

A survey was made to see whether any alternatives would fulfil all the requirements: being able to merge different configuration files, but also to use environment variables to control certain behaviour, such as the location of the configuration files. No other module was found that would fully replace Config::JFDI, and since its dependencies are rather common modules, it was decided that this is not a concern great enough to warrant spending a lot of time on for now.

With regard to consistency of configuration fields, the following keys were agreed upon (an illustrative configuration file follows the list):

store
Used to configure an RDF::Trine::Store. The value of this key can be passed directly to the constructor.
base_uri
The base URI used to create absolute URIs from relative URIs.
namespaces
A hashref with namespace prefixes as keys and a single namespace URI as each value (some more work may be required on this)
update
A boolean value indicating whether Update operations should be allowed to be executed.
load_data
A boolean value indicating whether an RDF::Endpoint should use URLs that appear in FROM and FROM NAMED clauses to construct a SPARQL dataset by dereferencing the URLs and loading the retrieved RDF content.
service_description
An associative array (hash) containing details of which information, and how much, to include in the service description provided by an endpoint if no query is included for execution. The boolean values 'default' and 'named_graphs' indicate that the respective SPARQL dataset graphs should be described by the service description.
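An illustrative configuration using these keys is shown below. Config::JFDI (via Config::Any) can load a Perl file returning a hashref like this one; all values are example data:

    # endpoint.pl -- example configuration; all values are illustrative.
    {
        store      => 'Memory',            # handed to the RDF::Trine::Store constructor
        base_uri   => 'http://example.org/',
        namespaces => {
            foaf => 'http://xmlns.com/foaf/0.1/',
            dc   => 'http://purl.org/dc/terms/',
        },
        update     => 0,                   # disallow SPARQL Update operations
        load_data  => 0,                   # ignore FROM / FROM NAMED URLs
        service_description => {
            default      => 1,             # describe the default graph
            named_graphs => 1,             # describe named graphs
        },
    };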

Other things

We also had a lengthy discussion about the RDF::Redland module, as the CPAN release is outdated, yet the module is important for providing better performance to Perl users. We also discussed whether we should replace the SWIG interface with an XS interface, but decided against it. A new CPAN package was created, and it turned out to be quite straightforward.

RDF::Endpoint is nearing release and is now in line with the current state of the SPARQL 1.1 Protocol document. In addition, there have been major improvements to the endpoint UI (although most of this work happened before the hackathon).

A developer release of Test::RDF was made; an ordinary release will be done once RDF::Trine is released. This too was mostly done before the hackathon.
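For reference, a typical Test::RDF-based test checks syntactic validity and graph isomorphism; treat the exact signatures as a sketch of the released interface:

    use Test::More tests => 2;
    use Test::RDF;

    my $turtle = '<http://example.org/a> <http://example.org/b> "c" .';

    # Does the string parse as Turtle?
    is_valid_rdf($turtle, 'turtle', 'string parses as Turtle');

    # Are two serializations isomorphic as graphs?
    is_rdf($turtle, 'turtle', $turtle, 'turtle', 'graphs are isomorphic');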

It is noteworthy that most of the code discussed at the hackathon is packaged for Debian and has entered the testing distribution within the last few days. Thus, it is likely to be released with Ubuntu in October and with the next stable Debian.

What we didn't do (much)

The big Moose (a metaobject-protocol-based object system) discussion didn't happen. That is not to say that there wasn't a lot of Moose work being done, or that the value of migrating the rest of the modules wasn't recognized; quite the contrary. It is already clear that there would be great value in migrating RDF::Trine and RDF::Query to Moose, and a discussion was held that made it clear that the diversity of RDF query engines makes a Moose::Role-based architecture desirable. However, there were already so many important integration tasks underway at the hackathon that the detailed discussion about how to use Moose to achieve the more long-term goals was deferred to a later hackathon.
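Illustratively, a Moose::Role could capture the common surface of the diverse query engines; all names below are hypothetical, chosen only to show the shape of such an architecture:

    package Query::Engine::API;
    use Moose::Role;

    has model => (
        is       => 'ro',
        isa      => 'RDF::Trine::Model',
        required => 1,
    );

    # Each engine (pure-Perl planner, native-SPARQL bypass, ...) supplies
    # its own execution strategy.
    requires 'execute_sparql';

    package My::BypassEngine;
    use Moose;
    with 'Query::Engine::API';

    sub execute_sparql {
        my ($self, $sparql) = @_;
        # e.g. forward the query string to a backend store
        die 'not implemented in this sketch';
    }

    1;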

Further work on RDF::ACL did not happen either, partly because it was felt that significantly advancing it would require more OWL work, which was itself a major topic.

Conclusion

This hackathon turned the Perl and Semantic Web community into a real ecosystem. A large amount of code was written, and many outstanding issues were resolved. Despite the participants' digital proficiency, it is clear that such advances can only be made by meeting in real life.

The organizer wishes to thank all the participants for their contribution, and for a good time. Another thanks goes to the London Hackspace for providing an excellent venue for the hackathon.

Sponsors