erlxsl progress

Erlxsl is one of my pet projects – it aims to provide proper xslt support for Erlang/OTP by linking to existing C/C++ based implementations at runtime. 

We started out handing off to libxslt, but that proved problematic due to (a) thread-safety concerns and (b) memory leaks that were hard to track down. Then we moved to an implementation based on Sablotron. Making the move was easy, thanks to our simple “plugin” style design: little more than an xslt engine API defined as a set of ‘extern’ functions, which are bound to a concrete implementation when the driver is built and linked.

Now we’ve decided to go all out and use Xalan-C++ as our primary xslt provider. And still, despite all of this progress, the source code has never been published in our SourceForge-based Subversion repo! Why on earth not – you may well ask!? Are we being stupid and forgoing change control!?

Heaven forbid! In fact, erlxsl is being developed for use in real-world, production systems. As such, there’s a little contract negotiation going on over the terms of the licence I’ve chosen (the vanilla BSD licence, in this case). Until I’ve reached an agreement with the others who’re involved, I’m not willing to release to SourceForge due to the potential legal ramifications. <sigh>

Once all that’s over, I’m looking forward to seeing this in action, though. Not least because I’m keen to try out erlyweb, but don’t much like its template views! 😉


  1. #1 by Albert Lash on July 5, 2008 - 5:36 pm

    I’m interested! I’ve been working on several apps which can run on a few different interpreted language platforms thanks to XML, XSL, SQL. Would be nice to try erlang too, but xslt is a must.

  2. #2 by loggerheadz on July 6, 2008 - 5:51 pm

    Hi Albert,

    Cool, glad there’s some interest outside the Erlang community too. We’re currently working on making the build scripts work nicely across platforms and replacing the Sablotron xsl implementation with Xalan-C. I’ll try and get svn sorted out soon so people can get a look at the current state of the project. We’re also looking into setting up some robustness/performance testing, and I’ll probably open up a WordPress blog specifically for the project soon.

    On another note, one of the reasons I went for an embedded-C implementation was the lack of decent xml support in Erlang: xmerl is slow and the API is horrible, so I wanted to just hand off to Xalan and the like. Since attending the Erlang Exchange event in London recently, though, I’ve come across what looks like an excellent SAX implementation in pure Erlang (called erlsom), so once erlxsl is stable enough I might have a stab at a pure Erlang XSLT implementation based on it. I’ve already had some ideas about that, but previously the lack of a good xml library put me off.

    Cheers,

    Tim Watson

  3. #3 by Albert Lash on July 7, 2008 - 12:56 am

    Hi Tim, Thanks for the reply. I’ll take a look at erlsom, maybe I can help.

  4. #4 by Albert Lash on July 8, 2008 - 3:15 am

    Victor created an XSL library for erlang with sablotron:

    http://www.erlang.org/user.html#sablotron-1.0

    I tried it out, works fine. Haven’t done much with it though, I still have plenty to learn with erlang.

  5. #5 by loggerheadz on July 10, 2008 - 6:33 am

    Hi Albert,

    Yes, I’ve seen Victor’s library – in fact it was this that inspired me to start working on erlxsl. I can’t use a library like Victor’s in the production environment where I work, because it (a) is affected by the inherent latency of inter-process communication (it serializes data between the Erlang runtime and an external C application), (b) needs more robust error handling, which we’d then have to maintain, and (c) isn’t supported.

    We’re implementing erlxsl as a linked-in driver: an embedded C library that is hosted by, and runs within the same address space as, the Erlang emulator. It doesn’t have to interact (e.g., read/write) with any external process, and the heap-allocated data is initially passed by reference. We’ve also implemented a very simple protocol, so we don’t even have to use the external term format (the modus operandi for most Erlang/C bridges); instead we operate directly on the binary (which looks like a normal buffer to our C code) and cast to and from the data representation(s) we’re expecting. This works well: it’s very fast, involves minimal data copying and is very easy to understand (and therefore to maintain). Although our source isn’t in svn right now, you can get a 50k ft overview from the project home page (http://erlxsl.sourceforge.net/), which is also hopelessly out of date – my kids seem to be allergic to sleep these days! 😉
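    To make the “simple protocol” idea concrete, here is a minimal sketch of what the Erlang side of such a framing might look like. The layout (a one-byte command code, two 32-bit big-endian length fields, then the raw XML and stylesheet bytes) and the names xsl_wire, encode_request/3 and decode_request/1 are hypothetical, not erlxsl’s actual wire format; the point is only that fixed-width headers let C code walk the buffer with plain pointer arithmetic instead of decoding the external term format.

```erlang
-module(xsl_wire).
-export([encode_request/3, decode_request/1]).

%% Hypothetical command code; a real driver protocol may differ.
-define(CMD_TRANSFORM, 1).

%% Pack a transform request into one flat binary:
%% <<Cmd:8, XmlSize:32/big, XslSize:32/big, Xml, Xsl>>
encode_request(transform, Xml, Xsl) when is_binary(Xml), is_binary(Xsl) ->
    <<?CMD_TRANSFORM:8,
      (byte_size(Xml)):32/big,
      (byte_size(Xsl)):32/big,
      Xml/binary,
      Xsl/binary>>.

%% The inverse. The C side would do the same thing by reading the two
%% length fields and then offsetting into the same buffer.
decode_request(<<?CMD_TRANSFORM:8, XmlSize:32/big, XslSize:32/big,
                 Xml:XmlSize/binary, Xsl:XslSize/binary>>) ->
    {transform, Xml, Xsl}.
```

    A binary like this would typically be sent to a linked-in driver with port_command/2 or erlang:port_control/3.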

    As for a pure Erlang xslt implementation, yes, I’d really like to do this and any help would be much appreciated. As I mentioned before, I’m going to try and get erlxsl into production by the last quarter of this year, so that will remain my priority. An experimental xpath 2.0/xslt 2.0 implementation in Erlang would be lots of fun, I think! I’ve some initial ideas around this stuff:

    1. We should be able to parallelize easily – this is Erlang after all.

    2. Operate on binaries: in the internal data structures you create when parsing, store offsets into the binary instead of copying the underlying data into the (internal) nodes. Accessing a binary this way is extremely fast (efficient) and easy (elegant) in Erlang. Large binaries (over 64 bytes) are also stored outside the process heap, reference counted and passed by reference (unlike most everything else, which is copied), so this will eliminate a lot of overhead (minus the overhead of having to look up the data when you actually need it). I suspect that you can build up a result tree for each xsl:document (in 2.0) using a similar strategy, resolving the binary offsets to get at the data only when serialization needs to take place. Point (1) should make a lot of this easier too: concurrent lookups across different result trees, map-reduce/scatter-gather, first-in-wins, etc. Lots of fun optimizations to play with.

    3. When we parse a stylesheet, instead of storing the instructions in memory as a thunk, let’s generate a new module and load it into the emulator dynamically at runtime. In case you haven’t heard, hot code loading is one of Erlang’s killer features; it is very simple and works across multiple executing Erlang processes (micro-threads). In addition, Erlang follows the whole “code as data” paradigm, so generating a module at runtime is just a matter of producing a list of terms (the abstract forms) and passing them to the compiler (compile:forms/2). For an example, either see the docs or google for “erlyweb”, the Erlang web framework; one of its internal subsystems performs runtime code generation in this way and it’s very simple.
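    Point 3 needs surprisingly little code. This is a minimal sketch, assuming the stylesheet compiler has already produced Erlang source text for each template function; a real implementation would more likely build the abstract forms directly. The module and function names (tmpl_codegen, load_module_from_source/2) and the fixed apply_templates/1 export are assumptions for illustration; erl_scan:string/1, erl_parse:parse_form/1, compile:forms/2 and code:load_binary/3 are the standard OTP calls.

```erlang
-module(tmpl_codegen).
-export([load_module_from_source/2]).

%% Compile a list of function definitions (source strings, each ending in
%% a full stop) into a new module named ModName and hot-load it.
%% Assumption: the generated code always defines apply_templates/1,
%% which is the only function exported.
load_module_from_source(ModName, FunSrcs) when is_atom(ModName) ->
    Header = [lists:flatten(io_lib:format("-module(~p).", [ModName])),
              "-export([apply_templates/1])."],
    Forms = [parse_form(Src) || Src <- Header ++ FunSrcs],
    %% compile:forms/2 returns the BEAM code as a binary; nothing is
    %% written to disk.
    {ok, ModName, Bin} = compile:forms(Forms, []),
    {module, ModName} = code:load_binary(ModName, "generated", Bin),
    ok.

%% Turn one source string into one abstract form.
parse_form(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```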

    A compiled module can be made super efficient, especially as we’re able to pattern match. Hmm….

    call_template('mytemplatename', etc…) ->

    apply_templates('/', etc…) ->
        LName = variable(prev_name, select("local-name(.)", Context)),
        apply_templates(select("child::*", Context), [ LName ]);

    apply_templates(…. etc.

    That seems to have the right feel to me. Obviously the xpath expressions need to be pre-compiled (generating a compiled xpath expression structure directly into the module code isn’t much additional work, given that we’re generating the module code itself anyway), and we’re actually passing around references to some data structures, xpath functions and so on. Anyway, it’s clear that a functional language with pattern matching is a good fit for this. If you’re still interested in a few weeks’ time (after I’ve managed to tidy up and check in erlxsl and update the project site a bit), then we’ll talk some more about it. I know at least one other person who might be interested (though he’s even shorter on time than I am), and I think a team of 3 or 4 would be ideal for this: one pair to focus on the xpath/xml stuff and another to look at the xsl side (code gen, etc.), which [the second pair] can just code against some interface. Parameterized modules would be hugely useful here, but in R12 they’re only an undocumented, experimental feature; I’m hoping they’ll be officially supported soon, and in the meantime it’s passing module names to server init functions, I’m afraid.
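    Going back to point 2, the offsets idea can be sketched in a few lines. The module below (offset_index, a hypothetical name) records only {Offset, Length} pairs for element names and resolves them on demand with binary:part/2, which returns a sub-binary referring into the original document rather than a fresh copy. The scanner is deliberately naive: it illustrates the offset technique and is not an XML parser.

```erlang
-module(offset_index).
-export([element_names/1, resolve/2]).

%% Return one {Offset, Length} pair per start-tag name in Doc. Only the
%% positions are stored; no bytes are copied out of Doc. End tags,
%% comments, processing instructions and declarations are skipped because
%% the byte after '<' is not a name-start character. A '<' inside a
%% comment or CDATA section would still confuse this scanner.
element_names(Doc) when is_binary(Doc) ->
    Size = byte_size(Doc),
    [{Start, name_length(Doc, Start, Size, 0)}
     || {Lt, 1} <- binary:matches(Doc, <<"<">>),
        Start <- [Lt + 1],
        Start < Size,
        is_name_start(binary:at(Doc, Start))].

%% Resolve an offset pair to the actual name only when it is needed.
resolve(Doc, {Offset, Len}) ->
    binary:part(Doc, {Offset, Len}).

is_name_start(C) when C >= $a, C =< $z; C >= $A, C =< $Z; C =:= $_ -> true;
is_name_start(_) -> false.

%% Count name bytes until whitespace, '>', '/' or the end of Doc.
name_length(_Doc, Pos, Size, Len) when Pos >= Size ->
    Len;
name_length(Doc, Pos, Size, Len) ->
    case binary:at(Doc, Pos) of
        C when C =:= $>; C =:= $/; C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r ->
            Len;
        _ ->
            name_length(Doc, Pos + 1, Size, Len + 1)
    end.
```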

  6. #6 by Jonathan Harrington on October 19, 2010 - 3:53 pm

    Hi, Tim.

    Did you ever manage to release this code? The Erlang & XSLT landscape still seems to be fairly barren.

    • #7 by Hyperthunk on December 15, 2010 - 10:56 am

      Hi Jonathan,

      Sorry for the big delay in getting back to you. I didn’t finish this work, but I’m considering resurrecting this code on GitHub; perhaps we could collaborate? I’ve been rather tied up with other projects, but I could probably progress this with someone’s help. The main problem I ran into was that, rather than using the OTP library code to read the binaries passed between ERTS and the driver, I was just casting them straight into a C structure, and unsurprisingly this stopped working once I moved to a newer ERTS version. Also, whilst I was running the driver in async mode, I hadn’t dealt with synchronisation and/or caching of compiled stylesheets, etc.
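      For the caching side, one possible shape is sketched below: a public ETS table keyed on the stylesheet bytes, with ets:insert_new/2 deciding which of several concurrently compiled copies wins. This is an assumption about how the problem could be approached, not what erlxsl did; xsl_cache and the CompileFun callback (which would wrap the driver’s stylesheet compilation) are hypothetical.

```erlang
-module(xsl_cache).
-export([new/0, fetch/3]).

%% Create a public ETS table for compiled stylesheets. The table is owned
%% by the calling process and disappears when that process exits.
%% There is no eviction in this sketch.
new() ->
    ets:new(xsl_cache, [set, public, {read_concurrency, true}]).

%% Return the cached compiled form of Xsl, calling CompileFun on a miss.
%% If two processes miss at the same time, both compile, but
%% ets:insert_new/2 only stores the first result and the loser reads that
%% entry back, so every caller ends up using the same compiled stylesheet.
fetch(Tab, Xsl, CompileFun) when is_binary(Xsl) ->
    case ets:lookup(Tab, Xsl) of
        [{_, Compiled}] ->
            Compiled;
        [] ->
            Compiled = CompileFun(Xsl),
            case ets:insert_new(Tab, {Xsl, Compiled}) of
                true ->
                    Compiled;
                false ->
                    [{_, First}] = ets:lookup(Tab, Xsl),
                    First
            end
    end.
```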

      Let me know if you’re interested in working together on this and I’ll set up a project.

      Cheers

  1. Erlang XSLT at Docunext Technology
