Generating XML in Perl

I've been working on a big Bricolage project recently, and one of the requirements is to parse an incoming NewsML feed, turn individual stories into Bricolage SOAP XML, and import them into Bricolage. I'm using the amazing--if hideously documented--XML::LibXML to do the parsing of the incoming NewsML, taking advantage of the power of XPath to pull out the bits I need. But then came the question: what should I use to generate the XML for Bricolage?

Based on feedback from various Tweeps, I tried a few different approaches: XML::LibXML, XML::Genx, and of course the venerable XML::Writer. In truth, they all suck for one reason: interface. As a test, I wanted to generate this dead simple XML:

<?xml version="1.0" encoding="utf-8"?>
<assets xmlns="http://bricolage.sourceforge.net/assets.xsd">
  <story id="1234" type="story">
    <name>Catch as Catch Can</name>
  </story>
</assets>

Just get a load of how you create this XML in XML::LibXML:

use XML::LibXML;

my $dom = XML::LibXML::Document->new( '1.0', 'utf-8' );
my $assets = $dom->createElementNS('http://bricolage.sourceforge.net/assets.xsd', 'assets');
$dom->addChild($assets);

my $story = $dom->createElement('story');
$assets->addChild($story);
$story->addChild( $dom->createAttribute( id => 1234));
$story->addChild( $dom->createAttribute( type => 'story'));
my $name = $dom->createElement('name');
$story->addChild($name);
$name->addChild($dom->createTextNode('Catch as Catch Can'));

say $dom->toString;

Does anyone actually think that this is intuitive? Okay, if you're used to dealing with the XHTML DOM in JavaScript it's at least familiar, but that's hardly an endorsement. XML::Genx isn't much better:

use XML::Genx::Simple;

my $w = XML::Genx::Simple->new;
$w->StartDocString;

$w->StartElementLiteral( $w->DeclareNamespace( 'http://bricolage.sourceforge.net/assets.xsd', ''), 'assets' );
$w->StartElementLiteral( 'story' );
$w->AddAttributeLiteral( id => 1234);
$w->AddAttributeLiteral( type => 'story');
$w->Element( 'name' => 'Catch as Catch Can' );
$w->EndElement;
$w->EndElement;
$w->EndDocument;

say $w->GetDocString;

It's not like messing with the DOM, but it's essentially the same: Use a bunch of camelCase methods to declare each thing one-at-a-time. And you have to count the number of open elements you have yourself, to know how many times to call EndElement() to close elements. Can't we get the computer to do this for us?
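For what it's worth, the bookkeeping is not hard to hand off. Here's a minimal sketch in plain Perl, not something XML::Genx or the other modules provide: a hypothetical Sketch::StackWriter class that pushes each element it opens onto a stack, so that a single end_all() call closes whatever is still open. The class and method names are made up for illustration, and it does no escaping.

```perl
use strict;
use warnings;

package Sketch::StackWriter;    # hypothetical class, for illustration only

sub new { return bless { out => '', open => [] }, shift }

# Open an element and remember its name on the stack.
sub start {
    my ( $self, $name, %attr ) = @_;
    my $attrs = join '',
        map { sprintf ' %s="%s"', $_, $attr{$_} } sort keys %attr;
    $self->{out} .= "<$name$attrs>";
    push @{ $self->{open} }, $name;
    return $self;
}

# Append character data as-is (no escaping in this sketch).
sub text {
    my ( $self, $text ) = @_;
    $self->{out} .= $text;
    return $self;
}

# Close every element that is still open, innermost first.
sub end_all {
    my $self = shift;
    while ( defined( my $name = pop @{ $self->{open} } ) ) {
        $self->{out} .= "</$name>";
    }
    return $self->{out};
}

package main;

my $w = Sketch::StackWriter->new;
$w->start( 'assets', xmlns => 'http://bricolage.sourceforge.net/assets.xsd' );
$w->start( 'story', id => 1234, type => 'story' );
$w->start('name')->text('Catch as Catch Can');

my $xml = $w->end_all;    # nobody had to count the open elements
print "$xml\n";
```

That removes the counting, but the calling code is still flat; nothing in it looks like the tree it produces.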

Feeling a bit frustrated, I went back to XML::Writer, which is what Bricolage uses internally to generate the XML exported by its SOAP interface. It looks like this:

use XML::Writer;

my $output = '';
my $writer = XML::Writer->new(
    OUTPUT   => \$output,
    ENCODING => 'utf-8',
);

$writer->xmlDecl('UTF-8');
#$writer->startTag(['http://bricolage.sourceforge.net/assets.xsd', 'stories']);
$writer->startTag('assets', xmlns => 'http://bricolage.sourceforge.net/assets.xsd');
$writer->startTag('story', id => 1234, type => 'story');
$writer->dataElement(name => 'Catch as Catch Can');
$writer->endTag('story');

$writer->endTag('assets');

say $output;

That's a bit better, in that you can specify the attributes and value of an element all in one method call. I still have to count opened elements and figure out where to close them, though. The thing that's missing, as with the other approaches, is an API that reflects the hierarchical nature of XML itself. I'm outputting a tree-like document; why should the API be so hideously object-oriented and flat?

With this insight, I remembered Jesse Vincent's Template::Declare. It bills itself as a templating library, but really it provides an interface for declaratively and hierarchically generating XML. After a bit of hacking I came up with this:

package Template::Declare::TagSet::Bricolage;
BEGIN { $INC{'Template/Declare/TagSet/Bricolage.pm'} = __FILE__; }
use base 'Template::Declare::TagSet';

sub get_tag_list {
    return [qw( assets story name )];
}

package My::Template;
use Template::Declare::Tags 'Bricolage';
use base 'Template::Declare';

template bricolage => sub {
    xml_decl { 'xml', version => '1.0', encoding => 'utf-8' };
    assets {
        xmlns is 'http://bricolage.sourceforge.net/assets.xsd';
        story {
            attr { id => 1234, type => 'story' };
            name { 'Catch as Catch Can' }
        };
    };
};

package main;
use Template::Declare;
Template::Declare->init( roots => ['My::Template']);
say Template::Declare->show('bricolage');

Okay, to be fair I had to do a lot more work to set things up. But once I did, the core of the XML generation, there in the bricolage template, is quite simple and straightforward. Furthermore, thanks to the hierarchical nature of Template::Declare, the tree structure of the resulting XML is apparent in the code. And it's so concise!
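If you're wondering how tag names can behave like block-taking keywords, the trick can be sketched in plain Perl with prototypes. This is not Template::Declare's actual implementation, just a minimal illustration of the idea under my own assumptions: each tag becomes a function with a (&) prototype, installed at compile time, and whatever the block returns becomes the element's content. Attributes, escaping, and sibling elements are left out.

```perl
use strict;
use warnings;

# Install one function per tag at compile time. The (&) prototype makes
# `story { ... }` parse as a call that receives the block as a code ref.
BEGIN {
    no strict 'refs';
    for my $tag (qw( assets story name )) {
        *{"main::$tag"} = sub (&) {
            my $body = shift;
            # Whatever the block returns becomes the element's content.
            return "<$tag>" . join( '', $body->() ) . "</$tag>";
        };
    }
}

# The nesting of the code mirrors the nesting of the output.
my $xml = assets { story { name { 'Catch as Catch Can' } } };
print "$xml\n";
```

The functions have to exist before the template code is compiled, which is why the installation happens in a BEGIN block.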

Armed with this information, I whipped up a new module for CPAN: Template::Declare::Bricolage. This module subclasses Template::Declare to provide a dead-simple interface for generating XML for the Bricolage SOAP interface. Using this module to generate the same XML is quite simple:

use Template::Declare::Bricolage;

say bricolage {
    story {
        attr { id => 1234, type => 'story' };
        name { 'Catch as Catch Can' }
    };
};

Yeah. Really. That's it. Because the Bricolage SOAP interface requires that all XML documents have the top-level <assets> tag, I just had the bricolage function handle that, as well as actually executing the template and returning the XML. More complex XML is just as simple, assuming that you use nice indentation to format your code. Here's the code to generate XML for a Bricolage workflow object:

use Template::Declare::Bricolage;

say bricolage {
    workflow {
        attr        { id => 1027     };
        name        { 'Blogs'        }
        description { 'Blog Entries' }
        site        { 'Main Site'    }
        type        { 'Story'        }
        active      { 1              }
        desks  {
            desk { attr { start   => 1 }; 'Blog Edit'    }
            desk { attr { publish => 1 }; 'Blog Publish' }
        }
    }
};

Simple, huh? So the next time you need to generate XML, have a look at Template::Declare. It may not be the fastest XML generator around, but if you have a well-defined list of elements you need, it's certainly the nicest to use.

Oh, and Bricolage users? Just use Template::Declare::Bricolage to deaden the pain.

Backtalk

Phillip Smith wrote:

Ewwww.... very nice. And great timing too. :-)

Aristotle Pagaltzis wrote:

After I’m done hacking on it and have pushed it to CPAN, you may want to take a look at XML::Builder, which is inspired by HTML::Tiny. It’s not going to be as syntactically neat as Template::Declare, since there’s a lot of method call “noise” still in there, but the structure is very similar. And the namespace support in particular is a lot less syntactically magical.

Until I get it pushed out, you can look at it on github – check out the tests to get the idea, since it lacks any docs as yet.

Stefan Petrea wrote:

XML::Writer::Nest

You forgot XML::Writer::Nest. It makes use of object destructors, which are called at the end of a scope. So you go deeper by opening a new scope and creating a new tag in it; the tag object is destroyed at the end of that scope, which also emits your ending tag. That's an awesome idea :) It saves you some keystrokes :)

Theory wrote:

Replies

@Aristotle—Yes, if I'd seen XML::Builder on CPAN, I probably wouldn't have bothered with Template::Declare. This is quite nice, though I don't yet follow how the namespace is supposed to work:

use XML::Builder;

my $x = XML::Builder->new;
say $x->tag(
    'assets', { xmlns => 'http://bricolage.sourceforge.net/assets.xsd' },
    $x->tag(
        'story', { id => 1234, type => 'story' },
        $x->tag( name => 'Catch as Catch Can' ),
    ),
);

@Stefan—XML::Writer::Nest looks awful, I'm sorry to say. The docs are lousy; the example in the synopsis doesn't even show how to get the XML out of the document, or how to set actual values for attributes. It took me a while to figure out that some methods are called on XML::Writer::Nest or its instances, while others are called on an XML::Writer object. I finally got it to work with this code:

my $output = '';
use XML::Writer::Nest;
my $w = XML::Writer->new(
    OUTPUT   => \$output,
    ENCODING => 'utf-8',
);

ASSETS: {
    my $assets = XML::Writer::Nest->new(
        tag => 'assets',
        attr => { xmlns => 'http://bricolage.sourceforge.net/assets.xsd' },
        writer => $w,
    );
    STORY: {
        my $story = $assets->nest(
            'story' => { id => 1234, type => 'story' },
        );
        $w->dataElement(name => 'Catch as Catch Can' );
    }
}

say $output;

I appreciate the nesting, but I find the interface unintuitive and verbose. Yech.

Interestingly, the XML::Writer::Nest documentation mentions XML::Generator. This library seems pretty nice, too:

use XML::Generator ':pretty';

say assets(
    ['http://bricolage.sourceforge.net/assets.xsd'],
    story(
        { id => 1234, type => 'story' },
        name( 'Catch as Catch Can' )
    )
);

This adds an empty "xmlns" attribute to the <story> tag, but otherwise is just about right. I'm not even sure how it magically creates those functions for me, given that they're not declared anywhere. The OO interface for it is a bit clearer:

require XML::Generator;
my $x = XML::Generator->new(':pretty', encoding => 'utf-8' );
say $x->xmldecl, $x->assets(
    ['http://bricolage.sourceforge.net/assets.xsd'],
    $x->story(
        { id => 1234, type => 'story' },
        $x->name( 'Catch as Catch Can' )
    )
);

Pretty clean, and not far from XML::Builder. If I had to choose, though, I think I'd prefer XML::Builder. I look forward to seeing its release!

Kevin Lenzo wrote:

You could have built that just as easily on top of any of the options... your contribution was hiding any of them, not one of them.

Theory wrote:

@Kevin—Not sure I follow what you're saying. I didn't build a solution on top of anything. I just used Template::Declare. All I had to do was create a class that lists the tags I want, and basically I was done. The rest was sugar.

Now I could have built an interface like Template::Declare::Bricolage on top of something like XML::LibXML or XML::Genx. But that would have been a lot more work: I would have had to build something. I'm way too lazy for that. :-)

—Theory

Dominic Mitchell wrote:

Thanks for trying XML::Genx. Like I said, I'm well aware of the awful interface. On the other hand, I've had so little occasion to actually use it recently, there hasn't been much incentive to improve it. :(

@Aristotle: XML::Builder looks nice. It reminds me of XML::Atom::SimpleFeed…

Dominic Mitchell wrote:

I forgot to mention one option: XML::SAX::Builder. It was always a bit of an experiment, but you may find it interesting. :)

Jesse Sheidlower wrote:

I know--or at least I think--that when someone suggested this on Fb or wherever, it was meant as a joke. But when I have structured data that I need to output as XML, I usually end up just falling back on some variety of print statement (or sometimes a templating system). There is effort that it takes to ensure that it'll be valid, sure, but that effort has always worked out to be far, far less than grokking stuff like what you show above.

Aristotle Pagaltzis wrote:

I don’t yet follow how the namespace is supposed to work

What’s ultimately going on is that you pass namespaced element and attribute names in Clark notation. That is, you ask for the tag '{http://www.w3.org/1999/xhtml}html' and XML::Builder will take care of managing the namespace prefixes for you, to the extent that you want. You can ask it to use a specific prefix for a given namespace, and you can use the XML::Builder::NS helper class to reduce repetition, but both of these are optional – if you just need to use the namespace in one or two places, you don’t need to deal with any red tape, Builder will do all the bookkeeping for you.

You just have to make sure to eventually ask for a root instead of a tag, which is how XML::Builder knows on which element to declare the xmlns attributes.

I think this strikes roughly the right balance between giving the user all the options along with all the work, and doing all the work for the user while giving him little and inconvenient control.

I’m not sure yet how to handle the case where the user passes in an xmlns attribute explicitly. Currently Builder is oblivious to that, which is dangerous. Turning those things into implicit namespace declaration calls is not entirely the right thing to do, though, since by the time you get to the root element it’s too late for namespace declarations to affect the child element generation calls. But I don’t just want to treat them as an error either. Still pondering the details on this…

Charlie wrote:

Another approach

I like the look of XML::Builder.

I generally use XML::Simple for this sort of task. You just give it an appropriate hashref. Like this:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $xs = new XML::Simple(RootName => undef);

my $hashref = {
    xml => {
      version  => "1.0",
      encoding => "utf-8",
      assets   => {
        xmlns   => "http://bricolage.sourceforge.net/assets.xsd",
        story   => {
          id     => "1234",
          type   => "story",
          name   => [ "Catch as Catch Can" ],
      },
    },
  },
};

print $xs->XMLout($hashref);

Theory wrote:

More Responses

@Dominic—XML::SAX::Builder looks similar to XML::Generator:

use XML::SAX::Builder;

my $x = XML::SAX::Builder->new;
say $x->xml(
    $x->xmlpi( 'xml' => 'whatever' ),
    $x->xmlns(
        '', 'http://bricolage.sourceforge.net/assets.xsd',
        $x->assets(
            $x->story(
                { id => 1234, type => 'story' },
                $x->name( 'Catch as Catch Can' )
            )
        )
    )
);

That's not bad, although I couldn't figure out how to get the <?xml> declaration at the top. Also, it emitted a bogus "0" (zero) after the XML string, which is kind of weird. Bug?

@Jesse—Yeah, print statements can be useful. I've actually written quite a few Bricolage Mason templates to generate RSS without using any kind of generator. Sometimes it's just easier that way, no question.

@Aristotle—So I tried this:

use XML::Builder;

my $x = XML::Builder->new;
say $x->tag(
    '{http://bricolage.sourceforge.net/assets.xsd}assets',
    $x->tag(
        'story', { id => 1234, type => 'story' },
        $x->tag( name => 'Catch as Catch Can' ),
    ),
);

But what I ended up with was:

<a1:assets>
  <story id="1234" type="story">
    <name>Catch as Catch Can</name>
  </story>
</a1:assets>

I guess I can see why it would generate a prefix, although I'd like to tell it not to. But where is the URL?

To me, the Clark notation seems a bit opaque. I think I prefer XML::Generator's approach of taking an array argument to handle namespaces. That keeps it Perlish.
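For anyone else meeting it for the first time: a name in Clark notation is just the namespace URI in braces, followed by the local name. Here's a minimal sketch of taking one apart in plain Perl. The parse_clark() function is hypothetical, my own illustration rather than anything from XML::Builder.

```perl
use strict;
use warnings;

# Split '{uri}local' into (uri, local). A name without a braced prefix
# is treated as having no namespace, so the URI comes back undefined.
sub parse_clark {
    my ($name) = @_;
    return ( $1, $2 ) if $name =~ /\A\{([^}]*)\}(.+)\z/s;
    return ( undef, $name );
}

my ( $uri, $local )
    = parse_clark('{http://bricolage.sourceforge.net/assets.xsd}assets');
print "namespace: $uri\nlocal name: $local\n";
```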

@Charlie—Thanks for reminding me about XML::Simple. It's not an ideal interface for generating an XML document, but it's not bad. One downside to your code, however, is that the top-level tag should not be <xml>. The XML bit should actually be a declaration, in the form <?xml ?>. It looks like XML::Simple can handle that, but unless you want the default XML declaration, you have to paste in the entire thing yourself, which is a little weird:

use XML::Simple;
my $xs = XML::Simple->new(
    RootName => undef,
    XMLDecl  => '<?xml version="1.0" encoding="utf-8"?>'
);

my $hashref = {
    assets   => {
        xmlns   => "http://bricolage.sourceforge.net/assets.xsd",
        story   => {
            id     => "1234",
            type   => "story",
            name   => [ "Catch as Catch Can" ],
        },
    },
};

say $xs->XMLout($hashref);

But that does work.

draegtun wrote:

Let me add yet another XML generator to the growing list: Builder

Here is your example using Builder:

use Builder;
my $builder = Builder->new;
my $x = $builder->block( 'Builder::XML', { indent => 2, newline => 1 } );

$x->assets( { xmlns => "http://bricolage.sourceforge.net/assets.xsd" },
    $x->story( { id => '1234', type => 'story' },
        $x->name('Catch as Catch Can')
    ),
);

say $builder->render;

My non-CPAN version of Builder does have XMLdecl, CSS & HTML4 blocks, and I have used Builder as an XML::Writer replacement for nearly a year now.

/I3az/

Aristotle Pagaltzis wrote:

To me, the Clark notation seems a bit opaque. I think I prefer XML::Generator's approach of taking an array argument to handle namespaces. That keeps it Perlish.

Huh, I’ll be honest – that never occurred to me. I’m not sure why but it didn’t. I don’t know if I want to switch over completely – Clark notation is well-known over in the XML world, so whether one considers it obscure depends on one’s background. But I’ll think about the options there.

use XML::Builder;

my $x = XML::Builder->new;
say $x->tag(
    '{http://bricolage.sourceforge.net/assets.xsd}assets',
    $x->tag(
        'story', { id => 1234, type => 'story' },
        $x->tag( name => 'Catch as Catch Can' ),
    ),
);

What you want is

use XML::Builder;

my $x = XML::Builder->new;
say $x->root(
    '{http://bricolage.sourceforge.net/assets.xsd}assets',
    $x->tag(
        '{http://bricolage.sourceforge.net/assets.xsd}story', { id => 1234, type => 'story' },
        $x->tag( '{http://bricolage.sourceforge.net/assets.xsd}name' => 'Catch as Catch Can' ),
    ),
);

Note the root vs tag call for the root element, and that you need to specify for every element which namespace it belongs to, independent of which prefix (the empty prefix in your case) it is tied to.

Obviously that’s annoying, so you shorten it like so:

use XML::Builder;

my $x = XML::Builder->new;
my $an = XML::Builder::NS->new( 'http://bricolage.sourceforge.net/assets.xsd' );

say $x->root(
    $an->assets,
    $x->tag(
        $an->story, { id => 1234, type => 'story' },
        $x->tag( $an->name => 'Catch as Catch Can' ),
    ),
);

Note how the variable that stores the ::NS instance effectively becomes an element NS prefix within the code – which is entirely decoupled from the namespace prefix in the resulting document.

Aristotle Pagaltzis wrote:

Oh, and if you want to specify the prefix, you say, e.g.:

my $an = XML::Builder::NS->new( 'http://bricolage.sourceforge.net/assets.xsd' );
my $x = XML::Builder->new->register_ns( '' => $an );

(The second argument to register_ns doesn’t expect an ::NS object but a string – since an ::NS object stringifies to the namespace URI, however, you can just pass that.)

Theory wrote:

@draegtun—Builder looks pretty nice, too. Thanks for the link and example.

@Aristotle—Sure enough, this code gets me exactly the XML I want, modulo the <?xml> declaration:

use XML::Builder;

my $an = XML::Builder::NS->new( 'http://bricolage.sourceforge.net/assets.xsd' );
my $x = XML::Builder->new->register_ns( '' => $an );

say $x->root(
    $an->assets,
    $x->tag(
        $an->story, { id => 1234, type => 'story' },
        $x->tag( $an->name => 'Catch as Catch Can' ),
    ),
);

It might be useful to have a method in XML::Builder::NS to generate a tag without calling a method named for that tag. Maybe named()?

sub named { '{' . ${$_[0]} . '}' . $_[1] }

I like the use of AUTOLOAD there, but sometimes I don't know the names of the methods.

Anyway, looking forward to seeing it released—I'll definitely make use of it!

Oh, and as for the namespace stuff—seems to me that it'd be useful to support both the Clark notation and an array ref. That way everyone is happy. ;-)

—Theory

Ovid wrote:

Fed up

I got so fed up with how we build XML on our BBC project that I rewrote all of it. Instead of using XML::LibXML directly, we create a simple data structure. So for this:

<?xml version="1.0" encoding="utf-8"?>
<assets xmlns="http://bricolage.sourceforge.net/assets.xsd">
  <story id="1234" type="story">
    <name>Catch as Catch Can</name>
  </story>
</assets>

We just create this:

[ assets => { xmlns => "http://bricolage.sourceforge.net/assets.xsd" },
  [ story => { id => "1234", type => "story" },
    [ name => {}, "Catch as Catch Can" ],
  ]
]

And this also makes it trivial to serialize as YAML or JSON. It's also very, very easy to understand and write.
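The rendering side isn't shown here. As a rough idea of how little code such a structure needs, here is a minimal sketch in plain Perl. It assumes each node is [ name => \%attributes, @children ] and that a plain string is text. The render() and esc() functions are hypothetical names, not the BBC project's code, and namespaces get no special handling.

```perl
use strict;
use warnings;

my %ESCAPE = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );

# Escape the characters that are special in XML text and attribute values.
sub esc {
    my ($string) = @_;
    $string =~ s/([&<>"])/$ESCAPE{$1}/g;
    return $string;
}

# Recursively turn [ name => \%attr, @children ] into an XML string.
sub render {
    my ($node) = @_;
    return esc($node) unless ref $node;    # plain string: a text node

    my ( $name, $attr, @kids ) = @$node;
    my $attrs = join '',
        map { sprintf ' %s="%s"', $_, esc( $attr->{$_} ) } sort keys %$attr;

    return "<$name$attrs/>" unless @kids;  # empty element
    return "<$name$attrs>" . join( '', map { render($_) } @kids ) . "</$name>";
}

my $xml = render(
    [ assets => { xmlns => "http://bricolage.sourceforge.net/assets.xsd" },
      [ story => { id => "1234", type => "story" },
        [ name => {}, "Catch as Catch Can" ],
      ]
    ]
);
print "$xml\n";
```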

Theory wrote:

Re: Fed up

@Ovid—I like it!

—Theory

Aristotle Pagaltzis wrote:

Ovid: my quibble with that is that you do nothing to handle namespaces for the user; so they're going to have to provide the prefixes as part of the element names and it's just generally not a fun scene.

If all you ever do is declare a default namespace on the root element then of course you get to avoid all this; but merely by virtue of only caring about the simplest case and leaving out anything non-trivial.

You can do pretty much the same with XML::Builder btw (it inherits that from HTML::Tiny, with a minor alteration):

$x->render( \[ '{http://bricolage.sourceforge.net/assets.xsd}assets',
  \[ '{http://bricolage.sourceforge.net/assets.xsd}story' => { id => "1234", type => "story" },
    \[ '{http://bricolage.sourceforge.net/assets.xsd}name', "Catch as Catch Can" ],
  ]
] );

(Untested... but that's basically the syntax.)

Theory wrote:

@Aristotle—What are the backslashes for? Must one pass references to array references? Seems a bit silly. And I assume that if you don't need a namespace that you can omit it, yes? Something like:

$x->render([
    assets => [
        story => { id => "1234", type => "story" }, [
            name => "Catch as Catch Can"
        ],
    ]
]);

Looking forward to seeing this on CPAN, BTW. When will that happen? :-)

—Theory

Aristotle Pagaltzis wrote:

Except it’s not silly. Plain array references are an escape from distributivity, just like in HTML::Tiny. Eg. if you write

$x->tag( 'i', $x->tag( 'b', 'foo', 'bar' ) );

you’ll get

<i><b>foo</b></i><i><b>bar</b></i>

whereas if you write

$x->tag( 'i', [ $x->tag( 'b', 'foo', 'bar' ) ] );

you’ll get only

<i><b>foo</b><b>bar</b></i>

or if you write

$x->tag( 'i', $x->tag( 'b', [ 'foo', 'bar' ] ) );

it will be

<i><b>foobar</b></i>

And the same escape applies when passing the same structure declaratively to render:

$x->render( \[ 'i', \[ 'b', [ 'foo', 'bar' ] ] ] );

In HTML::Tiny this would instead be written like so:

$h->stringify( [ \'i', [ \'b', [ 'foo', 'bar' ] ] ] );

There’s no real difference though… and the XML::Builder style simplifies the internals a tad, and I think it might also be slightly easier to emit such a structure.
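To make the distribution rule concrete, here is a toy reimplementation in plain Perl. It is only a sketch of the behaviour described above, not XML::Builder's or HTML::Tiny's actual code. The tag() function is hypothetical and ignores attributes and escaping.

```perl
use strict;
use warnings;

# One element per argument; an array ref groups its items into the
# content of a single element instead of distributing over them.
sub tag {
    my ( $name, @args ) = @_;
    my @elements = map {
        my $content = ref($_) eq 'ARRAY' ? join( '', @$_ ) : $_;
        "<$name>$content</$name>";
    } @args;
    return wantarray ? @elements : join '', @elements;
}

my $distributed = join '', tag( 'i', tag( 'b', 'foo', 'bar' ) );
my $grouped     = join '', tag( 'i', [ tag( 'b', 'foo', 'bar' ) ] );
my $merged      = join '', tag( 'i', tag( 'b', [ 'foo', 'bar' ] ) );

print "$distributed\n$grouped\n$merged\n";
```

The inner tag() call sits in list context, so it hands the outer call one argument per element, and the outer call then wraps each of them separately unless an array ref holds them together.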

Aristotle Pagaltzis wrote:

Oh, and the point of all that? HTML::Tiny has a great example in the POD:

$x->tag( table =>
    [ $x->tag( tr =>
      [ $x->tag( th => qw( X Y Z ) ) ],
      [ $x->tag( td => qw( 1 2 3 ) ) ],
      [ $x->tag( td => qw( 4 5 6 ) ) ],
      [ $x->tag( td => qw( 7 8 9 ) ) ],
    ) ],
  );

Notice how you need to ask for td tags only once per row, and for tr tags only once per entire table.

Theory wrote:

@Aristotle—Ah, yes, I can see how that'd be useful. It even looks like a table. Nice!

—Theory