Just a Theory

By David E. Wheeler

Posts about XML

Atom Sources

I’m working on a project where I aggregate entries from a slew of feeds into a single feed. The output feed will be a valid Atom feed, and of course I want to make sure that I maintain all the appropriate metadata for each entry I collect. The <source> element seems to be exactly what I need:

If an entry is copied from one feed into another feed, then the source feed’s metadata (all child elements of feed other than the entry elements) should be preserved if the source feed contains any of the child elements author, contributor, rights, or category and those child elements are not present in the source entry.

<source>
  <id>http://example.org/</id>
  <title>Fourty-Two</title>
  <updated>2003-12-13T18:30:02Z</updated>
  <rights>© 2005 Example, Inc.</rights>
</source>

That’s perfect: It allows me to keep the title, link, rights, and icon of the originating blog associated with each entry.

Except, maybe it’s the database expert in me, but I’d like it to be more normalized. My feed might have 1,000 entries in it from 100 sources. Why would I want to duplicate that information for every single entry from a given source? Is there no better way to do this, say, to have the source data appear once and to reference only the source ID for each entry? That would make for a much smaller feed, I expect, and a lot less duplication.
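To make the idea concrete, here’s the sort of thing I have in mind. Note that this is not valid Atom, just an illustration of the normalization I’m after:

<!-- Hypothetical markup: define each source once… -->
<source id="src42">
  <id>http://example.org/</id>
  <title>Example Blog</title>
  <rights>© 2005 Example, Inc.</rights>
</source>

<!-- …and have each entry simply reference it. -->
<entry source="src42">
  <title>An Aggregated Entry</title>
</entry>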

Is there any way to do this in an Atom feed?


Use Rubyish Blocks with Test::XPath

Thanks to the slick Devel::Declare-powered PerlX::MethodCallWithBlock created by gugod, the latest version of Test::XPath supports Ruby-style blocks. The Ruby version of assert_select, as I mentioned previously, looks like this:

assert_select "ol" { |elements|
  elements.each { |element|
    assert_select element, "li", 4
  }
}

I’ve switched to the brace syntax for greater parity with Perl. Test::XPath, meanwhile, looks like this:

my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', sub {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
}, 'Should have style' );

But as of Test::XPath 0.13, you can now just use PerlX::MethodCallWithBlock to pass blocks in the Rubyish way:

use PerlX::MethodCallWithBlock;
my @css = qw(foo.css bar.css);
$tx->ok( '/html/head/style', 'Should have style' ) {
    my $css = shift @css;
    shift->is( './@src', $css, "Style src should be $css");
};

Pretty slick, eh? It required a single-line change to the source code. I’m really happy with this sugar. Thanks for the great hack, gugod!


Test XML and HTML with XPath

When I was hacking Rails projects back in 2006–2007, there was a lot of stuff about Rails that drove me absolutely batshit (<cough>ActiveRecord</cough>), but there were also a (very) few things that I really liked. One of those things was the assert_select test method. There was a bunch of magic involved in sending a request to your Rails app and stuffing the body someplace hidden (hrm, that sounds kind of evil; intentional?), but then you could call assert_select to use CSS selectors to test the structure and content of the document (assuming, of course, that it was HTML or XML). For example (to borrow from the Rails docs), if you wanted to test that a response contains two ordered lists, each with four list elements, then you’d do something like this:

assert_select "ol" do |elements|
    elements.each do |element|
    assert_select element, "li", 4
    end
end

What it does is select all of the <ol> elements and pass them to the do block, where you can call assert_select on each of them. Nice, huh? You can also implicitly call assert_select on the entire array of passed elements, like so:

assert_select "ol" do
    assert_select "li", 8
end

Slick, right? I’ve always wanted to have something like this in Perl, but until last week, I didn’t really have an immediate need for it. But I’ve started on a Catalyst project with my partners at PGX, and of course I’m using a view to generate XHTML output. So I started asking around for advice on proper unit testing for Catalyst views. The answer I got was, basically, Test::WWW::Mechanize::Catalyst. But I found it insufficient:

$mech->get_ok("/");
$mech->html_lint_ok( "HTML should be valid" );
$mech->title_is( "Root", "On the root page" );
$mech->content_contains( "This is the root page", "Correct content" );

Okay, I can check the title of the document directly, which is kind of cool, but there’s no other way to examine the structure? Really? And to check the content, there’s just content_contains(), which concatenates all of the content without any tags! This is useful for certain very simple tests, but if you want to make sure that your document is properly structured, and the content is in all the right places, you’re SOL.

Furthermore, the html_lint_ok() method didn’t like the Unicode characters output by my view:

#   Failed test 'HTML should be valid (http://localhost/)'
#   at t/view_TD.t line 30.
# HTML::Lint errors for http://localhost/
#  (4:3) Invalid character \x2019 should be written as &rsquo;
#  (18:5) Invalid character \xA9 should be written as &copy;
# 2 errors on the page

Of course, those characters aren’t invalid, they’re perfectly good UTF-8 characters. In some worlds, I suppose, they should be wrong, but I actually want them in my document.

So I switched to Test::XML, which uses a proper XML parser to validate a document:

ok my $res = request("http://localhost:3000/"), "Request home page";
ok $res->is_success, "Request should have succeeded";

is_well_formed_xml $res->content, "The HTML should be well-formed";

Cool. So now that I knew my XHTML document was valid, it was time to start examining the content and structure in more detail. Thinking fondly on assert_select, I went looking for a test module that uses XPath to test an XML document, and found Test::XML::XPath right in the Test::XML distribution, which looked to be just what I wanted. So I added it to my test script and added this line to test the content of the <title> tag:

is_xpath $res->content, "/html/head/title", "Welcome!";

I ran the test…and waited. It took around 20 seconds for that test to run, and then it failed!

#   Failed test at t/view_TD.t line 25.
#          got: ''
#     expected: 'Welcome!'
#   evaluating: /html/head/title
#      against: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
# <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
#  <head>
#   <title>Welcome!</title>
#  </head>
# </html>

No doubt the alert among my readership will spot the problem right away, but I was at a loss. Fortunately, Ovid was over for dinner last week, and he pointed out that it was due to the namespace. That is, the xmlns attribute of the <html> element requires that one register a namespace prefix to use in the XPath expression. He pointed me to his fork of Test::XML::XPath, called Test::XHTML::XPath, in his Escape project. It mostly duplicates Test::XML::XPath, but contains this crucial line of code:

$xpc->registerNs( x => "http://www.w3.org/1999/xhtml" );

By registering the prefix “x” for the XHTML namespace, he’s able to write tests like this:

is_xpath $res->content, "/x:html/x:head/x:title", "Welcome!";

And that works. It seems that the XPath spec requires that one use prefixes when referring to elements within a namespace. Test::XML::XPath, alas, provides no way to register a namespace prefix.

Perhaps worse is the performance problem. I discovered that if I stripped out the DOCTYPE declaration from the XHTML before I passed it to is_xpath, the test was lightning fast. Here the issue is that XML::LibXML, used by Test::XML::XPath, is fetching the DTD from the w3.org Web site as the test runs. I can disable this by setting the no_network and recover_silently XML::LibXML options, but, again, Test::XML::XPath provides no way to do so.
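With XML::LibXML itself, by contrast, the fix takes just a couple of method calls. A minimal sketch, assuming the XHTML is in $xhtml:

use XML::LibXML;

my $parser = XML::LibXML->new;
$parser->no_network(1);        # never fetch the DTD from w3.org
$parser->recover_silently(1);  # and recover quietly when the fetch fails
my $doc = $parser->parse_string($xhtml);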

Add to that the fact that Test::XML::XPath has no interface for recursive testing like assert_select, and I was ready to write my own module. One could perhaps update Test::XML::XPath to be more flexible, but for the fact that it falls back on XML::XPath when it can’t find XML::LibXML, and XML::XPath, alas, behaves differently than XML::LibXML (it didn’t choke on my lack of a namespace prefix, for example). So if you ship an application that uses Test::XML::XPath, tests might fail on other systems where it would use a different XPath module than you used.

And so I have written a new test module.

Introducing Test::XPath, your Perl module for flexibly running XPath-powered tests on the content and structure of your XML and HTML documents. With this new module, the test for my Catalyst application becomes:

my $tx = Test::XPath->new( xml => $res->content, is_html => 1 );
$tx->is("/html/head/title", "Welcome", "Title should be correct" );

Notice how I didn’t need a namespace prefix there? That’s because the is_html parameter coaxes XML::LibXML into using its HTML parser instead of its XML parser. One of the side-effects of doing so is that the namespace appears to be assumed, so I can ignore it in my tests. The HTML parser doesn’t bother to fetch the DTD, either. For tests where you really need namespaces, you’d do this:

my $tx = Test::XPath->new(
    xml     => $res->content,
    xmlns   => { x => "http://www.w3.org/1999/xhtml" },
    options => { no_network => 1, recover_silently => 1 },
);
$tx->is("/x:html/x:head/x:title", "Welcome", "Title should be correct" );

Yep, you can specify XML namespace prefixes via the xmlns parameter, and pass options to XML::LibXML via the options parameter. Here I’ve shut off the network, so that XML::LibXML prevents network access, and told it to recover silently when it tries to fetch the DTD, but fails (because, you know, it can’t access the network). Not bad, eh?

Of course, the module provides the usual array of Test::More-like test methods, including ok(), is(), like(), and cmp_ok(). They all work just like in Test::More, except that the first argument must be an XPath expression. Some examples borrowed from the documentation:

$tx->ok( '//foo/bar', 'Should have bar element under foo element' );
$tx->ok( 'contains(//title, "Welcome")', 'Title should "Welcome"' );

$tx->is( '/html/head/title', 'Welcome', 'Title should be welcoming' );
$tx->isnt( '/html/head/link/@type', 'hello', 'Link type should not' );

$tx->like( '/html/head/title', qr/^Foobar Inc.: .+/, 'Title context' );
$tx->unlike( '/html/head/title', qr/Error/, 'Should be no error in title' );

$tx->cmp_ok( '/html/head/title', 'eq', 'Welcome' );
$tx->cmp_ok( '//story[1]/@id', '==', 1 );

But the real gem is the recursive testing feature of the ok() test method. By passing a code reference as the second argument, you can descend into various parts of your XML or HTML document to test things more deeply. ok() will pass if the XPath expression argument selects one or more nodes, and then it will call the code reference for each of those nodes, passing the Test::XPath object as the first argument. This is a bit different than assert_select, but I view the reduced magic as a good thing.

For example, if you wanted to test for the presence of <story> elements in your document, and to test that each such element had an incremented id attribute, you’d do something like this:

my $i = 0;
$tx->ok( '//assets/story', sub {
    shift->is('./@id', ++$i, "ID should be $i in story $i");
}, 'Should have story elements' );

For convenience, the Test::XPath object is also assigned to $_ for the duration of the call to the code reference. Either way, you can call ok() and pass code references anywhere in the hierarchy. For example, to ensure that an Atom feed has entries and that each entry has a title, a link, and a very specific author element with name, uri, and email subnodes, you can do this:

$tx->ok( '/feed/entry', sub {
    $_->ok( './title', 'Should have a title' );
    $_->ok( './author', sub {
        $_->is( './name',  'Larry Wall',       'Larry should be author' );
        $_->is( './uri',   'http://wall.org/', 'URI should be correct' );
        $_->is( './email', 'perl@example.com', 'Email should be right' );
    }, 'Should have author elements' );
}, 'Should have entry elements' );

There are a lot of core XPath functions you can use, too. For example, I’m going to write a test for every page returned by my application to make sure that I have the proper numbers of various tags:

$tx->is('count(/html)',      1, 'Should have 1 html element' );
$tx->is('count(/html/head)', 1, 'Should have 1 head element' );
$tx->is('count(/html/body)', 1, 'Should have 1 body element' );

I’m going to use this module to the hilt in all my tests for HTML and XML documents from here on in. The only thing I’m missing from assert_select is that it supports CSS 2 selectors, rather than XPath expressions, and the implementation offers quite a few other features including regular expression operators for matching attributes, pseudo-classes, and other fun stuff. Still, XPath gets me all that I need; the rest is just sugar, really. And with the ability to define custom XPath functions in Perl, I can live without the extra sugar.
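To give you a taste of that last bit, here’s a minimal sketch of a custom XPath function, written against XML::LibXML::XPathContext directly rather than Test::XPath. The lower-case() function is hypothetical (something XPath 1.0 lacks), and I assume the HTML to search is in $html:

use XML::LibXML;

my $doc = XML::LibXML->new->parse_html_string($html);
my $xpc = XML::LibXML::XPathContext->new($doc);

# Back a lower-case() XPath function with Perl's lc().
$xpc->registerFunction( 'lower-case' => sub {
    my $arg = shift;
    # Node-set arguments arrive as XML::LibXML::NodeList objects.
    return lc( ref $arg ? $arg->to_literal : $arg );
});

# Now the function can be used anywhere in an expression.
my @links = $xpc->findnodes('//a[lower-case(@rel) = "home"]');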

Maybe you’ll find it useful, too.


Generating XML in Perl

I’ve been working on a big Bricolage project recently, and one of the requirements is to parse an incoming NewsML feed, turn individual stories into Bricolage SOAP XML, and import them into Bricolage. I’m using the amazing, if hideously documented, XML::LibXML to do the parsing of the incoming NewsML, taking advantage of the power of XPath to pull out the bits I need. But then came the question: what should I use to generate the XML for Bricolage?

Based on feedback from various Tweeps, I tried a few different approaches: XML::LibXML, XML::Genx, and of course the venerable XML::Writer. In truth, they all suck for one reason: interface. As a test, I wanted to generate this dead simple XML:

<?xml version="1.0" encoding="utf-8"?>
<assets xmlns="http://bricolage.sourceforge.net/assets.xsd">
    <story id="1234" type="story">
        <name>Catch as Catch Can</name>
    </story>
</assets>

Just get a load of how you create this XML in XML::LibXML:

use v5.10;
use XML::LibXML;

my $dom = XML::LibXML::Document->new( '1.0', 'utf-8' );
my $assets = $dom->createElementNS('http://bricolage.sourceforge.net/assets.xsd', 'assets');
$dom->addChild($assets);

my $story = $dom->createElement('story');
$assets->addChild($story);
$story->addChild( $dom->createAttribute( id => 1234));
$story->addChild( $dom->createAttribute( type => 'story'));
my $name = $dom->createElement('name');
$story->addChild($name);
$name->addChild($dom->createTextNode('Catch as Catch Can'));

say $dom->toString;

Does anyone actually think that this is intuitive? Okay, if you’re used to dealing with the XHTML DOM in JavaScript, it’s at least familiar, but that’s hardly an endorsement. XML::Genx isn’t much better:

use v5.10;
use XML::Genx::Simple;

my $w = XML::Genx::Simple->new;
$w->StartDocString;

$w->StartElementLiteral( $w->DeclareNamespace( 'http://bricolage.sourceforge.net/assets.xsd', ''), 'assets' );
$w->StartElementLiteral( 'story' );
$w->AddAttributeLiteral( id => 1234);
$w->AddAttributeLiteral( type => 'story');
$w->Element( 'name' => 'Catch as Catch Can' );
$w->EndElement;
$w->EndElement;
$w->EndDocument;

say $w->GetDocString;

It’s not like messing with the DOM, but it’s essentially the same: use a bunch of camelCase methods to declare each thing, one at a time. And you have to count the number of open elements yourself, to know how many times to call EndElement() to close them. Can’t we get the computer to do this for us?

Feeling a bit frustrated, I went back to XML::Writer, which is what Bricolage uses internally to generate the XML exported by its SOAP interface. It looks like this:

use v5.10;
use XML::Writer;

my $output = '';
my $writer = XML::Writer->new(
    OUTPUT   => \$output,
    ENCODING => 'utf8',
);

$writer->xmlDecl('UTF-8');
$writer->startTag('assets', xmlns => 'http://bricolage.sourceforge.net/assets.xsd');
$writer->startTag('story', id => 1234, type => 'story');
$writer->dataElement(name => 'Catch as Catch Can');
$writer->endTag('story');
$writer->endTag('assets');

say $output;

That’s a bit better, in that you can specify the attributes and value of an element all in one method call. I still have to count opened elements and figure out where to close them, though. The thing that’s missing, as with the other approaches, is an API that reflects the hierarchical nature of XML itself. I’m outputting a tree-like document; why should the API be so hideously object-oriented and flat?

With this insight, I remembered Jesse Vincent’s Template::Declare. It bills itself as a templating library, but really it provides an interface for declaratively and hierarchically generating XML. After a bit of hacking I came up with this:

package Template::Declare::TagSet::Bricolage;
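# Register this inline package in %INC so that Template::Declare::Tags
# can load it below without a separate .pm file.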
BEGIN { $INC{'Template/Declare/TagSet/Bricolage.pm'} = __FILE__; }
use base 'Template::Declare::TagSet';

sub get_tag_list {
    return [qw( assets story name )];
}

package My::Template;
use Template::Declare::Tags 'Bricolage';
use base 'Template::Declare';

template bricolage => sub {
    xml_decl { 'xml', version => '1.0', encoding => 'utf-8' };
    assets {
        xmlns is 'http://bricolage.sourceforge.net/assets.xsd';
        story {
            attr { id => 1234, type => 'story' };
            name { 'Catch as Catch Can' }
        };
    };
};

package main;
use v5.10;
use Template::Declare;
Template::Declare->init( roots => ['My::Template']);
say Template::Declare->show('bricolage');

Okay, to be fair, I had to do a lot more work to set things up. But once I did, the core of the XML generation, there in the bricolage template, is quite simple and straightforward. Furthermore, thanks to the hierarchical nature of Template::Declare, the tree structure of the resulting XML is apparent in the code. And it’s so concise!

Armed with this information, I whipped up a new module for CPAN: Template::Declare::Bricolage. This module subclasses Template::Declare to provide a dead-simple interface for generating XML for the Bricolage SOAP interface. Using this module to generate the same XML is quite simple:

use v5.10;
use Template::Declare::Bricolage;

say bricolage {
    story {
        attr { id => 1234, type => 'story' };
        name { 'Catch as Catch Can' }
    };
};

Yeah. Really. That’s it. Because the Bricolage SOAP interface requires that all XML documents have the top-level <assets> tag, I just had the bricolage function handle that, as well as actually executing the template and returning the XML (there’s a sketch of such a wrapper near the end of this post). More complex XML is just as simple, assuming that you use nice indentation to format your code. Here’s the code to generate XML for a Bricolage workflow object:

use v5.10;
use Template::Declare::Bricolage;

say bricolage {
    workflow {
        attr        { id => 1027     };
        name        { 'Blogs'        }
        description { 'Blog Entries' }
        site        { 'Main Site'    }
        type        { 'Story'        }
        active      { 1              }
        desks  {
            desk { attr { start   => 1 }; 'Blog Edit'    }
            desk { attr { publish => 1 }; 'Blog Publish' }
        }
    }
};

Simple, huh? So the next time you need to generate XML, have a look at Template::Declare. It may not be the fastest XML generator around, but if you have a well-defined list of elements you need, it’s certainly the nicest to use.
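As for that bricolage wrapper function, here’s a rough sketch of how one might build it from the public Template::Declare API. This is an illustration under my own assumptions, not the module’s actual source, and it glosses over exporting the tag functions themselves to the caller:

package My::Bricolage;
use base 'Template::Declare';
use Template::Declare::Tags 'Bricolage';  # the tag set from above
use Exporter 'import';
our @EXPORT = qw(bricolage);

# Wrap whatever the caller's block emits in the top-level <assets> tag.
template wrap => sub {
    my ( $self, $code ) = @_;
    xml_decl { 'xml', version => '1.0', encoding => 'utf-8' };
    assets {
        xmlns is 'http://bricolage.sourceforge.net/assets.xsd';
        $code->();
    };
};

sub bricolage (&) {
    my $code = shift;
    Template::Declare->init( roots => ['My::Bricolage'] );
    return Template::Declare->show( wrap => $code );
}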

Oh, and Bricolage users? Just use Template::Declare::Bricolage to deaden the pain.



My Adventures with Mac OS X

I recently decided to make the leap from Yellow Dog Linux to Mac OS X on my Titanium PowerBook. Getting everything to work the way I wanted proved to be a challenge, but well worth it. This document outlines all that I learned, so that neither you nor I will have to experience such pain again. The overall goal was to get Bricolage up and running, figuring that if it worked, then just about any mod_perl based solution would run. I’m happy to say that I was ultimately successful. You can be, too.

In the descriptions below, I provide links to download the software you’ll need, as well as the shell commands I used to compile and install each package. In all cases (except for the installation of the Developer Tools), I saved each package’s sources to /usr/local/src and gunzipped and untarred them there. I also carried out each step as root, by running sudo -s. If you’re not comfortable using a Unix shell, you might want to read up on it, first. All of my examples also assume a sh-compatible shell, such as bash or zsh. Fortunately, zsh comes with OS X, so you can just enable it for yourself in NetInfo Manager by setting users -> <username> -> shell to “/bin/zsh”, where <username> is your user name.
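If you’d rather make that change from the command line, the old NetInfo utility should be able to do it, too. A sketch, assuming the usual niutil invocation:

# Set the shell property on your user's NetInfo record
# (same effect as the NetInfo Manager steps above).
sudo niutil -createprop . /users/<username> shell /bin/zsh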

Developer Tools

All of the software that I describe installing below must be compiled. To compile software on Mac OS X, you need to install the Mac OS X Developer Tools. These provide the cc compiler and many required libraries. Conveniently, these come on a CD-ROM with the Mac OS X Version 10.1 upgrade kit. I just popped in the CD and installed them like you’d install any other OS X software. I needed administrative access to OS X to install the Developer Tools (or, indeed, to install any of the other software I describe below), but otherwise it posed no problems.

The best time to install the Developer Tools is immediately after upgrading to OS X version 10.1. Then run the Software Update applet in the System preferences to get your system completely up-to-date. By the time I was done, I had the system updated to version 10.1.3.

Emacs

The first step I took in the process of moving to OS X was to get the tools I needed most working. Essentially, what this meant was GNU Emacs. Now, I happen to be a fan of the X version of Emacs – not XEmacs, but GNU Emacs with X support built in. I wasn’t relishing the idea of having to install X on OS X (although there are XFree86 ports that do this), so I was really pleased to discover the Mac-Emacs project. All I had to do was patch the GNU Emacs 21.1 sources and compile them, and I was ready to go! GNU Emacs works beautifully with the OS X Aqua interface.

There were a few configuration issues for me to work out, however. I have become addicted to the green background that an old RedHat .XConfig file had set, and I wanted this feature in OS X, too. Plus, the default font was really ugly (well, too big, really – anyone know how to make it smaller in Emacs?) and the Mac command key was working as the Emacs META key, rather than the option key. So I poked around the net until I found the settings I needed and put them into my .emacs file:

(custom-set-faces
'(default ((t (:stipple nil
  :background "DarkSlateGrey"
  :foreground "Wheat"
  :inverse-video nil
  :box nil
  :strike-through nil
  :overline nil
  :underline nil
  :slant normal
  :weight normal
  :height 116
  :width normal
  :family "apple-andale mono"))))
'(cursor ((t (:background "Wheat")))))
; Use option for the meta key.
(setq mac-command-key-is-meta nil)

Installing Emacs is not required for installing any of the other packages described below – it just happens to be my favorite text editor and IDE. So I don’t provide the instructions here; the Mac-Emacs project does a plenty good job. If you’re not comfortable with Unix editors, you can use whatever editor you like. BBEdit is a good choice.

GDBM

Mac OS X doesn’t come with a DBM! But since mod_ssl needs it, we have to install it. Fortunately, I found this PDF detailing someone else’s adventures with mod_ssl on OS X, and it provided decent instructions for installing GDBM. First, I created a new user for GDBM. In NetInfo Manager, I created a duplicate of the “unknown” user and named it “bin”. Then, I downloaded GDBM from the FSF, and installed it like this:

cd /usr/local/src/gdbm-1.8.0
cp /usr/libexec/config* .
./configure
make
make install
ln -s /usr/local/lib/libgdbm.a /usr/local/lib/libdbm.a

That did the trick. Nothing else was involved, fortunately.

Expat

Who doesn’t do something with XML these days? If your answer is, “not me!”, then you’ll need to install the Expat library in order to work with XML::Parser in Perl. Fortunately, it’s relatively easy to install, although support for the -static flag appears to be broken in cc on OS X, so it needs to be stripped out. I downloaded it from its project page, and then did this:

cd /usr/local/src/expat-1.95.2
./configure
perl -i.bak -p -e \
  's/LDFLAGS\s*=\s*-static/LDFLAGS=/' \
  examples/Makefile
perl -i.bak -p -e \
  's/LDFLAGS\s*=\s*-static/LDFLAGS=/' \
  xmlwf/Makefile
make
make install

Perl

Although Mac OS X ships with Perl (Yay!), it’s the older 5.6.0 version. There have been many bug fixes included in 5.6.1, so I wanted to make sure I got the latest stable version before I built anything else around it (mod_perl, modules, etc.).

Being a Unix program, Perl doesn’t expect to run into the problems associated with a case-insensitive file system like Mac OS X’s HFS Plus. So there are a couple of tweaks to the install process that make it slightly more complicated than you would typically expect. Fortunately, many have preceded us in doing this, and the work-arounds are well-known. Basically, it comes down to this:

cd /usr/local/src/perl-5.6.1/
export LC_ALL=C
export LANG=en_US
perl -i.bak -p -e 's|Local/Library|Library|g' hints/darwin.sh
sh Configure -des -Dfirstmakefile=GNUmakefile -Dldflags="-flat_namespace"
make
make test
make install

There were a few errors during make test, but none of them seemed significant. Hopefully, in the next version of Perl, the build will work just as it does on other platforms.

Downloads

Before installing Open SSL, mod_ssl, mod_perl, and Apache, I needed to get all the right pieces in place. The mod_ssl and mod_perl configure processes patch the Apache sources, so the Apache sources have to be downloaded and gunzipped and untarred into an adjacent directory. Furthermore, the mod_ssl version number corresponds to the Apache version number, so you have to be sure that they match up. Normally, I would just download the latest versions of all of these pieces and run with it.

However, Bricolage requires the libapreq library and its supporting Perl modules to run, and these libraries have not yet been successfully ported to Mac OS X. But worry not; fearless mod_perl hackers are working on the problem even as we speak, and there is an interim solution to get everything working.

As of this writing, the latest version of Apache is 1.3.24. But because I needed libapreq, I had to use an experimental version of Apache modified to statically compile in libapreq. Currently, only version 1.3.23 has been patched for libapreq, so that’s what I had to use. I discovered this experimental patch thanks to a discussion on the Mac OS X Perl mail list.

So essentially what I did was download the experimental apache.tar.gz and the experimental lightweight apreq.tar.gz packages and gunzip and untar them into /usr/local/src. Then I was ready to move on to Open SSL, mod_ssl, and mod_perl.

Open SSL

Compiling Open SSL was pretty painless. One of the tests fails, but it all seems to work out, anyway. I downloaded the sources from the Open SSL site, and did this:

cd /usr/local/src/openssl-0.9.6c
./config
make
make test

mod_ssl

The mod_ssl Apache module poses no problems whatsoever. I simply downloaded mod_ssl-2.8.7-1.3.23 from the mod_ssl site (note that the “1.3.23” at the end matches the version of Apache I downloaded) and gunzipped and untarred it into /usr/local/src/. Then I executed:

./configure --with-apache=/usr/local/src/apache_1.3.23

mod_perl

Configuring and installing mod_perl was, fortunately, a relatively straightforward process. Getting Apache compiled with mod_perl and mod_ssl, however, was quite tricky, as you’ll see below. A number of braver folks than I have preceded me in installing mod_perl, so I was able to rely on their hard-earned knowledge to get the job done. For example, Randal Schwartz posted instructions to the mod_perl mail list, and his instructions worked well for me. So I downloaded the sources from the mod_perl site, and did this:

cd /usr/local/src/mod_perl-1.26
perl Makefile.PL \
  APACHE_SRC=/usr/local/src/apache_1.3.23/src \
  NO_HTTPD=1 \
  USE_APACI=1 \
  PREP_HTTPD=1 \
  EVERYTHING=1
make
make install

Apache

Getting Apache compiled just right was the most time-consuming part of this process for me. Although many had gone before me in this task, everybody seems to do it differently. I had become accustomed to just allowing Apache to use most of its defaults when I compiled under Linux, but now I was getting all kinds of errors while following different instructions from different authorities from around the web. Sometimes Apache wouldn’t compile at all, and I’d get strange errors. Other times it would compile, pass all of its tests, and install, only to offer up errors such as

dyld: /usr/local/apache/bin/httpd Undefined symbols: _log_config_module

when I tried to start it. It turns out that the problem there was that I had a number of modules compiled as DSOs – that is, libraries that can be loaded into Apache dynamically – but wasn’t loading them properly in my httpd.conf. This was mainly because I’ve grown accustomed to Apache having all the libraries I needed compiled in statically, so I simply didn’t have to worry about them.

But I finally hit on the right incantation to get Apache to compile with everything I need added statically, but still with support for DSOs by compiling in mod_so. I present it here for your viewing pleasure:

SSL_BASE=/usr/local/src/openssl-0.9.6c/ \
./configure \
  --with-layout=Apache \
  --enable-module=ssl \
  --enable-module=rewrite \
  --enable-module=so \
  --activate-module=src/modules/perl/libperl.a \
  --disable-shared=perl \
  --without-execstrip
make
make certificate TYPE=custom
make install

This series of commands successfully compiled Apache with mod_perl and mod_ssl support statically compiled in, along with most of the other default modules that come with Apache. In short, everything is there that you need to run a major application with security such as Bricolage.

Note that make certificate will lead you through the process of creating an SSL certificate. I like to use the “custom” type so that it reflects the name of my organization. But you can use whatever approach you’re most comfortable with. Consult the mod_ssl INSTALL file for more information.

libapreq

Once Apache is installed with mod_perl and mod_ssl, the rest is gravy! The experimental libapreq library I downloaded installed without a hitch:

cd /usr/local/src/httpd-apreq
perl Makefile.PL
make
make install

PostgreSQL

PostgreSQL is a sophisticated open-source Object-Relational DBMS. I use it a lot in my application development, and it, too, is required by Bricolage. I was a bit concerned about how well it would compile and work on Mac OS X, but I needn’t have worried. First of all, Apple has provided some pretty decent instructions. Although they mainly document how to install MySQL, a competing open-source RDBMS, many of the same concepts apply to PostgreSQL.

The first thing I had to do was to create the “postgres” user. This is the system user that PostgreSQL typically runs as. I followed Apple’s instructions, using NetInfo Manager to duplicate the default “www” group and “www” user and give the copies the name “postgres” and a new gid and uid, respectively.

Next I downloaded the PostgreSQL version 7.2.1 sources. Version 7.2 is the first to specifically support Mac OS X, so going about the install was as simple as it is on any Unix system:

./configure --enable-multibyte=UNICODE
make
make install

That was it! PostgreSQL was now installed. Next I had to initialize the PostgreSQL database directory. Again, this works much the same as it does on any Unix system:

sudo -u postgres /usr/local/pgsql/bin/initdb \
  -D /usr/local/pgsql/data

The final step was to start PostgreSQL and try to connect to it:

sudo -u postgres /usr/local/pgsql/bin/pg_ctl start \
  -D /usr/local/pgsql/data
/usr/local/pgsql/bin/psql -U postgres template1

If you follow the above steps and find yourself at a psql prompt, you’re in business! Because I tend to use PostgreSQL over TCP, I also enabled TCP connectivity by turning on the “tcpip_socket” option in the postgresql.conf file in the data directory created by initdb:

tcpip_socket = true

If you’re like me, you like to have servers such as PostgreSQL start when your computer starts. I enabled this by creating a Mac OS X PostgreSQL startup bundle. It may or may not be included in a future version of PostgreSQL, but in the meantime, you can download it from here. Simply download it, gunzip and untar it into /Library/StartupItems, restart OS X, and you’ll see it start up during the normal Mac OS X startup sequence. I built this startup bundle by borrowing from the existing FreeBSD PostgreSQL startup script, the Apache startup script that ships with OS X, and by reading the Creating SystemStarter Startup Item Bundles HOWTO.

XML::Parser

At this point, I had most of the major pieces in place, and it was time for me to install the Perl modules I needed. First up was XML::Parser. For some reason, XML::Parser can’t find the expat libraries, even though the location in which I installed them is pretty common. I got around this by installing XML::Parser like this:

perl Makefile.PL EXPATLIBPATH=/usr/local/lib \
  EXPATINCPATH=/usr/local/include
make
make test
make install

Text::Iconv

In Bricolage, Text::Iconv does all the work of converting text between character sets. This is because all of the data is stored in the database in Unicode, but we wanted to allow users to use the character set to which they’re accustomed in the UI. So I needed to install Text::Iconv. Naturally, Mac OS X doesn’t come with libiconv – a library on which Text::Iconv depends – so I had to install it. Fortunately, it was a simple process to download it and do a normal build:

cd /usr/local/src/libiconv-1.7
./configure
make
make install

Now, Text::Iconv itself was a little more problematic. You have to tell it to look for libiconv by adding the -liconv option to the LIBS key in Makefile.PL. I’ve simplified doing this with the usual Perl magic:

perl -i.bak -p -e \
  "s/'LIBS'\s*=>\s*\[''\]/'LIBS' => \['-liconv'\]/" \
  Makefile.PL
perl Makefile.PL
make
make test
make install

DBD::Pg

Although the DBI installed via the CPAN module without problem, DBD::Pg wanted to play a little less nice. Of course I specified the proper environment variables to install it (anyone know why DBD::Pg’s Makefile.PL script can’t try to figure those out on its own?), but still I got this error during make:

/usr/bin/ld: table of contents for archive:
/usr/local/pgsql/lib/libpq.a is out of date;
rerun  ranlib(1) (can't load from it)

But this was one of those unusual situations in which the error message was helpful. So I took the error message’s advice, and successfully compiled and installed DBD::Pg like this:

ranlib /usr/local/pgsql/lib/libpq.a
export POSTGRES_INCLUDE=/usr/local/pgsql/include
export POSTGRES_LIB=/usr/local/pgsql/lib
perl Makefile.PL
make
make test
make install

LWP

The last piece I needed to worry about customizing when I installed it was LWP. Before installing, back up /usr/bin/head. The reason for this is that LWP will install /usr/bin/HEAD, and because HFS Plus is a case-insensitive file system, it’ll overwrite /usr/bin/head! This is a pretty significant issue, since many configure scripts use /usr/bin/head. So after installing LWP, move /usr/bin/HEAD, GET, & POST to /usr/local/bin. Also move /usr/bin/lwp* to /usr/local/bin. Then move your backed-up copy of head back to /usr/bin.
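In shell terms, the whole dance looks something like this. A sketch only; the backup location is arbitrary:

# Back up head before LWP's case-colliding HEAD wipes it out.
cp /usr/bin/head /tmp/head.bak
perl -MCPAN -e 'install LWP'
# Move LWP's programs out of /usr/bin...
mv /usr/bin/HEAD /usr/bin/GET /usr/bin/POST /usr/local/bin
mv /usr/bin/lwp* /usr/local/bin
# ...and restore the original head.
cp /tmp/head.bak /usr/bin/head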

Naturally, I didn’t realize that this was necessary until it was too late. I installed LWP with the CPAN module, and it wiped out /usr/bin/head. Fortunately, all was not lost (though it took me a while to figure out why my Apache compiles were failing!): I was able to restore head by copying it from the Mac OS X installer CD. I just popped it in and executed the command:

cp "/Volumes/Mac OS X Install CD/usr/bin/head" /usr/bin

And then everything was happy again.

Bricolage

And finally, the pièce de résistance: Bricolage! All of the other required Perl modules installed fine from Bundle::Bricolage:

perl -MCPAN -e 'install Bundle::Bricolage'

Then I simply followed the directions in Bricolage’s INSTALL file, and started ‘er up! I would document those steps here, but the install process is currently in flux and likely to change soon. The INSTALL file should always be current, however – check it out!

To Be Continued

No doubt my adventures with Unix tools on Mac OS X are far from over. I’ve reported the issues described above to the appropriate authors, and most will soon be releasing new versions to address them. As they do, I’ll endeavor to keep this page up-to-date. In the meantime, I am thoroughly enjoying working with the first really solid OS that Apple has released in years, and thrilled that I can finally have the best of both worlds: a good, reliable, and elegant UI, and all the Unix power tools I can stand! I hope you do, too.
