Just a Theory

Black lives matter

Fuck Typing LWP

I’m working on a project that fetches various files from the Internet via LWP. I wanted to make sure that I was a polite user, such that my app would pay attention to Last-Modified/If-Modified-Since and ETag/If-None-Match headers. And in most contexts I also want to respect the robots.txt file on the hosts to which I’m sending requests. So I was very interested to read chromatic’s hack for this very issue. I happily implemented two classes for my app, MyApp::UA, which inherits from LWP::UserAgent::WithCache, and MyApp::UA::Robot, which inherits from MyApp::UA but changes LWP::UserAgent::WithCache to inherit from LWP::UARobot:

@LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');

So far so good, right? Well, no. What I didn’t think about, stupidly, is that by changing LWP::UserAgent::WithCache’s base class, I was doing so globally. So now both MyApp::UA and MyApp::UA::Robot were getting the LWP::RobotUA behavior. Urk.

So my work around is to use a little fuck typing to ensure that MyApp::UA::Robot has the robot behavior but MyApp::UA does not. Here’s what it looks like (BEWARE: black magic ahead!):

package MYApp::UA::Robot;

use 5.12.0;
use utf8;
use parent 'MyApp::UA';
use LWP::RobotUA;

do {
    # Import the RobotUA interface. This way we get its behavior without
    # having to change LWP::UserAgent::WithCache's inheritance.
    no strict 'refs';
    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k ne 'new';

sub new {
    my ($class, $app) = (shift, shift);
    # Force RobotUA configuration.
    local @LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');
    return $class->SUPER::new(
        delay => 1, # be very nice -- max one hit per minute.

The do block is where I do the fuck typing. It iterates over all the symbols in LWP::RobotUA, inserts a reference to all subroutines into the current package. Except for new, which I implement myself. This is so that I can keep my inheritance from MyApp::UA intact. But in order for it to properly configure the LWP::RobotUA interface, new must temporarily fool Perl into thinking that LWP::UserAgent::WithCache inherits from LWP::RobotUA.

Pure evil, right? Wait, it gets worse. I’ve also overridden LWP::RoboUA’s host_wait method, because if it’s the second request to a given host, I don’t want it to sleep (the first request is for the robots.txt, and I see no reason to sleep after that). So I had to modify the do block to skip both new and host_wait:

    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k !~ /^(?:new|host_wait)$/;

If I “override” any other LWP::RobotUA methods, I’ll need to remember to add them to that regex. Of course, since I’m not actually inheriting from LWP::RobotUA, in order to dispatch to its host_wait method, I can’t use SUPER, but must dispatch directly:

sub host_wait {
    my ($self, $netloc) = @_;
    # First visit is for robots.txt, so let it be free.
    return if !$netloc || $self->no_visits($netloc) < 2;

Ugly, right? Yes, I am an evil bastard. “Fuck typing” is right, yo! At least it’s all encapsulated.

This just reinforces chromatic’s message in my mind. I’d sure love to see LWP reworked to use roles!

Looking for the comments? Try the old layout.