Fuck Typing LWP

I'm working on a project that fetches various files from the Internet via LWP. I wanted to make sure that I was a polite user, such that my app would pay attention to Last-Modified/If-Modified-Since and ETag/If-None-Match headers. And in most contexts I also want to respect the robots.txt file on the hosts to which I'm sending requests. So I was very interested to read chromatic’s hack for this very issue. I happily implemented two classes for my app, MyApp::UA, which inherits from LWP::UserAgent::WithCache, and MyApp::UA::Robot, which inherits from MyApp::UA but changes LWP::UserAgent::WithCache to inherit from LWP::UARobot:

@LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');

So far so good, right? Well, no. What I didn’t think about, stupidly, is that by changing LWP::UserAgent::WithCache’s base class, I was doing so globally. So now both MyApp::UA and MyApp::UA::Robot were getting the LWP::RobotUA behavior. Urk.

So my work around is to use a little fuck typing to ensure that MyApp::UA::Robot has the robot behavior but MyApp::UA does not. Here’s what it looks like (BEWARE: black magic ahead!):

package MYApp::UA::Robot;

use 5.12.0;
use utf8;
use parent 'MyApp::UA';
use LWP::RobotUA;

do {
    # Import the RobotUA interface. This way we get its behavior without
    # having to change LWP::UserAgent::WithCache's inheritance.
    no strict 'refs';
    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k ne 'new';
    }
};

sub new {
    my ($class, $app) = (shift, shift);
    # Force RobotUA configuration.
    local @LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');
    return $class->SUPER::new(
        $app,
        delay => 1, # be very nice -- max one hit per minute.
    );
}

The do block is where I do the fuck typing. It iterates over all the symbols in LWP::RobotUA, inserts a reference to all subroutines into the current package. Except for new, which I implement myself. This is so that I can keep my inheritance from MyApp::UA intact. But in order for it to properly configure the LWP::RobotUA interface, new must temporarily fool Perl into thinking that LWP::UserAgent::WithCache inherits from LWP::RobotUA.

Pure evil, right? Wait, it gets worse. I've also overridden LWP::RoboUA’s host_wait method, because if it’s the second request to a given host, I don’t want it to sleep (the first request is for the robots.txt, and I see no reason to sleep after that). So I had to modify the do block to skip both new and host_wait:

    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k !~ /^(?:new|host_wait)$/;
    }

If I “override” any other LWP::RobotUA methods, I'll need to remember to add them to that regex. Of course, since I'm not actually inheriting from LWP::RobotUA, in order to dispatch to its host_wait method, I can’t use SUPER, but must dispatch directly:

sub host_wait {
    my ($self, $netloc) = @_;
    # First visit is for robots.txt, so let it be free.
    return if !$netloc || $self->no_visits($netloc) < 2;
    $self->LWP::RobotUA::host_wait($netloc);
}

Ugly, right? Yes, I am an evil bastard. “Fuck typing” is right, yo! At least it’s all encapsulated.

This just reinforces chromatic’s message in my mind. I'd sure love to see LWP reworked to use roles!

Backtalk

Nilson wrote:

I guess this is the kind of thing that makes Perl a very nice tool.

The code is ugly as hell and you can be sure people not familiar with Perl will generally frown upon the snippets above.

But at least it's possible - try doing that in Java.

David Rolsky wrote:

Why "do"?

Take it away and you get the exact same behavior.

Theory wrote:

@Dave—

The do limits the scope of no strict 'refs';. Yes, I could omit it, but I like to see a string in front of the braces. I could change it to a label, I suppose.

—Theory

David Rolsky wrote:

Actually, it's surrounding braces that limit the scope. The do is just noise.

Aristotle Pagaltzis wrote:

*{$k} = *{$v}{CODE} if *{$v}{CODE} and not *{$k}{CODE};

Aristotle Pagaltzis wrote:

Let me say, however:

I don’t understand why you go through all that trouble. You shouldn’t make WithCache inherit from RobotUA, you should just make RobotUA inherit from WithCache. Then you can have UA objects that have either WithCache behaviour or WithCache plus RobotUA behaviour.

Only if you needed all three possible combinations (only WithCache + only RobotUA + both WithCache and RobotUA) would you need to play any tricks like these.

Theory wrote:

@Dave—I know that. I just like having some text in front of the braces.

@Arsitsotle—Yes, I could probably do that. I'd have to have a bit of duplicate code in the two classes, but it'd be better not to have the one inherit from the other. I'll fiddle with that this week.

Actually, I could probably go ahead had always use the combination of the two by having host_wait() always return false. Perhaps I'll send in a patch to make delay => 0 acceptable as "no delay."

—Theory