LWP::Parallel::RobotUA - A class for Parallel Web Robots
require LWP::Parallel::RobotUA;
my $ua = LWP::Parallel::RobotUA->new('my-robot/0.1', 'me@foo.com');
$ua->delay(0.5); # in minutes, i.e. 30 seconds
...
# use it just like a normal LWP::Parallel::UserAgent
$ua->register($request, \&callback, 4096); # queue a request (4096-byte chunks)
$ua->wait($timeout); # then process the registered requests
This class implements a user agent that is suitable for robot
applications. Robots should be nice to the servers they visit. They
should consult the /robots.txt file to ensure that they are welcomed
and they should not make requests too frequently.
Before you consider writing a robot, however, take a look at
<URL:http://info.webcrawler.com/mak/projects/robots/robots.html>.
When you use a LWP::Parallel::RobotUA as your user agent, then you do not
really have to think about these things yourself. Just send requests
as you do when you are using a normal LWP::Parallel::UserAgent and this
special agent will make sure you are nice.
The LWP::Parallel::RobotUA is a sub-class of LWP::Parallel::UserAgent
and LWP::RobotUA and implements a mix of their methods.
In addition to the methods inherited from LWP::Parallel::UserAgent, these methods are provided:
$ua = LWP::Parallel::RobotUA->new($agent_name, $from, [$rules])
Your robot's name and the mail address of the human responsible for
the robot (i.e. you) are required by the constructor.
Optionally, you can specify the WWW::RobotRules object to
use. (See the WWW::RobotRules::AnyDBM_File manpage for persistent caching of
robot rules in a local file.)
$ua->delay([$minutes])
Set or get the minimum delay between requests to the same server. The
default is 1 minute.
Note: Previous versions of LWP::Parallel::RobotUA used seconds instead of
minutes! This is now compatible with LWP::RobotUA.
$ua->host_wait($netloc)
Returns the number of seconds you must wait before you can make a new
request to this server. This method keeps track of all of the robot's
connections and enforces the delay constraint specified via the delay
method above for each server individually.
Note: Although it says 'host', it really means 'netloc/server',
i.e. it differentiates between individual servers running on different
ports, even though they might be on the same machine ('host'). This
function is mostly used internally, where RobotUA calls it to find out
when to send the next request to a certain server.
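The per-server bookkeeping described above can be illustrated with a small, self-contained sketch. It is not the module's actual implementation: the hash %last_request and the functions note_request and seconds_to_wait are hypothetical names invented for this example, and the timestamps are made-up epoch values. It only shows how a delay given in minutes turns into a wait in seconds, tracked per netloc (host:port) rather than per host.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the robot's bookkeeping (assumption, not the
# module's real internals).
my %last_request;          # netloc ("host:port") => time of last request
my $delay_minutes = 0.5;   # same unit as $ua->delay, i.e. minutes

# Record that a request was sent to $netloc at time $now (epoch seconds).
sub note_request {
    my ($netloc, $now) = @_;
    $last_request{$netloc} = $now;
}

# Seconds to wait before $netloc may be contacted again (0 if none).
sub seconds_to_wait {
    my ($netloc, $now) = @_;
    my $last = $last_request{$netloc};
    return 0 unless defined $last;    # server never contacted before
    my $wait = int($delay_minutes * 60 - ($now - $last));
    return $wait > 0 ? $wait : 0;
}

note_request('www.example.com:80', 1000);
print seconds_to_wait('www.example.com:80',   1010), "\n";  # 20
print seconds_to_wait('www.example.com:8080', 1010), "\n";  # 0 (other port, other server)
print seconds_to_wait('www.example.com:80',   1040), "\n";  # 0 (delay has elapsed)
```

Keying the table on the netloc is what lets two servers on the same machine, but on different ports, be throttled independently.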
$ua->as_string
Returns a string that describes the state of the UA.
Mainly useful for debugging.
See also the LWP::Parallel::UserAgent manpage, the LWP::RobotUA manpage, and the WWW::RobotRules manpage.
Copyright 1997-2004 Marc Langheinrich <marclang@cpan.org>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.