[RFC] Log::Dispatch::FileX

am 02.01.2007 23:35:08 von Craig Manley

Hi,

This is a file output handler for Log::Dispatch that is more suitable for use in multitasking environments.

It contains features I miss in both Log::Dispatch::File (doesn't do locking) and Log::Dispatch::File::Locked (locks persist
for as long as the file is open). In lack a naming fantasy, I temporarily called it Log::Dispatch::FileX.
This class is almost compatible with Log::Dispatch::File. The most important new features are these:

It can perform locking that is only active during writes (1st a non-blocking flock is attempted and if that fails, then a
blocking flock in a short timeout wrapper).

It has a close_after_modperl_request option as an alternative to leaving a file open forever (= can't be rolled-over) or
closing it after each write (= slow).

Below is the POD documentation. An installable source package is available here:
http://craigmanley.com/Log-Dispatch-FileX-1.01.tar.gz

As this is a RFC, I'ld appreciate some feedback on possible naming (of the module that is) and opinions because I don't want
to spam CPAN with redundent stuff. I've emailed the Log::Dispatch author too and I'm waiting for a reply.

-Craig Manley

--------------------
Module Documentation
--------------------

NAME
Log::Dispatch::FileX - Object for logging to file.

SYNOPSIS
use Log::Dispatch::FileX;

my $output = Log::Dispatch::FileX->new(
name => 'test',
min_level => 'info',
filename => 'logfile.txt',
);

$output->log( level => 'emerg', message => "Time to die." );

DESCRIPTION
This module provides an object for logging to files under the
Log::Dispatch::* system. Log messages are written using the flock file
locking mechanism per default on a per write basis which means this
module is suitable for sharing a log file in a multitasking environment.

METHODS
* new(%p)
This method takes a hash of parameters. The following options are
valid:

* name ($)
The name of the object (not the filename!). Required.

* min_level ($)
The minimum logging level this object will accept. See the
Log::Dispatch documentation on Log Levels for more
information. Required.

* max_level ($)
The maximum logging level this obejct will accept. See the
Log::Dispatch documentation on Log Levels for more
information. This is not required. By default the maximum is
the highest possible level (which means functionally that
the object has no maximum).

* filename ($)
The filename to be opened for appending.

* mode ($)
The mode the file should be opened with. Valid options are
'>' (write) and '>>' (append). The default is '>>' (append).

* perms ($)
If the file does not already exist, the permissions that it
should be created with. Optional. The argument passed must
be a valid octal value, such as 0600. It is affected by the
current or given umask.

* umask ($)
The optional umask to use when the file is created for the
first time.

* flock ($)
Whether or not log writes should be wrapped in a flock.
Defaults to true. If true, then for each logged message, a
non-blocking flock is attempted first, and if that fails,
then a blocking flock is attemped with a timeout.

* close_after_write ($)
Whether or not the file should be closed after each write.
This defaults to false. If set to true, then the mode will
aways be append, so that the file is not re-written for each
new message.

Note: opening and closing a file for each write is a
relatively slow process (especially on windoze systems) as
demonstrated in the performance benchmarks.

* close_after_modperl_request ($)
Only applicable for code running in a mod_perl (1 or 2)
environment and defaults to false. Set this to true if the
file should be closed after each mod_perl request which is
useful if you're using a persistent Log::Dispatch object and
intend to periodically roll your log files without having to
restart your web server each time.

* autoflush ($)
Whether or not the file should be autoflushed. This defaults
to true. If flock is true, then flushing always occurs no
matter what this is set to.

* callbacks( \& or [ \&, \&, ... ] )
This parameter may be a single subroutine reference or an
array reference of subroutine references. These callbacks
will be called in the order they are given and passed a hash
containing the following keys:

( message => $log_message, level => $log_level )

The callbacks are expected to modify the message and then
return a single scalar containing that modified message.
These callbacks will be called when either the "log" or
"log_to" methods are called and will only be applied to a
given message once.

* log_message( message => $ )
Sends a message to the appropriate output. Generally this shouldn't
be called directly but should be called through the "log()" method
(in Log::Dispatch::Output).

BENCHMARKS
FreeBSD 6.1 with a single Intel(R) Xeon(TM) CPU 3.60GHz
Measuring 10000 logs of using defaults...
Log::Dispatch::FileX... 0.739 seconds (avg 0.00007)
Log::Dispatch::File... 0.622 seconds (avg 0.00006)
Measuring 10000 logs of using autoflush=0, flock=0...
Log::Dispatch::FileX... 0.575 seconds (avg 0.00006)
Log::Dispatch::File... 0.574 seconds (avg 0.00006)
Measuring 10000 logs of using autoflush=1, flock=0...
Log::Dispatch::FileX... 0.618 seconds (avg 0.00006)
Log::Dispatch::File... 0.623 seconds (avg 0.00006)
Measuring 10000 logs of using flock=1...
Log::Dispatch::FileX... 0.739 seconds (avg 0.00007)

Measuring 10000 logs of using close_after_write=1, flock=0...
Log::Dispatch::FileX... 1.080 seconds (avg 0.00011)
Log::Dispatch::File... 1.035 seconds (avg 0.00010)
Measuring 10000 logs of using close_after_modperl_request=1, flock=1...
Log::Dispatch::FileX... 0.768 seconds (avg 0.00008)

Windoze XP with a Pentium CPU 3.0GHz
Measuring 10000 logs of using defaults...
Log::Dispatch::FileX... 1.235 seconds (avg 0.00012)
Log::Dispatch::File... 1.047 seconds (avg 0.00010)
Measuring 10000 logs of using autoflush=0, flock=0...
Log::Dispatch::FileX... 0.875 seconds (avg 0.00009)
Log::Dispatch::File... 0.907 seconds (avg 0.00009)
Measuring 10000 logs of using autoflush=1, flock=0...
Log::Dispatch::FileX... 1.063 seconds (avg 0.00011)
Log::Dispatch::File... 1.047 seconds (avg 0.00010)
Measuring 10000 logs of using flock=1...
Log::Dispatch::FileX... 1.251 seconds (avg 0.00013)

Measuring 10000 logs of using close_after_write=1, flock=0...
Log::Dispatch::FileX... 74.128 seconds (avg 0.00741)
Log::Dispatch::File... 79.660 seconds (avg 0.00797)

Note how rediculously slow Windoze is when close_after_write=1 is
used.

SEE ALSO
Log::Dispatch::File.

AUTHOR
Craig Manley

COPYRIGHT AND LICENSE
Copyright (C) 2007 Craig Manley This library is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.

ACKNOWLEDGEMENTS
Dave Rolsky, author of the Log::Dispatch suite including
Log::Dispatch::File on which this module is based.

Sending output to Spreadsheet

am 03.01.2007 06:11:31 von dysgraphia

Hi, New to Perl, using ActiveState 5.8, Win XP

I'm not sure if this is the best ng...should misc be my first
port of call?

I am trying to adapt Brent Hughes rget-links.pl original code by
collecting discovered web links into a spreadsheet for later use.
BTW, Any errors are due to me, not Brent!

The code runs OK and prints out the web links found into the command window.

I have looked at the Spreadsheet::SimpleExcel module but I cannot work
out the syntax to get the accumulated links into my_List.xls file.

Any suggestions will be appreciated!
Cheers, Peter

#!/usr/bin/perl

use warnings;
use strict;

package RGetLinks;

use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;
use Getopt::Long;
use Spreadsheet::SimpleExcel;

$| = 1;

# global data for this program
my $depth;
my %files;

# command line options
my $opt_depth = 4;

# retrieve command line options
my $options = GetOptions ("depth=i" => \$opt_depth); # numeric

my $url = 'http://somesite/';

# abort if the options are improperly formatted
if(!defined $url){ usage(); }

# program enters actual processing at this point
rgetlinks($url,$opt_depth);

# create a new instance of Excel
my $excel = Spreadsheet::SimpleExcel->new();
# add worksheet
$excel->add_worksheet('Sheet1',{-headers => \@header, -data => \@data});
# print result into a file and handle error
$excel->output_to_file('c:/Documents and Settings/my_List.xls') or die
$excel->errstr();
$excel->errstr();

# Subroutines

# A routine to get links recursively
sub rgetlinks
{
my($url,$maxdepth) = @_;
chomp($url);

# initialize globals
$depth = 0;
%files = ();

# descend
rgetlinkshelper($url,$maxdepth);
}

# A helper routine to get links recursively
sub rgetlinkshelper
{
my($url,$maxdepth) = @_;

# return if too deep or already been here
if($depth >= $maxdepth || defined $files{$url})
{
return;
}
else
{
# drop down a level and add the file to the hash
$depth++; $files{$url} = 1;

# show our current location
foreach(1..$depth) {print ' ';}
print $url, "\n";

# retrieve all links
my @links = getlinks($url);

# recursive step
foreach(@links){ rgetlinkshelper($_,$maxdepth); }

# pop up a level # line 101
$depth--;
}
}

# A routine to return links from a URL
# Only retrieve links from text/html files.

my @links = ();

sub getlinks
{
my($url) = @_; # for instance
my $ua = new LWP::UserAgent;

# Make the parser. Unfortunately, we don't know the base yet
# (it might be diffent from $url)
@links = ();
my $p = HTML::LinkExtor->new(\&callback);

# Look at the header to determine what type of document we have
my $headreq = HTTP::Request->new(HEAD => $url);
my $headres = $ua->request($headreq);
my $type = $headres->header('content-type');

# only parse the document for links if it is a text or html document
if(defined $type && $type =~ /text|html/)
{
# Request document and parse it as it arrives
my $getreq = HTTP::Request->new(GET => $url);
my $getres = $ua->request($getreq, sub{ $p->parse($_[0])});

# Expand all URLs to absolute ones
my $base = $getres->base;
@links = map { $_ = url($_, $base)->abs; } @links;
}

# Return the links
return @links;
}

# Set up a callback that collects links
sub callback {
my($tag, %attr) = @_;

return if $tag ne 'a'; # we only look closer at
push(@links, values %attr);
}

# A routine to provide instructions
sub usage
{
# strip the progname with a regex
my $progname = $0;
$progname =~ s/(.*\\|.*\/)(.*)/$2/g;

# show instructions
print "\nUsage:\n\t\t",
$progname, " [args] target-url > output-file\n\n",
"Example:\n\t\t",
$progname, " --depth=4 http://www.perl.org\n\n"; # depth=3

print "Options\n", "=======\n",

"--depth\t\t",
"The maximum depth of links to traverse (default = 3)\n";

exit();
}

Re: Sending output to Spreadsheet

am 03.01.2007 06:21:16 von dysgraphia

dysgraphia wrote:
> Hi, New to Perl, using ActiveState 5.8, Win XP
>
Apologies!...this post got stuck in the wrong thread...should be a new one
....Peter

Re: [RFC] Log::Dispatch::FileX

am 04.01.2007 11:37:09 von Dominique Dumont

Craig Manley writes:

> It contains features I miss in both Log::Dispatch::File (doesn't do
> locking) and Log::Dispatch::File::Locked (locks persist for as long
> as the file is open). In lack a naming fantasy, I temporarily called
> it Log::Dispatch::FileX. This class is almost compatible with
> Log::Dispatch::File. The most important new features are these:

To avoid creating yet another perl module, I'd rather see this
functionality added to Log::Dispatch::File::Locked as an option.

Could you check with Dave if your need fits with his view of
Log::Dispatch::File::Locked ?

Thanks

--
Dominique Dumont
"Delivering successful solutions requires giving people what they
need, not what they want." Kurt Bittner