URI.pm error

on 24.01.2006 13:13:20 by njh

What are the circumstances under which this error appears, and what
can be done to avoid it?

"Use of uninitialized value in substitution iterator at /usr/share/perl5/URI.pm line 76."

-Nigel

--
Nigel Horne. Arranger, Adjudicator, Band Trainer, Composer, Typesetter.
NJH Music, Barnsley, UK. ICQ#20252325
njh@bandsman.co.uk http://www.bandsman.co.uk



Re: URI.pm error

on 24.01.2006 13:29:26 by gisle

Nigel Horne writes:

> What are the circumstances under which this error appears, and what
> can be done to avoid it?
>
> "Use of uninitialized value in substitution iterator at /usr/share/perl5/URI.pm line 76."

What kind of statement do you find at this line? What version of
URI.pm do you have installed?

--Gisle

Re: URI.pm error

on 24.01.2006 14:31:46 by njh

Gisle Aas wrote:
> Nigel Horne writes:
>
>
>>What are the circumstances under which this error appears, and what
>>can be done to avoid it?
>>
>>"Use of uninitialized value in substitution iterator at /usr/share/perl5/URI.pm line 76."
>
>
> What kind of statement do you find at this line? What version of
> URI.pm do you have installed?

Man URI doesn't tell you the version, but running 'r' within CPAN shows
no updates available, so I must presume it is the most up to date
version.

I have two calls to URI in my code. It's difficult to tell which
is causing the error at the moment, but there are few calls to URI
in the code, so it shouldn't be that difficult to find:

[njh@njh cgi-bin]$ fgrep URI search.cgi
use URI;
require URI::URL;
$URI::ABS_REMOTE_LEADING_DOTS = 1;
$URI::ABS_REMOTE_LEADING_DOTS = 1;
$newurl = URI->new_abs($newurl, $webdoc->base);
my $nexturl = URI->new_abs($1 . ".htm", $url);
$nexturl = URI->new_abs($nexturl, $base);
$title = $urltitles{$home} . ' (' . URI->new($url)->path . ')';

> --Gisle

-Nigel



Re: URI.pm error

on 24.01.2006 14:45:29 by gisle

Nigel Horne writes:

> Gisle Aas wrote:
> > Nigel Horne writes:
> >
> >>What are the circumstances under which this error appears, and what
> >>can be done to avoid it?
> >>
> >>"Use of uninitialized value in substitution iterator at /usr/share/perl5/URI.pm line 76."
> > What kind of statement do you find at this line? What version of
> > URI.pm do you have installed?
>
> Man URI doesn't tell you the version, but running 'r' within CPAN shows
> no updates available, so I must presume it is the most up to date
> version.

You could run 'grep VERSION /usr/share/perl5/URI.pm' to get the
version number out of the file. If you have the same version as me
you will find:

$ perl -ne 'print if 72..76' /usr/share/perl5/URI.pm
sub _init
{
    my $class = shift;
    my($str, $scheme) = @_;
    $str =~ s/([^$uric\#])/$URI::Escape::escapes{$1}/go;

so the warning means that perl found characters in your URI that do
not have corresponding entries in %URI::Escape::escapes. Hmm, I guess
that can happen if you feed it strings with Unicode chars outside the
Latin-1 range. Can that be the case?
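
The suspected mechanism can be reproduced in isolation. What follows is only a sketch under stated assumptions: the local %escapes hash is a stand-in built for characters 0..255 (it is not the module's own table), the character class is a simplified substitute for $uric, and escape_with_table is a hypothetical helper name.

```perl
use strict;
use warnings;

# Stand-in for %URI::Escape::escapes (assumption): one entry per
# character in the range 0..255, mapping it to its %XX escape.
my %escapes;
$escapes{chr $_} = sprintf '%%%02X', $_ for 0 .. 255;

# Hypothetical helper: escape every character outside a small safe set
# and count the warnings the substitution produces along the way.
sub escape_with_table {
    my ($str) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    # Same shape as the statement in URI.pm: look each character up in
    # the table. A character above 255 has no entry, so the lookup
    # yields undef and perl warns about an uninitialized value.
    $str =~ s/([^A-Za-z0-9:\/._~-])/$escapes{$1}/g;
    return ($str, scalar @warnings);
}

my ($latin1, $w1) = escape_with_table("http://example.com/caf\x{E9}");
my ($wide,   $w2) = escape_with_table("http://example.com/\x{3042}");

print "Latin-1 input: $latin1 (warnings: $w1)\n";
print "Wide input:    $wide (warnings: $w2)\n";
```

The Latin-1 character is found in the table, while the character above 255 is silently dropped from the result and triggers the warning.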

--Gisle

Re: URI.pm error

on 24.01.2006 17:24:39 by Andy

On Jan 24, 2006, at 6:13 AM, Nigel Horne wrote:

> What are the circumstances under which this error appears, and what
> can be done to avoid it?
>
> "Use of uninitialized value in substitution iterator at /usr/share/perl5/URI.pm line 76."

I think you're the best to tell us what the circumstances are, by
showing us the code that causes it, and the data you're using.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance

Re: URI.pm error

on 25.01.2006 17:30:39 by njh

> You could run 'grep VERSION /usr/share/perl5/URI.pm' to get the
> version number out of the file.

[njh@njh ~]$ grep VERSION /usr/share/perl5/URI.pm
grep: /usr/share/perl5/URI.pm: No such file or directory
[njh@njh ~]$ grep VERSION /usr/lib/perl5/vendor_perl/5.8.6/URI.pm
use vars qw($VERSION);
$VERSION = "1.35"; # $Date: 2004/11/05 14:17:33 $
[njh@njh ~]$
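
Since the file turned out to live somewhere other than the path in the warning, it may help to ask Perl itself which copy it loads. A minimal sketch, assuming nothing beyond core Perl; module_info is a hypothetical helper name, and the output depends on the local installation:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and report its version
# together with the file Perl actually loaded it from (via %INC).
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return ($module->VERSION, $INC{$file});
}

my ($version, $path) = module_info('URI');
if (defined $path) {
    $version = '(no version)' unless defined $version;
    print "URI $version loaded from $path\n";
} else {
    print "URI could not be loaded from this Perl's \@INC\n";
}
```

This reports the copy found first in @INC, which is the one the running script uses, so it avoids guessing between /usr/share and /usr/lib.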

It's taken 2 days of hard graft to nail down a small[ish]
program that will reproduce it. It's probably a character
set decoding problem in my code. Here it is:

#!/usr/bin/perl -wT

use strict;
use HTML::SimpleLinkExtor;
use WWW::RobotRules::AnyDBM_File;
use LWP::RobotUA;
use LWP::Charset;
use HTTP::Request;
use ShiftJIS::X0213::MapUTF;	# for the fully qualified call below
use Encode;

my $url = 'http://www5b.biglobe.ne.jp/~ubs/html/history.html';

my $rules = WWW::RobotRules::AnyDBM_File->new('www.bandsman.co.uk/Spider',
	'/tmp/robots.cache');

my $robot = LWP::RobotUA->new('www.bandsman.co.uk/Spider',
	'njh@despammed.com', $rules);
$robot->timeout(20);
$robot->protocols_allowed(['http']);	# disabling all others

$robot->env_proxy();
my $request = HTTP::Request->new('GET' => $url);
my $webdoc = $robot->simple_request($request);
my $content = $webdoc->content;

my $extor = HTML::SimpleLinkExtor->new($url);

unless($extor) {
	die "Couldn't start extor\n";
}

my $charset = LWP::Charset::getCharset($webdoc);

# Decode the page into Perl characters according to its charset
if($charset) {
	# print "$url: Charset is $charset\n";
	if($charset =~ /(.+),/) {
		$charset = $1;
	}
	if(Encode::resolve_alias($charset)) {
		if($charset eq 'Shift_JIS') {
			$content =
				ShiftJIS::X0213::MapUTF::sjis2004_to_utf8($content);
			$content = Encode::decode_utf8($content);
		} elsif($charset ne 'us-ascii') {
			$content = Encode::decode($charset, $content);
		}
	} else {
		die "$url: Has an unknown character set: $charset\n";
	}
}

$extor->parse($content);

URLLOOP: foreach ($extor->links) {
	# print "Considering $_\n";
	next URLLOOP if(/^(mailto|news|javascript|clsid):/i);
	next URLLOOP if(/^(ftp:\/\/|\#.+)/i);

	if(/^file:/i) {
		# print "File protocol not supported since that does not work over the Internet\n";
		next URLLOOP;
	}
	# Remove any CGI arguments to get the bare page
	# Watch the very broken
	#	http://www.mvlausen.ch/index.php
	# Don't anchor - do the whole doc
	my $page = $_;
	$page =~ s/(\?|\#).*$//;

	# Handle the equally broken
	#	http://watfordband.org.uk/~greg/band/news/
	# which just keeps on scrolling back and back
	if($page =~ /(.+\.php)\/.+/) {
		$page = $1;
	}

	# Remove double slashes from the url. They
	# are valid according to RFC 2396, but they confuse us
	# TODO: https
	if($page =~ /^http:\/\/(.*\/\/)/) {
		$page = $1;
		$page =~ s/\/\//\//g;
		$page = 'http://' . $page;
	} elsif($page !~ /^http:\/\//) {
		# Doesn't start http - we can remove double
		# slash easily
		if($page =~ /.*\/\/.*/) {
			$page =~ s/\/\//\//g;
		}
	}

	# print "Found: $page\n";
}
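
If the cause is the one suggested earlier in the thread (link strings containing characters outside Latin-1 once the page has been decoded), one possible workaround is to percent-escape such characters before the string reaches URI. This is only a sketch: escape_wide_chars is a hypothetical helper, not part of URI, and it assumes the remote server accepts UTF-8 percent-encoding, which a Shift_JIS site may not.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: replace every non-ASCII character with the %XX
# escapes of its UTF-8 bytes, so the result contains only ASCII.
sub escape_wide_chars {
    my ($str) = @_;
    $str =~ s{([^\x00-\x7F])}{
        join '', map { sprintf('%%%02X', ord($_)) }
            split //, Encode::encode_utf8($1)
    }ge;
    return $str;
}

# U+3042 is HIRAGANA LETTER A, written as an escape so that this file
# itself stays plain ASCII.
print escape_wide_chars("http://example.com/\x{3042}.html"), "\n";
```

Applied to each link before it is turned into a URI object, this would leave only characters that have an entry in the escape table.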


