
#1: WWW::Mechanize doesn't always follow_link(text

Posted on 2008-04-20 19:45:13 by mikaelb

I'm using WWW::Mechanize 1.34 and have a problem.
This doesn't work:
$agent->follow_link(text => 'Edit&nbsp;Librarians', n => 1);
It doesn't work in the sense that the link isn't followed and the $agent
is still on the same page. Is there a bug in my code, or is there a known
bug in WWW::Mechanize? I've tried changing &nbsp; to a space, but that
didn't work.

This works:
$agent->follow_link(url_regex => qr/librarians/, n => 1);

The corresponding XHTML code is:
<a href="mkbAdmin?func=librarians&amp;lang=en">Edit&nbsp;Librarians</a>

I want it to work since I use HTTP::Recorder to generate the code
automatically as I surf using a proxy and it generates code of the type
that doesn't work.

This works:
$agent->follow_link(text => 'Logout', n => 1);

By the way HTTP::Recorder actually generates:
$agent->follow_link(text => 'Edit&nbsp;Librarians', n => '1');


#2: Re: WWW::Mechanize doesn't always follow_link(text

Posted on 2008-04-21 19:34:49 by John Bokma

"M.O.B. i L." <mikaelb@df.lth.se> wrote:

> I'm using WWW::Mechanize 1.34 and have a problem.
> This doesn't work:
> $agent->follow_link(text => 'Edit&nbsp;Librarians', n => 1);
> It doesn't work in the sense that the link isn't followed and the $agent
> is still on the same page. Is there a bug in my code or is there a known
> bug in WWW::Mechanize. I've tried to change &nbsp; to space but that
> didn't work.
>
> This works:
> $agent->follow_link(url_regex => qr/librarians/, n => 1);
>
> The corresponding XHTML code is:
> <a href="mkbAdmin?func=librarians&amp;lang=en">Edit&nbsp;Librarians</a>
>
> I want it to work since I use HTTP::Recorder to generate the code
> automatically as I surf using a proxy and it generates code of the type
> that doesn't work.
>
> This works:
> $agent->follow_link(text => 'Logout', n => 1);
>
> By the way HTTP::Recorder actually generates:
> $agent->follow_link(text => 'Edit&nbsp;Librarians', n => '1');

HTML::TreeBuilder, or a module it uses, returns &nbsp; as a single
character, so you might have to use that character's code instead.

Comment on http://johnbokma.com/perl/search-term-suggestion-tool.html
says: (&nbsp;, stored as char 225)

So you might want to try: "Edit\xe1Librarians".

Wild guess.

--
John

Arachnids near Coyolillo
http://johnbokma.com/perl/


#3: Re: WWW::Mechanize doesn't always follow_link(text

Posted on 2008-04-23 14:09:04 by mikaelb

John Bokma wrote:
> "M.O.B. i L." <mikaelb@df.lth.se> wrote:
>
>> I'm using WWW::Mechanize 1.34 and have a problem.
>> This doesn't work:
>> $agent->follow_link(text => 'Edit&nbsp;Librarians', n => 1);
>> It doesn't work in the sense that the link isn't followed and the $agent
>> is still on the same page. Is there a bug in my code or is there a known
>> bug in WWW::Mechanize. I've tried to change &nbsp; to space but that
>> didn't work.
>>
>> This works:
>> $agent->follow_link(url_regex => qr/librarians/, n => 1);
>>
>> The corresponding XHTML code is:
>> <a href="mkbAdmin?func=librarians&amp;lang=en">Edit&nbsp;Librarians</a>
>>
>> I want it to work since I use HTTP::Recorder to generate the code
>> automatically as I surf using a proxy and it generates code of the type
>> that doesn't work.
>>
>> This works:
>> $agent->follow_link(text => 'Logout', n => 1);
>>
>> By the way HTTP::Recorder actually generates:
>> $agent->follow_link(text => 'Edit&nbsp;Librarians', n => '1');
>
> HTML::TreeBuilder, or a module it's using, returns &nbsp; as a single
> character, it might be that you have to
> use the code instead.
>
> Comment on http://johnbokma.com/perl/search-term-suggestion-tool.html
> says: (&nbsp;, stored as char 225)
>
> So you might want to try: "Edit\xe1Librarians".
>
> Wild guess.
>
Thanks! But it should be \xa0. First I tried matching with regular
expressions and that worked using . (dot) for the unknown character. I
then found this page about &nbsp;
<http://www.w3.org/International/questions/qa-escapes> where it says:
"An example of an ambiguous character is 00A0: NO-BREAK SPACE. This type
of space prevents line breaking, but it looks just like any other space
when used as a character. Using &nbsp; (or &#xA0;) makes it quite clear
where such spaces appear in the text.".

So this works:
$agent->follow_link(text => "Edit\xa0Librarians", n => 1);
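
A minimal sketch of why the double-quoted "\xa0" form matches while the single-quoted entity does not. It assumes HTML::Entities is installed (it ships with the HTML::Parser distribution that WWW::Mechanize depends on). The variable names are illustrative and not from the thread, and the code only compares strings; it does not make a live follow_link call:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Link text as it appears in the XHTML source.
my $raw = 'Edit&nbsp;Librarians';

# Decoding turns the entity into the single character U+00A0 (decimal 160).
my $decoded = decode_entities($raw);

# Single quotes do not interpolate, so this stays the literal entity text.
my $single = 'Edit&nbsp;Librarians';

# Double quotes interpolate \xa0 into one character.
my $double = "Edit\xa0Librarians";

printf "decoded eq single-quoted: %s\n", $decoded eq $single ? 'yes' : 'no';
printf "decoded eq double-quoted: %s\n", $decoded eq $double ? 'yes' : 'no';
printf "code of 5th character: %d\n", ord(substr($decoded, 4, 1));

# A pattern that accepts either a plain space or a no-break space.
my $pattern = qr/^Edit[ \xa0]Librarians$/;
print "pattern matches\n" if $decoded =~ $pattern;
```

The same pattern could probably be passed as text_regex => $pattern to follow_link, so the call would not depend on which kind of space the page uses. How the link text is exposed may differ between Mechanize versions, so that is worth checking against the installed one.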


#4: Re: WWW::Mechanize doesn't always follow_link(text

Posted on 2008-04-23 18:07:27 by John Bokma

"M.O.B. i L." <mikaelb@df.lth.se> wrote:

> John Bokma wrote:

[..]

>> HTML::TreeBuilder, or a module it's using, returns &nbsp; as a single
>> character, it might be that you have to
>> use the code instead.
>>
>> Comment on http://johnbokma.com/perl/search-term-suggestion-tool.html
>> says: (&nbsp;, stored as char 225)
>>
>> So you might want to try: "Edit\xe1Librarians".
>>
>> Wild guess.
>>
> Thanks! But it should be \xa0.

Yeah, but HTML::TreeBuilder returns it as 225 :-D.

[..]

> So this works:
> $agent->follow_link(text => "Edit\xa0Librarians", n => 1);

Glad my post was able to help you in the right way.

--
John

http://johnbokma.com/perl/


#5: Re: WWW::Mechanize doesn't always follow_link(text

Posted on 2008-04-24 19:17:55 by mikaelb

M.O.B. i L. wrote:
> Thanks! But it should be \xa0. First I tried matching with regular
> expressions and that worked using . (dot) for the unknown character. I
> then found this page about &nbsp;
> <http://www.w3.org/International/questions/qa-escapes> where it says:
> "An example of an ambiguous character is 00A0: NO-BREAK SPACE. This type
> of space prevents line breaking, but it looks just like any other space
> when used as a character. Using &nbsp; (or &#xA0;) makes it quite clear
> where such spaces appear in the text.".
>
> So this works:
> $agent->follow_link(text => "Edit\xa0Librarians", n => 1);

I'll add that I've put together these sed command lines to convert back and
forth between the &nbsp; form and the \xa0 form:
sed -i '/&nbsp;/s/&nbsp;/\\xa0/g;/\\xa0/s/'\''/"/g' MKBTest.pl
sed -i '/\\xa0/s/\\xa0/\&nbsp;/g;/&nbsp;/s/"/'\''/g' MKBTest.pl
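
For anyone who would rather stay in Perl, here is a rough sketch of the same two conversions as functions that work on one line of script text at a time. The function names nbsp_to_escape and escape_to_nbsp are made up for this example; the logic is meant to mirror the sed commands above, not to replace them:

```perl
use strict;
use warnings;

# Forward: '...&nbsp;...' becomes "...\xa0..." (entity -> escape, then
# single quotes -> double quotes so that Perl interpolates the escape).
sub nbsp_to_escape {
    my ($line) = @_;
    $line =~ s/&nbsp;/\\xa0/g;               # no-op when there is no entity
    $line =~ tr/'/"/ if $line =~ /\\xa0/;    # only on lines with the escape
    return $line;
}

# Reverse: "...\xa0..." becomes '...&nbsp;...'.
sub escape_to_nbsp {
    my ($line) = @_;
    $line =~ s/\\xa0/&nbsp;/g;               # no-op when there is no escape
    $line =~ tr/"/'/ if $line =~ /&nbsp;/;   # only on lines with the entity
    return $line;
}

# The line HTTP::Recorder generated, taken from the first post.
my $recorded = q{$agent->follow_link(text => 'Edit&nbsp;Librarians', n => '1');};
my $fixed    = nbsp_to_escape($recorded);

print "$fixed\n";
print escape_to_nbsp($fixed), "\n";
```

As with the sed version, every quote on a matching line is rewritten, so a line that mixes single and double quotes will not survive a round trip unchanged.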
