#1: WWW::Mechanize : Is immediate caching of images possible?

Posted on 2008-01-04 23:09:07 by hikari.no.hohenheim

Traditionally, when using WWW::Mechanize to download images, I first fetch
the root page:

my $mech = WWW::Mechanize->new();
$mech->get($url);

then proceed to find all images and 'get' them one by one (forgive
the crude code):

my @links = $mech->find_all_images();
foreach my $link (@links) {
    my $imageurl = $link->url_abs();
    # Take the last path segment of the URL as the local file name.
    my ($filename) = $imageurl =~ m{([^/]+)$};
    $mech->get($imageurl, ':content_file' => $filename);
}

My current problem with this is that I'm trying to download an image
generated with information from the session of the original
get($url). It's not a static *.jpg or anything simple; it's a black
box that displays an image relevant to the session. That means when I
fetch the image (http://www.domain.com/image/, which is embedded in the
page) as shown above, it counts as a new request and I get a completely
random image.

Is there a way to cache the images that are loaded during the initial
get($url) so that each image matches the content of the page
retrieved? Or even to capture the session information transmitted to
the black box, domain.com/image/, so I can clone that information and
submit it with the get($imageurl)?

Ideally I would like a routine along the lines of
$mech->getComplete($url, $directory), which would save the page source
and the images etc. associated with it, analogous to Save -> Web Page,
Complete in Firefox.
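
Something like this rough sketch is what I'm picturing (untested;
save_page_complete is just a made-up name, not an actual WWW::Mechanize
method, and it assumes $directory already exists):

use strict;
use warnings;
use WWW::Mechanize;
use File::Spec;

# Using ONE $mech object for the page and its images means any session
# cookie set by the first response is re-sent automatically on the
# image requests.
sub save_page_complete {
    my ($url, $directory) = @_;

    my $mech = WWW::Mechanize->new();
    $mech->get($url);

    # Save the page source.
    open my $html, '>', File::Spec->catfile($directory, 'index.html')
        or die "Can't write index.html: $!";
    print {$html} $mech->content;
    close $html;

    # Save each image next to it, named after the last path segment
    # of its URL (handles URLs that end in a slash).
    foreach my $image ($mech->find_all_images()) {
        my $imageurl = $image->url_abs();
        my ($filename) = $imageurl =~ m{([^/]+)/?$};
        next unless defined $filename;
        $mech->get($imageurl,
            ':content_file' => File::Spec->catfile($directory, $filename));
    }
}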

Thanks all. I think I'm getting pretty proficient with WWW::Mechanize,
but don't be afraid to respond as if I'm an idiot, so that I know your
answer doesn't go over my head.

Hikari


#2: Re: WWW::Mechanize : Is immediate caching of images possible?

Posted on 2008-01-26 22:27:01 by david


On 1/4/08, hikari.no.hohenheim@gmail.com <hikari.no.hohenheim@gmail.com>
wrote:
>
> my $mech = WWW::Mechanize->new();
> $mech->get($url);
>
> my @links = $mech->find_all_images();
> foreach my $link (@links) {
>     my $imageurl = $link->url_abs();
>     # Take the last path segment of the URL as the local file name.
>     my ($filename) = $imageurl =~ m{([^/]+)$};
>     $mech->get($imageurl, ':content_file' => $filename);
> }
>
> My current problem with this is that I'm trying to download an image
> generated with information from the session of the original
> get($url).


How is the session created, and passed to the image?

If the session is established by a cookie in the HTTP response headers,
your first request should create the cookie and WWW::Mechanize should
remember it. Your subsequent requests for the images will then send the
same cookie, and the image server should receive it and be able to
associate those requests with the same session.
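
For example (untested, just to illustrate reusing the same $mech object;
as far as I know WWW::Mechanize sets up an in-memory cookie jar by default):

my $mech = WWW::Mechanize->new();      # default in-memory cookie jar
$mech->get($url);                      # the server may set a session cookie here

print $mech->cookie_jar->as_string;    # inspect what, if anything, was stored

# This request sends the stored cookie back to the same server.
$mech->get('http://www.domain.com/image/', ':content_file' => 'session_image');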

If your session is established by a cookie set by JavaScript, then you'll
need to parse the cookie value out of the page's JavaScript content yourself,
set up your own cookie jar explicitly, and populate it with that value once
you discover it.
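
Roughly like this (untested sketch; the regex and the SESSIONID name are
made up, since I don't know how your page actually sets the cookie):

use HTTP::Cookies;
use WWW::Mechanize;

my $jar  = HTTP::Cookies->new();
my $mech = WWW::Mechanize->new( cookie_jar => $jar );
$mech->get($url);

# Hypothetical pattern; adjust it to however the JavaScript sets the cookie.
my ($sid) = $mech->content =~ /setCookie\(\s*'SESSIONID'\s*,\s*'([^']+)'/;

if (defined $sid) {
    # set_cookie(version, key, value, path, domain, ...)
    $jar->set_cookie( 0, 'SESSIONID', $sid, '/', 'www.domain.com' );
}

# The image request now carries the scraped cookie.
$mech->get('http://www.domain.com/image/', ':content_file' => 'session_image');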

A web browser isn't doing too much more than the logic you have above. HTTP
is inherently stateless, and aside from mechanisms like the HTTP Referer
request header, cookies, and the URL itself, there's little else that the
browser can do to carry information from one request to the next. I believe
WWW::Mechanize supports all of this, so if the above doesn't help, more
information may be needed.
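
For instance, if the image generator keys off the Referer header rather than
a cookie, you can tell WWW::Mechanize to send one explicitly (illustrative
only):

$mech->add_header( Referer => $url );
$mech->get('http://www.domain.com/image/', ':content_file' => 'session_image');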

David
