WWW::Mechanize : Is immediate caching of images possible?

on 04.01.2008 23:09:07 by hikari.no.hohenheim

Traditionally, when using WWW::Mechanize to download images, I first
fetch the root page:
my $mech = WWW::Mechanize->new();
$mech->get($url);

then proceed to find all the images and 'get' them one by one
(forgive the crude code):

my @links = $mech->find_all_images();
foreach my $link (@links) {
    my $imageurl = $link->url_abs();
    # use everything after the last slash as the local file name
    my ($filename) = $imageurl =~ m{([^/]+)$} or next;
    $mech->get($imageurl, ':content_file' => $filename);
}

My current problem with this is that I'm trying to download an image
generated with information from the session of the original
get($url). It's not a static *.jpg or something simple; it's a black
box that displays an image relevant to the session. That means when I
fetch the image (http://www.domain.com/image/, which is embedded in
the page) as shown above, it's a new request and I get a completely
random image.

Is there a way to cache the images that are loaded during the initial
get($url) so that the image matches the content of the page
retrieved? Or even to capture the session information transmitted to
the black box, domain.com/image/, so I can clone the information and
submit it with the get($imageurl)?

Ideally I would like a routine such as
$mech->getComplete($url, $directory); which would save the source,
images, etc. associated with the page, analogous to 'Save Page As ->
Web Page, Complete' in Firefox.
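
Something like this rough sketch is what I have in mind (get_complete
is a made-up name, $dir is assumed to already exist, and each image is
still fetched with a separate request, so it would not fix the session
problem on its own):

use WWW::Mechanize;
use File::Spec;

sub get_complete {
    my ($url, $dir) = @_;

    my $mech = WWW::Mechanize->new();
    $mech->get($url);

    # save the page source itself
    $mech->save_content( File::Spec->catfile($dir, 'index.html') );

    # fetch each image next to it; every get() here is a fresh request
    for my $img ( $mech->find_all_images() ) {
        my $abs = $img->url_abs();
        my ($name) = "$abs" =~ m{([^/]+)$} or next;
        $mech->get( $abs,
            ':content_file' => File::Spec->catfile($dir, $name) );
        $mech->back();   # back to the original page before the next image
    }
    return;
}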

Thanks all. I think I'm getting pretty proficient with WWW::Mechanize,
but don't be afraid to respond as if I'm an idiot, so your answer
doesn't go over my head.

Hikari

Re: WWW::Mechanize : Is immediate caching of images possible?

on 26.01.2008 22:27:01 by david

On 1/4/08, hikari.no.hohenheim@gmail.com wrote:
>
> my $mech = WWW::Mechanize->new();
> $mech->get($url);
>
> my @links = $mech->find_all_images();
> foreach my $link (@links){
> my $imageurl = $link->url_abs();
> $imageurl =~ m/([^\/]+)$/;
> $mech->get($imageurl, ':content_file' => $1);
> }
>
> My current problem with this is that I'm trying to dl an image
> generated with information from the session of the original
> get($url).


How is the session created and passed to the image request?

If the session is established by a cookie set in the HTTP response
headers, your first request should create the cookie and
WWW::Mechanize should remember it. Your subsequent requests for the
images should then be sent with the same cookie, and the server
handling the image requests should receive it and be able to
associate them with the same session.
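
For example, a quick way to check whether such a cookie was actually
set (WWW::Mechanize keeps an HTTP::Cookies jar by default; $url is
your original page):

my $mech = WWW::Mechanize->new();   # a cookie jar is enabled by default
$mech->get($url);

# Dump whatever cookies the first response set. If the session cookie
# shows up here, later requests made with the same $mech send it back.
print $mech->cookie_jar->as_string;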

If your session is established by a cookie set from JavaScript, then
you'll need to parse the cookie value out of the page content (the
JavaScript) yourself, set up the cookie jar explicitly, and populate
it with that value once you discover it.
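
Something along these lines, purely as a sketch; the cookie name
SESSIONID, the regex, and the domain are placeholders you'd have to
adapt to the real page:

# Pull a value the page sets via JavaScript and push it into the
# existing cookie jar. Name, regex and domain are all placeholders.
my ($sid) = $mech->content =~ /document\.cookie\s*=\s*["']SESSIONID=([^"';]+)/;

if ( defined $sid ) {
    $mech->cookie_jar->set_cookie(
        0,                 # cookie-spec version
        'SESSIONID',       # name (placeholder)
        $sid,              # value parsed out of the page
        '/',               # path
        'www.domain.com',  # domain (placeholder)
        undef,             # port (any)
        0,                 # path_spec
        0,                 # secure
        3600,              # max age, seconds
        0,                 # discard
    );
}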

A web browser isn't doing too much more than the logic you have above. HTTP
is inherently stateless, and aside from mechanisms like the HTTP Referer
request header, cookies, and the URL itself, there's little else that the
browser can do to carry information from one request to the next. I believe
WWW::Mechanize supports all of this, so if the above doesn't help, more
information may be needed.
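
As a rough illustration of carrying that information forward, you
could reuse the same $mech (so the session cookie from the first
get() is sent again) and add a Referer header pointing back at the
page; whether the image server pays any attention to the Referer is
only a guess here, and the filename is a placeholder:

# Sketch only: same $mech keeps the session cookies from the first
# get(); the Referer header mimics a browser loading the page's image.
my $imageurl = 'http://www.domain.com/image/';   # the embedded black box

$mech->add_header( Referer => $url );            # sent on later requests
$mech->get( $imageurl, ':content_file' => 'session_image.jpg' );
$mech->delete_header( 'Referer' );               # stop sending it afterwards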

David
