Javascript Execution

on 17.12.2005 16:19:53 by webmaster

Hi,

I am working on an HTTP agent that harvests various products and prices from
a number of websites. The problem I run into is that some sites use
JavaScript.

After downloading a web page with my Perl script, what I want to do is:
1. Execute any existing JavaScript on the page.
2. Modify the page according to the JavaScript results.
3. Save the page to a local file.

In other words, I want to do exactly what a normal web browser does, except
that the output should be written to a file instead of a browser window.
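Before any JavaScript can run, step 1 requires locating the scripts in the downloaded page. Below is a minimal sketch of that part only, written in Python with the standard library rather than in Perl. The names `ScriptExtractor` and `find_scripts` are hypothetical. The sketch collects inline script bodies and external `src` URLs but executes nothing. Running the scripts still needs a JavaScript engine plus a DOM, as the replies below discuss.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect inline <script> bodies and external script URLs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []   # source text of inline scripts, in document order
        self.external_srcs = []    # src attributes of external scripts, in document order
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external_srcs.append(src)
            self._in_script = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:  # skip empty bodies such as <script src="..."></script>
                self.inline_scripts.append(body)
            self._in_script = False

    def handle_data(self, data):
        # Script content is treated as raw text by HTMLParser, so "<" inside it is safe.
        if self._in_script:
            self._buffer.append(data)


def find_scripts(html):
    """Return (inline_scripts, external_srcs) for an HTML string."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.inline_scripts, parser.external_srcs
```

Calling `find_scripts(html)` on a fetched page returns both lists in document order. Any external scripts would still have to be downloaded separately before they could be run.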

Does anyone know of a possible solution, or have any hints?

All suggestions kindly appreciated!

Regards,
Erik Axelkrans

Re: Javascript Execution

on 17.12.2005 18:05:20 by Andy

> I am working on a http agent which harvests various products and prices on
> a number of websites. The problem I run into is that some sites use
> javascript.

First, look at WWW::Mechanize to make most of your job easier.

Second, there is no Perl client that does JavaScript. See the
WWW::Mechanize FAQ:

http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod

xoxo,
Andy

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance

Re: Javascript Execution

on 17.12.2005 18:08:38 by derhoermi

* webmaster@awwwsol.com wrote:
>I am working on a http agent which harvests various products and prices on
>a number of websites. The problem I run into is that some sites use
>javascript.

There is Win32::IE::Mechanize.

>After downloading a web page from my perl script, what I want to do is:
>1. Execute any existing javascript on the page.
>2. Modify the page according to the javascript results.
>3. Save the page to a local file.

Note that the scripts might never terminate, so you can capture the DOM
at a specific point in time, but there is not necessarily a single, final result.
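The toy Python model below makes the non-termination point concrete. It is not a browser or a JavaScript engine. It uses a virtual-time timer queue in which one callback updates a stand-in page state once and another re-arms itself, the way `setInterval` does. Because the recurring timer keeps the queue non-empty, the harvester can only run up to a deadline it chooses and snapshot the state at that moment. `VirtualEventLoop`, `run_until`, the dict standing in for the DOM, and the 5000 ms cutoff are all illustrative assumptions.

```python
import heapq
import itertools


class VirtualEventLoop:
    """Toy timer queue using virtual milliseconds (hypothetical, for illustration)."""

    def __init__(self):
        self.now = 0
        self._queue = []               # heap of (due_time, seq, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def set_timeout(self, delay_ms, callback):
        heapq.heappush(self._queue, (self.now + delay_ms, next(self._seq), callback))

    def set_interval(self, delay_ms, callback):
        def tick():
            callback()
            self.set_timeout(delay_ms, tick)  # re-arm: the queue never drains
        self.set_timeout(delay_ms, tick)

    def run_until(self, deadline_ms):
        """Run callbacks due at or before the deadline; return True if work is still pending."""
        while self._queue and self._queue[0][0] <= deadline_ms:
            due, _, callback = heapq.heappop(self._queue)
            self.now = due
            callback()
        self.now = deadline_ms
        return bool(self._queue)


dom = {"price": "loading..."}   # stand-in for page state
ticks = []
loop = VirtualEventLoop()
loop.set_timeout(100, lambda: dom.update(price="19.99"))   # one-shot update
loop.set_interval(1000, lambda: ticks.append(loop.now))    # never stops on its own
still_pending = loop.run_until(5000)                       # harvesting deadline
snapshot = dict(dom)                                       # state "at a specific point"
```

Here `still_pending` stays true no matter how far the deadline is moved. Choosing the cutoff is therefore a policy decision for the harvester rather than something the page defines.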
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Re: Javascript Execution

on 17.12.2005 18:16:29 by hartct

There are also JavaScript engines available in C and Java
(SpiderMonkey and Rhino, respectively, available on mozilla.org). You
may be able to leverage those.

Chris

On 12/17/05, Bjoern Hoehrmann wrote:
> * webmaster@awwwsol.com wrote:
> >I am working on a http agent which harvests various products and prices on
> >a number of websites. The problem I run into is that some sites use
> >javascript.
>
> There is Win32::IE::Mechanize.
>
> >After downloading a web page from my perl script, what I want to do is:
> >1. Execute any existing javascript on the page.
> >2. Modify the page according to the javascript results.
> >3. Save the page to a local file.
>
> Note that the scripts might not terminate, so you might get the DOM
> at a specific point, but there is not necessarily a specific result.

Re: Javascript Execution

on 17.12.2005 18:48:36 by derhoermi

* Christopher Hart wrote:
>There are also JavaScript engines available in C and Java
>(SpiderMonkey and Rhino, respectively, available on mozilla.org). You
>may be able to leverage those.

Note, though, that the engines alone won't help much here; you'd also need
an implementation of the various APIs the sites use (e.g., the DOM APIs for
manipulating the document). Several such implementations exist that work
well with the two engines, but it might be difficult to reuse them.
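The Python analogy below illustrates why the engine alone is not enough. A Python function stands in for a page script that does `document.getElementById("price")`, and a hypothetical `MiniDocument` class built on `xml.etree.ElementTree` stands in for the DOM host object. The script works only when the host supplies that object. This sketches the general idea and does not show how SpiderMonkey or Rhino actually expose host objects.

```python
import xml.etree.ElementTree as ET


class MiniDocument:
    """Hypothetical, minimal stand-in for the DOM 'document' host object."""

    def __init__(self, markup):
        self.root = ET.fromstring(markup)

    def getElementById(self, element_id):
        # Walk the whole tree (root included) looking for a matching id attribute.
        for element in self.root.iter():
            if element.get("id") == element_id:
                return element
        return None

    def serialize(self):
        return ET.tostring(self.root, encoding="unicode")


def page_script(document):
    # Stands in for JS such as: document.getElementById("price").textContent = "19.99";
    document.getElementById("price").text = "19.99"


def runs_without_dom():
    """Return False if the script breaks when no document object is provided."""
    try:
        page_script(None)
    except AttributeError:
        return False
    return True
```

With a `MiniDocument` supplied, the script modifies the tree, and `serialize()` yields the page to be saved. With nothing supplied, it fails at the first DOM call, which is the gap Björn describes.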

Re: Javascript Execution

on 17.12.2005 20:10:04 by Andy

On Sat, Dec 17, 2005 at 12:16:29PM -0500, Christopher Hart (hartct@gmail.com) wrote:
> There are also JavaScript engines available in C and Java
> (SpiderMonkey and Rhino, respectively, available on mozilla.org). You
> may be able to leverage those.

I didn't know about SpiderMonkey. I'm going to have a look at it to see
if it will fit into WWW::Mechanize.

Re: Javascript Execution

on 17.12.2005 21:39:47 by webmaster

Thank you all for your responses. WWW::Mechanize is new to me and seems
very good.

What about Mozilla::Mechanize?

Does it interpret JavaScript?

Erik

Re: Javascript Execution

on 17.12.2005 22:39:27 by vskytta

On Sat, 2005-12-17 at 13:10 -0600, Andy Lester wrote:

> I didn't know about SpiderMonkey. I'm going to have a look at it to see
> if it will fit into WWW::Mechanize.

FYI, there's also http://search.cpan.org/dist/JavaScript-SpiderMonkey/

Re: Javascript Execution

on 18.12.2005 00:50:19 by jalotta

You might also look into using a scriptable browser. I think the
Mozilla organization has something along these lines.

On Dec 17, 2005, at 11:16 AM, Christopher Hart wrote:

> There are also JavaScript engines available in C and Java
> (SpiderMonkey and Rhino, respectively, available on mozilla.org). You
> may be able to leverage those.
>
> Chris
>
> On 12/17/05, Bjoern Hoehrmann wrote:
>> * webmaster@awwwsol.com wrote:
>>> I am working on a http agent which harvests various products and prices on
>>> a number of websites. The problem I run into is that some sites use
>>> javascript.
>>
>> There is Win32::IE::Mechanize.
>>
>>> After downloading a web page from my perl script, what I want to do is:
>>> 1. Execute any existing javascript on the page.
>>> 2. Modify the page according to the javascript results.
>>> 3. Save the page to a local file.
>>
>> Note that the scripts might not terminate, so you might get the DOM
>> at a specific point, but there is not necessarily a specific result.

Re: Javascript Execution

on 18.12.2005 01:59:40 by Andy

>
> What about: Mozilla::Mechanize
>
> Does it interpret javascript?

I don't know. I imagine the docs tell you.

Re: Javascript Execution

on 18.12.2005 15:45:22 by jjl

On Sat, 17 Dec 2005, Andy Lester wrote:

> On Sat, Dec 17, 2005 at 12:16:29PM -0500, Christopher Hart (hartct@gmail.com) wrote:
> > There are also JavaScript engines available in C and Java
> > (SpiderMonkey and Rhino, respectively, available on mozilla.org). You
> > may be able to leverage those.
>
> I didn't know about SpiderMonkey. I'm going to have a look at it to see
> if it will fit into WWW::Mechanize.

Hi Andy

As I've posted about here a few times before (search Gmane), I actually
did this with my Python port of WWW::Mechanize a few years back, using
SpiderMonkey. My implementation was a first-cut, half-baked thing, but I
did get it working for a few pages. I decided that was enough excitement
for me ;-) I know a few people used it for projects of their own and
improved on it a bit, though (e.g. one guy used it in a college project to
make JS-using pages accessible on non-JS devices by running a proxy server
that executed the JS; nice idea). The code is still available at
wwwsearch.sf.net.

I made use of the Perl wrapper for SpiderMonkey to write something very
similar for Python. IIRC, I had to extend it a little beyond what the Perl
wrapper provided.

I used an existing HTML DOM, but had to modify both the DOM and, of course,
the DOM builder (and add event handling and a browser object model). This is
where the work lies :-) If you intend to try this, and you're not
intimately familiar with the bizarre ways in which people can and do use