AJAX/Google Pages
on 16.06.2006 21:13:47 by rperry
I want to access Google Pages. Since there is no native API, I thought I
could use LWP. Can I, given that the site is AJAX-based?
Thanks!
RE: AJAX/Google Pages
on 16.06.2006 21:16:24 by Andrew.Johnson
There are Google APIs:
http://code.google.com/
-----Original Message-----
From: Ryan Perry [mailto:rperry@madisonip.com]
Sent: Fri 6/16/2006 3:13 PM
To: libwww@perl.org
Subject: AJAX/Google Pages
I want to access Google Pages. Since there is no native API, I thought I
could use LWP. Can I, given that the site is AJAX-based?
Thanks!
Re: AJAX/Google Pages
on 16.06.2006 21:17:35 by Andy
On Jun 16, 2006, at 2:13 PM, Ryan Perry wrote:
> I want to access Google Pages. Since there is no native API, I thought I
> could use LWP. Can I, given that the site is AJAX-based?
No.
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Re: AJAX/Google Pages
on 16.06.2006 21:23:59 by rperry
But not for Google Pages.
I'd like to run this by the list:
What about using libjs from Mozilla alongside LWP? Would that make it possible, or
is it going to be a headache?
Thanks for your advice!
Ryan
On 6/16/06, Andrew Johnson wrote:
> There are Google APIs:
>
> http://code.google.com/
Re: AJAX/Google Pages
on 16.06.2006 21:38:46 by peter.stevens
Yes and no. AJAX is pretty heavily into JavaScript, so you'll be spending
a lot of time with Tamper Data to figure out what data is being sent
back and forth. You can look at and reverse engineer the JavaScript.
Can it be done? Certainly. Is it easy? No. Is it practical? Well, that
depends on the complexity of the JavaScript environment and how much
time and energy you have for the problem.
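The reverse-engineering route above amounts to capturing the XMLHttpRequest traffic a page sends and then replaying it without a browser. The thread is about Perl/LWP, but the only code in this thread is Ruby, so here is a minimal Ruby sketch using the standard Net::HTTP library. The endpoint URL, form fields, and cookie value are hypothetical placeholders for whatever Tamper Data shows; the X-Requested-With header is a convention many AJAX toolkits add, and whether a particular server checks it is an assumption.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST that mimics a captured AJAX call.
# The URL, fields, and cookie passed in are hypothetical; substitute
# whatever Tamper Data shows the page actually sending.
def build_ajax_request(url, fields, cookie = nil)
  uri = URI.parse(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req.set_form_data(fields)                   # urlencoded body + Content-Type
  req['X-Requested-With'] = 'XMLHttpRequest'  # header many AJAX toolkits add
  req['Cookie'] = cookie if cookie            # session cookie copied from the browser
  [uri, req]
end

# Send a request built above and return the raw response body
# (often XML or a JavaScript fragment that still has to be parsed).
def send_ajax_request(uri, req)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req).body
  end
end

if __FILE__ == $0
  uri, req = build_ajax_request('http://example.com/ajax/save',
                                { 'page' => 'home', 'action' => 'save' },
                                'SID=abc123')
  puts req.method  # POST
  puts req.path    # /ajax/save
  puts req.body    # page=home&action=save
end
```

Keeping construction separate from sending makes it easy to compare the replayed request against the captured one before touching the server.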
We need a Javascript + DOM implementation on top of Mechanize!
Cheers,
Peter
Ryan Perry wrote:
> I want to access Google Pages. Since there is no native API, I thought I
> could use LWP. Can I, given that the site is AJAX-based?
>
> Thanks!
>
--
----------------------------------------------------------------------
Peter Stevens The Free Cell Phone Monitoring Service
www.MinuteWatcher.com
Re: AJAX/Google Pages
on 16.06.2006 21:40:39 by Andy
On Jun 16, 2006, at 2:38 PM, Peter Stevens wrote:
> We need a Javascript + DOM implementation on top of Mechanize!
Patches welcome, my brother. Patches welcome.
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Re: AJAX/Google Pages
on 16.06.2006 22:25:42 by peter.stevens
The impossible we do immediately. That which is truly difficult takes
a wee bit longer ;-)
Andy Lester wrote:
>
> On Jun 16, 2006, at 2:38 PM, Peter Stevens wrote:
>
>> We need a Javascript + DOM implementation on top of Mechanize!
>
>
> Patches welcome, my brother. Patches welcome.
>
> --
> Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
--
----------------------------------------------------------------------
Peter Stevens Phone: +41 43 535 8517
www.MinuteWatcher.com Fax: +41 44 544 8392
Re: AJAX/Google Pages
on 17.06.2006 01:47:27 by rperry
On Jun 16, 2006, at 2:38 PM, Peter Stevens wrote:
> Yes and no. AJAX is pretty heavily into JavaScript, so you'll be
> spending a lot of time with Tamper Data to figure out what data is
> being sent back and forth. You can look at and reverse engineer the
> JavaScript.
>
> Can it be done? Certainly. Is it easy? No. Is it practical? Well,
> that depends on the complexity of the JavaScript environment and
> how much time and energy you have for the problem.
>
> We need a Javascript + DOM implementation on top of Mechanize!
I strongly agree. It's the future and we need to be there. How
would one go about this? I'd be happy to contribute. Where do I start?
Thanks!
Re: AJAX/Google Pages
on 17.06.2006 03:35:01 by DavidS
Ryan Perry wrote:
>
> On Jun 16, 2006, at 2:38 PM, Peter Stevens wrote:
>
>> Yes and no. AJAX is pretty heavily into JavaScript, so you'll be
>> spending a lot of time with Tamper Data to figure out what data is
>> being sent back and forth. You can look at and reverse engineer the
>> JavaScript.
>>
>> Can it be done? Certainly. Is it easy? No. Is it practical? Well, that
>> depends on the complexity of the JavaScript environment and how much
>> time and energy you have for the problem.
>>
>> We need a Javascript + DOM implementation on top of Mechanize!
>
> I strongly agree. It's the future and we need to be there. How would
> one go about this? I'd be happy to contribute. Where do I start?
>
> Thanks!
Well, this may get me flamed, but I feel compelled to comment because
my day-to-day job is writing web-scraping code for some extremely
complex web sites, many of which use a lot of AJAX and DHTML (JavaScript
and *shudder* VBS).
My first web scraper was written in Perl/Mechanize, and we managed to
get past the DHTML portions by decoding the JavaScript and returning
responses that took into account the JavaScript that *would* have run.
It ran, but took over 3 months to write, and THAT site was well written,
with nice IDs and no AJAX. I still had to assume a few page locations
and do blind posts because of complex JavaScript that did page redirects
to pages whose JavaScript did MORE page redirects.
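Redirect chains like these can sometimes be followed without a JavaScript engine when the target URL appears as a string literal in the page. Below is a minimal Ruby sketch of that heuristic (Ruby because it is the thread's own code language). The regular expressions are an assumption about what such redirects typically look like; any URL assembled at run time still needs a real JavaScript engine or a real browser.

```ruby
require 'uri'

# Heuristic patterns for literal JavaScript redirects, e.g.
#   window.location = '/next.html';
#   location.href = "step2.asp";
#   location.replace('/done');
# These are assumptions; computed URLs will not be caught.
JS_REDIRECT_PATTERNS = [
  /(?:window\.|document\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/,
  /location\.(?:replace|assign)\(\s*(["'])(.*?)\1\s*\)/
].freeze

# Return the absolute URL the page would redirect to, or nil if no
# literal redirect is found. Relative targets are resolved against
# the URL of the page they appeared on.
def js_redirect_target(html, page_url)
  JS_REDIRECT_PATTERNS.each do |pattern|
    m = pattern.match(html)
    return URI.join(page_url, m[2]).to_s if m
  end
  nil
end

if __FILE__ == $0
  html = "<script>location.href = 'step2.asp';</script>"
  puts js_redirect_target(html, 'http://example.com/app/start.asp')
  # http://example.com/app/step2.asp
end
```

A scraper would fetch each target in turn until the function returns nil, which replaces some of the "assume a page location" guesswork with something checkable.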
After researching upcoming projects that involved scraping sites with AJAX
and more complex JavaScript, I found a tool called Watir
(http://openqa.org/watir/). Watir is a library written in Ruby (which
we were already starting to use), and it gets around the whole
JavaScript/AJAX issue by automating Internet Explorer, so all
scripts actually run. The downside is that it only runs under Windows and
only supports IE; the upside has been that our scrapers have been much
easier to write. The last one I did took about two weeks, and it does
a lot more than our first one.
One thing that still gave us problems until recently was pop-up
windows. While Watir had a rather crude way of clicking past a
pop-up window if you knew it was coming, modal dialogs were still hard
to automate: they can contain any valid HTML, but the IE "click"
would block until the dialog closed, and there was no way to attach to
the modal dialog and get access to its DOM if you did the click in
another thread/process. Finally, someone put together an intricate
method that attaches to a modal dialog window using the current IE
window's HWND and then links the pointers together to get access to
the modal dialog's DOM.
Now, I can automate a modal dialog window as easily as a normal browser
window. Here's the code to fire up IE to a page, click a button which
brings up a modal dialog, attach to that dialog, fill a text box on the
dialog, click on the dialog close box, and retrieve the entered value
from the original window. All in the following few lines of Ruby/Watir
code:
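require 'watir'   # Watir drives Internet Explorer, so this runs on Windows only
include Watir
ie = IE.new                                        # start an IE instance
ie.goto('http://SITE/modal_dialog_launcher.html')  # SITE is a placeholder host
# click_no_wait returns immediately; a plain click would block until the dialog closed
ie.button(:value, 'Launch Dialog').click_no_wait
# attach to the modal dialog and work with its DOM like a normal page
ie.modal_dialog.text_field(:name, 'modal_text').set('hello')
ie.modal_dialog.button(:value, 'Close').click
# read the value the dialog passed back to the original window
modal_text = ie.text_field(:name, 'modaloutput').value
ie.close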
That is code I just executed in Ruby's interactive shell (IRB) on one of
the HTML files in the Watir unit test suite.
Now, to get that functionality you need to check out the latest
developer version from SVN, as I only recently added the modal_dialog
functionality, but it does work. (I just put together the
pieces I found in a number of places, so I can't take much credit,
but I did submit it to the Watir project.)
There is a project called FireWatir that aims to use Firefox
(on any OS that Firefox runs on), but it's still lagging a bit
behind, and from what I've heard its performance is still very poor. But
there is hope.
For now, I'd recommend checking out Watir for your web automation
projects, if you can get away with using IE under Windows.
David Schmidt
davids@tower-mt.com
Re: AJAX/Google Pages
on 17.06.2006 05:38:56 by Andy
On Jun 16, 2006, at 6:47 PM, Ryan Perry wrote:
>> We need a Javascript + DOM implementation on top of Mechanize!
>
> I strongly agree. It's the future and we need to be there. How
> would one go about this? I'd be happy to contribute. Where do I
> start?
1) Write a JavaScript engine.
2) Write a DOM doodad.
3) Integrate with WWW::Mechanize.
4) Send me the patch.
5) I distribute it.
6) The world loves you.
xoxo,
Andy
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Re: AJAX/Google Pages
on 17.06.2006 21:49:53 by M
On Fri, 16 Jun 2006, Andy Lester wrote:
> 1) Write a JavaScript engine.
No need to re-invent [1] and [2].
[1] http://search.cpan.org/perldoc?JavaScript%3A%3ASpiderMonkey
[2] http://www.mozilla.org/js/spidermonkey/
> 2) Write a DOM doodad.
That's the big one.
And don't forget that this is just *another* DOM, on top of IE's,
Firefox's, Safari's, and Whatnot's DOM. Make it pluggable.
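The "make it pluggable" point can be illustrated with a small registry of DOM backends, each declaring how it differs from the others, so a scraper can pick the behavior of the browser a site was written for. This is a hypothetical Ruby sketch: DomBackend, IEDom, GeckoDom, register, lookup, and read are illustrative names, not part of WWW::Mechanize or any existing module. The single difference modeled is a real one from this era: IE exposes an element's text as innerText, while Gecko (Firefox) uses textContent.

```ruby
# Hypothetical pluggable-DOM registry; none of these names come from an
# existing library. Each backend states which DOM property holds text.
class DomBackend
  @registry = {}

  class << self
    attr_reader :registry

    # Called from a subclass body to make it selectable by name.
    def register(name)
      DomBackend.registry[name] = self
    end

    # Return a new instance of the backend registered under +name+.
    def lookup(name)
      klass = DomBackend.registry.fetch(name) do
        raise ArgumentError, "no DOM backend registered as #{name.inspect}"
      end
      klass.new
    end
  end

  # Name of the property that holds an element's text in this DOM.
  def text_property
    raise NotImplementedError, "#{self.class} must define text_property"
  end

  # Simulate a script reading +prop+ from an element containing +text+;
  # properties this DOM lacks come back as nil (JavaScript's undefined).
  def read(prop, text)
    prop == text_property ? text : nil
  end
end

# Internet Explorer of this era: innerText, no textContent.
class IEDom < DomBackend
  register :ie

  def text_property
    'innerText'
  end
end

# Gecko (Firefox) of this era: textContent, no innerText.
class GeckoDom < DomBackend
  register :gecko

  def text_property
    'textContent'
  end
end

if __FILE__ == $0
  p DomBackend.lookup(:ie).read('innerText', 'Hello')     # "Hello"
  p DomBackend.lookup(:gecko).read('innerText', 'Hello')  # nil
end
```

With this shape, supporting another browser's quirks means adding one subclass rather than changing the scraper that consumes the DOM.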
Thank you, the world will indeed love you.
-- Mike
Mike Schilli
m@perlmeister.com