Automated web browsing

on 17.01.2008 22:52:20 by mr_marcin

Hi

Does anybody have an idea how to input some text into an input box on
one page, then press a button on that page that will load another
page, and finally read the response? Suppose I want to write a price
comparison engine, where I would like to parse shops' websites for
prices each time a user asks.

I have found a similar feature in the Symfony framework, called sfBrowser
(or sfTestBrowser). These are made for automated functional testing,
but they should provide the functionality I am requesting.

The question is: will this be efficient enough? Maybe there are other
ways to achieve this? Of course I can always try to do it more
manually - look for some pattern in the URL (search is usually done via
GET) and parse the output HTML.

Thanks for help
Marcin

Re: Automated web browsing

on 17.01.2008 23:09:17 by Manuel Lemos

on 01/17/2008 07:52 PM mr_marcin said the following:
> Hi
>
> Does anybody have an idea how to input some text into an input box on
> one page, then press a button on that page that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops' websites for
> prices each time a user asks.
>
> I have found a similar feature in the Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but they should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to do it more
> manually - look for some pattern in the URL (search is usually done via
> GET) and parse the output HTML.

You may want to try this HTTP client class. Basically it acts like a
browser: accessing pages, submitting forms, collecting cookies, handling
redirection, etc., which seems to be what you need to retrieve the pages
with the prices you want to grab.

http://www.phpclasses.org/httpclient
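
For example, submitting a shop's search form and collecting the result
page could look roughly like this. This is only a sketch: the method
names follow the class's documented API but should be checked against
the current release, and the URL and form field names are made-up
placeholders:

<?php
require('http.php');

$http = new http_class;
$http->user_agent = 'Price Comparison Robot';

// Hypothetical shop search form; adapt the URL and field names per site.
$error = $http->GetRequestArguments('http://shop.example.com/search.php',
                                    $arguments);
$arguments['RequestMethod'] = 'POST';
$arguments['PostValues'] = array('query' => 'usb drive');

$error = $http->Open($arguments);
if ($error == '') $error = $http->SendRequest($arguments);
if ($error == '') $error = $http->ReadReplyHeaders($headers);
$page = '';
for (;;) {
    $error = $http->ReadReplyBody($body, 8192);
    if ($error != '' || strlen($body) == 0) break;
    $page .= $body;    // accumulate the response for parsing
}
$http->Close();
?>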


--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 18.01.2008 01:15:50 by Jerry Stuckle

mr_marcin wrote:
> Hi
>
> Does anybody have an idea how to input some text into an input box on
> one page, then press a button on that page that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops' websites for
> prices each time a user asks.
>
> I have found a similar feature in the Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but they should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to do it more
> manually - look for some pattern in the URL (search is usually done via
> GET) and parse the output HTML.
>
> Thanks for help
> Marcin
>

cURL will allow you to get or post to pages, and will return the data.
I much prefer it over the HTTPClient class. It's more flexible.
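
For example, posting a search form with cURL takes only a few lines (a
sketch - the URL and field name here are hypothetical):

<?php
// Hypothetical shop search form; adapt the URL and field names per site.
$ch = curl_init('http://shop.example.com/search.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,
            http_build_query(array('query' => 'usb drive')));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects like a browser
curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // enable in-memory cookie handling
$html = curl_exec($ch);
if ($html === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);
// $html now holds the result page, ready for parsing.
?>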

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Automated web browsing

on 18.01.2008 10:42:32 by Marlin Forbes

mr_marcin wrote:
> Hi
>
> Does anybody have an idea how to input some text into an input box on
> one page, then press a button on that page that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops' websites for
> prices each time a user asks.

Hi there,

SimpleTest includes a class called SimpleBrowser, which does what
you want, with a very intuitive API. It's not too fast, though...

SimpleTest: http://www.lastcraft.com/simple_test.php
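
A typical session looks something like this (a sketch - the method names
are from the SimpleTest documentation, while the URL and field names are
made up):

<?php
require_once('simpletest/browser.php');

$browser = new SimpleBrowser();
$browser->get('http://shop.example.com/'); // hypothetical shop URL
$browser->setField('query', 'usb drive');  // fill the search box by name
$browser->clickSubmit('Search');           // press the button, loads the next page
$html = $browser->getContent();            // the response, ready for parsing
?>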

Or, you can interactively set up browsing sessions with the Selenium IDE
and then use the PHP client for the Selenium Remote Control to run them...

Selenium IDE: http://www.openqa.org/selenium-ide/
Selenium RC: http://www.openqa.org/selenium-rc/
PHP Client for Selenium: http://pear.php.net/package/Testing_Selenium

Misc:
http://blog.thinkphp.de/archives/133-Practical-Testing-PHP-Applications-with-Selenium.html
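
A recorded session can then be replayed from PHP along these lines (a
sketch: it assumes a Selenium RC server running on localhost:4444, and
the URL and locators are hypothetical):

<?php
require_once 'Testing/Selenium.php';

$selenium = new Testing_Selenium('*firefox', 'http://shop.example.com/');
$selenium->start();
$selenium->open('/');
$selenium->type('query', 'usb drive');       // fill the search box
$selenium->click("//input[@type='submit']"); // press the search button
$selenium->waitForPageToLoad(30000);         // wait up to 30s for the result page
$html = $selenium->getHtmlSource();          // full HTML of the response
$selenium->stop();
?>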


Regards,
Marlin Forbes
Freelance Developer
Data Shaman
datashaman.com
+27 (0)82 501-6647

Re: Automated web browsing

on 18.01.2008 13:22:29 by mr_marcin

> Or, you can interactively set up browsing sessions with the Selenium IDE
> and then use the PHP client for the Selenium Remote Control to run them...
>
> Selenium IDE: http://www.openqa.org/selenium-ide/
> Selenium RC: http://www.openqa.org/selenium-rc/
> PHP Client for Selenium: http://pear.php.net/package/Testing_Selenium

This sounds like quite an easy-to-use package, but will it be
efficient enough? I will check all the options next week.

Re: Automated web browsing

on 18.01.2008 13:23:36 by mr_marcin

> cURL will allow you to get or post to pages, and will return the data.
> I much prefer it over the HTTPClient class. It's more flexible.
>

I guess this approach requires some manual work, but you are right -
that's the most flexible and probably the most effective way.
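
Something like this is probably all it takes (a sketch - the URL pattern
and the price markup are assumptions that would have to be adapted for
each shop, and file_get_contents() needs allow_url_fopen enabled):

<?php
$query = urlencode('usb drive');
$html  = file_get_contents("http://shop.example.com/search.php?q=$query");

// The class name "price" is an assumption about the shop's markup.
if (preg_match_all('/<span class="price">([0-9.,]+)<\/span>/', $html, $m)) {
    print_r($m[1]);   // prices found on the result page
}
?>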

Re: Automated web browsing

on 18.01.2008 14:09:50 by ng4rrjanbiah

On Jan 18, 2:52 am, mr_marcin wrote:
> Hi
>
> Does anybody have an idea how to input some text into an input box on
> one page, then press a button on that page that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops' websites for
> prices each time a user asks.
>
> I have found a similar feature in the Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but they should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to do it more
> manually - look for some pattern in the URL (search is usually done via
> GET) and parse the output HTML.

1. If you're looking for client tools: http://www.iopus.com/imacros/firefox/
2. Web scraping with cURL or the HTTPClient class
3. Look for Web services (SOAP, XML, etc.)

--

Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Re: Automated web browsing

on 18.01.2008 17:40:01 by Manuel Lemos

Hello,

on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>> Does anybody have an idea how to input some text into an input box on
>> one page, then press a button on that page that will load another
>> page, and finally read the response? Suppose I want to write a price
>> comparison engine, where I would like to parse shops' websites for
>> prices each time a user asks.
>>
>> I have found a similar feature in the Symfony framework, called sfBrowser
>> (or sfTestBrowser). These are made for automated functional testing,
>> but they should provide the functionality I am requesting.
>>
>> The question is: will this be efficient enough? Maybe there are other
>> ways to achieve this? Of course I can always try to do it more
>> manually - look for some pattern in the URL (search is usually done via
>> GET) and parse the output HTML.
>>
>> Thanks for help
>> Marcin
>>
>
> cURL will allow you to get or post to pages, and will return the data. I
> much prefer it over the HTTPClient class. It's more flexible.

I wonder which HTTP client you are talking about. The HTTP client I
mentioned wraps around Curl or socket functions depending on which is
more convenient to use in each PHP setup. This is the HTTP client class
I meant:

http://www.phpclasses.org/httpclient

As for Curl being flexible, I wonder what you are talking about.

Personally I find it very odd that you cannot read retrieved pages with
Curl in small chunks at a time without having to use callbacks. This is
bad because it makes it very difficult to retrieve and process large pages
without using external files or exceeding the PHP memory limits.

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 19.01.2008 02:46:21 by Jerry Stuckle

Manuel Lemos wrote:
> Hello,
>
> on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>>> Does anybody have an idea how to input some text into an input box on
>>> one page, then press a button on that page that will load another
>>> page, and finally read the response? Suppose I want to write a price
>>> comparison engine, where I would like to parse shops' websites for
>>> prices each time a user asks.
>>>
>>> I have found a similar feature in the Symfony framework, called sfBrowser
>>> (or sfTestBrowser). These are made for automated functional testing,
>>> but they should provide the functionality I am requesting.
>>>
>>> The question is: will this be efficient enough? Maybe there are other
>>> ways to achieve this? Of course I can always try to do it more
>>> manually - look for some pattern in the URL (search is usually done via
>>> GET) and parse the output HTML.
>>>
>>> Thanks for help
>>> Marcin
>>>
>> cURL will allow you to get or post to pages, and will return the data. I
>> much prefer it over the HTTPClient class. It's more flexible.
>
> I wonder which HTTP client you are talking about. The HTTP client I
> mentioned wraps around Curl or socket functions depending on which is
> more convenient to use in each PHP setup. This is the HTTP client class
> I meant:
>
> http://www.phpclasses.org/httpclient
>

The same one.

> As for Curl being flexible, I wonder what you are talking about.
>

I can do virtually anything with it that I can do with a browser, with
the exception of client side scripting. Also much less overhead than
the httpclient class.

> Personally I find it very odd that you cannot read retrieved pages with
> Curl in small chunks at a time without having to use callbacks. This is
> bad because it makes it very difficult to retrieve and process large pages
> without using external files or exceeding the PHP memory limits.
>

So? I never needed to. First of all, I have no need to retrieve huge
pages. The largest I've ever downloaded (a table with lots of info) was
a little over 3MB and Curl and PHP handled it just fine.

But if the text were split, you need to do additional processing to
handle splits at inconvenient locations. Much easier to add everything
to a temporary file and read it back in the way I need to use it.

But that's one of the advantages of cURL - it gives me the option of
doing the callbacks or not.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Automated web browsing

on 19.01.2008 04:17:37 by Manuel Lemos

Hello,

on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>>> Does anybody have an idea how to input some text into an input box on
>>>> one page, then press a button on that page that will load another
>>>> page, and finally read the response? Suppose I want to write a price
>>>> comparison engine, where I would like to parse shops' websites for
>>>> prices each time a user asks.
>>>>
>>>> I have found a similar feature in the Symfony framework, called sfBrowser
>>>> (or sfTestBrowser). These are made for automated functional testing,
>>>> but they should provide the functionality I am requesting.
>>>>
>>>> The question is: will this be efficient enough? Maybe there are other
>>>> ways to achieve this? Of course I can always try to do it more
>>>> manually - look for some pattern in the URL (search is usually done via
>>>> GET) and parse the output HTML.
>>>>
>>>> Thanks for help
>>>> Marcin
>>>>
>>> cURL will allow you to get or post to pages, and will return the data. I
>>> much prefer it over the HTTPClient class. It's more flexible.
>>
>> I wonder which HTTP client you are talking about. The HTTP client I
>> mentioned wraps around Curl or socket functions depending on which is
>> more convenient to use in each PHP setup. This is the HTTP client class
>> I meant:
>>
>> http://www.phpclasses.org/httpclient
>>
>
> The same one.
>
>> As for Curl being flexible, I wonder what you are talking about.
>>
>
> I can do virtually anything with it that I can do with a browser, with
> the exception of client side scripting. Also much less overhead than
> the httpclient class.

In practice the real overhead is in the network access.

Anyway, as I mentioned above the HTTP client class uses curl library
functions for SSL if you are running a version older than PHP 4.3.0.
From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
functions.

If your hosting company does not have Curl enabled, at least with the
HTTP client class you are not stuck. I think this is more flexible than
relying on curl library availability.


>> Personally I find it very odd that you cannot read retrieved pages with
>> Curl in small chunks at a time without having to use callbacks. This is
>> bad because it makes it very difficult to retrieve and process large pages
>> without using external files or exceeding the PHP memory limits.
>>
>
> So? I never needed to. First of all, I have no need to retrieve huge
> pages. The largest I've ever downloaded (a table with lots of info) was
> a little over 3MB and Curl and PHP handled it just fine.

That is because 3MB is below the default PHP 8MB memory limit. You are talking
specifically about your needs. People with higher needs will not be able to
handle it with Curl functions.


> But if the text were split, you need to do additional processing to
> handle splits at inconvenient locations. Much easier to add everything
> to a temporary file and read it back in the way I need to use it.
>
> But that's one of the advantages of cURL - it gives me the option of
> doing the callbacks or not.

With the HTTP client class you do not need callbacks. You just need to
read the response in small chunks and process them on demand.
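
For example (a sketch using the class's read loop; the method names are
those of its documented API, and the URL is hypothetical):

<?php
require('http.php');

$http = new http_class;
$error = $http->GetRequestArguments('http://www.example.com/big-page.html',
                                    $arguments);
$error = $http->Open($arguments);
if ($error == '') $error = $http->SendRequest($arguments);
if ($error == '') $error = $http->ReadReplyHeaders($headers);
for (;;) {
    $error = $http->ReadReplyBody($chunk, 8192); // at most 8KB in memory at once
    if ($error != '' || strlen($chunk) == 0) break;
    // process $chunk here, e.g. append it to a file or feed it to a parser
}
$http->Close();
?>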

The ability to stream data in limited-size chunks is no less important
a feature. For instance, Cesar Rodas used the HTTP client class to
write a cool stream wrapper class that lets you store and retrieve files
of any size in the Amazon S3 service:

http://www.phpclasses.org/gs3

Same thing for SVN client stream wrapper:

http://www.phpclasses.org/svnclient

Another interesting use of the stream wrapper streaming capabilities is
the Print IPP class. It lets you print any document by sending it
directly to a networked printer. IPP is a protocol that works on top of
HTTP. IPP is the protocol used by CUPS (printing system for Linux and
Unix systems). Nowadays there are many networked printers (especially
the wireless ones) that have IPP support built-in.

http://www.phpclasses.org/printipp

Anyway, streaming is just one feature through which the HTTP client
class provides flexibility.

The HTTP client was not developed to compete with the curl functions,
but rather to provide a solution that complements the curl HTTP access
or even replaces it when it is not enabled.

If you browse the HTTP client class forum, you may find people that had
difficulties when they tried the curl library functions but succeeded
with the HTTP client class.

http://www.phpclasses.org/discuss/package/3/

Maybe it is not your case now, but maybe one day you will stumble on one
of those difficulties that prevent you from using curl functions. In
that case feel free to use the HTTP client class. ;-)

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 19.01.2008 04:28:19 by Jerry Stuckle

Manuel Lemos wrote:
> Hello,
>
> on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>>>> Does anybody have an idea how to input some text into an input box on
>>>>> one page, then press a button on that page that will load another
>>>>> page, and finally read the response? Suppose I want to write a price
>>>>> comparison engine, where I would like to parse shops' websites for
>>>>> prices each time a user asks.
>>>>>
>>>>> I have found a similar feature in the Symfony framework, called sfBrowser
>>>>> (or sfTestBrowser). These are made for automated functional testing,
>>>>> but they should provide the functionality I am requesting.
>>>>>
>>>>> The question is: will this be efficient enough? Maybe there are other
>>>>> ways to achieve this? Of course I can always try to do it more
>>>>> manually - look for some pattern in the URL (search is usually done via
>>>>> GET) and parse the output HTML.
>>>>>
>>>>> Thanks for help
>>>>> Marcin
>>>>>
>>>> cURL will allow you to get or post to pages, and will return the data. I
>>>> much prefer it over the HTTPClient class. It's more flexible.
>>> I wonder which HTTP client you are talking about. The HTTP client I
>>> mentioned wraps around Curl or socket functions depending on which is
>>> more convenient to use in each PHP setup. This is the HTTP client class
>>> I meant:
>>>
>>> http://www.phpclasses.org/httpclient
>>>
>> The same one.
>>
>>> As for Curl being flexible, I wonder what you are talking about.
>>>
>> I can do virtually anything with it that I can do with a browser, with
>> the exception of client side scripting. Also much less overhead than
>> the httpclient class.
>
> In practice the real overhead is in the network access.
>
> Anyway, as I mentioned above the HTTP client class uses curl library
> functions for SSL if you are running a version older than PHP 4.3.0.
> From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
> functions.
>

Which means it has more overhead than using cURL directly. It's another
layer on top of cURL.

> If your hosting company does not have Curl enabled, at least with the
> HTTP client class you are not stuck. I think this is more flexible than
> relying on curl library availability.
>

I only use VPS's and dedicated servers. But even when I was using
shared hosting, I was able to find hosting companies who either had cURL
enabled or would do it for you.

OTOH, I've found more who won't allow fsockopen() than cURL.

But either way, if your hosting company won't provide what you need,
there's an easy answer.

>
>>> Personally I find it very odd that you cannot read retrieved pages with
>>> Curl in small chunks at a time without having to use callbacks. This is
>>> bad because it makes it very difficult to retrieve and process large pages
>>> without using external files or exceeding the PHP memory limits.
>>>
>> So? I never needed to. First of all, I have no need to retrieve huge
>> pages. The largest I've ever downloaded (a table with lots of info) was
>> a little over 3MB and Curl and PHP handled it just fine.
>
> That is because 3MB is below the default PHP 8MB memory limit. You are talking
> specifically about your needs. People with higher needs will not be able to
> handle it with Curl functions.
>

Exactly how many pages do you know of that are larger than 8MB? And BTW -
8MB is only the default. On some servers where I have customers with
needs for large amounts of data, I raise it as high as 128 MB.

But again - you can do it with even 1MB by providing the appropriate
callback functions. And it's not hard at all to do.
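
For example (a sketch; the URL is hypothetical):

<?php
// cURL hands the response to write_chunk() in pieces, so only one
// chunk is ever held in memory at a time.
function write_chunk($ch, $data)
{
    // Process or discard $data here; a real handler might buffer across
    // chunk boundaries before matching patterns.
    file_put_contents('page.tmp', $data, FILE_APPEND);
    return strlen($data);   // must return the number of bytes handled
}

$ch = curl_init('http://www.example.com/very-large-page.html');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_chunk');
curl_exec($ch);
curl_close($ch);
?>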

>
>> But if the text were split, you need to do additional processing to
>> handle splits at inconvenient locations. Much easier to add everything
>> to a temporary file and read it back in the way I need to use it.
>>
>> But that's one of the advantages of cURL - it gives me the option of
>> doing the callbacks or not.
>
> With the HTTP client class you do not need callbacks. You just need to
> read the response in small chunks and process them on demand.
>

So - what's the problem with callbacks? They're quick and easy. And
they give you much more control over what's going on.

For instance - you may not be interested in everything. It's very easy
for the callback to throw away what you don't want. You can't do that
with the HTTP client class.

> The ability to stream data in limited-size chunks is no less important
> a feature. For instance, Cesar Rodas used the HTTP client class to
> write a cool stream wrapper class that lets you store and retrieve files
> of any size in the Amazon S3 service:
>
> http://www.phpclasses.org/gs3
>
> Same thing for SVN client stream wrapper:
>
> http://www.phpclasses.org/svnclient
>
> Another interesting use of the stream wrapper streaming capabilities is
> the Print IPP class. It lets you print any document by sending it
> directly to a networked printer. IPP is a protocol that works on top of
> HTTP. IPP is the protocol used by CUPS (printing system for Linux and
> Unix systems). Nowadays there are many networked printers (especially
> the wireless ones) that have IPP support built-in.
>
> http://www.phpclasses.org/printipp
>

Which has absolutely nothing to do with this conversation. Please limit
your comments to the topic at hand.


> Anyway, streaming is just one feature through which the HTTP client
> class provides flexibility.
>

No problem with that. But it is still less flexible than cURL.

> The HTTP client was not developed to compete with the curl functions,
> but rather to provide a solution that complements the curl HTTP access
> or even replaces it when it is not enabled.
>

Fine. No problem. My only comment was that I prefer cURL because it is
more flexible. You challenged that. Now you're arguing completely
different topics to try to "prove" that the httpclient class is "better".

> If you browse the HTTP client class forum, you may find people that had
> difficulties when they tried the curl library functions but succeeded
> with the HTTP client class.
>
> http://www.phpclasses.org/discuss/package/3/
>

Sure. And there are people who have had problems with the httpclient
class and found the cURL functions work. That proves nothing.

> Maybe it is not your case now, but maybe one day you will stumble on one
> of those difficulties that prevent you from using curl functions. In
> that case feel free to use the HTTP client class. ;-)
>

Nope. I've tried the httpclient class. I find it too limiting with
excessive overhead for my needs.

But as I said above - you tell me they don't compete. But then you keep
trying to tell me how the httpclient class is "better". Which is it?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Automated web browsing

on 19.01.2008 21:32:03 by Manuel Lemos

Hello,

on 01/19/2008 01:28 AM Jerry Stuckle said the following:
>>>> As for Curl being flexible, I wonder what you are talking about.
>>>>
>>> I can do virtually anything with it that I can do with a browser, with
>>> the exception of client side scripting. Also much less overhead than
>>> the httpclient class.
>>
>> In practice the real overhead is in the network access.
>>
>> Anyway, as I mentioned above the HTTP client class uses curl library
>> functions for SSL if you are running a version older than PHP 4.3.0.
>> From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
>> functions.
>>
>
> Which means it has more overhead than using cURL directly. It's another
> layer on top of cURL.

If you mean the PHP code execution overhead of the class, that is negligible.
What are a few microseconds of PHP code execution when you have to wait
seconds for data to be sent to or received from remote Web servers?


>> If your hosting company does not have Curl enabled, at least with the
>> HTTP client class you are not stuck. I think this is more flexible than
>> relying on curl library availability.
>>
>
> I only use VPS's and dedicated servers. But even when I was using
> shared hosting, I was able to find hosting companies who either had cURL
> enabled or would do it for you.

I found users complaining in the HTTP client class forum that they could
not use the curl library functions in their PHP setup.


> OTOH, I've found more who won't allow fsockopen() than cURL.

That is another aspect in which the HTTP client class is more
flexible. If curl support is missing, the class will use fsockopen, and
vice versa.



> But either way, if your hosting company won't provide what you need,
> there's an easy answer.

Many developers do not have a choice of hosting company because it is up
to their clients to decide and often they do not want to move.



>>>> Personally I find it very odd that you cannot read retrieved pages with
>>>> Curl in small chunks at a time without having to use callbacks. This is
>>>> bad because it makes it very difficult to retrieve and process large pages
>>>> without using external files or exceeding the PHP memory limits.
>>>>
>>> So? I never needed to. First of all, I have no need to retrieve huge
>>> pages. The largest I've ever downloaded (a table with lots of info) was
>>> a little over 3MB and Curl and PHP handled it just fine.
>>
>> That is because 3MB is below the default PHP 8MB memory limit. You are talking
>> specifically about your needs. People with higher needs will not be able to
>> handle it with Curl functions.
>>
>
> Exactly how many pages do you know of that are larger than 8MB? And BTW -

It is very easy to find people that need to download or upload files via
HTTP that are larger than 8MB.


> 8MB is only the default. On some servers where I have customers with
> needs for large amounts of data, I raise it as high as 128 MB.

Many shared hosting clients cannot change php.ini options.



> But again - you can do it with even 1MB by providing the appropriate
> callback functions. And it's not hard at all to do.

I wonder if you have really tried using callbacks to stream data to
send to or receive from the HTTP server.

Last time I tried, it seemed your callbacks have to manually craft
HTTP requests and interpret raw HTTP responses, basically implementing an
HTTP client inside the callback functions. It seemed that you would have
to know the whole HTTP protocol to sort the data you need to send or
receive.

Basically that is what the HTTP client class does without requiring that
you learn and implement the HTTP protocol by hand.


>>> But if the text were split, you need to do additional processing to
>>> handle splits at inconvenient locations. Much easier to add everything
>>> to a temporary file and read it back in the way I need to use it.
>>>
>>> But that's one of the advantages of cURL - it gives me the option of
>>> doing the callbacks or not.
>>
>> With the HTTP client class you do not need callbacks. You just need to
>> read the response in small chunks and process them on demand.
>>
>
> So - what's the problem with callbacks? They're quick and easy. And
> they give you much more control over what's going on.

Other than the complexity of dealing with raw HTTP data, the main
problem that I see is that callbacks do not pass control to your
application. You need to do something with the data and return control
to the curl library.

For instance, if you want to download a large data block retrieved with
one HTTP request, and then upload it to another server with another HTTP
request, it does not seem you can do it by passing small chunks of data
using curl callbacks.


> For instance - you may not be interested in everything. It's very easy
> for the callback to throw away what you don't want. You can't do that
> with the HTTP client class.

I do not want to deal with raw HTTP protocol data. I developed the class
precisely for it to do that for me.

If callbacks were useful to me, I would have added support in the class
to invoke arbitrary callback functions.


>> The ability to stream data in limited-size chunks is no less important
>> a feature. For instance, Cesar Rodas used the HTTP client class to
>> write a cool stream wrapper class that lets you store and retrieve files
>> of any size in the Amazon S3 service:
>>
>> http://www.phpclasses.org/gs3
>>
>> Same thing for SVN client stream wrapper:
>>
>> http://www.phpclasses.org/svnclient
>>
>> Another interesting use of the stream wrapper streaming capabilities is
>> the Print IPP class. It lets you print any document by sending it
>> directly to a networked printer. IPP is a protocol that works on top of
>> HTTP. IPP is the protocol used by CUPS (printing system for Linux and
>> Unix systems). Nowadays there are many networked printers (especially
>> the wireless ones) that have IPP support built-in.
>>
>> http://www.phpclasses.org/printipp
>>
>
> Which has absolutely nothing to do with this conversation. Please limit
> your comments to the topic at hand.

On the contrary, this has everything to do with what I am explaining to you.

For instance, with the classes above that use the HTTP client class
streaming capabilities, you can copy large files without exceeding your PHP
memory limits using just this:

copy('svn://server/file', 's3://bucket/file');


>> The HTTP client was not developed to compete with the curl functions,
>> but rather to provide a solution that complements the curl HTTP access
>> or even replaces it when it is not enabled.
>>
>
> Fine. No problem. My only comment was that I prefer cURL because it is
> more flexible. You challenged that. Now you're arguing completely
> different topics to try to "prove" that the httpclient class is "better".

Jerry, relax. There seems to be a misunderstanding here. I did not
challenge you. I was just curious to know in which relevant respects you
find curl more flexible to use than the HTTP client class.

The class has evolved according to the needs of users who found
limitations in it and told me about them. So I wanted to understand what
you are talking about.

So far you keep telling me that curl is more flexible, but I have
yet to see where the flexibility is.


>> If you browse the HTTP client class forum, you may find people that had
>> difficulties when they tried the curl library functions but succeeded
>> with the HTTP client class.
>>
>> http://www.phpclasses.org/discuss/package/3/
>>
>
> Sure. And there are people who have had problems with the httpclient
> class and found the cURL functions work. That proves nothing.

Like for instance?

Please understand that I am not here to prove anything, even less to
compete with your arguments.

I just want to learn which relevant limitations people have
found in the HTTP client, so I can work on them. That is helpful for me
because by addressing other people's needs I will eventually be addressing
my own needs - if not present ones, at least future ones.

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 19.01.2008 21:52:42 by Jerry Stuckle

Manuel Lemos wrote:



Manuel,

I'm not going to argue with you about whether the HTTPClient class is easier to
use or whatever.

My single point was that cURL is more flexible. You can do anything
with cURL that you can with the HTTPClient class and more. That is
pretty obvious - because the HTTPClient class is built on cURL - so if
cURL can't do it, neither can the HTTPClient class.

But being built on cURL, the HTTPClient class restricts what you can do.
So it is less flexible.

You can sit there and argue all you want as to the other merits of your
class. I won't bite. Because that was not my point.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================


Re: Automated web browsing

on 19.01.2008 22:09:56 by Manuel Lemos

Hello,

on 01/19/2008 06:52 PM Jerry Stuckle said the following:
> I'm not going to argue with you about whether the HTTPClient class is easier to
> use or whatever.
>
> My single point was that cURL is more flexible. You can do anything
> with cURL that you can with the HTTPClient class and more. That is
> pretty obvious - because the HTTPClient class is built on cURL - so if
> cURL can't do it, neither can the HTTPClient class.
>
> But being built on cURL, the HTTPClient class restricts what you can do.
> So it is less flexible.

No, that is not the way it works. I already explained that to you.

The HTTP client class uses Curl when fsockopen calls cannot be used
under the current PHP setup. Curl is used as a better-than-nothing solution.

For instance, before PHP 4.3.0 you could only make SSL requests with curl.
The class used curl for SSL requests, but of course, with curl it cannot
send or receive streamed data in small chunks that never exceed the
PHP memory limits.

If you want that flexibility you need to use PHP 4.3.0 or newer. Then
the class will use fsockopen for SSL requests.

In any case, the HTTP client class abstracts that for you. You do not
need to adapt your application code depending on the PHP version, as the
class does it for you.

I developed the HTTP client class not just as a mere curl wrapper, but
to actually add some benefits on top of curl/fsockopen. So, it was meant
to add flexibility, not to remove it.

That is why I questioned your statement about flexibility. Maybe you
tried an old version of the HTTP client class and you found some
limitations that no longer exist. But if you still find it less
flexible, I want to understand what you are talking about.



--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 20.01.2008 04:56:38 by Jerry Stuckle

Manuel Lemos wrote:
> Hello,
>
> on 01/19/2008 06:52 PM Jerry Stuckle said the following:
>> I'm not going to argue with you about whether the HTTPClient class is easier to
>> use or whatever.
>>
>> My single point was that cURL is more flexible. You can do anything
>> with cURL that you can with the HTTPClient class and more. That is
>> pretty obvious - because the HTTPClient class is built on cURL - so if
>> cURL can't do it, neither can the HTTPClient class.
>>
>> But being built on cURL, the HTTPClient class restricts what you can do.
>> So it is less flexible.
>
> No, that is not the way it works. I already explained that to you.
>
> The HTTP client class uses Curl when fsockopen calls cannot be used
> under the current PHP setup. Curl is used as a better-than-nothing solution.
>
> For instance, before PHP 4.3.0 you could only make SSL requests with curl.
> The class used curl for SSL requests, but of course, with curl it cannot
> send or receive streamed data in small chunks that never exceed the
> PHP memory limits.
>
> If you want that flexibility you need to use PHP 4.3.0 or newer. Then
> the class will use fsockopen for SSL requests.
>
> In any case, the HTTP client class abstracts that for you. You do not
> need to adapt your application code depending on the PHP version, as the
> class does it for you.
>
> I developed the HTTP client class not just as a mere curl wrapper, but
> to actually add some benefits on top of curl/fsockopen. So, it was meant
> to add flexibility, not to remove it.
>
> That is why I questioned your statement about flexibility. Maybe you
> tried an old version of the HTTP client class and you found some
> limitations that no longer exist. But if you still find it less
> flexible, I want to understand what you are talking about.
>
>
>

Manuel,

You are obviously not able to step back and take an objective look at
your classes. I have tried to discuss much of this with you previously,
but you have consistently argued about unrelated things.

I really don't feel like continuing this argument. Please let me know
when you can look at it objectively, and I will be happy to *discuss* it
with you.

I will continue to recommend cURL for the reasons I have outlined. The
difference is I have no relationship with cURL, other than as a user of
the library.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Automated web browsing

on 20.01.2008 13:10:53 by Paul Lautman

mr_marcin wrote:
> Hi
>
> Does anybody have an idea how to input some text into an input box on
> one page, then press a button on that page that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops' websites for
> prices each time a user asks.
>
> I have found a similar feature in the Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but they should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to do it more
> manually - look for some pattern in the URL (search is usually done via
> GET) and parse the output HTML.
>
> Thanks for help
> Marcin

Take a look at Snoopy
http://sourceforge.net/project/showfiles.php?group_id=2091
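
For example (a sketch - submit() and the $results property are from
Snoopy's documented interface, but the URL and form field are
placeholders):

<?php
include 'Snoopy.class.php';

$snoopy = new Snoopy;
$vars = array('query' => 'usb drive');   // hypothetical form field
if ($snoopy->submit('http://shop.example.com/search.php', $vars)) {
    echo $snoopy->results;   // the response page; cookies are handled for you
}
?>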

Re: Automated web browsing

on 20.01.2008 22:09:49 by lingoboyd

Jerry Stuckle posted in comp.lang.php:

> Manuel Lemos wrote:
>>
>> I developed the HTTP client class not just as a mere curl wrapper, but
>> to actually add some benefits on top of curl/fsockopen. So, it was
>> meant to add flexibility, not to remove it.
>>
>> That is why I questioned your statement about flexibility. Maybe you
>> tried an old version of the HTTP client class and you found some
>> limitations that no longer exist. But if you still find it less
>> flexible, I want to understand what you are talking about.
>>
>>
>
> Manuel,
>
> You are obviously not able to step back and take an objective look at
> your classes. I have tried to discuss much of this with you previously,
> but you have consistently argued about unrelated things.
>
> I really don't feel like continuing this argument. Please let me know
> when you can look at it objectively, and I will be happy to *discuss* it
> with you.
>
> I will continue to recommend cURL for the reasons I have outlined. The
> difference is I have no relationship with cURL, other than as a user of
> the library.

PMFBI, but I don't understand why you don't see the flexibility offered by
Manuel's class.

I suspect that most of us are not hired at the consultant level, but as
application/Web developers. As such, we likely have much less influence over
our clients' decisions about Web server configurations. If a client is paying
consultant fees for advice, I would think they are more willing to listen to
such advice. When hiring/contracting a Web developer, they are more apt to
set the requirements that he/she must comply with.

So, if one can build a library of reusable code via the httpClient class that
works with/without cURL, well, isn't that flexible?

Or are you suggesting that one should develop this library with code to
handle either situation oneself? If you were to do that, would you create two
separate libraries of code or would you create a class that can handle either
situation? (Ponder this question as a developer, not as a higher-paid
consultant.)

Or would you simply turn down jobs that cannot use cURL?


(Note: I've only used cURL myself, but then I only work on our own sites -
unfortunately inheriting some frightening stuff.)


--
Mark A. Boyd
Keep-On-Learnin' :)

Re: Automated web browsing

on 20.01.2008 22:58:26 by Manuel Lemos

Hello,

on 01/20/2008 01:56 AM Jerry Stuckle said the following:
> You are obviously not able to step back and take an objective look at
> your classes. I have tried to discuss much of this with you previously,
> but you have consistently argued about unrelated things.
>
> I really don't feel like continuing this argument. Please let me know
> when you can look at it objectively, and I will be happy to *discuss* it
> with you.
>
> I will continue to recommend cURL for the reasons I have outlined. The
> difference is I have no relationship with cURL, other than as a user of
> the library.

Jerry, never mind, this is not important for me.

I am afraid you continue to avoid my points. You keep talking about
arguing, discussing, A is better than B, competing. But I did not come
here for that.

I am not interested in competition with you. For me, cooperating is
better than competing.

I am interested in spreading my class because the constructive feedback
that I get from the users helps me to improve the class, so it will be
better prepared for my present and future needs.

You said you tried my class in the past. So I invited you to specify the
limitations that you found in it. Unfortunately you avoided being
specific, and so you failed to provide any constructive feedback.

You just made vague assertions about it not being flexible, without
providing a real-world usage that demonstrates where my class was not
able to solve a problem that you had in accessing resources via HTTP.

When I demonstrated the limitations of relying on the Curl library, you
basically just tried to minimize the problems or just ignored their
relevance.

Maybe I am getting this wrong, but the impression that you are giving,
at least to me, is that you really are not interested in helping, but
rather want to minimize the work and participation of other people in
this newsgroup, so your participation can prevail.

The bottom line is: if you are not interested in being helpful and your
interest is just competing with arguments, don't bother; I am not
interested.


--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

Re: Automated web browsing

on 21.01.2008 00:01:28 by Jerry Stuckle

Mark A. Boyd wrote:
> Jerry Stuckle posted in comp.lang.php:
>
>> Manuel Lemos wrote:
>>> I developed the HTTP client class not just as a mere curl wrapper, but
>>> to actually add some benefits on top of curl/fsockopen. So, it was
>>> meant to add flexibility, not to remove it.
>>>
>>> That is why I questioned your statement about flexibility. Maybe you
>>> tried an old version of the HTTP client class and you found some
>>> limitations that no longer exist. But if you still find it less
>>> flexible, I want to understand what you are talking about.
>>>
>>>
>> Manuel,
>>
>> You are obviously not able to step back and take an objective look at
>> your classes. I have tried to discuss much of this with you previously,
>> but you have consistently argued about unrelated things.
>>
>> I really don't feel like continuing this argument. Please let me know
>> when you can look at it objectively, and I will be happy to *discuss* it
>> with you.
>>
>> I will continue to recommend cURL for the reasons I have outlined. The
>> difference is I have no relationship with cURL, other than as a user of
>> the library.
>
> PMFBI, but I don't understand why you don't see the flexibility offered by
> Manuel's class.
>

I see the flexibility. But it's still not as flexible as cURL.

> I suspect that most of us are not hired at the consultant level, but as
> application/Web developers. As such, we likely have much less influence over
> our clients' decisions about Web server configurations. If a client is paying
> consultant fees for advice, I would think they are more willing to listen to
> such advice. When hiring/contracting a Web developer, they are more apt to
> set the requirements that he/she must comply with.
>

As a web developer, you are a consultant. And my customers don't set
the requirements - we discuss them and agree on the requirements. They
trust my advice, because I have sound reasons for giving them.

> So, if one can build a library of reusable code via the httpClient class that
> works with/without cURL, well, isn't that flexible?
>

Sure. But I can do more with cURL than you can with the httpClient class.

> Or are you suggesting that one should develop this library with code to
> handle either situation oneself? If you were to do that, would you create two
> separate libraries of code or would you create a class that can handle either
> situation? (Ponder this question as a developer, not as a higher-paid
> consultant.)
>

I didn't say any such thing. The only thing I said was that cURL is
more flexible. Period. Nothing more.

> Or would you simply turn down jobs that cannot use cURL?
>

As I said. I negotiate with my customers. I didn't say what I would or
would not use in any specific situation. I use the tool most suited for
the job.

>
> (Note: I've only used cURL myself, but then I only work on our own sites -
> unfortunately inheriting some frightening stuff.)
>
>


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Automated web browsing

on 21.01.2008 00:04:27 by Jerry Stuckle

Manuel Lemos wrote:
> Hello,
>
> on 01/20/2008 01:56 AM Jerry Stuckle said the following:
>> You are obviously not able to step back and take an objective look at
>> your classes. I have tried to discuss much of this with you previously,
>> but you have consistently argued about unrelated things.
>>
>> I really don't feel like continuing this argument. Please let me know
>> when you can look at it objectively, and I will be happy to *discuss* it
>> with you.
>>
>> I will continue to recommend cURL for the reasons I have outlined. The
>> difference is I have no relationship with cURL, other than as a user of
>> the library.
>
> Jerry, never mind, this is not important for me.
>
> I am afraid you continue to avoid my points. You keep talking about
> arguing, discussing, A is better than B, competing. But I did not come
> here for that.
>

No, I made one statement - that cURL is more flexible. You keep bringing
up completely irrelevant points in an attempt to argue your side.

> I am not interested in competition with you. For me, cooperating is
> better than competing.
>
> I am interested in spreading my class because the constructive feedback
> that I get from the users helps me to improve the class, so it will be
> better prepared for my present and future needs.
>

That's fine.

> You said you tried my class in the past. So I invited you to specify the
> limitations that you found in it. Unfortunately you avoided being
> specific, and so you failed to provide any constructive feedback.
>

I've tried. But you keep changing the subject. So I'm not going to
continue this any longer.

> You just made vague assertions about it not being flexible, without
> providing a real-world usage that demonstrates where my class was not
> able to solve a problem that you had in accessing resources via HTTP.
>
> When I demonstrated the limitations of relying on the Curl library, you
> basically just tried to minimize the problems or just ignored their
> relevance.
>

You didn't demonstrate any limitations of cURL. You did, however, show
how little you understand cURL.

> Maybe I am getting this wrong, but the impression that you are giving,
> at least to me, is that you really are not interested in helping, but
> rather want to minimize the work and participation of other people in
> this newsgroup, so your participation can prevail.
>

I'm not interested in helping you when you aren't interested in
discussing the topic at hand.

> The bottom line is: if you are not interested in being helpful and your
> interest is just competing with arguments, don't bother; I am not
> interested.
>
>


And quite frankly, with your attitude, I have completely lost interest in
your HTTPClient class, and will neither use it nor recommend it to
anyone else.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================


Re: Automated web browsing

on 21.01.2008 02:42:42 by lingoboyd

Jerry Stuckle posted in comp.lang.php:

> Mark A. Boyd wrote:
>> Jerry Stuckle posted in comp.lang.php:
>>> I will continue to recommend cURL for the reasons I have outlined.
>>> The difference is I have no relationship with cURL, other than as a
>>> user of the library.
>>
>> PMFBI, but I don't understand why you don't see the flexibility offered
>> by Manuel's class.
>
> I see the flexibility. But it's still not as flexible as cURL.

Thanks, that wasn't clear to me from previous posts. Well, you were very
clear about what you thought about cURL.

>> I suspect that most of us are not hired at the consultant level, but as
>> application/Web developers. As such, we likely have much less influence
>> over our clients' decisions about Web server configurations. If a
>> client is paying consultant fees for advice, I would think they are
>> more willing to listen to such advice. When hiring/contracting a Web
>> developer, they are more apt to set the requirements that he/she must
>> comply with.
>
> As a web developer, you are a consultant. And my customers don't set
> the requirements - we discuss them and agree on the requirements. They
> trust my advice, because I have sound reasons for giving them.

Makes sense. Although I envision more situations where it might be difficult
to convince a large client to reconfigure a server.

>> So, if one can build a library of reusable code via the httpClient
>> class that works with/without cURL, well, isn't that flexible?
>
> Sure. But I can do more with cURL than you can with the httpClient
> class.

Understood - when cURL is available.

>> Or are you suggesting that one should develop this library with code
>> to handle either situation oneself? If you were to do that, would you
>> create two separate libraries of code or would you create a class that
>> can handle either situation? (Ponder this question as a developer, not
>> as a higher-paid consultant.)
>
> I didn't say any such thing. The only thing I said was that cURL is
> more flexible. Period. Nothing more.

Thus the second question above. I wouldn't have asked if you had already
answered. AFAIK, only top-posters and Jeopardy players think with the
mind-set of answers first, questions later.

>> Or would you simply turn down jobs that cannot use cURL?
>
> As I said. I negotiate with my customers. I didn't say what I would or
> would not use in any specific situation.

Hmm, I think your answer to the question is "I negotiate with my customers."
Vague, but OK. I'm certainly not trying to pry into your negotiating
techniques.

> I use the tool most suited for the job.

Cliche as this answer may sound, I accept it for the omitted answers. After
all, cliches become cliches for good reason.



--
Mark A. Boyd
Keep-On-Learnin' :)

Re: Automated web browsing

on 21.01.2008 03:28:47 by Jerry Stuckle

Mark A. Boyd wrote:
> Jerry Stuckle posted in comp.lang.php:
>
>> Mark A. Boyd wrote:
>>> Jerry Stuckle posted in comp.lang.php:
>>>> I will continue to recommend cURL for the reasons I have outlined.
>>>> The difference is I have no relationship with cURL, other than as a
>>>> user of the library.
>>> PMFBI, but I don't understand why you don't see the flexibility offered
>>> by Manuel's class.
>> I see the flexibility. But it's still not as flexible as cURL.
>
> Thanks, that wasn't clear to me from previous posts. Well, you were very
> clear about what you thought about cURL.
>
>>> I suspect that most of us are not hired at the consultant level, but as
>>> application/Web developers. As such, we likely have much less influence
>>> over our clients' decisions about Web server configurations. If a
>>> client is paying consultant fees for advice, I would think they are
>>> more willing to listen to such advice. When hiring/contracting a Web
>>> developer, they are more apt to set the requirements that he/she must
>>> comply with.
>> As a web developer, you are a consultant. And my customers don't set
>> the requirements - we discuss them and agree on the requirements. They
>> trust my advice, because I have sound reasons for giving them.
>
> Makes sense. Although I envision more situations where it might be difficult
> to convince a large client to reconfigure a server.
>
>>> So, if one can build a library of reusable code via the httpClient
>>> class that works with/without cURL, well, isn't that flexible?
>> Sure. But I can do more with cURL than you can with the httpClient
>> class.
>
> Understood - when cURL is available.
>
>>> Or are you suggesting that one should develop this library with code
>>> to handle either situation oneself? If you were to do that, would you
>>> create two separate libraries of code or would you create a class that
>>> can handle either situation? (Ponder this question as a developer, not
>>> as a higher-paid consultant.)
>> I didn't say any such thing. The only thing I said was that cURL is
>> more flexible. Period. Nothing more.
>
> Thus the second question above. I wouldn't have asked if you had already
> answered. AFAIK, only top-posters and Jeopardy players think with the mind-
> set of answers first-questions later.
>

First of all, there is no difference between a developer and a
consultant. A developer IS a consultant.

Now, some consultants may do more, and therefore command a higher fee.
But any developer worth his salt should be able to do this.

And I'm not suggesting anything. One thing I have found - fsockopen()
is more often blocked off by hosting companies than cURL is. That's
because fsockopen() can be used for a lot more malicious operations.

Also, by themselves, fsockopen()/fread/fwrite/etc. are much more
flexible than cURL. But also much harder to use.

>>> Or would you simply turn down jobs that cannot use cURL?
>> As I said. I negotiate with my customers. I didn't say what I would or
>> would not use in any specific situation.
>
> Hmm, I think your answer to the question is "I negotiate with my customers."
> Vague, but OK. I'm certainly not trying to pry into your negotiating
> techniques.
>

No, the customer tells me what he wants. I determine what is required
to satisfy his needs in the most efficient manner. I then look at their
hosting to see if they supply what is required.

If so, no problem. I tell the customer we're OK and can move on.

However, if not, I look at other options, especially those offered by
their current host. I then present the user with the options and
advantages and disadvantages of each option.

And if I recommend they change hosting companies, I have the reasons as
to why they should - and at least two or three options for hosting (none
of which I have any affiliation with).

They then have the choice. But most of the time they accept my
recommendations because I have solid reasons. It's all a part of
servicing the customer.

>> I use the tool most suited for the job.
>
> Cliche as this answer may sound, I accept it for the omitted answers. After
> all, cliches become cliches for good reason.
>
>
>

Not a cliche - the absolute truth. For instance, right now I have a
site mostly built on VBScript and Access database (no, I didn't develop
the original site). I need to interface to two other websites (no XML
in either). So it's request pages, parse results, post data to pages,
parse results... All the while still interfacing with their
current pages and database. This part I'm doing in PHP (using cURL, BTW)
because it's the best tool for the job. But it also meant building a
lot of PHP code to interface to their current database.

In this case VBScript was not the correct tool. My other options could
have been Perl or even C/C++. But PHP was the best tool for this job.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================