web crawling program

on 31.03.2008 15:38:07 by Raven

Hi,

I am trying to run 1000 searches on a site, driven by a keyword file,
and I want to automate them. I have copied the site's search form and
modified the POST part of the URL so I can make the necessary
modifications for the automation.

// structure of the form

//   <select>  - a select box with 100 options
//   <input>   - a text input box

The program will be as follows:

while not end of file
{
    step 1: read a keyword from the file and assign it to $value
    for each option $op on the form
    {
        // fill the form
        step 2: select $op
        step 3: enter $value into the input box
        step 4: submit
        step 5: save the result to a file, $result
        step 6: parse $result and save it to the database
    }
}
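In PHP with the cURL extension, the loop above can be sketched like this. It is only a sketch: the endpoint URL and the field names 'city' and 'keyword' are placeholders for whatever the copied form actually uses.

```php
<?php
// Sketch of the loop above. The URL and the field names 'city' and
// 'keyword' are placeholders; take the real ones from the copied form.

// Steps 2-3: build an urlencoded body for one city option and keyword.
function build_post($op, $value) {
    return http_build_query(['city' => $op, 'keyword' => $value]);
}

// Steps 4-5: submit the form and return the result page as a string.
function submit_search($url, $op, $value) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post($op, $value));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return, don't print
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

function crawl($url, $keywordFile, array $options) {
    // Step 1: one keyword per line of the file.
    $keywords = file($keywordFile,
                     FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($keywords as $value) {
        foreach ($options as $op) {
            $result = submit_search($url, $op, $value);
            if ($result !== false) {
                // Step 5: save the page; step 6 (parsing) would go here.
                file_put_contents(
                    "result_{$op}_" . md5($value) . '.html', $result);
            }
            sleep(10); // be polite to the remote host
        }
    }
}
```

A call such as `crawl('http://www.example.com/search.php', 'keywords.txt', $cities);` would then run the whole job.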
There is no problem until step 5. When I fill and submit the form, a
page is returned from the remote host containing a table:

// table

data1 -- data2 -- data3

data1 and data2 are text and data3 is a link,

and I want to save data1, data2 and data3's link to a database.

But once control has passed to the form processor on the remote host, I
have no idea how to complete step 5. I mean, how can I regain control,
save the result to a file and parse it?
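Once the result page is captured as a string (for example via cURL), step 6 can be done with PHP's DOMDocument. The table markup assumed below is only a guess at the structure described (two text cells plus a link cell); the real page's markup will differ.

```php
<?php
// Parse a result table of the form: data1 | data2 | data3 (a link).
// The expected markup is an assumption; adjust to the real page.
function parse_result_table($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);          // @ suppresses warnings on sloppy HTML
    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length < 3) {
            continue;                // skip header or malformed rows
        }
        $link = $cells->item(2)->getElementsByTagName('a')->item(0);
        $rows[] = [
            'data1' => trim($cells->item(0)->textContent),
            'data2' => trim($cells->item(1)->textContent),
            'link'  => $link ? $link->getAttribute('href') : '',
        ];
    }
    return $rows;   // each row is ready to insert into the database
}
```

Each returned row can then be written to the database with a prepared statement.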

Also, this is the only approach I can think of, but it is not
necessarily a feasible solution. If you can think of other options, I am
all ears :)

Thank you very much for your kind response.

Re: web crawling program

on 31.03.2008 16:00:43 by George Maicovschi

Or you could just make a text file containing all the URLs and pass it
to WGET.
Or use Jerry's approach with CURL.

CURL would be my first choice since it's built in, but you could use
WGET as well.

Re: web crawling program

on 31.03.2008 16:14:54 by Raven

Thank you, Jerry and George, for your quick responses. The 100 options
in the form correspond to cities, and the form on the site doesn't let
you run a query without selecting a city first. Since I have no idea
which city is the right one (that is actually what I am searching for),
a specific query takes 50 submits on average to find, and I have over
100 queries. Whether it is down to a mistake by the remote site's
designer or to me being evil :(, it is unbearable to proceed one by one
by hand.

Re: web crawling program

on 31.03.2008 16:48:42 by Jerry Stuckle

raven wrote:
> Hi,
>
> I am trying to make 1000 searches from a site from a keyword file. I
> want to automate these searches. I have copied the search form of
> site and modified the post part of url so i can make necessary
> modifications for the automation.
>
> // structure of form
>


> select box>// a selectbox with 100 options
> input box
>

>
> Program will be as follows
>
> While not end of file
> {
> 1st step: Read a keyword from the file and assign it $value
> For each option $op on the form
> //Fill form
> 2nd step: $op is selected
> 3rd step: enter $value to inputbox
> 4: submit
>
> 5: save the result to a file, $result
> 6: parse $result and save to database
> }
> There is no problem until step 5.When i fill and submit the
> form a page is opened from the remote host and contains the table
>
> //table
>
> data1-- data2 -- data3
>
> data1 and data2 is text and data 3 is a link
>
> and i want to save data1 data2 and data3link to a database.
>
> but when the control is gone to formprocessor on the remote host i
> have no idea how to complete step 5. I mean how can i gain control and
> save the result to a file and parse it.
>
> Also this is the way i can to think of but necesserly the feasible
> solution. If you can think of other options, i am all ears :)
>
> Thank you very much for your kind response.
>
>
>
>

You will need to use CURL or something similar to submit the form so you
can get the information back.

And BTW - do you have permission to do this? If I saw someone doing
this on one of my sites, they'd be blocked immediately - if not sooner.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 31.03.2008 17:25:24 by unknown

Post removed (X-No-Archive: yes)

Re: web crawling program

on 31.03.2008 19:22:04 by Jerry Stuckle

raven wrote:
> Thank you Jerry and George for your quick responses. The 100 options
> in the forms corresponds the cities and the form in the site doesn't
> allow you to make a query without selecting the city first. Since I
> have no idea for city information(actually i am searching it) a
> specific query takes 50 average submit to find and i have over 100
> queries. Be it because of the mistake of the remote site designer or
> me being evil:( it is unbearable to proceed one by one by hand.
>

Then you have a problem. If their webmaster is paying any attention at
all, you'll be in deep trouble with him (and most probably the site
owner, if they aren't the same people).

I wouldn't recommend it.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 31.03.2008 20:05:56 by George Maicovschi

On Mar 31, 6:25 pm, Gary L. Burnore wrote:
> On Mon, 31 Mar 2008 07:14:54 -0700 (PDT), raven
> wrote:
>
> >Thank you Jerry and George for your quick responses. The 100 options
> >in the forms corresponds the cities and the form in the site doesn't
> >allow you to make a query without selecting the city first. Since I
> >have no idea for city information(actually i am searching it) a
> >specific query takes 50 average submit to find and i have over 100
> >queries. Be it because of the mistake of the remote site designer or
> >me being evil:( it is unbearable to proceed one by one by hand.
>
> My bet is the site will figure out that you're botting them after
> about the first 50 and shut you down. Ever thought of just asking
> them for the data?
> --

Well, in order to go about this you should do the following things:

1. Choose a user agent to emulate (Microsoft Internet Explorer or
Firefox).
2. Use a random request interval so that you don't send requests
constantly.

Both of these options are available in CURL as well as WGET, so you
could use either. You might even want to do it from more than one IP.
That's my opinion; if you need any more help with spidering the data,
just drop me an email.

Regards,
George Maicovschi.
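The two suggestions above (a browser user agent and a randomized interval between requests) can be sketched in PHP's cURL like this. The URL is a placeholder and `random_delay` is a hypothetical helper name.

```php
<?php
// Sketch: fetch a page while emulating a Firefox user agent, then
// pause a random number of seconds. The URL is a placeholder.

function random_delay($min, $max) {
    return rand($min, $max);    // seconds to wait between requests
}

function fetch_like_browser($url) {
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; '
               . 'rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // 1. emulate Firefox
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $page = curl_exec($ch);
    curl_close($ch);
    sleep(random_delay(5, 20));                      // 2. random pause
    return $page;
}
```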

Re: web crawling program

on 01.04.2008 17:26:46 by Raven

Thank you all. I will add a delay of about 10 seconds between queries so
it will not consume the remote site's bandwidth. But there is another
problem with CURL.

The remote site requires credentials for access. Normally I would open
the site with Firefox, enter the credentials, and after that complete
the form from the local copy, and it worked. But when I use CURL to
post, the site redirects me to the credentials page.
I have made a trivial POST form and CURL handles it very well, so I
don't think my mistake is in how I use CURL. Things I have tried:

1. I thought maybe the host knows the request isn't coming from an HTML
form, so I set a user agent as George suggested:

$userAgent = 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows
NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
curl_setopt($Curl_Session, CURLOPT_USERAGENT, $userAgent);

I also tried the Google and Yahoo crawler IDs as suggested elsewhere.
It didn't work.

2. I thought the page might use cookie-based session management, so
that a direct CURL submit doesn't carry the session variables, but the
Web Developer add-on for Firefox does not show any cookies.

3. I used an intermediate processor and tried to access it with

curl_setopt($Curl_Session, CURLOPT_POSTFIELDS, $_POST);

with no luck.

I don't get how a local form submit works but a CURL submit does not. I
have used CURL for other form-completion tasks and everywhere else it
seems fine. Is there a way the form processor can tell that the request
is coming from CURL rather than from an HTML form and reject it? Or,
alternatively, is there a way I can use the local form and *somehow*
gain control of the generated page?
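A likely cause, offered as a guess: each cURL request starts a fresh session, so the login cookie set by the credentials page is never sent back, while the browser carries it automatically. cURL can persist cookies across requests with CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE. The URLs and field names ('user', 'pass', 'city', 'keyword') below are placeholders; note also that passing a raw array to CURLOPT_POSTFIELDS sends multipart/form-data, so http_build_query() is used to send a plain urlencoded body instead.

```php
<?php
// Hypothetical sketch: log in once, keep the session cookie in a jar
// file, then present the same jar on every later request.

function make_handle($url, $jar, array $fields) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    // urlencoded body; a raw array would send multipart/form-data
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar);  // send cookies from jar
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);   // write new ones back
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}

function login_and_search($base, $user, $pass, $city, $keyword) {
    $jar = tempnam(sys_get_temp_dir(), 'cookies');

    // 1. Authenticate; the session cookie lands in $jar.
    $ch = make_handle($base . '/login.php', $jar,
                      ['user' => $user, 'pass' => $pass]);
    curl_exec($ch);
    curl_close($ch);

    // 2. The search request now carries the session cookie.
    $ch = make_handle($base . '/search.php', $jar,
                      ['city' => $city, 'keyword' => $keyword]);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
```

If the session still is not recognized, comparing the exact headers the browser sends (via a proxy or the Live HTTP Headers extension) against cURL's is the next step.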

Re: web crawling program

on 01.04.2008 19:44:09 by Csaba

"Jerry Stuckle" wrote in message
news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
> raven wrote:
>> Thank you Jerry and George for your quick responses. The 100
>> options
>> in the forms corresponds the cities and the form in the site
>> doesn't
>> allow you to make a query without selecting the city first. Since I
>> have no idea for city information(actually i am searching it) a
>> specific query takes 50 average submit to find and i have over 100
>> queries. Be it because of the mistake of the remote site designer
>> or
>> me being evil:( it is unbearable to proceed one by one by hand.
>>
>
> Then you have a problem. If their webmaster is paying any attention
> at all, you'll be in deep trouble with him (and most probably the
> site owner, if they aren't the same people).
>
> I wouldn't recommend it.
>

Jerry,
I always thought programmers were supposed to automate repetitive
tasks.
I see nothing wrong with this, as long as it is "friendly fire" ;)

R.

Re: web crawling program

on 02.04.2008 03:47:24 by Csaba

"Jerry Stuckle" wrote in message
news:PZWdnfg7j5vlRW_anZ2dnUVZ_oqhnZ2d@comcast.com...
> Richard wrote:
>> "Jerry Stuckle" wrote in message
>> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>>> raven wrote:
>>>> Thank you Jerry and George for your quick responses. The 100
>>>> options
>>>> in the forms corresponds the cities and the form in the site
>>>> doesn't
>>>> allow you to make a query without selecting the city first. Since
>>>> I
>>>> have no idea for city information(actually i am searching it) a
>>>> specific query takes 50 average submit to find and i have over
>>>> 100
>>>> queries. Be it because of the mistake of the remote site designer
>>>> or
>>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>>
>>> Then you have a problem. If their webmaster is paying any
>>> attention at all, you'll be in deep trouble with him (and most
>>> probably the site owner, if they aren't the same people).
>>>
>>> I wouldn't recommend it.
>>>
>>
>> Jerry,
>> I always thought programmers were supposed to automate repetitive
>> tasks.
>> I see nothing wrong with this, as long as it "friendly fire" ;)
>>
>> R.
>>
>>
>>
>>
>
> What's wrong with it is he's using someone else's information and
> bandwidth.
>
> For instance, if the information is copyrighted, he could be in
> serious legal trouble. Even if it isn't copyrighted, the owner may
> not like the way he's using their website.
>
> Automating repetitive tasks is fine when it's your resources. But
> when you're using someone else's resources, you need to pay
> attention to what they allow.
>
> The whole thing could get him in serious legal trouble if he doesn't
> have permission to do what he wants. And if the owner of the site
> wanted to press it, it could cost the op a LOT of money.
>

Wow Jerry,
that's a lot of ifs. Now what if not?

Information on a website is public, and meant to be accessed.

Within limits. Of course.

R.

Re: web crawling program

on 02.04.2008 03:50:53 by Jerry Stuckle

Richard wrote:
> "Jerry Stuckle" wrote in message
> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>> raven wrote:
>>> Thank you Jerry and George for your quick responses. The 100
>>> options
>>> in the forms corresponds the cities and the form in the site
>>> doesn't
>>> allow you to make a query without selecting the city first. Since I
>>> have no idea for city information(actually i am searching it) a
>>> specific query takes 50 average submit to find and i have over 100
>>> queries. Be it because of the mistake of the remote site designer
>>> or
>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>
>> Then you have a problem. If their webmaster is paying any attention
>> at all, you'll be in deep trouble with him (and most probably the
>> site owner, if they aren't the same people).
>>
>> I wouldn't recommend it.
>>
>
> Jerry,
> I always thought programmers were supposed to automate repetitive
> tasks.
> I see nothing wrong with this, as long as it "friendly fire" ;)
>
> R.
>
>
>
>

What's wrong with it is he's using someone else's information and bandwidth.

For instance, if the information is copyrighted, he could be in serious
legal trouble. Even if it isn't copyrighted, the owner may not like the
way he's using their website.

Automating repetitive tasks is fine when it's your resources. But when
you're using someone else's resources, you need to pay attention to what
they allow.

The whole thing could get him in serious legal trouble if he doesn't
have permission to do what he wants. And if the owner of the site
wanted to press it, it could cost the op a LOT of money.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 02.04.2008 04:00:05 by unknown

Post removed (X-No-Archive: yes)

Re: web crawling program

on 02.04.2008 04:23:00 by Csaba

"Jerry Stuckle" wrote in message
news:tOednbL50thvem_anZ2dnUVZ_u7inZ2d@comcast.com...
> Richard wrote:
>> "Jerry Stuckle" wrote in message
>> news:PZWdnfg7j5vlRW_anZ2dnUVZ_oqhnZ2d@comcast.com...
>>> Richard wrote:
>>>> "Jerry Stuckle" wrote in message
>>>> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>>>>> raven wrote:
>>>>>> Thank you Jerry and George for your quick responses. The 100
>>>>>> options
>>>>>> in the forms corresponds the cities and the form in the site
>>>>>> doesn't
>>>>>> allow you to make a query without selecting the city first.
>>>>>> Since I
>>>>>> have no idea for city information(actually i am searching it) a
>>>>>> specific query takes 50 average submit to find and i have over
>>>>>> 100
>>>>>> queries. Be it because of the mistake of the remote site
>>>>>> designer or
>>>>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>>>>
>>>>> Then you have a problem. If their webmaster is paying any
>>>>> attention at all, you'll be in deep trouble with him (and most
>>>>> probably the site owner, if they aren't the same people).
>>>>>
>>>>> I wouldn't recommend it.
>>>>>
>>>> Jerry,
>>>> I always thought programmers were supposed to automate repetitive
>>>> tasks.
>>>> I see nothing wrong with this, as long as it "friendly fire" ;)
>>>>
>>>> R.
>>>>
>>>>
>>>>
>>>>
>>> What's wrong with it is he's using someone else's information and
>>> bandwidth.
>>>
>>> For instance, if the information is copyrighted, he could be in
>>> serious legal trouble. Even if it isn't copyrighted, the owner
>>> may not like the way he's using their website.
>>>
>>> Automating repetitive tasks is fine when it's your resources. But
>>> when you're using someone else's resources, you need to pay
>>> attention to what they allow.
>>>
>>> The whole thing could get him in serious legal trouble if he
>>> doesn't have permission to do what he wants. And if the owner of
>>> the site wanted to press it, it could cost the op a LOT of money.
>>>
>>
>> Wow Jerry,
>> that's a lot of ifs. Now what if not?
>>
>> Information on a website is public, and meant to be accessed.
>>
>> Within limits. Of course.
>>
>> R.
>>
>>
>>
>>
>>
>
> It is meant to be accessed within the limits the owner puts on it.
> And even though the site itself is public, the data on it can be
> copyrighted. And use for other than the intended purpose (i.e.
> personal use by the browser) is a violation of copyrights.
>
> You can do whatever you want when you own all of the resources. But
> when you start using someone else's resources, you can get into
> serious trouble. I recommend you talk to an attorney about it.
> He'll set you straight.
>

*sigh*

This is not going to be another one of "those" endless discussions.

Bye.

R.

Re: web crawling program

on 02.04.2008 04:56:56 by Jerry Stuckle

Richard wrote:
> "Jerry Stuckle" wrote in message
> news:PZWdnfg7j5vlRW_anZ2dnUVZ_oqhnZ2d@comcast.com...
>> Richard wrote:
>>> "Jerry Stuckle" wrote in message
>>> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>>>> raven wrote:
>>>>> Thank you Jerry and George for your quick responses. The 100
>>>>> options
>>>>> in the forms corresponds the cities and the form in the site
>>>>> doesn't
>>>>> allow you to make a query without selecting the city first. Since
>>>>> I
>>>>> have no idea for city information(actually i am searching it) a
>>>>> specific query takes 50 average submit to find and i have over
>>>>> 100
>>>>> queries. Be it because of the mistake of the remote site designer
>>>>> or
>>>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>>>
>>>> Then you have a problem. If their webmaster is paying any
>>>> attention at all, you'll be in deep trouble with him (and most
>>>> probably the site owner, if they aren't the same people).
>>>>
>>>> I wouldn't recommend it.
>>>>
>>> Jerry,
>>> I always thought programmers were supposed to automate repetitive
>>> tasks.
>>> I see nothing wrong with this, as long as it "friendly fire" ;)
>>>
>>> R.
>>>
>>>
>>>
>>>
>> What's wrong with it is he's using someone else's information and
>> bandwidth.
>>
>> For instance, if the information is copyrighted, he could be in
>> serious legal trouble. Even if it isn't copyrighted, the owner may
>> not like the way he's using their website.
>>
>> Automating repetitive tasks is fine when it's your resources. But
>> when you're using someone else's resources, you need to pay
>> attention to what they allow.
>>
>> The whole thing could get him in serious legal trouble if he doesn't
>> have permission to do what he wants. And if the owner of the site
>> wanted to press it, it could cost the op a LOT of money.
>>
>
> Wow Jerry,
> that's a lot of ifs. Now what if not?
>
> Information on a website is public, and meant to be accessed.
>
> Within limits. Of course.
>
> R.
>
>
>
>
>

It is meant to be accessed within the limits the owner puts on it. And
even though the site itself is public, the data on it can be
copyrighted. And use for other than the intended purpose (i.e. personal
use by the browser) is a violation of copyrights.

You can do whatever you want when you own all of the resources. But
when you start using someone else's resources, you can get into serious
trouble. I recommend you talk to an attorney about it. He'll set you
straight.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 02.04.2008 04:58:04 by Jerry Stuckle

Richard wrote:
> "Jerry Stuckle" wrote in message
> news:PZWdnfg7j5vlRW_anZ2dnUVZ_oqhnZ2d@comcast.com...
>> Richard wrote:
>>> "Jerry Stuckle" wrote in message
>>> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>>>> raven wrote:
>>>>> Thank you Jerry and George for your quick responses. The 100
>>>>> options
>>>>> in the forms corresponds the cities and the form in the site
>>>>> doesn't
>>>>> allow you to make a query without selecting the city first. Since
>>>>> I
>>>>> have no idea for city information(actually i am searching it) a
>>>>> specific query takes 50 average submit to find and i have over
>>>>> 100
>>>>> queries. Be it because of the mistake of the remote site designer
>>>>> or
>>>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>>>
>>>> Then you have a problem. If their webmaster is paying any
>>>> attention at all, you'll be in deep trouble with him (and most
>>>> probably the site owner, if they aren't the same people).
>>>>
>>>> I wouldn't recommend it.
>>>>
>>> Jerry,
>>> I always thought programmers were supposed to automate repetitive
>>> tasks.
>>> I see nothing wrong with this, as long as it "friendly fire" ;)
>>>
>>> R.
>>>
>>>
>>>
>>>
>> What's wrong with it is he's using someone else's information and
>> bandwidth.
>>
>> For instance, if the information is copyrighted, he could be in
>> serious legal trouble. Even if it isn't copyrighted, the owner may
>> not like the way he's using their website.
>>
>> Automating repetitive tasks is fine when it's your resources. But
>> when you're using someone else's resources, you need to pay
>> attention to what they allow.
>>
>> The whole thing could get him in serious legal trouble if he doesn't
>> have permission to do what he wants. And if the owner of the site
>> wanted to press it, it could cost the op a LOT of money.
>>
>
> Wow Jerry,
> that's a lot of ifs. Now what if not?
>
> Information on a website is public, and meant to be accessed.
>
> Within limits. Of course.
>
> R.
>
>
>
>
>

Oh, and BTW - that's not many "ifs". The ONLY "if" which counts is "if"
the website owner allows such access. And it would have to be
*explicit* - not implicit, at least in the U.S.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 02.04.2008 05:17:48 by unknown

Post removed (X-No-Archive: yes)

Re: web crawling program

on 02.04.2008 06:03:38 by Jerry Stuckle

Richard wrote:
> "Jerry Stuckle" wrote in message
> news:tOednbL50thvem_anZ2dnUVZ_u7inZ2d@comcast.com...
>> Richard wrote:
>>> "Jerry Stuckle" wrote in message
>>> news:PZWdnfg7j5vlRW_anZ2dnUVZ_oqhnZ2d@comcast.com...
>>>> Richard wrote:
>>>>> "Jerry Stuckle" wrote in message
>>>>> news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
>>>>>> raven wrote:
>>>>>>> Thank you Jerry and George for your quick responses. The 100
>>>>>>> options
>>>>>>> in the forms corresponds the cities and the form in the site
>>>>>>> doesn't
>>>>>>> allow you to make a query without selecting the city first.
>>>>>>> Since I
>>>>>>> have no idea for city information(actually i am searching it) a
>>>>>>> specific query takes 50 average submit to find and i have over
>>>>>>> 100
>>>>>>> queries. Be it because of the mistake of the remote site
>>>>>>> designer or
>>>>>>> me being evil:( it is unbearable to proceed one by one by hand.
>>>>>>>
>>>>>> Then you have a problem. If their webmaster is paying any
>>>>>> attention at all, you'll be in deep trouble with him (and most
>>>>>> probably the site owner, if they aren't the same people).
>>>>>>
>>>>>> I wouldn't recommend it.
>>>>>>
>>>>> Jerry,
>>>>> I always thought programmers were supposed to automate repetitive
>>>>> tasks.
>>>>> I see nothing wrong with this, as long as it "friendly fire" ;)
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> What's wrong with it is he's using someone else's information and
>>>> bandwidth.
>>>>
>>>> For instance, if the information is copyrighted, he could be in
>>>> serious legal trouble. Even if it isn't copyrighted, the owner
>>>> may not like the way he's using their website.
>>>>
>>>> Automating repetitive tasks is fine when it's your resources. But
>>>> when you're using someone else's resources, you need to pay
>>>> attention to what they allow.
>>>>
>>>> The whole thing could get him in serious legal trouble if he
>>>> doesn't have permission to do what he wants. And if the owner of
>>>> the site wanted to press it, it could cost the op a LOT of money.
>>>>
>>> Wow Jerry,
>>> that's a lot of ifs. Now what if not?
>>>
>>> Information on a website is public, and meant to be accessed.
>>>
>>> Within limits. Of course.
>>>
>>> R.
>>>
>>>
>>>
>>>
>>>
>> It is meant to be accessed within the limits the owner puts on it.
>> And even though the site itself is public, the data on it can be
>> copyrighted. And use for other than the intended purpose (i.e.
>> personal use by the browser) is a violation of copyrights.
>>
>> You can do whatever you want when you own all of the resources. But
>> when you start using someone else's resources, you can get into
>> serious trouble. I recommend you talk to an attorney about it.
>> He'll set you straight.
>>
>
> *sigh*
>
> This is not going to be another one of "those" endless discussions.
>
> Bye.
>
> R.
>
>
>

As I said - see your attorney. He'll tell you that you can't do anything
you want with someone else's website.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 02.04.2008 12:42:31 by Csaba

*grin*

Re: web crawling program

on 02.04.2008 12:47:10 by Raven

I do not want to discuss the ethics here. The site is public with free
membership. Since I respect the host's owner, I will include a time
delay between queries; it will not burden the host any more than a
regular user would. I simply see no reason not to automate the process.

Re: web crawling program

on 02.04.2008 13:48:28 by Jerry Stuckle

raven wrote:
> I do not want to discuss the ethics in this discussion. The site is
> public with free membership. Since I respect the host owner i will
> include a time delay between queries.It will not burden host more than
> a user.I only see no reason not to automate the process.
>

As I said - public does not mean you can do whatever you want. I highly
suggest you talk to an attorney. If you do this without permission, you
could land yourself in a lot of trouble.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 02.04.2008 13:49:05 by Jerry Stuckle

raven wrote:
> I do not want to discuss the ethics in this discussion. The site is
> public with free membership. Since I respect the host owner i will
> include a time delay between queries.It will not burden host more than
> a user.I only see no reason not to automate the process.
>

And BTW - that is not an ethical situation - it is a legal one.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: web crawling program

on 03.04.2008 01:39:21 by Baho Utot

On Tue, 01 Apr 2008 21:56:56 -0500, Jerry Stuckle wrote:

[putolin]

>>
>>
>>
>>
> It is meant to be accessed within the limits the owner puts on it. And
> even though the site itself is public, the data on it can be
> copyrighted. And use for other than the intended purpose (i.e. personal
> use by the browser) is a violation of copyrights.
>
> You can do whatever you want when you own all of the resources. But
> when you start using someone else's resources, you can get into serious
> trouble. I recommend you talk to an attorney about it. He'll set you
> straight.

I think it depends on the country you live in. In my country no one
would care.

I refer you to the case of the "I Love You" virus: the USA wanted to
arrest and jail the teens; the Philippines said Pff.

http://query.nytimes.com/gst/fullpage.html?res=9C0CE6DD1E3EF931A1575BC0A9669C8B63


And resulted in this:

(from the NYT article published August 22, 2000)

Last month, the Philippine president, Joseph Estrada, traveled to the
United States to meet with President Clinton and to seek high-technology
investment. Last week, Callahan Broadband Wireless said it would invest
3.15 billion pesos, or $70 million, in a Philippine company providing
high-speed Internet services to businesses in the country.

--
Tayo'y Mga Pinoy

Re: web crawling program

on 03.04.2008 06:45:16 by Jerry Stuckle

Baho Utot wrote:
> On Tue, 01 Apr 2008 21:56:56 -0500, Jerry Stuckle wrote:
>
> [putolin]
>
>>>
>>>
>>>
>> It is meant to be accessed within the limits the owner puts on it. And
>> even though the site itself is public, the data on it can be
>> copyrighted. And use for other than the intended purpose (i.e. personal
>> use by the browser) is a violation of copyrights.
>>
>> You can do whatever you want when you own all of the resources. But
>> when you start using someone else's resources, you can get into serious
>> trouble. I recommend you talk to an attorney about it. He'll set you
>> straight.
>
> I think it depends upon the country you live in. In my country no one
> would care.
>
> I refer you to the case of the I Love virus, USA wanted to arrest and
> jail the teens, Philippines said Pff.
>
> http://query.nytimes.com/gst/fullpage.html?
> res=9C0CE6DD1E3EF931A1575BC0A9669C8B63
>
>
> And resulted in this:
>
> (from the NYT articule Published: August 22, 2000)
>
> Last month, the Philippine president, Joseph Estrada, traveled to the
> United States to meet with President Clinton and to seek high-technology
> investment. Last week, Callahan Broadband Wireless said it would invest
> 3.15 billion pesos, or $70 million, in a Philippine company providing
> high-speed Internet services to businesses in the country.
>

A reference to a story almost 8 years old? ROFLMAO! Just what I expect
from you, Baho.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================