Data mining?
on 25.04.2007 10:31:53 by tom
Hi all,
I wonder if anyone can give me some help here.
I have permission from a colleague to use some data from his website,
and now I need to take that data and intermingle some of it
with data from my database. I only have access to the HTML of the
external site; there's no XML feed or anything simple I can get it
from. So, I was wondering if there was an easyish way of
parsing out the info I need from the HTML and putting it into an
array or something? This is how the HTML is formed:
Tel: .
xxNeed this Titlexx
Address line 1
address line 2
Postcode
Tel: .
xxNeed this Titlexx
Address line 1
address line 2
Postcode
Tel: .
xxNeed this Titlexx
Address line 1
address line 2
Postcode
And so on....
Does anyone have any bright ideas? I've got as far as putting the
whole page into a string and ripping out the and other
stuff. It's looping through those DIVs and turning them into something
I can manipulate where I'm struggling.
Thanks in advance,
Tom
Re: Data mining?
on 26.04.2007 18:02:40 by Anthony Jones
"Tom" wrote in message
news:1177489913.166536.19160@r30g2000prh.googlegroups.com...
> Hi all,
>
> I wonder if anyone can give me some help here.
>
> I have permission from a colleague to use some data from his website,
> and now I need to take that data and intermingle some of it
> with data from my database. I only have access to the HTML of the
> external site; there's no XML feed or anything simple I can get it
> from. So, I was wondering if there was an easyish way of
> parsing out the info I need from the HTML and putting it into an
> array or something? This is how the HTML is formed:
>
> Tel: .
> xxNeed this Titlexx
> Address line 1
> address line 2
> Postcode
> Tel: .
> xxNeed this Titlexx
> Address line 1
> address line 2
> Postcode
> Tel: .
> xxNeed this Titlexx
> Address line 1
> address line 2
> Postcode
>
> And so on....
>
> Does anyone have any bright ideas? I've got as far as putting the
> whole page into a string and ripping out the and other
> stuff. It's looping through those DIVs and turning them into something
> I can manipulate where I'm struggling.
>
> Thanks in advance,
It looks like the HTML is XML compliant (e.g., it uses rather than
simply ), so you might be able to get away with loading it into an XML DOM.
>
> Tom
>
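
To illustrate the XML DOM idea, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The tags from the original sample did not survive in the post, so the element names, the class values ("entry", "tel", "title", "address") and the sample data are all assumptions made up for illustration; the real page will need its own names substituted. It only works if the page really is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real tags did not survive in the post, so the
# element names, class values and data below are assumptions for illustration.
SAMPLE_PAGE = """
<div id="listing">
  <div class="entry">
    <div class="tel">Tel: 01234 567890</div>
    <div class="title">First Title</div>
    <div class="address">Address line 1<br />address line 2<br />Postcode</div>
  </div>
  <div class="entry">
    <div class="tel">Tel: 09876 543210</div>
    <div class="title">Second Title</div>
    <div class="address">Another line 1<br />another line 2<br />Postcode</div>
  </div>
</div>
"""


def parse_entries(markup):
    """Load well-formed markup into an XML tree and return one dict per entry.

    Each dict maps a child DIV's class name to the list of non-empty text
    lines found inside that DIV.
    """
    root = ET.fromstring(markup.strip())
    entries = []
    for div in root.iter("div"):
        # Only the wrapper DIVs (assumed class "entry") start a new record.
        if div.get("class") != "entry":
            continue
        record = {}
        for child in div:
            key = child.get("class")
            if key is None:
                continue
            # itertext() yields the text before, between and after the
            # <br /> elements, so multi-line addresses come out as a list.
            lines = [text.strip() for text in child.itertext()]
            record[key] = [line for line in lines if line]
        entries.append(record)
    return entries


entries = parse_entries(SAMPLE_PAGE)
for entry in entries:
    print(entry["title"][0], "|", ", ".join(entry["address"]))
```

One caveat: a plain XML parser rejects anything that is not well-formed, and it also rejects HTML entities such as &nbsp; that XML does not define, so the string may need some cleaning before it will load.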