Re: GetElementByClass?

Re: GetElementByClass?

on 03.04.2010 16:11:02 by Peter Pei

On Sat, 03 Apr 2010 08:58:44 -0600, Ashley Sheridan
wrote:

> On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:
>
>> Hi gang:
>>
>> Here's the problem.
>>
>> I have 184 HTML pages in a directory and each page contain a
>> question. The question is noted in the HTML DOM like so:
>>
>> <p class="question">
>> Who is Roger Rabbit?
>> </p>
>>
>> My question is -- how can I extract the string "Who is Roger Rabbit?"
>> from each page using php? You see, I want to store the questions in a
>> database without having to re-type, or cut/paste, each one.
>>
>> Now, I can extract each question by using javascript --
>>
>> document.getElementById("question").innerHTML;
>>
>> -- and stepping through each page, but I don't want to use javascript
>> for this.
>>
>> I have not found/created a working example of this using PHP. I tried
>> using PHP's getElementByID(), but that requires the target file to be
>> valid xml and the string to be contained within an ID and not a
>> class. These pages do not support either requirement.
>>
>> Additionally, I realize that I can load the files and parse out what
>> is between the <p> tags, but I was hoping for a "GetElementByClass"
>> way to do this.
>>
>> So, is there one?
>>
>> Thanks,
>>
>> tedd
>> --
>> -------
>> http://sperling.com http://ancientstones.com http://earthstones.com
>>
>
>
> I don't think there is a getElementsByClass function. HTML5 is proposing
> one, but that will most likely be implemented in Javascript before PHP
> Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
> sure what it is. If you know roughly where in the document the HTML
> snippet is you can use XPath to grab it.
>
> Failing that, what about a regex? It shouldn't be too hard to write a
> regex to match your example above.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

Some JavaScript engines already support a GetElementByClass equivalent
(getElementsByClassName); Opera, for example, does.

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: GetElementByClass?

on 03.04.2010 16:14:50 by Peter Pei

No, JavaScript's getElementById() won't work here, as "question" is a
class, not an ID. But as was mentioned here, you can use
getElementsByClassName() with Opera, and that will work.


Re: GetElementByClass?

on 03.04.2010 16:22:02 by Peter Pei

>>
>
>
> Yes, because Opera is pretty much leading the way with its HTML5
> support. Not even Firefox supports as much as Opera does.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

Opera 10.10 is a very nice version, but 10.50 could be quite slow with
some web pages.

I still remember that once upon a time, Opera was so broken, and it also
showed you that little window for ads ;-)

I'd love to see Opera get a chance, but its market share is not moving even
with the release of Opera 10.

Re: GetElementByClass?

on 03.04.2010 16:23:51 by Peter Pei

>
> Why don't you just use REGEX? I don't know any possibility to easily
> process contents which are not valid XML/XHTML just because there's no
> library to load such stuff (but put me in right there).
>
> I'm not an expert of REGEX, but I think the following would do it:
> /\<p\s*class\=\"question\"\s*\>(.*)\<\/p\>
>
>
> (my first contribute here, I beg your pardon if something went wrong)
>
> Regards,
>
>

A regexp is the best fit here and takes little effort, especially
considering this is only for one-time use.

Re: GetElementByClass?

on 03.04.2010 16:26:57 by Peter Pei

>>
>> Hi
>>
>> You could replace the "class" with "id" and then go on with JavaScript.
>>
>> A possible better way are regular expressions...
>>
>>
>> Greetz
>> Piero
>>
>>
>> --
>> PHP General Mailing List (http://www.php.net/)
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>>

Yes, and jQuery is hosted on Microsoft's CDN; you don't even need to
download it, just plug the link in.

GetElementByClass?

on 03.04.2010 16:29:01 by TedD

Hi gang:

Here's the problem.

I have 184 HTML pages in a directory and each page contains a
question. The question is noted in the HTML DOM like so:


<p class="question">
Who is Roger Rabbit?
</p>


My question is -- how can I extract the string "Who is Roger Rabbit?"
from each page using php? You see, I want to store the questions in a
database without having to re-type, or cut/paste, each one.

Now, I can extract each question by using javascript --

document.getElementById("question").innerHTML;

-- and stepping through each page, but I don't want to use javascript for this.

I have not found/created a working example of this using PHP. I tried
using PHP's getElementByID(), but that requires the target file to be
valid xml and the string to be contained within an ID and not a
class. These pages do not support either requirement.

Additionally, I realize that I can load the files and parse out what
is between the <p> tags, but I was hoping for a "GetElementByClass"
way to do this.

So, is there one?

Thanks,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com


Re: GetElementByClass?

on 03.04.2010 16:48:32 by Peter Pei

>
> I think Tedds main reason not to use Javascript is that he needs it to
> be done on the server rather than the client machine.
>
>
> ps. please use bottom posting on the list.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

But he also mentioned that he wanted to avoid copy and paste... it does
give me the feeling that this is a one time thing, and he just wanted to
extract the questions.


Re: GetElementByClass?

on 03.04.2010 16:51:26 by Peter Pei

On Sat, 03 Apr 2010 09:21:17 -0600, Ashley Sheridan
wrote:

> [...] first browser to have tabs, first to have that
> odd homepage with thumbnails of your 9 most popular sites.

Speaking of Opera's "Speed Dial"... I downloaded Safari yesterday (which
I didn't like the last time I used it); it now has the same kind of page,
but with a sort of 3D look.


Re: GetElementByClass?

on 03.04.2010 16:58:44 by Ashley Sheridan

On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:

> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contain a
> question. The question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?"
> from each page using php? You see, I want to store the questions in a
> database without having to re-type, or cut/paste, each one.
>
> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>
> -- and stepping through each page, but I don't want to use javascript for this.
>
> I have not found/created a working example of this using PHP. I tried
> using PHP's getElementByID(), but that requires the target file to be
> valid xml and the string to be contained within an ID and not a
> class. These pages do not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what
> is between the <p> tags, but I was hoping for a "GetElementByClass"
> way to do this.
>
> So, is there one?
>
> Thanks,
>
> tedd
> --
> -------
> http://sperling.com http://ancientstones.com http://earthstones.com
>


I don't think there is a getElementsByClass function. HTML5 is proposing
one, but that will most likely be implemented in Javascript before PHP
Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
sure what it is. If you know roughly where in the document the HTML
snippet is you can use XPath to grab it.
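
For reference, a getElementsByClass-style lookup can be approximated with XPath. This is only a minimal sketch, assuming PHP's DOM extension is available. The helper name getElementsByClass() and the sample markup are made up for illustration, and loadHTML() is used because it tolerates markup that is not valid XML:

```php
<?php
// Hypothetical helper (not part of PHP): returns the text of every element
// whose class attribute contains $class as a whole word.
function getElementsByClass(DOMDocument $doc, string $class): array
{
    $xpath = new DOMXPath($doc);
    // Pad the attribute with spaces so "question" does not match "questions".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup, assumed for illustration only.
$html = '<html><body><p class="question big">Who is Roger Rabbit?</p>'
      . '<p class="questions">Not this one</p></body></html>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$questions = getElementsByClass($doc, 'question');
print_r($questions);
```

The space-padding trick matters only when an element carries several classes; for markup with exactly class="question", a plain [@class='question'] test would do.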

Failing that, what about a regex? It shouldn't be too hard to write a
regex to match your example above.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: GetElementByClass?

on 03.04.2010 17:03:51 by dispy

On 03.04.2010 16:29, tedd wrote:
> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contain a question.
> The question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?"
> from each page using php? You see, I want to store the questions in a
> database without having to re-type, or cut/paste, each one.
>
> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>
> -- and stepping through each page, but I don't want to use javascript
> for this.
>
> I have not found/created a working example of this using PHP. I tried
> using PHP's getElementByID(), but that requires the target file to be
> valid xml and the string to be contained within an ID and not a class.
> These pages do not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what is
> between the <p> tags, but I was hoping for a "GetElementByClass" way to
> do this.
>
> So, is there one?
>
> Thanks,
>
> tedd

Why don't you just use a regex? I don't know of any way to easily
process content that is not valid XML/XHTML, simply because there's no
library to load such stuff (but correct me if I'm wrong there).

I'm no regex expert, but I think the following would do it:
/\<p\s*class\=\"question\"\s*\>(.*)\<\/p\>/
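
A rough sketch of how a pattern along these lines might be applied in PHP, assuming each question sits on a single line. The sample string is invented for illustration, and the pattern is a slightly adjusted variant rather than the one above: \s+ requires whitespace after the tag name, and (.*?) is non-greedy so a match ends at the nearest closing tag:

```php
<?php
// Sample page content, invented for illustration; each element sits on one line.
$page = '<p class="question">Who is Roger Rabbit?</p>' . "\n"
      . '<p class="answer">A cartoon rabbit.</p>' . "\n"
      . '<p class="question">Who framed him?</p>';

// (.*?) is non-greedy, so each match ends at the nearest </p>.
$pattern = '/<p\s+class="question"\s*>(.*?)<\/p>/';

// $matches[1] collects the text captured by the parentheses.
preg_match_all($pattern, $page, $matches);
$found = $matches[1];
print_r($found);
```

A pattern like this is tied to the exact attribute order and quoting of the sample, which is usually acceptable for a one-off extraction.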


(my first contribution here, I beg your pardon if something went wrong)

Regards,


Re: GetElementByClass?

on 03.04.2010 17:07:08 by vikash.iitb

I use this: http://simplehtmldom.sourceforge.net/
Check it out.


Thanks,
Vikash Kumar
--
http://vika.sh


On Sat, Apr 3, 2010 at 8:28 PM, Ashley Sheridan wrote:

> On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:
>
> > Hi gang:
> >
> > Here's the problem.
> >
> > I have 184 HTML pages in a directory and each page contain a
> > question. The question is noted in the HTML DOM like so:
> >
> > <p class="question">
> > Who is Roger Rabbit?
> > </p>
> >
> > My question is -- how can I extract the string "Who is Roger Rabbit?"
> > from each page using php? You see, I want to store the questions in a
> > database without having to re-type, or cut/paste, each one.
> >
> > Now, I can extract each question by using javascript --
> >
> > document.getElementById("question").innerHTML;
> >
> > -- and stepping through each page, but I don't want to use javascript for
> this.
> >
> > I have not found/created a working example of this using PHP. I tried
> > using PHP's getElementByID(), but that requires the target file to be
> > valid xml and the string to be contained within an ID and not a
> > class. These pages do not support either requirement.
> >
> > Additionally, I realize that I can load the files and parse out what
> > is between the <p> tags, but I was hoping for a "GetElementByClass"
> > way to do this.
> >
> > So, is there one?
> >
> > Thanks,
> >
> > tedd
> > --
> > -------
> > http://sperling.com http://ancientstones.com http://earthstones.com
> >
>
>
> I don't think there is a getElementsByClass function. HTML5 is proposing
> one, but that will most likely be implemented in Javascript before PHP
> Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
> sure what it is. If you know roughly where in the document the HTML
> snippet is you can use XPath to grab it.
>
> Failing that, what about a regex? It shouldn't be too hard to write a
> regex to match your example above.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>


Re: GetElementByClass?

on 03.04.2010 17:07:38 by Ashley Sheridan

On Sat, 2010-04-03 at 08:11 -0600, Peter Pei wrote:

> On Sat, 03 Apr 2010 08:58:44 -0600, Ashley Sheridan
> wrote:
>
> > On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:
> >
> >> Hi gang:
> >>
> >> Here's the problem.
> >>
> >> I have 184 HTML pages in a directory and each page contain a
> >> question. The question is noted in the HTML DOM like so:
> >>
> >> <p class="question">
> >> Who is Roger Rabbit?
> >> </p>
> >>
> >> My question is -- how can I extract the string "Who is Roger Rabbit?"
> >> from each page using php? You see, I want to store the questions in a
> >> database without having to re-type, or cut/paste, each one.
> >>
> >> Now, I can extract each question by using javascript --
> >>
> >> document.getElementById("question").innerHTML;
> >>
> >> -- and stepping through each page, but I don't want to use javascript
> >> for this.
> >>
> >> I have not found/created a working example of this using PHP. I tried
> >> using PHP's getElementByID(), but that requires the target file to be
> >> valid xml and the string to be contained within an ID and not a
> >> class. These pages do not support either requirement.
> >>
> >> Additionally, I realize that I can load the files and parse out what
> >> is between the <p> tags, but I was hoping for a "GetElementByClass"
> >> way to do this.
> >>
> >> So, is there one?
> >>
> >> Thanks,
> >>
> >> tedd
> >> --
> >> -------
> >> http://sperling.com http://ancientstones.com http://earthstones.com
> >>
> >
> >
> > I don't think there is a getElementsByClass function. HTML5 is proposing
> > one, but that will most likely be implemented in Javascript before PHP
> > Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
> > sure what it is. If you know roughly where in the document the HTML
> > snippet is you can use XPath to grab it.
> >
> > Failing that, what about a regex? It shouldn't be too hard to write a
> > regex to match your example above.
> >
> > Thanks,
> > Ash
> > http://www.ashleysheridan.co.uk
> >
> >
>
> Somejavascript engine already support GetElementByClass, for example Opera
> does.
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>


Yes, because Opera is pretty much leading the way with its HTML5
support. Not even Firefox supports as much as Opera does.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: GetElementByClass?

on 03.04.2010 17:09:36 by Adam Richardson

>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?" from
> each page using php? You see, I want to store the questions in a database
> without having to re-type, or cut/paste, each one.
>


> I have not found/created a working example of this using PHP. I tried using
> PHP's getElementByID(), but that requires the target file to be valid xml
> and the string to be contained within an ID and not a class. These pages do
> not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what is
> between the <p> tags, but I was hoping for a "GetElementByClass" way to do
> this.
>
> So, is there one?
>


Perhaps I'd try this:
http://simplehtmldom.sourceforge.net/manual.htm

Adam

--
Nephtali: PHP web framework that functions beautifully
http://nephtaliproject.com


Re: GetElementByClass?

on 03.04.2010 17:14:58 by Peter Pei

>> Somejavascript engine already support GetElementByClass, for example
>> Opera does.
>
> My example shows how, namely:
>
> document.getElementById("question").innerHTML;
>
> will return the value within the class.
>
> Cheers,
>
> tedd
>

In your original post, you said the data you had was:


<p class="question">
Who is Roger Rabbit?
</p>

Does that still stand? Or was there a typo, and class should really be ID?

Re: GetElementByClass?

on 03.04.2010 17:16:13 by Piero Steinger

On 03.04.2010 16:29, tedd wrote:
> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contain a question.
> The question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?"
> from each page using php? You see, I want to store the questions in a
> database without having to re-type, or cut/paste, each one.
>
> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>
> -- and stepping through each page, but I don't want to use javascript
> for this.
>
> I have not found/created a working example of this using PHP. I tried
> using PHP's getElementByID(), but that requires the target file to be
> valid xml and the string to be contained within an ID and not a class.
> These pages do not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what
> is between the <p> tags, but I was hoping for a "GetElementByClass"
> way to do this.
>
> So, is there one?
>
> Thanks,
>
> tedd

Hi

You could replace the "class" with "id" and then go on with JavaScript.

A possibly better way would be regular expressions...


Greetz
Piero



Re: GetElementByClass?

on 03.04.2010 17:17:26 by Ashley Sheridan

On Sat, 2010-04-03 at 17:03 +0200, dispy wrote:

> On 03.04.2010 16:29, tedd wrote:
> > Hi gang:
> >
> > Here's the problem.
> >
> > I have 184 HTML pages in a directory and each page contain a question.
> > The question is noted in the HTML DOM like so:
> >
> > <p class="question">
> > Who is Roger Rabbit?
> > </p>
> >
> > My question is -- how can I extract the string "Who is Roger Rabbit?"
> > from each page using php? You see, I want to store the questions in a
> > database without having to re-type, or cut/paste, each one.
> >
> > Now, I can extract each question by using javascript --
> >
> > document.getElementById("question").innerHTML;
> >
> > -- and stepping through each page, but I don't want to use javascript
> > for this.
> >
> > I have not found/created a working example of this using PHP. I tried
> > using PHP's getElementByID(), but that requires the target file to be
> > valid xml and the string to be contained within an ID and not a class.
> > These pages do not support either requirement.
> >
> > Additionally, I realize that I can load the files and parse out what is
> > between the <p> tags, but I was hoping for a "GetElementByClass" way to
> > do this.
> >
> > So, is there one?
> >
> > Thanks,
> >
> > tedd
>
> Why don't you just use REGEX? I don't know any possibility to easily
> process contents which are not valid XML/XHTML just because there's no
> library to load such stuff (but put me in right there).
>
> I'm not an expert of REGEX, but I think the following would do it:
> /\<p\s*class\=\"question\"\s*\>(.*)\<\/p\>
>
>
> (my first contribute here, I beg your pardon if something went wrong)
>
> Regards,
>
>


The . won't match new line characters, so you'll have to add those in
too.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: GetElementByClass?

on 03.04.2010 17:21:17 by Ashley Sheridan

On Sat, 2010-04-03 at 08:22 -0600, Peter Pei wrote:

> >>
> >
> >
> > Yes, because Opera is pretty much leading the way with its HTML5
> > support. Not even Firefox supports as much as Opera does.
> >
> > Thanks,
> > Ash
> > http://www.ashleysheridan.co.uk
> >
> >
>
> Opera 10.10 is a very nice version, but 10.50 could be quite slow with
> some web pages.
>
> I still remember that once upon a time, Opera was so broken, and it also
> showed you that little window for ads ;-)
>
> I love to see opera get a chance, but its market share is not moving even
> with the release Opera 10.


Opera has led the way in most things over the years. It was the first
browser to have addons, first browser to have tabs, first to have that
odd homepage with thumbnails of your 9 most popular sites.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: GetElementByClass?

on 03.04.2010 17:21:44 by vikash.iitb

If you are open to using JavaScript, then a JS library like jQuery may help in
selecting all elements of a particular class.

$(".className")


Thanks,
Vikash Kumar
--
http://vika.sh


On Sat, Apr 3, 2010 at 8:46 PM, Piero Steinger wrote:

> On 03.04.2010 16:29, tedd wrote:
> > Hi gang:
> >
> > Here's the problem.
> >
> > I have 184 HTML pages in a directory and each page contain a question.
> > The question is noted in the HTML DOM like so:
> >
> > <p class="question">
> > Who is Roger Rabbit?
> > </p>
> >
> > My question is -- how can I extract the string "Who is Roger Rabbit?"
> > from each page using php? You see, I want to store the questions in a
> > database without having to re-type, or cut/paste, each one.
> >
> > Now, I can extract each question by using javascript --
> >
> > document.getElementById("question").innerHTML;
> >
> > -- and stepping through each page, but I don't want to use javascript
> > for this.
> >
> > I have not found/created a working example of this using PHP. I tried
> > using PHP's getElementByID(), but that requires the target file to be
> > valid xml and the string to be contained within an ID and not a class.
> > These pages do not support either requirement.
> >
> > Additionally, I realize that I can load the files and parse out what
> > is between the <p> tags, but I was hoping for a "GetElementByClass"
> > way to do this.
> >
> > So, is there one?
> >
> > Thanks,
> >
> > tedd
>
> Hi
>
> You could replace the "class" with "id" and then go on with JavaScript.
>
> A possible better way are regular expressions...
>
>
> Greetz
> Piero
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>


Re: GetElementByClass?

on 03.04.2010 17:22:34 by Ashley Sheridan

On Sat, 2010-04-03 at 20:51 +0530, vikash.iitb@gmail.com wrote:

> If you are open to use javascript then a js library like jQuery may help in
> selecting all elements from a particular class.
>
> $(".clasName")
>
>
> Thanks,
> Vikash Kumar
> --
> http://vika.sh
>
>
> On Sat, Apr 3, 2010 at 8:46 PM, Piero Steinger wrote:
>
> > On 03.04.2010 16:29, tedd wrote:
> > > Hi gang:
> > >
> > > Here's the problem.
> > >
> > > I have 184 HTML pages in a directory and each page contain a question.
> > > The question is noted in the HTML DOM like so:
> > >
> > > <p class="question">
> > > Who is Roger Rabbit?
> > > </p>
> > >
> > > My question is -- how can I extract the string "Who is Roger Rabbit?"
> > > from each page using php? You see, I want to store the questions in a
> > > database without having to re-type, or cut/paste, each one.
> > >
> > > Now, I can extract each question by using javascript --
> > >
> > > document.getElementById("question").innerHTML;
> > >
> > > -- and stepping through each page, but I don't want to use javascript
> > > for this.
> > >
> > > I have not found/created a working example of this using PHP. I tried
> > > using PHP's getElementByID(), but that requires the target file to be
> > > valid xml and the string to be contained within an ID and not a class.
> > > These pages do not support either requirement.
> > >
> > > Additionally, I realize that I can load the files and parse out what
> > > is between the <p> tags, but I was hoping for a "GetElementByClass"
> > > way to do this.
> > >
> > > So, is there one?
> > >
> > > Thanks,
> > >
> > > tedd
> >
> > Hi
> >
> > You could replace the "class" with "id" and then go on with JavaScript.
> >
> > A possible better way are regular expressions...
> >
> >
> > Greetz
> > Piero
> >
> >
> > --
> > PHP General Mailing List (http://www.php.net/)
> > To unsubscribe, visit: http://www.php.net/unsub.php
> >
> >

I think tedd's main reason not to use JavaScript is that he needs it to
be done on the server rather than the client machine.


P.S. Please use bottom posting on the list.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: GetElementByClass?

on 03.04.2010 17:33:34 by Peter Pei

> Sort of.
>
> Like I said, the following will work:
>
> document.getElementById("question").innerHTML;
>
> While you are using a getElementById, which returns an ID, but adding
> .innerHTML will return the class value.
>
> Try it.
>
> Cheers,
>
> tedd

No, this will not work. If it appeared to work, please re-check your data
and make sure you didn't miss anything...

Run the following in any browser, and see whether you get a pop up, then
change class= to ID=, and run it again.


<p class="question">
Who is Roger Rabbit?
</p>




innerHTML works off an object, and it does nothing if you cannot even
locate the object in the first place.
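
The same point can be checked on the PHP side. A small sketch, assuming the DOM extension and markup loaded with loadHTML(); the helper name textOfIdQuestion() is hypothetical and the two sample strings are made up for this test:

```php
<?php
// Hypothetical helper for this sketch: returns the text of the element with
// id="question", or null when no element carries that id.
function textOfIdQuestion(string $html): ?string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $node = $doc->getElementById('question');
    // getElementById() returns null when nothing is found, so there is
    // no object left to read any content from.
    return $node === null ? null : $node->textContent;
}

// Same paragraph twice: once marked with class=, once with id=.
$byClass = textOfIdQuestion('<html><body><p class="question">Who is Roger Rabbit?</p></body></html>');
$byId    = textOfIdQuestion('<html><body><p id="question">Who is Roger Rabbit?</p></body></html>');

var_dump($byClass);
var_dump($byId);
```

Only the id= variant should yield the question text; the class= variant gives null because the lookup itself fails.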


Re: GetElementByClass?

on 03.04.2010 17:35:09 by Piero Steinger

On 03.04.2010 17:17, Ashley Sheridan wrote:
> On Sat, 2010-04-03 at 17:03 +0200, dispy wrote:
>
>
>> On 03.04.2010 16:29, tedd wrote:
>>
>>> Hi gang:
>>>
>>> Here's the problem.
>>>
>>> I have 184 HTML pages in a directory and each page contain a question.
>>> The question is noted in the HTML DOM like so:
>>>
>>> <p class="question">
>>> Who is Roger Rabbit?
>>> </p>
>>>
>>> My question is -- how can I extract the string "Who is Roger Rabbit?"
>>> from each page using php? You see, I want to store the questions in a
>>> database without having to re-type, or cut/paste, each one.
>>>
>>> Now, I can extract each question by using javascript --
>>>
>>> document.getElementById("question").innerHTML;
>>>
>>> -- and stepping through each page, but I don't want to use javascript
>>> for this.
>>>
>>> I have not found/created a working example of this using PHP. I tried
>>> using PHP's getElementByID(), but that requires the target file to be
>>> valid xml and the string to be contained within an ID and not a class.
>>> These pages do not support either requirement.
>>>
>>> Additionally, I realize that I can load the files and parse out what is
>>> between the <p> tags, but I was hoping for a "GetElementByClass" way to
>>> do this.
>>>
>>> So, is there one?
>>>
>>> Thanks,
>>>
>>> tedd
>>>
>> Why don't you just use REGEX? I don't know any possibility to easily
>> process contents which are not valid XML/XHTML just because there's no
>> library to load such stuff (but put me in right there).
>>
>> I'm not an expert of REGEX, but I think the following would do it:
>> /\<p\s*class\=\"question\"\s*\>(.*)\<\/p\>
>>
>>
>> (my first contribute here, I beg your pardon if something went wrong)
>>
>> Regards,
>>
>>
>>
>
> The . won't match new line characters, so you'll have to add those in
> too.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>

It matches new lines with the modifier s.
http://ch2.php.net/manual/en/reference.pcre.pattern.modifiers.php
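
To illustrate the difference, here is a small sketch that runs one pattern with and without the s modifier. The sample markup follows the layout of the example in the original post; the pattern is an assumed variant (with \s+ and a non-greedy group), not the exact one quoted above:

```php
<?php
// Sample markup laid out like the example in the original post.
$page = "<p class=\"question\">\nWho is Roger Rabbit?\n</p>";

$pattern = '/<p\s+class="question"\s*>(.*?)<\/p>/';

// Without the s modifier the dot cannot cross a newline, so nothing matches.
$withoutS = preg_match($pattern, $page, $m1);

// With the s modifier (PCRE_DOTALL) the dot also matches newlines.
$withS = preg_match($pattern . 's', $page, $m2);
$question = trim($m2[1]);

var_dump($withoutS, $withS, $question);
```

The trim() call removes the line breaks that surround the question inside the paragraph.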

Greetz
Piero



Re: GetElementByClass?

on 03.04.2010 17:45:54 by Peter Pei

>
> It might have worked in Internet Explorer, as for a while that browser
> got confused over the class and id if two different elements on a page
> had the same class and id values.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

IE and Opera were the two I tested with.

Re: GetElementByClass?

on 03.04.2010 17:47:38 by Peter Pei

> > // here is where you load a single file or change to iterate over a
> // directory of files
> $oDomDoc = DOMDocument::loadHTMLFile('./tedd.html');
>
> // here is where you search for the question sections of each file
> $oDomXpath = new DOMXPath($oDomDoc);
> $oNodeList = $oDomXpath->query("//p[@class='question']");
>
> // here is where you extract the question sections of each file
> foreach($oNodeList as $oDomNode)
> var_dump($oDomNode->nodeValue);
>
>
> should be trivial to expand that to work w/ multiple files.
>
>
>
>> Now, I can extract each question by using javascript --
>>
>> document.getElementById("question").innerHTML;
>>
>
> tedd, are you slipping? i thought you were searching by the class
> attribute, lol.
>
> -nathan

Beautiful!


Re: GetElementByClass?

on 03.04.2010 18:01:57 by TedD

At 8:11 AM -0600 4/3/10, Peter Pei wrote:
>On Sat, 03 Apr 2010 08:58:44 -0600, Ashley Sheridan
> wrote:
>
>>On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:
>>>-snip-
>>>
>>>Now, I can extract each question by using javascript --
>>>
>>>document.getElementById("question").innerHTML;
>>>
>>
>
>Somejavascript engine already support GetElementByClass, for example
>Opera does.

My example shows how, namely:

document.getElementById("question").innerHTML;

will return the value within the class.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com


Re: GetElementByClass?

on 03.04.2010 18:03:56 by Nathan Nobbe

On Sat, Apr 3, 2010 at 8:29 AM, tedd wrote:

> Hi gang:
>
> Here's the problem.
>
> I have 184 HTML pages in a directory and each page contain a question. The
> question is noted in the HTML DOM like so:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?" from
> each page using php? You see, I want to store the questions in a database
> without having to re-type, or cut/paste, each one.
>

if the files are html on the server then it should be easy to loop over each
one, loading the markup into memory and searching for what you want. id go
for xpath myself; i tend to always start there and fall back to regex since
xpath & xsl are so much cleaner for dealing w/ markup.

anyway heres the demo
--------------
tedd.html
--------------


sadfasdf

hello



Who is Roger Rabbit?


more stuff



Who is Roger Rabbit?




---------------------
transform.php
---------------------
<?php
// here is where you load a single file or change to iterate over a
// directory of files
// (loadHTMLFile() is an instance method; calling it statically throws an
// Error as of PHP 8)
$oDomDoc = new DOMDocument();
$oDomDoc->loadHTMLFile('./tedd.html');

// here is where you search for the question sections of each file
$oDomXpath = new DOMXPath($oDomDoc);
$oNodeList = $oDomXpath->query("//p[@class='question']");

// here is where you extract the question sections of each file
foreach ($oNodeList as $oDomNode) {
    var_dump($oDomNode->nodeValue);
}


should be trivial to expand that to work w/ multiple files.
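
One way the multi-file version might look, as a rough sketch only. The function names extractQuestions() and extractQuestionsFromDir() are made up for illustration, and it assumes every page sits in one directory with an .html extension and marks the question with exactly class="question".

```php
<?php
// Hypothetical helper: returns the text of every <p class="question">
// found in one HTML string.
function extractQuestions(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings quietly; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $questions = [];
    foreach ($xpath->query("//p[@class='question']") as $node) {
        $questions[] = trim($node->nodeValue);
    }
    return $questions;
}

// Hypothetical driver: maps each *.html file name in $dir to its questions.
function extractQuestionsFromDir(string $dir): array
{
    $result = [];
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $path) {
        $html = file_get_contents($path);
        if ($html === false || $html === '') {
            continue; // unreadable or empty file, nothing to parse
        }
        $result[basename($path)] = extractQuestions($html);
    }
    return $result;
}
```

Calling extractQuestionsFromDir('./pages') would then give an array keyed by file name, ready to loop over for the database inserts.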



> Now, I can extract each question by using javascript --
>
> document.getElementById("question").innerHTML;
>

tedd, are you slipping? i thought you were searching by the class
attribute, lol.

-nathan

Re: GetElementByClass?

am 03.04.2010 18:11:35 von TedD

At 3:58 PM +0100 4/3/10, Ashley Sheridan wrote:
>I don't think there is a getElementsByClass function. HTML5 is
>proposing one, but that will most likely be implemented in
>Javascript before PHP Dom. There is a way to tidy up the HTML to
>make it XHTML, but I'm not sure what it is. If you know roughly
>where in the document the HTML snippet is you can use XPath to grab
>it.
>
>Failing that, what about a regex? It shouldn't be too hard to write
>a regex to match your example above.
>
>Thanks,
>Ash

Ash:

I don't have a problem solving the problem the long way, which is to:

1. Load the file;
2. Parse between the markers;
3. Strip tags and replace extra white space.
4. Save to the db.

In fact, here's the code I used to solve the problem:

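//--------

// 1. Load the file.
$filesize = filesize($filename);
$file = fopen($filename, "r");
$text = fread($file, $filesize);
fclose($file);

// 2. Parse between the markers (assumed here to be the question
//    paragraph's opening and closing tags).
$marker1 = '<p class="question">';
$marker2 = '</p>';

// Skip past the 20-character opening marker, then find the first
// closing marker that follows it.
$first = strpos($text, $marker1) + strlen($marker1);
$last = strpos($text, $marker2, $first);
$len = $last - $first;

$text = substr($text, $first, $len);

// 3. Strip tags and replace extra white space.
$text = strip_tags($text);

$space = array(' ', "\t", "\n", "\r", "\x0B", "\x0C");

$words = array();
$all_words = explode(' ', $text);
foreach ($all_words as $line)
{
    $line = str_replace($space, '', $line);
    if (strlen($line) > 0)
    {
        $words[] = $line;
    }
}

$text = implode(' ', $words);
$text = htmlspecialchars($text);

//---------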

I was just exploring PHP's getElement thing and wasn't having much
luck with it.
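
For comparison, the regex route suggested earlier in the thread could be sketched roughly as below. This is only an illustration: extractQuestionWithRegex() is a made-up name, and the pattern assumes the opening tag is written as <p class="question" ...> with class as the first attribute.

```php
<?php
// Hypothetical helper: returns the text of the first <p class="question">
// in $html, or null when no such paragraph is found.
function extractQuestionWithRegex(string $html): ?string
{
    // Lazy match up to the first closing </p>; "s" lets "." span newlines,
    // "i" ignores tag case.
    $pattern = '#<p\s+class="question"[^>]*>(.*?)</p>#is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }

    // Drop nested tags, then collapse runs of whitespace to single spaces.
    $text = preg_replace('/\s+/', ' ', strip_tags($m[1]));
    return trim($text);
}
```

It covers the same load, parse, and strip steps in fewer lines, but it is just as sensitive to how the markup happens to be written as the strpos() version.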

Cheers,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:18:44 von TedD

At 8:14 AM -0600 4/3/10, Peter Pei wrote:
>No javascript's getElementByID() won't work here. As "question" is a
>class, not an ID. But like what was mentioned here, you can use
>getElementByClass() with Opera, and that will work.

Sort of.

Like I said, the following will work:

document.getElementById("question").innerHTML;

You are using getElementById, which returns an ID, but adding
.innerHTML will return the class value.

Try it.

Cheers,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:20:11 von TedD

At 5:16 PM +0200 4/3/10, Piero Steinger wrote:
>
>Hi
>
>You could replace the "class" with "id" and then go on with JavaScript.
>
>A possible better way are regular expressions...
>
>
>Greetz
>Piero

I can go with javascript "as-is" (what I showed) and don't have to
change any html.

Cheers,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:23:47 von TedD

At 4:22 PM +0100 4/3/10, Ashley Sheridan wrote:
>>
>I think Tedds main reason not to use Javascript is that he needs it
>to be done on the server rather than the client machine.
>
>
>ps. please use bottom posting on the list.
>
>Thanks,
>Ash

Yeah, one reason was to get this done in one operation and not step
through the questions like I would have to do using javascript to
approve (trigger) each step.

But my main reason for posting was to see if PHP had DOM operators
like javascript.
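
PHP's DOM extension does offer getElementById() and getElementsByTagName(), but, as far as this thread establishes, nothing for classes. A class lookup can be approximated with XPath, along the lines of the sketch below. The name getElementsByClass() is hypothetical (it is not a built-in), and the sketch assumes the class name contains no quotes or spaces.

```php
<?php
// Hypothetical helper (not part of PHP's DOM extension): emulates a
// getElementsByClassName-style lookup with an XPath query.
function getElementsByClass(DOMDocument $doc, string $class): array
{
    $xpath = new DOMXPath($doc);

    // Pad the normalized class attribute with spaces so that "question"
    // matches class="big question" but not class="questionable".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    // DOMNodeList is traversable, so it converts to a plain array.
    return iterator_to_array($xpath->query($query));
}
```

Unlike a plain @class='question' comparison, this also finds elements that carry several classes at once.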

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:29:34 von Nathan Nobbe

On Sat, Apr 3, 2010 at 10:18 AM, tedd wrote:

> At 8:14 AM -0600 4/3/10, Peter Pei wrote:
>
>> No javascript's getElementByID() won't work here. As "question" is a
>> class, not an ID. But like what was mentioned here, you can use
>> getElementByClass() with Opera, and that will work.
>>
>
> Sort of.
>
> Like I said, the following will work:
>
> document.getElementById("question").innerHTML;
>
> While you are using a getElementById, which returns an ID, but adding
> .innerHTML will return the class value.
>
> Try it.
>

i did, and just like i thought, it doesnt work. why, .innerHTML is executed
*after* document.getElementById("question"), meaning if that returns
nothing; which in this case it does, then there is nothing for innerHTML to
operate on.

i in fact had to wrap the call in a try/catch just to keep it from blowing
up in firefox.






Who is Roger Rabbit?






heres what firefox says on my box:

TypeError: document.getElementById("question") is null

and btw, i think after seeing your solution we can see why i like the xpath
approach so much more :P

-nathan

Re: GetElementByClass?

am 03.04.2010 18:29:49 von TedD

At 9:14 AM -0600 4/3/10, Peter Pei wrote:
>>>Some javascript engines already support GetElementByClass, for
>>>example Opera does.
>>
>>My example shows how, namely:
>>
>>document.getElementById("question").innerHTML;
>>
>>will return the value within the class.
>>
>>Cheers,
>>
>>tedd
>>
>
>In your original post, you said the data you had was:
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
>Does that still stand? or there was a typo, and class should really be ID?

You're right -- I re-looked at the code to solve a similar problem
and it is indeed ID and not class.

I was using the wrong example.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:30:52 von TedD

At 12:18 PM -0400 4/3/10, tedd wrote:
>At 8:14 AM -0600 4/3/10, Peter Pei wrote:
>>No javascript's getElementByID() won't work here. As "question" is
>>a class, not an ID. But like what was mentioned here, you can use
>>getElementByClass() with Opera, and that will work.
>
>Sort of.
>
>Like I said, the following will work:
>
>document.getElementById("question").innerHTML;
>
>While you are using a getElementById, which returns an ID, but
>adding .innerHTML will return the class value.
>
>Try it.
>
>Cheers,
>
>tedd
>--


Nope, I was wrong here.

I was looking at the wrong example.

Cheers,

tedd



--
-------
http://sperling.com http://ancientstones.com http://earthstones.com

Re: GetElementByClass?

am 03.04.2010 18:32:47 von Ashley Sheridan

On Sat, 2010-04-03 at 09:33 -0600, Peter Pei wrote:

> > Sort of.
> >
> > Like I said, the following will work:
> >
> > document.getElementById("question").innerHTML;
> >
> > While you are using a getElementById, which returns an ID, but adding
> > .innerHTML will return the class value.
> >
> > Try it.
> >
> > Cheers,
> >
> > tedd
>
> No, this will not work, if it appeared working, please re-check your data,
> and make sure you didn't miss anything...
>
> Run the following in any browser, and see whether you get a pop up, then
> change class= to ID=, and run it again.
>
>


> Who is Roger Rabbit?
>


>
>
> innerHTML works off an object, and it does nothing if you cannot even
> locate the object in the first place.
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>


It might have worked in Internet Explorer, as for a while that browser
got confused over the class and id if two different elements on a page
had the same class and id values.

Thanks,
Ash
http://www.ashleysheridan.co.uk



Re: GetElementByClass?

am 03.04.2010 18:37:51 von TedD

At 10:03 AM -0600 4/3/10, Nathan Nobbe wrote:
>-snip- code

Your code worked like a charm.

Thanks.

>Now, I can extract each question by using javascript --
>
>document.getElementById("question").innerHTML;
>
>
>tedd, are you slipping? i thought you were searching by the class
>attribute, lol.

Yeah, I was looking at the wrong example I created.

That example used ID's and I thought it was operating on class.

Sorry,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
