DOMElement: td vs. th

on 11.03.2010 21:34:31 by Andy Theuninck

I'm trying to parse a string containing an HTML table using the
built-in DOM classes and running into an odd problem.

Here's what I'm doing:
$dom = new DOMDocument();
$dom->loadHTML($str);
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows as $row){
    foreach($row->childNodes as $node){
        // stuff
    }
}

This gives me the row elements in order and access to their contents.
The weird part is that $node always appears to be a td tag, even when
it's a th tag in the original string: DOMElement::tagName is always "td",
as are DOMNode::nodeName and DOMNode::localName. The th tags definitely
aren't being omitted; I still get nodes with their contents, just with
the wrong tag name.

Is there any way to override this behavior so that I can distinguish
between td tags and th tags?
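A runnable version of the traversal above, as a sketch with a made-up sample table. It adds one guard the original loop needs: `childNodes` can also contain `DOMText` nodes (such as whitespace between cells), which have no `tagName`. On a current PHP/libxml2 build the header cells should report `th`; the thread reports `td` on the poster's 2010 setup, so the tag names are worth checking on your own installation.

```php
<?php
// Sample input is made up for illustration.
$str = "<table>\n<tr>\n  <th>Name</th>\n  <th>Qty</th>\n</tr>\n"
     . "<tr><td>Apples</td><td>3</td></tr>\n</table>";

$dom = new DOMDocument();
$dom->loadHTML($str);
$tables = $dom->getElementsByTagName('table');

$rows = [];
foreach ($tables->item(0)->getElementsByTagName('tr') as $row) {
    $cells = [];
    foreach ($row->childNodes as $node) {
        // childNodes may also yield DOMText nodes (e.g. whitespace between
        // cells), which have no tagName, so only element nodes are kept.
        if ($node instanceof DOMElement) {
            $cells[] = $node->tagName . ':' . trim($node->textContent);
        }
    }
    $rows[] = $cells;
}

print_r($rows);
```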

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: DOMElement: td vs. th

on 11.03.2010 21:45:56 by Rene Veerman

Hmm, lame bug... but you could add a class name to the <th>s and check
for that?


Re: DOMElement: td vs. th

on 11.03.2010 21:52:37 by Andy Theuninck

I could, but that would kind of defeat the point of the project.
I'm trying to capture a bunch of existing HTML reports via output
buffering and transform the tables into proper XLS; tweaking every
single report is exactly what I'm trying to avoid.


Re: DOMElement: td vs. th

on 11.03.2010 21:59:49 by Rene Veerman

Re: DOMElement: td vs. th

on 11.03.2010 22:01:37 by Rene Veerman

So in other words: it's the library you fix with wrapper functions,
not the reports (which stay outside the scope of using the library).

On Thu, Mar 11, 2010 at 9:59 PM, Rene Veerman wrote:
> function readyForDOM_report($originalReportAsText) {
>   return str_replace (' tAsText);
> }
>
> $dom = new DOMDocument();
> $dom->loadHTML(readyForDOM_report($str));
> $tables = $dom->getElementsByTagName("table");
> $rows = $tables->item(0)->getElementsByTagName('tr');
> foreach($rows as $row){
>   foreach($row->childNodes as $node){
>     // check $node for having a classname 'transportTH'.
>   }
> }
>
> The only problem I foresee is <th>s in your reports already having a
> class="something" set, which could mess it up. You'd need to check
> that. But in that case you can always pump the original $str to the
> DOM as well, and use the matching $k from foreach ($arr as $k=>$v) to
> get to the corresponding node and read the original class name.

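Rene's suggestion can be sketched end to end. The marker class `transportTH` comes from his message, but the exact `str_replace()` arguments were lost from the archive, so the pattern below is an assumption; it uses `preg_replace()` with a lookahead because a plain `'<th'` search would also hit `<thead>`. The helper `cellsInOrder()` is hypothetical. The second parse of the untouched string illustrates his fallback for cells that already carry a class: libxml2 discards a repeated attribute, so the original class is read from the same position in the unmodified document.

```php
<?php
// Hypothetical reconstruction: 'transportTH' is from the thread, but the
// original replacement string was lost from the archive.
function readyForDOM_report(string $originalReportAsText): string
{
    // Add the marker to every <th>; the lookahead leaves <thead> untouched.
    return preg_replace('/<th(?=[\s>])/i', '<th class="transportTH"', $originalReportAsText);
}

// Hypothetical helper: every td/th cell of $html, in document order.
function cellsInOrder(string $html): array
{
    libxml_use_internal_errors(true); // a doubled class attribute makes libxml2 report an error
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    return iterator_to_array($xpath->query('//tr/*[self::td or self::th]'), false);
}

$original = '<table><tr><th class="money">Total</th><td>42</td></tr></table>';
$marked = cellsInOrder(readyForDOM_report($original)); // th cells carry the marker
$plain = cellsInOrder($original);                       // th cells keep their own class

$cells = [];
foreach ($marked as $k => $cell) {
    $cells[] = [
        'header' => $cell->getAttribute('class') === 'transportTH',
        // The same position in the unmodified parse gives back the original class.
        'class' => $plain[$k]->getAttribute('class'),
        'text' => trim($cell->textContent),
    ];
}
```

Pairing by position only works because the rewrite adds attributes and never changes the element structure.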

Re: DOMElement: td vs. th

on 11.03.2010 22:21:13 by Andy Theuninck

Gotcha, I wasn't thinking straight. It turns out the marker doesn't
really have to be a legal HTML attribute anyway, so I can just do:
str_replace('

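Andy's final `str_replace()` call was cut off in the archive, so the attribute name `data-was-th` and the helper `markHeaderCells()` below are made-up stand-ins. As a sketch of the approach he describes: because the marker never has to be valid HTML, a dedicated attribute avoids the class collision Rene pointed out, and a single parse is enough.

```php
<?php
// Hypothetical marker attribute; the name used in the thread is unknown.
const HEADER_MARKER = 'data-was-th';

// Tag every <th> with the marker; the lookahead leaves <thead> alone.
function markHeaderCells(string $html): string
{
    return preg_replace('/<th(?=[\s>])/i', '<th ' . HEADER_MARKER . '="1"', $html);
}

$html = '<table><thead><tr><th class="x">Item</th></tr></thead>'
      . '<tbody><tr><td>Pen</td></tr></tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML(markHeaderCells($html));

$isHeader = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    foreach ($row->childNodes as $node) {
        if ($node instanceof DOMElement) {
            // Any existing class attribute is left as-is, so nothing collides.
            $isHeader[] = $node->hasAttribute(HEADER_MARKER);
        }
    }
}
```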