Re: Parse a html file as a XML file
on 19.01.2008 17:56:57 by NoSpamMgbworld
Try , which is a standard practice. I imagine some parsers will
still puke on this methodology, but it should solve the major issue.
Can you solve this without doing anything? Probably not. That is the nature of
freeform sections such as script blocks: XML parsers do not treat them the way
HTML parsers do, because XML's rules are stricter.
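
To make the strictness point concrete, here is a minimal sketch in Python (not the .NET code from the original question). The sample page is hypothetical, and the wrapper shown is a CDATA section, which is an assumption: the archived text does not preserve the markup that was actually used. A bare "<" inside the for loop is enough for an XML parser to reject the whole file, while the wrapped version parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: to an XML parser, the "<" in the for loop starts markup.
RAW = (
    "<html><head><title>Demo</title>"
    "<script>for (var i = 0; i < 3; i++) { total += i; }</script>"
    "</head><body><p>Hello</p></body></html>"
)

# Same page with the script body wrapped in a CDATA section (assumed wrapper).
WRAPPED = RAW.replace("<script>", "<script><![CDATA[").replace(
    "</script>", "]]></script>"
)


def try_parse(text):
    """Return the script text, or None if the XML parser rejects the input."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return root.findtext("head/script")


raw_result = try_parse(RAW)          # strict XML parsing fails on the bare "<"
wrapped_result = try_parse(WRAPPED)  # CDATA content is read as plain text

print(raw_result)
print(wrapped_result)
```

If the same file is also served to browsers, the CDATA markers are commonly hidden behind JavaScript comments so that the script engine ignores them.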
--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA
*************************************************
| Think outside the box!
|
*************************************************
"Stan SR" wrote in message
news:eQnv7YnWIHA.2000@TK2MSFTNGP05.phx.gbl...
> Hi,
>
> I need to read an HTML file and parse it as an XML file.
>
> All my HTML files have this structure.
>
>
>
>
>
>
>
>
>
>
> My code has to read some sections (title, script, body).
> Everything works when the script (JavaScript code) section has little
> or no code, but sometimes it fails when it contains characters
> like ; (especially in a "for" statement).
> To make that work, I had to "decorate" the script section with
> and it looks like
>
>
>
> Is there a way to parse the file without using the tag?
>
> Stan
>
>
RE: Parse a html file as a XML file
on 19.01.2008 18:56:00 by pbromberg
You could try Simon Mourier's "HtmlAgilityPack", which can be found on
codeplex.com.
It provides an HtmlDocument class that parses the page's HTML into an
XPath-navigable document object that works "just like" XmlDocument.
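
The idea behind this suggestion is to use a parser that tolerates real-world HTML instead of forcing the file through an XML parser. The sketch below is not HtmlAgilityPack: it uses Python's standard-library html.parser only to illustrate the same approach, and the SectionCollector class and the sample page are hypothetical. It reads the title, script and body text without any wrapper around the script.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect the text found inside <title>, <script> and <body>."""

    WANTED = ("title", "script", "body")

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.sections = {name: "" for name in self.WANTED}

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Tolerate unbalanced markup: unwind to the most recent matching tag.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Append the text to every wanted section that is currently open.
        for name in self.WANTED:
            if name in self.open_tags:
                self.sections[name] += data


# Hypothetical page with an unescaped "<" inside the script.
PAGE = (
    "<html><head><title>Demo</title>"
    "<script>for (var i = 0; i < 3; i++) { total += i; }</script>"
    "</head><body><p>Hello</p></body></html>"
)

collector = SectionCollector()
collector.feed(PAGE)
collector.close()
sections = collector.sections
print(sections)
```

An HTML parser treats the contents of a script element as raw text, which is why the "<" in the loop condition causes no trouble here.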
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com