Re: Regex to get the <html></html>
on 02.08.2007 19:39:12 by FFMG
Rik;84869 Wrote:
> On Thu, 02 Aug 2007 17:48:24 +0200, FFMG wrote:
> > I want to get the code, and a 'simple?' solution seems to be...
> >
> > preg_match_all("/<[html]+[^>]*>\s*(.*\s*)<\/html>\s*/i", $html,
> > $matches, PREG_SET_ORDER);
>
> Um, nope. You start on an undefined tag (lose the square brackets around
> '[html]'), and you're matching the html tag, not the head tag.
>
Of course, thanks. Must have been a typo.
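Rik's point about the brackets is easy to check: `[html]+` is a character class that matches one or more of the letters h, t, m and l, so `<[html]+[^>]*>` accepts `<table>`, `<link>` and so on. Separately, the original `(.*\s*)` group has no `s` modifier, so `.` never crosses a newline. Below is a minimal sketch of a head-only pattern. The function name `extract_head` is hypothetical, and the pattern assumes simple, valid HTML with no `<head>` text hidden inside quotes or comments.

```php
<?php
// Hypothetical helper: return the inner markup of the first <head> element,
// or null if none is found. Regex-based, so it assumes simple, valid HTML.
function extract_head(string $html): ?string
{
    // \b stops "<header>" from matching; (.*?) is lazy so it stops at the
    // first </head>; the s modifier lets . match newlines; i ignores case.
    if (preg_match('/<head\b[^>]*>(.*?)<\/head>/is', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// The original character class matches far more than <html>:
$bad = '/<[html]+[^>]*>/i';
var_dump(preg_match($bad, '<table>')); // int(1): "t" is in [html]
var_dump(preg_match($bad, '<body>'));  // int(0): "b" is not

$doc = "<html>\n<HEAD><title>Hi</title></HEAD>\n<body></body></html>";
echo extract_head($doc), "\n";         // prints <title>Hi</title>
```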
Rik;84869 Wrote:
>
>
> DOM functions?
>
> > How can I change my regex to ignore head tags inside double or single
> > quotes?
>
> Could be done by setting a greedy match starting on a quote until the
> end quote. Then again, if you're concerned with invalid attributes, you'd
> have to allow for the possibility that the quotes are erroneous too, i.e.
> someone forgot to open or close them.
>
> I've taken a stab at it with regexes in the past, which works quite well
> as long as you can be sure it's strictly valid HTML. If it isn't, or you're
> using outside sources where this isn't known, don't use regular
> expressions for something a parser ought to be doing.
> --
> Rik Wasmus
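Rik's suggestion of matching from a quote to its end quote can be sketched with an alternation: at each position a quoted string is tried first and consumed whole, so a `<head>` inside it is never matched on its own. This is a rough illustration, assuming every quote is properly paired; as Rik says, a stray or unbalanced quote (even an apostrophe in body text) throws it off. The function name `find_head_outside_quotes` is hypothetical.

```php
<?php
// Hypothetical helper: byte offset of the first <head> tag that is not inside
// a single- or double-quoted string, or -1 if there is none.
function find_head_outside_quotes(string $html): int
{
    // Quoted runs are matched (and so skipped) before <head can match.
    // [^"]* is greedy but cannot run past the closing quote.
    $pattern = '/"[^"]*"|\'[^\']*\'|(<head\b[^>]*>)/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    foreach ($matches as $m) {
        // Group 1 is only set (offset != -1) when the <head> branch matched.
        if (isset($m[1]) && $m[1][1] !== -1) {
            return $m[1][1];
        }
    }
    return -1;
}

$html = '<script>document.write("<head>");</script><head><title>x</title></head>';
echo find_head_outside_quotes($html), "\n"; // prints 42 (the real <head>)
```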
Thanks. Are you suggesting that I walk the text, first looking for the
opening tag, then for the closing tag that is not within a quote?
I guess a simple function could do that.
Do you know of such a function, or would I need to write one :)?
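A walker along the lines described above might look like the sketch below: scan character by character, remember whether we are inside a `'...'` or `"..."` run, and only look for `<head` and `</head>` outside quotes. The name `walk_head` is hypothetical, and it shares the regex version's weakness: it assumes quotes are always paired, so an apostrophe in ordinary text confuses it.

```php
<?php
// Hypothetical walker: returns the text between the first <head ...> and the
// next </head>, skipping anything inside '...' or "..." runs.
function walk_head(string $html): ?string
{
    $len = strlen($html);
    $quote = null;  // current quote character, or null when outside quotes
    $start = null;  // offset just after the opening <head ...> tag
    for ($i = 0; $i < $len; $i++) {
        $c = $html[$i];
        if ($quote !== null) {
            if ($c === $quote) {
                $quote = null;       // closing quote found
            }
            continue;
        }
        if ($c === '"' || $c === "'") {
            $quote = $c;             // entering a quoted run
        } elseif ($c === '<') {
            // \G anchors the match at offset $i.
            if ($start === null && preg_match('/\G<head\b[^>]*>/i', $html, $m, 0, $i)) {
                $start = $i + strlen($m[0]);
                $i = $start - 1;     // loop increment resumes after the tag
            } elseif ($start !== null && strncasecmp(substr($html, $i, 7), '</head>', 7) === 0) {
                return substr($html, $start, $i - $start);
            }
        }
    }
    return null;                     // no complete head element found
}

$html = "<html><head><script>var s = '</head>';</script><title>T</title></head></html>";
echo walk_head($html), "\n";
// prints <script>var s = '</head>';</script><title>T</title>
```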
FFMG
------------------------------------------------------------ ------------
FFMG's Profile: http://www.httppoint.com/member.php?userid=580
View this thread: http://www.httppoint.com/showthread.php?t=19012
Message Posted via the webmaster forum http://www.httppoint.com, (Ad revenue sharing).
Re: Regex to get the <html></html>
on 02.08.2007 19:42:56 by gosha bine
Rik wrote:
> DOM functions?
Yes, or a SAX parser like PEAR HTML_SAX. DOM is still too picky about
invalid HTML.
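For the DOM route, a minimal sketch using PHP's `DOMDocument` might look like this. It assumes the `dom` extension is available (it is in most builds). `libxml_use_internal_errors(true)` keeps the parser's complaints about invalid markup from being printed as warnings; for badly broken input the tree libxml builds may still not be what you expect. The function name `dom_head` is hypothetical.

```php
<?php
// Hypothetical helper: let libxml's HTML parser locate <head> and return
// the serialised markup of its children, or null if there is no head.
function dom_head(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $heads = $doc->getElementsByTagName('head');
    if ($heads->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serialises just that node.
    $out = '';
    foreach ($heads->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<html><head><title>Hi</title></head><body><p title="<head>">x</p></body></html>';
echo dom_head($html), "\n"; // prints <title>Hi</title>
```

The `<head>` inside the attribute value is never mistaken for a tag, because the parser already knows it is reading an attribute.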
> I've taken a stab at it with regexes in the past, which works quite well
> as long as you can be sure it's strictly valid HTML. If it isn't, or
> you're using outside sources where this isn't known, don't use regular
> expressions for something a parser ought to be doing.
Yes, regexps suck as a parser; however, in most cases you need just a
lexer, and that's a job regexps do quite well.
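To illustrate the lexer idea, here is a sketch of a single regex that splits markup into comment, tag and text tokens. Quoted attribute values are consumed inside the tag alternative, so a `>` or `<head>` within quotes neither ends nor starts a tag. The name `lex_html` and the token layout are assumptions for illustration, not the API of any existing package.

```php
<?php
// Hypothetical regex lexer: returns a list of [type, text] tokens where
// type is 'comment', 'tag' or 'text'. Every character lands in some token.
function lex_html(string $html): array
{
    $pattern = '/(?<comment><!--.*?-->)'
             . '|(?<tag><\/?[a-z][a-z0-9]*(?:[^>"\']|"[^"]*"|\'[^\']*\')*>)'
             . '|(?<text>[^<]+|<)/is';   // a lone "<" falls back to text
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $m) {
        // Exactly one named group is non-empty for each match.
        foreach (['comment', 'tag', 'text'] as $type) {
            if (isset($m[$type]) && $m[$type] !== '') {
                $tokens[] = [$type, $m[$type]];
                break;
            }
        }
    }
    return $tokens;
}

foreach (lex_html('<a title="<head>">x</a><!-- <head> --><HEAD>') as [$type, $value]) {
    echo $type, ': ', $value, "\n";
}
// tag: <a title="<head>">
// text: x
// tag: </a>
// comment: <!-- <head> -->
// tag: <HEAD>
```

Finding the real head is then a matter of walking the token list for the first `tag` token matching `/^<head\b/i`, which leaves the structural work to plain code rather than the regex.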
--
gosha bine
extended php parser ~ http://code.google.com/p/pihipi
blok ~ http://www.tagarga.com/blok