Parsing HTML with Regular Expressions

on 15.06.2005 17:39:45 by Captain Dondo

I am trying to pull an href out of a bit of JavaScript. I am running
PHP, but the RE should be the same.

What I have is this:

onClick="JavaScript:window.open(\'http://www.seiner.com/blog/Travels/images/wp-snapshot.php?image=http://www.seiner.com/blog/Travels/images/2.jpg&width=730&height=755\',
\'FamilyPic\', \'scrollbars=yes,height=755,width=730,location=no\');
return false"> src="http://www.seiner.com/blog/Travels/images/thumb-2.jpg"/>

What I want to do is pull out the image URL (the value of image=) from
the window.open call, but only if the URL passed to window.open doesn't
contain a next=[whatever] or a prev=[whatever] parameter.

In other words, the above href doesn't contain either one, so my RE
returns 'http://www.seiner.com/blog/Travels/images/2.jpg'.

But if the above URL were to be as follows (see the next and prev at the
end of the URL):

onClick="JavaScript:window.open(\'http://www.seiner.com/blog/Travels/images/wp-snapshot.php?image=http://www.seiner.com/blog/Travels/images/2.jpg&width=730&height=755&prev=4.jpg&next=2.jpg\',
\'FamilyPic\', \'scrollbars=yes,height=755,width=730,location=no\');
return false"> src="http://www.seiner.com/blog/Travels/images/thumb-2.jpg"/>

I want the RE to not match....

The RE I am using is

$re = '<[aA] .*image=([a-zA-Z0-9.:/-]*).*/>';

and the actual match is done via:

preg_match_all($re, $text, $matches, PREG_OFFSET_CAPTURE);

TIA...
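
A minimal sketch of the behaviour described above, not a tested solution for the real pages. It assumes the quotes around the URL appear as \' (as in the snippets), that the parameters are separated by a plain & rather than &amp;, and that the text contains a single window.open call. The function name extractImageUrl and the negative lookahead are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper, not part of the original post: returns the value of
// image= from the window.open() URL, or null if that URL has next= or prev=.
function extractImageUrl(string $text): ?string
{
    // Nowdoc avoids PHP string escaping inside the pattern.
    // \\?'    optional literal backslash, then the opening quote
    // (?!...) negative lookahead: no next=/prev= before the closing quote
    $pattern = <<<'RE'
~window\.open\(\\?'(?![^']*[?&](?:next|prev)=)[^']*?[?&]image=([a-zA-Z0-9.:/-]+)~i
RE;

    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

// Shortened versions of the two snippets from the post.
$plain = <<<'TXT'
onClick="JavaScript:window.open(\'http://www.seiner.com/blog/Travels/images/wp-snapshot.php?image=http://www.seiner.com/blog/Travels/images/2.jpg&width=730&height=755\', \'FamilyPic\');"
TXT;
$paged = <<<'TXT'
onClick="JavaScript:window.open(\'http://www.seiner.com/blog/Travels/images/wp-snapshot.php?image=http://www.seiner.com/blog/Travels/images/2.jpg&width=730&height=755&prev=4.jpg&next=2.jpg\', \'FamilyPic\');"
TXT;

var_dump(extractImageUrl($plain)); // the .../images/2.jpg URL
var_dump(extractImageUrl($paged)); // NULL
```

The lookahead runs once, right after the opening quote, and scans only up to the next quote, so a next= or prev= elsewhere in the tag does not affect the result. The pattern uses ~ as its delimiter. A pattern that starts with < and ends with > is read by PHP as using those two characters as bracket-style delimiters, so they would not be matched literally.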