Regular Expression help

on 26.04.2007 21:26:19 by Rob

Hi,
I need to convert our Word documents to HTML for our website. I've used
MS Word's "Save as HTML" feature and run "Microsoft Office HTML Filter
2.0" to clean up the code, but I am still stuck with a lot of extra
markup, so I want to write a script that will do a custom cleanup.

The Word document has a "Table of Contents" and when I convert, I get
links at the top of my page that link to the appropriate section but I
get code like this:

<a name="_Toc58980987"></a> <a name="_Toc58981749"></a> <a name="_Toc93973545"></a> <a name="_Toc126114863">


I get a whole bunch of empty anchor tags each with a different name and
only the last anchor tag is correct. I would like to use regular
expressions to remove all empty "a" tags.

I know how to use regular expressions with ASP 3.0 but I don't know the
pattern.

Does anyone know the regex.pattern to replace all empty tags with an
empty string?

Thanks
Rob



*** Sent via Developersdex http://www.developersdex.com ***

Re: Regular Expression help

on 27.04.2007 07:54:41 by lexa

"Rob" wrote in message
news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
> Hi,
> I need to convert our word documents to html for our website. I've used
> MS Word's "Save as HTML" feature and ran "Microsoft Office HTML Filter
> 2.0" to clean up the code but I am stuck with a lot of additional code
> and I want to write a script that will do a custom cleanup.
>
> The Word document has a "Table of Contents" and when I convert, I get
> links at the top of my page that link to the appropriate section but I
> get code like this:
>
> <a name="_Toc58980987"></a> <a name="_Toc58981749"></a> <a name="_Toc93973545"></a> <a name="_Toc126114863">
>
>
> I get a whole bunch of empty anchor tags each with a different name and
> only the last anchor tag is correct. I would like to use regular
> expressions to remove all empty "a" tags.
>

Rob, I think something similar to

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
    .Pattern = "\\<\/a\>"
    .IgnoreCase = True
    .Global = True
End With

ReplacedText = RegularExpressionObject.Replace(InitialText, "")

Re: Regular Expression help

on 27.04.2007 09:29:51 by exjxw.hannivoort

Alexey Smirnov wrote on 27 Apr 2007 in
microsoft.public.inetserver.asp.general:

>
> "Rob" wrote in message
> news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
[..]
>>
>> I get a whole bunch of empty anchor tags each with a different name
>> and only the last anchor tag is correct. I would like to use regular
>> expressions to remove all empty "a" tags.
>>
>
> Rob, I think something similar to
>
> Set RegularExpressionObject = New RegExp
>
> With RegularExpressionObject
> .Pattern = "\\<\/a\>"
> .IgnoreCase = True
> .Global = True
> End With
>
> ReplacedText = RegularExpressionObject.Replace(InitialText, "")

.Pattern = "<a[^>]*>\s*<\/a>"

will do.
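
To see what such a pattern does and does not match, here is a small sketch in JavaScript syntax. It assumes the pattern reads `<a[^>]*>\s*<\/a>`; the sample strings are made up for illustration, modelled on the `_Toc` anchors from the question.

```javascript
// Assumed pattern, piece by piece:
//   <a[^>]*>   an opening tag starting with "<a", attributes up to the first ">"
//   \s*        nothing but optional whitespace between the tags
//   <\/a>      the closing tag
// The "i" flag ignores case, like IgnoreCase = True in VBScript.
const emptyAnchor = /<a[^>]*>\s*<\/a>/i;

// Hypothetical sample strings, not taken from a real document.
const samples = [
  '<a name="_Toc58980987"></a>',        // empty anchor: matches
  '<A NAME="_Toc58981749">  \n</A>',    // whitespace only, upper case: matches
  '<a name="_Toc126114863">Intro</a>'   // anchor with text: does not match
];

const results = samples.map(function (s) {
  return emptyAnchor.test(s);
});

console.log(results); // true, true, false
```

Because `[^>]*` can never run past the end of the opening tag, the match stays confined to a single anchor, which also keeps backtracking small.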

=================

However, why not [yes, I know it is personal preference] use a bit of
JScript, even if you use VBScript in ASP:


<% ' vbs
dim t,result
t="x"
result = deleteEmptyAnchors(t)
%>
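
A minimal sketch of what a `deleteEmptyAnchors` function could look like. Only the function name comes from the call above; the body, the pattern and the sample markup are assumptions for illustration. In classic ASP such a function would typically sit in a server-side `<script language="JScript" runat="server">` block so that the VBScript code can call it.

```javascript
// Hypothetical helper: the name is taken from the VBScript call above,
// the body is an assumption.
function deleteEmptyAnchors(t) {
  // g = replace every match, i = ignore case
  return t.replace(/<a[^>]*>\s*<\/a>/gi, "");
}

// Made-up input modelled on the Word output described in the question:
// two empty anchors followed by the one that actually wraps the heading.
var html = '<h1><a name="_Toc58980987"></a><a name="_Toc58981749"></a>' +
           '<a name="_Toc126114863">Introduction</a></h1>';

var cleaned = deleteEmptyAnchors(html);
console.log(cleaned);
// <h1><a name="_Toc126114863">Introduction</a></h1>
```

The empty anchors are dropped and the last anchor, which has content, is left alone.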





--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my email address)

Re: Regular Expression help

on 27.04.2007 14:17:28 by Rob

Thanks Evertjan

I tried the other example "\\<\/a\>" but my page was taking
too long to process it. Then I tried your example "<a[^>]*>\s*<\/a>" and
it works great.

Thanks again.

Rob


