Re: regex help
on 23.09.2004 17:51:59 by Sunny
I found it:
"</?table.*?>"
The second ? makes the * lazy, i.e. it matches as few characters as possible.
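To make the greedy/lazy difference concrete, here is a minimal sketch in VB.NET. It assumes the .NET regex engine (the post does not say which one is in use), assumes the pattern is meant to read `</?table.*?>`, and uses a made-up input with `<tr><td>` as the inner tags and a normally spelled `</table>`. The module and function names are illustrative only.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module LazyQuantifierDemo
    ' Hypothetical helper: removes every match of pattern from text.
    Function RemoveMatches(text As String, pattern As String) As String
        Return Regex.Replace(text, pattern, String.Empty)
    End Function

    Sub Main()
        ' Assumed sample input. "." does not match vbLf unless
        ' RegexOptions.Singleline is set, so a match never crosses a line.
        Dim html As String = "<table><tr><td>" & vbLf & "MY_TEXT" & vbLf & "</td></tr></table>"

        ' Greedy: ".*" runs to the last ">" on the line and takes <tr><td> with it.
        Dim greedyResult As String = RemoveMatches(html, "</?table.*>")

        ' Lazy: ".*?" stops at the first ">", so only the table tags are removed.
        Dim lazyResult As String = RemoveMatches(html, "</?table.*?>")

        ' Line feeds are shown as a literal \n to keep the output on one line.
        Console.WriteLine(greedyResult.Replace(vbLf, "\n"))  ' \nMY_TEXT\n</td></tr>
        Console.WriteLine(lazyResult.Replace(vbLf, "\n"))    ' <tr><td>\nMY_TEXT\n</td></tr>
    End Sub
End Module
```

A negated character class such as `[^>]*` in place of `.*?` would stop at the first ">" as well, and it does so without relying on laziness.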
Thanks for reading, and hope this can help someone.
Sunny
In article ,
sunny@newsgroups.nospam says...
> Hi,
> I know this is not quite the right group, but I cannot find a better one,
> and I have seen a lot of regex experts answering here, so I'll give it a shot:
>
> Let's take this string:
> "
> \nMY_TEXT\n<\table>"
>
> please note the new line characters.
>
> So, using a regex like:
> "</?table.*>"
>
> will find these:
>
> "
> " and
> "<\table>"
>
> What should a regex look like to match only the table tags? I want to leave
> and
> unmatched. How do I make the match stop at the first occurrence of ">",
> not the last one?
>
> I.e. if I replace the match with an empty string, I want to get:
> "
> \nMY_TEXT\n
> ",
>
> but not what I have now:
> "\nMY_TEXT\n"
>
>
> Cheers
> Sunny
>
Re: Regex Help
on 31.03.2008 21:02:27 by Jesse Houwing
Hello Christopher,
What exactly isn't working right now? It looks like you're nearly there.
Your pattern isn't quite what I'd use. If that's what's currently bothering
you, try the following:
</?(?:font|span|div|more tags here)[^>]*>
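A short sketch of why the closing `>` matters, assuming the suggested pattern starts with `</?` and trimming its tag list to three names. The pattern from the question ends in a lazy `[^>]*?` with nothing after it, so that part is satisfied by matching zero characters and the attributes and the ">" are left behind. The sample input and all names below are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TagStripDemo
    ' Pattern as posted in the question: no closing ">" and a lazy "[^>]*?"
    ' at the very end, which matches nothing at all.
    Public Const OriginalPattern As String = "<[/]?(font|span|div|del|ins|color:\w+)[^>]*?"

    ' Suggested pattern, assumed to begin with "</?"; tag list shortened here.
    Public Const SuggestedPattern As String = "</?(?:font|span|div)[^>]*>"

    ' Hypothetical helper: removes every match of pattern, ignoring case.
    Function StripTags(html As String, pattern As String) As String
        Return Regex.Replace(html, pattern, String.Empty, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Hypothetical sample input.
        Dim html As String = "<FONT size=""2"">Hello</FONT>"

        ' Only "<FONT" and "</FONT" are removed; the rest of each tag survives.
        Console.WriteLine(StripTags(html, OriginalPattern))   '  size="2">Hello>

        ' "[^>]*>" consumes everything up to and including the first ">".
        Console.WriteLine(StripTags(html, SuggestedPattern))  ' Hello
    End Sub
End Module
```

Because `[^>]` can never run past a ">", the greedy `[^>]*` is safe here and no lazy quantifier is needed.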
There also seems to be a small error in your code: Regex.Replace doesn't
alter the original string (strings are immutable in .NET); it returns
a new string instead, so you need to assign the result back, like this:
strHtmlString = node.InnerXml()
strHtmlString = Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()
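The effect of dropping the return value can be checked with a small stand-alone sketch. The input string, the `span`-only pattern, and the helper name are assumptions chosen for illustration, not taken from the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceResultDemo
    ' Hypothetical helper that keeps the string Regex.Replace returns.
    Function StripSpans(html As String) As String
        Return Regex.Replace(html, "</?span[^>]*>", String.Empty, RegexOptions.IgnoreCase).Trim()
    End Function

    Sub Main()
        Dim html As String = " <SPAN class=""a"">keep me</SPAN> "   ' hypothetical input

        ' Return value ignored: html still contains the tags.
        Regex.Replace(html, "</?span[^>]*>", String.Empty, RegexOptions.IgnoreCase)
        Console.WriteLine(html)

        ' Return value assigned back: html now holds the cleaned text.
        html = StripSpans(html)
        Console.WriteLine(html)   ' keep me
    End Sub
End Module
```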
If that doesn't work, then please explain what it is that isn't working :).
Jesse
> I'm inserting a SharePoint List into a SQL Database, but some of the
> text has oddly formed HTML tags. I want to remove these tags with a
> regular expression, but I'm having some difficulty. My code is below.
>
> Imports System
> Imports System.Net
> Imports System.Data
> Imports System.Math
> Imports Microsoft.SqlServer.Dts.Runtime
> Imports System.Xml
> Imports SharePointServices
> Imports SharePointServices.NorthwindSync
> Imports System.Text.RegularExpressions
> Imports System.IO
> Public Class ScriptMain
>
>     Public Sub Main()
>
>         Dim DocLoc As String
>         Dim TextDoc As TextWriter
>         Dim listService As New Lists()
>         Dim node As XmlNode
>         Dim strHtmlString As String
>         Dim pattern As String = "<[/]?(font|span|div|del|ins|color:\w+)[^>]*?"
>         DocLoc = "\\MYSERVER\MyFolder\MyFile.xml"
>
>         listService.PreAuthenticate = True
>         listService.Credentials = CredentialCache.DefaultNetworkCredentials
>         Try
>
>             node = ListHelper.GetAllListItems(listService, "My List Name")
>             strHtmlString = node.InnerXml()
>             Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()
>             TextDoc = File.CreateText(DocLoc)
>             TextDoc.WriteLine(strHtmlString)
>             TextDoc.Flush()
>             TextDoc.Close()
>         Catch ex As Exception
>
>             'Raise the error again and the result to failure.
>             Dts.Events.FireError(1, ex.TargetSite.ToString(), ex.Message, "", 0)
>             Dts.TaskResult = Dts.Results.Failure
>         End Try
>
>         Dts.TaskResult = Dts.Results.Success
>
>     End Sub
>
> End Class
>
> And here are a few samples of what I'm trying to remove with the Regex.
>
> "
"
> ""
> "
"
> Any help would be greatly appreciated.
>
> Thanks,
> Chris
--
Jesse Houwing
jesse.houwing at sogeti.nl