Regular Expressions

on 12.09.2004 18:38:30 by Scott Schluer

Hi all,

I got a JavaScript function from a website that uses regular expressions to
count the number of words in a textbox. I'm trying to replicate it with
ASP.NET so I can run a second check on the server side to make sure the word
count doesn't go over a specified number of words. I don't know anything
about regular expressions so I'm hoping there's a glaring error that someone
can just point out:

The JavaScript function works great, but the ASP.NET function I tried to
replicate isn't working (it's not doing the .Replace function correctly).
Here's the JavaScript function:

function CountWords (this_field, that_field) {
    var fullStr = this_field.value + " ";
    var initial_whitespace_rExp = /^[^A-Za-z0-9]+/gi;
    var left_trimmedStr = fullStr.replace(initial_whitespace_rExp, "");
    var non_alphanumerics_rExp = /[^A-Za-z0-9]+/gi;
    var cleanedStr = left_trimmedStr.replace(non_alphanumerics_rExp, " ");
    var splitString = cleanedStr.split(" ");
    var word_count = splitString.length - 1;

    if (fullStr.length < 2) word_count = 0;
    that_field.value = word_count;
}

Here's the ASP.NET function:

Public Shared Function CountWords(ByVal strString As String) As Integer
    strString = strString & " "
    Dim initial_whitespace_rExp As New _
        System.Text.RegularExpressions.Regex("/^[^A-Za-z0-9]+/gi")
    Dim left_trimmedStr As String = _
        initial_whitespace_rExp.Replace(strString, "")
    Dim non_alphanumerics_rExp As New _
        System.Text.RegularExpressions.Regex("/[^A-Za-z0-9]+/gi")
    Dim cleanedStr As String = _
        non_alphanumerics_rExp.Replace(left_trimmedStr, "")
    Dim splitString() As String = cleanedStr.Split(" ")
    Dim WordCount = splitString.Length - 1

    Return WordCount
End Function

If I type the sentence: "This is a test sentence", the JS function
returns a total of 5 words. The ASP.NET function returns 5 + the number of
EXTRA spaces (I don't know how many extra spaces I typed here, but let's say
it was 10 -- the ASP.NET function would return 15 words). The two Replace
functions should be stripping those.

Any ideas?

Re: Regular Expressions

on 13.09.2004 21:19:45 by Jeff B

Try this

Public Shared Function CountWords(ByVal strString As String) As Integer
    strString = strString & " "
    Dim initial_whitespace_rExp As New _
        System.Text.RegularExpressions.Regex("\A[^\w]*")
    Dim left_trimmedStr As String = _
        initial_whitespace_rExp.Replace(strString, "")
    Dim non_alphanumerics_rExp As New _
        System.Text.RegularExpressions.Regex("[^'\w]\b*")
    ' REPLACE ALL NON-ALPHANUMERIC EXCEPT APOSTROPHE WITH WHITESPACE
    Dim cleanedStr As String = _
        non_alphanumerics_rExp.Replace(left_trimmedStr, " ")
    ' REPLACE ALL CONSECUTIVE WHITESPACE WITH SINGLE WHITESPACE
    Dim consecutive_whitespace_rExp As New _
        System.Text.RegularExpressions.Regex("[\s]*[^'\w]")

    Dim fixedCleanedStr As String = _
        consecutive_whitespace_rExp.Replace(cleanedStr, " ")
    Dim splitString() As String = fixedCleanedStr.Split(" ")
    Dim WordCount = splitString.Length - 1

    Return WordCount
End Function
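
A note on why the original port miscounts, sketched in JavaScript so it can be run next to the original function. The slashes and the trailing gi in /[^A-Za-z0-9]+/gi are literal delimiters and flags, not part of the pattern. When the whole thing is passed as a string, the engine looks for a literal "/" and the letters "gi", finds nothing, and the Replace calls leave the text untouched. In .NET the bare pattern goes into the Regex constructor (with RegexOptions.IgnoreCase as a second argument if case-insensitivity is wanted), and Replace already handles every match. The helper countWords and the sample text are made up for illustration.

```javascript
// The mistake, reproduced in JS: delimiters and flags inside the pattern string.
const literalStyle = new RegExp("/[^A-Za-z0-9]+/gi");
// Pattern and flags kept separate, as the string-based constructor expects.
const patternOnly = new RegExp("[^A-Za-z0-9]+", "g");

// Hypothetical helper that follows the same steps as the posted CountWords.
function countWords(text, nonAlnum) {
  const full = text + " ";
  const trimmed = full.replace(/^[^A-Za-z0-9]+/, "");
  const cleaned = trimmed.replace(nonAlnum, " ");
  return full.length < 2 ? 0 : cleaned.split(" ").length - 1;
}

// Five words with three extra spaces (one after "This", two after "is").
const sample = "This  is   a test sentence";

console.log(countWords(sample, literalStyle)); // 8: nothing was replaced, so each extra space adds a "word"
console.log(countWords(sample, patternOnly));  // 5
```

The first result matches the symptom described in the question: the real word count plus the number of extra spaces.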


Re: Regular Expressions

on 31.03.2008 20:48:37 by Jesse Houwing

Hello RN1,

> $ in RegExp means the end of a string. So if the ValidationExpression
> for a RegularExpressionValidator which validates a TextBox is "c$"
> (without the double quotes), shouldn't input strings 'abc', '23pc',
> 'c9mccc' (all without the single quotes) evaluate to True BUT it
> doesn't. In fact, these strings evaluate to False. Only the string
> 'c' (again without the single quotes) evaluates to True.
> If I am not mistaken, "c$" means that any string (irrespective of its
> length) will evaluate to True provided the last character in the
> string is 'c' (without the single quotes). Or have I got it wrong? If
> so, please correct me.

Your regex is completely correct, just so you know, but there is a little
undocumented feature in the RegularExpressionValidator that puts a ^ and
a $ around the expression regardless of your wishes. So it actually tries
to evaluate this:

^c$$, which only accepts c as input.

The correct solution would be to make it

.*c

or

^.*c$

which will evaluate correctly.
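
To make the difference concrete, here is a rough sketch in JavaScript. It does not call the validator itself. It assumes, as described above, that the validator only accepts input when the expression matches the entire value, and simulates that by wrapping the pattern in ^(?:...)$. The function name matchesWholeInput is invented for this example.

```javascript
// Assumption: whole-value matching is simulated by anchoring the pattern on both ends.
function matchesWholeInput(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const inputs = ["abc", "23pc", "c9mccc", "c"];

// "c$" anchored on both ends can only ever match the single character "c".
const withCDollar = inputs.map((v) => matchesWholeInput("c$", v));
// ".*c" lets anything precede the final "c", so every sample passes.
const withDotStarC = inputs.map((v) => matchesWholeInput(".*c", v));

console.log(withCDollar);  // [ false, false, false, true ]
console.log(withDotStarC); // [ true, true, true, true ]
```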

--
Jesse Houwing
jesse.houwing at sogeti.nl