Removing obscure chars

on 03.04.2007 19:17:59 by Yobbo

Hi All

I have an ASP function in place to strip invalid chars out of a data store
before I create an XML file of this data, but my function doesn't work on a
certain set of chars.

As far as I can see these are the following:

a) trademark char
b) long hyphen/dash char
c) smart/curly quotes (both left and right)

Even though my function is set up as follows:

Function ReFormatStringForXML(s)
IF LEN(s) > 0 AND NOT IsNull(s) THEN
s = Replace(s,"&trade;","&#8482;")
s = Replace(s,"&mdash;","-")
s = Replace(s,"&rsquo;","&quot;")
s = Replace(s,"'","&quot;")
s = Replace(s,"""","&quot;")
s = Replace(s,"&","&amp;")
s = Replace(s,"<","&lt;")
s = Replace(s,">","&gt;")
END IF
ReFormatStringForXML = s
End Function

These chars still pass by and foul up my XML file.

I have a feeling that it's down to the fact that my function is looking for
the html equiv rather than the actual char, but I can't possibly get away
with simply copying and pasting these friggin(!!) chars into my function.
Surely this is bad practice?

Does anybody know how I can trap and replace/remove these chars if need be?

Thanks

Re: Removing obscure chars

on 04.04.2007 09:09:36 by Adrienne Boswell

Gazing into my crystal ball I observed "Yobbo"
writing in news:ugCfLVhdHHA.2148@TK2MSFTNGP05.phx.gbl:

> Hi All
>
> I have an ASP function in place to strip invalid chars out of a data
> store before I create an XML file of this data, but my function
> doesn't work on a certain set of chars.
>
> As far as I can see these are the following:
>
> a) trademark char
> b) long hyphen/dash char
> c) smart/curly quotes (both left and right)

I detest these "smart" quotes. Are regular quotes dumb by comparison?

>
> Even though my function is set up as follows:
>
> Function ReFormatStringForXML(s)
> IF LEN(s) > 0 AND NOT IsNull(s) THEN
> s = Replace(s,"&trade;","&#8482;")
> s = Replace(s,"&mdash;","-")
> s = Replace(s,"&rsquo;","&quot;")
> s = Replace(s,"'","&quot;")
> s = Replace(s,"""","&quot;")
> s = Replace(s,"&","&amp;")
> s = Replace(s,"<","&lt;")
> s = Replace(s,">","&gt;")
> END IF
> ReFormatStringForXML = s
> End Function
>
> These chars still pass by and foul up my XML file.
>
> I have a feeling that it's down to the fact that my function is looking
> for the html equiv rather than the actual char, but I can't possibly
> get away with simply copying and pasting these friggin(!!) chars into my
> function. Surely this is bad practice?

You are putting in the HTML entity; you may need to put in the actual
character instead (via Chr, or ChrW for characters outside ASCII), for example:
s = Replace(s,Chr(62),"&gt;")
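A minimal sketch of that idea applied to the characters from the question, assuming the text reaches VBScript as an ordinary (Unicode) string: ChrW builds each character from its Unicode code point, so nothing has to be pasted into the source file. The function name ReplaceSmartChars and the "(TM)" replacement are hypothetical choices; substitute whatever the XML consumer expects.

```vbscript
' Hypothetical helper: swap the "smart" characters from the question for
' plain ASCII, building each one with ChrW instead of pasting it in.
Function ReplaceSmartChars(ByVal s)
  If IsNull(s) Then s = ""
  s = Replace(s, ChrW(8482), "(TM)")   ' trademark sign (assumed replacement)
  s = Replace(s, ChrW(8212), "-")      ' em dash
  s = Replace(s, ChrW(8211), "-")      ' en dash
  s = Replace(s, ChrW(8216), "'")      ' left single curly quote
  s = Replace(s, ChrW(8217), "'")      ' right single curly quote
  s = Replace(s, ChrW(8220), """")     ' left double curly quote
  s = Replace(s, ChrW(8221), """")     ' right double curly quote
  ReplaceSmartChars = s
End Function
```

The result is plain ASCII for these characters, but it still needs the usual &, < and > escaping before it goes into the XML.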

>
> Does anybody know how I can trap and replace/remove these chars if
> need be?
>
> Thanks

HTH

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Re: Removing obscure chars

on 04.04.2007 18:02:00 by Daniel Crichton

Yobbo wrote on Tue, 3 Apr 2007 18:17:59 +0100:

> Hi All
>
> I have an ASP function in place to strip invalid chars out of a data store
> before I create an XML file of this data, but my function doesn't work on
> a certain set of chars.
>
> As far as I can see these are the following:
>
> a) trademark char
> b) long hyphen/dash char
> c) smart/curly quotes (both left and right)
>
> Even though my function is set up as follows:
>
> Function ReFormatStringForXML(s)
> IF LEN(s) > 0 AND NOT IsNull(s) THEN
> s = Replace(s,"&trade;","&#8482;")
> s = Replace(s,"&mdash;","-")
> s = Replace(s,"&rsquo;","&quot;")
> s = Replace(s,"'","&quot;")
> s = Replace(s,"""","&quot;")
> s = Replace(s,"&","&amp;")
> s = Replace(s,"<","&lt;")
> s = Replace(s,">","&gt;")
> END IF
> ReFormatStringForXML = s
> End Function
>
> These chars still pass by and foul up my XML file.
>
> I have a feeling that it's down to the fact that my function is looking for
> the html equiv rather than the actual char, but I can't possibly get away
> with simply copying and pasting these friggin(!!) chars into my function.
> Surely this is bad practice?
>
> Does anybody know how I can trap and replace/remove these chars if need
> be?

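Your function is quite limited. What happens when a character not in your
list appears? The XML supported entity list is pretty small: only five
entities are predefined, so anything else, such as &trade;, has to be
written as a numeric character reference (or declared in a DTD).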
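A minimal sketch that relies on just the predefined XML entities (the name XmlEscape is hypothetical). The order of the Replace calls matters: & has to go first, otherwise the & at the start of &lt;, &quot; and so on gets escaped a second time, giving output such as &amp;quot;.

```vbscript
' Hypothetical helper: escape text using only XML's predefined entities.
Function XmlEscape(ByVal s)
  If IsNull(s) Then s = ""
  s = Replace(s, "&", "&amp;")    ' must come first to avoid double-escaping
  s = Replace(s, "<", "&lt;")
  s = Replace(s, ">", "&gt;")
  s = Replace(s, """", "&quot;")
  s = Replace(s, "'", "&apos;")
  XmlEscape = s
End Function
```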

Here's the function I use in my own XML generation code; it's crude, but it works:

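Function XMLEncode(strText)

  ' Loop through the string and replace every character that is not a
  ' letter, digit, space or hyphen with an entity or numeric reference
  Dim strNewText, i, j
  strNewText = ""

  For i = 1 To Len(strText)

    ' AscW gives the Unicode value; Asc would give the ANSI code page
    ' value (e.g. 153 rather than 8482 for the trademark sign on 1252)
    j = AscW(Mid(strText, i, 1))
    If j < 0 Then j = j + 65536   ' AscW can go negative above &H7FFF

    If j = 10 Then
      ' replace line feed with an escaped <br> tag
      strNewText = strNewText & "&lt;br&gt;"
    ElseIf j = 13 Or j = 9 Then   ' cr, tab
      ' strip them
    ElseIf j < 32 Then
      ' strip other control characters; XML 1.0 does not allow them
    ElseIf j = 34 Then
      strNewText = strNewText & "&quot;"
    ElseIf j = 39 Then
      strNewText = strNewText & "&apos;"
    ElseIf j = 32 Or j = 45 Or (j >= 48 And j <= 57) Or (j >= 65 And j <= 90) _
        Or (j >= 97 And j <= 122) Then
      ' ok: space, hyphen, digits and letters pass through unchanged
      strNewText = strNewText & Mid(strText, i, 1)
    ElseIf j = 38 Then   ' &
      strNewText = strNewText & "&amp;"
    ElseIf j = 60 Then   ' <
      strNewText = strNewText & "&lt;"
    ElseIf j = 62 Then   ' >
      strNewText = strNewText & "&gt;"
    Else
      strNewText = strNewText & "&#" & j & ";"
    End If

  Next

  XMLEncode = strNewText
End Function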


This checks each character in the string in turn, replaces some with
entities, and replaces the rest of the non-alphanumeric characters with
their numeric character references. You could easily add a few more entity
replacements as required. Just watch out for the first couple of
replacements, where I replace line feeds with a <br> and strip out carriage
returns and tabs, as that might not fit what you want to do with the XML
yourself.

Dan

Re: Removing obscure chars

on 05.04.2007 09:35:05 by Anthony Jones

"Yobbo" wrote in message
news:ugCfLVhdHHA.2148@TK2MSFTNGP05.phx.gbl...
> Hi All
>
> I have an ASP function in place to strip invalid chars out of a data store
> before I create an XML file of this data, but my function doesn't work on a
> certain set of chars.
>
> As far as I can see these are the following:
>
> a) trademark char
> b) long hyphen/dash char
> c) smart/curly quotes (both left and right)
>
> Even though my function is set up as follows:
>
> Function ReFormatStringForXML(s)
> IF LEN(s) > 0 AND NOT IsNull(s) THEN
> s = Replace(s,"&trade;","&#8482;")
> s = Replace(s,"&mdash;","-")
> s = Replace(s,"&rsquo;","&quot;")
> s = Replace(s,"'","&quot;")
> s = Replace(s,"""","&quot;")
> s = Replace(s,"&","&amp;")
> s = Replace(s,"<","&lt;")
> s = Replace(s,">","&gt;")
> END IF
> ReFormatStringForXML = s
> End Function
>
> These chars still pass by and foul up my XML file.
>
> I have a feeling that it's down to the fact that my function is looking for
> the html equiv rather than the actual char, but I can't possibly get away
> with simply copying and pasting these friggin(!!) chars into my function.
> Surely this is bad practice?
>
> Does anybody know how I can trap and replace/remove these chars if need
> be?
>
> Thanks

If you are creating an XML file, can you use a DOMDocument to build it and
save it? That will ensure well-formed XML is created.
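A minimal sketch of that approach, assuming MSXML 6.0 is installed (the MSXML2.DOMDocument.3.0 ProgID supports the same calls used here); the element names, sample text and file path are made up for illustration. Text assigned through an element's text property is escaped by MSXML when the document is serialized, so &, < and > need no hand-written Replace calls, and the trademark sign, dash and curly quotes are written out in the declared encoding.

```vbscript
' Sketch: let MSXML do the escaping instead of hand-rolled Replace calls.
' Element names, sample text and the output path are hypothetical.
Dim xmlDoc, root, item
Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.appendChild xmlDoc.createProcessingInstruction("xml", "version=""1.0"" encoding=""UTF-8""")

Set root = xmlDoc.createElement("items")
xmlDoc.appendChild root

Set item = xmlDoc.createElement("item")
' Raw text, including characters that would break hand-built XML
item.text = "Widget" & ChrW(8482) & " " & ChrW(8212) & " " & ChrW(8220) & "fish & chips" & ChrW(8221) & " <new>"
root.appendChild item

' In an ASP page you would typically pass Server.MapPath("items.xml") here
xmlDoc.save "C:\temp\items.xml"
```

Because the text only ever enters the document through the DOM, a stray & or < cannot end up unescaped in the output.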
