Stripping MS Word code from my forms once and for all.
on 15.09.2007 18:43:04 by FFMG
Hi,
I have a form that allows users to comment, add entries and so on.
But what a lot of them do is copy and paste directly from MS Word into my
forms.
Almost all browsers will accept the post and give the impression that
everything is saved properly.
But that is not the case when it comes time to display the message
on my page.
So how can I strip/replace all the MS Word invalid code from my
$_POSTs?
Thanks
FFMG
--
'webmaster forum' (http://www.httppoint.com) | 'Free Blogs'
(http://www.journalhome.com/) | 'webmaster Directory'
(http://www.webhostshunter.com/)
'Recreation Vehicle insurance'
(http://www.insurance-owl.com/other/car_rec.php) | 'Free URL
redirection service' (http://urlkick.com/)
------------------------------------------------------------------------
FFMG's Profile: http://www.httppoint.com/member.php?userid=580
View this thread: http://www.httppoint.com/showthread.php?t=20318
Message Posted via the webmaster forum http://www.httppoint.com, (Ad revenue sharing).
Re: Stripping MS Word code from my forms once and for all.
on 16.09.2007 02:48:03 by Macca
I found this on php.net at http://uk2.php.net/strtr which may be of
some help:
After battling with strtr trying to strip out MS Word formatting from
things pasted into forms, I ended up coming up with this.
It strips ALL non-standard ASCII characters, preserving HTML codes and
such, but gets rid of all the characters that refuse to show in
Firefox.
If you look at this page in Firefox you will see a ton of "question
mark" characters, so it is not possible to copy and paste those to
remove them from strings. (This fixes that issue nicely, though I
admit it could be done a bit better.)
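<?php
// Keep only tab, newline, carriage return and printable ASCII (32-126);
// every other byte, including Word's "smart" characters, is dropped.
function fixoutput($str){
    $good = array();
    $good[] = 9;  #tab
    $good[] = 10; #nl
    $good[] = 13; #cr
    for($a=32;$a<127;$a++){
        $good[] = $a;
    }
    $newstr = '';                // start with an empty result string
    $len = strlen($str);
    for($b=0;$b < $len; $b++){   // stop at $len-1; $len+1 read past the end
        if(in_array(ord($str[$b]), $good)){
            $newstr .= $str[$b];
        }//fi
    }//rof
    return $newstr;
}
?>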
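For comparison, here is a minimal sketch of the same whitelist (tab, LF, CR and bytes 32-126) collapsed into a single preg_replace() call; fixoutput_regex() is a hypothetical name, not something from the php.net note. Like fixoutput(), it works byte by byte, so it also deletes legitimate non-ASCII text such as accented letters, not just the characters Word inserts.

```php
<?php
// Same whitelist as fixoutput(): tab (0x09), LF (0x0A), CR (0x0D) and
// printable ASCII (0x20-0x7E). Any other byte is removed.
function fixoutput_regex($str)
{
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $str);
}

// Word-style curly quotes (UTF-8 bytes E2 80 9C / E2 80 9D) vanish:
// fixoutput_regex("\xE2\x80\x9CHi\xE2\x80\x9D") returns "Hi".
```

The regex version avoids building a 98-element array and calling in_array() once per byte, but the trade-off is identical: anything outside plain ASCII is lost.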
Re: Stripping MS Word code from my forms once and for all.
on 16.09.2007 04:48:01 by Bucky Kaufman
FFMG wrote:
> So how can I strip/replace all the MS Word invalid code from my
> $_POSTs?
I presume you're referring to all the MS Office XML markup.
That's actually good stuff, sometimes.
What you need to do is read the document as an XML file, then all the MS
crap will make sense... and more importantly, be easily stripped away.
Before you strip it away though, you might want to go through it because
you might find that some of the document properties are useful to your
application.
Re: Stripping MS Word code from my forms once and for all.
on 16.09.2007 16:19:52 by FFMG
Sanders Kaufman;92056 Wrote:
> FFMG wrote:
>
> > So how can I strip/replace all the MS Word invalid code from my
> > $_POSTs?
>
> I presume you're referring to all the MS Office XML markup.
> That's actually good stuff, sometimes.
>
No, sorry, I was actually talking about some non-standard characters
that MS Word inserts.
Some browsers will (maybe wrongly) not display any invalid characters
in the textarea itself, giving the user the impression that everything
is fine.
But when I then try to display the comment/entry I get a bunch of
question marks for the characters that were invalid.
FFMG
Re: Stripping MS Word code from my forms once and for all.
on 17.09.2007 14:10:27 by Bucky Kaufman
FFMG wrote:
> Sanders Kaufman;92056 Wrote:
>> FFMG wrote:
>>
>>> So how can I strip/replace all the MS Word invalid code from my
>>> $_POSTs?
>> I presume you're referring to all the MS Office XML markup.
>> That's actually good stuff, sometimes.
>>
>
> No, sorry, I was actually talking about some non-standard characters
> that MS Word inserts.
>
> Some browsers will (maybe wrongly) not display any invalid characters
> in the textarea itself, giving the user the impression that everything
> is fine.
>
> But when I then try to display the comment/entry I get a bunch of
> question marks for the characters that were invalid.
Ah, so. You're having a character set problem.
Rather than have a big old off-topic thread about it here, you should
probably take the question to an Office or HTML group.
PHP won't help you much.
Re: Stripping MS Word code from my forms once and for all.
on 17.09.2007 15:09:02 by FFMG
Sanders Kaufman;92237 Wrote:
>
> > No, sorry, I was actually talking about some non-standard characters
> > that MS Word inserts.
> >
> > Some browsers will (maybe wrongly) not display any invalid characters
> > in the textarea itself, giving the user the impression that everything
> > is fine.
> >
> > But when I then try to display the comment/entry I get a bunch of
> > question marks for the characters that were invalid.
>
> Ah, so. You're having a character set problem.
> Rather than have a big old off-topic thread about it here, you should
> probably take the question to an Office or HTML group.
> PHP won't help you much.
No, I am not. Read the question again, carefully this time.
Textareas of most browsers will (wrongly) accept code pasted from
MS Word.
By the time it gets to my server I have to clean it up.
My PHP code must handle it.
Is that on topic enough for you?
FFMG
Re: Stripping MS Word code from my forms once and for all.
on 18.09.2007 02:51:34 by Jerry Stuckle
FFMG wrote:
> Sanders Kaufman;92237 Wrote:
>>> No, sorry, I was actually talking about some non-standard characters
>>> that MS Word inserts.
>>>
>>> Some browsers will (maybe wrongly) not display any invalid characters
>>> in the textarea itself, giving the user the impression that everything
>>> is fine.
>>>
>>> But when I then try to display the comment/entry I get a bunch of
>>> question marks for the characters that were invalid.
>> Ah, so. You're having a character set problem.
>> Rather than have a big old off-topic thread about it here, you should
>> probably take the question to an Office or HTML group.
>> PHP won't help you much.
>
> No, I am not. Read the question again, carefully this time.
> Textareas of most browsers will (wrongly) accept code pasted from
> MS Word.
>
> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?
>
> FFMG
>
>
Yes, this has been asked before, but I don't remember what the answer was.
The easiest way would be to check for non-alphanumeric chars using a
regex. If you find any, tell the user to use a plain text editor.
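A minimal sketch of the detect-and-reject approach, reading "non-alphanumeric" loosely as anything outside tab, LF, CR and printable ASCII (a strictly alphanumeric check would also reject ordinary spaces and punctuation). The helper name has_non_plain_chars() and the form field name "comment" are hypothetical.

```php
<?php
// Hypothetical helper: true when the text contains any byte outside tab,
// LF, CR and printable ASCII (0x20-0x7E), e.g. Word's curly quotes.
function has_non_plain_chars($text)
{
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $text) === 1;
}

// "comment" is an assumed field name; adjust to the real form.
$comment = isset($_POST['comment']) ? $_POST['comment'] : '';
if (has_non_plain_chars($comment)) {
    // Reject instead of silently altering the user's text.
    echo "Your text contains special characters; please paste it from a plain text editor.";
}
```

Rejecting rather than cleaning means nothing is changed behind the user's back, at the cost of annoying anyone who legitimately types accented letters.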
You could use a regex to strip non-alphanumeric characters, but this
might have some problems. For instance, what happens if you have a
control sequence which happens to contain a printable character, e.g. 0x010231?
The 0x31 would be taken as the character '1', even though it's part of a
control sequence. But you could clean it up fairly well this way.
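Running the three bytes 0x01 0x02 0x31 through a printable-ASCII filter shows the effect: 0x01 and 0x02 are removed, but 0x31 is also the ordinary character '1', so it survives on its own.

```php
<?php
// The bytes 0x01 0x02 0x31 treated as one control sequence.
$sequence = "\x01\x02\x31";

// A byte-by-byte whitelist cannot tell that 0x31 belonged to the sequence.
$cleaned = preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $sequence);

var_dump($cleaned); // string(1) "1"
```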
Try googling this newsgroup for something like "MS WORD". It's been a
few months.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Re: Stripping MS Word code from my forms once and for all.
on 18.09.2007 04:46:16 by Bucky Kaufman
FFMG wrote:
> Sanders Kaufman;92237 Wrote:
>> Ah, so. You're having a character set problem.
>> Rather than have a big old off-topic thread about it here, you should
>> probably take the question to an Office or HTML group.
>> PHP won't help you much.
>
> No, I am not. Read the question again, carefully this time.
> Textareas of most browsers will (wrongly) accept code pasted from
> MS Word.
There is nothing in the HTML specification requiring HTML to reject MS
Word, Open Office, or any other format. That would be a bug, not a feature.
> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?
No, and it won't likely be topic(al) enough for most of the other folks
here in the PHP group, either.
While you are indeed trying to process the data through PHP, you appear
to be perfectly capable of programming in PHP, and thus need very little
help with PHP.
Instead, you need to identify the correct character set to use in
interpreting the Office document, and to apply that character set to the
data retrieved through the HTML FORM element.
That means that the help you need is with Office and HTML, not PHP.
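On the PHP side, acting on the character-set advice might look like the minimal sketch below. It assumes the pages and the form are served as UTF-8 (for example a "Content-Type: text/html; charset=UTF-8" header plus accept-charset="UTF-8" on the FORM element), and that any submission that is not valid UTF-8 arrived as Windows-1252, the Windows code page in which Word's curly quotes and dashes occupy bytes 0x91-0x97. It needs the mbstring extension; normalize_to_utf8() is a hypothetical name.

```php
<?php
// Assumption: valid UTF-8 is left alone; anything else is treated as
// Windows-1252 and converted, rather than stripped.
function normalize_to_utf8($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8
    }
    // e.g. 0x93/0x94 become U+201C/U+201D, 0x96 becomes U+2013.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}
```

Converting keeps the characters the user actually typed; they only turn into question marks later if the output page declares a different charset than the one the data is stored in.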
Re: Stripping MS Word code from my forms once and for all.
on 18.09.2007 13:04:50 by FFMG
Sanders Kaufman;92371 Wrote:
> FFMG wrote:
> > Sanders Kaufman;92237 Wrote:
>
> >> Ah, so. You're having a character set problem.
> >> Rather than have a big old off-topic thread about it here, you should
> >> probably take the question to an Office or HTML group.
> >> PHP won't help you much.
> >
> > No, I am not. Read the question again, carefully this time.
> > Textareas of most browsers will (wrongly) accept code pasted from
> > MS Word.
>
> There is nothing in the HTML specification requiring HTML to reject MS
> Word, Open Office, or any other format. That would be a bug, not a feature.
>
Great, one more reason to strip MS Word characters.
Sanders Kaufman;92371 Wrote:
>
>
> > By the time it gets to my server I have to clean it up.
> > My PHP code must handle it.
> >
> > Is that on topic enough for you?
>
> No, and it won't likely be topic(al) enough for most of the other folks
> here in the PHP group, either.
>
> While you are indeed trying to process the data through PHP, you appear
> to be perfectly capable of programming in PHP, and thus need very little
> help with PHP.
>
> Instead, you need to identify the correct character set to use in
> interpreting the Office document, and to apply that character set to the
> data retrieved through the HTML FORM element.
>
> That means that the help you need is with Office and HTML, not PHP.
Well, I tend to disagree.
Because I am trying to process data in PHP I think that asking fellow
programmers on the PHP group for input is not as off-topic as you
think.
Is your suggestion to convert to an MS Office charset (even if the
user did not use MS Word) and then convert it back as needed?
Would stripping the MS chars not be faster/better?
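A middle ground between converting and stripping is to replace Word's typographic punctuation with plain ASCII equivalents, so that "don't" keeps its apostrophe instead of becoming "dont". The sketch below uses strtr() with an array, the function the php.net note above was battling with; it assumes the input is already UTF-8, and both the function name and the character list are illustrative rather than exhaustive.

```php
<?php
// Hypothetical map from the UTF-8 forms of common Word "smart"
// punctuation to plain ASCII. Unlisted characters pass through unchanged.
function replace_word_punctuation($text)
{
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
        "\xC2\xA0"     => ' ',   // U+00A0 no-break space
    );
    // strtr() with an array tries longer keys first and never
    // re-replaces text it has already substituted.
    return strtr($text, $map);
}
```

Because unlisted characters are left alone, this can be combined with a stripping filter afterwards if plain ASCII is a hard requirement.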
FFMG
Re: Stripping MS Word code from my forms once and for all.
on 18.09.2007 15:00:57 by Bucky Kaufman
FFMG wrote:
> Sanders Kaufman;92371 Wrote:
>> That means that the help you need is with Office and HTML, not PHP.
>
> Well, I tend to disagree.
> Because I am trying to process data in PHP I think that asking fellow
> programmers on the PHP group for input is not as off-topic as you
> think.
How's that workin' out for ya, champ?
Have you noticed the roar of silence in response to your original request?
Seriously - you'll get a better response in an HTML or MS Office group.
> Is your suggestion to convert to an MS Office charset (even if the
> user did not use MS Word) and then convert it back as needed?
> Would stripping the MS chars not be faster/better?
There are no such things as "MS characters" or an MS Office Character Set.
Re: Stripping MS Word code from my forms once and for all.
on 22.09.2007 16:00:44 by FFMG
Sanders Kaufman;92428 Wrote:
> FFMG wrote:
> > Sanders Kaufman;92371 Wrote:
>
> >> That means that the help you need is with Office and HTML, not PHP.
> >
> > Well, I tend to disagree.
> > Because I am trying to process data in PHP I think that asking fellow
> > programmers on the PHP group for input is not as off-topic as you
> > think.
>
> How's that workin' out for ya, champ?
> ...
>
Read the thread; the answer was given.
I see you could not answer the question, so you had to resort to
abusive language.
Shame.
FFMG