Converting funky characters

Converting funky characters

on 29.03.2010 02:05:50 by Skip Evans

Hey all,

What's the best way to filter/convert characters that don't
translate properly from, say, news stories to HTML?

For example, I have a form where people paste the lead-in
paragraph from news stories they want to link to from their
sites. And of course things like long dashes, double quotes,
single quotes, etc. always turn into wacky unprintables when
they are rendered, and the user needs to edit them to replace
them with standard characters.

Is there a way to filter this text through a function that will
convert them to web-friendly chars?

Thanks,
Skip

--
====================================
Skip Evans
PenguinSites.com, LLC
503 S Baldwin St, #1
Madison WI 53703
608.250.2720
http://penguinsites.com
------------------------------------
Those of you who believe in
telekinesis, raise my hand.
-- Kurt Vonnegut

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Converting funky characters

on 29.03.2010 03:21:33 by Nilesh Govindarajan

PCRE is your best friend for such problems.
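
As a rough illustration of the PCRE route, here is a minimal sketch. It assumes the submitted text is valid UTF-8 and that PHP 7 or later is available (for the "\u{...}" string escapes and type declarations). The function name asciiPunctuation and the choice of replacements are made up for this example; they are not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): map common "smart" punctuation
// to plain ASCII with preg_replace. Assumes $text is valid UTF-8.
function asciiPunctuation(string $text): string
{
    $map = array(
        '/[\x{2018}\x{2019}\x{201A}]/u' => "'",   // curly single quotes, low-9 quote
        '/[\x{201C}\x{201D}\x{201E}]/u' => '"',   // curly double quotes, low-9 double quote
        '/[\x{2013}\x{2014}]/u'         => '-',   // en dash, em dash
        '/\x{2026}/u'                   => '...', // horizontal ellipsis
        '/\x{00A0}/u'                   => ' ',   // non-breaking space
    );
    $result = preg_replace(array_keys($map), array_values($map), $text);

    // preg_replace returns null on error (for example, invalid UTF-8 input),
    // in which case the text is handed back unchanged.
    return $result === null ? $text : $result;
}

echo asciiPunctuation("\u{201C}Hello\u{201D} \u{2014} it\u{2019}s fine\u{2026}"), "\n";
// prints: "Hello" - it's fine...
```

The /u modifier makes PCRE treat both pattern and subject as UTF-8, which is why the code points can be written as \x{2018} and so on.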

--
Nilesh Govindarajan
Site & Server Administrator
www.itech7.com
मेरा भारत महान !
मम भारत: महत्तम भवतु !


Re: Converting funky characters

on 29.03.2010 03:52:02 by solo hsi

I think you just need the urldecode() function.

--
solo(xzyyyy@gmail.com)


Re: Converting funky characters

on 29.03.2010 03:54:23 by Nilesh Govindarajan

On 03/29/2010 07:22 AM, solo hsi wrote:
> i think you just need function urldecode()....

No, urldecode() alone won't do the job. He's talking more about the long
dashes, quotes, etc.
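
To illustrate the point, here is a small sketch (the sample strings are made up, and the "\u{...}" escape assumes PHP 7 or later). urldecode() only reverses percent-encoding and turns "+" into a space, so a pasted curly apostrophe, which is not percent-encoded, passes through untouched.

```php
<?php
// urldecode() reverses percent-encoding and converts "+" to a space.
$encoded = 'caf%C3%A9+au+lait';
$decoded = urldecode($encoded); // the UTF-8 bytes for "café au lait"
echo $decoded, "\n";

// A pasted curly apostrophe (U+2019) contains no percent escapes,
// so urldecode() returns it exactly as it came in.
$pasted = "it\u{2019}s";
var_dump(urldecode($pasted) === $pasted); // bool(true)
```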

--
Nilesh Govindarajan
Site & Server Administrator
www.itech7.com
मेरा भारत महान !
मम भारत: महत्तम भवतु !


Re: Converting funky characters

on 29.03.2010 09:52:44 by Ashley Sheridan

I wrote something that converts these characters, which you'll most
often find when copying text from MS Office:

http://www.ashleysheridan.co.uk/coding_php_remove_ms_crap.php

The second argument to the function just tells it to remove all the
hidden meta tags that MS Office chucks into text that you copy into a
rich text box in a web page, as this can really mess up how the content
is displayed in a web browser.
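
For readers who cannot reach the link, the following is a hypothetical sketch of what a two-part function of this kind might look like. It is not the code from the linked page: the name cleanPastedText, both parameters, and the patterns are assumptions made for illustration, and it assumes UTF-8 input and PHP 7 or later.

```php
<?php
// Hypothetical sketch; NOT the function from the linked page.
// The name, the parameters and the patterns are invented for illustration.
function cleanPastedText(string $text, bool $stripOfficeMarkup = false): string
{
    // Part 1: replace typographic punctuation with ASCII equivalents.
    $text = strtr($text, array(
        "\u{2018}" => "'", "\u{2019}" => "'",   // curly single quotes
        "\u{201C}" => '"', "\u{201D}" => '"',   // curly double quotes
        "\u{2013}" => '-', "\u{2014}" => '-',   // en dash, em dash
        "\u{2026}" => '...',                    // ellipsis
    ));

    // Part 2 (optional): drop markup that Office tends to add to pasted HTML.
    if ($stripOfficeMarkup) {
        $patterns = array(
            '/<!--.*?-->/s',                   // HTML comments, including conditional ones
            '/<(meta|link)\b[^>]*>/i',         // meta and link tags
            '/<style\b[^>]*>.*?<\/style>/is',  // embedded style blocks
            '/<\/?[a-z]+:[a-z]+\b[^>]*>/i',    // namespaced tags such as <o:p>
        );
        $stripped = preg_replace($patterns, '', $text);
        if ($stripped !== null) {
            $text = $stripped;
        }
    }

    return $text;
}

$html = "<p>\u{201C}Hi\u{201D}<o:p></o:p></p><!--[if gte mso 9]><xml></xml><![endif]-->";
echo cleanPastedText($html, true), "\n"; // prints: <p>"Hi"</p>
```

Regular expressions are a blunt tool for HTML, so a sketch like this is only suitable for tidying pasted fragments, not for sanitising untrusted input.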

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: Converting funky characters

on 29.03.2010 10:51:59 by Ashley Sheridan


On Mon, 2010-03-29 at 14:25 +0530, Nilesh Govindarajan wrote:

> Nice one, but it's not of use to me. I use GeSHi on my site, so I cannot use
> WYSIWYG editors. Instead I have BBCode and WikiCreole (PEAR Wiki) input
> formats.

It's not just for WYSIWYG editors; that's only the second part of the
function. The first part does just what you asked for.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: Converting funky characters

on 29.03.2010 10:55:20 by Nilesh Govindarajan

On 03/29/2010 01:22 PM, Ashley Sheridan wrote:
> I wrote something that converts these characters, which you'll most
> often find when copying text from MS Office:
>
> http://www.ashleysheridan.co.uk/coding_php_remove_ms_crap.php
>
> The second argument to the function just tells it to remove all the
> hidden meta tags that MS Office chucks into text that you copy into a
> rich text box in a web page, as this can really mess up how the content
> is displayed in a web browser.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

Nice one, but it's not of use to me. I use GeSHi on my site, so I cannot use
WYSIWYG editors. Instead I have BBCode and WikiCreole (PEAR Wiki) input
formats.

--
Nilesh Govindarajan
Site & Server Administrator
www.itech7.com
मेरा भारत महान !
मम भारत: महत्तम भवतु !


Re: Converting funky characters

on 29.03.2010 10:59:21 by Nilesh Govindarajan

On 03/29/2010 02:21 PM, Ashley Sheridan wrote:
> It's not just for wysiwyg editors, that's just the second part of the
> function. The first does just what you asked for.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

I didn't ask anything in this topic. It was Skip Evans, the thread starter.

--
Nilesh Govindarajan
Site & Server Administrator
www.itech7.com
मेरा भारत महान !
मम भारत: महत्तम भवतु !


Re: Converting funky characters

on 29.03.2010 14:33:01 by Al

Here's how I handle the problem:

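// Translation table for Windows-1252 characters that appear when a user
// pastes from Word. Keys are single bytes, so this assumes the input is
// Windows-1252 (single-byte) text, not UTF-8.
$win1252ToPlainTextArray = array(
    chr(130) => ',',    // single low-9 quotation mark
    chr(131) => '',     // florin sign (dropped)
    chr(132) => ',,',   // double low-9 quotation mark
    chr(133) => '...',  // ellipsis
    chr(134) => '+',    // dagger
    chr(135) => '',     // double dagger (dropped)
    chr(139) => '<',    // single left angle quote
    chr(145) => '\'',   // left single quote
    chr(146) => '\'',   // right single quote
    chr(147) => '"',    // left double quote
    chr(148) => '"',    // right double quote
    chr(149) => '*',    // bullet
    chr(150) => '-',    // en dash
    chr(151) => '-',    // em dash
    chr(155) => '>',    // single right angle quote
    chr(160) => ' ',    // non-breaking space
);

// Replaces the characters in the table, then strips every remaining
// byte from 0x7F to 0xFF.
function cleanWin1252Text($str, $win1252ToPlainTextArray)
{
    $str = strtr($str, $win1252ToPlainTextArray);
    $patterns = array('%[\x7F-\x81]%', '%[\x83]%', '%[\x87-\x8A]%',
                      '%[\x8C-\x90]%', '%[\x98-\xff]%');
    $str = preg_replace($patterns, '', $str);

    return trim($str); // trim last, so whitespace exposed by stripping goes too
}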
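
An alternative that was not proposed in the thread is to keep the characters instead of flattening them: convert the Windows-1252 bytes to UTF-8 and let htmlentities() emit entities. The sketch below assumes the submitted bytes really are Windows-1252, that the mbstring extension is installed, and PHP 7 or later; the function name win1252ToHtml is made up for this example.

```php
<?php
// Assumption: the submitted bytes are Windows-1252. Requires mbstring.
// The function name is invented for this example.
function win1252ToHtml(string $bytes): string
{
    // Reinterpret the single-byte input as UTF-8 so "smart" punctuation survives.
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

    // Emit named entities such as &ldquo; wherever HTML defines one.
    return htmlentities($utf8, ENT_QUOTES, 'UTF-8');
}

echo win1252ToHtml("\x93Hi\x94 \x97 it\x92s"), "\n";
// prints: &ldquo;Hi&rdquo; &mdash; it&rsquo;s
```

This keeps the typography of the pasted story intact, whereas the translation-table approach trades it for plain ASCII. Which one fits depends on whether the stored text must be ASCII-only.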



