#1: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-23 23:29:16 by dutone

I have some text files that were saved in Windows as ASCII which,
unfortunately, causes the text file to contain non-control chars in
the range that iso-8859-1 defines control chars.

iconv and recode do not convert or drop these 1252 codes (145,146, and
147) to the appropriate iso-8859-1 equivalents and instead give me
garbage.

Is there a utility that I can use to convert the chars appropriately?


#2: Re: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-23 23:43:05 by dutone

On Apr 23, 2:29 pm, dutone <dut...@hotmail.com> wrote:
> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.
>
> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147) to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

Note that I can do this with Perl or sed, e.g. perl -pe"s/\x92/'/g"

But I was wondering whether there is an existing utility, and/or why iconv
and recode don't convert these characters when they could.


#3: Re: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-23 23:47:25 by Lew Pitcher

In comp.unix.shell, dutone wrote:

> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.

That would be impossible to do with /ASCII/. I'm sure that you mean that you
saved the text files in the CP1252 character set (/not/ the ASCII
character set), and are having problems converting from CP1252 to ISO-8859-1.

> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147)

Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
the character value exceeds 127, then you /don't/ have ASCII.

> to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

In CP1252,
character 145 is LEFT SINGLE QUOTATION MARK,
character 146 is RIGHT SINGLE QUOTATION MARK, and
character 147 is LEFT DOUBLE QUOTATION MARK
(courtesy of the ISO Internationalization working group's character set map
at http://anubis.dkuug.dk/i18n/charmaps/CP1252 )

In ISO-8859-1 (http://anubis.dkuug.dk/i18n/charmaps/ISO_8859-1) there
doesn't seem to be a corresponding character (codepoint) for any of those
three characters. By rights, they all should map to the 0x1a (SUB)
character.

I know of no utility save iconv that would convert these for you. Perhaps
you can convert in two stages: CP1252 to Unicode, and Unicode to
ISO-8859-1.
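Lew's two-stage route can be sketched as a shell pipeline. On its own it would hit the same wall, because the curly quotes still have no ISO-8859-1 counterpart once they are Unicode, so this sketch adds a middle step that is an assumption rather than something suggested in the thread: while the text is UTF-8, sed rewrites U+2018/U+2019 to ' and U+201C/U+201D to ". The function name `cp1252_to_latin1` is hypothetical.

```shell
# Hypothetical helper: convert CP1252 text on stdin to ISO-8859-1 on stdout,
# going through UTF-8 as the intermediate "Unicode" stage.
cp1252_to_latin1() {
    # UTF-8 byte sequences (octal) for the curly quotes that CP1252 stores
    # at 145-148: U+2018, U+2019, U+201C, U+201D.
    lsq=$(printf '\342\200\230')
    rsq=$(printf '\342\200\231')
    ldq=$(printf '\342\200\234')
    rdq=$(printf '\342\200\235')

    # Stage 1: CP1252 -> UTF-8.
    # Middle step (assumed, not from the thread): swap curly quotes for ASCII
    #   quotes; LC_ALL=C makes sed match raw bytes regardless of locale.
    # Stage 2: UTF-8 -> ISO-8859-1; anything still unmappable is an error.
    iconv -f WINDOWS-1252 -t UTF-8 |
        LC_ALL=C sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
                     -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" |
        iconv -f UTF-8 -t ISO-8859-1
}
```

Any other character without an ISO-8859-1 equivalent (the euro sign, for instance) still makes the final iconv stage stop with an error, which at least avoids silently emitting garbage.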

Luck be with you
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------


#4: Re: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-24 01:25:39 by dutone

On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
> In comp.unix.shell, dutone wrote:
> > I have some text files that were saved in Windows as ASCII which,
> > unfortunately, causes the text file to contain non-control chars in
> > the range that iso-8859-1 defines control chars.
>
> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> saved the text files in the CP1252 characterset (/not/ the ASCII
> characterset), and are having problems converting from CP1252 to ISO-8859-1

I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
it as ISO-8859-1 but as Windows-1252.

> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> > 147)
>
> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> the character value exceeds 127, then you /don't/ have ASCII

I would expect a Windows-1252 to ISO-8859-1 conversion to replace
145 and 146 with 39 (') and 147 with 34 (").
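The replacement dutone describes needs neither Perl nor iconv; `tr` can rewrite the bytes directly. A minimal sketch, assuming CP1252 input: because CP1252 and ISO-8859-1 agree on bytes 0 to 127 and 160 to 255, only the quote bytes need changing. It also maps 148 (RIGHT DOUBLE QUOTATION MARK), the closing partner of 147, which the thread doesn't mention. The function name `cp1252_quotes_to_ascii` is hypothetical.

```shell
# Hypothetical helper: replace CP1252 curly quotes with ASCII quotes,
# reading stdin and writing stdout. tr takes octal escapes:
#   \221 = 145 LEFT SINGLE QUOTATION MARK   -> \047 = 39 (')
#   \222 = 146 RIGHT SINGLE QUOTATION MARK  -> \047 = 39 (')
#   \223 = 147 LEFT DOUBLE QUOTATION MARK   -> \042 = 34 (")
#   \224 = 148 RIGHT DOUBLE QUOTATION MARK  -> \042 = 34 (")
# LC_ALL=C makes tr treat the input as raw bytes, not multibyte text.
cp1252_quotes_to_ascii() {
    LC_ALL=C tr '\221\222\223\224' '\047\047\042\042'
}
```

For example, `cp1252_quotes_to_ascii < notes.txt > notes-latin1.txt` (hypothetical file names). Other CP1252-only characters in the 128 to 159 range (the euro sign, dashes, the ellipsis) pass through untouched and would still land in ISO-8859-1's control range, so extend both sets if your files contain them.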

Guess I'm sticking with Perl for the conversion.

Thanks.


#5: Re: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-24 01:55:50 by Gary Johnson

dutone <dutone@hotmail.com> wrote:
> On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
>> In comp.unix.shell, dutone wrote:
>> > I have some text files that were saved in Windows as ASCII which,
>> > unfortunately, causes the text file to contain non-control chars in
>> > the range that iso-8859-1 defines control chars.
>>
>> That would be impossible to do with /ASCII/. I'm sure that you mean that you
>> saved the text files in the CP1252 characterset (/not/ the ASCII
>> characterset), and are having problems converting from CP1252 to ISO-8859-1
>
> I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> it as iso-8859-1, rather 1252.
>
>> > iconv and recode do not convert or drop these 1252 codes (145,146, and
>> > 147)
>>
>> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
>> the character value exceeds 127, then you /don't/ have ASCII
>
> I would expect a Windows-1252 to iso-8859-1 conversion to replace
> 145,146 with 39 and ,147 with 34.
>
> Guess I'm sticking with Perl for the conversion.

You can use iconv for this, but you have to add the //TRANSLIT suffix,
like this:

iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

The //TRANSLIT suffix tells iconv to substitute a close equivalent from
the output character set when there is no exact match, and -c silently
discards any character that still can't be converted.
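Gary's command filters standard input to standard output; to convert a set of files in one go, it can be wrapped in a loop. A sketch: the function name `to_latin1` and the `.latin1` output suffix are hypothetical, and how a given character is transliterated can differ between iconv implementations (glibc's choice may also depend on the current locale), so spot-check the results.

```shell
# Hypothetical batch wrapper around the iconv command above: converts each
# CP1252 file named as an argument and writes "<name>.latin1" beside it.
#   -c           silently discard characters that still can't be converted
#   //TRANSLIT   substitute a close equivalent when there is no exact match
to_latin1() {
    for f in "$@"; do
        # The exit status isn't checked: some iconv implementations may
        # return non-zero after -c has discarded characters.
        iconv -c -f WINDOWS-1252 -t ISO-8859-1//TRANSLIT < "$f" > "$f.latin1"
        echo "wrote $f.latin1"
    done
}
```

For example, `to_latin1 *.txt` writes `notes.txt.latin1` for a file `notes.txt`.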

--
Gary Johnson


#6: Re: Windows 1252 to iso-8859-1 without iconv or recode?

Posted on 2008-04-24 04:07:29 by dutone

On Apr 23, 4:55 pm, Gary Johnson <garyj...@eskimo.com> wrote:
> dutone <dut...@hotmail.com> wrote:
> > On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
> >> In comp.unix.shell, dutone wrote:
> >> > I have some text files that were saved in Windows as ASCII which,
> >> > unfortunately, causes the text file to contain non-control chars in
> >> > the range that iso-8859-1 defines control chars.
>
> >> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> >> saved the text files in the CP1252 characterset (/not/ the ASCII
> >> characterset), and are having problems converting from CP1252 to ISO-8859-1
>
> > I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> > it as iso-8859-1, rather 1252.
>
> >> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> >> > 147)
>
> >> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> >> the character value exceeds 127, then you /don't/ have ASCII
>
> > I would expect a Windows-1252 to iso-8859-1 conversion to replace
> > 145,146 with 39 and ,147 with 34.
>
> > Guess I'm sticking with Perl for the conversion.
>
> You can use iconv for this, but you have to add the //TRANSLIT suffix,
> like this:
>
> iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

Oh, cool. They should mention that suffix in GNU's iconv man page.

Thanks
