Windows 1252 to iso-8859-1 without iconv or recode?

on 23.04.2008 23:29:16 by dutone

I have some text files that were saved in Windows as ASCII which,
unfortunately, causes the text file to contain non-control chars in
the range that iso-8859-1 defines control chars.

iconv and recode do not convert or drop these 1252 codes (145,146, and
147) to the appropriate iso-8859-1 equivalents and instead give me
garbage.

Is there a utility that I can use to convert the chars appropriately?

Re: Windows 1252 to iso-8859-1 without iconv or recode?

on 23.04.2008 23:43:05 by dutone

On Apr 23, 2:29 pm, dutone wrote:
> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.
>
> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147) to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

Note that I can do this with Perl or sed, e.g. perl -pe "s/\x92/'/g"

But was wondering if there was an existing util and/or why iconv and
recode don't convert when possible.

Re: Windows 1252 to iso-8859-1 without iconv or recode?

on 23.04.2008 23:47:25 by Lew Pitcher

In comp.unix.shell, dutone wrote:

> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.

That would be impossible to do with /ASCII/. I'm sure that you mean that you
saved the text files in the CP1252 characterset (/not/ the ASCII
characterset), and are having problems converting from CP1252 to ISO-8859-1

> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147)

Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
the character value exceeds 127, then you /don't/ have ASCII

> to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?

In CP1252,
character 145 is LEFT SINGLE QUOTATION MARK,
character 146 is RIGHT SINGLE QUOTATION MARK, and
character 147 is LEFT DOUBLE QUOTATION MARK
(courtesy of the ISO Internationalization working group's characterset map
at http://anubis.dkuug.dk/i18n/charmaps/CP1252 )

In ISO-8859-1 (http://anubis.dkuug.dk/i18n/charmaps/ISO_8859-1) there
doesn't seem to be a corresponding character (codepoint) for any of those
three characters. By rights, they all should map to the 0x1a (SUB)
character.
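
To make the mismatch concrete, here is a minimal Python sketch (Python is not used anywhere in this thread; it is only an illustration). It decodes the three byte values discussed above with Python's built-in cp1252 codec and checks whether each resulting character can be encoded as ISO-8859-1. The helper name fits_latin1 is made up for this example.

```python
# The three CP1252 byte values from the thread: 145, 146, 147 (0x91, 0x92, 0x93).
raw = bytes([145, 146, 147])

# Decoding as CP1252 yields typographic quotation marks.
text = raw.decode("cp1252")
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)


def fits_latin1(ch: str) -> bool:
    """Return True if the character has an ISO-8859-1 encoding."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# None of the three characters exists in ISO-8859-1.
print([fits_latin1(ch) for ch in text])
```

The characters decode to U+2018, U+2019 and U+201C, all outside the U+0000 to U+00FF range that ISO-8859-1 covers, which is why a strict converter has nothing to map them to.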

I know of no utility save iconv that would convert these for you. Perhaps
you can convert in two stages: CP1252 to Unicode, and Unicode to
ISO-8859-1.
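
A two-stage conversion along those lines could look like the following Python sketch. It is an illustration under assumptions, not an existing utility: the names QUOTE_MAP and cp1252_to_latin1 are invented here, and the choice of ASCII look-alikes for the quotation marks is a judgment call. Without that substitution step in the middle, the second stage would still have no target for the quotation marks.

```python
# Hypothetical mapping from typographic quotes to ASCII look-alikes.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK  (CP1252 byte 145)
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (CP1252 byte 146)
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK  (CP1252 byte 147)
})


def cp1252_to_latin1(data: bytes) -> bytes:
    """Convert CP1252 bytes to ISO-8859-1 via an intermediate Unicode string."""
    # Stage 1: CP1252 -> Unicode. Python's codec raises UnicodeDecodeError
    # for the few byte values CP1252 leaves undefined (for example 0x81).
    text = data.decode("cp1252")
    # Substitute the characters that ISO-8859-1 cannot represent.
    text = text.translate(QUOTE_MAP)
    # Stage 2: Unicode -> ISO-8859-1. Anything still unmappable becomes "?".
    return text.encode("iso-8859-1", errors="replace")


sample = b"\x93it\x92s caf\xe9"
print(cp1252_to_latin1(sample))
```

Characters that both encodings share, such as the accented e at byte 0xE9 in the sample, pass through unchanged.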

Luck be with you
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Re: Windows 1252 to iso-8859-1 without iconv or recode?

on 24.04.2008 01:25:39 by dutone

On Apr 23, 2:47 pm, Lew Pitcher wrote:
> In comp.unix.shell, dutone wrote:
> > I have some text files that were saved in Windows as ASCII which,
> > unfortunately, causes the text file to contain non-control chars in
> > the range that iso-8859-1 defines control chars.
>
> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> saved the text files in the CP1252 characterset (/not/ the ASCII
> characterset), and are having problems converting from CP1252 to ISO-8859-1

I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
it as iso-8859-1, rather 1252.

> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> > 147)
>
> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> the character value exceeds 127, then you /don't/ have ASCII

I would expect a Windows-1252 to iso-8859-1 conversion to replace
145 and 146 with 39, and 147 with 34.
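
That byte-for-byte replacement can be sketched in Python as below. This is only an illustration of the mapping described above; the names TABLE and replace_quotes are made up. It works at the byte level because CP1252 and ISO-8859-1 assign the same characters to bytes 0 to 127 and 160 to 255, so only the bytes listed in the table need to change.

```python
# Byte-level translation table: 145 -> 39, 146 -> 39, 147 -> 34.
TABLE = bytes.maketrans(bytes([145, 146, 147]), bytes([39, 39, 34]))


def replace_quotes(data: bytes) -> bytes:
    """Replace CP1252 quote bytes with ASCII apostrophe and double quote."""
    return data.translate(TABLE)


print(replace_quotes(bytes([147, 104, 105, 146])))
```

Any other byte in the 128 to 159 range is left untouched by this table and would still land on an ISO-8859-1 control code.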

Guess I'm sticking with Perl for the conversion.

Thanks.

Re: Windows 1252 to iso-8859-1 without iconv or recode?

on 24.04.2008 01:55:50 by Gary Johnson

dutone wrote:
> On Apr 23, 2:47 pm, Lew Pitcher wrote:
>> In comp.unix.shell, dutone wrote:
>> > I have some text files that were saved in Windows as ASCII which,
>> > unfortunately, causes the text file to contain non-control chars in
>> > the range that iso-8859-1 defines control chars.
>>
>> That would be impossible to do with /ASCII/. I'm sure that you mean that you
>> saved the text files in the CP1252 characterset (/not/ the ASCII
>> characterset), and are having problems converting from CP1252 to ISO-8859-1
>
> I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> it as iso-8859-1, rather 1252.
>
>> > iconv and recode do not convert or drop these 1252 codes (145,146, and
>> > 147)
>>
>> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
>> the character value exceeds 127, then you /don't/ have ASCII
>
> I would expect a Windows-1252 to iso-8859-1 conversion to replace
> 145 and 146 with 39, and 147 with 34.
>
> Guess I'm sticking with Perl for the conversion.

You can use iconv for this, but you have to add the //TRANSLIT suffix,
like this:

iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

That tells iconv to choose a symbol from the output character set that
is close to the desired symbol.
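
For comparison, the same idea of falling back to a nearby symbol can be approximated in Python with a custom encoding error handler. This is a rough analogue only: the FALLBACKS table, the handler name "approximate" and the function to_latin1 are assumptions made for this sketch, and the table is far smaller than whatever iconv consults.

```python
import codecs

# Hypothetical fallback table; not iconv's actual transliteration data.
FALLBACKS = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def approximate(err):
    """Error handler: substitute a close ASCII symbol, or '?' if none is known."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(FALLBACKS.get(ch, "?") for ch in bad)
    # Return the replacement text and the position at which to resume encoding.
    return replacement, err.end


codecs.register_error("approximate", approximate)


def to_latin1(text: str) -> bytes:
    """Encode to ISO-8859-1, approximating characters it lacks."""
    return text.encode("iso-8859-1", errors="approximate")


print(to_latin1("\u201cna\u00efve\u201d"))
```

Characters that ISO-8859-1 already contains, such as the i with diaeresis in the sample, are encoded normally; the handler is called only for the ones it lacks.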

--
Gary Johnson

Re: Windows 1252 to iso-8859-1 without iconv or recode?

on 24.04.2008 04:07:29 by dutone

On Apr 23, 4:55 pm, Gary Johnson wrote:
> dutone wrote:
> > On Apr 23, 2:47 pm, Lew Pitcher wrote:
> >> In comp.unix.shell, dutone wrote:
> >> > I have some text files that were saved in Windows as ASCII which,
> >> > unfortunately, causes the text file to contain non-control chars in
> >> > the range that iso-8859-1 defines control chars.
>
> >> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> >> saved the text files in the CP1252 characterset (/not/ the ASCII
> >> characterset), and are having problems converting from CP1252 to ISO-8859-1
>
> > I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> > it as iso-8859-1, rather 1252.
>
> >> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> >> > 147)
>
> >> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> >> the character value exceeds 127, then you /don't/ have ASCII
>
> > I would expect a Windows-1252 to iso-8859-1 conversion to replace
> > 145 and 146 with 39, and 147 with 34.
>
> > Guess I'm sticking with Perl for the conversion.
>
> You can use iconv for this, but you have to add the //TRANSLIT suffix,
> like this:
>
> iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

Oh, cool. They should mention that suffix in GNU's iconv man page.

Thanks