strange Dos metacharacters

on 03.04.2008 04:28:02 by simonp

I have a text file of .srt subtitles downloaded in DOS format,
and want to convert it to Unix text style. I have no problem
removing the ^M character with sed (using Ctrl-V Ctrl-M).

But every other character in the file is this "^@":

^@m^@a^@ ^@p^@i^@c^@c^@o^@l^@o^@ ^@t^@a^@g^@l^@i^@o^@.^@^@

I can't seem to produce this control sequence on the keyboard.
Does anyone know what it is?

Cheers,
Simon

--
Spectral Horse Poems
ww.spectralhorse.com
Coins in the Void
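A side note on the ^M step: instead of typing the carriage return into the sed command with Ctrl-V Ctrl-M, a script can generate it with printf, which understands backslash escapes such as \r. This is a minimal sketch, assuming a POSIX shell and sed. The file names `dos.txt` and `unix.txt` are hypothetical, made up for the example:

```sh
#!/bin/sh
# Build a small sample with DOS (CRLF) line endings; dos.txt is a hypothetical name
printf 'line one\r\nline two\r\n' > dos.txt

# Capture a literal carriage return (the ^M that Ctrl-V Ctrl-M would insert)
CR=$(printf '\r')

# Delete a CR at the end of each line; \$ becomes a literal $ (end-of-line anchor)
sed "s/${CR}\$//" dos.txt > unix.txt

# cat -v would show ^M for any carriage return left over
cat -v unix.txt
```

`tr -d '\r' < dos.txt > unix.txt` is a shorter alternative, though it removes carriage returns anywhere in the line, not only at the end.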

Re: strange Dos metacharacters

on 03.04.2008 05:04:27 by James Michael Fultz

* simonp@nospam.com :
> I have a text file of .srt subtitles downloaded in DOS format,
> and want to convert it to Unix text style. I have no problem
> removing the ^M character with sed (using Ctrl-V Ctrl-M).
>
> But every other character in the file is this "^@":
>
> ^@m^@a^@ ^@p^@i^@c^@c^@o^@l^@o^@ ^@t^@a^@g^@l^@i^@o^@.^@^@
>
> I can't seem to produce this control sequence on the keyboard.
> Does anyone know what it is?
>
> Cheers,
> Simon

Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
or UTF-8. You'll need something like iconv to convert it.
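Some background on why the file looks like this: ^@ is how cat -v (and editors such as vi) display a NUL byte, 0x00. UTF-16 stores each character as a 16-bit unit, so a plain ASCII letter comes out as the letter's byte plus a zero byte, which produces the every-other-character pattern in the question. This is a small sketch, assuming iconv and od are installed, that encodes a short string as UTF-16LE (little-endian, the usual byte order for files produced on Windows) and dumps the bytes:

```sh
#!/bin/sh
# Encode "ma" as little-endian UTF-16 and show each byte in hex: 6d 00 61 00
printf 'ma' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1

# The same bytes through cat -v: each zero byte is rendered as ^@, giving m^@a^@
printf 'ma' | iconv -f UTF-8 -t UTF-16LE | cat -v
echo
```

In the question's sample the ^@ leads each letter rather than following it. That is probably just where the line break fell: with little-endian data, the zero byte belonging to the previous line's newline starts the next line.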

--
James Michael Fultz

Re: strange Dos metacharacters

on 03.04.2008 05:26:36 by mop2

In bash, in an xterm on my Slackware system:

$ for i in $(seq 0 255)
> do printf "$i \x`printf %x $i`\n"
> done|cat -v|grep @
0 ^@
64 @
128 M-^@
192 M-@
$
$ echo -e "\x00"|cat -v
^@
$
$ echo -e "\000"|cat -v
^@
$
$ echo -e "m\000a\rc"|cat -v
m^@a^Mc
$ echo -e "m\000a\rc"|tr -d '\000\r'|cat -v
mac
$
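Deleting the zero bytes with tr, as in the last command above, is a quick fix that works when the text is plain ASCII. It has limits, though. The byte-order mark (FF FE) that usually starts a UTF-16 file survives, and a character outside ASCII, such as the è common in Italian, is left as a single Latin-1 byte instead of valid UTF-8. The following sketch compares the two approaches on a hypothetical UTF-16LE file `accent.srt` that holds a BOM, "è", and a CRLF:

```sh
#!/bin/sh
# Hypothetical UTF-16LE sample: BOM (FF FE), "è" (U+00E8 -> E8 00), CR LF (0D 00 0A 00)
printf '\377\376\350\000\r\000\n\000' > accent.srt

# tr approach: the BOM bytes and a bare Latin-1 E8 byte remain (ff fe e8 0a)
tr -d '\000\r' < accent.srt | od -An -tx1

# iconv approach: the BOM selects the byte order and is dropped,
# and è is re-encoded as UTF-8 (c3 a8 0a)
iconv -f UTF-16 -t UTF-8 accent.srt | tr -d '\r' | od -An -tx1
```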

Re: strange Dos metacharacters

on 03.04.2008 06:20:25 by simonp

James Michael Fultz wrote:
> * simonp@nospam.com :
>> I have a text file of .srt subtitles downloaded in DOS format,
>> and want to convert it to Unix text style. I have no problem
>> removing the ^M character with sed (using Ctrl-V Ctrl-M).
>>
>> But every other character in the file is this "^@":
>>
>> ^@m^@a^@ ^@p^@i^@c^@c^@o^@l^@o^@ ^@t^@a^@g^@l^@i^@o^@.^@^@
>>
>> I can't seem to produce this control sequence on the keyboard.
>> Does anyone know what it is?
>>
>> Cheers,
>> Simon
>
> Looks as if you have a UTF-16 encoded file as opposed to ASCII, Latin1,
> or UTF-8. You'll need something like iconv to convert it.
>

Thanks for the tip; that was exactly the problem.

The _file_ utility (which I just discovered) identified it as
UTF-16, and iconv converted it to ASCII easily.

(Turns out the subtitles are in Italian though.)
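For readers with the same problem, the whole fix can be done in a few commands: let file report the encoding, then convert with iconv and drop the carriage returns. This is a hedged sketch that uses the hypothetical file names `subs.srt` and `subs-unix.srt`. It assumes the file has no byte-order mark, so the byte order is named explicitly as UTF-16LE. Plain `-f UTF-16` relies on a BOM to pick the byte order, and with a BOM present it is the safer choice. Converting to UTF-8 rather than ASCII keeps accented letters, which matters for Italian text: with `-t ASCII`, iconv reports an error at the first character it cannot represent.

```sh
#!/bin/sh
# Hypothetical stand-in for a downloaded subtitle: UTF-16LE, CRLF endings, no BOM
printf '1\r\nciao\r\n' | iconv -f UTF-8 -t UTF-16LE > subs.srt

# Let file(1) report the encoding, if it is installed
if command -v file >/dev/null 2>&1; then
    file subs.srt
fi

# No BOM, so name the byte order explicitly; convert to UTF-8 and drop the CRs
iconv -f UTF-16LE -t UTF-8 subs.srt | tr -d '\r' > subs-unix.srt
cat -v subs-unix.srt
```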

Cheers,
Simon
