odd file with ^@ characters
on 03.01.2008 19:08:45 by cartercc
I was given an odd CSV file this morning, with data fields delimited by
",". I do not know the provenance, and neither does the person who
gave it to me. It's an 11M data file with 1279 rows that
contains data for an urgent report, and the target system is Windows.
Neither Excel nor Access will take the file, but it opens fine
with cat, head, etc.
I moved the file to my Unix system and looked at it in vi. This is
what it looks like, for example, 'perl is good' looks like this:
^@p^@e^@r^@l^@ ^@i^@s^@ ^@g^@o^@o^@d
When I open the output file in Windows, I get either the ? or the
square unrecognizable-character symbols.
I ~think~ I know what the problem is, but I haven't found the answer.
Here are some of the things I've tried. The best result is a file with
ASCII characters but with large strings of ?????????????????? at the
end of each line. Is this an encoding problem? If so, how do I
convert the characters into plain ASCII that Excel or Access will
accept?
Thanks, CC
(code follows)
#!/usr/bin/perl -w
use strict;
use Encode; #tried various decode functions, also binmode
#open INFILE, "
open INFILE, "<:utf8", 'BB.TXT';
open OUTFILE, ">:utf8", 'cleanBB.txt';
while (<INFILE>)
{
# print $_;
my $line = $_;
# $line = decode_utf8($line);
# $line =~ s/\x{0000}//g;
# $line =~ s/<.*>//g;
$line =~ s/[^[:ascii:]]+//g;
$line =~ s/<(.*)>//g; #removed file data between <>, not HTML.
print OUTFILE $line;
}
close INFILE;
close OUTFILE;
Re: odd file with ^@ characters
on 03.01.2008 19:30:01 by Peter Makholm
cartercc@gmail.com writes:
> I moved the file to my Unix system and looked at it in vi. This is
> what it looks like, for example, 'perl is good' looks like this:
> ^@p^@e^@r^@l^@ ^@i^@s^@ ^@g^@o^@o^@d
Could be UTF-16BE or UCS-2BE.
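One way to check that guess is to look at the first raw bytes instead of the vi display, since vi splits lines on the newline byte and can make either byte order look like it starts with ^@. The following is only a sketch: guess_utf16_order() is a hypothetical helper, and the no-BOM branches assume the file begins with an ASCII character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the UTF-16 byte order from the first raw bytes.
sub guess_utf16_order {
    my ($bytes) = @_;
    return 'UTF-16BE (BOM)' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE (BOM)' if $bytes =~ /\A\xFF\xFE/;
    # No BOM: for ASCII text the zero byte comes first in BE, second in LE.
    return 'UTF-16BE (no BOM)' if $bytes =~ /\A\x00[^\x00]/;
    return 'UTF-16LE (no BOM)' if $bytes =~ /\A[^\x00]\x00/;
    return 'unknown';
}

# Usage: perl guess_order.pl BB.TXT   (the script name is made up)
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $head = '';
    read $fh, $head, 4;    # the first few bytes are enough for this check
    close $fh;
    print guess_utf16_order($head), "\n";
}
```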
//Makholm
Re: odd file with ^@ characters
on 03.01.2008 19:38:08 by jurgenex
cartercc@gmail.com wrote:
>I was given an odd CSV file this morning, with data fields delimited by
>",". I do not know the provenance, and neither does the person who
>gave it to me. It's an 11M data file with 1279 rows that
>contains data for an urgent report, and the target system is Windows.
>Neither Excel nor Access will take the file, but it opens fine
>with cat, head, etc.
>
>I moved the file to my Unix system and looked at it in vi. This is
>what it looks like, for example, 'perl is good' looks like this:
>^@p^@e^@r^@l^@ ^@i^@s^@ ^@g^@o^@o^@d
Nothing to do with Perl, but it appears that the file is encoded in a 16-bit
encoding, most likely UTF-16.
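If a script is wanted anyway, the posted attempt could be adapted by swapping the input layer, roughly as sketched below. This assumes the file really is UTF-16 with a BOM; utf16_to_utf8_file() is a made-up name, and whether Excel or Access accepts the UTF-8 result is not something this sketch can promise. If the data contains only ASCII characters, the UTF-8 output is byte-for-byte plain ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: assumes UTF-16 input *with* a BOM. If there is no BOM,
# ':encoding(UTF-16)' dies; name the byte order explicitly instead,
# e.g. ':encoding(UTF-16BE)' or ':encoding(UTF-16LE)'.
sub utf16_to_utf8_file {
    my ($in_name, $out_name) = @_;
    open my $in, '<:raw:encoding(UTF-16)', $in_name
        or die "Cannot read $in_name: $!";
    open my $out, '>:raw:encoding(UTF-8)', $out_name
        or die "Cannot write $out_name: $!";
    while (my $line = <$in>) {
        print {$out} $line;    # line endings are passed through unchanged
    }
    close $in;
    close $out or die "Cannot close $out_name: $!";
    return;
}

# Usage: perl utf16_to_utf8.pl BB.TXT cleanBB.txt   (script name is made up)
utf16_to_utf8_file(@ARGV) if @ARGV == 2;
```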
>When I open the output file in Windows, I get either the ? or the
>square unrecognizable-character symbols.
This may or may not work: Try opening the file in Firefox (yes, in a web
browser), and change "View -> Encoding -> More" to Unicode(UTF16).
This should give you a readable display which then you can either
copy-and-paste or even Save-As in a different encoding that is compatible
with your other tools.
Another option: Windows tools have the habit of adding a byte order mark
(BOM) to any Unicode file, whether it's needed or not. Maybe it's just
that whatever program created that file did not write the BOM and therefore
the Windows programs don't recognize the encoding.
If that is the case you could use your favourite editor to just inject the
BOM at the beginning of the file.
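If an editor is not handy, the BOM could also be prepended with a few lines of Perl. This is a rough sketch: add_utf16_bom() is a hypothetical helper, the caller has to know the byte order ('BE' or 'LE') already, and there is no guarantee that the Windows programs will accept the file afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prepend a UTF-16 BOM to raw bytes unless one is there.
sub add_utf16_bom {
    my ($bytes, $order) = @_;    # $order is 'BE' or 'LE' (must be known)
    return $bytes if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;
    my $bom = $order eq 'BE' ? "\xFE\xFF" : "\xFF\xFE";
    return $bom . $bytes;
}

# Usage: perl add_bom.pl BE BB.TXT BB_bom.txt   (file names are made up)
if (@ARGV == 3) {
    my ($order, $in_name, $out_name) = @ARGV;
    open my $in, '<:raw', $in_name or die "Cannot read $in_name: $!";
    my $bytes = do { local $/; <$in> };    # slurp the whole file as bytes
    close $in;
    $bytes = '' unless defined $bytes;
    open my $out, '>:raw', $out_name or die "Cannot write $out_name: $!";
    print {$out} add_utf16_bom($bytes, $order);
    close $out or die "Cannot close $out_name: $!";
}
```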
jue
>
>I ~think~ I know what the problem is, but I haven't found the answer.
>Here are some of the things I've tried. The best result is a file with
>ASCII characters but with large strings of ?????????????????? at the
>end of each line. Is this an encoding problem? If so, how do I
>convert the characters into plain ASCII that Excel or Access will
>accept?
>
>Thanks, CC
>(code follows)
>
>#!/usr/bin/perl -w
>use strict;
>use Encode; #tried various decode functions, also binmode
>
>#open INFILE, "
>open INFILE, "<:utf8", 'BB.TXT';
>open OUTFILE, ">:utf8", 'cleanBB.txt';
>
>while (<INFILE>)
>{
># print $_;
> my $line = $_;
># $line = decode_utf8($line);
># $line =~ s/\x{0000}//g;
># $line =~ s/<.*>//g;
> $line =~ s/[^[:ascii:]]+//g;
> $line =~ s/<(.*)>//g; #removed file data between <>, not HTML.
> print OUTFILE $line;
>}
>close INFILE;
>close OUTFILE;
Re: odd file with ^@ characters
on 03.01.2008 20:55:43 by cartercc
On Jan 3, 1:38 pm, Jürgen Exner wrote:
> Nothing to do with Perl, but it appears that the file is encoded in a 16-bit
> encoding, most likely UTF-16.
Yes, thanks, UTF16 it is. Since the guys who will work with this will
use MS apps, I'll bow out ... but I may be back if they ask me to do
something funky with the file (like create a script to spit out a
report, which they may do since I think this will be a continuing
task.)
CC