REGEX help needed on binary file
on 09.09.2005 23:54:37 by Chris
Not a PERL newbie, but closer to that than expert.
We receive a binary .afp (Advanced Function Printing) file with EBCDIC
hex coding. I'm working from the Windows platform.
Ultimately I need to be able to search the file for proprietary account
numbers in NNNN-NNNN-NNNN-NNNN format. The eventual goal is to mask out
the sensitive data using something like an 'x' (\xA7) in place of each
digit. But I'm having problems with the regex for (any) hex digit.
After modification, the file has to be processed by a piece of
third-party software that is expecting the original format, so I'm very
hesitant to do any sort of conversion/translation if I can help it.
Here is the test script so far:
Note: The first line of the file includes the EBCDIC string
C3D7E6F6F0F4F6, which translates to CPW6046 in ASCII (according to my
Hex editor).
$infile = "P6046csf.afp";
$path = "c://";
open (INFILE, $path.$infile) or die "can't open file $path $infile $!";
binmode (INFILE);
$line = <INFILE>;
if ($line=~m/\xC3\xD7\xE6\xF6\xF0\xF4\xF6/){ print "first test - item found on line\n";}
#the above test succeeds using 'literal' hex values
#now try to match a non-literal hex digit in place of the \xC3 literal
if ($line=~m/\x[0-9a-fA-F][0-9a-fA-F]\xD7\xE6\xF6\xF0\xF4\xF6/
{print "second test - item found on line\n"}
else
{print "second test - item not matched with [0-9A-Fa-f]\n"}
#The second test fails to match
close INFILE;
Everything I can find recommends the \x[0-9a-fA-F][0-9a-fA-F] approach,
or some variant of it, in a regex. But since none of those variants are
working for me, I need some alternate suggestions.
Thanks in advance,
Chris Mack
Westminster, Co
Re: REGEX help needed on binary file
on 10.09.2005 00:28:40 by Chris
Sorry, I missed the closing parenthesis on the second IF when I
copied and pasted it into the message. It should read:
if ($line=~m/\x[0-9a-fA-F][0-9a-fA-F]\xD7\xE6\xF6\xF0\xF4\xF6/ )
{print "second test - item found on line\n"}
else
{print "second test - item not matched with [0-9A-Fa-f]\n"}
#The second test fails to match
Chris
Re: REGEX help needed on binary file
on 10.09.2005 05:54:59 by someone
chris wrote:
> Not a PERL newbie, but closer to that than expert.
>
> We receive a binary .afp (Advanced Function Printing) file with EBCDIC
> hex coding. I'm working from the Windows platform.
>
> Ultimately I need to be able to search the file for proprietary account
> numbers in NNNN-NNNN-NNNN-NNNN format. The eventual goal is to mask out
> the sensitive data using something like an 'x' (\xA7) in place of each
> digit. But I'm having problems with the regex for (any) hex digit.
>
> After modification, the file has to be processed by a piece of
> third-party software that is expecting the original format, so I'm very
> hesitant to do any sort of conversion/translation if I can help it.
>
> Here is the test script so far:
>
> Note: The first line of the file includes the EBCDIC string
> C3D7E6F6F0F4F6, which translates to CPW6046 in ASCII (according to my
> Hex editor).
>
> $infile = "P6046csf.afp";
> $path = "c://";
> open (INFILE, $path.$infile) or die "can't open file $path $infile $!";
> binmode (INFILE);
> $line = <INFILE>;
>
> if ($line=~m/\xC3\xD7\xE6\xF6\xF0\xF4\xF6/){ print "first test - item found on line\n";}
>
> #the above test succeeds using 'literal' hex values
> #now try to match a non-literal hex digit in place of the \xC3 literal
/\xC3/ represents *a single character*.
> if ($line=~m/\x[0-9a-fA-F][0-9a-fA-F]\xD7\xE6\xF6\xF0\xF4\xF6/
> {print "second test - item found on line\n"}
> else
> {print "second test - item not matched with [0-9A-Fa-f]\n"}
> #The second test fails to match
/\x[0-9a-fA-F][0-9a-fA-F]/ represents *three characters*.
You probably want:
/.\xD7\xE6\xF6\xF0\xF4\xF6/s
Here the . with the /s option will match *any* single character, including
a newline byte (\x0A), which can occur anywhere in binary data.
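A minimal, self-contained check of this suggestion might look like the following. Instead of reading P6046csf.afp it builds the sample bytes in memory, and `has_pw6046` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the bytes contain any single byte
# followed by EBCDIC "PW6046" (D7 E6 F6 F0 F4 F6), otherwise 0.
sub has_pw6046 {
    my ($bytes) = @_;
    # '.' matches exactly one byte; /s lets it match \x0A as well.
    return $bytes =~ /.\xD7\xE6\xF6\xF0\xF4\xF6/s ? 1 : 0;
}

my $sample = "\xC3\xD7\xE6\xF6\xF0\xF4\xF6";    # EBCDIC "CPW6046"
print has_pw6046($sample) ? "item found\n" : "item not found\n";
```

Without /s, a \x0A byte in front of \xD7 would not be matched by the dot, even though in a binary file 0x0A is just another byte value.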
Have you read the perlebcdic man page?
perldoc perlebcdic
John
--
use Perl;
program
fulfillment
Re: REGEX help needed on binary file
on 10.09.2005 23:21:38 by Joe Smith
chris wrote:
> We receive a binary .afp (Advanced Function Printing) file with EBCDIC
> hex coding.
From the rest of your program, it is clear that you are receiving
a file with EBCDIC encoding, not "hex coding".
> Ultimately I need to be able to search the file for proprietary account
> numbers in NNNN-NNNN-NNNN-NNNN format. The eventual goal is to mask out
> the sensitive data using something like an 'x' (\xA7) in place of each
> digit. But I'm having problems with the regex for (any) hex digit.
The file is EBCDIC characters, not "hex digits".
You want to look for characters that match EBCDIC('0') through
EBCDIC('9').
> Note: The first line of the file includes the EBCDIC string
> C3D7E6F6F0F4F6, which translates to CPW6046 in ASCII (according to my
> Hex editor).
The first line of the file includes seven binary bytes.
Those bytes, when displayed as hex, show up as "C3D7E6F6F0F4F6".
Those bytes, when displayed as ASCII, show up as "CPW6046".
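The distinction can be checked with a short sketch: the same seven bytes are rendered once as hex with `unpack`, and once as text by decoding them with the core Encode module. Code page 037 (`cp37`) is an assumption here, since AFP files may use another EBCDIC code page, and the decoding is for display only; the file itself is never converted.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The seven raw bytes from the first line of the file.
my $bytes = "\xC3\xD7\xE6\xF6\xF0\xF4\xF6";

# Hex is only a way of displaying bytes: unpack 'H*' renders them as hex text.
my $as_hex = uc unpack('H*', $bytes);

# Decoding as EBCDIC (code page 037 assumed) renders the same bytes as characters.
my $as_text = decode('cp37', $bytes);

printf "%d bytes, hex %s, text %s\n", length $bytes, $as_hex, $as_text;
```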
> if ($line=~m/\xC3\xD7\xE6\xF6\xF0\xF4\xF6/){ print "first test - item found on line\n";}
>
> #the above test succeeds using 'literal' hex values
> #now try to match a non-literal hex digit in place of the \xC3 literal
You've got a conceptual error there. You are not working with hex
digits. You are working with bytes. I expect that you want to accept
as a match a single byte: either m/./ or something that matches the
EBCDIC code for a single alphanumeric character.
This will match a *single character* in any of the ranges A-I, J-R, S-Z,
0-9, a-i, j-r or s-z (that is, any EBCDIC letter or digit):
m/[\xC1-\xC9\xD1-\xD9\xE2-\xE9\xF0-\xF9\x81-\x89\x91-\x99\xA2-\xA9]/;
> Everything I can find recommends the \x[0-9a-fA-F][0-9a-fA-F] approach,
> or some variant of it, in a regex.
As shown in my regex above, the characters in \x__ form need to be
inside the square brackets, with hyphens between two such \x__
characters.
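A sketch of that class in use, applied to the lead byte in front of EBCDIC "PW6046". The class is Joe's, with the ranges written as `\xF0-\xF9` and `\xA2-\xA9`; `lead_byte_hex` is a hypothetical helper name, and the sample bytes are built in memory rather than read from the file.

```perl
use strict;
use warnings;

# One EBCDIC letter or digit: A-I, J-R, S-Z, 0-9, a-i, j-r, s-z.
my $ebcdic_alnum =
    qr/[\xC1-\xC9\xD1-\xD9\xE2-\xE9\xF0-\xF9\x81-\x89\x91-\x99\xA2-\xA9]/;

# Hypothetical helper: if one EBCDIC alphanumeric byte is followed by
# EBCDIC "PW6046", return that byte as lowercase hex; otherwise undef.
sub lead_byte_hex {
    my ($bytes) = @_;
    return $bytes =~ /($ebcdic_alnum)\xD7\xE6\xF6\xF0\xF4\xF6/
        ? unpack('H*', $1)
        : undef;
}

my $hex = lead_byte_hex("\xC3\xD7\xE6\xF6\xF0\xF4\xF6");
print defined $hex ? "matched lead byte $hex\n" : "no match\n";
```

An EBCDIC space (0x40) in the lead position is not in the class, so that input returns undef.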
-Joe
Re: REGEX help needed on binary file
on 12.09.2005 18:30:56 by Chris
I appreciate the replies... and the conceptual shift. I knew it had to
be something simple, just outside my grasp.
[\xF0-\xF9]{4} does exactly what I need to match the number groupings.
I'll be studying perldoc perlebcdic soon.
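Building on `[\xF0-\xF9]{4}`, here is a hedged sketch of the eventual masking step. It assumes EBCDIC code page 037, where the hyphen is byte 0x60 and 'x' is 0xA7; `mask_accounts` and `mask_file` are hypothetical names. Because each digit byte is replaced by exactly one byte, the file length and all byte offsets stay the same, which should preserve the layout the third-party software expects.

```perl
use strict;
use warnings;

# Assumes EBCDIC code page 037: digits '0'..'9' are F0..F9, '-' is 0x60
# and 'x' is 0xA7. Pattern: NNNN-NNNN-NNNN-NNNN in those bytes.
my $acct_re = qr/[\xF0-\xF9]{4}(?:\x60[\xF0-\xF9]{4}){3}/;

# Hypothetical helper: replace each digit byte inside every matched account
# number with 0xA7, one byte for one byte, leaving everything else alone.
sub mask_accounts {
    my ($bytes) = @_;
    $bytes =~ s{($acct_re)}{ my $m = $1; $m =~ tr/\xF0-\xF9/\xA7/; $m }ge;
    return $bytes;
}

# Hypothetical wrapper: read the whole file as raw bytes, mask, write a copy.
sub mask_file {
    my ($in_path, $out_path) = @_;
    open my $in, '<:raw', $in_path or die "can't open $in_path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    open my $out, '>:raw', $out_path or die "can't open $out_path: $!";
    print {$out} mask_accounts($bytes);
    close $out or die "can't close $out_path: $!";
    return;
}

# Demo on in-memory bytes: EBCDIC "1234-1234-1234-1234".
my $demo = ("\xF1\xF2\xF3\xF4\x60" x 3) . "\xF1\xF2\xF3\xF4";
print unpack('H*', mask_accounts($demo)), "\n";
```

Calling `mask_file('c:/P6046csf.afp', 'c:/P6046csf_masked.afp')` would write a masked copy and leave the original untouched (the output name is only an example). Slurping the whole file with `local $/` avoids deciding what a "line" means in binary data, at the cost of holding the file in memory.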
Thanks,
Chris Mack