Another regex

am 02.03.2007 22:54:26 von amit hetawal

Hello,
Thanks for your previous responses.
This time I am using the right syntax for matching. Given a string like:

$temp= "XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB";

I need to write a regex that extracts the substrings lying between these markers:

AAA
BBB
CCC

So in the above case I should get this output:

AAAZZZZBBB
BBBSSSSCCC
CCCGGGGBBB
BBBVVVVVBBB

meaning all combinations of start and end markers for AAA, BBB and CCC.

I have the regex for one pair of them, but how do I do it for
all 3 of them at once?

$temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

@t = ($temp =~/(AAA)(.*?)(BBB)/g);
foreach (@t)
{

print $_;

}

I am not able to figure out how to go about it when, right after that match,
I need to match
BBBSSSSCCC.

Any suggestions?

Thanks
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Another regex

am 03.03.2007 00:27:13 von amit hetawal

Yes, it is a DNA sequence I need to search.

But I still don't see how I should go about it.

Can you advise something?

Thanks

On 3/2/07, Deane.Rothenmaier@walgreens.com
wrote:
>
> If those letters were different, I'd think you were working on a chunk of
> DNA... P-))
>
> Deane Rothenmaier
> Programmer/Analyst
> Walgreens Corp.
> 847-914-5150
>
> "On two occasions I have been asked [by members of Parliament], 'Pray, Mr.
> Babbage, if you put into the machine wrong figures, will the right answers
> come out?' I am not able rightly to apprehend the kind of confusion of ideas
> that could provoke such a question." -- Charles Babbage

Re: Another regex

am 03.03.2007 01:09:17 von Andy_Bach

Well, some of it depends upon how consistent your markers are:

> $temp= "XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB"

> I need to write a regex for filtering out the string between.
> AAA
> BBB
> CCC

> so in the above case i should have the output as:
> AAAZZZZBBB
> BBBSSSSCCC
> CCCGGGGBBB
> BBBVVVVVBBB
> meaning all combinations of start and end for AAA BBB CCC.

So you want the markers and what's between them - will there always be a
begin/end set of markers, but just of different content?

> I have the regex for one of them but how do i do it simultaneously for
> all 3 of them.

> $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
>
> @t = ($temp =~/(AAA)(.*?)(BBB)/g);
> foreach (@t)
> {
> print $_;
> }

So, use alternation to create marker sets (note, you need to add "\n"
to the end of your print statements or the output all runs together, which
makes it seem like it's working ... sort of):

my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/(AAA|BBB|CCC)(.*?)(AAA|BBB|CCC)/g);
foreach (@t) {
print "Got: ", $_, "\n";
}

That sort of works - it gets:
Got: AAA
Got: ZZZZ
Got: BBB
Got: CCC
Got: GGGG
Got: BBB

You want to capture the whole shebang, so we use both the capturing parens
and, because we're using the alternation pipe "|", the non-capturing
parens (which are "(?:....)") to group our alternatives:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @t = ($temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g);
foreach (@t) {
print "Got: ", $_, "\n";
}

Got: AAAZZZZBBB
Got: CCCGGGGBBB

But this isn't quite right, as it's not reusing the end marker of one match
as the start marker of the next. This gets trickier: you want to restart the
match at the end marker of the previous match, not just after it. First, let's
go to the cool
while ( /.../g ) {

loop - note the change to '$1' in the print:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
print "Got: ", $1, "\n";
}

Got: AAAZZZZBBB
Got: CCCGGGGBBB
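
To see why BBBSSSSCCC and BBBVVVVVBBB go missing, it can help to print pos() after each match. What follows is only a sketch (the @resume_at array is a name made up for this illustration, not something from the posts above); it reuses the same pattern and records where the next /g search will start:

```perl
use strict;
use warnings;

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

# Hypothetical helper array: the offsets at which the next search begins.
my @resume_at;
while ($temp =~ /((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
    push @resume_at, pos($temp);
    printf "Got: %s (next search starts at offset %d)\n", $1, pos($temp);
}
```

The recorded offsets are 14 and 28, i.e. just past each end marker, so the marker that closed one piece has already been consumed and cannot open the next one.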

Er, I have to go here, but I thought the proper bump-along/reset code might
be in this article:

http://www.samag.com/documents/s=10118/sam0703i/0703i.htm

Nope. Dang. I'll have to find it. In a global "/g" matching process, \G marks
the point where the last match ended. The "pos()" function returns that
position, and you can assign to it to reset it.
Something like:
my $temp='XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
while( $temp =~/((?:AAA|BBB|CCC).*?(?:AAA|BBB|CCC))/g) {
my $pos = pos $temp;
print "Got ($pos):", $1, "\n";
# back up over the 3-character end marker so it can start the next match
pos($temp) -= 3;
}

Got (14):AAAZZZZBBB
Got (21):BBBSSSSCCC
Got (28):CCCGGGGBBB
Got (36):BBBVVVVVBBB
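
The hard-coded 3 above only works while every marker is three characters long. A possible variation, sketched here under the assumption that markers may differ in length (the string and the markers AA, BBB, CCCC are invented for the example), is to capture the end marker and back up by its actual length:

```perl
use strict;
use warnings;

# Hypothetical data: markers of different lengths (not from the original post).
my $temp    = 'XXAAZZBBBSSCCCCGG';
my $markers = qr/AA|BBB|CCCC/;

my @got;
while ($temp =~ /($markers.*?($markers))/g) {
    push @got, $1;
    # Back up by the length of the end marker that actually matched,
    # so it can serve as the start marker of the next piece.
    pos($temp) -= length $2;
}
print "Got: $_\n" for @got;
```

Each pass still moves forward, because the next match has to begin at or after the end marker, which always lies beyond the previous start marker.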

a

Andy Bach
Systems Mangler
Internet: andy_bach@wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932

"Procrastination is like putting lots and lots of commas in the sentence
of your life."
Ze Frank
http://lifehacker.com/software/procrastination/ze-frank-on-procrastination-235859.php

Re: Another regex

am 03.03.2007 01:10:03 von Bill Luebkert

I'd do something like:

use strict;
use warnings;

my $temp = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';

while ($temp =~ s/((?:AAA|BBB|CCC).*?(AAA|BBB|CCC))/$2/) {
# $2 - leave the suffix there for next piece
print "piece=$1\n";
}

__END__

piece=AAAZZZZBBB
piece=BBBSSSSCCC
piece=CCCGGGGBBB
piece=BBBVVVVVBBB
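
One thing to keep in mind with the loop above is that s/// edits $temp in place, so the original string is gone once the loop finishes. If the string is needed afterwards, one option (a sketch only; the sub name pieces is made up here) is to run the same substitution on a copy:

```perl
use strict;
use warnings;

# Hypothetical helper: runs the same substitution loop on a private copy.
sub pieces {
    my ($copy) = @_;    # copying the argument leaves the caller's string untouched
    my @found;
    while ($copy =~ s/((?:AAA|BBB|CCC).*?(AAA|BBB|CCC))/$2/) {
        push @found, $1;    # $1 is still set from the successful substitution
    }
    return @found;
}

my $temp   = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @pieces = pieces($temp);

print "piece=$_\n" for @pieces;
print "original is unchanged: $temp\n";
```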


Re: Another regex

am 03.03.2007 01:26:25 von Williamawalters

hi amit --

try this:

use strict;
use warnings;

my $dna = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBBAAA';
my $starter = qr(AAA | BBB | CCC)x;
my $stopper = qr(AAA | BBB | CCC)x;
my @seq;
while ($dna =~ / ($starter) (.*?) ($stopper) /gx) {
    push @seq, qq($1$2$3);
    pos($dna) = $-[3];    # resume the search at the start of the stop marker
}
print qq({$_} \n) for @seq;

{AAAZZZZBBB}
{BBBSSSSCCC}
{CCCGGGGBBB}
{BBBVVVVVBBB}
{BBBAAA}

the trick is resetting the search position in the body of the while loop.
as far as i know, there is no way to do this purely from within a regex.
i defined two separate patterns for starting and stopping a subsequence even
though the actual groups are identical; it may serve if the groups ever differ.
note that the above also captures the ``empty'' test case i added at the end.
if you do not want this, try this instead (the (.*?) becomes (.+?)):

use strict;
use warnings;

my $dna = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBBAAA';
my $starter = qr(AAA | BBB | CCC)x;
my $stopper = qr(AAA | BBB | CCC)x;
my @seq;
while ($dna =~ / ($starter) (.+?) ($stopper) /gx) {
    push @seq, qq($1$2$3);
    pos($dna) = $-[3];    # resume the search at the start of the stop marker
}
print qq({$_} \n) for @seq;

{AAAZZZZBBB}
{BBBSSSSCCC}
{CCCGGGGBBB}
{BBBVVVVVBBB}
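
The $-[3] used above comes from Perl's built-in @- array, which holds the start offset of each capture group of the last successful match; @+ holds the matching end offsets. A small sketch on a shortened string (the string and the @starts and @ends arrays are made up here for illustration) shows what they contain:

```perl
use strict;
use warnings;

my $dna = 'XXXXAAAZZZZBBB';
my (@starts, @ends);

if ($dna =~ /(AAA)(.*?)(BBB)/) {
    @starts = map { $-[$_] } 1 .. 3;    # offset where each group begins
    @ends   = map { $+[$_] } 1 .. 3;    # offset just past the end of each group
    printf "group %d: starts at %d, ends before %d\n",
        $_, $starts[$_ - 1], $ends[$_ - 1]
        for 1 .. 3;
}
```

Group 3 (the stop marker) starts at offset 11 here, which is the position the loop would hand back to pos() so that the same marker can open the next piece.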

hth -- bill walters


Re: Another regex

am 03.03.2007 01:54:26 von Williamawalters

hi again, amit --

i took another look at my original post and came up with a version that's
slightly more concise:

use strict;
use warnings;

my $dna = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBBAAA';
my $starter = qr(AAA | BBB | CCC)x;
my $stopper = qr(AAA | BBB | CCC)x;
my @seq;
while ($dna =~ / ($starter .*? ($stopper)) /gx) {
    push @seq, $1;
    pos($dna) = $-[2];    # group 2 is now the stop marker
}
print qq({$_} \n) for @seq;

{AAAZZZZBBB}
{BBBSSSSCCC}
{CCCGGGGBBB}
{BBBVVVVVBBB}
{BBBAAA}

br -- bill


Re: Another regex

am 03.03.2007 02:10:19 von Sam Dela Cruz

Hi Amit,

Here's my solution:

my $dna = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @tags = ('AAA',
'BBB',
'CCC',
);
my $tags = join "|", @tags;
my $tag_pattern = qr($tags);
print $_,"\n" foreach ( $dna =~ /(?=($tag_pattern.*?$tag_pattern))/g );

Output is:
AAAZZZZBBB
BBBSSSSCCC
CCCGGGGBBB
BBBVVVVVBBB

This should work for whatever pattern you specify in @tags list.
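
The solution above relies on the lookahead (?=...) being zero-width: the capture inside it records a whole piece, but no characters are consumed, so the end marker of one piece is still available as the start marker of the next. One caveat, offered as an assumption rather than something stated in the post: if a tag ever contains regex metacharacters, it would need escaping before being joined. A sketch of that variant:

```perl
use strict;
use warnings;

my $dna  = 'XXXXAAAZZZZBBBSSSSCCCGGGGBBBVVVVVBBB';
my @tags = ('AAA', 'BBB', 'CCC');

# quotemeta escapes any regex metacharacters a tag might contain.
my $alternation = join '|', map { quotemeta($_) } @tags;
my $tag_pattern = qr/(?:$alternation)/;

# The lookahead matches without consuming, so pieces may share a marker.
my @pieces = $dna =~ /(?=($tag_pattern.*?$tag_pattern))/g;
print "$_\n" for @pieces;
```

For the three plain tags used here the escaping changes nothing, and the output is the same four pieces as above.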

Regards,

Sam Dela Cruz
__________________________________________________________
Business Applications, Application Developer
AMEC Operations Management - North America


Re: Another regex

am 05.03.2007 19:38:48 von Andy_Bach

> [ is this DNA related?]

Searching CPAN for DNA (or Genetics) turns up a whole bunch of stuff,
and via the Perl Journal, I recall a major genome mapping project
having been completed *only* on the power of Perl. So if you're doing
something ... I dunno, lab or genome related, you may want to look for the
wheels already out there for Perl and DNA.

a


Re: Another regex

am 07.03.2007 16:43:10 von n.haigh

Have you ever thought about using BioPerl?

I know it may not be the place to discuss this, but, could you explain
what you are trying to do in real/biological terms? I'm a
bioinformatician, so it shouldn't scare me!!

Nathan