parsing script removing some lines help please

on 30.09.2011 15:37:50 by Natalie Conte

Hi,
I am lost in my script and need some basic help, please.
I have a tab-separated file: the first column contains a chromosome
number, followed by several other columns with different information.
Basically, I am trying to create a script that takes a file (see
example), parses it line by line and, when the first column is one of
the chromosomes I don't want (6,8,14,16,18,Y), goes to the next line; if
the line doesn't start with one of the bad chromosomes, it prints the
whole line to a new output file.
The script below just reprints the same original file :(
Thanks for any clues,
Nat

#!/software/bin/perl
use warnings;
use strict;
open(IN, "<example.txt") or die( $! );
open(OUT, ">>removed.txt") or die( $! );
my @bad_chromosome=(6,8,14,16,18,Y);
while(<IN>){
  chomp;
  my @column=split /\t/;
  foreach my $chr_no(@bad_chromosome){
    if ($column[0]==$chr_no){
      next;
    }
  }
  print OUT
$column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",$column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",$column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",$column[12],"\t",$column[13],"\t",$column[14],"\n";
}

close IN; close OUT;

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: parsing script removing some lines help please

on 30.09.2011 15:46:45 by John SJ Anderson

On Fri, Sep 30, 2011 at 09:37, Nathalie Conte wrote:
> thanks for any clues

It's a simple one, really.. 8^)

> #!/software/bin/perl
> use warnings;
> use strict;
> open(IN, "<example.txt") or die( $! );
> open(OUT, ">>removed.txt") or die( $! );

ObCorrectness: you should say something more like

open( my $IN , '<' , 'example.txt' ) or die( $! );
open( my $OUT , '>>' , 'removed.txt' ) or die( $! );

and then change the filehandles correspondingly -- but that's not your
problem.

> my @bad_chromosome=(6,8,14,16,18,Y);
> while(<IN>){
>   chomp;
>   my @column=split /\t/;
>     foreach my $chr_no(@bad_chromosome){
>       if ($column[0]==$chr_no){
>       next;

here's your problem -- next always applies to the innermost loop -- so
you're jumping to the next $chr_no, not the next $_.

you solve this with a loop label:

LINE: while( <IN> ) {
  chomp;
  my @column = split /\t/;
  foreach my $chr_no ( @bad_chromosome ) {
    if( $column[0] == $chr_no ) {
      next LINE;

and then the rest is all the same.

You _may_ want to switch that comparison to 'eq' instead of '==' --
didn't you have 'Y' as one of the chromosomes to drop?
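
Putting the label and the 'eq' change together, a minimal self-contained
sketch might look like the following. It assumes the chromosome is the
whole first column, quotes 'Y' (a bareword is an error under strict), and
reads a few made-up in-memory lines instead of example.txt so it can be
run as-is. The helper name keep_lines is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );   # 'Y' quoted: strict forbids barewords

# Hypothetical helper: returns the lines whose first tab-separated
# column is not one of the unwanted chromosomes.
sub keep_lines {
    my @lines = @_;
    my @kept;
    LINE: foreach my $line (@lines) {
        my @column = split /\t/, $line;
        foreach my $chr_no (@bad_chromosome) {
            # 'eq' compares as strings, so both 'Y' and '6' are handled;
            # the label makes 'next' skip to the next line, not the next $chr_no
            next LINE if $column[0] eq $chr_no;
        }
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample rows (chromosome, position, base), not real data
my @sample = ( "1\t100\tA", "6\t200\tC", "Y\t300\tG", "16\t400\tT", "X\t500\tA" );
print "$_\n" for keep_lines(@sample);
```

With the sample rows above, only the lines for chromosomes 1 and X should
be printed.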

>       print OUT
> $column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",$column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",$column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",$column[12],"\t",$column[13],"\t",$column[14],"\n";

Oh, and this? Try something like

print OUT join( "\t", @column ), "\n";
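
As a small illustration of that idea (a sketch with a made-up @column,
not the poster's data): join needs a double-quoted "\t" to produce a real
tab, and if print's argument list opens with a parenthesis, anything after
the closing parenthesis is not printed, so the "\n" is appended to the
string here instead.

```perl
use strict;
use warnings;

my @column = ( 1, 100, 'A', 'T' );      # made-up fields for illustration

# Double quotes so that \t is a tab character
my $line = join( "\t", @column ) . "\n";
print $line;

# For comparison: a single-quoted separator is a literal backslash plus 't'
my $literal = join( '\t', @column );
```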

chrs,
john.


RE: parsing script removing some lines help please

on 30.09.2011 16:57:16 by Ken Slater

> From: Nathalie Conte [mailto:nac@sanger.ac.uk]
> Sent: Friday, September 30, 2011 9:38 AM
> To: beginners@perl.org
> Subject: parsing script removing some lines help please
>
John has provided good advice on this problem, but I wanted to add a couple
of things.
To avoid explicitly coding the foreach loop for @bad_chromosome, you could
use the 'grep' function.
Also, if you are just reprinting the input line, print $_.

unless ( grep { $column[0] eq $_ } @bad_chromosome ) {
    print OUT "$_\n"; # or print $OUT "$_\n" if declared as John suggested
}

The grep call will return the number of times $column[0] matched an element
of @bad_chromosome.
Thus, if there is a match the grep call will evaluate to 'true'. Otherwise,
it will evaluate to 'false'.
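
A quick way to see that count is to assign the grep result to a scalar.
The variable names below are only for illustration:

```perl
use strict;
use warnings;

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );

# In scalar context grep returns how many elements passed the test
my $hits_for_16 = grep { $_ eq '16' } @bad_chromosome;   # 1
my $hits_for_2  = grep { $_ eq '2' }  @bad_chromosome;   # 0

print "16 is unwanted\n" if $hits_for_16;
print "2 is kept\n" unless $hits_for_2;
```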

Using grep does have a drawback (but not that much unless you have a lot of
values in @bad_chromosome). It checks all the values of @bad_chromosome for
a match. Using the 'if ... next' stops looking for a match when a match is
found.
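
If the list ever did grow large, one common alternative (an aside, not
something proposed elsewhere in this thread) is to build a hash once and
test membership with a single lookup per line. The names %is_bad and
is_unwanted are hypothetical, and the rows are made up:

```perl
use strict;
use warnings;

# Build a lookup table once; keys are the unwanted chromosomes
my %is_bad = map { $_ => 1 } ( 6, 8, 14, 16, 18, 'Y' );

# Hypothetical helper: true if the line's first column is unwanted
sub is_unwanted {
    my ($line) = @_;
    my ($chr) = split /\t/, $line;    # first tab-separated field only
    return exists $is_bad{$chr};
}

# Made-up sample rows
for my $line ( "3\t10\tA", "18\t20\tC", "Y\t30\tG" ) {
    print "$line\n" unless is_unwanted($line);
}
```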

If you wonder about the use of $_ in the grep function: inside the grep
block, $_ is temporarily aliased to each element of @bad_chromosome in
turn. That aliasing is local to the grep, so it does not affect the $_
that contains the data read from the file.

If you are using Perl 5.10 or higher, you can use the 'smart match'
operator (~~) instead of grep. Note that smart match has been flagged as
experimental since Perl 5.18.

HTH, Ken


Re: parsing script removing some lines help please

on 30.09.2011 20:20:30 by jwkrahn

Nathalie Conte wrote:
>
> Hi,

Hello,

#!/software/bin/perl
use warnings;
use strict;

open my $IN, '<', 'example.txt' or die "Cannot open 'example.txt' because: $!";
open my $OUT, '>>', 'removed.txt' or die "Cannot open 'removed.txt' because: $!";

my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;

while ( <$IN> ) {
    print $OUT $_ if !/$bad_chromosomes/;
}

close $IN;
close $OUT;
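
If the list of unwanted chromosomes changes often, the alternation could
be generated from an array instead of typed by hand. This is only a
sketch: the array name and the sample lines are made up, and quotemeta is
used in case a name ever contains a regex metacharacter.

```perl
use strict;
use warnings;

my @bad = ( 6, 8, 14, 16, 18, 'Y' );

# Join the escaped names with '|' inside a non-capturing group,
# anchored at line start and followed by the tab delimiter
my $alternation     = join '|', map { quotemeta($_) } @bad;
my $bad_chromosomes = qr/^(?:$alternation)\t/;

# Made-up sample rows
my @sample = ( "1\t100\n", "16\t200\n", "160\t300\n", "Y\t400\n" );
my @kept   = grep { !/$bad_chromosomes/ } @sample;
print @kept;    # prints the rows for chromosomes 1 and 160
```

The trailing \t in the pattern is what keeps chromosome 160 from being
mistaken for 16.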

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein


Re: parsing script removing some lines help please

on 30.09.2011 23:31:21 by jwkrahn

Mariano Loza Coll wrote:
> Hi John,

Hello,

> I'm trying to learn a little bit more of Perl everyday, and I was
> intrigued about your earlier suggestion in a thread.
>
>
>> my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;
>>
>> while (<$IN> ) {
>> print $OUT $_ if !/$bad_chromosomes/;
>> }
>>
>
> I get the spirit of what you suggested, but I was curious about the use of "?:"

() are capturing parentheses and

(?:) are non-capturing parentheses.

> If I got it right, the "?" will make the search non-greedy. But what
> is the ":" for?

perldoc perlre

John

Re: parsing script removing some lines help please

on 01.10.2011 02:36:56 by Chris Charley

John W. Krahn wrote:
>Mariano Loza Coll wrote:
>> Hi John,
>
>Hello,
>
>> I'm trying to learn a little bit more of Perl everyday, and I was
>> intrigued about your earlier suggestion in a thread.
>>
>>
>>> my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;
>>>
>>> while (<$IN> ) {
>>> print $OUT $_ if !/$bad_chromosomes/;
>>> }
>>>
>>
>> I get the spirit of what you suggested, but I was curious about the use
>> of "?:"
>
>() are capturing parentheses and
>
>(?:) are non-capturing parentheses.
>
>
>> If I got it right, the "?" will make the search non-greedy. But what
>> is the ":" for?

The (?:...) notation groups without capturing; it is the "?:" right after
the opening parenthesis that makes the group non-capturing.
In this context, the '?' is not used to control greediness.

In perlre, look for the section 'Extended Patterns', then down a page look
for:

(?:pattern)
(?adluimsx-imsx:pattern)
(?^aluimsx:pattern)
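
To make the difference concrete, here is a small sketch (the sample
strings and variable names are made up) that contrasts a capturing group,
a non-capturing group, and the '?' that really does control greediness
when it follows a quantifier:

```perl
use strict;
use warnings;

my $line = "16\t12345\tA";

# Capturing group: the matched chromosome is returned (and saved in $1)
my ($captured) = $line =~ /^(6|8|14|16|18|Y)\t/;

# Non-capturing group: same match, but nothing is saved, so a
# successful match in list context just returns 1
my ($grouped_only) = $line =~ /^(?:6|8|14|16|18|Y)\t/;

# A '?' after a quantifier is what controls greediness
my ($greedy) = "aXbXc" =~ /^(.*)X/;     # takes as much as possible
my ($lazy)   = "aXbXc" =~ /^(.*?)X/;    # takes as little as possible

print "captured=$captured grouped_only=$grouped_only\n";
print "greedy=$greedy lazy=$lazy\n";
```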

Chris
