Perl script to retrieve specific lines from file

on 10.08.2011 12:43:46 by vinorex

Hi everyone,
I am a biologist, and I have a DNA file containing 2000 to 10000000 letters, e.g.
file:(ATGCATGCTAGCTAGTCGATCGCATCGATAGCTAGCTACGCG
CGTACCGTGCAGAAGAGCAGGACATATATATTACGCGGCGATCGATCGTAGC
GATCGATCGATCGCTAGCTGACTATGCATGCTAGCTAGTCGATCGCATCGATAGCTAGCTACGCGCGTACCGTGCAGAAGAGCAGGACATATATATTACGCGGCGATCGATCGTAGCGATCGATCGATCGCTAGCTGACTATGCATGCTAGCTAGTCGATCGCATCGATAGCTAGCTACGCGCGTACCGTGCA
GAAGAGCAGGACATATATATTACGCGGCGATCGATCGTAGCGATCGATCGA
TCGCTAGCTGACTATGCATGCTAGCTAGTCGATCGCATCGATAGCTAGCTACGCGCGTACCGTGCAGAA
GAGCAGGACATATATATTACGCGGCGATCGATCGTAGCGATCGATCGATCGCTAGCTGACTATGCATGCTAGCTAGTCGATCGCATCGATAGCTAGCTACGCGCGTACCGTGCAGAAGAGCAGGACATATATATTACGCGGCGATCGATCGTAGCGATCGATCGATCGCTAGCTGACT)
I calculated the total length of the sequence. What I want to do is
extract and show in the output only specific parts (i.e. only the highlighted ones).
Thank you

Re: Perl script to retrieve specific lines from file

on 10.08.2011 13:04:43 by Rob Coops

On Wed, Aug 10, 2011 at 12:43 PM, VinoRex.E wrote:

> i calculated the total length of the sequence. what i want to execute is to
> extract and show the output only the specific(ie highlighted ones only).
> thank you
>


Hi there,

Here is what I suggest you do:

First, always use strict and warnings. Second, show us what you have done so
far; that will show people that you are actually trying and not just
asking others to do your homework for you. ;-)

As for solving the problem, here is what I would do:

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp; # A handy module, check it out at:
                 # http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm

# Take the file name from the command line.
my $file = shift @ARGV or die "Usage: $0 FILE\n";

# Reads the whole file in one go. Please note that this includes the
# linefeeds etc., so you might or might not need to strip these out
# depending on how clean the files are.
my $file_contents = read_file( $file );

my $total = length( $file_contents );

# substr takes the string first, then the 0-based offset, then the length.
my $substring = substr $file_contents, 0, 10;

print "Total number of characters: $total\nSub string: $substring\n";


That should do the trick. Of course, change the 0, 10 to the starting
position of the substring (counted from 0) and the number of characters you
would like returned. The trick for you would be to loop over the files, or to
allow command-line or interactive input of the filename/path and the start
point/length of the substring you want to get out of the file.
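A minimal sketch of that idea, assuming the file holds nothing but sequence letters and line breaks (no FASTA header line) and that START is given in the usual 1-based biological numbering. It slurps the file with core Perl instead of File::Slurp, strips all whitespace before measuring the length, and converts START to the 0-based offset that substr expects. The subroutine names slurp_file, clean_sequence and extract_region are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into one string (core-Perl slurp, no File::Slurp needed).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    local $/;                        # undefined record separator: read everything
    my $contents = <$fh>;
    close $fh;
    return defined $contents ? $contents : '';
}

# Remove line feeds, spaces and other whitespace so only the bases remain.
sub clean_sequence {
    my ($raw) = @_;
    (my $seq = $raw) =~ s/\s+//g;
    return $seq;
}

# Return $length bases starting at 1-based position $start,
# converted to Perl's 0-based substr offset.
sub extract_region {
    my ($seq, $start, $length) = @_;
    die "Start must be >= 1\n" if $start < 1;
    return '' if $start > length $seq;   # avoid substr warning past the end
    return substr $seq, $start - 1, $length;
}

# Usage: script.pl FILE START LENGTH
if (@ARGV == 3) {
    my ($file, $start, $length) = @ARGV;
    my $seq = clean_sequence(slurp_file($file));
    print "Total number of bases: ", length($seq), "\n";
    print "Region: ", extract_region($seq, $start, $length), "\n";
}
```

For example, perl region.pl dna.txt 100 50 would print the total base count and the 50 bases starting at base 100 (region.pl and dna.txt are placeholder names).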

Regards,

Rob

Re: Perl script to retrieve specific lines from file

on 10.08.2011 21:44:18 by Kevin Spencer

On Wed, Aug 10, 2011 at 4:04 AM, Rob Coops wrote:
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use File::Slurp; # A handy module check it out at:
> http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm

While handy, be aware that you are slurping the entire file into
memory, so just be careful if you're going to be processing huge
files.
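If memory does become a concern, one alternative is to read the file in fixed-size chunks rather than all at once. The sketch below is an illustration, not code from this thread: it sets $/ to a reference to an integer, which makes Perl's readline return fixed-size records, and keeps a running base count so that line breaks do not shift the coordinates. The 2 MB default chunk size and the name stream_region are arbitrary choices; positions are 1-based.

```perl
use strict;
use warnings;

# Scan a sequence from $fh in fixed-size chunks and return
# (total_bases, region), where region is $length bases starting at
# 1-based position $start. Whitespace is removed from each chunk
# before counting, so line breaks never shift the coordinates.
sub stream_region {
    my ($fh, $start, $length, $chunk_size) = @_;
    $chunk_size ||= 2 * 1024 * 1024;   # 2 MB per read (arbitrary default)
    local $/ = \$chunk_size;           # readline now returns fixed-size records

    my $end    = $start + $length - 1; # last wanted position, 1-based
    my $pos    = 0;                    # bases seen before the current chunk
    my $region = '';

    while (my $chunk = <$fh>) {
        $chunk =~ s/\s+//g;
        my $n = length $chunk;
        next unless $n;
        my ($first, $last) = ($pos + 1, $pos + $n);  # positions in this chunk
        if ($last >= $start && $first <= $end) {
            my $from = $start > $first ? $start : $first;
            my $to   = $end   < $last  ? $end   : $last;
            $region .= substr $chunk, $from - $first, $to - $from + 1;
        }
        $pos += $n;
    }
    return ($pos, $region);
}

# Usage: script.pl FILE START LENGTH
if (@ARGV == 3) {
    my ($file, $start, $length) = @ARGV;
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    my ($total, $region) = stream_region($fh, $start, $length);
    close $fh;
    print "Total number of bases: $total\nRegion: $region\n";
}
```

Only one chunk plus the requested region is held in memory at any time, so the memory cost no longer grows with the file size.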

Kevin.

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Perl script to retrieve specific lines from file

on 11.08.2011 01:10:57 by Uri Guttman

>>>>> "KS" == Kevin Spencer writes:

KS> On Wed, Aug 10, 2011 at 4:04 AM, Rob Coops wrote:
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use File::Slurp; # A handy module check it out at:
>> http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm

KS> While handy, be aware that you are slurping the entire file into
KS> memory, so just be careful if you're going to be processing huge
KS> files.

in general i would agree: never slurp most genetics files, which can
run to many GB and up. the OP says the file has up to 10M letters,
which is fine to slurp on any modern machine.

uri

--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------

Re: Perl script to retrieve specific lines from file

on 11.08.2011 08:10:55 by Rob Coops

On Thu, Aug 11, 2011 at 1:10 AM, Uri Guttman wrote:

> in general i would agree to never slurp in most genetics files which can
> be in the many GB sizes and up. the OP says the file has up to 10M
> letters which is fine to slurp on any modern machine.
>
Believe it or not, I actually did count the number of zeros there ;-)

I know that bio data tends to be rather large, but looking at the size I
figured it could not hurt. Indeed, if you are going for something more
substantial, you will want a different method of reading the file, one that
reads it in bits of 2MB at a time or so. Of course, if you are pulling out
only characters X to Y and you are certain that there is nothing but normal
characters in the file, you could simply start reading the file at position
X and continue to Y; there is no need to loop over the whole thing 2M
characters at a time. But beware: making such assumptions will always lead
to failure at some point, as there will always be one file that contains
something you didn't expect. Even if that file does not show up in testing,
after a few years and a few hundred thousand files you will at some point
run into one. (It is the simple principle of increasing your sample size:
eventually you will find an outlier in there.)
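A sketch of the "start reading at X" shortcut, under exactly the assumption described above: the file contains only single-byte base letters, with no header and no line breaks, so base N sits at byte N-1. It uses seek and read to fetch only bytes X..Y, then refuses any result containing something other than A, C, G, T or N, so a file that breaks the assumption fails loudly instead of silently returning the wrong bases. The name read_bases_at is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Read bases X..Y (1-based, inclusive) directly by byte offset.
# ASSUMPTION: the file holds only single-byte base letters
# (no header line, no line breaks), so base N is at byte N-1.
sub read_bases_at {
    my ($path, $x, $y) = @_;
    die "Need 1 <= X <= Y\n" unless $x >= 1 && $y >= $x;

    open my $fh, '<:raw', $path or die "Cannot open '$path': $!\n";
    seek $fh, $x - 1, SEEK_SET or die "Cannot seek in '$path': $!\n";

    my $want = $y - $x + 1;
    my $got  = read $fh, my $buf, $want;   # read only the requested bytes
    die "Read error on '$path': $!\n" unless defined $got;
    close $fh;

    # Check the assumption instead of trusting it.
    die "File ends before position $y\n" if $got < $want;
    die "Unexpected non-base characters in region $x..$y\n"
        if $buf =~ /[^ACGTNacgtn]/;
    return $buf;
}
```

This never touches the rest of the file, but it is only safe for files you have verified to be pure sequence; for anything else, a scan that skips line breaks and counts bases is the more robust choice.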

Regards,

Rob
