Re: Running out of memory? -- revised program
on 21.07.2007 23:15:32 by mpetersen
Both Brian and Bill -- thanks immensely. I'm learning a lot in the process
-- and have over the last year, just from reading your and others' postings. I
understood most of the comments and really appreciate the advice. The
problem still persists -- I think I know where it is coming from -- but not
how to fix it.
First, a more narrow question. I am not sure I completely follow your
comment on local and global variables. If I declare a variable inside a
loop (e.g. my $newvariable), it will not be available outside the loop
(which is good if I don't need it so it won't consume memory). Looking at
your edits of my program -- this makes sense.
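
A minimal sketch of that scoping rule, using only core Perl. The loop and the @seen array are made up for illustration; only the name $newvariable comes from the text above. It suggests that a variable declared with my inside a loop body is created fresh on each pass and cannot be referenced once the loop has ended.

```perl
use strict;
use warnings;

my @seen;
for my $i (1 .. 3) {
    my $newvariable = $i * 10;    # a fresh lexical on every pass
    push @seen, $newvariable;     # copy the value out while it still exists
}

# Outside the loop the name is gone. Under "use strict" a reference to it
# fails to compile, which the string eval below catches instead of dying.
my $ok  = eval 'my $copy = $newvariable; 1';
my $err = $@;

print "values collected inside the loop: @seen\n";
print "outside the loop: ", ($ok ? "visible" : "not visible"), "\n";
```

Values that must outlive the loop have to be copied into something declared outside it, as @seen is here.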
Ok, now for my persistent problem. As the program runs, I can see it use
more and more memory -- until it crashes. I think (and could be wrong) that
the program is not deleting the tree when it is done. I will enclose
the program below, but let me explain what I have done. The program will
eventually read different input files -- but for testing it uses the same
input file over and over. At the moment (see below) the
    my $root = HTML::TreeBuilder->new;
    $root->parse($doc);
    $root->eof();
are in the loop. I have tried to include
    $root->delete();
at the end of the loop, but with no effect.
If I move the commands
    my $root = HTML::TreeBuilder->new;
    $root->parse($doc);
    $root->eof();
outside the loop -- I don't have a memory problem. Thus I think the program
is not releasing the memory of the old tree, when it builds the new one. I
can't have the $root->parse($doc) command outside the loop, as when I
actually use the program -- it will read different files and build the tree
for each one.
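
For background on why a tree might not be released: Perl frees data by reference counting, and a structure in which a parent points to its children while each child points back to its parent never reaches a count of zero on its own. The HTML::Element documentation describes its trees as linked in this way and says to call delete() when finished with one. The sketch below does not use HTML::TreeBuilder; Node, add_child and build_tree are hypothetical names made up to show the mechanism with core Perl only. It does not explain why delete() seemed to have no effect in the program below.

```perl
use strict;
use warnings;

package Node;    # hypothetical stand-in for a tree element

our $destroyed = 0;    # counts how many nodes Perl has actually freed

sub new {
    my ($class) = @_;
    return bless { parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;               # child -> parent link
    push @{ $self->{children} }, $child;    # parent -> child link (a cycle)
    return $child;
}

# Break every link below this node so reference counts can drop to zero.
sub delete {
    my ($self) = @_;
    $_->delete for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
    return;
}

sub DESTROY { $destroyed++ }

package main;

sub build_tree {
    my $root = Node->new;
    $root->add_child(Node->new) for 1 .. 3;
    return $root;
}

{
    my $root = build_tree();    # goes out of scope with its cycles intact
}
my $after_leak = $Node::destroyed;

{
    my $root = build_tree();
    $root->delete;              # cycles broken before leaving the scope
}
my $after_delete = $Node::destroyed - $after_leak;

print "freed without delete: $after_leak\n";
print "freed with delete:    $after_delete\n";
```

In a loop that builds one tree per pass, the first pattern would keep every old tree alive until the program exits, which matches steadily growing memory use.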
P.S.
I couldn't figure out the commands
    my @vals = map {s/[,$ =]//g} @col_asset[0,-1];
    print join(",", @vals), "\n";
If you could direct me to a manual, that would be fine as well.
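
On those two lines: map runs its block once for each element of the list (here the slice @col_asset[0,-1], i.e. the first and last elements) and collects whatever the block returns. One thing to watch is that s///g returns the number of substitutions made, not the cleaned string, and it edits the original element in place. The sketch below uses made-up sample values and escapes the dollar sign as \$ inside the character class to keep the intent unambiguous. The relevant manual pages are perldoc -f map, perldoc -f join and the s/// entry in perlop.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the table cells.
my @col_asset = ('$ 1,234,567', '987,654', '$2,345,678');

# As written in the post: collects the substitution counts and changes
# the sliced elements of the array in place, so work on a copy here.
my @copy   = @col_asset;
my @counts = map { s/[,\$ =]//g } @copy[0, -1];

# Returning the cleaned text instead: copy each element, edit the copy,
# and make the copy the last expression in the block.
my @vals = map { my $v = $_; $v =~ s/[,\$ =]//g; $v } @col_asset[0, -1];

print join(",", @counts), "\n";
print join(",", @vals), "\n";
```

The second form leaves @col_asset untouched, which matters if the raw text is needed later.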
Program ----------
use strict;
use warnings;
use HTML::TreeBuilder;

my $txtfile = 'D:/res/edgar/10k/2178_0000002178-06-000013.txt';
my $csvfile = 'D:/res/edgar/match/test2.csv';

# open the CSV file for writing
open OUT, ">$csvfile" or die "create csv: $!($^E)";
select ((select (OUT), $| = 1)[0]); # unbuffer CSV write

# open the text file for reading
open IN, $txtfile or die "open $txtfile: $!($^E)";
my $doc = join '', <IN>;
close IN;

my $total = 0;
while ($total <= 3000) {
    my $asset_s=0;
    my $asset_s2=0;
    my $root = HTML::TreeBuilder->new;
    $root->parse($doc);
    $root->eof();

    OUTER_LOOP:
    # put tables into array then put each one in $table
    foreach my $table ($root->find_by_tag_name('TABLE')) {
        my $txt = $table->as_text_trimmed;
        # skip items not of interest
        next if ($txt !~ /total asset/is || $txt !~ /(\d|,){4,12}/is);
        my @col_asset; # my @col_asset = ();
        foreach my $row ($table->find_by_tag_name('tr')) {
            # skip rows not of interest
            next if $row->as_text_trimmed !~ /^total asset/i;
            foreach my $column ($row->find_by_tag_name('td')) {
                my $col_text = $column->as_text_trimmed;
                if ($col_text =~ /[\d,\.]{4,12}/) {
                    push @col_asset, $col_text if $col_text =~ /([\d,\.]{4,12})/;
                }
            }
            $asset_s = $col_asset[0];
            $asset_s2 = $col_asset[-1];
            last;
        }
        $asset_s =~ s/[,$ =]//g; # drop ',', '$', ' ', & '='
        $asset_s2 =~ s/[,$ =]//g;
        last OUTER_LOOP; # only do 1st table
    }
    $total++;
    print OUT "$asset_s $asset_s2 $total \n";
}
close OUT;
__END__
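
A side note on the file handling in the program above, as a sketch rather than a required change: the same read-everything and unbuffered-write steps can be written with lexical filehandles, three-argument open and IO::Handle's autoflush method. The helper name slurp_file and the temporary files are made up for this example so that it runs without the EDGAR input file.

```perl
use strict;
use warnings;
use IO::Handle;                  # provides the autoflush method on handles
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole content of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<', $path or die "open $path: $!";
    local $/;                    # no record separator: one read gets it all
    my $content = <$in>;
    close $in;
    return $content;
}

# Create a small stand-in input file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<table><tr><td>Total assets</td></tr></table>\n";
close $tmp_fh;

my $doc = slurp_file($tmp_name);

# Unbuffered output, equivalent in intent to the select/$| idiom.
my ($out, $out_name) = tempfile(UNLINK => 1);
$out->autoflush(1);
print {$out} length($doc), "\n";
close $out;

my $written = slurp_file($out_name);
print "read ", length($doc), " characters, wrote: $written";
```

Lexical handles such as $in are closed automatically when they go out of scope, which avoids leaving the global IN and OUT handles open by accident.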
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com