Program Dies after 60 or so Iterations of Loop

on 26.08.2007 03:07:47 by Hal Vaughan

I found the old-time radio shows at the Internet Archive, so I wrote a
program to scan the shows, then the episodes, and let me list the ones I
wanted to download. The program then goes through the list of files and
downloads each file (most are MP3, some are Ogg). Files are at least a few
megabytes in length. The problem is that after about 60 or so downloads
(it varies), my computer (running Linux, Ubuntu Feisty Fawn) slows down for
about 30 minutes (not an exaggeration), then the program stops with
a "Killed!" statement.

I've included the loop below with comments. My best guess is that there's
an issue with reusing variables, or something I'm not doing that would let
garbage collection work properly. What can I do to keep it from slowing down
the computer and dying regularly?

Thanks!

Hal
---------
sub downloadfiles {
    my ($total, $idx, $num, @data, @line, %done);
    #Get the list of already downloaded files and convert to an array in another
    # module, then convert them to a hash so I can quickly tell if a file has
    # been downloaded.
    @data = filetoarray($donefile);
    foreach (@data) {
        @line = split(/\t/, $_);
        $done{pop(@line)} = 1;
    }
    #Get an array of the files to download (kill the first line,
    # it's column titles)
    @data = filetoarray($showfile);
    $total = $#data;
    $idx = 0;
    shift(@data);
    foreach (@data) {
        #$get is whether or not to get the file, other variables are
        # self-explanatory
        my ($get, $show, $episode, $file, $url) = split(/\t/, $_);
        my ($data, $loc);
        $idx++;
        if ($_ =~ /^#/) {next;}
        if (!$_) {next;}
        if ($get) {
            #If it's flagged to download, see if it has been downloaded already
            if ($done{$url}) {
                print "Already downloaded file: $show: $episode\n";
                next;
            }
            #Make the name of the file we're writing data to
            $loc = catfile($fileroot, $show, $file);
            $loc =~ s/ //g;
            $loc =~ s/,//g;
            #Make sure any non-existent directories in the path name are
            # created
            makedirpath($loc);
            $num = " ".$idx;
            $num = substr($num, length($num) - length($total));
            print "Downloading file ".$num." of $total: $show: $episode\n";
            #Download the file with WWW::Mechanize
            #Isn't $data reinitialized each loop? Could old copies not be cleared
            #by garbage collector?
            $data = getpage($url);
            open(OUT, ">$loc") or die "Can't write to $loc: $!";
            print OUT $data;
            close OUT;
            #Use lltag to create appropriate tags for the file
            filetagmod($loc, $show, $episode);
            open(DOWN, ">>$donefile") or die "Can't append to $donefile: $!";
            print DOWN "$show\t$episode\t$loc\t$url\n";
            close DOWN;
        }
    }
    return;
}
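
A minimal sketch of how the download step could stream each response
straight to disk instead of holding it in $data first, assuming getpage()
wraps a WWW::Mechanize object (fetchtofile below is a hypothetical helper
name; the :content_file option is passed through to LWP):

use WWW::Mechanize;

my $mech = WWW::Mechanize->new(autocheck => 0);

# Write the response body directly to $loc instead of into a scalar.
# Returns true on success, prints a warning and returns false otherwise.
sub fetchtofile {
    my ($mech, $url, $loc) = @_;
    my $resp = $mech->get($url, ':content_file' => $loc);
    print "Failed to fetch $url: ".$resp->status_line."\n"
        unless $resp->is_success;
    return $resp->is_success;
}

Inside the foreach loop, a call like fetchtofile($mech, $url, $loc) would
then stand in for the $data = getpage($url) / print OUT block.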

Re: Program Dies after 60 or so Iterations of Loop

on 26.08.2007 13:07:02 by Sisyphus

"Hal Vaughan" wrote in message
news:i-KdnWQuLpP_T03bnZ2dneKdnZydnZ2d@comcast.com...
..
..
> .... then the program stops with
> a "Killed!" statement.

That's not something that would be coming from perl (afaik).
Sounds more like something that the OS is doing.

Does your ISP impose any limits that may be coming into play?

Cheers,
Rob

Re: Program Dies after 60 or so Iterations of Loop

on 26.08.2007 13:44:10 by hjp-usenet2

On 2007-08-26 01:07, Hal Vaughan wrote:
> I found the old-time radio shows at the Internet Archive, so I wrote a
> program to scan the shows, then the episodes, and let me list the ones I
> wanted to download. The program then goes through the list of files and
> downloads each file (most are MP3, some are Ogg). Files are at least a few
> megabytes in length. The problem is that after about 60 or so downloads
> (it varies), my computer (running Linux, Ubuntu Feisty Fawn) slows down for
> about 30 minutes (not an exaggeration), then the program stops with
> a "Killed!" statement.

That sounds like you are exhausting virtual memory. When the total
memory used by all processes approaches the sum of RAM and swap space,
the system spends more and more time finding yet another piece of
memory it doesn't need immediately and could reuse. If you are unlucky,
the system becomes completely unresponsive. However, most of the time
the system just gives up at some point and kills a process (hopefully
the one which caused this sorry state).

Run "top" in a second window to confirm this.
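
If the machine is already too bogged down to start top, the script itself
can log its own memory use once per iteration; a rough, Linux-only sketch
that reads the VmRSS and VmSize lines from /proc/self/status:

# Print the process's current memory figures (Linux-specific).
# Call this once per loop iteration to see whether usage keeps growing.
sub report_memory {
    open(my $status, '<', '/proc/self/status') or return;
    while (my $line = <$status>) {
        print $line if $line =~ /^Vm(?:RSS|Size):/;
    }
    close $status;
}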

> #Download the file with WWW::Mechanize
> #Isn't $data reinitialized each loop? Could old copies not be cleared
> #by garbage collector?
> $data = getpage($url);

$data is reinitialized, but WWW::Mechanize may cache downloaded
documents. I vaguely remember that this has been discussed here before
and that there is a way to turn this off even though perldoc
WWW::Mechanize doesn't seem to mention it. Google should find it.

If all else fails, you can use lower level packages like LWP::UserAgent
to do the heavy lifting.
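
For example, a minimal LWP::UserAgent sketch (the URL and file name here
are just placeholders); mirror() writes the response body straight to disk
and skips the download if the file on the server hasn't changed:

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->mirror('http://example.org/episode.mp3', 'episode.mp3');
print "Download failed: ".$res->status_line."\n" if $res->is_error;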

hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"

Re: Program Dies after 60 or so Iterations of Loop

on 27.08.2007 02:11:02 by Hal Vaughan

Peter J. Holzer wrote:

> On 2007-08-26 01:07, Hal Vaughan wrote:
>> I found the old-time radio shows at the Internet Archive, so I wrote a
>> program to scan the shows, then the episodes, and let me list the ones I
>> wanted to download. The program then goes through the list of files and
>> downloads each file (most are MP3, some are Ogg). Files are at least a few
>> megabytes in length. The problem is that after about 60 or so downloads
>> (it varies), my computer (running Linux, Ubuntu Feisty Fawn) slows down
>> for about 30 minutes (not an exaggeration), then the program stops with
>> a "Killed!" statement.
>
> That sounds like you are exhausting virtual memory. When the total
> memory used by all processes approaches the sum of RAM and swap space,
> the system spends more and more time finding yet another piece of
> memory it doesn't need immediately and could reuse. If you are unlucky,
> the system becomes completely unresponsive. However, most of the time
> the system just gives up at some point and kills a process (hopefully
> the one which caused this sorry state).
>
> Run "top" in a second window to confirm this.

I figured it was a virtual memory and swap issue as well, but I thought it
best to just describe the symptoms instead of assuming, because I find that
there is often an aspect I don't know about.

I had problems with running top. There was a slight slowdown, then
suddenly a big one, and it took forever to change to the Konsole window, so
I switched to my server, which always has an ssh connection open to my
workstation for cases like this, and tried running top. By the time it
came up in Konsole or ssh, the Perl program had already been terminated and
was no longer showing, and the system was cleaning up.

>> #Download the file with WWW::Mechanize
>> #Isn't $data reinitialized each loop? Could old copies not be cleared
>> #by garbage collector?
>> $data = getpage($url);
>
> $data is reinitialized, but WWW::Mechanize may cache downloaded
> documents. I vaguely remember that this has been discussed here before
> and that there is a way to turn this off even though perldoc
> WWW::Mechanize doesn't seem to mention it. Google should find it.

I found it by Googling for "Perl mechanize memory." I tried several
different phrases, but they didn't help until I tried that one. Here's the
trick. I was creating Mech like this:

Re: Program Dies after 60 or so Iterations of Loop

on 27.08.2007 02:14:53 by Hal Vaughan

I hit the wrong key and sent it by mistake. Response continued below:

Hal Vaughan wrote:

> Peter J. Holzer wrote:
>
>> On 2007-08-26 01:07, Hal Vaughan wrote:
>>> I found the old-time radio shows at the Internet Archive, so I wrote a
>>> program to scan the shows, then the episodes, and let me list the ones I
>>> wanted to download. The program then goes through the list of files and
>>> downloads each file (most are MP3, some are Ogg). Files are at least a few
>>> megabytes in length. The problem is that after about 60 or so downloads
>>> (it varies), my computer (running Linux, Ubuntu Feisty Fawn) slows down
>>> for about 30 minutes (not an exaggeration), then the program stops with
>>> a "Killed!" statement.
>>
>> That sounds like you are exhausting virtual memory. When the total
>> memory used by all processes approaches the sum of RAM and swap space,
>> the system spends more and more time finding yet another piece of
>> memory it doesn't need immediately and could reuse. If you are unlucky,
>> the system becomes completely unresponsive. However, most of the time
>> the system just gives up at some point and kills a process (hopefully
>> the one which caused this sorry state).
>>
>> Run "top" in a second window to confirm this.
>
> I figured it was a virtual memory and swap issue as well, but I thought
> it best to just describe the symptoms instead of assuming, because I find
> that there is often an aspect I don't know about.
>
> I had problems with running top. There was a slight slowdown, then
> suddenly a big one, and it took forever to change to the Konsole window, so
> I switched to my server, which always has an ssh connection open to my
> workstation for cases like this, and tried running top. By the time it
> came up in Konsole or ssh, the Perl program had already been terminated and
> was no longer showing, and the system was cleaning up.
>
>>> #Download the file with WWW::Mechanize
>>> #Isn't $data reinitialized each loop? Could old copies not be cleared
>>> #by garbage collector?
>>> $data = getpage($url);
>>
>> $data is reinitialized, but WWW::Mechanize may cache downloaded
>> documents. I vaguely remember that this has been discussed here before
>> and that there is a way to turn this off even though perldoc
>> WWW::Mechanize doesn't seem to mention it. Google should find it.
>
> I found it by Googling for "Perl mechanize memory." I tried several
> different phrases, but they didn't help until I tried that one. Here's
> the
> trick. I was creating Mech like this:

$mech = WWW::Mechanize->new;

By default, Mech caches all the pages (or binary files) it downloads in
memory. I changed it to this:

$mech = WWW::Mechanize->new(stack_depth => 3);

If the stack_depth is set to 0, it'll cache all pages (the default). I
experimented and got VERY slow downloads with a stack_depth of 1 (that
doesn't make sense to me, but I don't know what else it affects). Maybe it
was just chance that when I changed it from 1 to 3 it worked, but I'm going
to leave it there until it finishes its current batch of files before
testing to be sure the speed is the same with a value of 1.
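
For reference, a small sketch of the constructor call together with the
stack_depth accessor, which (per the WWW::Mechanize documentation) can also
read or change the depth after construction:

use WWW::Mechanize;

# Keep only a few pages on Mech's history stack instead of every download.
my $mech = WWW::Mechanize->new(stack_depth => 3);

print "Current stack depth: ".$mech->stack_depth."\n";

# The depth can also be adjusted after construction:
$mech->stack_depth(10);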

Thanks! I had never imagined it was an issue with the module. I still tend
to think I've missed something. I'm self-taught and I seem to constantly
find new holes in what I've learned as I keep going.

Hal

Re: Program Dies after 60 or so Iterations of Loop

on 27.08.2007 02:15:28 by Hal Vaughan

Sisyphus wrote:

>
> "Hal Vaughan" wrote in message
> news:i-KdnWQuLpP_T03bnZ2dneKdnZydnZ2d@comcast.com...
> .
> .
>> .... then the program stops with
>> a "Killed!" statement.
>
> That's not something that would be coming from perl (afaik).
> Sounds more like something that the OS is doing.
>
> Does your ISP impose any limits that may be coming into play?

It's the Mech module. See the other branch of the thread.

Thanks for the help!

Hal