parallel computing in perl?

on 13.09.2007 22:04:12 by Jie

Hi,

I have to randomly select a sample set and run it 1,000 times. The
following code that I am using now works fine, except it is taking a
long time.
fore (1..1000) {
##get the random sample set and then run a command
}

Now I am thinking of splitting this program into 10 processes to reduce
the processing time. Of course, I can just change the first line to
"for (1..100)" and run this same program in 10 different locations,
but that is really tedious, and I believe there is a better way to
accomplish it so that a big job can be split into multiple small jobs.
Since I am repeatedly running a random sample set, there is no need
to worry about where each process ends and another begins.

Your insight is appreciated!!

jie

Re: parallel computing in perl?

on 13.09.2007 22:14:35 by glex_no-spam

Jie wrote:
> Hi,
>
> I have to randomly select a sample set and run it 1,000 times. The
> following code that I am using now works fine, except it is taking a
> long time.
> fore (1..1000) {
Are we golfing? :-)
> ##get the random sample set and then run a command
> }
>
> Now I am thinking of splitting this program into 10 processes to reduce
> the processing time. Of course, I can just change the first line to
> "for (1..100)" and run this same program in 10 different locations,
> but that is really tedious, and I believe there is a better way to
> accomplish it so that a big job can be split into multiple small jobs.
> Since I am repeatedly running a random sample set, there is no need
> to worry about where each process ends and another begins.
>
> Your insight is appreciated!!

Check CPAN for: Parallel::ForkManager

Re: parallel computing in perl?

on 13.09.2007 22:33:38 by Peter Makholm

Jie writes:

> fore (1..1000) {
> ##get the random sample set and then run a command
> }
>
> Now I am thinking of splitting this program into 10 processes to reduce
> the processing time. Of course, I can just change the first line to
> "for (1..100)" and run this same program in 10 different locations.

You might want to look at Parallel::ForkManager. Your code would look
something along the lines of

use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(10);

for my $data (1 .. 1000) {
    my $pid = $pm->start and next;

    ## get the random sample and process it

    $pm->finish;
}

$pm->wait_all_children;

//Makholm

Re: parallel computing in perl?

on 13.09.2007 22:39:51 by Jie

However, a potential problem with parallel computing is file sharing
and overwriting.
For example, previously my code would generate a temporary file, and
the next loop iteration would overwrite it with a newly generated
file. That was no problem, because the overwriting happened only after
each iteration had finished. Now, when I open 10 parallel processes,
will those 10 temporary files or 10 temporary hashes/arrays/variables
get messed up?

thanks!

jie


On Sep 13, 4:33 pm, Peter Makholm wrote:
> Jie writes:
> > fore (1..1000) {
> > ##get the random sample set and then run a command
> > }
>
> > Now I am thinking of splitting this program into 10 processes to reduce
> > the processing time. Of course, I can just change the first line to
> > "for (1..100)" and run this same program in 10 different locations.
>
> You might want to look at Parallel::ForkManager. Your code would look
> something along the lines of
>
> use Parallel::ForkManager;
> my $pm = Parallel::ForkManager->new(10);
>
> for my $data (1 .. 1000) {
>     my $pid = $pm->start and next;
>
>     ## get the random sample and process it
>
>     $pm->finish;
> }
>
> $pm->wait_all_children;
>
> //Makholm

Re: parallel computing in perl?

on 13.09.2007 23:06:20 by Peter Makholm

Jie writes:

> However, a potential problem with parallel computing is file sharing
> and overwriting.
> For example, previously my code would generate a temporary file, and
> the next loop iteration would overwrite it with a newly generated file.

Use File::Temp when dealing with temporary files. Then no loop
iteration will overwrite another's file, even when running in parallel.

> That was no problem, because the overwriting happened only after
> each iteration had finished. Now, when I open 10 parallel
> processes, will those 10 temporary files or 10 temporary
> hashes/arrays/variables get messed up?

Also, Perl variables aren't shared between forked processes; each
child works on its own copy.

//Makholm
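[Peter's point that forked children don't share Perl variables can be seen with a minimal core-Perl sketch; this example is illustrative and not from the original thread, and the variable name and values are made up:]

```perl
use strict;
use warnings;

my $counter = 0;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # child: this assignment changes only the child's own copy
    $counter = 100;
    exit 0;
}

waitpid($pid, 0);                          # parent waits for the child
print "parent sees counter = $counter\n";  # prints 0, not 100
```

Each fork() gives the child its own copy of the parent's data, so neither process can clobber the other's hashes, arrays, or scalars.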

Re: parallel computing in perl?

on 13.09.2007 23:08:17 by Michele Dondi

On Thu, 13 Sep 2007 13:04:12 -0700, Jie wrote:

>I have to randomly select a sample set and run it 1,000 times. The
>following code that I am using now works fine, except it is taking a
>long time.
>fore (1..1000) {
> ##get the random sample set and then run a command
>}
>
>Now I am thinking of splitting this program into 10 processes to reduce
>the processing time. Of course, I can just change the first line to
>"for (1..100)" and run this same program in 10 different locations,
>but that is really tedious, and I believe there is a better way to
>accomplish it so that a big job can be split into multiple small jobs.
>Since I am repeatedly running a random sample set, there is no need
>to worry about where each process ends and another begins.

Given the specs,

perldoc -f fork
perldoc perlipc


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^ ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Re: parallel computing in perl?

on 14.09.2007 11:26:37 by Michele Dondi

On Thu, 13 Sep 2007 13:39:51 -0700, Jie wrote:

>next loop iteration would overwrite it with a newly generated file.
>That was no problem, because the overwriting happened only after each
>iteration had finished. Now, when I open 10 parallel processes, will
>those 10 temporary files or 10 temporary hashes/arrays/variables get
>messed up?

Variables each belong to their own process. As far as the files are
concerned, just create ten *different* ones. File::Temp may be useful.


Michele
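[Michele's suggestion of ten *different* files is exactly what File::Temp's tempfile() provides: every call creates a fresh, uniquely named file, so parallel workers cannot collide. A small sketch; the loop count and file contents are arbitrary:]

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %seen;
for my $i (1 .. 10) {
    # tempfile() creates and opens a uniquely named file;
    # UNLINK => 1 removes it automatically at program exit
    my ($fh, $filename) = tempfile(UNLINK => 1);
    print {$fh} "sample set $i\n";
    close $fh or die "close failed: $!";
    $seen{$filename} = 1;
}

printf "%d distinct temporary files\n", scalar keys %seen;   # 10
```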

Re: parallel computing in perl?

on 14.09.2007 16:27:12 by Jie

Hi, thank you very much for the replies.

I think the code below would do it.
I don't know if I used the right syntax to open a temporary file...
Also, I don't know if I need to use "$pm->wait_all_children;" as
suggested by Peter.

==========================================================
use File::Temp
use Parallel::ForkManager;

my $pm = new Parallel::ForkManager(10);

for $data (1 .. 1000) {
my $pid = $pm->start and next;
open TEMP_FILE, tempfile();
## Do something with this temp_file
$pm->finish;
}
=========================================================






On Sep 14, 5:26 am, Michele Dondi wrote:
> On Thu, 13 Sep 2007 13:39:51 -0700, Jie wrote:
> >next loop iteration would overwrite it with a newly generated file.
> >That was no problem, because the overwriting happened only after
> >each iteration had finished. Now, when I open 10 parallel
> >processes, will those 10 temporary files or 10 temporary
> >hashes/arrays/variables get messed up?
>
> Variables belong each to their own process. As far as the files are
> concerned, just create ten *different* ones. File::Temp may be useful.
>
> Michele

Re: parallel computing in perl?

on 14.09.2007 17:30:18 by glex_no-spam

Jie wrote:
> Hi, thank you very much for the replies.
>
> I think the code below would do it.
> I don't know if I used the right syntax to open a temporary file...
> Also, I don't know if I need to use "$pm->wait_all_children;" as
> suggested by Peter.
>
> ==========================================================
> use File::Temp
> use Parallel::ForkManager;

Really??.. that works??..

If you want to know the right syntax, or what a method does,
you may get the answer by actually reading the documentation.

perldoc File::Temp
perldoc Parallel::ForkManager

Re: parallel computing in perl?

on 15.09.2007 13:04:49 by Michele Dondi

On Fri, 14 Sep 2007 07:27:12 -0700, Jie wrote:

>I think the code below would do it.
>I don't know if I used the right syntax to open a temporary file...
[snip]
> open TEMP_FILE, tempfile();

Usual recommendations:

1. use lexical filehandles;
2. use the three-arg form of open();
3. check for success.

my (undef, $tmpname) = tempfile();
open my $tempfile, '+>', $tmpname or die "open failed: $!";

I changed the open mode because I suppose that you want to create the
tempfile for writing and then read stuff back out of it. If you don't
need the file to have a name, or to know it, then you can avoid
File::Temp and let perl do it easily for you:

open my $tempfile, '+>', undef or die "open failed: $!";


Michele
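[Putting the thread's advice together — a bounded number of workers, a private temp file per child, lexical filehandles, and error checking — here is one possible sketch using only core modules (plain fork/wait, which is what Michele's "perldoc -f fork" pointer suggests and what Parallel::ForkManager wraps). The worker cap and sample count are made up for illustration:]

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $max_procs = 4;    # assumed cap on simultaneous children
my $running   = 0;
my $spawned   = 0;

for my $data (1 .. 12) {    # 12 samples instead of 1000, for brevity
    if ($running >= $max_procs) {
        wait();             # throttle: reap one finished child first
        $running--;
    }

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: a private, auto-deleted temp file via a lexical handle
        my ($fh, $filename) = tempfile(UNLINK => 1);
        print {$fh} "result for sample $data\n";
        close $fh or die "close failed: $!";
        exit 0;             # the equivalent of $pm->finish
    }

    $running++;
    $spawned++;
}

1 while wait() != -1;       # reap the rest, like wait_all_children
print "all $spawned workers finished\n";
```

Because every child writes to its own uniquely named file and exits before the parent reuses any variables, none of the temporary files or in-memory data structures can interfere with one another.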