Reducing memory usage using fewer cgi programs
on 17.10.2008 19:37:09 by Thomas Hilbig
I have about a dozen small cgi programs under mod_perl2 that all pretty much look like this:
use CGI;
use DBI;
use perlchartdir;   # graphing

# fetch parameters
# build SQL and fetch data from database
# build graph image from data
# send image
Under mod_perl, will the memory footprint of the libraries be shared across all of the programs that use them, or could I reduce the memory usage by merging all of the programs into one larger program (a maintenance problem, but worth while if it saves memory)?
I also use a PerlRequire startup.pl to preload most modules (CGI, DBI), but I thought that was only for quicker startup times and not for sharing memory. Is that correct?
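(For reference, a minimal sketch of such a startup.pl -- the PerlRequire path and the module list here are illustrative:)

# startup.pl, referenced from httpd.conf with:
#   PerlRequire /etc/httpd/conf/startup.pl
use strict;
use CGI ();   # empty import lists: compile now, import nothing
use DBI ();
1;            # a PerlRequire'd file must return a true value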
Thanks, Tom
Re: Reducing memory usage using fewer cgi programs
on 17.10.2008 19:56:42 by mpeters
Thomas Hilbig wrote:
> I have about a dozen small cgi programs under mod_perl2 that all pretty well look like this..
>
> use CGI;
> use DBI;
> use perlchartdir ; # graphing
To think about how this works under mod_perl, pretend that all of your scripts are put together into
1 larger script and all those "use" statements are repeated. Does having multiple "use CGI"
statements make your script use more memory? No. CGI.pm is only loaded once.
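You can see the mechanism in plain Perl: a module is compiled once and recorded in %INC, and any later "use" of it is a no-op.

use CGI ();
use CGI ();                  # no-op: CGI.pm is already listed in %INC
print "$INC{'CGI.pm'}\n";    # path of the single compiled copy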
> Under mod_perl, will the memory footprint of the libraries be shared across all of the programs that use them
The specifics depend on which OS and which version of Apache/mod_perl you're
running, but it's basically like this: each Apache child has its own Perl
interpreter, and what is loaded in that interpreter is persistent across
requests. So different scripts can use the same CGI.pm or DBI that you've
loaded for another script. Combining them all into the same program won't
make any noticeable difference in memory one way or the other.
> I also use a PerlRequire startup.pl to preload most modules (CGI, DBI), but I thought that was only for quicker startup times and not for sharing memory. Is that correct?
Preloading helps with speed (you don't get the initial loading hit for a
module the first time it's used in a specific process) but it can also help
with memory on certain OSs. For instance, Linux has copy-on-write memory, so
if you preload modules it saves on actual physical RAM used (even though the
separate processes think they have their own separate memory spaces).
But remember that each Apache child gets its own Perl interpreter. So if you
have a high MaxClients you can run out of memory: it's basically
Perl memory * MaxClients for how much RAM could be used if your system got
busy. This is one of the reasons that most people put a vanilla Apache (or
something else like squid, lighttpd, varnish, etc) in front as a proxy. When
you do that, even if you're running both the proxy and the mod_perl server on
the same physical machine, you need a lot less RAM than if you just ran a
mod_perl server trying to handle both static and dynamic requests.
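As a rough illustration (the backend port and paths are invented, and this assumes mod_proxy is loaded in the front-end server):

# front-end httpd.conf fragment: serve static files directly,
# pass dynamic requests through to the mod_perl backend
DocumentRoot /var/www/static
ProxyPass        /cgi-bin/ http://127.0.0.1:8081/cgi-bin/
ProxyPassReverse /cgi-bin/ http://127.0.0.1:8081/cgi-bin/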
HTH
--
Michael Peters
Plus Three, LP
Re: Reducing memory usage using fewer cgi programs
on 17.10.2008 23:33:26 by Thomas Hilbig
--- On Fri, 10/17/08, Michael Peters wrote:
> To think about how this works under mod_perl, pretend that
> all of your scripts are put together into
> 1 larger script and all those "use" statements
> are repeated. Does having multiple "use CGI"
> statements make your script use more memory? No. CGI.pm is
> only loaded once.
Thanks for clarifying that. I was never sure if the libraries would be shared or placed in their own namespace somehow.
> Preloading helps with speed (you don't get the the
> initial loading hit for a module the first time
> it's used in a specific process) but it can also help
> with memory on certain OSs. For instance,
> Linux has Copy-On-Write memory so that if you preload
> modules it saves on actual physical RAM used
> (even though the separate processes think they have their
> own separate memory spaces).
I'm using linux 2.6.23 with Apache 2.2.8
I did not know about copy-on-write memory. I've probably heard the term many times, but always assumed it referred to a filesystem. That can definitely reduce memory usage -- I must move more common libraries into startup.pl.
> This is one of the reasons that most people put a vanilla
> Apache (or something else like squid, lighttpd, varnish,
> etc) in front as a Proxy. When you do that,
> even if you're running both the proxy and the mod_perl
> server on the same physical machine you need
> a lot less RAM then if you just ran a mod_perl server
> trying to do static and dynamic requests.
>
Hmmmm. I'm still serving mod_perl programs and static pages from one Apache server, with average load of about 1 hit per second. It's worth looking into some of these combinations a little more.
Thanks for your great answers!
Tom
Re: Reducing memory usage using fewer cgi programs
on 20.10.2008 14:11:54 by Carl Johnstone
>> I also use a PerlRequire startup.pl to preload most modules (CGI, DBI),
>> but I thought that was only for quicker startup times and not for sharing
>> memory. Is that correct?
>
> Preloading helps with speed (you don't get the initial loading hit for
> a module the first time it's used in a specific process) but it can also
> help with memory on certain OSs.
Pre-loading *will* give you a longer startup time, as you have to pre-load
all the modules before apache can start.
However, as Michael says, the module would be loaded the first time a script
used it anyway - so you'd have the same delay but in the middle of a request
rather than at server start-up! Additionally you would have that delay in
each child process that apache creates over the entire life of the server.
Carl
Re: Reducing memory usage using fewer cgi programs
on 24.10.2008 05:16:48 by Thomas Hilbig
--- On Mon, 10/20/08, Carl Johnstone wrote:
> >> I also use a PerlRequire startup.pl to preload most modules (CGI, DBI),
> >> but I thought that was only for quicker startup times and not for sharing
> >> memory. Is that correct?
> >
> > Preloading helps with speed (you don't get the initial loading hit for
> > a module the first time it's used in a specific process) but it can also
> > help with memory on certain OSs.
>
> Pre-loading *will* give you a longer startup time, as you have to pre-load
> all the modules before apache can start.
>
> However, as Michael says, the module would be loaded the first time a script
> used it anyway - so you'd have the same delay but in the middle of a request
> rather than at server start-up! Additionally you would have that delay in
> each child process that apache creates over the entire life of the server.
>
> Carl
I was referring to script initialization (responding to that first request) and not the httpd daemon startup. Really, the only "startup"
that should be slower is when the whole httpd service is restarted (such as at server startup) since it would have to preload all modules for all standby daemons.
I would expect (or hope -- I don't really know) that any individual httpd daemons that get re-initialized later on automatically (when MaxRequestsPerChild is reached) would be done after the previous request so they are ready for the next request.
Something else I will do for my low-usage but "massive" scripts (those that have large memory structures and take several seconds to execute) is to place these in a non-mod_perl directory so I can be assured their memory usage goes away at the end of the response.
Tom
Re: Reducing memory usage using fewer cgi programs
on 24.10.2008 14:57:26 by Carl Johnstone
>>>>>
I was referring to script initialization (responding to that first request)
and not the httpd daemon startup. Really, the only "startup" that should be
slower is when the whole httpd service is restarted (such as at server
startup) since it would have to preload all modules for all standby daemons.
<<<<<
Sorry for the misunderstanding - startup to me only refers to server
startup. In my mod_perl setup all code is loaded at server startup, there is
no additional code initialisation once the server is running.
Additionally if you preload your modules and code, then it is initialised
only once in the parent apache process. When the child processes are forked
to handle the requests, they are an exact copy of the parent so inherit all
the perl code that was pre-loaded, saving additional load time.
>>>>>
I would expect (or hope -- I don't really know) that any individual httpd
daemons that get re-initialized later on automatically (when
MaxRequestsPerChild is reached) would be done after the previous request so
they are ready for the next request.
<<<<<
When MaxRequestsPerChild is reached, the child process shuts down at the end
of a request cycle, returning its resources to the OS. The parent then forks
a new child process in exactly the same way as it did after startup,
following the same procedure according to your spare-server configuration
settings.
>>>>>
Something else I will do for my low-usage but "massive" scripts (those that
have large memory structures and take several seconds to execute) is to
place these in a non-mod_perl directory so I can be assured their memory
usage goes away at the end of the response.
<<<<<
There's no reason not to run these under mod_perl too - any memory allocated
by perl will be re-used.
If you're really concerned and would rather the child process quit and free
its memory back to the OS, then call $r->child_terminate in any of your
handlers, and the child process will automatically quit at the end of the
request (the same as if it had hit its MaxRequestsPerChild limit).
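A minimal sketch of what that looks like under mod_perl 2 (the package name is invented; child_terminate() is provided by Apache2::RequestUtil):

package My::BigReport;
use strict;
use Apache2::RequestUtil ();              # provides $r->child_terminate
use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;
    # ... build the large structures and send the response ...
    $r->child_terminate();                # this child exits after the request
    return Apache2::Const::OK;
}
1;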
Carl
Re: Reducing memory usage using fewer cgi programs
on 24.10.2008 15:03:52 by mpeters
Carl Johnstone wrote:
>>>>>>
> Something else I will do for my low-usage but "massive" scripts (those
> that have large memory structures and take several seconds to execute)
> is to place these in a non-mod_perl directory so I can be assured their
> memory usage goes away at the end of the response.
> <<<<<
>
> There's no reason not to run these under mod_perl too - any memory
> allocated by perl will be re-used.
This is only true if those structures were created during run time and go out of scope at run time.
If they are generated at compile time or attached to global variables or package level variables,
they will not be re-used by Perl.
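A sketch of the distinction (names and sizes are invented):

# lexical, built at run time: perl can re-use this space
# once it goes out of scope
sub per_request {
    my @big = (0) x 1_000_000;
    return scalar @big;
}

# package global, built at load time: stays allocated for the
# life of the child process
our @cache = (0) x 1_000_000;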
--
Michael Peters
Plus Three, LP
Re: Reducing memory usage using fewer cgi programs
on 24.10.2008 17:03:40 by Perrin Harkins
On Fri, Oct 24, 2008 at 8:57 AM, Carl Johnstone wrote:
> If you're really concerned and would rather the child process quits and
> frees additional memory to the OS, then call $r->child_terminate in any of
> your handlers, and the child process will automatically quit at the end of
> the request (same as if it had hit it's MaxRequests limit)
That's a good way to deal with it. This kind of thing can be useful
if you only run these large requests rarely. It will use about the
same amount of memory as doing it through CGI, but the mod_perl
version will not have the startup costs (at request time) so it will
finish and exit sooner than a CGI could.
- Perrin
Re: Reducing memory usage using fewer cgi programs
on 25.10.2008 05:59:25 by Michael Lackhoff
On 24.10.2008 15:03 Michael Peters wrote:
> This is only true if those structures were created during run time and go out of scope at run time.
> If they are generated at compile time or attached to global variables or package level variables,
> they will not be re-used by Perl.
Wait a minute, I would like to do exactly that: use a config module in
startup.pl that loads some massive config hashes in the hope that the
memory they use will be shared:
package MyConfig;
our $aHugeConfigHash = load_data_from_config_file();
then in my mod_perl module:
my $conf = $MyConfig::aHugeConfigHash;
(well sort of, it is actually wrapped in an accessor but that gets its
data from the package variable)
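Something like this sketch (get_config is a made-up name for the accessor):

sub MyConfig::get_config {
    return $MyConfig::aHugeConfigHash;
}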
Are you saying I cannot share the memory this way? And if so, is there
an alternative?
-Michael
Re: Reducing memory usage using fewer cgi programs
on 25.10.2008 13:32:48 by aw
Michael Lackhoff wrote:
> On 24.10.2008 15:03 Michael Peters wrote:
>
>> This is only true if those structures were created during run time and go out of scope at run time.
>> If they are generated at compile time or attached to global variables or package level variables,
>> they will not be re-used by Perl.
>
> Wait a minute, I would like to do exactly that: use a config module in
> startup.pl that loads some massive config hashes in the hope that the
> memory they use will be shared:
>
> package MyConfig;
> our $aHugeConfigHash = load_data_from_config_file();
>
> then in my mod_perl module:
> my $conf = $MyConfig::aHugeConfigHash;
>
> (well sort of, it is actually wrapped in an accessor but that gets its
> data from the package variable)
>
> Are you saying, I cannot share the memory this way?
Yes, he is saying that.
You cannot share memory between Apache "children" (independently of
whether we are talking about perl, mod_perl, or whatever else). Each
child is a separate process, with its separate copy of mod_perl, the
perl interpreter, global variables, everything.
What happens when you are using a mod_perl startup script is:
Apache will load mod_perl and perl, and compile and execute this script
(and all that it "use"'s) *before* it forks into multiple children.
So when Apache has finished its initialisation, and forks into multiple
children, each one of those will have its own copy of what was compiled
and run and initialised there, without needing to recompile and execute
them itself.
The same happens when in the future Apache creates a new child (by
forking again) : this new child will also get that same initial copy of
the modules and structures you compile/create at startup time.
To a certain extent, this can save memory under modern operating
systems, because a piece of memory that is identical for a number of
processes, can be in memory only once, and shared between processes, *as
long as nothing in it is modified*. (That's the "copy-on-write" thing).
But as soon as one of the processes modifies something in that memory
area, the OS will copy the entire area and give a new copy to the
process to modify, and after that the process keeps this "personal
copy". So any changes made to this table are invisible to the other
processes (Apache children), because they are still using the unmodified
"shared" original copy.
That can still be a huge time saving though. Imagine that loading this
table initially takes 2 minutes, and that you have 30 Apache children.
If you load it in your startup script, it will be done once and take 2
minutes. If you don't, it will be done in each new Apache child, and
take in total 60 minutes, plus 2 minutes each time a child dies and a
new one is started.
In your case, what that means is: if you allocate your huge hashtable
once at the beginning, and later you never modify it, then yes you can
probably consider that it will be loaded and present in memory only once
(but even that depends on how perl internally handles it).
But as soon as one of the Apache children modifies this hashtable, then
it is 100% sure that this process now has its own copy forever after.
Now, one of the characteristics of running things under mod_perl, is
that mod_perl and the perl interpreter are "persistent" within that
Apache child. In other words, it is the same mod_perl and perl
interpreter that execute many modules or scripts one after the other,
and they never themselves terminate.
And they do "remember" some things between consecutive runs of scripts
or modules. That is usually undesirable, because it can give nasty
errors: a variable that you declare with "my $var" and that you expect
to be "undef", might not be, if a previous run of the same script or
module (in the same Apache child) has left something in it.
But if you use this carefully, it may also be very useful, because it
might "remember" your hashtable between one call and the next, and save
you having to reload the table from scratch.
Just be careful about this, and remember always that when you find
something already in the table, it is due to a previous run of something
in this particular Apache child, not in Apache in general. You are
still not sharing this table with other Apache children and other
mod_perl and perl instances.
> And if so, is there an alternative?
>
There are several, which depend on what you really do with this data,
how often it is modified etc.
One alternative goes somewhat like this:
- the table is loaded from the original data in the startup script, and
a reference to it put in a global variable (our $hashtable)
- the startup script then writes the loaded table into a file, as a
Storable object, and initialises another global variable $stamp with the
current time.
- each time your application module/script starts, it compares its
global $stamp variable with the Storable file's timestamp. If they are
different (and only then), it reloads the table from the Storable file.
That, hopefully, is a lot faster than having to rebuild the table from
scratch.
If the table is mostly used read-only, and modifications to it are
infrequent, that may be your best bet.
Of course if one process modifies the table, and the changes have to
become visible to the others, it needs to rewrite the Storable object,
with an appropriate inter-process locking mechanism.
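A rough sketch of that pattern (the file path is invented; store/retrieve come from the standard Storable module):

use Storable qw(store retrieve);

our $hashtable;     # loaded in the startup script
our $stamp;         # mtime of the Storable file we last read

my $FILE = '/var/cache/myapp/table.sto';   # hypothetical location
# whoever rebuilds the table writes it with: store($hashtable, $FILE);

sub get_table {
    my $mtime = (stat $FILE)[9];
    if (!defined $stamp or $mtime != $stamp) {
        $hashtable = retrieve($FILE);      # much faster than rebuilding
        $stamp     = $mtime;               # remember what we loaded
    }
    return $hashtable;
}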
The thing is that if a table is in a global variable, it will be kept in
the memory *of that Apache child* across separate invocations of the
application modules over time *executed by that same child*.
So if it does not change often, you may run the script hundreds of times
before it needs to reload the table.
Another alternative is to have this huge data structure loaded/created
by a totally independent "server" process, and have all your application
modules/scripts access this separate process through TCP/IP to read or
modify the table.
There exists a module like that somewhere in CPAN, I believe it is
called "daemon"-something.
IPC-based modules also exist, but they work only under Unix/Linux.
Re: Reducing memory usage using fewer cgi programs
on 25.10.2008 13:34:05 by Clinton Gormley
On Sat, 2008-10-25 at 05:59 +0200, Michael Lackhoff wrote:
> On 24.10.2008 15:03 Michael Peters wrote:
>
> > This is only true if those structures were created during run time and go out of scope at run time.
> > If they are generated at compile time or attached to global variables or package level variables,
> > they will not be re-used by Perl.
> Wait a minute, I would like to do exactly that: use a config module in
> startup.pl that loads some massive config hashes in the hope that the
> memory they use will be shared:
I think there was some confusion here about what was being said.
Michael Peters' comment about memory reuse was saying that:
- if at runtime, you load a large memory structure
- then let those variables go out of scope
- then that memory will become available for reuse by Perl
but
- if you load that same large memory structure at server startup
- and the variables don't go out of scope,
- then that memory stays used
>
> package MyConfig;
> our $aHugeConfigHash = load_data_from_config_file();
>
> (well sort of, it is actually wrapped in an accessor but that gets its
> data from the package variable)
>
> Are you saying, I cannot share the memory this way? And if so, is there
> an alternative?
I do exactly this - a lot of my website is config driven. At startup, I
load a large data structure (7,000 lines of YAML, roughly 700kB in memory)
that I can access from any module.
This data structure is considered by my app to be read-only, but not
enforced.
Plug: You may want to look at Config::Merge (my module for loading a tree
of YAML / JSON / XML / Perl / INI / Config::General files into a single
config hash):
http://search.cpan.org/~drtech/Config-Merge-1.00/Merge.pm
Similarly, I preload almost all the modules that I will use at compile
time. I exclude certain seldom-used large modules, such as PDF::API2,
which is require'd when needed.
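For example (a sketch; the surrounding sub is invented):

sub render_pdf {
    require PDF::API2;        # compiled only in the children that need it
    my $pdf = PDF::API2->new();
    # ... build and return the document ...
}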
This makes forking a new child very fast, thanks to copy-on-write in
linux.
Unfortunately, in Perl, there is no way to make sure that
data-that-will-never-change and compiled code is stored on a separate
page from data-that-will-change. So, it is likely that, while your
config data starts out completely shared, over time, it will become less
so.
As Stas Bekman said in his Improving mod_perl Sites' Performance
article:
http://www.perl.com/pub/a/2002/07/30/mod_perl.html
don't aim to make MaxRequestsPerChild = 10000, as the amount of memory
each child consumes will just increase over time, while creating a new
child is cheap, thanks to your preloading.
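In httpd.conf terms, that is roughly (values illustrative, not prescriptive):

<IfModule prefork.c>
    StartServers           5
    MinSpareServers        5
    MaxSpareServers       10
    MaxClients            30
    MaxRequestsPerChild 1000   # recycle children before they accumulate
                               # too much private (unshared) memory
</IfModule>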
You may want to look at Linux::Smaps
http://search.cpan.org/author/OPI/Linux-Smaps-0.06/lib/Linux/Smaps.pm
and at smem.pl (a script which uses Linux::Smaps to print out the memory
usage of a process)
http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html
to get some idea of just how shared the memory in your httpd processes is.
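For instance (a sketch; the accessors are from the Linux::Smaps documentation):

use Linux::Smaps;

my $smaps = Linux::Smaps->new($$);   # $$ = this process; use a child's pid
printf "shared:  %d kB\n", $smaps->shared_clean  + $smaps->shared_dirty;
printf "private: %d kB\n", $smaps->private_clean + $smaps->private_dirty;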
Some examples of smem.pl output from my live site: (compare shared vs
private clean + private dirty)
PARENT PROCESS:
---------------
VMSIZE: 135036 kb
RSS: 45784 kb total
36616 kb shared
1012 kb private clean
8156 kb private dirty
OLDEST CHILD:
-------------
VMSIZE: 143780 kb
RSS: 53140 kb total
33912 kb shared
12 kb private clean
19216 kb private dirty
YOUNGEST CHILD:
---------------
VMSIZE: 138052 kb
RSS: 47172 kb total
36272 kb shared
0 kb private clean
10900 kb private dirty
hth
Clint
Re: Reducing memory usage using fewer cgi programs
on 25.10.2008 20:51:15 by Michael Lackhoff
On 25.10.2008 13:34 Clinton Gormley wrote:
> I think there was some confusion here about what was being said.
>
> Michael Peters' comment about memory reuse was saying that:
> - if at runtime, you load a large memory structure
> - then let those variables go out of scope
> - then that memory will become available for reuse by Perl
>
> but
> - if you load that same large memory structure at server startup
> - and the variables don't go out of scope,
> - then that memory stays used
Indeed, looks like I misunderstood what Michael Peters was saying.
> This makes forking a new child very fast, thanks to copy-on-write in
> linux.
This is what I am after.
> Unfortunately, in Perl, there is no way to make sure that
> data-that-will-never-change and compiled code is stored on a separate
> page from data-that-will-change. So, it is likely that, while your
> config data starts out completely shared, over time, it will become less
> so.
Good hint. I will have to see how I can limit this non-shared memory, if
need be by restarting the server from time to time to restore the
initial state with almost all memory shared.
(Or better by reducing MaxRequestsPerChild as you suggest)
> You may want to look at Linux::Smaps
I am not that far, yet. I am doing all my development work under Windows
and only for production I will use Linux (or Solaris if it needs bigger
iron).
Thanks (also to André)
-Michael
Re: Reducing memory usage using fewer cgi programs
on 26.10.2008 04:43:06 by Perrin Harkins
On Sat, Oct 25, 2008 at 7:32 AM, André Warnier wrote:
> And they do "remember" some things between consecutive runs of scripts
> or modules. That is usually undesirable, because it can give nasty
> errors: a variable that you declare with "my $var" and that you expect
> to be "undef", might not be, if a previous run of the same script or
> module (in the same Apache child) has left something in it.
Nit pick: a "my" variable is lexically scoped and will not retain its
value between requests unless your code is written in a way that makes
a closure around it, e.g.
my $foo = 7;
sub bar {
    print $foo;   # bar() closes over $foo, so $foo persists between requests
}
- Perrin
Re: Reducing memory usage using fewer cgi programs
on 26.10.2008 04:47:34 by Perrin Harkins
On Sat, Oct 25, 2008 at 7:34 AM, Clinton Gormley wrote:
> Michael Peters' comment about memory reuse was saying that:
> - if at runtime, you load a large memory structure
> - then let those variables go out of scope
> - then that memory will become available for reuse by Perl
I won't try to speak for Michael, but technically that memory does not
become available for reuse by Perl, even if they are lexical variables
and they go out of scope, unless you explicitly undef the variables.
Perl keeps the memory for them allocated as a performance
optimization.
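In code terms, something like (a sketch; build_huge_structure is invented):

{
    my $big = build_huge_structure();
    # ... use $big ...
    undef $big;   # without this, perl keeps the allocation around
                  # for later reuse instead of releasing it
}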
- Perrin
Re: Reducing memory usage using fewer cgi programs
on 29.10.2008 15:35:58 by Carl Johnstone
> I won't try to speak for Michael, but technically that memory does not
> become available for reuse by Perl, even if they are lexical variables
> and they go out of scope, unless you explicitly undef the variables.
> Perl keeps the memory for them allocated as a performance
> optimization.
That was the bit I was forgetting.
As said elsewhere, I generally use a MaxRequestsPerChild value of 1000, which
means that any extra memory allocated in your child processes is returned to
the OS fairly regularly and apache forks a nice clean child.
Carl
Re: Reducing memory usage using fewer cgi programs
on 29.10.2008 15:41:00 by Carl Johnstone
> I am not that far, yet. I am doing all my development work under Windows
> and only for production I will use Linux (or Solaris if it needs bigger
> iron).
Something to bear in mind then is that apache on Windows is (usually)
multi-threaded rather than multi-process. So a lot of the discussion in this
topic wouldn't apply in your Windows environment, but will when you deploy
to Linux/Solaris.
Carl