mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 12:00:53 by Tim Watts

Hi,

Is it in theory possible to insert a perl output filter between
mod_proxy and mod_cache?

Or at least between mod_proxy and the client?



The problem I'm trying to solve is this:

We have 100+ web servers where apache fronts a separate tomcat server
using mod_proxy.

Sadly, the tomcat devs forgot to set any caching headers in the HTTP
response (Expires, Last-Modified or Cache-Control), so the sites are
largely uncacheable by browsers and the various tomcats are becoming
overloaded.

1/3 of our sites are typically invariant (the production sites have
stable and unchanging data and most queries are via GET requests).

Therefore, the idea of forcing in some cache control headers en-route
and also enabling some apache caching has a good chance of working well
without affecting anything.

mod_headers and mod_proxy don't seem to play well together, and mod_cache
doesn't either (probably due to the lack of cache control headers in the
tomcat response, though I haven't proved this is actually the case).

So the thought of doing a perl based filter to insert cache-control
headers occurred.
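
A minimal sketch of the header-insertion logic such a filter would need, written in Python purely for illustration (it is not a mod_perl filter and uses no Apache API). The function name `add_cache_headers`, the one-hour default and the choice to touch only successful GET responses are assumptions, not something stated in the thread.

```python
from typing import Dict

# Headers whose presence suggests the backend already made a caching decision.
CACHE_HEADERS = ("cache-control", "expires", "last-modified")


def add_cache_headers(method: str, status: int, headers: Dict[str, str],
                      max_age: int = 3600) -> Dict[str, str]:
    """Return a copy of `headers`, adding Cache-Control only when none is set.

    Hypothetical helper: the one-hour default is an arbitrary example value.
    """
    result = dict(headers)
    present = {name.lower() for name in result}
    # Only touch successful GET responses that carry no caching headers at all.
    if method == "GET" and status == 200 and not present.intersection(CACHE_HEADERS):
        result["Cache-Control"] = f"public, max-age={max_age}"
    return result
```

Leaving responses alone when the backend already sent a caching header keeps the fix-up from overriding any application that does get it right.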

Is it likely that I can insert such a filter on Apache 2.2 *between*
mod_proxy and mod_cache?

Or am I going to have to implement a filter that includes proxying
and/or caching?

Many thanks for any advice,

Cheers,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 12:16:15 by aw

Tim Watts wrote:
> Hi,
>
> Is it in theory possible to insert a perl output filter between
> mod_proxy and mod_cache?
>
> Or at least between mod_proxy and the client?
>
>
>
> The problem I'm trying to solve is this:
>
> We have 100+ web servers where apache fronts a separate tomcat server
> using mod_proxy.
>
> Sadly, the tomcat devs forgot to set any caching headers in the HTTP
> response (Expires, Last-Modified or Cache-Control), so the sites are
> largely uncacheable by browsers and the various tomcats are becoming
> overloaded.
>
> 1/3 of our sites are typically invariant (the production sites have
> stable and unchanging data and most queries are via GET requests).
>
> Therefore, the idea of forcing in some cache control headers en-route
> and also enabling some apache caching has a good chance of working well
> without affecting anything.
>
> mod_headers and mod_proxy don't seem to play well together, and mod_cache
> doesn't either (probably due to the lack of cache control headers in the
> tomcat response, though I haven't proved this is actually the case).
>
> So the thought of doing a perl based filter to insert cache-control
> headers occurred.
>
> Is it likely that I can insert such a filter on Apache 2.2 *between*
> mod_proxy and mod_cache?
>
> Or am I going to have to implement a filter that includes proxying
> and/or caching?
>
(That would probably be difficult, inefficient or both)

Assuming that what you say about Tomcat is true (I don't know, and it may be worth asking
this on the Tomcat list), I can think of another way to achieve what you seem to want:
if you can distinguish, from the request URL (or any other request property), the requests
that are for invariant things, then you could arrange to /not/ proxy these requests to
Tomcat, and serve them directly from Apache httpd.
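
As a rough illustration of that routing decision (in Python, not Apache configuration), the sketch below decides from the method and URL path whether a request could be answered locally. The prefixes and the name `serve_locally` are hypothetical examples; the real criteria would depend on each site.

```python
from typing import Sequence

# Hypothetical path prefixes for content known never to change.
STATIC_PREFIXES = ("/css/", "/js/", "/images/")


def serve_locally(method: str, path: str,
                  prefixes: Sequence[str] = STATIC_PREFIXES) -> bool:
    """Return True when the front end could answer without contacting Tomcat."""
    # Only read-only requests are candidates for local serving.
    if method not in ("GET", "HEAD"):
        return False
    # Ignore the query string when matching the path.
    clean_path = path.split("?", 1)[0]
    return any(clean_path.startswith(prefix) for prefix in prefixes)
```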

Which proxying method exactly are you using between Apache and Tomcat? (If you are using
mod_proxy, then you are using either mod_proxy_http or mod_proxy_ajp; you could also
consider using mod_jk.)

Also, what versions of Apache and Tomcat are you using?

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 12:39:06 by Tim Watts

On 14/07/11 11:16, André Warnier wrote:

Hi Andre,

Thanks for the quick reply :)

> (That would probably be difficult, inefficient or both)
>
> Assuming that what you say about Tomcat is true (I don't know, and it
> may be worth asking this on the Tomcat list), I can think of another way
> to achieve what you seem to want :
> if you can distinguish, from the request URL (or any other request
> property), the requests that are for invariant things, then you could
> arrange to /not/ proxy these requests to Tomcat, and serve them directly
> from Apache httpd.

Indeed that is a good idea. We are doing that for new projects for css
and js files (apache does not proxy certain paths and picks these up
from the local filesystem).

We can't do that for the 100-odd legacy servers as no one has time to
delve into the java/JSP code. I need to do something "outside" of tomcat
where possible. Just to explain, each web server is a paid-for project -
and when it's done, it sits there for 5+ years.

Only I have the time/inclination to fix this as it's killing my VMWare
infrastructure. Because the sites are all fronted by apache in a similar
way, one solution is likely to apply to most of the sites.

I would also add that most of the sites are "dynamically" driven pages,
even involving MySQL querying, but once launched, the data remains
fairly static - e.g. GET X will always resolve to response Y.

I'm planning a small seminar on the value of Cache-Control for my dev
colleagues so they can stop making this mistake ;-> But that still
leaves a lot of "done" projects to fix.

> Which proxying method exactly are you using between Apache and Tomcat ?
> (if you are using mod_proxy, then you are either using mod_proxy_http or
> mod_proxy_ajp; you could also consider using mod_jk).

mod_proxy_http specifically.

mod_jk looks interesting for new projects (we have local tomcats for
those now) - I think it may be a non-starter for old stuff as trying to
retro fit it may not be so simple (our older tomcat servers are in a
remote farm on their own machines hence the use of mod_proxy_http).

> Also, what are the versions of Apache and Tomcat that you are using ?
>

Apache 2.2 (various sub versions) and both tomcat 5.5 and tomcat 6 (but
all on remote machines listening on TCP sockets).

I think for this problem, I have to treat tomcat as a little, rather
inefficient, black box and try to fixup on the apache front ends, hence
the direction of my original idea...

Cheers,

Tim

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 12:52:49 by ajgb

Hi Tim,

If you are after caching the responses, maybe an easier solution would
be to use a reverse proxy - like Varnish?

You would be then in complete control over the incoming and outgoing
headers and could cache responses based on the url / inject Expires
headers so browsers could cache them too etc.
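
To make the idea of caching responses by URL concrete, here is a small Python sketch of what a caching reverse proxy does conceptually. It is not Varnish configuration; `UrlCache`, `fetch` and the fixed time-to-live are hypothetical names and simplifications.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UrlCache:
    """Tiny in-memory cache keyed by URL, with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        # Drop entries that have outlived the time-to-live.
        if self.clock() - stored_at >= self.ttl:
            del self._store[url]
            return None
        return body

    def put(self, url: str, body: bytes) -> None:
        self._store[url] = (self.clock(), body)


def fetch(url: str, cache: UrlCache, backend: Callable[[str], bytes]) -> bytes:
    """Answer from the cache when possible, otherwise ask the backend once."""
    body = cache.get(url)
    if body is None:
        body = backend(url)
        cache.put(url, body)
    return body
```

With such a cache in front, repeated GETs for the same URL reach the backend only once per time-to-live window.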

Cheers,
Alex



Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 13:04:11 by Tim Watts

On 14/07/11 11:52, "Alex J. G. Burzyński" wrote:
> Hi Tim,
>
> If you are after caching the responses, maybe an easier solution would
> be to use a reverse proxy - like Varnish?
>
> You would be then in complete control over the incoming and outgoing
> headers and could cache responses based on the url / inject Expires
> headers so browsers could cache them too etc.
>
> Cheers,
> Alex
>

[Sorry Alex, hit reply instead of reply-list]

Hi Alex,

I was initially also thinking Squid - but it's rather heavy.

I have not come across Varnish but having a quick look (and noting it is
available on Debian - good) it looks like a damn good option.

I think you are right - apache is great, but the order of execution of
modules is not well documented and prone to changing (hence my original
question here) and trying to splice effectively 3 filters together
(proxy, header-fiddling and cache) is probably doomed to grief.

Thanks for the tip - I'm off to try that today!

All the best,

Tim

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 13:43:57 by aw

Hi.

I have to apologise.
I misunderstood your first post, and I wanted to verify on the Tomcat list, so I quoted
the following passage of your first post in my message there :

"Sadly, the tomcat dev's forgot to set any caching headers in the HTTP response (either
Expires, Last-Modified or Cache-control) so the sites are largely uncacheable by browsers
and the various tomcats are becoming overloaded."

Unfortunately, the Tomcat devs there took it rather seriously, and as a consequence
your name is now shit on the Tomcat list.


... just kidding, I did not quote your name.

Anyway, apart from a few huffed responses to my misquote (since then rectified), someone
provided a suggestion that may not be the simplest, but might be helpful anyway in some
cases :

Have a look at : http://www.tuckey.org/urlrewrite/

This is a "Java Servlet Filter", which can be added transparently "around" any Tomcat web
application (by adding the required section in the web.xml config file of that web
application).
Java Servlet Filters are such that the Tomcat web application is not even aware that it is
there, and continues to work as before. Much like Apache input and output filters in
fact, except that a Java Servlet Filter is both at the same time (it "wraps" the webapp on
both sides).

Anyway, this filter can do such things as adding response headers, conditionally or not, to
anything the webapp produces. And it can do much more, as with time it has evolved into
some kind of mish-mash of mod_rewrite, mod_headers and mod_proxy.
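
The "wrapping" idea can be illustrated with an analogy in Python: a WSGI middleware sits around an application in much the same way, seeing the request on the way in and the response headers on the way out. This is only an analogy, not the Java Servlet Filter API or the urlrewrite filter's configuration; the class name `AddHeaderMiddleware` is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


class AddHeaderMiddleware:
    """Wrap a WSGI app and append a response header if the app did not set it."""

    def __init__(self, app: Callable, name: str, value: str) -> None:
        self.app = app
        self.name = name
        self.value = value

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        def wrapped_start(status: str, headers: Headers, exc_info=None):
            # Response side: inspect what the wrapped app produced.
            if not any(key.lower() == self.name.lower() for key, _ in headers):
                headers = headers + [(self.name, self.value)]
            return start_response(status, headers, exc_info)

        # Request side: the wrapped app is called unchanged and is unaware of us.
        return self.app(environ, wrapped_start)
```

The wrapped application needs no changes, which is the property that makes this kind of filter attractive for "done" projects.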

It is more one-by-one work than doing something at the Apache front-end level or via a
proxy, but it also provides better fine-tuning possibilities.
So, if you can for instance easily identify the worst offenders, it might be an option.

And it is certainly a good tool to have in one's toolcase.

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 14:11:51 by Tim Watts

On 14/07/11 12:43, André Warnier wrote:
> Hi.
>
> I have to apologise.
> I misunderstood your first post, and I wanted to verify on the Tomcat
> list, so I quoted the following passage of your first post in my message
> there :
>
> "Sadly, the tomcat dev's forgot to set any caching headers in the HTTP
> response (either Expires, Last-Modified or Cache-control) so the sites
> are largely uncacheable by browsers and the various tomcats are becoming
> overloaded."
>
> Unfortunately, the Tomcat Dev's there took it rather seriously, and as a
> consequence now you name is shit on the Tomcat list.
>
>
> .. just kidding, I did not quote your name.

LoL - I hate tomcat anyway (for its fatness) so I don't mind if they
hate me ;->

I should have clarified as "my Department's dev team" (ie the ones who
use tomcat here) rather than the Tomcat Developers themselves...

I have no doubts that jsp can be told to emit certain headers but for
some reason a lot of web developers IME often miss the finer points of
HTTP. This of course would be the correct place to do it as they can
choose different max-age times to suit the content.

I plan to run a 20 minute seminar on this specific point for my lot (and
more such seminars for other issues like security and SQL efficiency)
but that still leaves loads of old black-boxes to manage for a few years.

> Anyway, apart from a few huffed responses to my misquote (since then
> rectified), someone provided a suggestion that may not be the simplest,
> but might be helpful anyway in some cases :
>
> Have a look at : http://www.tuckey.org/urlrewrite/
>
> This is a "Java Servlet Filter", which can be added transparently
> "around" any Tomcat web application (by adding the required section in
> the web.xml config file of that web application).
> Java Servlet Filters are such that the Tomcat web application is not
> even aware that it is there, and continues to work as before. Much like
> Apache input and output filters in fact, except that a Java Servlet
> Filter is both at the same time (it "wraps" the webapp on both sides).

That could be interesting too - as long as it's something I can bolt in
without having to recompile the webapp code, I'm game. As a linux
sysadmin, I draw a clear line between the systems (my problem) and the
apps (dev team) - and not knowing java (much) I'm not qualified to mess
with their stuff... I'm happy to go as far as messing with server.xml
and web.xml though :)

> Anyway, this filter can do such things as conditionally or not adding
> response headers to anything the webapp produces. And it can do much
> more, as with time it has evolved into some kind of mish-mash of
> mod_rewrite, mod_headers and mod_proxy.
>
> It is more one-by-one work than doing something at the Apache front-end
> level or via a proxy, but it also provides better fine-tuning
> possibilities.
> So, if you can for instance easily identify the worst offenders, it
> might be an option.
>
> And it is certainly a good tool to have in one's toolcase.

I agree - I'll have a look at that after I play with Alex's suggestion
of Varnish :)

Thanks very much for your time :)

all the best,

Tim

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 15:12:16 by js5

On 14/07/2011 11:39, Tim Watts wrote:
> On 14/07/11 11:16, André Warnier wrote:
>
> Hi Andre,
>
> Thanks for the quick reply :)
>
>> (That would probably be difficult, inefficient or both)
>>
>> Assuming that what you say about Tomcat is true (I don't know, and it
>> may be worth asking this on the Tomcat list), I can think of another way
>> to achieve what you seem to want :
>> if you can distinguish, from the request URL (or any other request
>> property), the requests that are for invariant things, then you could
>> arrange to /not/ proxy these requests to Tomcat, and serve them directly
>> from Apache httpd.
>
> Indeed that is a good idea. We are doing that for new projects for css
> and js files (apache does not proxy certain paths and picks these up
> from the local filesystem).
>
> We can't do that for the 100-odd legacy servers as no one has time to
> delve into the java/JSP code. I need to do something "outside" of
> tomcat where possible. Just to explain, each web server is a paid-for
> project - and when it's done, it sits there for 5+ years.
>
> Only I have the time/inclination to fix this as it's killing my VMWare
> infrastructure. Because the sites are all fronted by apache in a
> similar way, one solution is likely to apply to most of the sites.
>
> I would also add that most of the sites are "dynamically" driven
> pages, even involving MySQL querying, but once launched, the data
> remains fairly static - e.g. GET X will always resolve to response Y.
>
> I'm planning a small seminar on the value of Cache-Control for my dev
> colleagues so they can stop making this mistake ;-> But that still
> leaves a lot of "done" projects to fix.
>
>> Which proxying method exactly are you using between Apache and Tomcat ?
>> (if you are using mod_proxy, then you are either using mod_proxy_http or
>> mod_proxy_ajp; you could also consider using mod_jk).
>
> mod_proxy_http specifically.
>
> mod_jk looks interesting for new projects (we have local tomcats for
> those now) - I think it may be a non-starter for old stuff as trying
> to retro fit it may not be so simple (our older tomcat servers are in
> a remote farm on their own machines hence the use of mod_proxy_http).
>
That shouldn't be an issue: you can point mod_jk at a remote machine. I
do it a lot so that we can push the Tomcat application's output through our
templating output filter. Tomcat produces a plain HTML page with
none of the styling, and this is wrapped using our custom output filter.
I'm guessing that at this stage you can do what you want with the script...

James

>> Also, what are the versions of Apache and Tomcat that you are using ?
>>
>
> Apache 2.2 (various sub versions) and both tomcat 5.5 and tomcat 6
> (but all on remote machines listening on TCP sockets).
>
> I think for this problem, I have to treat tomcat as a little, rather
> inefficient, black box and try to fixup on the apache front ends,
> hence the direction of my original idea...
>
> Cheers,
>
> Tim
>




RE: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 15:15:07 by James.B.Muir

I had to bolt on an input servlet filter to tomcat once. To do this I had to
write the servlet filter code and then add <filter> and <filter-mapping> tags
to the application WEB-INF/web.xml file.
-James

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 15:38:26 by aw

Tim Watts wrote:
....

>
> LoL - I hate tomcat anyway (for it's fatness) so I don't mind if they
> hate me ;->
>
> I should have clarified as "my Department's dev team" (ie the ones who
> use tomcat here) rather than the Tomcat Developers themselves...
>
Well, I said that too, and said I had misquoted you, but there was little I could do about
that next phrase of yours :

"I think for this problem, I have to treat tomcat as a little, rather inefficient, black
box .."

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 17:10:34 by Tim Watts

On 14/07/11 14:38, André Warnier wrote:
> Tim Watts wrote:
> ...

> "I think for this problem, I have to treat tomcat as a little, rather
> inefficient, black box .."
>

They liked that quote then? ;->>>>



I'm sure it's a lovely development environment (there must be some
reason people use it) - all I know is it's a resource hungry bitch
that's never happy unless it has GBs of RAM and at least 2, preferably 4
fast cores. And if you p*ss it off, it will eat your swap and burn all
your cores at 100%. Bane of my sysadmin life...

Don't get me started on the readability of its log files!!

That's across a wide range of applications including commercial stuff
like Confluence.

Bah - give me mod_perl (or even mod_wsgi+python) any day...

I've got a lot done with HTML::Mason+mod_perl, and very efficiently (for
such a simple templating system), and I'm considering Mojolicious for
fun. I'm learning django right now too, for the cool forms+DB stuff.

Thankfully, our guys are making a switch to django away from tomcat and
it is so much nicer to manage.

Cheers,

Tim

Re [OT]: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 19:41:57 by aw

I'll have to watch my language here, as I might otherwise get ostracised on that other
list of mine.

Tim Watts wrote:
> On 14/07/11 14:38, André Warnier wrote:
>> Tim Watts wrote:
>> ...
>
>> "I think for this problem, I have to treat tomcat as a little, rather
>> inefficient, black box .."
>>
>
> They liked that quote then? ;->>>>
>
>
>
> I'm sure it's a lovely development environment (there must be some
> reason people use it) - all I know is it's a resource hungry bitch
> that's never happy unless it has GB's RAM and at least 2, preferably 4
> fast cores. And if you p*ss it off, it will eat your swap and burn all
> your cores at 100%. Bane of my sysadmin life...

We should start a club.

>
> Don't get me started on the readability of its log files!!

Or worse, the logging configuration.

>
> That's across a wide range of applications including commercial stuff
> like Confluence.
>
> Bah - give me mod_perl (or even mod_wsgi+python) anyday...

+1

>
> I've got a lot done with HTML::Mason+mod_perl and very efficiently (for
> such a simple templating system) and I've considering Mojolicious for
> fun. Learning django too right now too for the cool forms+DB stuff.
>

We have been re-developing stuff that is based on ****, using mod_perl and TT2 for now.
It works faster, uses umpteen MB less memory, and may soon deliver us from the management
of that ****-based stuff too.


> Thankfully, our guys are making a switch to django away from **** and
> it is so much nicer to manage.
>
Don't know it, but will have a look.

[OT, ADVOCACY]

I am partial to perl and CPAN, because there are just so many things I have been able to
do with them over the years at little expense to solve real-world problems.
And despite the fact that I also use a lot of OO modules in perl, I just cannot
sympathise with a language like *****, where it seems that you have to mobilise a couple of
dozen classes (and x MB of RAM) just to print a date or so.
Never mind the time spent trying to find their documentation.

As a matter of fact, when I am confronted with a new kind of problem, in an area where I
know a-priori nothing, my first stop is usually not Google nor Wikipedia but CPAN, just to
read the documentation of the modules related to that area. Whether you need to parse
text, to process some weird data format, to talk to Amazon, to make credit-card payments,
to dig out and generate system statistics, to understand how SOAP works, to drive an
MS-Office program through OLE (and know nothing of OLE to start with), create a TCP
server, convert images, read or create and send emails, or whatever, you always find an
answer there. Even if in the end it turns out that the answer is not something in perl,
there is so much knowledge stored in CPAN that it is a pity that it is only consulted by
perl-centric types.

[IDEA]
Maybe creating a website named WikiPerl, containing just the CPAN documentation with a
decent search engine (KinoSearch/Lucy ?), would help restore perl's popularity ?

Or do we just keep that for ourselves, as the best job-preservation scheme ever designed ?


Ooops. I was just about to send this to the wrong list...

Re: Re [OT]: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 20:09:30 by Niels Larsen

Yes, CPAN has very, very useful things. I consider its biggest problems to be
1) it is too difficult to find things when not knowing what one wants, and 2) a
huge undergrowth of modules that are either bad quality, unmaintained,
or duplicated by a later module. The number of lingering bugs is an
obstacle, yet at the same time super-useful things are "hiding" in plain
view.

Apropos, Perl Dancer was "hiding" from me because I didn't see it here,
http://search.cpan.org/modlist/World_Wide_Web, and I have made many more such
discoveries in the past. A simple global ranking by popularity (the
number of times downloaded) and/or by size and maturity (time located
on CPAN) would expose many "new" things to many, I think. If other
modules depend on them, then that may speak to quality somewhat, and
much better rating could be done. MongoDB would probably make managing
the collection easier. But I am grateful for what exists, of course.

While certainly watching my language, I'm moving from Apache/mod_perl
to Dancer/Nginx for speed and memory reasons.

Ok, back to lurk-mode,

Niels Larsen



Re: Re [OT]: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 20:11:53 by Clinton Gormley

Hi Niels

On Thu, 2011-07-14 at 20:09 +0200, Niels Larsen wrote:
> Yes, CPAN has very, very useful things. I consider its biggest problems
> to be 1) that it is too difficult to find things when not knowing what
> one wants, and 2) a huge undergrowth of modules that are either bad
> quality, unmaintained, or duplicated by a later module. The number of
> lingering bugs is an obstacle, yet at the same time super-useful things
> are "hiding" in plain view.

Check out http://metacpan.org - it's a GSOC 2011 project that aims to
improve CPAN search. Tagging and user ranking (plus integration of
those into the search results) are next on the feature list.

clint

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 22:07:24 by aw

Tim Watts wrote:
> Hi,
>
> Is it in theory possible to insert a perl output filter between
> mod_proxy and mod_cache?
>
> Or at least between mod_proxy and the client?
>
....

>
> mod_headers and mod_proxy don't seem to play well together and mod-cache
> doesn't either (probably due to lack of cache control headers in the
> tomcat response, though I haven't proved this is actually the case).
>
....

Back to the main issue.

See this as just a bit more generic information, as to what/how you could think of solving
your problem, apart from the other suggestions already submitted.

1) I am not sure about mod_perl I/O filters, because I never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify request/response
headers, you can also write your own perl handler, and by choosing the appropriate type of
PerlHandler, you can have it run at just about any point in the request/response cycle.

The real power of mod_perl (if you haven't yet discovered that aspect), is that it allows
you to insert your own code at just about any point of the Apache request processing
cycle, and to do just about anything you want with any aspect of the request/response.
That includes "interfering" with anything that other, non-perl, Apache modules do.

See the following page for a good overview of the Apache request processing cycle, and
what you can do with such PerlHandlers :
http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
You are probably more interested in the "HTTP Protocol" section. By clicking on each item
in that list, you get an explanation of /when/ that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to play with HTTP
headers is also quite simple, if you know what to put in the header(s).

2) about mod_headers and mod_proxy playing together :
The trouble is that (contrary to the mod_perl documentation above) the Apache modules'
own documentation does not usually make it clear during which exact phase of
the Apache request processing each module runs.

But I seem to remember something in mod_headers about an "early" attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to mod_proxy.
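
For illustration, a minimal mod_headers sketch, assuming Apache 2.2 and that /catalogue/
is one of the proxied URL prefixes (the path and the max-age value are placeholders, not
taken from the setup described). As far as I understand the documentation, without
"early" the Header directives are processed just before the response is sent, so after
the proxied response has come back; "early" itself is described there as a
testing/debugging aid:

```apache
# Assumed path and lifetime; adjust to the URLs known to be invariant.
<Location "/catalogue/">
    # "set" replaces any Cache-Control header already on the response.
    Header set Cache-Control "max-age=300"
</Location>
```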

3) In the documentation of mod_proxy, there should be a possibility to configure it inside
of a <Location> section, instead of "globally" (outside of any section).
That forces you to decide more finely which URLs should or should not be proxied/forwarded
to Tomcat, but it also (in my view) makes it more evident to combine the proxying
instruction with other modules, like perl filters or handlers.

In effect, from Apache's point of view, mod_proxy must be the equivalent of a
"content-generating handler" (like a PerlResponseHandler), because for Apache, passing a
request to mod_proxy for processing is not much different than passing it to any other
internal response-generating handler.
Apache in fact knows nothing of Tomcat. It passes a request to mod_proxy, and expects the
response (or an error status) back from mod_proxy. It has no idea that behind mod_proxy
is another server.
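
As a sketch of what that looks like (Apache 2.2 syntax; the /shop/ prefix and the
backend URL are made up for the example): inside a <Location> section, ProxyPass and
ProxyPassReverse take no local path argument, since the path comes from the section
itself.

```apache
<Location "/shop/">
    ProxyPass        http://tomcat.example:8180/webapp/
    ProxyPassReverse http://tomcat.example:8180/webapp/
    # Other per-URL directives can sit next to the proxy instruction, e.g.
    # a mod_perl output filter (My::CacheHeaders is a hypothetical module):
    # PerlOutputFilterHandler My::CacheHeaders
</Location>
```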


4) strictly according to the HTTP protocol, a "GET" request should be "idempotent", which
means (roughly) that running it twice or more should always give the same answer.
Which in theory means that even if the GET request goes to a database, the response should
be cacheable under most circumstances.
Unfortunately, the practice is such that the GET request is much overused, and it is not
always that way.
But if caching the response creates problems, you can always tell your application
developers that it is their fault because they are misusing the protocol..

(In really strict terms, a GET /could/ provide a different response; but it should not
modify the state of the server).

5) despite what I am saying in (4), a GET response can very validly be different from a
previous GET response with the same URL (for example, if in-between the data has been
modified by a POST). So if you are forcing headers on the responses, you should at least
be a bit careful not to do this indiscriminately.

That is also why I personally have a doubt about the effectiveness of another caching
proxy front-end, such as the couple that were mentioned earlier. If the Tomcat web applications
themselves do not provide headers to indicate whether their response can be cached or not,
how is the front-end going to determine that this response /is/ the same as a previous one ?
It seems to me that such a determination would require elements that such a proxy does not
have, no ?


Now if you are still there, one more question :
Are we talking here of a configuration where one front-end Apache front-ends for several
Tomcats possibly on different machines ?
or does each Tomcat have its own personal Apache front-end on the same machine ?
or something in-between ?


(*) considering the name of "filter" however, I would think that
- an "input filter" should always run /before/ any module which generates content (of
which mod_proxy is one)
- an "output filter" should always run /after/ any modules which generate content.
So, it is probably difficult to have a filter which runs /in-between/ other Apache modules.

Re: mod_perl output filter and mod_proxy, mod_cache

on 14.07.2011 23:10:59 by aw

And here is another link which might be interesting.
It is a message on the Tomcat list (where I re-posted your original request, hem), from
Rainer Jung, who is one of the Apache/Tomcat mod_jk connector developers :

"
Yes, go for TC 7:

http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Expires_Filter

Regards,

Rainer
"

Now that Tomcat page, apart from its own interest, also points to the Apache "mod_expires"
module (which I had never heard of before), which in your case may be exactly what you're
looking for.

It seems that it can add headers to a response proxied from Tomcat, without
overwriting such headers if they already exist.


Here is what I would do :

1) identify some "usual suspects" among the URLs proxied to Tomcat
They would have to match the following criteria :
- they happen on an overloaded Tomcat
- they happen often
- I am reasonably sure that the information delivered by that URL
is stable over a period of time
- I am reasonably sure that if it happened that the browser would,
once in a while, get stale information, it would not be dramatic

2) carefully configure the front-end Apache to, for these particular URLs,
add an Expires header specifying "now + N", where N is initially not too large.
This way, a browser would not get a result that is more than N outdated, but any duplicate
request within a period N would get the cached version.

3) look at the impact and loop or not, increasing or decreasing N
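
With mod_expires, step 2 could look roughly like this. The /catalogue/ path and the
5 minute value of N are assumptions for the example; as I read the documentation,
mod_expires leaves a response alone if it already carries an Expires header.

```apache
<Location "/catalogue/">
    ExpiresActive On
    # N = 5 minutes, counted from the time of the client's request.
    ExpiresDefault "access plus 5 minutes"
</Location>
```

Besides Expires, mod_expires also emits the matching max-age directive in a
Cache-Control header.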

YMMV.

Re: mod_perl output filter and mod_proxy, mod_cache

on 15.07.2011 07:53:28 by Tim Watts

Hi Andre,

Thanks for such a detailed reply:

On 14/07/11 21:07, André Warnier wrote:

>
> Back to the main issue.
>
> See this as just a bit more generic information, as to what/how you
> could think of solving your problem, apart from the other suggestions
> already submitted.
>
> 1) I am not sure about mod_perl I/O filters, because I never used them. (*)
> But in order to (conditionally/unconditionally) insert/delete/modify
> request/response headers, you can also write your own perl handler, and
> by choosing the appropriate type of PerlHandler, you can have it run at
> just about any point in the request/response cycle.
>
> The real power of mod_perl (if you haven't yet discovered that aspect),
> is that it allows you to insert your own code at just about any point of
> the Apache request processing cycle, and to do just about anything you
> want with any aspect of the request/response.
> That includes "interfering" with anything that other, non-perl, Apache
> modules do.

I've written auth handlers in mod_perl before - I did get the impression
then that the possibilities for doing other things were extensive.

> See the following page for a good overview of the Apache request
> processing cycle, and what you can do with such PerlHandlers :
> http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
>
> You are probably more interested in the "HTTP Protocol" section. By
> clicking on each item in that list, you get an explanation of /when/
> that type of handler runs.
> (It's also indirectly a very good introduction to how Apache itself works).
>
> Such handlers are usually easy to write and configure, and the code to
> play with HTTP headers is also quite simple, if you know what to put in
> the header(s).

ah - that is very useful - I shall read that.

> 2) about mod_headers and mod_proxy playing together :
> The trouble is that (contrary to the mod_perl documentation above) the
> Apache modules' own documentation does not usually make it clear during
> which exact phase of the Apache request processing each
> module runs.
>
> But I seem to remember something in mod_headers about an "early"
> attribute or parameter.
> Maybe that tells you more of when it runs (or can run), compared to
> mod_proxy.

Hmm - I did read the web page several times, must have missed that - I
was nearly at the point of reading the source.

> 3) In the documentation of mod_proxy, there should be a possibility to
> configure it inside of a <Location> section, instead of
> "globally" (outside of any section).
> That forces you to decide more finely which URLs should or should not be
> proxied/forwarded to Tomcat, but it also (in my view) makes it more
> evident to combine the proxying instruction with other modules, like
> perl filters or handlers.
>
> In effect, from Apache's point of view, mod_proxy must be the equivalent
> of a "content-generating handler" (like a PerlResponseHandler), because
> for Apache, passing a request to mod_proxy for processing is not much
> different than passing it to any other internal response-generating
> handler.
> Apache in fact knows nothing of Tomcat. It passes a request to
> mod_proxy, and expects the response (or an error status) back from
> mod_proxy. It has no idea that behind mod_proxy is another server.

It is an interesting possibility that is also worth playing with,

Most of our servers do this: redirect everything to the proxy *except* a
couple of URLs which are either locally handled or sent to a different proxy.

This is quite typical:

RewriteEngine on
# Handled locally (comments must be on their own line in Apache config)
RewriteRule "^/media" - [L]
RewriteRule "^/django" - [L]
# Otherwise proxy
RewriteRule "^/(.*)$" "http://tomcat.server:8180/webapp/$1" [P,L]
ProxyPassReverse / http://tomcat.server:8180/webapp
ProxyPassReverseCookiePath /webapp /


Previously, this had been done with ProxyPass directives, including
negative ones. This did not work well with some Rewrite rules that were
also needed in some cases. So I tend to handle the whole thing with an
ordered list of rewrite rules like the above, applying the proxy flag
where required. It makes the ordering more obvious.

I have not yet tried a system of building the website with sets of
Location directives, which might be interesting - though I do use
Location sections to enforce redirects to SSL and to require
authentication. Apache is like Perl: there is more than one way to do it.

>
> 4) strictly according to the HTTP protocol, a "GET" request should be
> "idempotent", which means (roughly) that running it twice or more should
> always give the same answer.
> Which in theory means that even if the GET request goes to a database,
> the response should be cacheable under most circumstances.
> Unfortunately, the practice is such that the GET request is much
> overused, and it is not always that way.
> But if caching the response creates problems, you can always tell your
> application developers that it is their fault because they are misusing
> the protocol..
>
> (In really strict terms, a GET /could/ provide a different response; but
> it should not modify the state of the server).

I do recall that.

> 5) despite what I am saying in (4), a GET response can very validly be
> different from a previous GET response with the same URL (for example,
> if in-between the data has been modified by a POST). So if you are
> forcing headers on the responses, you should at least be a bit careful
> not to do this indiscriminately.
>
> That is also why I personally have a doubt about the effectiveness of
> another caching proxy front-end, such as the couple that were mentioned earlier. If
> the Tomcat web applications themselves do not provide headers to
> indicate whether their response can be cached or not, how is the
> front-end going to determine that this response /is/ the same as a
> previous one ?
> It seems to me that such a determination would require elements that
> such a proxy does not have, no ?

I agree - the tomcat apps *should* be declaring what is the correct
caching scenario. But they don't. So this is very much a work around.
However, for any given case, the dev folk usually remember enough about
a project to say "the content of the database does not change, and GETs
will be invariant as a result" (or not). It's on that basis I'm happy to
proceed with a kludge, just to save my poor servers from melting(!).
Well, the servers are all VMs, so it is more to stop old projects stealing
resources that could be better used on new projects.
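
On the Apache side, the kludge for one such invariant site could be sketched as
below, assuming Apache 2.2 with mod_cache and mod_disk_cache loaded. The cache
directory and the 300 second lifetime are placeholders, not values from any real
server.

```apache
# Assumed cache location; the directory must be writable by Apache.
CacheRoot /var/cache/apache2/mod_disk_cache
CacheEnable disk /
# Cache responses even though the backend sends no Last-Modified header.
CacheIgnoreNoLastMod On
# Lifetime in seconds when the response carries no expiry information.
CacheDefaultExpire 300
```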

I feel I understand Cache-Control (vs Expires) a lot better since I
optimised my own website with mod_cache on top of HTML::Mason/mod_perl
(which do play nice) - and my Mason bits do send sensible Cache-Control
lines. So I plan to give a small lunchtime seminar on that topic with
some demos of using Google's pagespeed firebug plugin (very useful for
this stuff).

The stupid thing is, it is probably trivial at design time to wedge
extra HTTP headers in (maybe JSP has a framework level TTL/expires
control - I don't know) but one has to know one *should* be doing it...

>
> Now if you are still there, one more question :
> Are we talking here of a configuration where one front-end Apache
> front-ends for several Tomcats possibly on different machines ?
> or does each Tomcat have its own personal Apache front-end on the same
> machine ?
> or something in-between ?

Mix. Older projects sent 3 different VHOSTS to 3 different remote tomcat
servers, each of which was handling a dozen+ webapps for a dozen+
different apache servers.

This was a disaster as one bad webapp could take out the tomcat farm and
the bloody logs are so useless it was impossible to find out which one.

These days, we have 3 different tomcat instances on the front machine
(dev, staging, live/production) and one apache with 3 VHOSTs mapping to
each tomcat. We may also blend in some django on the same machine.
Apache may mix in static content itself for efficiency (CSS/JS).

At least then, the development tomcat can be killed and restarted
without breaking the live one (and no, "touching" the web.xml file to
trigger a single webapp reload is about as reliable as asking a robber to
drop your cash off at the bank!).

They used to use a lot of perl - but I think perl lost it a bit with
forms handling and Ajax (until recently perhaps) which is why everyone
went off playing with jsp and now django.

I must admit django does seem well designed and I object to python a lot
less than java. Disadvantage - django likes to write your SQL for you
leading to a lack of thinking there - eg, one I caught the other day:

5 JOINs with a SELECT DISTINCT over all. Bloke wondered why the MySQL
server took 40 seconds to compute the result!

>
> (*) considering the name of "filter" however, I would think that
> - an "input filter" should always run /before/ any module which
> generates content (of which mod_proxy is one)
> - an "output filter" should always run /after/ any modules which
> generate content.
> So, it is probably difficult to have a filter which runs /in-between/
> other Apache modules.

I'm still going to have a look at mod_perl filters - I have a feeling
they could be useful here and there.

Thanks :)

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/