MapToStorage and the use of path_info (was Re: return DECLINED...)

on 29.02.2008 17:55:17 by Frank Maas

Hi,

In an explanation to J. Peng you wrote some interesting bits, which I have put
in a condensed form below. Since you mentioned in another mail that the subject
has shifted a bit, I changed the subject and detached this mail from the
thread, because there is something that "worries" me...

> As for /var/www/a/b/c, a user can request a file that does not exist.
> ... until it finds a match. In Torsten's example, this would be:
>
> /var/www/a/b/
>
> Then, the RequestRec object now stores two parts: a filename
> "/var/www/a/b/" and a path_info "/c/d/e". Why would you want to do that?
> Say you have a news site like:
>
> http://example.com/archive/news/2008/02/29/index.html
>
> Instead, you could do the trick above and keep going up the hierarchy
> until you have a filename
> "/var/www/archive/news/" and a path_info "/2008/02/29/index.html". You can
> then use the path_info as a query string to some database to retrieve
> today's news.

I am using a mechanism where I use the path_info to carry information
about the content to be served. However, as far as I know the only way to
do this is to create a handler that is defined for the correct location.
In the described situation, something like,

PerlHandler MyNews->handler()

I do not see how MapToStorage handler will help here. There probably is no
/var/www/archive/news file (or directory), and even if there is, it is of
no use to Apache. Or am I completely and utterly mistaken here?

Kind regards,
Frank

Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 29.02.2008 18:27:39 by Raymond Wan

Hi Frank,

Frank Maas wrote:
> I am using a mechanism where I use the path_info to carry information
> about the content to be served. However, as far as I know the only way to
> do this is to create a handler that is defined for the correct location.
> In the described situation, something like,
>

I have to confess that as I learned Mason first, I have problems
separating it from modperl. So, how one would do it the modperl (only)
way is something that I probably can't put into words and if I did, I
probably would get it wrong. So, I'll have to let someone else answer
you...sorry!

The Mason way (and "my" news example) was taken from here:

http://www.masonbook.com/book/chapter-3.mhtml#TOC-ANCHOR-8

And this is in fact something that I am doing on my site to move up [or
down :-) ] the hierarchy, but not with news. As Mason is used for templating,
the Mason way could be the same as the modperl way. If you want to know the
details of the Mason way, you can read that chapter, but this book seems
outdated and specific to modperl 1.

Sorry I cannot be of more help!

Ray

Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 29.02.2008 20:22:55 by torsten.foertsch

On Fri 29 Feb 2008, Frank Maas wrote:
> I am using a mechanism where I use the path_info to carry information
> about the content to be served. However, as far as I know the only way to
> do this is to create a handler that is defined for the correct location.
> In the described situation, something like,
>
>
>   PerlHandler   MyNews->handler()
>
>
> I do not see how MapToStorage handler will help here. There probably is no
> /var/www/archive/news file (or directory), and even if there is, it is of
> no use to Apache. Or am I completely and utterly mistaken here?

Your confusion comes from the fact that you look at it through mod_perl
spectacles, where you don't necessarily have a corresponding disk file. But
Apache is made chiefly to ship files.

So in the map-to-storage (m2s) phase Apache splits the filename it gets from
the translate-name (trans) phase into the name of a filesystem entry and the
trailing "path_info".

If you have a CGI script, say /bin/x.cgi, that is located in /www/cgi-bin/x.cgi
and you call it as /bin/x.cgi/path/info, then after trans the filename points
to /www/cgi-bin/x.cgi/path/info. m2s then finds that /www/cgi-bin/x.cgi is a
regular file, so it sets filename to /www/cgi-bin/x.cgi and path_info
to /path/info. So in a CGI context you can be certain that PATH_INFO is the
remainder of the URI after the current script is stripped off.

BTW, the default response handler (the one that ships files) returns 404 if
path_info is not empty.

Now in mod_perl you normally don't have a disk file. You have a compiled
handler. So m2s will put the start of path_info wherever in the URI it finds
the last existing directory.

Hence, when using a mod_perl handler, don't rely on path_info. By creating an
additional directory or deleting one you can spoil your logic! That is called
action at a distance.
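The split described above can be modelled in a few lines of Python. This is a simplified sketch, not Apache's actual code: `split_path_info` is a hypothetical helper that chops path components off the end of the translated filename until it reaches something that exists on disk, returning that prefix as the filename and the remainder as path_info. It assumes POSIX-style `/` separators and ignores details such as symlink handling, permissions and the `AcceptPathInfo` directive. The demo reproduces the CGI case and then the "action at a distance": creating one unrelated directory changes the path_info a handler would see.

```python
import os
import tempfile
from typing import Tuple


def split_path_info(filename: str) -> Tuple[str, str]:
    """Simplified model of the m2s split (hypothetical helper, not Apache's code).

    Returns (filename, path_info): filename is the longest prefix of the
    translated path that exists on disk, path_info is whatever is left over.
    Assumes POSIX-style "/" separators.
    """
    prefix = filename
    while prefix:
        if os.path.exists(prefix):
            return prefix, filename[len(prefix):]
        # Chop off the last path component and try again.
        prefix = prefix.rsplit("/", 1)[0]
    return "", filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # CGI case: <root>/cgi-bin/x.cgi is a regular file.
        os.makedirs(root + "/cgi-bin")
        open(root + "/cgi-bin/x.cgi", "w").close()
        print(split_path_info(root + "/cgi-bin/x.cgi/path/info"))
        # filename ends in /cgi-bin/x.cgi, path_info is /path/info

        # mod_perl case: only the directory <root>/archive exists on disk.
        os.makedirs(root + "/archive")
        translated = root + "/archive/news/2008/02/29/index.html"
        print(split_path_info(translated))  # path_info is /news/2008/02/29/index.html

        # Someone creates a directory and path_info silently changes.
        os.makedirs(root + "/archive/news/2008")
        print(split_path_info(translated))  # path_info is /02/29/index.html
```

The handler code never changed between the last two calls; only the filesystem did, which is exactly why path_info is fragile for handlers that have no file of their own.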

Use $r->uri and $r->location instead. Don't use in this case.
Then the first part of $r->uri equals $r->location, so you can compute a
mod_perl version of path_info as "substr($r->uri, length($r->location))".
This one doesn't depend on existing or non-existing filesystem entries.
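The `substr` formula translates directly. In this Python sketch plain strings stand in for `$r->uri` and `$r->location`; `location_path_info` is a hypothetical helper, and `news_date` shows how the result might then be turned into a lookup key for the news example earlier in the thread. The `/YYYY/MM/DD[/index.html]` URL layout is an assumption made for illustration.

```python
import re
from datetime import date
from typing import Optional


def location_path_info(uri: str, location: str) -> str:
    """Python analogue of substr($r->uri, length($r->location)).

    Only meaningful when the URI is guaranteed to start with the location
    string, as it is for a handler configured in a plain <Location> block.
    """
    if not uri.startswith(location):
        raise ValueError(f"{uri!r} is not under location {location!r}")
    return uri[len(location):]


# Hypothetical URL layout for the news example: /YYYY/MM/DD[/index.html]
NEWS_PATH = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})(?:/index\.html)?$")


def news_date(path_info: str) -> Optional[date]:
    """Turn the path_info into a date to query the (hypothetical) news database."""
    m = NEWS_PATH.match(path_info)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # well-formed but impossible, e.g. /2008/02/30
        return None


if __name__ == "__main__":
    info = location_path_info("/archive/news/2008/02/29/index.html", "/archive/news")
    print(info)             # /2008/02/29/index.html
    print(news_date(info))  # 2008-02-29
```

Because only the URI and the configured location are involved, creating or deleting directories under the document root cannot change the result.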

Torsten

Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 01.03.2008 06:21:00 by peng.kyo

I'm still confused about why we need path_info to pass additional info to
CGI/modperl scripts.
Generally, under CGI we use x.cgi?key=value to pass arguments, and under a
modperl handler we use /myHandler/?key=value, or the POST method.
In what cases do we use path_info?

//joy


Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 01.03.2008 06:37:11 by Raymond Wan

Joy,

J. Peng wrote:
> I'm still confused about why we need path_info to pass additional info to
> CGI/modperl scripts.
> Generally, under CGI we use x.cgi?key=value to pass arguments, and under a
> modperl handler we use /myHandler/?key=value, or the POST method.
> In what cases do we use path_info?
>

How about this for an explanation. In the first scenario, with the ?,
you are passing arguments explicitly as key/value pairs in the query
string of the URL sent by the web browser. In the second scenario,
nothing is being passed as key/value pairs. Instead, the server
searches up/down the directory hierarchy until it finds a match, and
everything after the match becomes the argument.

If you wanted to carry the arguments in the path yourself, without
path_info, you would need to know where to split it. How? Perhaps by
doing a file test on each candidate split yourself? With path_info, you
don't need to worry about it.
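The difference between the two styles shows up clearly when one URL of each kind is parsed with Python's standard `urllib.parse`. Both URLs and the `/archive/news` location below are made up for illustration: in the query-string style the arguments arrive as named pairs, while in the path_info style they are just the path segments left after the handler's location.

```python
from urllib.parse import parse_qs, urlsplit


def args_from_query(url: str) -> dict:
    """Query-string style: explicit key/value pairs after the '?'."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def args_from_path(url: str, location: str) -> list:
    """path_info style: positional segments after the handler's location."""
    path = urlsplit(url).path
    if not path.startswith(location):
        raise ValueError(f"{path!r} is not under {location!r}")
    return [seg for seg in path[len(location):].split("/") if seg]


if __name__ == "__main__":
    print(args_from_query("http://example.com/news.cgi?year=2008&month=02&day=29"))
    # {'year': '2008', 'month': '02', 'day': '29'}
    print(args_from_path("http://example.com/archive/news/2008/02/29", "/archive/news"))
    # ['2008', '02', '29']
```

With path_info the meaning of each segment comes from its position, so the handler and the URL layout have to agree on the order.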

Not a very technical answer, but maybe an easy way of thinking of
things. The second scenario also makes it possible for Google, etc. to
index your web pages since it is a "real" URL. In the first case, it is
possible, but not as straightforward.

Ray

Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 01.03.2008 06:48:24 by peng.kyo

On Sat, Mar 1, 2008 at 1:37 PM, Raymond Wan wrote:
> Not a very technical answer, but maybe an easy way of thinking of
> things. The second scenario also makes it possible for Google, etc. to
> index your web pages since it is a "real" URL. In the first case, it is
> possible, but not as straightforward.
>

Oh, it's good that I learned another way to request a URI: the
path_info way.
Yes, a path_info URI is good for being indexed by Google, since it
doesn't look like a dynamic page.
We generally use mod_rewrite to make a dynamic page look like a
static page, like:

RewriteRule ^/myspace/my(\d+)\.html /myspace/index.cgi?id=$1
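What such a rule does can be mirrored with Python's `re` module. This sketches only the matching and substitution logic, not mod_rewrite itself; it escapes the dot in `.html` so that it matches only a literal period (an unescaped `.` matches any character). As in mod_rewrite, a matching request has its whole URL-path replaced by the substitution, and a non-matching one passes through unchanged.

```python
import re

# Mirrors: RewriteRule ^/myspace/my(\d+)\.html /myspace/index.cgi?id=$1
RULE = re.compile(r"^/myspace/my(\d+)\.html")


def rewrite(url_path: str) -> str:
    """Return the rewritten target, or the path unchanged if the rule does not match."""
    m = RULE.match(url_path)
    if m is None:
        return url_path
    return f"/myspace/index.cgi?id={m.group(1)}"


if __name__ == "__main__":
    print(rewrite("/myspace/my42.html"))   # /myspace/index.cgi?id=42
    print(rewrite("/myspace/about.html"))  # /myspace/about.html (no match)
```

The browser and search engines only ever see `/myspace/my42.html`; the query string exists only inside the server after the rewrite.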

thanks.

//joy

Re: MapToStorage and the use of path_info (was Re: return DECLINED...)

on 03.03.2008 05:17:39 by Charlie Garrison

Good afternoon,

On 1/3/08 at 2:37 PM +0900, Raymond Wan wrote:

>Not a very technical answer, but maybe an easy way of thinking
>of things. The second scenario also makes it possible for
>Google, etc. to index your web pages since it is a "real" URL.
>In the first case, it is possible, but not as straightforward.

And for proxies/caches to cache the page. I don't know if it
still applies, but proxies won't cache a page that has query
parameters. But path_info just looks like part of the URL and will
cache just fine. That's the main reason I choose to use
path_info rather than query params.

On the other hand, I sometimes use query params as a poor man's
cache-control to prevent caching.
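A minimal Python sketch of that trick, assuming the application simply ignores an extra parameter: append a throwaway query parameter (the name `nocache` is arbitrary) so that each request carries a URL the cache has not seen before. Whether a particular proxy refuses to cache query URLs at all, or merely treats each distinct query as a separate object, depends on its configuration.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_bust(url: str, token: Optional[str] = None) -> str:
    """Append a throwaway query parameter so the URL differs on every request.

    The parameter name "nocache" is arbitrary; the application is assumed to ignore it.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", token if token is not None else str(int(time.time()))))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(cache_bust("http://example.com/archive/news/2008/02/29/", "123"))
    # http://example.com/archive/news/2008/02/29/?nocache=123
```

Using the current time as the default token keeps URLs unique across requests made more than a second apart; a random value would work equally well.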

Charlie

--
Charlie Garrison
PO Box 141, Windsor, NSW 2756, Australia
