Getting a / when regex should produce nothing
on 25.04.2010 04:38:53 by Chris Bennett
When I run this the first time, with no values from the form, I get
$article_file being a / when it should be empty. I just can't see the
error. I have tried variations with \w and with the dash at the beginning
and at the end, but no go.
Debug shows blank at A, / at B
#!/usr/bin/perl
$VERSION = 1.0.0;
use warnings;
no warnings 'uninitialized';
use strict;
#use Apache::Constants qw(:common);
use Apache::Request();
#use Apache::Cookie();
use MyPerl::Articulator qw(get_template print_template print_text
submit_changes backup_server see_html template_form load_template);
our $debug = 1;
delete $ENV{PATH};
my $r = Apache->request;
my $q = Apache::Request->new($r, POST_MAX => 1000000, DISABLE_UPLOADS => 1);
my $site_url = "www.example.com";
my $site_directory = "/var/www/htdocs/users/example.com";
my $site_name = "Example!";
my $secure = 1;
my $article_directory = "articles";
undef my $error;
undef my $article_title;
undef my $article_backup_file;
undef my $article_file;
$article_file = $q->param("articlefilename");
if ($debug) { $error .= qq{
$article_file
};}
$article_file =~ m/^([a-zA-Z0-9_-]*\.html)$/;
$article_file = $1;
if ($debug) { $error .= qq{$article_file
};}
$article_backup_file = $article_file;
$article_backup_file =~ s/\.html$/_backup.html/;
undef my $body;
Thanks
Chris Bennett
--
A human being should be able to change a diaper, plan an invasion,
butcher a hog, conn a ship, design a building, write a sonnet, balance
accounts, build a wall, set a bone, comfort the dying, take orders,
give orders, cooperate, act alone, solve equations, analyze a new
problem, pitch manure, program a computer, cook a tasty meal, fight
efficiently, die gallantly. Specialization is for insects.
-- Robert Heinlein
Re: Getting a / when regex should produce nothing
on 25.04.2010 04:47:08 by Chris Bennett
On 04/24/10 21:38, Chris Bennett wrote:
> if ($debug) { $error .= qq{
> $article_file
> };}

should be: if ($debug) { $error .= qq{A $article_file
};}

> $article_file =~ m/^([a-zA-Z0-9_-]*\.html)$/;
> $article_file = $1;
> if ($debug) { $error .= qq{$article_file
> };}

should be: if ($debug) { $error .= qq{B $article_file
};}
Re: Getting a / when regex should produce nothing
on 25.04.2010 06:40:40 by Craig MacKenna
See what happens if you replace the 2 lines between the "should be"s
with
if ($article_file =~ /^([a-zA-Z0-9_-]*\.html)$/) {$article_file = $1}
I can't recall seeing 'undef my' before.
my $error = undef; is more typical.
How is this routine executed? Under ModPerl::Registry?
cmac
Re: Getting a / when regex should produce nothing
on 25.04.2010 12:44:56 by aw
Chris Bennett wrote:
....
Personal observations :
>
> use warnings;
That's good. But this :
> no warnings 'uninitialized';
is very dubious.
> $article_file = $q->param("articlefilename");
will come back undef if :
- there is no "articlefilename" input box on the submitted form
- there is one, but it is not sent by the browser (as some browsers may
do if the form field has not been filled-in)
- someone just calls your script by a URL in the location bar, without
parameters
> if ($debug) { $error .= qq{
$article_file
};}
This then is dubious too, because you are essentially concatenating a
string (which may also be undef), with an undef value. (And before that,
you are passing this undef value to the qq function).
Who knows what this does ?
Unfortunately, you will never know, because you have disabled warnings
for that.
Why not do something more solid, like :
remove the "no warnings" pragma.
$article_file = $q->param("articlefilename") || '';
(making it equal to an empty string if it is undefined), or more explicitly
$article_file = $q->param("articlefilename");
$article_file = '' unless defined $article_file;
And the same for any other form parameter you receive.
If you are programming for the web, where you essentially do not know
which miscreant browser or user is at the other end, you have to program
defensively. Suppressing warnings is the wrong way to go.
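A related subtlety with the || '' form: it replaces every false value, including a legitimate parameter value of "0", not just undef. A minimal sketch, assuming Perl 5.10 or later for the defined-or operator //; the helper names are hypothetical and plain scalars stand in for $q->param(...):

```perl
use strict;
use warnings;

# '||' replaces any false value: undef, '' and the string "0".
sub default_or      { my ($v) = @_; return $v || '' }

# '//' (defined-or, Perl 5.10+) replaces only undef.
sub default_defined { my ($v) = @_; return $v // '' }

for my $input (undef, '', '0', 'intro.html') {
    my $shown = defined $input ? "'$input'" : 'undef';
    printf "%-14s ||: '%s'   //: '%s'\n",
        $shown, default_or($input), default_defined($input);
}
```

For a file name the difference rarely matters, but for numeric or flag parameters a value of "0" silently turning into '' can be surprising.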
Re: Getting a / when regex should produce nothing
on 25.04.2010 14:17:07 by Chris Bennett
OK, as per suggestions and adding in another needed part for MultiViews:
my $error = '';
my $article_title ='';
undef my $article_backup_file;
undef my $article_file;
$article_file = $q->param("articlefilename") || '';
if ($debug) { $error .= qq{A $article_file
};}
if ($article_file =~ /^([a-zA-Z0-9_-]+\.html.?\w?\w?)$/) {
$article_file = $1;
} else {
$article_file = '';
}
if ($debug) { $error .= qq{B $article_file
};}
$article_backup_file = $article_file;
$article_backup_file =~ s/\.html$/_backup.html/;
Is there a better regex for .?\w?\w?
I want a dot followed by exactly two letters, not a dot and one letter,
or just two letters, etc.
This regex is to prevent read or write access to files up the directory
tree or non html files. There is also a username password for any write
access.
undef my $variable is not a common idiom but is seen in Programming Perl
and other places. Is there any reason I should use my $variable = undef?
More typing. :)
Why was I getting a / back? Is that an artifact from the perl internals?
Thanks
Re: Getting a / when regex should produce nothing
on 25.04.2010 14:26:55 by Joe Schaefer
----- Original Message ----
> From: Chris Bennett
> To: chris@bennettconstruction.biz; modperl@perl.apache.org
> Sent: Sun, April 25, 2010 8:17:07 AM
> Subject: Re: Getting a / when regex should produce nothing
> Is there a better regex for .?\w?\w?
> I want a . letter letter not . letter or just two letters etc.
(?:\.\w{2})?
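To see the difference, here is a small sketch that runs a few made-up file names against Chris's original pattern and against one using the suggested (?:\.\w{2})? suffix:

```perl
use strict;
use warnings;

# Original suffix: the unescaped '.' matches any character,
# and every piece of the suffix is optional.
my $loose  = qr/^([a-zA-Z0-9_-]+\.html.?\w?\w?)$/;

# Suggested suffix: either nothing, or a literal dot plus exactly two \w chars.
my $strict = qr/^([a-zA-Z0-9_-]+\.html(?:\.\w{2})?)$/;

for my $name (qw(intro.html intro.html.en intro.html.e intro.htmlen intro.htmlxyz intro.html.12)) {
    printf "%-16s loose: %-3s strict: %s\n", $name,
        ($name =~ $loose  ? 'yes' : 'no'),
        ($name =~ $strict ? 'yes' : 'no');
}
```

Note that \w also matches digits and the underscore, so "intro.html.12" is accepted; if only letters should pass, [a-zA-Z]{2} is the stricter choice.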
> This regex is to prevent read or write access to files up the directory
> tree or non html files. There is also a username password for any write
> access.
> undef my $variable is not a common idiom but is seen in Programming Perl
> and other places. Is there any reason I should use my $variable = undef?
> More typing. :)
What's wrong with just typing "my $variable;"?
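All three spellings leave the variable undefined; a quick sketch to confirm (the variable names are placeholders):

```perl
use strict;
use warnings;

my $plain;              # declared; starts out undef
undef my $undef_my;     # declare, then explicitly undef it (redundant)
my $explicit = undef;   # explicit assignment, same result

for my $v ($plain, $undef_my, $explicit) {
    print defined $v ? "defined\n" : "undef\n";
}
```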
> Why was I getting a / back? Is that an artifact from the perl internals?
In your previous code you didn't check that the pattern
had matched before using $1. In the case when your pattern
doesn't match, $1 remains unchanged from whatever last
set it. There's probably some code earlier on, or internal
to mod_perl, which does a (successful) pattern match that
sets $1 to "/".
The lesson here should be to always check the return value
of your regexps for success, especially when you've used
capture variables in your code like $1-$9.
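A minimal sketch of that behaviour outside mod_perl. The earlier successful match is made up here, standing in for whatever set $1 to "/" in the real script:

```perl
use strict;
use warnings;

# Stand-in for some earlier, successful match that left "/" in $1.
my $earlier = '/';
$earlier =~ m{^(/)};

my $article_file = '';    # no value submitted

# Unchecked: the match fails, but $1 still holds the stale "/".
$article_file =~ m/^([a-zA-Z0-9_-]*\.html)$/;
my $unchecked = $1;

# Checked: use $1 only if this match actually succeeded.
my $checked = $article_file =~ m/^([a-zA-Z0-9_-]*\.html)$/ ? $1 : '';

print "unchecked: '$unchecked'\n";   # unchecked: '/'
print "checked:   '$checked'\n";     # checked:   ''
```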
Re: Getting a / when regex should produce nothing
on 25.04.2010 15:57:46 by aw
Chris Bennett wrote:
....
> $article_file = $q->param("articlefilename") || '';
ok, so suppose it is "12345.html.en"
....
> if ($article_file =~ /^([a-zA-Z0-9_-]+\.html.?\w?\w?)$/) {
> $article_file = $1;
> } else {
> $article_file = '';
> }
ok, matches, so it's still "12345.html.en"
> $article_backup_file = $article_file;
still "12345.html.en"
> $article_backup_file =~ s/\.html$/_backup.html/;
>
still "12345.html.en"
(because \.html$ did not match)
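The effect can be reproduced in isolation. The lookahead substitution at the end is just one possible alternative, sketched under the assumption that the suffix is always a dot plus two word characters:

```perl
use strict;
use warnings;

my $article_file = '12345.html.en';

# Anchored at end of string: cannot match when a suffix follows ".html".
(my $anchored = $article_file) =~ s/\.html$/_backup.html/;

# Alternative: replace ".html" only when it is followed by an optional
# ".xx" suffix and then the end; the lookahead leaves the suffix in place.
(my $with_suffix = $article_file) =~ s/\.html(?=(?:\.\w{2})?$)/_backup.html/;

print "$anchored\n";      # 12345.html.en
print "$with_suffix\n";   # 12345_backup.html.en
```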
Re: Getting a / when regex should produce nothing
on 25.04.2010 19:25:44 by Chris Bennett
I have since changed to:
$article_file = $q->param("articlefilename") || '';
if ($debug) { $error .= qq{
A $article_file
};}
if ($article_file =~ /^([a-zA-Z0-9_-]+\.html(?:\.\w{2})?)$/) {
$article_file = $1;
} else {
$article_file = '';
}
if ($debug) { $error .= qq{B $article_file
};}
$article_backup_file = $article_file;
if ($article_backup_file =~ /\.html(?:\.\w{2})?$/) {
$article_backup_file =~ s/\.html/_backup.html/;
} else {
$error .= "Please choose an existing File Name ( like y_9-e.html or
yyy.html.xx ) or use Fill in Template to create a new file.
";
}
I think this does the trick.
Please feel free to break this! :)
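One way to poke at the validation above is to wrap it in a small, hypothetical helper and feed it hostile or odd input:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the validation from the message above.
sub clean_article_name {
    my ($name) = @_;
    $name = '' unless defined $name;
    return $name =~ /^([a-zA-Z0-9_-]+\.html(?:\.\w{2})?)$/ ? $1 : '';
}

for my $try ('../etc/passwd', 'a/b.html', '.html', 'x.html.en', "x.html\n", undef) {
    my $label = defined $try ? $try : 'undef';
    $label =~ s/\n/\\n/;    # show a trailing newline visibly
    printf "%-16s -> '%s'\n", $label, clean_article_name($try);
}
```

One edge case: $ also matches just before a trailing newline, so "x.html\n" is accepted; because the code keeps $1, the newline is dropped from the result. Using \z instead of $ would reject such input outright.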
Re: Getting a / when regex should produce nothing
on 25.04.2010 20:17:54 by Michael Ludwig
André Warnier wrote on 25.04.2010 at 12:44:56 (+0200):
> >use warnings;
> That's good. But this :
>
> >no warnings 'uninitialized';
>
> is very dubious.
I used to think so, too, but I've recently changed my mind.
> >$article_file = $q->param("articlefilename");
>
> will come back undef if :
> - there is no "articlefilename" input box on the submitted form
> - there is one, but it is not sent by the browser (as some browsers
> may do if the form field has not been filled-in)
> - someone just calls your script by a URL in the location bar, without
> parameters
True, the value will be undef, but so what? Perl treats undef as the
empty string or zero depending on the context, regardless of whether
you've resolved to have yourself harassed with warnings because of
uninitialized values or not; if, however, you *have* done so, then
you'll see those warnings on STDERR and feel that you should fix your
code by doing something like:
$str = $q->param("articlefilename") || '';
$num = $bla->calc_blub || 0;
So you're performing manually what Perl does automatically just to get
rid of the warning you've decided to turn on because you thought it was
good, or robust, or solid. If you think about it, you have to admit that
this is not exactly clever.
> >if ($debug) { $error .= qq{
$article_file
};}
>
> This then is dubious too, because you are essentially concatenating a
> string (which may also be undef), with an undef value.
So what? The undef is automatically converted to an empty string. That's
what you want anyway. Let Perl do it for you.
> (And before that, you are passing this undef value to the qq
> function). Who knows what this does ?
>
> Unfortunately, you will never know, because you have disabled warnings
> for that.
There's probably no need to know in this case. If your fix is to convert
undef to an empty string, why not have Perl do it for you?
> Why not do something more solid, like :
>
> remove the "no warnings" pragma.
>
> $article_file = $q->param("articlefilename") || '';
> (making it equal to an empty string if it is undefined), or more
> explicitly
> $article_file = $q->param("articlefilename");
> $article_file = '' unless defined $article_file;
>
> And the same for any other form parameter you receive.
I've been doing this for ten years now, but I've stopped, because it's
tedious and, I think, pointless. Perl does it for you.
> If you are programming for the web, where you essentially do not know
> which miscreant browser or user is at the other end, you have to
> program defensively. Suppressing warnings is the wrong way to go.
I wouldn't say so. Either you agree with Perl's automatic conversion of
undef to '' or 0 depending on the context (which apparently you do when
you write "|| ''" or "|| 0"), or you don't agree because you do not want
to tolerate undef at all (because you're counting money, for example).
In the former case, just do as the OP did (no warnings 'uninitialized');
in the latter case consider making your code really robust and the
warning fatal.
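Both choices can be scoped to a single block. A sketch contrasting a silent conversion with a fatal one; the helper names are hypothetical:

```perl
use strict;
use warnings;

# Silent: undef is treated as '' without a warning (the OP's choice).
sub label_quiet {
    no warnings 'uninitialized';
    my ($name) = @_;
    return "file: " . $name;
}

# Strict: the same warning is promoted to a fatal error, in this block only.
sub label_fatal {
    use warnings FATAL => 'uninitialized';
    my ($name) = @_;
    return "file: " . $name;
}

my $quiet = label_quiet(undef);            # "file: "
my $fatal = eval { label_fatal(undef) };   # dies; eval returns undef
print "quiet: '$quiet'\n";
print defined $fatal ? "fatal: '$fatal'\n" : "fatal: died: $@";
```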
For the record, I've changed my mind about this uninitialized business
after reading the perldoc for common::sense by Marc Lehmann:
http://search.cpan.org/~mlehmann/common-sense-3.2/
--
Michael Ludwig
Re: Getting a / when regex should produce nothing
on 26.04.2010 10:22:18 by aw
Michael Ludwig wrote:
> André Warnier wrote on 25.04.2010 at 12:44:56 (+0200):
>
....
>>> no warnings 'uninitialized';
>> is very dubious.
>
> I used to think so, too, but I've recently changed my mind.
>
....
Michael, I have no doubt that your intrinsic perl knowledge surpasses
mine, but I disagree, not with the details of what you mention, but on
the general spirit and in the context. And I find your quoting of the
common::sense module a bit biased, again not in the details but in the
spirit in which you seem to "recommend" it, again in this context. I
like the module, but in the sense that it seems indeed a useful shortcut
for a number of *explicit* assertions which *some* perl coders use.
Which is what its description honestly says :
"This module implements some sane defaults for Perl programs, as defined
by two typical (or not so typical - use your common sense) specimens of
Perl coders. In fact, after working out details on which warnings and
strict modes to enable and make fatal, we found that we (and our code
written so far, and others) fully agree on every option, even though we
never used warnings before, so it seems this module indeed reflects a
"common" sense among some long-time Perl coders."
However, many years of designing and programming stable and reliable
applications in perl have taught me to prefer explicit code,
understandable and maintainable also by other people; this as compared
for example to relying on assumptions about the compiler's defaults.
And for this OP, who on the face of it is not a perl guru, all the more so.
In my view, code such as
$article_file = $q->param("articlefilename");
(and then proceeding to use $article_file without further ado)
in a web application, is ok if you know exactly what you are doing, and
if you know somehow that the client cannot "not send" this parameter in
the request, and that it cannot send it empty, and that even if it does,
it does not matter all that much anyway; and that the person who
is going to re-read your code in 6 months knows as much about perl's
internals as you do.
$article_file = $q->param("articlefilename") || '';
does not add much technically, since as you rightly mention perl already
does that, in a way (although not at the same time or place). But it
makes *explicit* what you are doing, which makes the difference to
someone maybe not as fluent in perl and who has
to look at that code in a year's time.
In a real web application with which people interact through a browser
and a html form, I would have made it even much more defensive and
explicit, like :
$article_file = $q->param("articlefilename");
unless (defined($article_file) && ($article_file ne '')) {
return an error of type "essential info not submitted"
}
unless ($article_file is-what-we-expect) {
return an error of type "invalid value submitted"
}
because in this case, that filename seems to be the real essential
element of the application, so I would not want an empty or undefined
value to even create a doubt as to where it comes from or what happens
with it. And I know that this is not elegant code; but it is code that
will survive another version of perl, another version of
Apache::Request, and another version of the programmer maintaining it.
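The two checks above could be written as a runnable sketch like the following. The function name and the (value, error) return convention are assumptions, and "what we expect" is taken to be the file name pattern used earlier in this thread:

```perl
use strict;
use warnings;

# Hypothetical validator: returns ($value, undef) on success,
# or (undef, $error_message) on failure.
sub validate_article_file {
    my ($article_file) = @_;

    unless (defined($article_file) && $article_file ne '') {
        return (undef, 'essential info not submitted');
    }

    # Assumed meaning of "what we expect": the pattern from this thread,
    # anchored with \z so that a trailing newline is rejected too.
    my ($clean) = $article_file =~ /^([a-zA-Z0-9_-]+\.html(?:\.\w{2})?)\z/;
    unless (defined $clean) {
        return (undef, 'invalid value submitted');
    }

    return ($clean, undef);
}

# In a real handler the argument would be $q->param("articlefilename").
for my $input (undef, '', '../secret', 'intro.html.en') {
    my ($value, $error) = validate_article_file($input);
    print defined $value ? "ok: $value\n" : "error: $error\n";
}
```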
I will qualify all the above by adding that this regards application
code, written and/or read and/or modified by programmers whose skills vary.
It is on the other hand perfectly ok in my view for a perl guru to write
perl modules using whatever clever techniques and idioms suit him, as
long as the module does what it claims to do. And as long as
(preferably) any section of such code comes with ample internal
documentation explaining what it does and what it relies on.
Re: Getting a / when regex should produce nothing
on 26.04.2010 12:32:52 by Chris Bennett
I started learning mod_perl versus regular perl for web applications for
two reasons.
Mod_perl is much faster, but that was only an interest, not quite strong
enough to push the effort, despite seeing many applications that are
horribly slow without it.
When someone said that using mod_perl was an easy way to deal with
having apache chrooted without having to drag all kinds of files inside,
that made my decision.
I have not regretted it. I have learned many details that I could have
overlooked with regular perl. Mod_perl is more unforgiving of not
knowing exactly what my variables are doing and what values they hold.
This particular issue has taught me two things.
I am never going to use no warnings 'uninitialized' again. It is too
dangerous to be overlooking possible problems.
It has also taught me that perl itself may leave values in variables
such as $1, even after a server stop and start and first running of a
program. Sounds like an early lesson out of C. Never assume anything is
in fact defined without defining it yourself.
Nope, not a perl guru yet. But if you keep on thoroughly pointing out my
every error, I guess that will happen sooner! :)
Thanks
Chris Bennett
Re: Getting a / when regex should produce nothing
on 26.04.2010 13:23:30 by aw
Chris Bennett wrote:
....
>
> I have not regretted it. I have learned many details that I could have
> overlooked with regular perl. Mod_perl is more unforgiving of not
> knowing exactly what my variables are doing and what values they hold
>
Perl and mod_perl are very deterministic, and there is no mystery in
what they do with variables. The trick is to understand exactly how
mod_perl works, and how this plays along with the way Apache (in its
different MPM variations) works.
> This particular issue has taught me two things.
> I am never going to use no warnings 'uninitialized' again. It is too
> dangerous to be overlooking possible problems.
I agree.
Maybe even do as Michael said, and make all warnings fatal, if we are
talking about user/web oriented applications.
> It has also taught me that perl itself may leave values in variables
> such as $1, even after a server stop and start and first running of a
> program.
That however, is definitely not the case.
If you stop and start the server, you have a totally new environment,
and there is nothing left from the previous one.
If you are using a "prefork" version of the Apache server, then that is
also true each time an Apache child ends and a new one is started : it
gets a "new perl" and a new set of variables.
(but the thing is, you mostly cannot predict when Apache will start a
new child, nor which child will handle which request).
For other Apache configurations, the situation may be a bit more
complicated.
The main aspect to understand with mod_perl (as opposed to running a
perl program without it) :
- when you run a perl script without mod_perl, the sequence is :
- a new perl interpreter is started, clean
- your script gets compiled, and gets a brand-new set of variables
- your script gets run, starting with this new set of variables
- your script "exits"
- perl exits, and returns all its memory to the OS
.. and the next time you run your script, the same steps happen.
under mod_perl :
- an Apache child starts, and it gets a new perl
- your script gets compiled, the first time around. That time, it
gets a brand-new set of variables.
- your script gets run, the first time, with these brand-new variables.
- when the script terminates, whatever is in these variables stays
there.
- the perl interpreter inside that same Apache child stays alive, and
it "remembers" the compiled code of your script.
Now is the difference : when a new request comes into Apache, and (as
may happen) it is sent to the *same* Apache child, it is processed by
the same perl interpreter. And that one is not "clean" : it remembers
your compiled script, and the state of its variables from the last
execution. And that is where it starts from.
The big gain is that perl does not have to compile the script again, it
can run the compiled code right away.
The big danger is that your variables start with the state in which the
previous run of the same script in the same Apache child left them.
That can sometimes be put to good use, but it is also deadly if you are
not very careful.
The above is only a very approximate explanation, and the reality is
somewhat subtler. But if you stick to that basic explanation, you will
avoid much trouble.
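A rough, hypothetical simulation of that persistence outside Apache: a named sub stands in for the compiled script and repeated calls stand in for requests served by the same child. This is only an illustration, not how ModPerl::Registry is actually implemented. It also shows that a my variable declared inside the sub starts fresh on each call, while a package (our) variable keeps its value:

```perl
use strict;
use warnings;

# Package (global) variable: keeps its value between calls, much as a
# global keeps its value between requests in a persistent interpreter.
our $hits;

# Stand-in for a compiled script that is run once per request.
sub fake_script {
    my $seen;    # lexical declared inside the sub: fresh (undef) on every call
    $hits++;
    $seen++;
    return "global hits=$hits, lexical seen=$seen";
}

# Three "requests" served by the same interpreter.
print fake_script(), "\n" for 1 .. 3;
# global hits=1, lexical seen=1
# global hits=2, lexical seen=1
# global hits=3, lexical seen=1
```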
The fact that your $1 variable retained a value from an earlier
comparison however has nothing to do with the above. That would be true
even if your script was not running under mod_perl.
> Sounds like an early lesson out of C. Never assume anything is
> in fact defined without defining it yourself.
That is a good principle in general (and not only in perl).
A final observation : at the beginning, I think what most perl/web
programmers find the most interesting aspect of mod_perl is that
scripts/modules run much faster (because they do not need to be
re-compiled each time).
But I find that the real benefit is more in terms of how closely it is
integrated into the Apache "insides", and the incredible power it gives
you to create "handlers" and "filters" to let you intervene at just
about every stage of the processing of a request, and use all the power
and flexibility of perl (and of the CPAN library) to do all kinds of
stuff you could not even dream of otherwise.
Reading the online mod_perl documentation is also a unique way to learn
how Apache itself works.
Re: Getting a / when regex should produce nothing
on 26.04.2010 18:24:20 by Craig MacKenna
Thank you, AW, for a well-written summary of the aspect of
mod_perl that causes the most difficult/nasty bugs for people.
But you left out one important caveat which could scare away
more potential users than it saves.
The retention of values from previous executions applies
only to global variables. Specifically:
> - when the script terminates, whatever is in the *global* variables stays there.
> it remembers your compiled script, and the state of its *global* variables from the last execution
> The big danger is that your *global* variables start with the state in which the previous run
Programming pundits have been discouraging the use of global
variables for years now, perhaps a little more strongly than
is good for the state of the art. However anyone feels
about that, it's useful to write out some guidelines:
* Use global variables for information that you specifically
want to save across executions. The best case is for items
that are defined during the first execution, then used in
later executions.
* If you want to use a global variable for some other reason,
be very careful and aware that it may start an execution
with a value from a previous execution.
* Put most variables inside of subs (including handlers).
These will be initialized for each execution just as in
other perl contexts. Most people have come to accept a bit
more parameter-passing to subs than would be necessary with
global variables, for this initialization plus avoiding
inadvertent bugs when one forgets a global variable name.
A technique that I've used to avoid problems (particularly
when converting old CGI scripts) is, given a list of global
variables at the start of the module:
my ($var1, $var2, $other_var);
my $inited_var = 123;
write directly thereafter:
sub init_vars {
    undef $var1; undef $var2; undef $other_var;
    $inited_var = 123;
}
and call init_vars just before each exit from each handler.
Even easier though to me less satisfying, call init_vars right
after entry to each handler.
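A sketch of that technique with the "call on entry" variant, assuming the variables live at file scope in a handler module; handler here is a plain sub standing in for a real mod_perl handler, and the variable names are the placeholders from the message:

```perl
use strict;
use warnings;

# File-scoped variables, as in the message above.
my ($var1, $var2, $other_var);
my $inited_var = 123;

sub init_vars {
    undef $var1; undef $var2; undef $other_var;
    $inited_var = 123;
}

# Hypothetical handler standing in for one request.
sub handler {
    my ($input) = @_;
    init_vars();           # reset state left over from any previous request
    $var1 .= $input;       # accumulates only within this request
    $inited_var += 1;
    return "$var1/$inited_var";
}

print handler('a'), "\n";   # a/124
print handler('b'), "\n";   # b/124, not ab/125
```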
Regards,
cmac
Re: Getting a / when regex should produce nothing
on 26.04.2010 20:58:41 by Michael Ludwig
André Warnier wrote on 26.04.2010 at 10:22:18 (+0200):
> $article_file = $q->param("articlefilename");
> $article_file = $q->param("articlefilename") || '';
> does not add much technically
True, but no benefit either, other than telling the reader that the
empty string is your default value, which would have also been Perl's
default value. It could also be something differing from the Perl
default:
$article_file = $q->param("articlefilename") || get_dflt_article_name;
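A side note on the `||` idiom, as a small sketch: `||` substitutes the default for every false value, which includes the string "0", while the defined-or operator `//` (available since Perl 5.10) substitutes it only for undef. The constant `DEFAULT_ARTICLE` and its value are placeholders standing in for `get_dflt_article_name` above.

```perl
use strict;
use warnings;
use constant DEFAULT_ARTICLE => 'index.html';   # placeholder default

# '||' falls back on any false value: undef, '' and '0'.
sub default_with_or { my ($v) = @_; return $v || DEFAULT_ARTICLE }

# '//' (Perl 5.10+) falls back only when the value is undef.
sub default_with_defined_or { my ($v) = @_; return $v // DEFAULT_ARTICLE }
```

Which one fits depends on whether an empty or "0" parameter should count as "no value".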
> But it makes *explicit* what you are doing, which makes the difference
> to someone maybe not as fluent in perl and who has to look at that
> code in a year's time.
Okay, that's right. On the other hand, PHP works just like Perl without
explicit 'uninitialized' warnings, and I guess this PHP trait is mostly
thought of as a feature. (Which it is.)
> It is on the other hand perfectly ok in my view for a perl guru to
> write perl modules using whatever clever techniques and idioms suit
> him, as long as the module does what it claims to do. And as long as
> (preferably) any section of such code comes with ample internal
> documentation explaining what it does and what it relies on.
I think that's the important part: document the intent.
--
Michael Ludwig
Re: Getting a / when regex should produce nothing
am 26.04.2010 21:44:32 von aw
craig@animalhead.com wrote:
....
> But you left out one important caveat which could scare away
> more potential users than it saves.
>
> The retention of values from previous executions applies
> only to global variables.
Ah, yes.
But that would have triggered another discussion (which it might now
still do of course), about what exactly /is/ a global variable, in the
context of a mod_perl handler or perl script run under ModPerl::Registry.
I must admit that I am not totally clear on that subject either. I
understand the basic idea of scoping, but as to the fine distinctions
between "our" and "my" variables defined/referenced within/without
various functions defined in the same package, and what mod_perl makes
of this package when it compiles it, I tend to get a bit confused. And
I would not be surprised if the perl documentation to that effect
confused a relative beginner even more.
So again, to be defensive I find that the safest (if not most
efficient/elegant) way is to just treat every variable as a potential
problem, and make sure they are (re-)initialised unless I specifically
don't want them to be.
This is no criticism of the writers of the perl and mod_perl documentation.
I am sure that this particular topic is quite hard to get across
clearly and succinctly to perl plodders such as me.
And I find the perl documentation, in general, extremely accessible and
a treasure-trove of information (and not just about perl).
It's just that on that particular topic I seem to be a bit thick, and
considering that, I'd rather be safe than sorry.
Re: Getting a / when regex should produce nothing
am 27.04.2010 10:18:17 von Michael Ludwig
Hi André,
On 26.04.2010 at 21:44, André Warnier wrote:
> craig@animalhead.com wrote:
>> The retention of values from previous executions applies
>> only to global variables.
>
> Ah, yes.
> But that would have triggered another discussion (which it might now
> still do of course), about what exactly /is/ a global variable, in the
> context of a mod_perl handler or perl script run under ModPerl::Registry.
Let's first clarify it for Perl in general, and then for mod_perl.
A global variable in Perl is any variable not declared with "my". That includes variables declared with "our" or "use vars" (I'll get to these), and also variables created by full qualification, as in "$Bla::Blub = 1".
A lexical variable in Perl is any variable declared with "my", regardless of the scope, which may be file-level. Unlike globals, lexical variables aren't directly accessible from outside the package.
A global variable declared (or introduced, or admitted) with "use vars" is in scope for the entire package where it is declared. A global variable declared with "our" is in scope only for the lexical scope where it is declared (see "perldoc -f our").
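To make the scope difference concrete, a small sketch; the package `Foo` and the variable names are made up for illustration. It uses a string `eval` only as a probe: a string eval is compiled in the lexical scope where it runs, so under `strict` it fails whenever a short name is not visible there.

```perl
use strict;
use warnings;

package Foo;

use vars qw($via_vars);   # exempts $Foo::via_vars from strict 'vars', package-wide
$via_vars = 1;

{
    our $via_our = 2;     # the short name $via_our is visible only inside this block
}

# Returns 1 if the code compiles and runs here, 0 if strict rejects it.
sub can_see {
    my ($code) = @_;
    my $ok = eval "$code; 1";
    return $ok ? 1 : 0;
}

package main;
```

The package variable `$Foo::via_our` still exists after the block; only its short-name alias has gone out of scope.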
(There's also "local", a misnomer, to temporarily stash away the current value of a global variable and shadow it with another value. We can leave it out of the picture here.)
Now, how is this different for mod_perl? Well, it isn't, if you think about it; rather, it boils down to the difference between a mod_perl handler and your typical batch script. Your batch script is invoked, it runs, and it ends. Running it probably includes some initialization code of yours placed at the file level. Next time around, the whole thing starts anew. Nothing special here.
A mod_perl handler, as you know, is loaded once and, unless it is reloaded, is only acted upon by invocation of its functions, such as handler(). This means that reinitialization doesn't happen automatically, as it does with your batch script running in a new process each time.
So what does this mean for file-level lexical variables (my-variables) you have defined? Well, they don't get reinitialized (unless you provide code to do so), so they start behaving like global variables, retaining state between invocations. They are not, however, accessible from outside the current package, so they're still lexical variables.
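A plain-Perl sketch of that behaviour: calling the same sub twice in one process stands in for two requests served by one long-lived interpreter. The sub names are hypothetical, and `$article_file` only echoes the variable name from the original script.

```perl
use strict;
use warnings;

# File-level lexical: created once when the file is compiled,
# then kept for the life of the interpreter.
my $article_file;

# Only assigns when a value arrives, so an old value can survive.
sub handler_leaky {
    my ($param) = @_;
    $article_file = $param if defined $param;
    return $article_file;
}

# Explicitly reinitialises the variable at the start of every call.
sub handler_reset {
    my ($param) = @_;
    $article_file = undef;
    $article_file = $param if defined $param;
    return $article_file;
}
```

The second call to `handler_leaky` without a parameter still returns the first call's value; `handler_reset` does not.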
There's one more thing to understand, especially in the context of Apache::Registry and Apache2::Registry, and that's lexical "my" variables referenced from nested named subroutines. You do not usually create nested named subroutines, but the Registry handler does it for you by wrapping your registry script in a handler subroutine, in a package whose name is made up from the filesystem path of your registry script. So if you define a registry script with a subroutine that references a lexical variable from the enclosing scope, you'll see the familiar warning message "Variable "$x" will not stay shared".
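A self-contained sketch of that warning in plain Perl. The outer sub `wrapper` plays the role of the handler sub that the Registry wraps around a script; both sub names are illustrative. The named inner sub binds to the first instance of `$x` only, so later calls keep seeing the first value, and Perl prints the "will not stay shared" warning when the file is compiled. Passing the value as a parameter avoids the problem.

```perl
use strict;
use warnings;

# Like a registry script: a lexical in an outer sub, used by a named inner sub.
sub wrapper {
    my $x = shift;
    sub inner { return $x }    # warns: Variable "$x" will not stay shared
    return inner();
}

# Fix: pass the value in explicitly instead of closing over the outer lexical.
sub wrapper_fixed {
    my $x = shift;
    return inner_fixed($x);
}
sub inner_fixed { my ($value) = @_; return $value }
```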
You can read up about this issue here:
http://perl.apache.org/docs/general/perl_reference/perl_reference.html
Hope this helps :-)
--
Michael.Ludwig (#) XING.com
Re: Getting a / when regex should produce nothing
am 27.04.2010 11:25:33 von torsten.foertsch
On Tuesday 27 April 2010 10:18:17 Michael Ludwig wrote:
> A lexical variable in Perl is any variable declared with "my", regardless
> of the scope, which may be file-level. Unlike globals, lexical variables
> aren't directly accessible from outside the package.
Not quite correct. Consider this:
$ perl -Mstrict -le '
my $x=10;
our $y=20;
{
package hugo;
print "x=$x"; # references $x outside of package hugo
print "y=$y"; # references $main::y
($x,$y)=(1,2);
}
print "x=$x";
print "y=$y"
'
x=10
y=20
x=1
y=2
$x is a file-level lexical. It is visible all over the file. The embedded
package hugo has no influence. Lexical variables are not bound to a package
but to a lexical scope.
Same with our-variables. "our" declares the visibility of a variable in the
current lexical scope.
> You can read up about this issue here:
>
> http://perl.apache.org/docs/general/perl_reference/perl_reference.html
Or for German speakers, there was a series of articles about scoping in
$foo-magazin:
http://foo-magazin.de/
Torsten Förtsch
--
Need professional modperl support? Hire me! (http://foertsch.name)
Like fantasy? http://kabatinte.net
Re: Getting a / when regex should produce nothing
am 27.04.2010 13:16:23 von aw
Michael Ludwig wrote:
> Hi André,
>
> On 26.04.2010 at 21:44, André Warnier wrote:
>> craig@animalhead.com wrote:
>>> The retention of values from previous executions applies
>>> only to global variables.
>> Ah, yes.
>> But that would have triggered another discussion (which it might now
>> still do of course), about what exactly /is/ a global variable, in the
>> context of a mod_perl handler or perl script run under ModPerl::Registry.
Very nice. And it does help my understanding.
Although the key paragraph here, I would say, is :
So what does this mean for file level lexical variables (my-variables)
you have defined? Well, they don't get reinitialized (unless you provide
code to do so), so they start behaving like global variables, retaining
state between invocations. They are not, however, accessible from
outside the current package, so they're still lexical variables.
Let me give an example of how I understand this, for mod_perl handler
packages :
# -- start of code --
package My::Something;
use strict;
use warnings;
use Apache2::RequestRec ();
use Apache2::Log ();                       # provides $r->log_error
use Apache2::Const -compile => qw(OK);
my $lexical_mine;                          # file-level lexical
sub access {
    my $r = shift;
    if (defined($lexical_mine)) {
        $r->log_error("in access: value present : $lexical_mine");
        $lexical_mine++;
    } else {
        $lexical_mine = 1;
        $r->log_error("in access: initialised to : $lexical_mine");
    }
    return Apache2::Const::OK;
}
sub response {
    my $r = shift;
    if (defined($lexical_mine)) {
        $r->log_error("in response: value present : $lexical_mine");
        $lexical_mine++;
    } else {
        $lexical_mine = 1;
        $r->log_error("in response: initialised to : $lexical_mine");
    }
    # .. generate some response for the browser
    return Apache2::Const::OK;
}
sub finalise {
    my $r = shift;
    my $lexical_mine;                      # a new lexical, local to finalise()
    if (defined($lexical_mine)) {
        $r->log_error("in finalise: value present : $lexical_mine");
        $lexical_mine++;
    } else {
        $lexical_mine = 1;
        $r->log_error("in finalise: initialised to : $lexical_mine");
    }
    return Apache2::Const::OK;
}
1;
# -- end of code --
Now if I configure the first of these subs as a PerlAccessHandler and
the second as a PerlResponseHandler, what happens to $lexical_mine ?
It might be lexical, in the sense that there is no way for code in
another file, to access this variable from outside as, for example,
$My::Something::lexical_mine
But for all intents and purposes, this variable is "functionally
global", in the sense that throughout the life of the Apache child that
contains this perl interpreter, this variable is "shared", not only by
the separate handler subs, but even by subsequent invocations of these
subs in the course of processing all HTTP requests which happen to be
processed by this Apache child.(*)
On the other hand, if I configure "finalise" as a PerlSomethingHandler,
then the $lexical_mine that is defined inside it, does not play along
with the other one. It is its own thing, and it will print its own
incremental sequence 1,1,2,3,4,5
But it is still "global" in a sense : while "private" to the sub
"finalise", it nevertheless is shared between consecutive invocations of
the same finalise() sub by the same Apache child.
So I guess what I mean is :
"global" and "lexical", as you use the terms above and as they are used
in the perl documentation (and no doubt rightly so), refer to scoping in
the sense of "how can I / can I not access that variable from outside of
the block/file where it is declared, using a "name" for it in my code.
However, for someone starting with perl and mod_perl, the term "global"
has a tendency to be interpreted as "shared between the handler subs
which I define in my package while they successively handle various
stages of one request", or even "shared between different invocations of
these handler subs for different requests", or even as "shared by all
handler subs processing all requests to this Apache".
(The last one being impossible with prefork).
So, back to the basics, my interpretation : by default, consider any
variable as "global/shared" and you'll generally stay out of trouble.
(*) which is going to be very confusing in the logfile however, as for
now there is no way to distinguish which child logs a message.
For that, we might want to add ($$) to the log messages.
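A minimal sketch of that idea, assuming a hypothetical helper `with_pid` that wraps the text before it is passed to `$r->log_error`; it simply prefixes the current process ID taken from Perl's `$$` variable.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a log message with the current process ID,
# so lines written by different Apache children can be told apart.
sub with_pid {
    my ($message) = @_;
    return sprintf '[pid %d] %s', $$, $message;
}

# Inside a handler one might then write (not run here):
#   $r->log_error(with_pid("in access: value present : $lexical_mine"));
```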
Re: Getting a / when regex should produce nothing
am 27.04.2010 18:35:10 von Perrin Harkins
On Tue, Apr 27, 2010 at 7:16 AM, André Warnier wrote:
> Now if I configure the first of these subs as a PerlAccessHandler and
> the second as a PerlResponseHandler, what happens to $lexical_mine ?
This is actually important to point out: it makes no difference if you
configure these subs as mod_perl anything. There is no difference at
all in scoping rules for mod_perl because mod_perl is just the perl
interpreter running inside apache. The thing that can make it appear
different is the lifecycle of the interpreter, i.e. the fact that it
doesn't exit as soon as your code has run once. This doesn't change
anything about scoping rules in perl though.
> But for all intents and purposes, this variable is "functionally
> global", in the sense that throughout the life of the Apache child that
> contains this perl interpreter, this variable is "shared", not only by
> the separate handler subs, but even by subsequent invocations of these
> subs in the course of processing all HTTP requests which happen to be
> processed by this Apache child.(*)
I think it's better to say the variable is "persistent" rather than
global, and the reason for that is that you're creating a closure.
Again, not a mod_perl thing. Any time you reference a lexical
variable from outside the sub you're in, it causes a closure.
> On the other hand, if I configure "finalise" as a PerlSomethingHandler,
> then the $lexical_mine that is defined inside it, does not play along
> with the other one. It is its own thing, and it will print its own
> incremental sequence 1,1,2,3,4,5
> But it is still "global" in a sense : while "private" to the sub
> "finalise", it nevertheless is shared between consecutive invocations of
> the same finalise() sub by the same Apache child.
No, this isn't true. It's not a closure, so it will be undefined
every time you enter the sub. It will also give you a warning about
reusing the name of a lexical from a larger scope.
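The point can be checked without Apache at all. This sketch strips André's handlers down to plain subs that return the value instead of logging it (the `My::Sketch` package name and the `bump_shared` helper are made up); calling them repeatedly in one process mimics one Apache child serving several requests.

```perl
use strict;
use warnings;

package My::Sketch;

my $lexical_mine;    # file-level lexical, shared by access() and response()

sub bump_shared {
    if (defined $lexical_mine) { $lexical_mine++ }
    else                       { $lexical_mine = 1 }
    return $lexical_mine;
}

sub access   { return bump_shared() }
sub response { return bump_shared() }

sub finalise {
    my $lexical_mine;    # a new lexical, created afresh on every call
    if (defined $lexical_mine) { $lexical_mine++ }
    else                       { $lexical_mine = 1 }
    return $lexical_mine;
}

package main;
```

access() and response() share one counter across calls, while finalise() starts from undef every time and so always returns 1.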
> So, back to the basics, my interpretation : by default, consider any
> variable as "global/shared" and you'll generally stay out of trouble.
Here's another rule you could use: don't reference lexical variables
defined outside of a sub from within that sub. Either use globals for
things you want shared, or pass in the data you need as parameters.
Then you won't be surprised by closures messing up your ideas about
scoping.
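A small sketch of that rule, with hypothetical names: per-request data lives in a hash created inside the entry-point sub and is handed to helpers as an argument, while the one thing meant to be shared is an explicit package global.

```perl
use strict;
use warnings;

package My::Handler;    # hypothetical package name

# Deliberately shared across calls: a package global, declared with "our".
our $requests_served = 0;

# Helper gets the per-request data it needs as a parameter.
sub count_visit {
    my ($state) = @_;
    $state->{visits}++;
    return $state->{visits};
}

# Entry point: builds fresh per-request state on every call.
sub handle_request {
    $requests_served++;
    my %state = (visits => 0);
    count_visit(\%state);
    return count_visit(\%state);
}

package main;
```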
I also suggest that anyone who finds this confusing should have a look
through the information on variable scoping in the perl man pages,
Programming Perl, or perlmonks.org. It really pays to understand this
stuff.
- Perrin
Re: Getting a / when regex should produce nothing
am 27.04.2010 18:48:21 von Craig MacKenna
On Apr 27, 2010, at 4:16 AM, André Warnier wrote:
> So, back to the basics, my interpretation : by default, consider
> any variable as "global/shared" and you'll generally stay out of
> trouble.
>
Isn't it true that a variable declared (with my) inside of a
sub (including a handler) starts its existence initialized for
each execution of the sub? So I consider variables declared
outside of any sub "global" and get along OK without knowing
what "lexical scope" means, though I may have to learn someday.
That is all I was trying to say in the post that triggered
this thread's recent display of technical erudition.
>
> (*) which is going to be very confusing in the logfile however, as for
> now there is no way to distinguish which child logs a message.
> For that, we might want to add ($$) to the log messages.
>
The 3rd field of Apache log file entries is quite useless, and
I have long used it to include the process ID and keepalive status.
Here's my log file format from httpd.conf:
LogFormat "%h %l %P:%{Connection}i>%{Connection}o %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" keepalive
Best Regards,
cmac
Re: Getting a / when regex should produce nothing
am 27.04.2010 20:08:49 von Michael Ludwig
Torsten Förtsch wrote on 27.04.2010 at 11:25:33 (+0200):
> On Tuesday 27 April 2010 10:18:17 Michael Ludwig wrote:
>
> > A lexical variable in Perl is any variable declared with "my",
> > regardless of the scope, which may be file-level. Unlike globals,
> > lexical variables aren't directly accessible from outside the
> > package.
>
> Not quite correct. Consider this:
>
> $ perl -Mstrict -le '
> my $x=10;
> our $y=20;
> {
> package hugo;
>
> print "x=$x"; # references $x outside of package hugo
> print "y=$y"; # references $main::y
Thanks for catching this. Indeed, lexical variables aren't described
in terms of packages, but in terms of lexical scope.
> $x is file-level lexical. It is visible all over the file. The
> embedded package hugo has no influence. Lexical variables are not
> bound to a package but to a lexical scope.
>
> Same with our-variables. "our" declares the visibility of a variable
> in the current lexical scope.
Variables declared with "our" are a funny hybrid between global
variables, which are attached to a package, and lexical variables,
which are attached to a scope.
--
Michael Ludwig
Re: Getting a / when regex should produce nothing
am 27.04.2010 20:20:13 von Perrin Harkins
On Tue, Apr 27, 2010 at 2:08 PM, Michael Ludwig wrote:
> Variables declared with "our" are a funny hybrid between global
> variables, which are attached to a package, and lexical variables,
> which are attached to a scope.
They are package variables (usually referred to as globals), which
have a lexically-scoped alias that lets you call them by their short
name. It's the short name alias that is lexical.
Here's a normal use of a package variable:
package Foo;
use strict;
use warnings;
$Foo::bar = 1;
sub print_it {
print $Foo::bar;
}
And here's the exact same thing, using "our" to save some typing:
package Foo;
use strict;
use warnings;
our $bar;
$bar = 1;
sub print_it {
print $bar;
}
Aside from the difference in how you refer to the variable, these are identical.
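A quick check of that equivalence, as a sketch: assigning through the fully qualified name is immediately visible through the `our` alias, because both names refer to the same package variable. The sub names are illustrative.

```perl
use strict;
use warnings;

package Foo;

our $bar;    # lexical alias: the short name $bar means $Foo::bar
$bar = 1;

sub short_name { return $bar }        # reads through the alias
sub full_name  { return $Foo::bar }   # reads the package variable directly

package main;

$Foo::bar = 42;    # assign through the fully qualified name
```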
Hope that helps. If it doesn't, try this:
http://perldoc.perl.org/functions/our.html
- Perrin
Re: Getting a / when regex should produce nothing
am 28.04.2010 00:34:36 von Michael Ludwig
Perrin Harkins wrote on 27.04.2010 at 14:20:13 (-0400):
> On Tue, Apr 27, 2010 at 2:08 PM, Michael Ludwig wrote:
> > Variables declared with "our" are a funny hybrid between global
> > variables, which are attached to a package, and lexical variables,
> > which are attached to a scope.
>
> They are package variables (usually referred to as globals), which
> have a lexically-scoped alias that lets you call them by their short
> name. It's the short name alias that is lexical.
I used to think (and still do) that non-lexical variables, just
like subroutines, belong to the package they're in and do not need
the package prefix there. With the prefix, however, you can refer
to variables in a package other than the current one, as in
$Data::Dumper::Indent = 1 or something similar. But simply writing
$bar in package Foo just refers to $Foo::bar:
\,,,/
(o o)
------oOOo-(_)-oOOo------
package Foo;
$bar = 1;
package main;
print "bar = $bar";
print "Foo::bar = $Foo::bar";
-------------------------
$ perl -l /tmp/pkg.pl
bar =
Foo::bar = 1
It's only the strict pragma that (fortunately) forces you to qualify
your globals. In other words, the "alias" is not really an alias, but
rather the thing itself, regardless of whether it is exempted from the
strict 'vars' pragma by means of (1) full qualification, (2) "use
vars" or (3) "our".
> Here's a normal use of a package variable:
>
> package Foo;
> use strict;
> use warnings;
>
> $Foo::bar = 1;
I think I've more frequently encountered the form:
use strict; # ban unqualified globals
use vars qw($bar); # make exceptions
$bar = 1; # use them
As perldoc -f our informs you, the above has been superseded by "our".
However, you still see a lot of "use vars" for reasons of backward
compatibility.
--
Michael Ludwig