how to safely eval user-generated code

on 31.03.2008 05:19:53 by emmettnicholas

Hi,

I realize that eval() is generally discouraged, but I've found myself
wishing that I could execute user-generated code.

One idea I've seen is to use token_get_all(), and then make sure no
T_STRING tokens match known "dangerous" function names.

Where could I find such a list of "dangerous" functions? What are the
pitfalls of this approach? Is there any way to safely allow user-
controlled scripting, or is it just a bad idea in general? Thanks.

-Emmett

Re: how to safely eval user-generated code

on 31.03.2008 10:44:37 by Erwin Moller

emmettnicholas@gmail.com wrote:
> Hi,
>
> I realize that eval() is generally discouraged, but I've found myself
> wishing that I could execute user-generated code.
>
> One idea I've seen is to use token_get_all(), and then make sure no
> T_STRING tokens match known "dangerous" function names.
>
> Where could I find such a list of "dangerous" functions? What are the
> pitfalls of this approach? Is there any way to safely allow user-
> controlled scripting, or is it just a bad idea in general? Thanks.
>
> -Emmett

Hi Emmett,

I think such an approach will never be 100% safe.
For starters, what do YOU consider a dangerous function? And what do I?
And what about the next version of PHP? Will it add function names that
are 'dangerous' but are not in the current set?

When I was once in a situation where I had to eval code provided by a user
(the user supplied a function that I needed to evaluate on some results
from a database), I approached it the other way round: I defined a few
strings that WERE allowed.
I am not sure whether that helps you, because it is very restrictive and
might not apply to your situation at all.
In my situation I needed a function, so:
$y = eval("return $userinput;");
and $userinput could only contain:
numbers, (), */+-, sin(), cos(), and column names from some table.
I wrote a function that stripped everything that did not meet these
requirements, and if the original didn't match the result, the input was
rejected.
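
A minimal sketch of such a whitelist check, assuming the allowed words are sin, cos and a caller-supplied list of column names. The function name is_allowed_expression() and the column names are hypothetical, not Erwin's actual code. It only decides whether the input is made of permitted pieces; it does not guarantee the expression is syntactically valid.

```php
<?php
// Hypothetical helper: true only if $expr is built from whitelisted pieces.
// $allowedColumns is an assumed list of column names the user may reference.
function is_allowed_expression(string $expr, array $allowedColumns): bool
{
    $allowedWords = array_merge(['sin', 'cos'], $allowedColumns);

    // Replace every identifier: allowed words become "0", anything else
    // becomes "!", a character that the final check below rejects.
    $rest = preg_replace_callback(
        '/[A-Za-z_][A-Za-z0-9_]*/',
        function (array $m) use ($allowedWords) {
            return in_array($m[0], $allowedWords, true) ? '0' : '!';
        },
        $expr
    );

    // What remains may only be digits, dots, whitespace, parentheses
    // and the four arithmetic operators.
    return preg_match('/^[0-9.\s()*\/+\-]+$/', $rest) === 1;
}

$ok  = is_allowed_expression('sin(price) * 2 + qty', ['price', 'qty']);
$bad = is_allowed_expression("system('ls')", ['price', 'qty']);
```

Here $ok is true and $bad is false. Because quotes, "$" and backticks are not in the permitted character set, string literals, variables and shell execution cannot appear in accepted input.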

Hope that helps.
If you explain what you are trying to accomplish, maybe we can suggest
another solution.

Regards,
Erwin Moller

Re: how to safely eval user-generated code

on 01.04.2008 14:04:23 by Toby A Inkster

emmettnicholas wrote:

> One idea I've seen is to use token_get_all(), and then make sure no
> T_STRING tokens match known "dangerous" function names.

That's certainly an idea, but it has weaknesses. For example, you will
have to remember to include the names of "dangerous" functions not just in
PHP itself, but in your own code, and third-party libraries (e.g. PEAR).

Also be aware that the evaluated code will be able to tamper with any
variables you have defined. This means that if multiple evaluations are
used, the first eval() might be able to rewrite the next eval, and do
nasty things that way.

For example:

$script1 = get_script_from_db(1);
$script2 = get_script_from_db(2);

if (not_safe($script1) || not_safe($script2))
{
    die("unsafe script!");
}

eval($script1);
eval($script2);

But imagine that $script1 is:

"$script2 = 'unlink(\'/etc/passwd\');';"

$script1 is a simple string assignment -- doesn't contain any dangerous
tokens. However, it maliciously overwrites $script2, which has already
been checked for safety. When $script2 is evaluated, something bad happens.
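
One partial mitigation, sketched here as an assumption rather than something proposed in this thread: eval() runs in the scope of the function that calls it, so wrapping the call in a function keeps plain assignments away from the caller's variables. The wrapper name eval_in_own_scope() is hypothetical, and a harmless echo stands in for the unlink() payload.

```php
<?php
// Hypothetical wrapper: code passed to eval() sees only this function's
// local variables, not the variables of the calling script.
function eval_in_own_scope(string $code)
{
    return eval($code);
}

// Same attack as in the example above, with a harmless payload.
$script1 = "\$script2 = 'echo \"tampered\";';";
$script2 = 'return 2 + 2;';

eval_in_own_scope($script1);            // assigns a local $script2 only
$result = eval_in_own_scope($script2);  // still runs the checked script
```

After this runs, $script2 is unchanged and $result is 4. This does not stop code that uses the global keyword, $GLOBALS or superglobals, so the token check would still have to reject those.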

What you've suggested (checking tokens) is a lot better than naive ideas
like regexp checks, but requires more thought and attention than you may
expect at first.
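
To illustrate how much attention it needs, here is a rough sketch of a token-based blacklist check. The helper name find_blacklisted_names() and the three-entry blacklist are made up for illustration; a real list would be far longer.

```php
<?php
// Hypothetical helper: list blacklisted names that appear as name tokens.
function find_blacklisted_names(string $code, array $blacklist): array
{
    $blacklist = array_map('strtolower', $blacklist);
    $found = [];

    // token_get_all() only tokenizes PHP after an opening tag, so add one.
    foreach (token_get_all('<?php ' . $code) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ";" or "("
        }
        // Compare by token name so this also loads on PHP versions that
        // lack the T_NAME_* constants (they were added in PHP 8.0).
        $type = token_name($token[0]);
        $nameTypes = ['T_STRING', 'T_NAME_QUALIFIED', 'T_NAME_FULLY_QUALIFIED'];
        if (!in_array($type, $nameTypes, true)) {
            continue;
        }
        // Function names are case-insensitive and may have a leading "\".
        $name = strtolower(ltrim($token[1], '\\'));
        if (in_array($name, $blacklist, true)) {
            $found[] = $name;
        }
    }
    return array_values(array_unique($found));
}

$blacklist = ['unlink', 'system', 'exec'];

// A direct call is caught, whatever its letter case.
$direct = find_blacklisted_names("UnLink('/tmp/x');", $blacklist);

// A variable function call contains no matching name token at all.
$dynamic = find_blacklisted_names("\$f = 'unl' . 'ink'; \$f('/tmp/x');", $blacklist);
```

$direct contains 'unlink', but $dynamic is empty: the name is assembled from string pieces at run time, so the check passes code that still ends up calling unlink(). Language constructs such as eval and include, and the backtick operator, have their own token types and never show up as T_STRING either, so they need separate handling.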

Some of these problems can be alleviated by combining this token checking
with a Runkit sandbox:

http://uk.php.net/manual/en/runkit.sandbox.php

That enables you to evaluate the string in a closed-off environment which
has no access to your own functions and variables (apart from those that
you explicitly give it access to). Yes, it can still call built-in PHP
functions like unlink(), which is why you still need to exclude a list of
dangerous functions. (And Runkit sandbox provides the "disable_functions"
option to help.)

If you want to be truly safe, you really want to parse and execute the
user-defined script yourself. Because implementing your own PHP parser is
likely to be a lot of work, you'd probably want to define your own simple
scripting language and write an interpreter for that in PHP. You can make
this language as safe as you like, because *you* define the built-in
functions.
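
As a rough idea of the scale involved, here is a minimal sketch of such an interpreter for arithmetic expressions only. It is not the outline from the post linked below; the class name TinyCalc, its grammar and its two built-ins (sin, cos) are assumptions chosen for illustration.

```php
<?php
// Minimal sketch: numbers, + - * /, parentheses, named variables, and only
// the functions listed in $functions. Nothing is ever passed to eval().
final class TinyCalc
{
    private $tokens = [];
    private $pos = 0;
    private $vars;
    private $functions = ['sin' => 'sin', 'cos' => 'cos']; // the only built-ins

    public function __construct(array $vars = [])
    {
        $this->vars = $vars;
    }

    public function evaluate(string $source): float
    {
        preg_match_all('/\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/()]/', $source, $m);
        // If joining the tokens does not reproduce the input (ignoring
        // whitespace), the input contained a character we do not accept.
        if (implode('', $m[0]) !== preg_replace('/\s+/', '', $source)) {
            throw new InvalidArgumentException('unexpected character');
        }
        $this->tokens = $m[0];
        $this->pos = 0;
        $value = $this->expression();
        if ($this->peek() !== null) {
            throw new InvalidArgumentException('unexpected token ' . $this->peek());
        }
        return $value;
    }

    private function peek()
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function expect(string $tok): void
    {
        if ($this->peek() !== $tok) {
            throw new InvalidArgumentException("expected $tok");
        }
        $this->pos++;
    }

    private function expression(): float // term (("+" | "-") term)*
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term(): float // factor (("*" | "/") factor)*
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->factor();
            if ($op === '/' && $rhs == 0.0) {
                throw new InvalidArgumentException('division by zero');
            }
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor(): float // number, name, call, "(" expr ")", "-" factor
    {
        $tok = $this->peek();
        if ($tok === null) {
            throw new InvalidArgumentException('unexpected end of input');
        }
        $this->pos++;
        if ($tok === '-') {
            return -$this->factor();
        }
        if ($tok === '(') {
            $value = $this->expression();
            $this->expect(')');
            return $value;
        }
        if (is_numeric($tok)) {
            return (float) $tok;
        }
        if (isset($this->functions[$tok]) && $this->peek() === '(') {
            $this->pos++;
            $arg = $this->expression();
            $this->expect(')');
            return (float) call_user_func($this->functions[$tok], $arg);
        }
        if (array_key_exists($tok, $this->vars)) {
            return (float) $this->vars[$tok];
        }
        throw new InvalidArgumentException("unknown name or token: $tok");
    }
}

$calc = new TinyCalc(['price' => 10, 'qty' => 3]);
$total = $calc->evaluate('price * qty + 2 * (1 - 0.5)');
```

$total is 31.0 here. Input such as unlink('/etc/passwd') is rejected with an exception, because quotes are not part of the language and unlink is not in the function table.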

I have previously posted an outline on how to write interpreters for
arbitrary scripting languages in PHP here:
http://message-id.net/4j01a4-9jr.ln1@ophelia.g5n.co.uk

--
Toby A Inkster BSc (Hons) ARCS

Re: how to safely eval user-generated code

on 01.04.2008 18:45:43 by palbertini

emmettnicholas:

> One idea I've seen is to use token_get_all(), and then make sure no
> T_STRING tokens match known "dangerous" function names.

I think it might be impossible to identify these functions, since
harmless functions may become dangerous when combined in the right way.

Consider this script:

$i = 1000*1000*1000;
$s = "foo and bar hang around";

for ($a = 0; $a < $i; $a++) {
    $h = fopen("file$a.txt", "w");
    fputs($h, $s);
    fclose($h);
}

The only functions used here are simple file manipulation functions, but
your webserver might not be able to deal with 1000000000 small txt files.
I could also avoid these functions by using copy() (and maybe copying some
images you used in the webpage). None of this is obviously malicious code.

Maybe a script could copy itself and afterwards include the copy (one
million times), which will surely allocate a lot of memory ....

You are better off with a different solution. Maybe describing your
project would help.