Renaming all variables in a repository

on 23.07.2009 18:56:51 by Eddie Drapkin

Hey all,
we've got a repository here at work, with something like 55,000 files
in it. For the last few years, we've been naming $variables_like_this
and functions_the_same($way_too). And now we've decided to switch to
camelCasing everything and I've been tasked with somehow determining
if it's possible to automate this process. Usually, I'd just use the
IDE refactoring functionality, but doing it on a
per-method/per-function and a per-variable basis would take weeks, if
not longer, not to mention driving everyone insane.

I've tried with regular expressions, but I can't make them smart
enough to distinguish between builtins and userland code. I've looked
at the tokenizer and it seems to be the right way forward, but that's
also a huge project to get that to work.

I was wondering if anyone had had any experience doing this and could
either point me in the right direction or just down and out tell me
how to do it.

Thanks so much
--Eddie

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Renaming all variables in a repository

on 23.07.2009 19:06:53 by Robert Cummings

Are any of these variables created by exporting an array to variables?
Are any of these variables global or otherwise and subsequently accessed
via an array and key? It may not just be a case of finding and replacing
variable names. It may also be a case of finding and replacing any array
keys that also follow the underscore system.
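
To illustrate the concern, here is a minimal sketch. The array and its keys are made up for the example; the point is only that extract() and compact() tie variable names to strings, which a rename of variable tokens alone would not see.

```php
<?php
// Hypothetical data with underscore-style keys, e.g. a database row.
$row = array('user_name' => 'eddie', 'login_count' => 3);

// extract() creates $user_name and $login_count at run time. Their names
// come from the array keys, which are string literals, not variable tokens.
extract($row);
echo $user_name, ' ', $login_count, "\n";

// compact() goes the other way: the variable names appear as strings.
$packed = compact('user_name', 'login_count');
```

If only the variables were renamed to $userName and $loginCount, the echo line would refer to variables that extract() never creates, so the keys and string names would have to change together with them.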

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

Re: Renaming all variables in a repository

on 23.07.2009 19:12:53 by Martin Scotta

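function toCamelCase( $string )
{
    // ucwords() capitalises every word, so lower-case the first letter again
    // to get camelCase rather than PascalCase (lcfirst() needs PHP 5.3+)
    return lcfirst( str_replace( ' ', '', ucwords( strtolower( strtr( $string, '_', ' ' ) ) ) ) );
}

echo toCamelCase( 'this_is_not_properly_written' ); // thisIsNotProperlyWritten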


You can use this simple function to translate a string to camelCase.

The process could be...
1) parse the file with PHP's tokenizer
2) translate the tokens to camelCase
3) write the file with the changes

You can easily parse a php file using http://php.net/token_get_all
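
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: snakeToCamel() and renameVariables() are hypothetical helper names, only T_VARIABLE tokens are touched, and the write-back to disk is left out.

```php
<?php
// Hypothetical helper: turn an underscore between two alphanumeric
// characters into an upper-cased next character. Names such as $_GET are
// left alone because their underscore follows '$', not a letter or digit.
function snakeToCamel($name)
{
    return preg_replace_callback(
        '/(?<=[A-Za-z0-9])_([A-Za-z0-9])/',
        function ($m) { return strtoupper($m[1]); },
        $name
    );
}

// Hypothetical helper: rebuild the source token by token, renaming only
// T_VARIABLE tokens. Function names (T_STRING) are left untouched, so
// builtins such as str_replace() are safe.
function renameVariables($source)
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            list($id, $text) = $token;
            $out .= ($id === T_VARIABLE) ? snakeToCamel($text) : $text;
        } else {
            $out .= $token; // single-character tokens such as ';' or '('
        }
    }
    return $out;
}

$code = '<?php $first_name = strtolower($_GET["n"]); echo str_replace("a", "b", $first_name);';
echo renameVariables($code), "\n";
```

Because the tokenizer returns every byte of the file, concatenating the tokens reproduces the source exactly apart from the renamed variables. Property accesses such as $this->some_prop are T_STRING tokens and would need separate handling.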


--
Martin Scotta

RE: Renaming all variables in a repository

on 23.07.2009 19:14:38 by jenai tomaka

In a project with this many files, it is better to leave things the way
they are. Doing this now, you could break the project and lose a great
deal of time.

Yuri Yarlei.
Programmer PHP, CSS, Java, PostgreSQL;
Today PHP, tomorrow Java, after the world.
Kyou wa PHP, ashita wa Java, sono ato sekai desu.

Re: Renaming all variables in a repository

on 23.07.2009 19:50:01 by Greg Beaver

Hi Eddie,

That's quite the task :).

You're going to need to scan the source to generate a list of every
variable and function name using the tokenizer. Fortunately, this is
easy - with the caveat that if you do this anywhere in your source:

$a = $this->{$constructed . '_name'}();

you will have to handle these manually.
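
One way to find those spots for manual review might look like the sketch below. It is an assumption-laden illustration: findDynamicAccess() is a hypothetical helper, and it simply reports the line of every "->" that is not followed by a plain identifier.

```php
<?php
// Hypothetical helper: return the line numbers where "->" is followed by
// something other than a plain name, e.g. $obj->{$expr}() or $obj->$name.
function findDynamicAccess($source)
{
    $lines = array();
    $pendingLine = null; // line of a "->" still waiting for its operand
    foreach (token_get_all($source) as $token) {
        $id = is_array($token) ? $token[0] : $token;
        if ($pendingLine !== null) {
            if ($id === T_WHITESPACE) {
                continue;
            }
            if ($id !== T_STRING) {
                $lines[] = $pendingLine; // dynamic access, needs a human
            }
            $pendingLine = null;
        }
        if ($id === T_OBJECT_OPERATOR) {
            $pendingLine = $token[2]; // index 2 holds the line number
        }
    }
    return $lines;
}

$code = '<?php
$a = $this->plain_call();
$b = $this->{$constructed . \'_name\'}();
$c = $obj->$name;
';
print_r(findDynamicAccess($code)); // lines 3 and 4 are reported
```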

Basically, run token_get_all() on the source, scanning for T_VARIABLE,
and record every T_VARIABLE in an array. Then, scan for:

1) T_FUNCTION T_WHITESPACE* T_STRING
2) T_OBJECT_OPERATOR T_WHITESPACE* T_STRING
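
To see what these two patterns look like in a real token stream, a small sketch such as this one can help. describeTokens() is a hypothetical helper and the sample source is made up.

```php
<?php
// Hypothetical helper: list the non-whitespace named tokens of a snippet
// as "TOKEN_NAME text" strings.
function describeTokens($source)
{
    $out = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] !== T_WHITESPACE) {
            $out[] = token_name($token[0]) . ' ' . $token[1];
        }
    }
    return $out;
}

$code = '<?php function my_func($an_arg) { return $this->other_thing($an_arg); }';
echo implode("\n", describeTokens($code)), "\n";
```

In the printed list, "T_FUNCTION function" is followed by "T_STRING my_func", and "T_OBJECT_OPERATOR ->" is followed by "T_STRING other_thing", which are the two sequences to scan for.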

<?php
$replace = array();
foreach (new RegexIterator(new RecursiveIteratorIterator(new
    RecursiveDirectoryIterator('/path/to/src')), '/\.php$/',
    RegexIterator::MATCH, RegexIterator::USE_KEY) as $path => $file) {
    $source = file_get_contents($path);

    $checkForID = false; // true right after "function" or "->"
    $last = '';          // that keyword/operator plus any whitespace after it
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // e.g. "(" after "function" in a closure: stop looking for a name
            $checkForID = false;
            $last = '';
            continue;
        }

        if ($checkForID) {
            if ($token[0] == T_WHITESPACE) {
                $last .= $token[1];
                continue;
            }
            $checkForID = false;
            if ($token[0] != T_STRING) {
                $last = '';
                continue;
            }
            $token[1] = $last . $token[1];
            $last = '';
        } elseif ($token[0] == T_FUNCTION || $token[0] == T_OBJECT_OPERATOR) {
            $checkForID = true;
            $last = $token[1];
            continue;
        } elseif ($token[0] == T_STRING) {
            if (function_exists($token[1])) {
                continue; // skip internal functions
            }
            if (strtolower($token[1]) != $token[1]) {
                continue; // assuming you UPPER-CASE constants, this skips them
            }
        } elseif ($token[0] != T_VARIABLE || strpos($token[1], '$_') === 0) {
            continue; // not a name, or a superglobal such as $_GET
        }

        // we get to here if we've found one to process
        $new = explode('_', $token[1]);
        $new = array_map('ucfirst', $new);
        $new[0] = lcfirst($new[0]); // for your camelCasing

        $new = implode('', $new);
        if ($new != $token[1]) {
            $replace[] = array($token[1], $new);
        }
    }
}
?>

Next, load each file (you should use RecursiveIteratorIterator with a
RecursiveDirectoryIterator and some kind of filter, probably
RegexIterator, to grab the PHP source files), and then iterate over the
list of variable names somewhat like this:

<?php
foreach (new RegexIterator(new RecursiveIteratorIterator(new
    RecursiveDirectoryIterator('/path/to/src')), '/\.php$/',
    RegexIterator::MATCH, RegexIterator::USE_KEY) as $path => $file) {
    $source = file_get_contents($path);
    foreach ($replace as $items) {
        // plain textual replacement of the old name with the new one
        $source = str_replace($items[0], $items[1], $source);

        if ($items[0][0] == '$') {
            // also rename property accesses such as $this->some_var
            $source = preg_replace(
                '/->(\s*)' . preg_quote(substr($items[0], 1), '/') . '\b/',
                '->\\1' . substr($items[1], 1),
                $source);
        }
    }
    file_put_contents($path, $source);
}
?>

Voila, code refactored.

I trust you know this, but don't run that example code without testing
it on a limited sandbox and comparing the results first :). I did not
test anything except the regexiterator part to make sure that it
actually grabbed PHP files, the rest is based on my experience
tokenizing for parsing PHP when writing tools like phpDocumentor.
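
As one possible way to compare results in a sandbox, the sketch below checks that a rewritten file still has the same sequence of token types as the original, which a pure rename should preserve. tokenTypes() and sameStructure() are hypothetical helpers, and the check only catches structural damage, not a wrong but well-formed name.

```php
<?php
// Hypothetical helper: reduce a source string to its sequence of token
// types (token ids for named tokens, the character itself otherwise).
function tokenTypes($source)
{
    $types = array();
    foreach (token_get_all($source) as $token) {
        $types[] = is_array($token) ? $token[0] : $token;
    }
    return $types;
}

// Hypothetical check: true when both versions tokenize to the same types.
function sameStructure($before, $after)
{
    return tokenTypes($before) === tokenTypes($after);
}

$before = '<?php $my_var = some_func($my_var);';
$good   = '<?php $myVar = someFunc($myVar);';
$bad    = '<?php $myVar = someFunc($my Var);'; // a replacement gone wrong

var_dump(sameStructure($before, $good));
var_dump(sameStructure($before, $bad));
```

The first comparison should report a match and the second a mismatch, since the stray space splits one variable token into a variable, whitespace and a bare name.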

If I made any mistakes, it would be good for you to post your final
scripts for posterity back on here.

Greg

Re: Renaming all variables in a repository

on 23.07.2009 21:34:33 by Eddie Drapkin

Thanks so much, man. I'm using most of your methodology. There were
definitely some hiccups along the way, but so far it seems to build a
map of what to replace and what to replace it with, although the code
is far from pretty. I'll be sure to send it to the list when it's done.

Re: Renaming all variables in a repository

on 23.07.2009 22:29:00 by Stut

I'd question the wisdom of doing such a thing at all. When it comes to
coding standards the important thing is not what they are, just that
they exist and are observed by everybody.

This sounds like a colossal waste of time, whether you can find an
automated method or not, for no apparent gain. Seriously, what's the
benefit of using camel over underscores? Sounds like a decision made
by a manager who feels the need to create work when none is actually
required.

-Stuart

--
http://stut.net/
