Naming modules from the Internet Archive

on 15.04.2006 04:24:22 by TTK Ciar

Hello! My name is TTK Ciar, and I am a software engineer at the
Internet Archive.

In the Data Repository and Collections departments we have some
Perl modules that we use in much of our software and that might be
of more general interest. I would like to clean up and document
these and publish them to CPAN.

My main question involves the module namespaces. Right now we have
things set up so that every server in our data clusters has a
/petabox directory, and our in-house module hierarchy is stored in
/petabox/sw/modules. Our scripts "use lib '/petabox/sw/modules'"
explicitly to use these modules, for instance:

#!/usr/bin/perl
use lib '/petabox/sw/modules';
use DR;
use DR::Ctl;
use DR::ItemTracker;
use Database::UniversalDB;

From the perspective of someone who uses these modules a lot, it
would be really nice if we could just have our own module namespace,
like "Cluster::InternetArchive", and then replicate the hierarchy
currently in /petabox/sw/modules into Cluster::InternetArchive, such
that the above code fragment would turn into:

#!/usr/bin/perl
use Cluster::InternetArchive::DR;
use Cluster::InternetArchive::DR::Ctl;
use Cluster::InternetArchive::DR::ItemTracker;
use Cluster::InternetArchive::Database::UniversalDB;
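
To make the renaming concrete, here is a rough sketch of where each
module file would have to live under the proposed namespace. It relies
only on Perl's standard rule that "use Foo::Bar" looks for Foo/Bar.pm
in each @INC directory. The helper name module_to_path and the
$prefix variable are made up for this illustration and are not part of
our code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a package name into the relative path that "use"/"require"
# searches for under each directory in @INC.
# (Hypothetical helper, for illustration only.)
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Proposed top-level namespace from this post; still an open question.
my $prefix = 'Cluster::InternetArchive';

# Show where each current in-house module would be looked up
# once it is moved under the proposed prefix.
for my $old (qw(DR DR::Ctl DR::ItemTracker Database::UniversalDB)) {
    my $new = "${prefix}::$old";
    printf "%-24s -> %s\n", $old, module_to_path($new);
}
```

With that layout installed in a standard library directory, the
"use lib '/petabox/sw/modules'" line would no longer be needed.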

This would make things convenient for *us* in a variety of ways
(and especially easy for me in particular) :-) but would it be what
the larger Perl community wants? Should I instead plan on tucking
each of our various modules under the module namespaces most closely
related to their function? For instance: App::Ctl, Db::UniversalDB.

An added spin to the issue is that many of these modules contain
functions which assume that the cluster environment is set up in
the "Archive Way" (things like ssh configured to run without
passwords, mountpoints and rsync modules following our naming
conventions, large persistent /tmp filesystems, and so on), though
many of these functions will operate fine in alien environments.
It would be handy if institutions which have configured their
clusters like ours could simply "install Cluster::InternetArchive"
and have all of the modules which are written to those conventions
installed.

I thought Cluster::InternetArchive might be appropriate because
most of these functions are cluster-centric, and the current
Cluster namespace seems underutilized, but on the other hand the
00modlist.long.html document asks: "Please avoid using more than
one level of nesting for module names (packages or classes within
modules can, of course, use any number)". So would dropping the
"Cluster::" and just going with "InternetArchive::" be the right
thing to do, in light of this? Or the other way around, dropping
"InternetArchive::" and putting them under "Cluster::"? Or would
it be better to scatter these modules across multiple existing
module hierarchies?

What's the right thing to do?

Thanks in advance,
-- TTK