ANNOUNCE: first post-rewrite Rosetta release (v0.720.0)

Posted on 03.02.2006 05:08:10 by darren

2006-02-01 Darren Duncan
--------------------------------------------------

I am pleased to announce the first CPAN release of the second major
code base (started in 2005-10) of the Rosetta database access
framework, v0.720.0, which is available now in synchronized native
Perl 5 and Perl 6 versions.

This is a complete rewrite, including very different detailed designs,
implementations, and documentation, though it still retains the same
high level design and purpose.

------------

The Perl 5 version is composed of these 2 distributions (more to come later):

* Rosetta-v0.720.0.tar.gz
* Rosetta-Engine-Native-v0.1.0.tar.gz

These have Locale-KeyedText-v1.72.1.tar.gz (released at the same
time) as an external dependency.

The Perl 6 versions of all 3 of the above items are bundled with
Perl6-Pugs-6.2.11.tar.gz (released a half-day earlier) in its ext/
subdirectory.

The Perl 6 versions don't depend on anything outside the Perl6-Pugs
distro that they live in. But the Perl 5 versions also have external
dependencies on Perl 5.8.1+ and these Perl 5 packages, which add
features that Perl 6 and Pugs already have built-in: 'version',
'only', 'Readonly', Class::Std, Class::Std::Utils, Scalar::Util,
Test::More; the latter 2 are bundled with Perl 5.

------------

Following is both a reintroduction to the remade Rosetta as it is and
will soon be, and a summary of the main changes from before the
rewrite (first major code base of 2002 thru 2005-09).

For various reasons, which are laid out below, it should be more
apparent than ever that Rosetta is "not just another DBI wrapper" and
really stands out as something different from any existing tools on
CPAN.

Note that many of these details aren't yet in Rosetta's own
documentation (they will be later), so for now they are unique to this
email.

* Locale::KeyedText is officially not part of the Rosetta framework
anymore, being a distinct external dependency instead of its
localization component.

* Anything that was in the SQL::Routine name space has been renamed
into the 'Rosetta' name space.

* Briefly comparing DBI to Rosetta, DBI provides users with database
driver independence; Rosetta provides them with database language
independence, which is a higher abstraction, but it should still work
quickly.

* Rosetta is now officially a federated relational database of its
own that just happens to be good with cross-database-manager
portability issues, and be good as a toolkit on which to build ORMs
and persistence tools, rather than being mainly about portable SQL
generation.

* The native query and schema design language of Rosetta is now based
mainly on Tutorial D (by Christopher J. Date and Hugh Darwen) and
closely resembles relational algebra, rather than being based on SQL
as it was before (note that some current documentation suggests
otherwise, but that will be rewritten).

* Note, see http://www.oreilly.com/catalog/databaseid/ , the book by
Date named "Database in Depth", which is one of the best references
on database design I have ever seen. Everyone who works with
databases should read it. It's not dry and has practical stuff you
can apply right now. I am.

* The native language of Rosetta is presently called "Intermediate
Relational Language" ("IRL", pronounced "earl", or "girl" without the
"g"); it is inspired by Pugs' "PIL", which serves a similar purpose
for Perl 6 as what IRL does for Tutorial D and SQL and other
languages.

* IRL is strongly typed, where every value and container is of a
single type, and permits user data type definitions to be arbitrarily
complex (such as temporal and spatial data) but non-recursive. Aside
from forbidding "references", it includes the features of so-called
"object-relational" databases which are actually part of the true
plain "relational" data model. Values of each distinct data type can
not be substituted as operator arguments for others, or stored in
containers for others, but they can be explicitly cross-converted in
some circumstances (eg num to str or str to num).

* Despite actually being strongly typed, IRL has facilities to
simulate weak data types over strong ones; for example, you can
define an SV type that has numerical and character string components.
More broadly speaking, you can define multi-part "disjunctive" types,
each of a different other type, where only one member has a
significant value at once, and the others have their type's concept
of an "empty" value; actually, these have a single extra member that
says which of the others holds the significant value.

* IRL natively uses 2-valued-logic (2VL) like Tutorial D, and not
3-valued-logic (3VL) like SQL, so every boolean valued expression
always evaluates to true or false, not true or false or unknown (a
SQL NULL). But it does simulate 3-valued-logic using disjunctive
data types, one of whose members is the system defined "Unknown"
strong data type, which can only ever hold the same single value; by
definition, a disjunctive data type value whose member A is the
significant one will never match with another whose significant
member is B, and hence we can distinguish between "Unknown" and zero
or the empty string when a number or string can't actually be set to
Unknown (null).

* IRL has distinct data types for what are commonly referred to as
"relations" (like a SQL table with a key, which may be over all of
its columns) and "bags" (like a SQL table that lacks a key), where
the former forbids duplicates and the latter allows them. Given
Rosetta's strong typing, a relation and a bag cannot be substituted
for each other (except that they can be cross-converted, as numbers
and character strings can be cross-converted), but rather have their
own operators which either never output or can output duplicates
respectively. A bag can be implemented over a relation where the
relation has one extra attribute which stores a count of occurrences
for the otherwise distinct combination of other attributes, and
operators do the right thing with that count.

* There is no inherent order of the attributes/columns of
relations/bags/tables, and there is no inherent order to their
tuples/rows, unlike SQL where at least the order of columns is
significant. IRL does all references by names rather than by
position; all operator parameters are named, as are relation
attributes.

* Besides relations and bags, IRL has a distinct array data type,
which is what you get when using an order-by; usually it only makes
sense to use this as the last step in a query when fetching data, if
the order is important.

* All typical joins between relations/bags/tables are natural joins,
where attributes/columns of each joined item implicitly correspond
and match when they have the same names and data types (and if none
match, you have a Cartesian product). You never specify join conditions
explicitly by using "foo = bar" or any such thing; rather, if you
want to match on dissimilar names, you first rename (like SQL's
"AS") one or both source columns. This also means that you can join
an arbitrary number of relations/tables in a single operation, and
they will just work, with the combined output relation/table having
distinct attribute/column names already.

* Instead of saying "select <columns> from <table> where <condition>",
you nest arbitrary relational algebra expressions like
"project( restrict( <table>, <condition> ), <columns> )" or
"restrict( project( <table>, <columns> ), <condition> )"; both
of those latter 2 happen to give the exact same result.

* The finer grained IRL should be easier to write non-trivial queries
in than SQL, especially when adding things like groups and havings
and such, since you can more reliably know what pieces you have to
work with, and exactly what will happen when you say certain things,
and you don't have to needlessly duplicate expressions. Writing
queries in IRL should be more reliable than SQL since you don't have
to worry about getting different results from 2 logically identical
queries and you don't have to deal with ambiguous syntax.

* IRL should also be a lot easier to optimize for speed given the
lack of ambiguity that plagues attempts to optimize SQL.

* Rosetta is designed to be very componentized, where you can
substitute back-ends and front-ends at will, so it can work over both
SQL based and non-SQL based database engines, and its user interface
can resemble anything you want. It is also reasonably easy to map
SQL to IRL and back, so you can still query Rosetta databases using
various SQL dialects or other languages if you don't want to see the
IRL, and this can help with migrating older applications.

* It is likely to be the ideal case for most Rosetta users to have an
alternate front end, such as some adapted from current DBI wrappers,
object persistence or relational mapping tools, and so on, rather
than using IRL directly. Using Rosetta rather than DBI should make
the tasks of people making such wrappers and tools easier, since they
have a more reliable language to work against and they don't have to
maintain a multiplicity of back ends for each storage engine; Rosetta
does the latter for them.

* A typical Rosetta back-end that operates over an existing database
engine will take care of optimizing the queries for the native
database so they perform best. When using Rosetta, you just say
*what* you want to happen, not so much how, and Rosetta will take
care of getting it done quickly and correctly.

* A self contained back-end named Rosetta::Engine::Native implements
a relational database in Perl, so you can have that functionality
without straying outside Perl if you want. Of course,
Rosetta::Engine::Native is only meant to be a correct example, not
fast, so it should only be used for testing. Other back-ends can be
used for production.

* Genezzo is an already existing fast third party database,
implemented in Perl, which will be adapted to use Rosetta as its
interface, so you do have a Perl option besides the for-testing-only
Native.

* The license of Rosetta has changed, such that my GPL exception
granted to allow linked code to retain its own license has changed;
it is no longer based on technicalities like how the linking is done,
but rather on what kind of license the linked code has. This should
make things a lot easier for developers of all stripes.

* See the Changes file distributed with 'Rosetta' for more details on some aspects.
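
As a rough illustration of the bullets above on name-based attributes,
natural joins, and nested project/restrict expressions, here is a
minimal Python sketch. It is not IRL and not Rosetta's API; the
function names (relation, restrict, project, rename, natural_join) and
the sample data are assumptions made up for this example, and it
matches on attribute names and values only, ignoring declared data
types.

```python
# Hypothetical sketch, not Rosetta code. A relation is a frozenset of
# tuples, and each tuple is a frozenset of (attribute_name, value)
# pairs, so neither attributes nor tuples have any order and
# duplicate tuples cannot occur.

def relation(*rows):
    """Build a relation from dicts keyed by attribute name."""
    return frozenset(frozenset(row.items()) for row in rows)

def restrict(rel, predicate):
    """Keep only the tuples for which predicate(tuple_as_dict) is true."""
    return frozenset(t for t in rel if predicate(dict(t)))

def project(rel, attrs):
    """Keep only the named attributes; duplicates collapse automatically."""
    keep = set(attrs)
    return frozenset(
        frozenset((k, v) for k, v in t if k in keep) for t in rel
    )

def rename(rel, mapping):
    """Rename attributes, e.g. so two relations share a name to join on."""
    return frozenset(
        frozenset((mapping.get(k, k), v) for k, v in t) for t in rel
    )

def natural_join(*rels):
    """Join any number of relations on their same-named attributes.

    With no attribute names in common this yields a Cartesian product.
    """
    result = frozenset([frozenset()])  # join identity: one empty tuple
    for rel in rels:
        joined = set()
        for left in result:
            left_row = dict(left)
            for right in rel:
                right_row = dict(right)
                common = left_row.keys() & right_row.keys()
                if all(left_row[k] == right_row[k] for k in common):
                    joined.add(frozenset({**left_row, **right_row}.items()))
        result = frozenset(joined)
    return result

# Made-up sample data.
people = relation(
    {"name": "Ann", "city_id": 1},
    {"name": "Bob", "city_id": 2},
)
cities = relation(
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
)

# No "foo = bar" join condition: rename first, then join by name.
joined = natural_join(people, rename(cities, {"id": "city_id"}))

# The two nestings from the text, applied to the same input.
a = project(restrict(joined, lambda t: t["city"] == "Oslo"), ["name", "city"])
b = restrict(project(joined, ["name", "city"]), lambda t: t["city"] == "Oslo")
assert a == b
```

In this sketch the two nestings agree because the restriction only
looks at an attribute that survives the projection.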

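The bullets above on disjunctive types and on simulating 3-valued
logic over 2-valued logic can be pictured with a small Python sketch.
This is only an illustration under assumptions: the class names
(Unknown, MaybeNum), the field names, and the helper known() are
hypothetical and do not come from Rosetta or IRL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unknown:
    """Hypothetical stand-in for the system-defined "Unknown" type.

    It has no fields, so every instance is the same single value.
    """

@dataclass(frozen=True)
class MaybeNum:
    """Hypothetical disjunctive type: either a number or Unknown.

    `which` is the extra member naming the significant one; the other
    member keeps its type's "empty" value (0 for num).
    """
    which: str                      # "num" or "unknown"
    num: int = 0
    unknown: Unknown = Unknown()

    def __post_init__(self):
        if self.which not in ("num", "unknown"):
            raise ValueError("which must be 'num' or 'unknown'")
        if self.which == "unknown" and self.num != 0:
            raise ValueError("non-significant member must be empty")

def known(n):
    """Wrap a plain number as a MaybeNum whose significant member is num."""
    return MaybeNum("num", n)

UNKNOWN_NUM = MaybeNum("unknown")

# Every comparison yields a plain True or False (2-valued logic).
zero = known(0)
assert (zero == UNKNOWN_NUM) is False      # zero is not Unknown
assert (UNKNOWN_NUM == MaybeNum("unknown")) is True
assert (known(3) == known(3)) is True
```

Because the `which` member takes part in equality, a value whose
significant member is num never matches one whose significant member
is unknown, even when the num member holds its empty value of zero.
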
------------

Note that the current Rosetta framework on CPAN is mostly
documentation (incomplete and partly out of date), and has little in
the way of executable code right now.

I recommend looking, in particular, at the pod in these files:
Rosetta.pm, Model.pm, Language.pod, Overview.pod, TODO.pod.

Over the next month or so, hopefully coinciding with the Pugs 6.28.0
release (that is refactored over the new PIL2 and Perl 6 object
model), I should have more code such that you can actually start
playing with Rosetta in your code.

I welcome any kind of assistance that you can provide with Rosetta,
and I hope that it will have a huge positive impact on the community.
Really, assistance would be appreciated.

Thank you and have a good day. -- Darren Duncan