DBI to BerkeleyDB?

on 24.08.2006 20:09:12 by mankyuhan2001

Hi again.
I benchmarked the BerkeleyDB module (Hash, random selects) and got around 10,000 lookups/sec.
I also ran a similar benchmark using DBD::DBM (with BerkeleyDB Hash support), but this time the number was much lower (less than 200/sec).

I moved "prepare" outside the loop, which made DBD::DBM (BerkeleyDB) almost twice as fast (100 -> 200/sec), but compared to BerkeleyDB it is still much slower.
Since DBD::DBM is now using BerkeleyDB underneath, shouldn't it perform as well as BerkeleyDB (or at least not as badly as what I got)?

Here is the source code I used.
Thanks a lot!

####################################################
# BerkeleyDB
####################################################
use strict;
use warnings;
use BerkeleyDB;
use mkUtil;    # the poster's own helper module (not shown)

my $Env = BerkeleyDB::Env->new(
    -Home       => '/home/mhan/DB_bench/berkeley/db',
    -Flags      => DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_LOG
                 | DB_INIT_TXN | DB_CREATE | DB_RECOVER,
    -LockDetect => DB_LOCK_YOUNGEST,
    -ErrFile    => '/home/mhan/DB_bench/berkeley/db/berkeley_error.log',
) or die "Failed to open environment: $! $BerkeleyDB::Error\n";

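my $time = 10;    # benchmark duration in seconds
my (%berkeleyInt, %berkeleyChar);

# Tie the hash to the on-disk Berkeley DB hash file inside $Env
tie %berkeleyInt, 'BerkeleyDB::Hash',
    -Env      => $Env,
    -Filename => "berkeleyHashInt.db",
    -Flags    => DB_CREATE
    or die "Cannot open 'berkeleyHashInt.db': $! $BerkeleyDB::Error\n";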

my $count     = 0;
my $startTime = mkUtil::printTime("BerkeleyDB::Hash:: Integer Select Start");
my $endTime   = mkUtil::getTime();
while ($endTime - $startTime < $time) {
    my %r    = mkUtil::strRand();       # random key from the helper module
    my $rInt = $r{'integer'};
    my $temp = $berkeleyInt{$rInt};     # the tied-hash lookup being measured
    # print "$rInt - $berkeleyInt{$rInt}\n";
    $endTime = mkUtil::getTime();
    $count++;
}
$endTime = mkUtil::printTime("BerkeleyDB::Hash:: Integer Select End");
print "Count: $count\n";
mkUtil::printAVG($count, $time, "BerkeleyDB::Hash:: Integer Select AVG (sec)");
####################################################
# DBD::DBM - BerkeleyDB support
####################################################
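use strict;
use warnings;
use DBI;
use mkUtil;
use BerkeleyDB;    # needed here only for the DB_* flag constants

# Setting RaiseError at connect time means a failed connect dies too
my $dbh = DBI->connect('dbi:DBM:type=BerkeleyDB', undef, undef,
                       { RaiseError => 1 });

$dbh->{dbm_berkeley_flags} = {
    'DB_CREATE' => DB_CREATE,    # pass in constants
    'DB_RDONLY' => DB_RDONLY,    # pass in constants
};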

my $time = 10;

my $count     = 0;
my $startTime = mkUtil::printTime("DBD::DBM:: Integer Select Start");
my $endTime   = mkUtil::getTime();
# Prepared once, outside the loop; each execute() only binds a new id
my $sth = $dbh->prepare("SELECT * FROM dbmBerkeleyInt WHERE id = ?");
while ($endTime - $startTime < $time) {
    my %r    = mkUtil::strRand();
    my $rInt = $r{'integer'};
    $sth->execute($rInt);
    while (my $row = $sth->fetch) {
        # print "@$row\n";
        # sleep(1);
    }
    $endTime = mkUtil::getTime();
    $count++;
}
$endTime = mkUtil::printTime("DBD::DBM:: Integer Select End");
print "COUNT: $count\n";
mkUtil::printAVG($count, $time, "DBD::DBM:: Integer Select AVG (sec)");
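One caveat about both scripts: every iteration also calls mkUtil::strRand() and mkUtil::getTime(), so whatever those helpers cost is counted as lookup time. The helpers aren't shown, so how much that matters here is unknown. Below is a minimal sketch of a fixed-iteration harness in core Perl (Time::HiRes) that builds the random keys before the clock starts. The sub name lookups_per_sec and the in-memory %data hash are hypothetical stand-ins for the tied BerkeleyDB hash or the DBD::DBM statement handle.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Call $lookup once per key in @$keys and return calls per second.
# The keys are built before the clock starts, so generating them is
# not counted; only the loop over the lookups is timed.
sub lookups_per_sec {
    my ($lookup, $keys) = @_;
    my $start = time();
    $lookup->($_) for @$keys;
    my $elapsed = time() - $start;
    return $elapsed > 0 ? @$keys / $elapsed : 0;
}

# Hypothetical stand-in data: an in-memory hash with 100,000 integer keys.
my %data = map { $_ => "value$_" } 1 .. 100_000;

# Pre-generate 200,000 random keys in the range 1 .. 100_000.
my @keys = map { 1 + int rand 100_000 } 1 .. 200_000;

my $rate = lookups_per_sec(sub { my $v = $data{ $_[0] } }, \@keys);
printf "%.0f lookups/sec\n", $rate;
```

To time the tied hash from the first script, the code ref could be sub { my $v = $berkeleyInt{ $_[0] } }; for DBD::DBM, something like sub { $sth->execute($_[0]); 1 while $sth->fetch }.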

RE: DBI to BerkeleyDB?

on 25.08.2006 15:14:32 by Philip.Garrett

ManKyu Han wrote:
> Hi again.
> I benchmarked the BerkeleyDB module (Hash, random selects) and got
> around 10,000 lookups/sec.
> I also ran a similar benchmark using DBD::DBM (with BerkeleyDB Hash
> support), but this time the number was much lower (less than
> 200/sec).
>
> I moved "prepare" outside the loop, which made DBD::DBM (BerkeleyDB)
> almost twice as fast (100 -> 200/sec), but compared to BerkeleyDB it
> is still much slower. Since DBD::DBM is now using BerkeleyDB
> underneath, shouldn't it perform as well as BerkeleyDB (or at least
> not as badly as what I got)?

You're comparing the speed of a very low-level dbm file interface to the
speed of a high-level RDBMS interface that is implemented in pure Perl.
With BerkeleyDB there is almost no overhead -- you're essentially
calling the C library directly. With DBD::DBM, though, you're going
through several layers of abstraction -- at least DBI, a pure-Perl DBD,
and a pure-Perl SQL engine.
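To make the layering point concrete, here is a toy comparison in core Perl. toy_sql_lookup is hypothetical and is not how DBD::DBM or its SQL engine actually work; it only shows that parsing a statement, binding a placeholder and building a row on every call adds cost on top of the very same hash fetch that direct_lookup does.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %table = map { $_ => "row$_" } 1 .. 1000;

# Direct path: a single hash fetch.
sub direct_lookup {
    my ($id) = @_;
    return $table{$id};
}

# Hypothetical toy "SQL" path (NOT DBD::DBM's real code): parse the
# statement text, bind the placeholder, do the same hash fetch, then
# wrap the result in a row array.
sub toy_sql_lookup {
    my ($sql, @bind) = @_;
    my ($cols, $tbl, $col) = $sql =~
        /^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*\?\s*$/i
        or die "cannot parse: $sql";
    my $value = $table{ $bind[0] };
    return defined $value ? [ $bind[0], $value ] : undef;
}

my $sql = 'SELECT * FROM dbmBerkeleyInt WHERE id = ?';
my $n   = 100_000;

my $t0 = time();
direct_lookup(1 + $_ % 1000) for 1 .. $n;
my $t1 = time();
toy_sql_lookup($sql, 1 + $_ % 1000) for 1 .. $n;
my $t2 = time();

printf "direct: %.3fs  toy SQL: %.3fs\n", $t1 - $t0, $t2 - $t1;
```

The toy path typically comes out several times slower even though both do the same fetch; the exact ratio depends on the machine and Perl build, so no particular numbers are claimed here.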

I don't have much experience with DBD::DBM, but this speed difference
doesn't really seem unreasonable.

If you have the option to use BerkeleyDB (specifically, if you only want
to store an index of name-value pairs), then you should probably use
that directly. It's one of the fastest (if not THE fastest) ways to
persist a hash. If you might need multi-column support in the future
and you need an in-process database, try DBD::SQLite. It's relatively
robust and is implemented in C.
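A minimal sketch of persisting a hash through a dbm library's tie interface. It uses SDBM_File, which ships with Perl, as a stand-in so the example runs without BerkeleyDB installed; with the BerkeleyDB module the tie call would look like the one in the original post. File::Temp is used only so the demo cleans up after itself.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo";    # SDBM creates demo.pag and demo.dir

# First "run": create the file and store some name-value pairs.
{
    tie my %h, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $file: $!";
    $h{$_} = "value$_" for 1 .. 10;
    untie %h;              # flush and close
}

# Second "run": reopen the same file and read the data back.
tie my %again, 'SDBM_File', $file, O_RDWR, 0666
    or die "Cannot reopen $file: $!";
my $got   = $again{7};
my $count = keys %again;
untie %again;

print "7 => $got ($count keys on disk)\n";
```

Note that SDBM_File limits each key/value pair to about 1 KB, so it only suits small values; BerkeleyDB allows much larger records.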

Regards,
Philip

Re: DBI to BerkeleyDB?

on 25.08.2006 18:57:44 by jeff

ManKyu Han wrote:
> I moved "prepare" outside the loop, which made DBD::DBM (BerkeleyDB) almost twice as fast (100 -> 200/sec), but compared to BerkeleyDB it is still much slower.
> Since DBD::DBM is now using BerkeleyDB underneath, shouldn't it perform as well as BerkeleyDB (or at least not as badly as what I got)?

If all you need is hash lookups on a single key and your main
requirement is absolute speed, then stick with straight BerkeleyDB; it
is guaranteed to be faster since it doesn't have to provide you with a
SQL (or any) querying interface. If you need the portability and
maintainability of a SQL interface, then use DBI. If you need SQL and
absolute speed, then use DBI with SQLite or PostgreSQL or MySQL or a
proprietary RDBMS. If you need SQL and want to build quick prototypes
of databases, or handle smaller datasets where absolute speed isn't your
most important criterion, or you need a SQL interface to pre-existing
BerkeleyDB data, then use DBD::DBM.

Yes, 0.005 seconds per query is slow compared to other ways to access
the data, but in many situations (e.g. a CGI environment), that amount
of time will not even be perceptible compared to other things that cost
time.
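The 0.005-second figure is simply the reciprocal of roughly 200 lookups/sec. The same arithmetic applied to both rates reported in the original post, as a small Perl sketch:

```perl
use strict;
use warnings;

# Approximate throughputs from the original post (lookups per second).
my %rate = ('BerkeleyDB::Hash' => 10_000, 'DBD::DBM' => 200);

# Per-lookup latency in milliseconds is 1000 / throughput.
my %ms = map { $_ => 1000 / $rate{$_} } keys %rate;

printf "%-17s %6.1f ms per lookup\n", $_, $ms{$_} for sort keys %ms;
printf "slowdown: %.0fx\n", $rate{'BerkeleyDB::Hash'} / $rate{'DBD::DBM'};
```

This prints 0.1 ms per lookup for BerkeleyDB::Hash and 5.0 ms for DBD::DBM, a 50x difference in relative terms, but still only a few milliseconds per query in absolute terms.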

--
Jeff
(author of DBD::DBM)