Best approach for DB-based event logging?
Best approach for DB-based event logging?
am 01.07.2009 19:57:20 von Marcus Bointon
I need to log fairly large numbers of historical events relating to
mailing list activity on a per-recipient basis, a kind of audit trail
if you like. So for a given user, I might log the fact that they
subscribed to a list, that they were uploaded by someone, that they
were sent a message, that they unsubscribed etc. For the most part
this is write-only and is only ever read very rarely, but when I do,
it will be to retrieve the details of a single user, and all I need is
the whole history, not individual events. At present I'm logging by
appending to a text blob field for each recipient. This works ok, but
the append gets progressively slower over time as the append speed is
dependent on the length of the existing field, which I think makes for
an O(n^2) complexity overall, which is obviously not good. A typical
choke point is after a list upload when I need to batch-update records
for everyone on the list, which may be half a million or more records.
I'm not bothered about current live data as that's not such a problem
- this is just about the historical data.
I'm looking for a good alternative to this. So far I've looked at
archive tables for logging on a per-event basis (can't do per-user as
it doesn't do updates), however, with a large number of users and
events, this would rapidly grow by several million records per week,
and I suspect searches would become unusably slow (no index in that
storage engine). I thought of using memcachedb instead, however,
unless I threw large numbers of servers at it, I would simply be
moving my bottleneck to whatever back-end memcachedb uses (sqlite?
bdb?) which is probably less efficient than MySQL anyway. Partitioning
isn't a great solution as that's mainly for improving select
performance. Keeping a file on disk for each user might work, but I
suspect that would become impractical as I have millions of users, and
files on disk is just a kind of inefficient home-brew database.
So, any other ideas?
Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK resellers of info@hand CRM solutions
marcus@synchromedia.co.uk | http://www.synchromedia.co.uk/
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: Best approach for DB-based event logging?
am 03.07.2009 10:42:52 von Johan De Meersman
--0016e6dbe8895c8686046dc92403
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
To be honest, this sounds like more of a filesystem thing, given that you
only ever need to select the full set of an individual user. Just build up
an FS structure with one file per user.
On Wed, Jul 1, 2009 at 7:57 PM, Marcus Bointon wrote:
> I need to log fairly large numbers of historical events relating to mailing
> list activity on a per-recipient basis, a kind of audit trail if you like.
> So for a given user, I might log the fact that they subscribed to a list,
> that they were uploaded by someone, that they were sent a message, that they
> unsubscribed etc. For the most part this is write-only and is only ever read
> very rarely, but when I do, it will be to retrieve the details of a single
> user, and all I need is the whole history, not individual events. At present
> I'm logging by appending to a text blob field for each recipient. This works
> ok, but the append gets progressively slower over time as the append speed
> is dependent on the length of the existing field, which I think makes for an
> O(n^2) complexity overall, which is obviously not good. A typical choke
> point is after a list upload when I need to batch-update records for
> everyone on the list, which may be half a million or more records.
>
> I'm not bothered about current live data as that's not such a problem -
> this is just about the historical data.
>
> I'm looking for a good alternative to this. So far I've looked at archive
> tables for logging on a per-event basis (can't do per-user as it doesn't do
> updates), however, with a large number of users and events, this would
> rapidly grow by several million records per week, and I suspect searches
> would become unusably slow (no index in that storage engine). I thought of
> using memcachedb instead, however, unless I threw large numbers of servers
> at it, I would simply be moving my bottleneck to whatever back-end
> memcachedb uses (sqlite? bdb?) which is probably less efficient than MySQL
> anyway. Partitioning isn't a great solution as that's mainly for improving
> select performance. Keeping a file on disk for each user might work, but I
> suspect that would become impractical as I have millions of users, and files
> on disk is just a kind of inefficient home-brew database.
>
> So, any other ideas?
>
> Marcus
> --
> Marcus Bointon
> Synchromedia Limited: Creators of http://www.smartmessages.net/
> UK resellers of info@hand CRM solutions
> marcus@synchromedia.co.uk | http://www.synchromedia.co.uk/
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=vegivamp@tuxera.be
>
>
--
Celsius is based on water temperature.
Fahrenheit is based on alcohol temperature.
Ergo, Fahrenheit is better than Celsius. QED.
--0016e6dbe8895c8686046dc92403--
Re: Best approach for DB-based event logging?
am 03.07.2009 12:09:16 von Marcus Bointon
On 3 Jul 2009, at 09:42, Johan De Meersman wrote:
> To be honest, this sounds like more of a filesystem thing, given
> that you only ever need to select the full set of an individual
> user. Just build up an FS structure with one file per user.
You really think so? Even though I'll need to initially create about
64k folders (keying off a user-id related hash) in order to keep files-
per-dir down to a sensible amount? Its probably about now that I start
wanting a reiserFS partition...
Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK resellers of info@hand CRM solutions
marcus@synchromedia.co.uk | http://www.synchromedia.co.uk/
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: Best approach for DB-based event logging?
am 03.07.2009 12:33:02 von nigel wood
Marcus Bointon wrote:
"For the most part this is write-only and is only ever read very rarely,
but when I do, it will be to retrieve the details of a single user, and
all I need is the whole history, not individual events."
For your stated requirements the filesystem is probably most efficient.
It does sound odd from a data retention/archiving and data retrieval
times point of view though. Will the earliest log data be held on line
and constantly available literally forever? Will you still want to pull
back the whole list when a user's history hits 50mb? Would this data
pass over a network?
Personally I'd use the database and I really wouldn't worry about the
number of rows. Properly normalised hundreds millions of rows aren't a
problem until you cant hold indexes in memory, them its time to shard.
It looks like the logical shard point for http://www.smartmessages.net/
is customer.
I'd use bulk inserts where approprate in applications adding lots of
events and I wouldn't store the text I didn't need. In log messages most
of the text is the same. I'd store just the needed to generate the
message and perhaps the templated text of the log message itsself.
I'd use mutiple 'events' tables and tie them together with temporary
tables and SQL or if efficient enough a view. The first table is written
to and probably innodb or something else with row level locking and
foreign keys. The other tables would be 1:N MyISAM compressed read only
tables (which do support indexes) covered by a merge table. The copy
from the innodb to new MyISAM tables would be automated as would the
modification of the merge table. At some point old events could be
removed by removing the table from the merge table and simply dropping it.
Here;s a rough table stucture. The indexes in events tables would be
TargetId. But problably TargetId+EventDate probably eventId+event date
as you found more uses/added paging.
User/Actor -------> CurrentEvents Innodb<------- EventType
TargetID EventId EventTypeId
Username TargetId LogMesssageText
(optional)
TargetEventType
EventData
EventDate
user/actor-------> ArchivedEventsNNN (MyISAM compressed) <---EventType
EventId
TargetId
TargetEventType
EventData
EventDate
User/Actor -------> MergedHistoricalEvents <--- EventType
***Merge of ArchivedEventsN to M
For maximum speed or If you need to use the log messages in numerous
languages add a tiny template parser to replace EventData into the
logMessageText at high speed using a native function . lots of
application level things can build the messages from string + data of
course.
http://dev.mysql.com/doc/refman/5.1/en/adding-native-functio n.html
Just my brainstorm and untried so I'd appreciate other folks thoughts on
the suggestion but it may be of some use.
Nigel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: Best approach for DB-based event logging?
am 03.07.2009 12:33:38 von Johan De Meersman
--00504502d07c7ec06e046dcab069
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
On Fri, Jul 3, 2009 at 12:09 PM, Marcus Bointon
wrote:
> On 3 Jul 2009, at 09:42, Johan De Meersman wrote:
>
> To be honest, this sounds like more of a filesystem thing, given that you
>> only ever need to select the full set of an individual user. Just build up
>> an FS structure with one file per user.
>>
>
>
> You really think so? Even though I'll need to initially create about 64k
> folders (keying off a user-id related hash) in order to keep files-per-dir
> down to a sensible amount? Its probably about now that I start wanting a
> reiserFS partition...
Given that you mostly do append-only, and when you read, you don't need
complex subsets, but only the full set for a given user, yes. No need to
bother with the overhead of a database if you're not going to be using any
of the benefits :-)
Adding data won't be slower in a file than in a database - it may even be
faster, as there's no query parsing and data validation.
Reading a single file vs filtering a huge table to get all the records for a
single user, on the other hand...
--
Celsius is based on water temperature.
Fahrenheit is based on alcohol temperature.
Ergo, Fahrenheit is better than Celsius. QED.
--00504502d07c7ec06e046dcab069--
Re: Best approach for DB-based event logging?
am 03.07.2009 12:39:02 von nigel wood
nigel wood wrote:
>
> Here's a rough table stucture. The indexes in events tables would be
> TargetId. But problably TargetId+EventDate probably eventId+event date
> as you found more uses/added paging.
>
Well that didn't format very well :-( The tables structures are:
User/Actor
=======
TargetId
Username
EventType
======
EventTypeId
LogMesssageText (optional)
CurrentEvents,ArchivedEventsNN,MergedEvents
=============================
EventId
TargetId
EventTypeId
EventData
EventDate
HTH
Nigel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
RE: Best approach for DB-based event logging?
am 03.07.2009 16:18:00 von Martin Gainty
--_b31a7e56-dc23-4d9a-a031-81090aea635e_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
Nigel/Marcus/Johann
mg>hopefully quick answer
>=20
> nigel wood wrote:
> >
> > Here's a rough table stucture. The indexes in events tables would be =
=20
> > TargetId.
mg>there would be need to be a 1:1 correspondence between
mg>TargetId index and Username..all of your other tables would need to know
mg>the TargetId which would have to be somehow lookedup
mg>best to stick with a non-unique index on Username which everyone can div=
ine from firstName and lastName
But problably TargetId+EventDate probably eventId+event date=20
mg>i would suggest switching this to a comibination og Username + EventDate
> > as you found more uses/added paging.
> >
> Well that didn't format very well :-( The tables structures are:
>=20
> User/Actor
> =======3D
> TargetId
> Username
mg>pretty light structure ..what about contact and or demographic informati=
on for the user?
>
mg>this table maybe unnecessary as you can store the text directly into the=
table=20
> EventType
> ======
> EventTypeId
> LogMesssageText (optional)
>=20
> CurrentEvents=2CArchivedEventsNN=2CMergedEvents
> ==================== =====
=====3D
> EventId
> TargetId
mg>i would substitute in Username for TargetId
> EventTypeId
mg>diskspace is cheap and the ability to store a long message text is cheap
mg>usually handled by appserver thru 'locale specific' resource properties
> EventData
> EventDate
>=20
>=20
> HTH
>=20
> Nigel
>=20
> --=20
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dmgainty@hotmail.c=
om
>=20
mg>bedankt=2C
mg>Martin
____________________________________________________________ _____
Windows Live=99 SkyDrive=99: Get 25 GB of free online storage.
http://windowslive.com/online/skydrive?ocid=3DTXT_TAGLM_WL_S D_25GB_062009=
--_b31a7e56-dc23-4d9a-a031-81090aea635e_--