Database suggestion

am 05.12.2007 12:13:59 von Mikhail Kovalev

Hi.

I work with recursive array trees of 10-20 levels in PHP. So far I
have been using serialize() to store the arrays, generating files 5+
MB large, which take 10-15 seconds to unserialize and about tenfold of
RAM! I've been adviced to use MySQL (relational db's), but are there
any other options beside that and var_export/include? Something that
works in a similar way as MySQL when adding a new element without
loading the whole database itself...

Thanks!

Re: Database suggestion

am 05.12.2007 12:42:15 von Toby A Inkster

Mikhail Kovalev wrote:

> I've been adviced to use MySQL (relational db's), but are there any
> other options beside that and var_export/include? Something that works
> in a similar way as MySQL when adding a new element without loading the
> whole database itself...

SQLite, PostgreSQL, Oracle, Microsoft SQL Server, ...

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 18:31.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Re: Database suggestion

am 05.12.2007 13:19:58 von Mikhail Kovalev

I mean, I don't mind using MySQL in itself, it's just that I don't
want use relational db's.
I don't know if it is possible to predefine the structures I'm using
in MySQL, because the number of levels varies etc.
Here is an example recording a single sequence of 6 elements:

How do I put this in a db?

array (
0 =>
array (
0 => 2,
1 =>
array (
1 =>
array (
0 => 1,
1 =>
array (
),
),
2 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
4 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
1 =>
array (
0 => 1,
1 =>
array (
),
),
),
),
),
),
),
),
),
),
),
),
1 =>
array (
0 => 2,
1 =>
array (
),
),
2 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
4 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
1 =>
array (
0 => 1,
1 =>
array (
),
),
),
),
),
),
),
),
),
),
3 =>
array (
0 => 2,
1 =>
array (
4 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
1 =>
array (
0 => 1,
1 =>
array (
),
),
),
),
),
),
1 =>
array (
0 => 1,
1 =>
array (
),
),
),
),
4 =>
array (
0 => 1,
1 =>
array (
3 =>
array (
0 => 1,
1 =>
array (
1 =>
array (
0 => 1,
1 =>
array (
),
),
),
),
),
),
)

On 5 Des, 12:42, Toby A Inkster
wrote:
> Mikhail Kovalev wrote:
> > I've been adviced to use MySQL (relational db's), but are there any
> > other options beside that and var_export/include? Something that works
> > in a similar way as MySQL when adding a new element without loading the
> > whole database itself...
>
> SQLite, PostgreSQL, Oracle, Microsoft SQL Server, ...
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 18:31.]
>
> Sharing Music with Apple iTunes
> http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Re: Database suggestion

am 05.12.2007 13:28:15 von Jerry Stuckle

Mikhail Kovalev wrote:
> Hi.
>
> I work with recursive array trees of 10-20 levels in PHP. So far I
> have been using serialize() to store the arrays, generating files 5+
> MB large, which take 10-15 seconds to unserialize and about tenfold of
> RAM! I've been adviced to use MySQL (relational db's), but are there
> any other options beside that and var_export/include? Something that
> works in a similar way as MySQL when adding a new element without
> loading the whole database itself...
>
> Thanks!
>

What do you mean by "without loading the whole database itself"?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Database suggestion

am 05.12.2007 13:45:18 von Mikhail Kovalev

I unserialize an 8MB file, which contains the db array, and put the
new element there (then serialize the whole thing again and store)

On 5 Des, 13:28, Jerry Stuckle wrote:
> Mikhail Kovalev wrote:
> > Hi.
>
> > I work with recursive array trees of 10-20 levels in PHP. So far I
> > have been using serialize() to store the arrays, generating files 5+
> > MB large, which take 10-15 seconds to unserialize and about tenfold of
> > RAM! I've been adviced to use MySQL (relational db's), but are there
> > any other options beside that and var_export/include? Something that
> > works in a similar way as MySQL when adding a new element without
> > loading the whole database itself...
>
> > Thanks!
>
> What do you mean by "without loading the whole database itself"?
>
> --
> ==================
> Remove the "x" from my email address
> Jerry Stuckle
> JDS Computer Training Corp.
> jstuck...@attglobal.net
> ==================

Re: Database suggestion

am 05.12.2007 14:02:14 von Courtney

Mikhail Kovalev wrote:
> I unserialize an 8MB file, which contains the db array, and put the
> new element there (then serialize the whole thing again and store)
>

I am far from understanding what you are actually doing, but it sounds
like you are willing to trade disk usage for speed and memory usage.

Mysql - and any dB program - allows you to insert records randomly.

It maintains its own internal indices..so sorting and ordering is never
done at the raw data level, but by a data abstraction when you remove
all or part of the data for use, or alteration.

To walk an N level tree will of course take N accesses to the mysqld,
but I hardly think that would take anything like the 10-20 seconds you
are currently experiencing.

Mysql is, actually, once installed, a very usable and simple interface
to program PHP against.

Any other alternative would likely as not involve you in re-inventing
the database wheel,. and one has to say, why bother?

I would strongly advise biting the bullet and installing MYSQL, and then
coming back to get help on how to implement your curious requirements
against it.

> On 5 Des, 13:28, Jerry Stuckle wrote:
>> Mikhail Kovalev wrote:
>>> Hi.
>>> I work with recursive array trees of 10-20 levels in PHP. So far I
>>> have been using serialize() to store the arrays, generating files 5+
>>> MB large, which take 10-15 seconds to unserialize and about tenfold of
>>> RAM! I've been adviced to use MySQL (relational db's), but are there
>>> any other options beside that and var_export/include? Something that
>>> works in a similar way as MySQL when adding a new element without
>>> loading the whole database itself...
>>> Thanks!
>> What do you mean by "without loading the whole database itself"?
>>
>> --
>> ==================
>> Remove the "x" from my email address
>> Jerry Stuckle
>> JDS Computer Training Corp.
>> jstuck...@attglobal.net
>> ==================
>

Re: Database suggestion

am 05.12.2007 14:11:04 von colin.mckinnon

On 5 Dec, 11:13, Mikhail Kovalev wrote:
> Hi.
>
> I work with recursive array trees of 10-20 levels in PHP. So far I
> have been using serialize() to store the arrays, generating files 5+
> MB large, which take 10-15 seconds to unserialize and about tenfold of
> RAM! I've been adviced to use MySQL (relational db's), but are there
> any other options beside that and var_export/include? Something that
> works in a similar way as MySQL when adding a new element without
> loading the whole database itself...
>
> Thanks!

Any relational DB can model a hierarchical data structure - at the
simplest level:

node_id unique identifier, not null
parent_node null
payload whatever

But getting the threaded structure is a bit tricky -

SELECT parent.node_id as Parent_node_id, parent.payload as
parent_payload,
child.node_id as child_node_id, child.payload as child_payload
FROM mytable parent, mytable child
WHERE child.parent_node=parent.node_id
AND parent.node_id=$start_here

... i.e. by explicitly declaring aliases for each level in the tree you
can get the sub-nodes, but when you don't know how deep the tree is,
its not possible to return the whole structure in a single query using
standard SQL. Oracle allows you to compose queries returning
representations of hierarchical data (have a google for 'connect by' &
'start with') but thats rather non-standard.

Another solution is to run a single query one the database to return
the entire unconnected tree and link it in PHP...
(for simplicity I'm omitting the payload)..

$src=mysql_query('SELECT * FROM my table');
while ($row[++$x]=mysql_fetch_assoc($src)) {
if (!$row[$x]['parent_id']) {
$final[$row[$x][$node_id]]=array();
}
}
add_children($final, $row);

function add_children(&$final, &$row)
{
foreach($final as $node_id=>$contents) {
if (! is_array($final[$node_id]) $final[$node_id]=array();
foreach($row as $node) {
if ($node['parent_id']==$node_id) {
$final[$node_id][$node['node_id']]=$node;
add_children($final[$node_id][$node['node_id']], $row);
}
}
}
}

(YMMV - not tested)
Although this is far from intuitive and has a very high order of
execution (O^NM?).

The best solution is to look at your system and see if you really need
the entire tree in memory, as a tree at run time - it might be simpler
to run a few queries to descend the tree to the relevant node, or if
you want to return all the nodes under a particular parent then
maintain a corresponding data structure.

You might also consider using the filesystem as the database - since
it is intrinsically an indexed tree structure - so....

$payload=file_get_contents(TREE_ROOT . "/tangibles/organic/vegetable/
fruit/apples/coxs_orange_pippins.txt");

C.

Re: Database suggestion

am 05.12.2007 14:23:24 von Mikhail Kovalev

On 5 Des, 14:02, The Natural Philosopher wrote:
> Mikhail Kovalev wrote:
> > I unserialize an 8MB file, which contains the db array, and put the
> > new element there (then serialize the whole thing again and store)
>
> I am far from understanding what you are actually doing, but it sounds
> like you are willing to trade disk usage for speed and memory usage.

Yep, you nailed it right on through, my friend

My question is, will I be able to access and store values by
addresses?
Such as here:
$database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example

Each element contains a COUNT value and an array of childs (NEXT),
which can be non-empty, empty or non-existent (when i check if it
exists)
Looks like this at top level:

Array(
433 => Array(
COUNT => 2121,
NEXT => Array( //
23 => Array( count&next etc ...)
433 => ...
etc...
)
),
1012 => Array( count & next.....)
.....
)

Re: Database suggestion

am 05.12.2007 14:52:56 von Marijn

> My question is, will I be able to access and store values by
> addresses?
> Such as here:
> $database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example
>
> Each element contains a COUNT value and an array of childs (NEXT),
> which can be non-empty, empty or non-existent (when i check if it
> exists)

Maybe you should tell us what it is exactly that you are storing,
cause it seems to me that you are over complicating things. What is it
you are counting and why do you currently group it in multi-level
arrays?

Marijn

Re: Database suggestion

am 05.12.2007 14:56:55 von Mikhail Kovalev

On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
wrote:
> On 5 Dec, 11:13, Mikhail Kovalev wrote:
>
> > Hi.
>
> > I work with recursive array trees of 10-20 levels in PHP. So far I
> > have been using serialize() to store the arrays, generating files 5+
> > MB large, which take 10-15 seconds to unserialize and about tenfold of
> > RAM! I've been adviced to use MySQL (relational db's), but are there
> > any other options beside that and var_export/include? Something that
> > works in a similar way as MySQL when adding a new element without
> > loading the whole database itself...
>
> > Thanks!
>
> Any relational DB can model a hierarchical data structure - at the
> simplest level:
>
> node_id unique identifier, not null
> parent_node null
> payload whatever
>
> But getting the threaded structure is a bit tricky -
>
> SELECT parent.node_id as Parent_node_id, parent.payload as
> parent_payload,
> child.node_id as child_node_id, child.payload as child_payload
> FROM mytable parent, mytable child
> WHERE child.parent_node=parent.node_id
> AND parent.node_id=$start_here
>
> .. i.e. by explicitly declaring aliases for each level in the tree you
> can get the sub-nodes, but when you don't know how deep the tree is,
> its not possible to return the whole structure in a single query using
> standard SQL. Oracle allows you to compose queries returning
> representations of hierarchical data (have a google for 'connect by' &
> 'start with') but thats rather non-standard.

I don't know how deep the tree is, but I don't need to return the
whole structure in a single query. Alas, I don't know the nature of
MySQL

At this moment everything I do with my structure happens by calling
addresses
See my reply to The Natural Philosopher.
Here's an example with one node at top level and it's childs:

112 (10)
225 (3)
930 (1)
11 (5)

$array[112][COUNT] = 10;
$array[112][NEXT][225][COUNT] = 3;
$array[112][NEXT][225][NEXT][930][COUNT] = 1;
$array[112][NEXT][225][NEXT][930][NEXT] = array();
$array[112][NEXT][11][COUNT] = 5;
$array[112][NEXT][11][NEXT] = array();

An address is an array, f ex array(112, 225, 930)

I then use the following mechanism:

// retrieves a node with its two values as an array(COUNT, NEXT)
// NEXT contain further childs, but i dont use them
// array_1 is the whole structure, and array_2 is the address
function function_7($array_1, $array_2) { return eval('return
$array_1'.function_8($array_2).';'); }

// array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
function function_8($array_1) { return implode('[NEXT]',
array_map('function_9', $array_1)); }

function function_9($integer_1) { return '['.$integer_1.']'; }

// saves a value array_3 to an address array_2 in structure array_1
function function_10($array_1, $array_2, $array_3)
{
eval('$array_1'.function_8($array_2).' = $array_3;');
return $array_1;
}

Re: Database suggestion

am 05.12.2007 15:03:07 von Mikhail Kovalev

On 5 Des, 14:52, Marijn wrote:
> > My question is, will I be able to access and store values by
> > addresses?
> > Such as here:
> > $database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example
>
> > Each element contains a COUNT value and an array of childs (NEXT),
> > which can be non-empty, empty or non-existent (when i check if it
> > exists)
>
> Maybe you should tell us what it is exactly that you are storing,
> cause it seems to me that you are over complicating things. What is it
> you are counting and why do you currently group it in multi-level
> arrays?
>
> Marijn

I'm storing some statistics on Markov chains of characters in words

Re: Database suggestion

am 05.12.2007 16:21:07 von Captain Paralytic

On 5 Dec, 13:56, Mikhail Kovalev wrote:
> Alas, I don't know the nature of
> MySQL
This has nothing to do with MySQL per se and everything to do with
understanding data storage models in general.

Here is some good reading on the subject.
http://del.icio.us/Captain_Paralytic/hierarchical

Re: Database suggestion

am 05.12.2007 17:22:46 von Courtney

Mikhail Kovalev wrote:
> On 5 Des, 14:02, The Natural Philosopher wrote:
>> Mikhail Kovalev wrote:
>>> I unserialize an 8MB file, which contains the db array, and put the
>>> new element there (then serialize the whole thing again and store)
>> I am far from understanding what you are actually doing, but it sounds
>> like you are willing to trade disk usage for speed and memory usage.
>
> Yep, you nailed it right on through, my friend
>
> My question is, will I be able to access and store values by
> addresses?

essentially yes.

T

> Such as here:
> $database[33][NEXT][1221][NEXT][12][COUNT] += 1; for example
>

That would translate into a very large Mysql conditional statement, :
you might find it easier to split that into a series of separate
recursive calls into the SQL data base.. I have implemented something
like that..I have a data base of products that have a parent category,
and categories of categories..and so on.

I simply run a recursive subroutine that executes a single SQL statement
to walk all the tree branches.

All a database is, is an array of structures..called a table. Plus a set
of routines for accessing them.

Each structure is a row in the table, and can have a unique ID. This is
indexed by a hash by default.

Elements within the structure are the fields of the record. These can
also have indices associated with them.

To retrieve or update an individual record whose ID is known is
blindingly fast, but it sounds like you would need to search through
fields first to identify the one you want. That probably would mean
several recursive calls to the database and some attention to indexing
to get the best performance.

Re: Database suggestion

am 05.12.2007 17:24:42 von Courtney

Captain Paralytic wrote:
> On 5 Dec, 13:56, Mikhail Kovalev wrote:
>> Alas, I don't know the nature of
>> MySQL
> This has nothing to do with MySQL per se and everything to do with
> understanding data storage models in general.
>

Indeed,. but a background explanation of SQL does not come amiss.

Sounds like they OP is a scientist/mathematician, more than a programmer.

So likely to know about different things.

> Here is some good reading on the subject.
> http://del.icio.us/Captain_Paralytic/hierarchical

Re: Database suggestion

am 05.12.2007 17:26:07 von Toby A Inkster

Mikhail Kovalev wrote:

> $array[112][COUNT] = 10;
> $array[112][NEXT][225][COUNT] = 3;
> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
> $array[112][NEXT][225][NEXT][930][NEXT] = array();
> $array[112][NEXT][11][COUNT] = 5;
> $array[112][NEXT][11][NEXT] = array();
>
> An address is an array, f ex array(112, 225, 930)

If $array[112][NEXT][11] exists, does that imply that
$array[113][NEXT][11] cannot exist?

i.e. Can 11 have only one "parent" or multiple?

If it can only have one parent, then I'd suggest using the structure
suggested by "C". If '11' can have more than one parent, then a classic
tree structure is out of the window, so I'd suggest a simple string key,
like this:

CREATE TABLE nodes
(
node_address varchar(1024) NOT NULL PRIMARY KEY,
node_count integer
);

To add counts to the database, you'd use:

INSERT INTO nodes VALUES ('112', 10);
INSERT INTO nodes VALUES ('112/225', 3);
INSERT INTO nodes VALUES ('112/225/930', 1);
INSERT INTO nodes VALUES ('112/11', 5);

To retrieve the count for key array(112, 225, 930), you'd use:

SELECT node_count
FROM nodes
WHERE node_address='112/225/930';

To find a list of descendant nodes of '112', you'd use:

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%';

Or to find just child nodes (i.e. no grandchildren, etc):

SELECT node_address
FROM nodes
WHERE node_address LIKE '112/%'
AND NOT node_address LIKE '112/%/%';

By the way -- do yourself a favour and stop naming your functions things
like function_9() and function_10(), and variables like $array_3. If you
have to come back to this code after 2 or 3 months away from it, you will
have forgotten what they all do. Give them memorable names.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 22:57.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Re: Database suggestion

am 05.12.2007 18:26:27 von Norman Peelman

Mikhail Kovalev wrote:
> On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
> wrote:
>> On 5 Dec, 11:13, Mikhail Kovalev wrote:
>>
>>> Hi.
>>> I work with recursive array trees of 10-20 levels in PHP. So far I
>>> have been using serialize() to store the arrays, generating files 5+
>>> MB large, which take 10-15 seconds to unserialize and about tenfold of
>>> RAM! I've been adviced to use MySQL (relational db's), but are there
>>> any other options beside that and var_export/include? Something that
>>> works in a similar way as MySQL when adding a new element without
>>> loading the whole database itself...
>>> Thanks!
>> Any relational DB can model a hierarchical data structure - at the
>> simplest level:
>>
>> node_id unique identifier, not null
>> parent_node null
>> payload whatever
>>
>> But getting the threaded structure is a bit tricky -
>>
>> SELECT parent.node_id as Parent_node_id, parent.payload as
>> parent_payload,
>> child.node_id as child_node_id, child.payload as child_payload
>> FROM mytable parent, mytable child
>> WHERE child.parent_node=parent.node_id
>> AND parent.node_id=$start_here
>>
>> .. i.e. by explicitly declaring aliases for each level in the tree you
>> can get the sub-nodes, but when you don't know how deep the tree is,
>> its not possible to return the whole structure in a single query using
>> standard SQL. Oracle allows you to compose queries returning
>> representations of hierarchical data (have a google for 'connect by' &
>> 'start with') but thats rather non-standard.
>
> I don't know how deep the tree is, but I don't need to return the
> whole structure in a single query. Alas, I don't know the nature of
> MySQL
>
> At this moment everything I do with my structure happens by calling
> addresses
> See my reply to The Natural Philosopher.
> Here's an example with one node at top level and it's childs:
>
> 112 (10)
> 225 (3)
> 930 (1)
> 11 (5)
>
> $array[112][COUNT] = 10;
> $array[112][NEXT][225][COUNT] = 3;
> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
> $array[112][NEXT][225][NEXT][930][NEXT] = array();
> $array[112][NEXT][11][COUNT] = 5;
> $array[112][NEXT][11][NEXT] = array();
>
> An address is an array, f ex array(112, 225, 930)
>
> I then use the following mechanism:
>
> // retrieves a node with its two values as an array(COUNT, NEXT)
> // NEXT contain further childs, but i dont use them
> // array_1 is the whole structure, and array_2 is the address
> function function_7($array_1, $array_2) { return eval('retur
> $array_1'.function_8($array_2).';'); }
>
> // array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
> function function_8($array_1) { return implode('[NEXT]',
> array_map('function_9', $array_1)); }
>
> function function_9($integer_1) { return '['.$integer_1.']'; }
>
> // saves a value array_3 to an address array_2 in structure array_1
> function function_10($array_1, $array_2, $array_3)
> {
> eval('$array_1'.function_8($array_2).' = $array_3;');
> return $array_1;
> }

Tried to look up on Markov chains... are any of your addresses
duplicated? ie: can you have an address like 112, 225, 112 ?

Norm

Re: Database suggestion

am 05.12.2007 18:42:00 von Norman Peelman

Toby A Inkster wrote:
> Mikhail Kovalev wrote:
>
>> $array[112][COUNT] = 10;
>> $array[112][NEXT][225][COUNT] = 3;
>> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
>> $array[112][NEXT][225][NEXT][930][NEXT] = array();
>> $array[112][NEXT][11][COUNT] = 5;
>> $array[112][NEXT][11][NEXT] = array();
>>
>> An address is an array, f ex array(112, 225, 930)
>
> If $array[112][NEXT][11] exists, does that imply that
> $array[113][NEXT][11] cannot exist?
>
> i.e. Can 11 have only one "parent" or multiple?
>
> If it can only have one parent, then I'd suggest using the structure
> suggested by "C". If '11' can have more than one parent, then a classic
> tree structure is out of the window, so I'd suggest a simple string key,
> like this:
>
> CREATE TABLE nodes
> (
> node_address varchar(1024) NOT NULL PRIMARY KEY,
> node_count integer
> );
>
> To add counts to the database, you'd use:
>
> INSERT INTO nodes VALUES ('112', 10);
> INSERT INTO nodes VALUES ('112/225', 3);
> INSERT INTO nodes VALUES ('112/225/930', 1);
> INSERT INTO nodes VALUES ('112/11', 5);
>
> To retrieve the count for key array(112, 225, 930), you'd use:
>
> SELECT node_count
> FROM nodes
> WHERE node_address='112/225/930';
>
> To find a list of descendant nodes of '112', you'd use:
>
> SELECT node_address
> FROM nodes
> WHERE node_address LIKE '112/%';
>
> Or to find just child nodes (i.e. no grandchildren, etc):
>
> SELECT node_address
> FROM nodes
> WHERE node_address LIKE '112/%'
> AND NOT node_address LIKE '112/%/%';
>
> By the way -- do yourself a favour and stop naming your functions things
> like function_9() and function_10(), and variables like $array_3. If you
> have to come back to this code after 2 or 3 months away from it, you will
> have forgotten what they all do. Give them memorable names.
>

INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
$node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

....will INSERT new entries and UPDATE existing entries in one swoop.

Norm

Re: Database suggestion

am 05.12.2007 19:09:28 von Mikhail Kovalev

On 5 Des, 18:26, Norman Peelman wrote:
> Mikhail Kovalev wrote:
> > On 5 Des, 14:11, "C. (http://symcbean.blogspot.com/)"
> > wrote:
> >> On 5 Dec, 11:13, Mikhail Kovalev wrote:
>
> >>> Hi.
> >>> I work with recursive array trees of 10-20 levels in PHP. So far I
> >>> have been using serialize() to store the arrays, generating files 5+
> >>> MB large, which take 10-15 seconds to unserialize and about tenfold of
> >>> RAM! I've been adviced to use MySQL (relational db's), but are there
> >>> any other options beside that and var_export/include? Something that
> >>> works in a similar way as MySQL when adding a new element without
> >>> loading the whole database itself...
> >>> Thanks!
> >> Any relational DB can model a hierarchical data structure - at the
> >> simplest level:
>
> >> node_id unique identifier, not null
> >> parent_node null
> >> payload whatever
>
> >> But getting the threaded structure is a bit tricky -
>
> >> SELECT parent.node_id as Parent_node_id, parent.payload as
> >> parent_payload,
> >> child.node_id as child_node_id, child.payload as child_payload
> >> FROM mytable parent, mytable child
> >> WHERE child.parent_node=parent.node_id
> >> AND parent.node_id=$start_here
>
> >> .. i.e. by explicitly declaring aliases for each level in the tree you
> >> can get the sub-nodes, but when you don't know how deep the tree is,
> >> its not possible to return the whole structure in a single query using
> >> standard SQL. Oracle allows you to compose queries returning
> >> representations of hierarchical data (have a google for 'connect by' &
> >> 'start with') but thats rather non-standard.
>
> > I don't know how deep the tree is, but I don't need to return the
> > whole structure in a single query. Alas, I don't know the nature of
> > MySQL
>
> > At this moment everything I do with my structure happens by calling
> > addresses
> > See my reply to The Natural Philosopher.
> > Here's an example with one node at top level and it's childs:
>
> > 112 (10)
> > 225 (3)
> > 930 (1)
> > 11 (5)
>
> > $array[112][COUNT] = 10;
> > $array[112][NEXT][225][COUNT] = 3;
> > $array[112][NEXT][225][NEXT][930][COUNT] = 1;
> > $array[112][NEXT][225][NEXT][930][NEXT] = array();
> > $array[112][NEXT][11][COUNT] = 5;
> > $array[112][NEXT][11][NEXT] = array();
>
> > An address is an array, f ex array(112, 225, 930)
>
> > I then use the following mechanism:
>
> > // retrieves a node with its two values as an array(COUNT, NEXT)
> > // NEXT contain further childs, but i dont use them
> > // array_1 is the whole structure, and array_2 is the address
> > function function_7($array_1, $array_2) { return eval('retur
> > $array_1'.function_8($array_2).';'); }
>
> > // array(112, 225, 930) becomes "[112][NEXT][225][NEXT][930]"
> > function function_8($array_1) { return implode('[NEXT]',
> > array_map('function_9', $array_1)); }
>
> > function function_9($integer_1) { return '['.$integer_1.']'; }
>
> > // saves a value array_3 to an address array_2 in structure array_1
> > function function_10($array_1, $array_2, $array_3)
> > {
> > eval('$array_1'.function_8($array_2).' = $array_3;');
> > return $array_1;
> > }
>
> Tried to look up on Markov chains... are any of your addresses
> duplicated? ie: can you have an address like 112, 225, 112 ?
>
> Norm

Yes. Any id can be child or parent of any other id or itself and any
way around.
I'm only interested in how many times a given sequence occurs.
For example, if I was to record this sequence (112, 225, 930) (the
example above doesn't show it), it would look like this:

Split by length from 1 to the size of the sequence itself (3)
112
225
930
112, 225
225, 930
112, 225, 930

I use those as addresses to add count values to make this structure.

112 (1)
225 (1)
930 (1)
225 (1)
930 (1)
930 (1)

If I now add another sequence 700, 112, 112, 225, the table changes
to:

112 (3)
225 (2)
930 (1)
112 (1)
225 (1)
225 (2)
930 (1)
930 (1)
700 (1)
112 (1)
112 (1)
225 (1)

So from this I know that (112) occurs 3 times (in both sequences that
have been added), (112, 225) occurs 2 times, and (700, 112, 112)
occurs 1 times etc.

I have another file which translates id's to other variables.

Re: Database suggestion

am 05.12.2007 19:16:32 von Mikhail Kovalev

On 5 Des, 17:26, Toby A Inkster
wrote:
> Mikhail Kovalev wrote:
> > $array[112][COUNT] = 10;
> > $array[112][NEXT][225][COUNT] = 3;
> > $array[112][NEXT][225][NEXT][930][COUNT] = 1;
> > $array[112][NEXT][225][NEXT][930][NEXT] = array();
> > $array[112][NEXT][11][COUNT] = 5;
> > $array[112][NEXT][11][NEXT] = array();
>
> > An address is an array, f ex array(112, 225, 930)
>
> If $array[112][NEXT][11] exists, does that imply that
> $array[113][NEXT][11] cannot exist?
>
> i.e. Can 11 have only one "parent" or multiple?
>
> If it can only have one parent, then I'd suggest using the structure
> suggested by "C". If '11' can have more than one parent, then a classic
> tree structure is out of the window, so I'd suggest a simple string key,
> like this:
>
> CREATE TABLE nodes
> (
> node_address varchar(1024) NOT NULL PRIMARY KEY,
> node_count integer
> );
>
> To add counts to the database, you'd use:
>
> INSERT INTO nodes VALUES ('112', 10);
> INSERT INTO nodes VALUES ('112/225', 3);
> INSERT INTO nodes VALUES ('112/225/930', 1);
> INSERT INTO nodes VALUES ('112/11', 5);
>
> To retrieve the count for key array(112, 225, 930), you'd use:
>
> SELECT node_count
> FROM nodes
> WHERE node_address='112/225/930';
>
> To find a list of descendant nodes of '112', you'd use:
>
> SELECT node_address
> FROM nodes
> WHERE node_address LIKE '112/%';
>
> Or to find just child nodes (i.e. no grandchildren, etc):
>
> SELECT node_address
> FROM nodes
> WHERE node_address LIKE '112/%'
> AND NOT node_address LIKE '112/%/%';
>
> By the way -- do yourself a favour and stop naming your functions things
> like function_9() and function_10(), and variables like $array_3. If you
> have to come back to this code after 2 or 3 months away from it, you will
> have forgotten what they all do. Give them memorable names.
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 10 days, 22:57.]
>
> Sharing Music with Apple iTunes
> http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Testing it! Forever grateful for the examples!=)

Also, see my reply to Norman below.

Re: Database suggestion

am 05.12.2007 20:10:54 von Mikhail Kovalev

On 5 Des, 18:42, Norman Peelman wrote:
> Toby A Inkster wrote:
> > Mikhail Kovalev wrote:
>
> >> $array[112][COUNT] = 10;
> >> $array[112][NEXT][225][COUNT] = 3;
> >> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
> >> $array[112][NEXT][225][NEXT][930][NEXT] = array();
> >> $array[112][NEXT][11][COUNT] = 5;
> >> $array[112][NEXT][11][NEXT] = array();
>
> >> An address is an array, f ex array(112, 225, 930)
>
> > If $array[112][NEXT][11] exists, does that imply that
> > $array[113][NEXT][11] cannot exist?
>
> > i.e. Can 11 have only one "parent" or multiple?
>
> > If it can only have one parent, then I'd suggest using the structure
> > suggested by "C". If '11' can have more than one parent, then a classic
> > tree structure is out of the window, so I'd suggest a simple string key,
> > like this:
>
> > CREATE TABLE nodes
> > (
> > node_address varchar(1024) NOT NULL PRIMARY KEY,
> > node_count integer
> > );
>
> > To add counts to the database, you'd use:
>
> > INSERT INTO nodes VALUES ('112', 10);
> > INSERT INTO nodes VALUES ('112/225', 3);
> > INSERT INTO nodes VALUES ('112/225/930', 1);
> > INSERT INTO nodes VALUES ('112/11', 5);
>
> > To retrieve the count for key array(112, 225, 930), you'd use:
>
> > SELECT node_count
> > FROM nodes
> > WHERE node_address='112/225/930';
>
> > To find a list of descendant nodes of '112', you'd use:
>
> > SELECT node_address
> > FROM nodes
> > WHERE node_address LIKE '112/%';
>
> > Or to find just child nodes (i.e. no grandchildren, etc):
>
> > SELECT node_address
> > FROM nodes
> > WHERE node_address LIKE '112/%'
> > AND NOT node_address LIKE '112/%/%';
>
> > By the way -- do yourself a favour and stop naming your functions things
> > like function_9() and function_10(), and variables like $array_3. If you
> > have to come back to this code after 2 or 3 months away from it, you will
> > have forgotten what they all do. Give them memorable names.
>
> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>
> ...will INSERT new entries and UPDATE existing entries in one swoop.
>
> Norm

Ok, suppose I'm joining to structures which have been created
separately,
from before i have ('112/225/930', 3)
and i want to add ('112/225/930', 2), which also happens to be present
in the second table, only with a different count,
to make ('112/225/930', 5)

From there I want to make it the general case so that when updating an
entry with count 1 I am actually adding ('112/225/930', 1) to the
existing one, if it exists:

INSERT INTO nodes (node_address, node_count) VALUES (,
) ON DUPLICATE KEY UPDATE node_count = node_count +

Is this correct? (Do I have to use <> in VALUES, I'm following an
example which does it?)

Re: Database suggestion

am 05.12.2007 20:41:13 von Good Man

Mikhail Kovalev wrote in news:59d586fd-60f0-4a33-
8c48-247d1e719203@o6g2000hsd.googlegroups.com:

> Hi.
>
> I work with recursive array trees of 10-20 levels in PHP. So far I
> have been using serialize() to store the arrays, generating files 5+
> MB large, which take 10-15 seconds to unserialize and about tenfold of
> RAM! I've been adviced to use MySQL (relational db's), but are there
> any other options beside that and var_export/include? Something that
> works in a similar way as MySQL when adding a new element without
> loading the whole database itself...
>
> Thanks!

Databases are truly helpful in the case.

You might find this article well worth reading:

http://www.sitepoint.com/article/hierarchical-data-database

Re: Database suggestion

am 05.12.2007 20:46:38 von Norman Peelman

Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman wrote:

>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>
>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>
>> Norm
>
> Ok, suppose I'm joining to structures which have been created
> separately,
> from before i have ('112/225/930', 3)
> and i want to add ('112/225/930', 2), which also happens to be present
> in the second table, only with a different count,
> to make ('112/225/930', 5)
>
> From there I want to make it the general case so that when updating an
> entry with count 1 I am actually adding ('112/225/930', 1) to the
> existing one, if it exists:
>
> INSERT INTO nodes (node_address, node_count) VALUES (,
> ) ON DUPLICATE KEY UPDATE node_count = node_count +
>
>
> Is this correct? (Do I have to use <> in VALUES, I'm following an
> example which does it?)

No replace everything '<..>' with your own variable name, that will do it.

$new_count = 5
....ON DUPLICATE KEY UPDATE node_count = $new_count

Not sure what you mean by 'second table'. How many do you have? Though
this was one table...

Norm

Re: Database suggestion

am 05.12.2007 20:52:58 von Norman Peelman

Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman wrote:
>> Toby A Inkster wrote:
>>> Mikhail Kovalev wrote:
>>>> $array[112][COUNT] = 10;
>>>> $array[112][NEXT][225][COUNT] = 3;
>>>> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
>>>> $array[112][NEXT][225][NEXT][930][NEXT] = array();
>>>> $array[112][NEXT][11][COUNT] = 5;
>>>> $array[112][NEXT][11][NEXT] = array();
>>>> An address is an array, f ex array(112, 225, 930)
>>> If $array[112][NEXT][11] exists, does that imply that
>>> $array[113][NEXT][11] cannot exist?
>>> i.e. Can 11 have only one "parent" or multiple?
>>> If it can only have one parent, then I'd suggest using the structure
>>> suggested by "C". If '11' can have more than one parent, then a classic
>>> tree structure is out of the window, so I'd suggest a simple string key,
>>> like this:
>>> CREATE TABLE nodes
>>> (
>>> node_address varchar(1024) NOT NULL PRIMARY KEY,
>>> node_count integer
>>> );
>>> To add counts to the database, you'd use:
>>> INSERT INTO nodes VALUES ('112', 10);
>>> INSERT INTO nodes VALUES ('112/225', 3);
>>> INSERT INTO nodes VALUES ('112/225/930', 1);
>>> INSERT INTO nodes VALUES ('112/11', 5);
>>> To retrieve the count for key array(112, 225, 930), you'd use:
>>> SELECT node_count
>>> FROM nodes
>>> WHERE node_address='112/225/930';
>>> To find a list of descendant nodes of '112', you'd use:
>>> SELECT node_address
>>> FROM nodes
>>> WHERE node_address LIKE '112/%';
>>> Or to find just child nodes (i.e. no grandchildren, etc):
>>> SELECT node_address
>>> FROM nodes
>>> WHERE node_address LIKE '112/%'
>>> AND NOT node_address LIKE '112/%/%';
>>> By the way -- do yourself a favour and stop naming your functions things
>>> like function_9() and function_10(), and variables like $array_3. If you
>>> have to come back to this code after 2 or 3 months away from it, you will
>>> have forgotten what they all do. Give them memorable names.
>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>
>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>
>> Norm
>
> Ok, suppose I'm joining to structures which have been created
> separately,
> from before i have ('112/225/930', 3)
> and i want to add ('112/225/930', 2), which also happens to be present
> in the second table, only with a different count,
> to make ('112/225/930', 5)
>
> From there I want to make it the general case so that when updating an
> entry with count 1 I am actually adding ('112/225/930', 1) to the
> existing one, if it exists:
>
> INSERT INTO nodes (node_address, node_count) VALUES (,
> ) ON DUPLICATE KEY UPDATE node_count = node_count +
>
>
> Is this correct? (Do I have to use <> in VALUES, I'm following an
> example which does it?)

Ooops, in my other reply that should have read:

$your_variable = 2
"...ON DUPLICATE KEY UPDATE new_count = new_count + $your_variable"

....to add an arbitrary amount to new_count. 3 + 2 = 5 for your example.

Norm

Re: Database suggestion

am 05.12.2007 21:21:43 von Mikhail Kovalev

On 5 Des, 20:46, Norman Peelman wrote:
> Mikhail Kovalev wrote:
> > On 5 Des, 18:42, Norman Peelman wrote:
>
>
>
>
>
> >> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
> >> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>
> >> ...will INSERT new entries and UPDATE existing entries in one swoop.
>
> >> Norm
>
> > Ok, suppose I'm joining to structures which have been created
> > separately,
> > from before i have ('112/225/930', 3)
> > and i want to add ('112/225/930', 2), which also happens to be present
> > in the second table, only with a different count,
> > to make ('112/225/930', 5)
>
> > From there I want to make it the general case so that when updating an
> > entry with count 1 I am actually adding ('112/225/930', 1) to the
> > existing one, if it exists:
>
> > INSERT INTO nodes (node_address, node_count) VALUES (,
> > ) ON DUPLICATE KEY UPDATE node_count = node_count +
> >
>
> > Is this correct? (Do I have to use <> in VALUES, I'm following an
> > example which does it?)
>
> No replace everything '<..>' with your own variable name, that will do it.
>
> $new_count = 5
> ...ON DUPLICATE KEY UPDATE node_count = $new_count
>
> Not sure what you mean by 'second table'. How many do you have? Though
> this was one table...
>
> Norm

I have different databases for different types of sequences. Each
sequence is some tousands elements in length and is recorded by chunks
of small series each from 1 to 20 elements long. Sometimes I join
different databases together and analyze each one separately and then
in relation to the joint database.

I have been thinking, if a sequence is for instance 5.000 elements
long, chunks 1-20, and there are no duplicates (absolutely unlikely,
but still) I will end up having a database with around 105.000
entries...
My current solution is probably better because it saves so much
space(?)

An entry like this in the current solution:
112 (310)
225 (20)
930 (6)
700 (1)
7 (1)
812 (1)

Equals this in the new system I am about to implement:

('112', 310)
('112/225', 20)
('112/225/930', 6)
('112/225/930/700', 1)
('112/225/930/700/7', 1)
('112/225/930/812', 1)

I don't know how a flat MySQL database of 100.000-200.000 entries like
these will perform. I'm now also considering a filesystem as the
database, as C has suggested.

Re: Database suggestion

am 05.12.2007 23:03:46 von Norman Peelman

Mikhail Kovalev wrote:
> On 5 Des, 20:46, Norman Peelman wrote:
>> Mikhail Kovalev wrote:
>>> On 5 Des, 18:42, Norman Peelman wrote:
>>
>>
>>
>>
>>>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>>>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>>> Norm
>>> Ok, suppose I'm joining to structures which have been created
>>> separately,
>>> from before i have ('112/225/930', 3)
>>> and i want to add ('112/225/930', 2), which also happens to be present
>>> in the second table, only with a different count,
>>> to make ('112/225/930', 5)
>>> From there I want to make it the general case so that when updating an
>>> entry with count 1 I am actually adding ('112/225/930', 1) to the
>>> existing one, if it exists:
>>> INSERT INTO nodes (node_address, node_count) VALUES (,
>>> ) ON DUPLICATE KEY UPDATE node_count = node_count +
>>>
>>> Is this correct? (Do I have to use <> in VALUES, I'm following an
>>> example which does it?)
>> No replace everything '<..>' with your own variable name, that will do it.
>>
>> $new_count = 5
>> ...ON DUPLICATE KEY UPDATE node_count = $new_count
>>
>> Not sure what you mean by 'second table'. How many do you have? Though
>> this was one table...
>>
>> Norm
>
> I have different databases for different types of sequences. Each
> sequence is some tousands elements in length and is recorded by chunks
> of small series each from 1 to 20 elements long. Sometimes I join
> different databases together and analyze each one separately and then
> in relation to the joint database.
>
> I have been thinking, if a sequence is for instance 5.000 elements
> long, chunks 1-20, and there are no duplicates (absolutely unlikely,
> but still) I will end up having a database with around 105.000
> entries...
> My current solution is probably better because it saves so much
> space(?)
>
> An entry like this in the current solution:
> 112 (310)
> 225 (20)
> 930 (6)
> 700 (1)
> 7 (1)
> 812 (1)
>
> Equals this in the new system I am about to implement:
>
> ('112', 310)
> ('112/225', 20)
> ('112/225/930', 6)
> ('112/225/930/700', 1)
> ('112/225/930/700/7', 1)
> ('112/225/930/812', 1)
>
> I don't know how a flat MySQL database of 100.000-200.000 entries like
> these will perform. I'm now also considering a filesystem as the
> database, as C has suggested.

I would imagine it would perform quite well as each one of the
addresses is a key (index) into the database. MySQL won't have to
examine all the entries that start with 112 to find where 930 or 700 or
whatever is. It knows exactly where 112/225/930/812/???/???/??? is. The
INSERT UPDATE will take less than a second on average most likely.

All this being said, you can do the same thing with your current array
in memory style using the same syntax:

$your_array['112/225/930'] = 6
$your_array['112/225/930/700'] = 1

the only difference is you still have to load it into memory (an array)
to use it. If you go with MySQL and still need an array in memory at
some point, a simple:

$dataset = array();
$result = mysql_query("SELECT * FROM nodes");
while($row = mysql_fetch_assoc($result))
{
$dataset[$row['node_address']] = $row['node_count'];
}

will get you your entire dataset into an in memory array in which you
would access just like sql:

$dataset['112'] = 310
$dataset['112/225'] = 20
$dataset['112/225/930'] = 6
$dataset['112/225/930/700'] = 1
$dataset['112/225/930/700/7'] = 1

Norm

Re: Database suggestion

am 05.12.2007 23:39:27 von 23s

"Mikhail Kovalev" wrote in message
news:82298e14-74e0-4d52-bc32-805b5c562ee4@d27g2000prf.google groups.com...
>I mean, I don't mind using MySQL in itself, it's just that I don't
> want use relational db's.
> I don't know if it is possible to predefine the structures I'm using
> in MySQL, because the number of levels varies etc.
> Here is an example recording a single sequence of 6 elements:
>
> How do I put this in a db?
>
> array (
> 0 =>
> array (
> 0 => 2,
> 1 =>
> array (
> 1 =>
> array (
> 0 => 1,
> 1 =>
> array (
> ),
> ),
> 2 =>
> array (
> 0 => 1,
> 1 =>
> array (
[snip]

Check this out: http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html

Google this:
http://www.google.com.au/search?hl=en&q=Joe+Celko+%27hierarc hies%27&btnG=Search&meta=

Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((

Re: Database suggestion

am 05.12.2007 23:44:50 von luiheidsgoeroe

On Wed, 05 Dec 2007 23:39:27 +0100, asdf wrote:
> Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((

Joe Celko's Trees and Hierarchies in SQL for Smarties, a must read for
anyone serious about the issue indeed. Not the easiest book to plough
through, but very rewarding :)
--
Rik Wasmus

Re: Database suggestion

am 06.12.2007 10:34:02 von Toby A Inkster

Mikhail Kovalev wrote:

> I have different databases for different types of sequences.

Aha. I don't think you mentioned that earlier.

CREATE TABLE data_sets
(
data_set integer NOT NULL PRIMARY KEY,
data_set_name varchar(20)
);
CREATE TABLE nodes
(
data_set integer NOT NULL
REFERENCES data_sets ON DELETE CASCADE,
node_address varchar(1024) NOT NULL,
node_count integer,
PRIMARY KEY (data_set_id, node_address)
);

An example of inserting some data:

INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
INSERT INTO nodes VALUES (1, '112', 3);
INSERT INTO nodes VALUES (1, '118', 4);
INSERT INTO nodes VALUES (1, '112/11', 5);
INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
INSERT INTO nodes VALUES (2, '112', 2);
INSERT INTO nodes VALUES (2, '4/115', 3);

An example of selecting data from Data Set 1:

SELECT node_count
FROM nodes
WHERE data_set=1
AND node_address='112';
(returns 3)

Some examples of selecting the sum of data from multiple sets:

SELECT sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
AND node_address='112'
GROUP BY node_address;
(returns 5)

SELECT node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;
(returns list of all nodes in set 1 or set 2, plus sum of counts
for both data sets)

Let's create a new data set 3 and populate it with the data from the
previous query. That is, data set 3 is the sum of data sets 1 and 2:

INSERT INTO data_sets VALUES (3, 'Combined Set');
INSERT INTO nodes
SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;

SQL is a really easy way of manipulating data sets.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Re: Database suggestion

am 06.12.2007 18:05:33 von Mikhail Kovalev

On 6 Des, 10:34, Toby A Inkster
wrote:
> Mikhail Kovalev wrote:
> > I have different databases for different types of sequences.
>
> Aha. I don't think you mentioned that earlier.
>
> CREATE TABLE data_sets
> (
> data_set integer NOT NULL PRIMARY KEY,
> data_set_name varchar(20)
> );
> CREATE TABLE nodes
> (
> data_set integer NOT NULL
> REFERENCES data_sets ON DELETE CASCADE,
> node_address varchar(1024) NOT NULL,
> node_count integer,
> PRIMARY KEY (data_set_id, node_address)
> );
>
> An example of inserting some data:
>
> INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
> INSERT INTO nodes VALUES (1, '112', 3);
> INSERT INTO nodes VALUES (1, '118', 4);
> INSERT INTO nodes VALUES (1, '112/11', 5);
> INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
> INSERT INTO nodes VALUES (2, '112', 2);
> INSERT INTO nodes VALUES (2, '4/115', 3);
>
> An example of selecting data from Data Set 1:
>
> SELECT node_count
> FROM nodes
> WHERE data_set=1
> AND node_address='112';
> (returns 3)
>
> Some examples of selecting the sum of data from multiple sets:
>
> SELECT sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> AND node_address='112'
> GROUP BY node_address;
> (returns 5)
>
> SELECT node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
> (returns list of all nodes in set 1 or set 2, plus sum of counts
> for both data sets)
>
> Let's create a new data set 3 and populate it with the data from the
> previous query. That is, data set 3 is the sum of data sets 1 and 2:
>
> INSERT INTO data_sets VALUES (3, 'Combined Set');
> INSERT INTO nodes
> SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
>
> SQL is a really easy way of manipulating data sets.
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]
>
> Sharing Music with Apple iTunes
> http://tobyinkster.co.uk/blog/2007/11/28/itunes-sharing/

Re: Database suggestion

am 06.12.2007 18:09:51 von Mikhail Kovalev