Organisational question: surely someone has implemented many Booleanvalues (tags) and a solution exi

am 20.01.2011 15:32:17 von Dotan Cohen

I am designing an application that make heavy usage of one-to-many
tags for items. That is, each item can have multiple tags, and there
are tens of tags (likely to grow to hundreds). Most operation on the
database are expected to be searches for the items that have a
particular tag. That is, users will search per tags, not per items.

These are the ways that I've thought about storing the tags, some bad
and some worse. If there is a better way I'd love to know.

1) Each item will get a row in a "tags" table, with a column for each tag.
mysql> CREATE TABLE tags (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
item VARCHAR(100),
tag1 bool,
tag2 bool,
....
tagN bool
);

With this approach I would be adding a new column every time a new
category is added. This looks to me a good way given that users will
be searching per tag and a simple "SELECT item FROM tags WHERE
tag1=true;" is an easy, inexpensive query. This table will get very
large, there will likely be literally thousands of items (there will
exist more items than tags).

2) Store the applicable tags one per line in a text field in the items table.
mysql> CREATE TABLE items (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
item VARCHAR(100),
tags text,
);

This looks like a bad idea, searching by tag will be a mess.

3) Store the tags in a table and add items to a text field. For instance:
mysql> CREATE TABLE tags (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
tagName VARCHAR(100),
items text,
);

This looks to be the best way from a MySQL data retrieval perspective,
but I do not know how expensive it will be to then split the items in
PHP. Furthermore, adding items to tags could get real expensive.

Caveat: at some point in the future there may be added the ability to
have a tag hierarchy. For instance, there could exist a tag
"restaurant" that will get the subtags "italian" and "french". I could
fake this with any approach by having a table of existing tags with a
"parentTag" field, so if I plan on having this table anyway would
method 3 above be preferable?

Note: this message is cross-posted to the MySQL and the PHP lists as I
am really not sure where is the best place to do the logic. My
apologies to those who receive the message twice.

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Organisational question: surely someone has implementedmany Boolean values (tags) and a solution

am 20.01.2011 16:00:27 von Richard Quadling

On 20 January 2011 14:32, Dotan Cohen wrote:
> I am designing an application that make heavy usage of one-to-many
> tags for items. That is, each item can have multiple tags, and there
> are tens of tags (likely to grow to hundreds). Most operation on the
> database are expected to be searches for the items that have a
> particular tag. That is, users will search per tags, not per items.
>
> These are the ways that I've thought about storing the tags, some bad
> and some worse. If there is a better way I'd love to know.
>
> 1) Each item will get a row in a "tags" table, with a column for each tag=
..
> mysql> CREATE TABLE tags (
> Â Â id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> Â Â item VARCHAR(100),
> Â Â tag1 bool,
> Â Â tag2 bool,
> Â Â ....
> Â Â tagN bool
> );
>
> With this approach I would be adding a new column every time a new
> category is added. This looks to me a good way given that users will
> be searching per tag and a simple "SELECT item FROM tags WHERE
> tag1=3Dtrue;" is an easy, inexpensive query. This table will get very
> large, there will likely be literally thousands of items (there will
> exist more items than tags).
>
>
>
> 2) Store the applicable tags one per line in a text field in the items ta=
ble.
> mysql> CREATE TABLE items (
> Â Â id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> Â Â item VARCHAR(100),
> Â Â tags text,
> );
>
> This looks like a bad idea, searching by tag will be a mess.
>
>
>
> 3) Store the tags in a table and add items to a text field. For instance:
> mysql> CREATE TABLE tags (
> Â Â id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> Â Â tagName VARCHAR(100),
> Â Â items text,
> );
>
> This looks to be the best way from a MySQL data retrieval perspective,
> but I do not know how expensive it will be to then split the items in
> PHP. Furthermore, adding items to tags could get real expensive.
>
>
>
> Caveat: at some point in the future there may be added the ability to
> have a tag hierarchy. For instance, there could exist a tag
> "restaurant" that will get the subtags "italian" and "french". I could
> fake this with any approach by having a table of existing tags with a
> "parentTag" field, so if I plan on having this table anyway would
> method 3 above be preferable?
>
> Note: this message is cross-posted to the MySQL and the PHP lists as I
> am really not sure where is the best place to do the logic. My
> apologies to those who receive the message twice.
>
> Thanks!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

I'd have my items table, my tags table and a join table for the two.
My join table is really simple. UniqueID, ItemID, TagID.

I'd recommend using a nested set approach for the tags
(http://dev.mysql.com/tech-resources/articles/hierarchical-d ata.html
gives a good explanation on the issues and methodology of nested
sets).

--=20
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: Organisational question: surely someone has implemented manyBoolean values (tags) and a solution

am 20.01.2011 16:22:39 von Peter Brawley

I'd exclude (1) because new tags require restructuring the table, (2)
and (3) because they break a cardinal rule of design and will be a mess
to query, leaving ...

4) Standard many-many bridge table:
mysql> CREATE TABLE items_tags (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
itemID int,
tagID INT
);

Will not require a major overhaul if you later turn categories into a tree.

PB

-----

On 1/20/2011 8:32 AM, Dotan Cohen wrote:
> I am designing an application that make heavy usage of one-to-many
> tags for items. That is, each item can have multiple tags, and there
> are tens of tags (likely to grow to hundreds). Most operation on the
> database are expected to be searches for the items that have a
> particular tag. That is, users will search per tags, not per items.
>
> These are the ways that I've thought about storing the tags, some bad
> and some worse. If there is a better way I'd love to know.
>
> 1) Each item will get a row in a "tags" table, with a column for each tag.
> mysql> CREATE TABLE tags (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> item VARCHAR(100),
> tag1 bool,
> tag2 bool,
> ....
> tagN bool
> );
>
> With this approach I would be adding a new column every time a new
> category is added. This looks to me a good way given that users will
> be searching per tag and a simple "SELECT item FROM tags WHERE
> tag1=true;" is an easy, inexpensive query. This table will get very
> large, there will likely be literally thousands of items (there will
> exist more items than tags).
>
>
>
> 2) Store the applicable tags one per line in a text field in the items table.
> mysql> CREATE TABLE items (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> item VARCHAR(100),
> tags text,
> );
>
> This looks like a bad idea, searching by tag will be a mess.
>
>
>
> 3) Store the tags in a table and add items to a text field. For instance:
> mysql> CREATE TABLE tags (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> tagName VARCHAR(100),
> items text,
> );
>
> This looks to be the best way from a MySQL data retrieval perspective,
> but I do not know how expensive it will be to then split the items in
> PHP. Furthermore, adding items to tags could get real expensive.
>
>
>
> Caveat: at some point in the future there may be added the ability to
> have a tag hierarchy. For instance, there could exist a tag
> "restaurant" that will get the subtags "italian" and "french". I could
> fake this with any approach by having a table of existing tags with a
> "parentTag" field, so if I plan on having this table anyway would
> method 3 above be preferable?
>
> Note: this message is cross-posted to the MySQL and the PHP lists as I
> am really not sure where is the best place to do the logic. My
> apologies to those who receive the message twice.
>
> Thanks!
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Organisational question: surely someone has implemented many Boolean values (tags) and a solutio

am 20.01.2011 16:22:41 von Jerry Schwartz

I think the canonical way would be to have one table for your items, one table
for your tags, and one table for your tag assignments.

CREATE TABLE items (
item_id INT(11) AUTO-INCREMENT PRIMARY KEY,
item_name VARCHAR(100) NOT NULL KEY,
....
);

CREATE TABLE tags (
tag_id INT(11) AUTO-INCREMENT PRIMARY KEY,
tag_name VARCHAR(100) NOT NULL KEY,
....
);

CREATE TABLE item_tags (
item_id INT(11) NOT NULL KEY,
tag_id INT(11) NOT NULL KEY
);

This way you could do

SELECT item_id, item_name FROM
tags JOIN item_tags ON tags.tag_id = item_tags.tag_id
JOIN items ON item_tags.item_id = items.item_id
WHERE ...
;

to get all of the items with a particular tag, or

SELECT tag_id, tag_name FROM
items JOIN item_tags ON items.item_id = item_tags.item_id
JOIN tags ON item_tags.tag_id = tags.tag_id
WHERE ...
;

with equal ease and efficiency.

Using an ever-lengthening bitmap for the tag assignments is a trap for the
unwary. The path to perdition is lined with the bodies of those who believed
"We'll never need more than x..."

As for setting up a hierarchy, that's trickier. One way to handle that is to
work like libraries do: 10 is "fiction", 10.05 is "crime novels", 10.05.07 is
"British authors", and so forth. Your `tags` table then looks like

CREATE TABLE tags (
tag_id INT(11) AUTO-INCREMENT PRIMARY KEY,
tag_name VARCHAR(100) NOT NULL KEY,
tag_number VARCHAR(100) NOT NULL KEY,
....
);

Then you can search for tags by

tag_number LIKE ('10.%') or
tag_number LIKE ('10.05%')

and so forth. This scheme is infinitely extendable. To get the entire
hierarchy, you simply

SELECT tag_number, tag_name FROM tags ORDER BY tag_number;

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com

>-----Original Message-----
>From: Dotan Cohen [mailto:dotancohen@gmail.com]
>Sent: Thursday, January 20, 2011 9:32 AM
>To: mysql.; php-general.
>Subject: Organisational question: surely someone has implemented many Boolean
>values (tags) and a solution exist
>
>I am designing an application that make heavy usage of one-to-many
>tags for items. That is, each item can have multiple tags, and there
>are tens of tags (likely to grow to hundreds). Most operation on the
>database are expected to be searches for the items that have a
>particular tag. That is, users will search per tags, not per items.
>
>These are the ways that I've thought about storing the tags, some bad
>and some worse. If there is a better way I'd love to know.
>
>1) Each item will get a row in a "tags" table, with a column for each tag.
>mysql> CREATE TABLE tags (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> item VARCHAR(100),
> tag1 bool,
> tag2 bool,
> ....
> tagN bool
>);
>
>With this approach I would be adding a new column every time a new
>category is added. This looks to me a good way given that users will
>be searching per tag and a simple "SELECT item FROM tags WHERE
>tag1=true;" is an easy, inexpensive query. This table will get very
>large, there will likely be literally thousands of items (there will
>exist more items than tags).
>
>
>
>2) Store the applicable tags one per line in a text field in the items table.
>mysql> CREATE TABLE items (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> item VARCHAR(100),
> tags text,
>);
>
>This looks like a bad idea, searching by tag will be a mess.
>
>
>
>3) Store the tags in a table and add items to a text field. For instance:
>mysql> CREATE TABLE tags (
> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> tagName VARCHAR(100),
> items text,
>);
>
>This looks to be the best way from a MySQL data retrieval perspective,
>but I do not know how expensive it will be to then split the items in
>PHP. Furthermore, adding items to tags could get real expensive.
>
>
>
>Caveat: at some point in the future there may be added the ability to
>have a tag hierarchy. For instance, there could exist a tag
>"restaurant" that will get the subtags "italian" and "french". I could
>fake this with any approach by having a table of existing tags with a
>"parentTag" field, so if I plan on having this table anyway would
>method 3 above be preferable?
>
>Note: this message is cross-posted to the MySQL and the PHP lists as I
>am really not sure where is the best place to do the logic. My
>apologies to those who receive the message twice.
>
>Thanks!
>
>--
>Dotan Cohen
>
>http://gibberish.co.il
>http://what-is-what.com
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jerry@gii.co.jp

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 17:20:36 von Dotan Cohen

On Thu, Jan 20, 2011 at 17:00, Richard Quadling wrote:
> I'd have my items table, my tags table and a join table for the two.
> My join table is really simple. UniqueID, ItemID, TagID.
>

Yes, that is the first approach that I mentioned. It looks to be a
good compromise.

> I'd recommend using a nested set approach for the tags
> (http://dev.mysql.com/tech-resources/articles/hierarchical-d ata.html
> gives a good explanation on the issues and methodology of nested
> sets).
>

That is terrific, at least the first half. The second half, with the
Venn diagrams, is awkward!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 17:23:18 von Dotan Cohen

On Thu, Jan 20, 2011 at 18:20, Dotan Cohen wrote:
> On Thu, Jan 20, 2011 at 17:00, Richard Quadling wrote:
>> I'd have my items table, my tags table and a join table for the two.
>> My join table is really simple. UniqueID, ItemID, TagID.
>>
>
> Yes, that is the first approach that I mentioned. It looks to be a
> good compromise.
>
>

Sorry, that was _not_ the first approach that I mentioned. It looks to
be the right method though, thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Organisational question: surely someone has implemented manyBoolean values (tags) and a solution

am 20.01.2011 17:25:20 von Dotan Cohen

On Thu, Jan 20, 2011 at 17:22, Jerry Schwartz wrote:
> I think the canonical way would be to have one table for your items, one table
> for your tags, and one table for your tag assignments.
>

Thank you, I do agree that this is the best way. Other posters seem to
agree as well!

> Using an ever-lengthening bitmap for the tag assignments is a trap for the
> unwary. The path to perdition is lined with the bodies of those who believed
> "We'll never need more than x..."
>

640 kb?

> As for setting up a hierarchy, that's trickier. One way to handle that is to
> work like libraries do: 10 is "fiction", 10.05 is "crime novels", 10.05.07 is
> "British authors", and so forth. Your `tags` table then looks like
>

Thanks. I prefer the "parent tag" field, though, I feel that it is
more flexible.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Organisational question: surely someone has implemented manyBoolean values (tags) and a solution

am 20.01.2011 17:26:18 von Dotan Cohen

On Thu, Jan 20, 2011 at 17:22, Peter Brawley
wrote:
> I'd exclude (1) because new tags require restructuring the table, (2)
> and (3) because they break a cardinal rule of design and will be a mess
> to query, leaving ...
>
> 4) Standard many-many bridge table:
> mysql> Â CREATE TABLE items_tags (
> Â id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> Â itemID int,
> Â tagID INT
> );
>
> Will not require a major overhaul if you later turn categories into a tre=
e.
>

Terrific, Peter, this looks like the right direction. I appreciate the inpu=
t.

--=20
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 18:21:27 von Richard Quadling

On 20 January 2011 16:20, Dotan Cohen wrote:
> On Thu, Jan 20, 2011 at 17:00, Richard Quadling wrote:
>> I'd have my items table, my tags table and a join table for the two.
>> My join table is really simple. UniqueID, ItemID, TagID.
>>
>
> Yes, that is the first approach that I mentioned. It looks to be a
> good compromise.
>
>
>> I'd recommend using a nested set approach for the tags
>> (http://dev.mysql.com/tech-resources/articles/hierarchical-d ata.html
>> gives a good explanation on the issues and methodology of nested
>> sets).
>>
>
> That is terrific, at least the first half. The second half, with the
> Venn diagrams, is awkward!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

When you get heavily nested data, the adjacent set model (where you
have a parentid for every uniqueid), you very quickly get into
complicated logic trying to traverse n-levels. The nested set model is
specifically built to handle this issue. I'd recommend getting to
grips with it. It will make finding items belonging to a group (or a
super group) a LOT easier.

Especially if you have multiple tag hierarchies.

--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Organisational question: surely someone has implementedmany Boolean values (tags) and a solution

am 20.01.2011 18:57:05 von David Harkness

--00163649a31995e2e7049a4adc1e
Content-Type: text/plain; charset=ISO-8859-1

I cannot agree more with the others about using a join table. While it's
tempting to go with your first solution due to fear of performance issues,
you can usually address performance issues with a technical solution.
Addressing problems that arise from a constraining design choice is much
more difficult.

David

--00163649a31995e2e7049a4adc1e--

Re: Organisational question: surely someone has implementedmany Boolean values (tags) and a solution

am 20.01.2011 19:50:18 von David Hutto

Pseudo = Design Algorithm
Design Algorithm = Actual Code
Actual Code = Alterable db tables
Alterable db tables = manipulated data through the app interface with data

--
The lawyer in me says argue...even if you're wrong. The scientist in
me... says shut up, listen, and then argue. But the lawyer won on
appeal, so now I have to argue due to a court order.

Furthermore, if you could be a scientific celebrity, would you want
einstein sitting around with you on saturday morning, while you're
sitting in your undies, watching Underdog?...Or better yet, would
Einstein want you to violate his Underdog time?

Can you imagine Einstein sitting around in his underware? Thinking
about the relativity between his pubic nardsac, and his Fruit of the
Looms, while knocking a few Dorito's crumbs off his inner brilliant
white thighs, and hailing E = mc**2, and licking the orangy,
delicious, Doritoey crust that layered his genetically rippled
fingertips?

But then again, J. Edgar Hoover would want his pantyhose intertwined
within the equation.

However, I digress, momentarily.

But Einstein gave freely, for humanity, not for gain, other than
personal freedom.

An equation that benefited all, and yet gain is a personal product.

Also, if you can answer it, is gravity anymore than interplanetary static cling?

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 20:20:39 von Dotan Cohen

On Thu, Jan 20, 2011 at 19:21, Richard Quadling wrote:
>> That is terrific, at least the first half. The second half, with the
>> Venn diagrams, is awkward!
>
> When you get heavily nested data, the adjacent set model (where you
> have a parentid for every uniqueid), you very quickly get into
> complicated logic trying to traverse n-levels. The nested set model is
> specifically built to handle this issue. I'd recommend getting to
> grips with it. It will make finding items belonging to a group (or a
> super group) a LOT easier.
>
> Especially if you have multiple tag hierarchies.
>

Is that strategy widely deployed, then? It seems so unruly having to
change on average half the database records for every new leaf.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 20:22:09 von Dotan Cohen

On Thu, Jan 20, 2011 at 20:50, David Hutto wrote:
> Pseudo = Design Algorithm
> Design Algorithm = Actual Code
> Actual Code = Alterable db tables
> Alterable db tables = manipulated data through the app interface with data
>
> --
> The lawyer in me says argue...even if you're wrong. The scientist in
> me... says shut up, listen, and then argue. But the lawyer won on
> appeal, so now I have to argue due to a court order.
>
> Furthermore, if you could be a scientific celebrity, would you want
> einstein sitting around with you on saturday morning, while you're
> sitting in your undies, watching Underdog?...Or better yet, would
> Einstein want you to violate his Underdog time?
>
> Can you imagine Einstein sitting around in his underware? Thinking
> about the relativity between his pubic nardsac, and his Fruit of the
> Looms, while knocking a few Dorito's crumbs off his inner brilliant
> white thighs, and hailing E = mc**2, and licking the orangy,
> delicious, Doritoey crust that layered his genetically rippled
> fingertips?
>
> But then again, J. Edgar Hoover would want his pantyhose intertwined
> within the equation.
>
> However, I digress, momentarily.
>
> But Einstein gave freely, for humanity, not for gain, other than
> personal freedom.
>
> An equation that benefited all, and yet gain is a personal product.
>
> Also, if you can answer it, is gravity anymore than interplanetary static cling?
>

Is this a troll? Am I about to be baited?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 20:24:04 von David Hutto

> Is this a troll? Am I about to be baited?

Baited to deploy what is designed to the consumer's specification?
Surely. From what is wanted to what is needed. Troll on that.
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

--
The lawyer in me says argue...even if you're wrong. The scientist in
me... says shut up, listen, and then argue. But the lawyer won on
appeal, so now I have to argue due to a court order.

Furthermore, if you could be a scientific celebrity, would you want
einstein sitting around with you on saturday morning, while you're
sitting in your undies, watching Underdog?...Or better yet, would
Einstein want you to violate his Underdog time?

Can you imagine Einstein sitting around in his underware? Thinking
about the relativity between his pubic nardsac, and his Fruit of the
Looms, while knocking a few Dorito's crumbs off his inner brilliant
white thighs, and hailing E = mc**2, and licking the orangy,
delicious, Doritoey crust that layered his genetically rippled
fingertips?

But then again, J. Edgar Hoover would want his pantyhose intertwined
within the equation.

However, I digress, momentarily.

But Einstein gave freely, for humanity, not for gain, other than
personal freedom.

An equation that benefited all, and yet gain is a personal product.

Also, if you can answer it, is gravity anymore than interplanetary static cling?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 20:26:53 von Dotan Cohen

On Thu, Jan 20, 2011 at 21:24, David Hutto wrote:
>> Is this a troll? Am I about to be baited?
>
> Baited to deploy what is designed to the consumer's specification?
> Surely. From what is wanted to what is needed. Troll on that.

Actually, I'm the customer! But assuming that a customer exists, that
implies compensation, and therefore fair bait.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 20:32:16 von David Hutto

On Thu, Jan 20, 2011 at 2:26 PM, Dotan Cohen wrote:
> On Thu, Jan 20, 2011 at 21:24, David Hutto wrote:
>>> Is this a troll? Am I about to be baited?
>>
>> Baited to deploy what is designed to the consumer's specification?
>> Surely. From what is wanted to what is needed. Troll on that.
>
> Actually, I'm the customer! But assuming that a customer exists, that
> implies compensation, and therefore fair bait.
Then that's different altogether. you get to decide what information
is displayed, and what information is 'sensed', and on what platform.

What do you want to sense and what do you want to display(not to say
I'm an expert, but I like to think in CS)?

>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

--
The lawyer in me says argue...even if you're wrong. The scientist in
me... says shut up, listen, and then argue. But the lawyer won on
appeal, so now I have to argue due to a court order.

Furthermore, if you could be a scientific celebrity, would you want
einstein sitting around with you on saturday morning, while you're
sitting in your undies, watching Underdog?...Or better yet, would
Einstein want you to violate his Underdog time?

Can you imagine Einstein sitting around in his underware? Thinking
about the relativity between his pubic nardsac, and his Fruit of the
Looms, while knocking a few Dorito's crumbs off his inner brilliant
white thighs, and hailing E = mc**2, and licking the orangy,
delicious, Doritoey crust that layered his genetically rippled
fingertips?

But then again, J. Edgar Hoover would want his pantyhose intertwined
within the equation.

However, I digress, momentarily.

But Einstein gave freely, for humanity, not for gain, other than
personal freedom.

An equation that benefited all, and yet gain is a personal product.

Also, if you can answer it, is gravity anymore than interplanetary static cling?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: Organisational question: surely someone has implemented many Boolean values (tags) and a solutio

am 20.01.2011 20:40:05 von Jerry Schwartz

>-----Original Message-----
>From: Dotan Cohen [mailto:dotancohen@gmail.com]
>Sent: Thursday, January 20, 2011 11:25 AM
>To: Jerry Schwartz
>Cc: mysql.; php-general.
>Subject: Re: Organisational question: surely someone has implemented many
>Boolean values (tags) and a solution exist
>
>
>> As for setting up a hierarchy, that's trickier. One way to handle that is
>> to
>> work like libraries do: 10 is "fiction", 10.05 is "crime novels", 10.05.07
>> is
>> "British authors", and so forth. Your `tags` table then looks like
>>
>
>Thanks. I prefer the "parent tag" field, though, I feel that it is
>more flexible.
>
>
[JS] I disagree. The method I proposed can be extended to any depth, and any
leaf or branch can be retrieved with a single query.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com

>--
>Dotan Cohen
>
>http://gibberish.co.il
>http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] RE: Organisational question: surely someone has implementedmany Boolean values (tags) and

am 20.01.2011 20:43:41 von David Hutto

On Thu, Jan 20, 2011 at 2:40 PM, Jerry Schwartz wrote:
>>-----Original Message-----
>>From: Dotan Cohen [mailto:dotancohen@gmail.com]
>>Sent: Thursday, January 20, 2011 11:25 AM
>>To: Jerry Schwartz
>>Cc: mysql.; php-general.
>>Subject: Re: Organisational question: surely someone has implemented many
>>Boolean values (tags) and a solution exist
>>
>>
>>> As for setting up a hierarchy, that's trickier. One way to handle that is
>>> to
>>> work like libraries do: 10 is "fiction", 10.05 is "crime novels", 10.05.07
>>> is
>>> "British authors", and so forth. Your `tags` table then looks like
>>>
>>
>>Thanks. I prefer the "parent tag" field, though, I feel that it is
>>more flexible.
>>
>>
> [JS] I disagree. The method I proposed can be extended to any depth, and any
> leaf or branch can be retrieved with a single query.

No one argues with method of implementation(not that i explored the
individual ideas), but with that available, you can hierarchy the
concept of your db conceptualization.

>
> Regards,
>
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796 / FAX: 860.674.8341
> E-mail: jerry@gii.co.jp
> Web site: www.the-infoshop.com
>
>
>
>>--
>>Dotan Cohen
>>
>>http://gibberish.co.il
>>http://what-is-what.com
>
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

--
The lawyer in me says argue...even if you're wrong. The scientist in
me... says shut up, listen, and then argue. But the lawyer won on
appeal, so now I have to argue due to a court order.

Furthermore, if you could be a scientific celebrity, would you want
einstein sitting around with you on saturday morning, while you're
sitting in your undies, watching Underdog?...Or better yet, would
Einstein want you to violate his Underdog time?

Can you imagine Einstein sitting around in his underware? Thinking
about the relativity between his pubic nardsac, and his Fruit of the
Looms, while knocking a few Dorito's crumbs off his inner brilliant
white thighs, and hailing E = mc**2, and licking the orangy,
delicious, Doritoey crust that layered his genetically rippled
fingertips?

But then again, J. Edgar Hoover would want his pantyhose intertwined
within the equation.

However, I digress, momentarily.

But Einstein gave freely, for humanity, not for gain, other than
personal freedom.

An equation that benefited all, and yet gain is a personal product.

Also, if you can answer it, is gravity anymore than interplanetary static cling?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] RE: Organisational question: surely someone has implementedmany Boolean values (tags) and

am 20.01.2011 20:59:38 von Dotan Cohen

On Thu, Jan 20, 2011 at 21:40, Jerry Schwartz wrote:
>>Thanks. I prefer the "parent tag" field, though, I feel that it is
>>more flexible.
>>
>>
> [JS] I disagree. The method I proposed can be extended to any depth, and any
> leaf or branch can be retrieved with a single query.
>

I suppose for retrievals this structure has advantages, but unless
MySQL has a ++ operator (or better yet, one that adds or subtracts 2
from an int) then it looks to be a pain to add nodes.

But I will play with the idea. Maybe after I write the code (I'm
saving that for tomorrow) I'll see it differently. Thanks.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 21:05:53 von David Harkness

--0016e6d77e5ef76d80049a4caa70
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Jan 20, 2011 at 7:00 AM, Richard Quadling wrote:

> I'd recommend using a nested set approach for the tags
> (http://dev.mysql.com/tech-resources/articles/hierarchical-d ata.html
> gives a good explanation on the issues and methodology of nested
> sets).
>

Thanks for the link. That article proposes an interesting way to organize
the categories. Have you implemented this in the wild? Clearly the design
would work as it's pretty simple, and I like that it removes the need for
recursive queries.

Dotan, the Venn diagrams are just used to explain the concept. If you use
the code to determine the left and right values, you can ignore the diagrams
entirely. As long as you're not adding/removing categories every minute,
having to recalculate left and right values isn't that big of a deal.

Also, there's no reason you couldn't keep the parent_id field with the
nested sets. It would come in handy for certain types of queries, though
it's not necessary.

On Thu, Jan 20, 2011 at 11:40 AM, Jerry Schwartz wrote:

> I disagree. The method I proposed can be extended to any depth, and any
> leaf or branch can be retrieved with a single query.
>

The nested set method can be extended to any depth, and it pays off more the
larger the hierarchy grows. While you can retrieve any branch (all
ancestors) of a node with a single SQL query, the SQL engine itself actually
must perform a recursive query meaning multiple hits on the parent_id index.

On Thu, Jan 20, 2011 at 11:59 AM, Dotan Cohen wrote:

> I suppose for retrievals this structure has advantages, but unless
> MySQL has a ++ operator (or better yet, one that adds or subtracts 2
> from an int) then it looks to be a pain to add nodes.
>

++ or += wouldn't be any better here than x = x + 2. Once you're modifying
indexed values, you'll pay a much higher price writing to disk than += could
ever save you in CPU cycles. The beauty is that inserting a node requires
only two update statements that will fix *all* categories that need to be
adjusted. Adding categories to the hierarchical model is definitely faster
so it comes down to your insert-to-select ratio. Moving a subtree is also
much easier with the hierarchical model.

David

--0016e6d77e5ef76d80049a4caa70--

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 21:21:29 von Dotan Cohen

On Thu, Jan 20, 2011 at 22:05, David Harkness wrote:
> Thanks for the link. That article proposes an interesting way to organize
> the categories. Have you implemented this in the wild? Clearly the design
> would work as it's pretty simple, and I like that it removes the need for
> recursive queries.

I am also interested in knowing if this approach is used in any production code.

> Dotan, the Venn diagrams are just used to explain the concept. If you use
> the code to determine the left and right values, you can ignore the diagrams
> entirely. As long as you're not adding/removing categories every minute,
> having to recalculate left and right values isn't that big of a deal.

I understood that. My concern is exactly with adding new nodes. There
is no incrementor (++i) in SQL, so knowingly coding a solution that
will require incrementing two fields in half the database rows seems
irresponsible.

> Also, there's no reason you couldn't keep the parent_id field with the
> nested sets. It would come in handy for certain types of queries, though
> it's not necessary.

That is true. I could store both methods, and experiment to see which
is preferable. But what a mess this would be if the two methods go out
of sync! Isn't there a name for that in SQL, something along the lines
of not storing the same data in two places lest one should change and
not the other? The term escapes me.

>> I disagree. The method I proposed can be extended to any depth, and any
>> leaf or branch can be retrieved with a single query.
>
> The nested set method can be extended to any depth, and it pays off more the
> larger the hierarchy grows. While you can retrieve any branch (all
> ancestors) of a node with a single SQL query, the SQL engine itself actually
> must perform a recursive query meaning multiple hits on the parent_id index.

That pays off more? For the guy writing code or for the database
memory requirement?

>> I suppose for retrievals this structure has advantages, but unless
>> MySQL has a ++ operator (or better yet, one that adds or subtracts 2
>> from an int) then it looks to be a pain to add nodes.
>
> ++ or += wouldn't be any better here than x = x + 2. Once you're modifying
> indexed values, you'll pay a much higher price writing to disk than += could
> ever save you in CPU cycles. The beauty is that inserting a node requires
> only two update statements that will fix *all* categories that need to be
> adjusted.

Only two update statements, but they are affecting on average half the
database's rows!

> Adding categories to the hierarchical model is definitely faster
> so it comes down to your insert-to-select ratio. Moving a subtree is also
> much easier with the hierarchical model.

Which do you call the hierarchical model? That term is not used in the
linked article.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 21:58:38 von David Harkness

--0016e6d77e5ed9af12049a4d6597
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Jan 20, 2011 at 12:21 PM, Dotan Cohen wrote:

> I understood that. My concern is exactly with adding new nodes. There
> is no incrementor (++i) in SQL, so knowingly coding a solution that
> will require incrementing two fields in half the database rows seems
> irresponsible.
>

It only requires updating the category rows. If you have several hundred
categories this is a non-issue. If you have several thousand categories, you
probably have millions of products, and you'd want to do some performance
analysis on it. Even still, this is necessary only when adding new
categories.

If you are doing this often, you could leave spaces in the left and right
values so that you could minimize the number of rows that need to be
updated. The article makes every leaf use x and x+1 for left and right which
forces another update to add a child. If instead you used x and x+20 you'd
leave space for more children without any updates. This could be applied
from top to bottom, starting with the root category getting 0 and MAX_INT
for its values.

However, it's probably not even worth applying that complexity until you
prove that frequent category additions are causing problems. Most systems
will be querying against the categories table far more frequently, and
that's where this model pays off. If you want to see all products in
category X and its subcategories, it's a single *non-recursive* query.
That's huge if you are doing a lot of searches like this.

But what a mess this would be if the two methods go out of sync!
>

Sure, but these values would be maintained by your code--not end-users. It
just comes down to making sure your code is correct through appropriate unit
tests. By moving the logic to a stored procedure, you can ensure the table
is locked during the updates to keep two users from adding a new category
simultaneously.

That pays off more? For the guy writing code or for the database
> memory requirement?
>

Performance-wise. The nested set method looks to be moderately more complex
code-wise, but luckily that is done just once while querying the database is
done again and again. As with all optimizations, it's best to measure and
make sure there's a problem before trying to solve it. Once you've built a
few hierarchical systems, you'll be able to make a gut call up front.

Only two update statements, but they are affecting on average half the
> database's rows!
>

Of a single table: categories. Hopefully you have far more items that get
categorized than you do categories.

> Which do you call the hierarchical model? That term is not used in the
> linked article.
>

Well, both models are hierarchical in the sense that there's a parent-child
relationship. By hierarchical here I mean that the method of implementation
involves each category pointing to its parent directly via a parent_id
column. Searching for all subcategories of category X requires searching
first for all children, then all grandchildren, and so on, resulting in a
recursive query.

Using the nested sets model requires a single non-recursive query to get the
same data.

David

--0016e6d77e5ed9af12049a4d6597--

RE: [PHP] RE: Organisational question: surely someone has implemented many Boolean values (tags) and

am 20.01.2011 22:46:31 von Jerry Schwartz

>> [JS] I disagree. The method I proposed can be extended to any depth, and
>> any
>> leaf or branch can be retrieved with a single query.
>>
>
>I suppose for retrievals this structure has advantages, but unless
>MySQL has a ++ operator (or better yet, one that adds or subtracts 2
>from an int) then it looks to be a pain to add nodes.
>
[JS] Not at all. Somebody, somehow, has to assign a name to the tag and
designate its place in the hierarchy. I don't see how you can avoid that being
done by a human.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com

>But I will play with the idea. Maybe after I write the code (I'm
>saving that for tomorrow) I'll see it differently. Thanks.
>
>--
>Dotan Cohen
>
>http://gibberish.co.il
>http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 20.01.2011 23:28:08 von Peter Brawley

>My concern is exactly with adding new nodes. There
>is no incrementor (++i) in SQL, so knowingly coding a solution that
>will require incrementing two fields in half the database rows seems
>irresponsible.

Yes, and an edge list model may perform better in other respects too:

http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20 .html
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20 .html

PB

-----

On 1/20/2011 2:21 PM, Dotan Cohen wrote:
> On Thu, Jan 20, 2011 at 22:05, David Harkness wrote:
>> Thanks for the link. That article proposes an interesting way to organize
>> the categories. Have you implemented this in the wild? Clearly the design
>> would work as it's pretty simple, and I like that it removes the need for
>> recursive queries.
> I am also interested in knowing if this approach is used in any production code.
>
>
>> Dotan, the Venn diagrams are just used to explain the concept. If you use
>> the code to determine the left and right values, you can ignore the diagrams
>> entirely. As long as you're not adding/removing categories every minute,
>> having to recalculate left and right values isn't that big of a deal.
> I understood that. My concern is exactly with adding new nodes. There
> is no incrementor (++i) in SQL, so knowingly coding a solution that
> will require incrementing two fields in half the database rows seems
> irresponsible.
>
>
>> Also, there's no reason you couldn't keep the parent_id field with the
>> nested sets. It would come in handy for certain types of queries, though
>> it's not necessary.
> That is true. I could store both methods, and experiment to see which
> is preferable. But what a mess this would be if the two methods go out
> of sync! Isn't there a name for that in SQL, something along the lines
> of not storing the same data in two places lest one should change and
> not the other? The term escapes me.
>
>
>>> I disagree. The method I proposed can be extended to any depth, and any
>>> leaf or branch can be retrieved with a single query.
>> The nested set method can be extended to any depth, and it pays off more the
>> larger the hierarchy grows. While you can retrieve any branch (all
>> ancestors) of a node with a single SQL query, the SQL engine itself actually
>> must perform a recursive query meaning multiple hits on the parent_id index.
> That pays off more? For the guy writing code or for the database
> memory requirement?
>
>
>>> I suppose for retrievals this structure has advantages, but unless
>>> MySQL has a ++ operator (or better yet, one that adds or subtracts 2
>>> from an int) then it looks to be a pain to add nodes.
>> ++ or += wouldn't be any better here than x = x + 2. Once you're modifying
>> indexed values, you'll pay a much higher price writing to disk than += could
>> ever save you in CPU cycles. The beauty is that inserting a node requires
>> only two update statements that will fix *all* categories that need to be
>> adjusted.
> Only two update statements, but they are affecting on average half the
> database's rows!
>
>
>> Adding categories to the hierarchical model is definitely faster
>> so it comes down to your insert-to-select ratio. Moving a subtree is also
>> much easier with the hierarchical model.
> Which do you call the hierarchical model? That term is not used in the
> linked article.
>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 11:29:35 von Richard Quadling

On 20 January 2011 19:20, Dotan Cohen wrote:
> On Thu, Jan 20, 2011 at 19:21, Richard Quadling wrote:
>>> That is terrific, at least the first half. The second half, with the
>>> Venn diagrams, is awkward!
>>
>> When you get heavily nested data, the adjacent list model (where you
>> have a parentid for every uniqueid), you very quickly get into
>> complicated logic trying to traverse n-levels. The nested set model is
>> specifically built to handle this issue. I'd recommend getting to
>> grips with it. It will make finding items belonging to a group (or a
>> super group) a LOT easier.
>>
>> Especially if you have multiple tag hierarchies.
>>
>
> Is that strategy widely deployed, then? It seems so unruly having to
> change on average half the database records for every new leaf.
>
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

Changing data in a database is the role of the database engine. It is
much more efficient to have the cost on the insert than it is on the
select.

The adjacent list model is very expensive at n-levels for the select,
but trivial cost for the insert. If you are inserting millions of rows
but only occasionally looking at the data, then stick with the
adjacent list model. But if tags and n-levels are regularly accessed
and form a main part to the functionality of the app, then you may
want to reconsider.

Sure, the insert for the nested set model is more expensive in terms
of the number of rows to amend, but indexing will certainly should
certainly help. If you have tools to help optimize the tag table and
the queries you use, then I'd follow the recommendations (I use MS
SQL, so my Query Optimization tools help me here). The nested set
model is extremely efficient on the select.

It is a trade off that you have to decide upon, based upon your data
and needs. If, as I suspect, you are going to be doing a LOT of
selects on the tags and (in the future) to multiple levels, then this
aspect needs to be very efficient.

For me it is well worth the effort of moving from the adjacent list
model to the nested set model.

Both mechanisms work. In my opinion, the adjacent list model is for
truly simply lookups, not for complicated n-levels.

One of the changes I made to the nested set model was for a Bill Of
Materials module. The client made complex machinery (industrial
lathes). The sum quantity for all the parts were in the 20,000 region.
Each nut, bolt, screw, etc. Many sub-assemblies. The parent/child node
logic was massive in dealing with retrieving questions like "How many
machines can we build?", "What stock do we need to buy/make to
complete an order of 20 lathes?". Lot's of recursion into each level
to build the list. Getting the results would take 3 or 4 minutes (this
is in a non SQL environment using a peer-to-peer modified D-ISAM
database - it was already slow because of all that). When I moved to
the nested set model, no recursion and 1 query (more or less) and I
have all the results I needed. It was seconds in comparison.

Regards,

Richard.
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 13:31:11 von Dotan Cohen

>> Actually, I'm the customer! But assuming that a customer exists, that
>> implies compensation, and therefore fair bait.
> Then that's different altogether. you get to decide what information
> is displayed, and what information is 'sensed', and on what platform.
>

Yes, but before I get to that stage (relatively easy today with high
level languages such as PHP or C#) I need to decide how to organise
the data.

> What do you want to sense and what do you want to display(not to say
> I'm an expert, but I like to think in CS)?
>

The application will pull calendar records by category: entertainment,
food, gov, transportation, etc. The idea is that the user could query
for, say, postal offices open after 17:00 on Thursday, or films
between 20:00 and 22:00 on Monday, or buses leaving Amsterdam to
Rotterdam on Monday morning.

By the way, I figured out what the "troll" issue was. I was confusing
your signature for the body of the post. It was familiar, too, so I
thought that it may have been a troll post on /. or such. It turns out
that I had seen it on the Python-tutor list, and their I also mistook
it for a troll as the length of the sig far outweighs the length of
your typically concise and to-the-point post. Constructive advice:
trim the sig!

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 13:44:26 von Dotan Cohen

> If you are doing this often, you could leave spaces in the left and right
> values so that you could minimize the number of rows that need to be
> updated. The article makes every leaf use x and x+1 for left and right wh=
ich
> forces another update to add a child. If instead you used x and x+20 you'=
d
> leave space for more children without any updates. This could be applied
> from top to bottom, starting with the root category getting 0 and MAX_INT
> for its values.

Then I would have to check what values are available when inserting,
and possibly normalise every so often. I'll think about that, and when
I have enough data in the database I'll set up a test system to play
with the possibility.

> However, it's probably not even worth applying that complexity until you
> prove that frequent category additions are causing problems. Most systems
> will be querying against the categories table far more frequently, and
> that's where this model pays off. If you want to see all products in
> category X and its subcategories, it's a single *non-recursive* query.
> That's huge if you are doing a lot of searches like this.

You are right, that non-recursive bit is important. In fact, I think
that I'm convinced. Thanks!

>> But what a mess this would be if the two methods go outÂ of sync!
>
> Sure, but these values would be maintained by your code--not end-users. I=
t
> just comes down to making sure your code is correct through appropriate u=
nit
> tests. By moving the logic to a stored procedure, you can ensure the tabl=
e
> is locked during the updates to keep two users from adding a new category
> simultaneously.

So long as it is in fact my code, that's fine. But when others start
maintaining it and not reading comments, it may get ugly. That does
not apply to this particular pet project, but it is a consideration
for future projects.

>> That pays off more? For the guy writing code or for the database
>> memory requirement?
>
> Performance-wise. The nested set method looks to be moderately more compl=
ex
> code-wise, but luckily that is done just once while querying the database=
is
> done again and again. As with all optimizations, it's best to measure and
> make sure there's a problem before trying to solve it. Once you've built =
a
> few hierarchical systems, you'll be able to make a gut call up front.

I see, thanks. Good point about making sure that the problem exists
before trying to fix it, I've seen people optimise away where there is
no bottleneck.

>> Only two update statements, but they are affecting on average half the
>> database's rows!
>
> Of a single table: categories. Hopefully you have far more items that get
> categorized than you do categories.
>

True.

>> Which do you call the hierarchical model? That term is not used in the
>> linked article.
>
> Well, both models are hierarchical in the sense that there's a parent-chi=
ld
> relationship. By hierarchical here I mean that the method of implementati=
on
> involves each category pointing to its parent directly via a parent_id
> column. Searching for all subcategories of category X requires searching
> first for all children, then all grandchildren, and so on, resulting in a
> recursive query.
> Using the nested sets model requires a single non-recursive query to get =
the
> same data.
>

I do agree that the non-recursive method at retrieval time advantage
far outweighs the update-half-the-table issue upon addition of an
additional category.

Thanks!

--=20
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 13:47:42 von Dotan Cohen

> Yes, and an edge list model may perform better in other respects too:
>
> http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20 .html
> http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20 .html
>

Thanks. I am currently reading "Trees and Hierarchies in SQL for
Smarties" by Joe Celko, which also deals with a similar model. It is
revealing and an interesting way of looking at the issue of organising
data.

In addition to those two links, this one also seems relevant:
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20 .html

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 13:53:19 von Dotan Cohen

On Fri, Jan 21, 2011 at 12:29, Richard Quadling wrote:
> Changing data in a database is the role of the database engine. It is
> much more efficient to have the cost on the insert than it is on the
> select.
>

Agreed. On insert I could even delegate the operation to another
thread which does not timeout with the pageload.

> The adjacent list model is very expensive at n-levels for the select,
> but trivial cost for the insert. If you are inserting millions of rows
> but only occasionally looking at the data, then stick with the
> adjacent list model. But if tags and n-levels are regularly accessed
> and form a main part to the functionality of the app, then you may
> want to reconsider.
>

I've already reconsidered after some sleep and coffee!

> Sure, the insert for the nested set model is more expensive in terms
> of the number of rows to amend, but indexing will certainly should
> certainly help. If you have tools to help optimize the tag table and
> the queries you use, then I'd follow the recommendations (I use MS
> SQL, so my Query Optimization tools help me here). The nested set
> model is extremely efficient on the select.
>

Interesting. I am using MySQL for this application, but another hat I
wear is learning C# with MS tools and I will have to look into the
Query Optimisation.

> It is a trade off that you have to decide upon, based upon your data
> and needs. If, as I suspect, you are going to be doing a LOT of
> selects on the tags and (in the future) to multiple levels, then this
> aspect needs to be very efficient.
>

You suspect correctly.

> For me it is well worth the effort of moving from the adjacent list
> model to the nested set model.
>
> Both mechanisms work. In my opinion, the adjacent list model is for
> truly simply lookups, not for complicated n-levels.
>
> One of the changes I made to the nested set model was for a Bill Of
> Materials module. The client made complex machinery (industrial
> lathes). The sum quantity for all the parts were in the 20,000 region.
> Each nut, bolt, screw, etc. Many sub-assemblies. The parent/child node
> logic was massive in dealing with retrieving questions like "How many
> machines can we build?", "What stock do we need to buy/make to
> complete an order of 20 lathes?". Lot's of recursion into each level
> to build the list. Getting the results would take 3 or 4 minutes (this
> is in a non SQL environment using a peer-to-peer modified D-ISAM
> database - it was already slow because of all that). When I moved to
> the nested set model, no recursion and 1 query (more or less) and I
> have all the results I needed. It was seconds in comparison.
>

Thanks, I enjoy reading these real-life scenarios. This was a terrific example.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: [PHP] Organisational question: surely someone has implementedmany Boolean values (tags) and a so

am 21.01.2011 18:52:24 von David Harkness

--0016e65a0dfca5b16e049a5ee93f
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 21, 2011 at 4:44 AM, Dotan Cohen wrote:

> Then I would have to check what values are available when inserting,
> and possibly normalise every so often. I'll think about that, and when
> I have enough data in the database I'll set up a test system to play
> with the possibility.
>

Yes, it's a much more complicated process, and one that I wouldn't enter
into without verifying there was a performance problem. The neat thing is
you could do this at any point as you can renumber the left and right values
at will--as long as you don't *reorder* them.

> I see, thanks. Good point about making sure that the problem exists
>
before trying to fix it, I've seen people optimise away where there is
> no bottleneck.
>

People who love to work on complicated problems often fall victim to
premature optimization because it's fun to do. I was guilty of this myself
earlier in my career, but as CPUs and memory and disk speeds increased over
the decades, it's harder to justify it. At least back then the optimization
that wasn't *necessary* still had a good payoff. Nowadays you can waste days
shaving 10ms off of an operation that takes 1s. Congratulations, you've just
poured $$ down the drain. But boy wasn't that fun! :)

Good luck with the implementation. I'd love to hear how it pans out once
you're done, and I'm sure the list would find it useful as well.

David

--0016e65a0dfca5b16e049a5ee93f--