A common request

am 29.03.2011 19:50:33 von Gregory Magarshak

Hey there. My company writes a lot of social applications, and there is
one operation that is very common, but I don't know if MySQL supports it
in a good way. I thought I'd write to this list for two reasons:

1) Maybe MySQL has a good way to do this, and I just don't know
about it

2) Propose to MySQL developers a simple algorithm which would
greatly improve MySQL support for social networking apps.

Here is the situation. Let's say I have built a social networking
application where people create and edit some item (article, photo,
music mix, whatever). Now, a typical user logs in, and this user has
3000 friends. How can I quickly find all the articles written by this
user's friends, and not just random articles?

Ideally, I would want to write something like this:

SELECT * FROM article WHERE user_id IN (345789, 324875, 398, ...,
349580)

basically, execute a query with a huge IN ( ... ). Maybe if this
would exceed the buffer size for the MySQL wire protocol, I would break
up the list into several lists, and execute several queries, and union
the results together myself.

But my point is, this is very common for social networking apps.
Every app wants to show "the X created by your friends", or "friends of
yours (given some list from a social network) who have taken action X".

Here is how I would do it if I had raw access to the MySQL index in
memory:

a) Sort the list of entries in the IN, in ascending order.

b) Do *ONE* binary search through the index (assuming it's a BTREE
index) and get them all in one pass. If it's a HASH index or something,
I would have to look up each one individually.

The benefits of this approach would be that this common operation
would be done extremely quickly. If the index fits entirely in memory,
and I just want to get the primary keys (i.e. get the list of friends
who did X), the disk isn't even touched. In addition, for BTREE indexes,
I would just need ONE binary search, because the entries have been
sorted in ascending order.

Does MySQL have something like this? And if not, perhaps you can
add it in the next version? It would really boost MySQL's support for
social networking apps tremendously. Alternative, how can I add this to
my MySQL? Any advice would be appreciated.

Sincerely,
Gregory Magarshak
Qbix

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 29.03.2011 20:10:53 von Peter Brawley

> How can I quickly find all the articles written by this user's
friends, and not just random articles?

Taking the simplest possible case, with table friends(userID,friendID)
where each friendID refers to a userID in another row, the friends of
userID u are ...

select friendID from user where userID=u;

so articles by those friends of u are ...

select a.* from article a join ( select friendID from user where
userID=u ) f on a.userID=f.friendID;

PB

-----

On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
> Hey there. My company writes a lot of social applications, and there
> is one operation that is very common, but I don't know if MySQL
> supports it in a good way. I thought I'd write to this list for two
> reasons:
>
> 1) Maybe MySQL has a good way to do this, and I just don't know
> about it
>
> 2) Propose to MySQL developers a simple algorithm which would
> greatly improve MySQL support for social networking apps.
>
> Here is the situation. Let's say I have built a social networking
> application where people create and edit some item (article, photo,
> music mix, whatever). Now, a typical user logs in, and this user has
> 3000 friends. How can I quickly find all the articles written by this
> user's friends, and not just random articles?
>
> Ideally, I would want to write something like this:
>
> SELECT * FROM article WHERE user_id IN (345789, 324875, 398, ...,
> 349580)
>
> basically, execute a query with a huge IN ( ... ). Maybe if this
> would exceed the buffer size for the MySQL wire protocol, I would
> break up the list into several lists, and execute several queries, and
> union the results together myself.
>
> But my point is, this is very common for social networking apps.
> Every app wants to show "the X created by your friends", or "friends
> of yours (given some list from a social network) who have taken action
> X".
>
> Here is how I would do it if I had raw access to the MySQL index
> in memory:
>
> a) Sort the list of entries in the IN, in ascending order.
>
> b) Do *ONE* binary search through the index (assuming it's a BTREE
> index) and get them all in one pass. If it's a HASH index or
> something, I would have to look up each one individually.
>
> The benefits of this approach would be that this common operation
> would be done extremely quickly. If the index fits entirely in memory,
> and I just want to get the primary keys (i.e. get the list of friends
> who did X), the disk isn't even touched. In addition, for BTREE
> indexes, I would just need ONE binary search, because the entries have
> been sorted in ascending order.
>
> Does MySQL have something like this? And if not, perhaps you can
> add it in the next version? It would really boost MySQL's support for
> social networking apps tremendously. Alternative, how can I add this
> to my MySQL? Any advice would be appreciated.
>
> Sincerely,
> Gregory Magarshak
> Qbix
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 29.03.2011 20:27:43 von Gregory Magarshak

Yes, this would be fine. But often, the list of friends is obtained from
a social network like facebook, and is not stored internally. Basically,
I obtain the friend list in a request to facebook, and then see which of
those users have created things. So would I have to create a temporary
table and insert all those uids just to make a join? Why not optimize
the IN ( ... ) to do the same type of thing?

There is also a second problem: I want to use MySQL Cluster, because I
expect to have many users. Would it be efficient to use JOIN between the
friends table and the articles table? Both tables are partitioned by
user_id as the primary key, so the join would have to hit many different
nodes. I always tried to avoid joins because I am planning to
horizontally partition my data. But if MySQL cluster can handle this
join transparently and split it up based on the partition, then that's
fine. Do you have any info on this?

Greg

On 3/29/11 2:10 PM, Peter Brawley wrote:
> > How can I quickly find all the articles written by this user's
> friends, and not just random articles?
>
> Taking the simplest possible case, with table friends(userID,friendID)
> where each friendID refers to a userID in another row, the friends of
> userID u are ...
>
> select friendID from user where userID=u;
>
> so articles by those friends of u are ...
>
> select a.* from article a join ( select friendID from user where
> userID=u ) f on a.userID=f.friendID;
>
> PB
>
> -----
>
> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>> Hey there. My company writes a lot of social applications, and there
>> is one operation that is very common, but I don't know if MySQL
>> supports it in a good way. I thought I'd write to this list for two
>> reasons:
>>
>> 1) Maybe MySQL has a good way to do this, and I just don't know
>> about it
>>
>> 2) Propose to MySQL developers a simple algorithm which would
>> greatly improve MySQL support for social networking apps.
>>
>> Here is the situation. Let's say I have built a social networking
>> application where people create and edit some item (article, photo,
>> music mix, whatever). Now, a typical user logs in, and this user has
>> 3000 friends. How can I quickly find all the articles written by this
>> user's friends, and not just random articles?
>>
>> Ideally, I would want to write something like this:
>>
>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398, ...,
>> 349580)
>>
>> basically, execute a query with a huge IN ( ... ). Maybe if this
>> would exceed the buffer size for the MySQL wire protocol, I would
>> break up the list into several lists, and execute several queries,
>> and union the results together myself.
>>
>> But my point is, this is very common for social networking apps.
>> Every app wants to show "the X created by your friends", or "friends
>> of yours (given some list from a social network) who have taken
>> action X".
>>
>> Here is how I would do it if I had raw access to the MySQL index
>> in memory:
>>
>> a) Sort the list of entries in the IN, in ascending order.
>>
>> b) Do *ONE* binary search through the index (assuming it's a
>> BTREE index) and get them all in one pass. If it's a HASH index or
>> something, I would have to look up each one individually.
>>
>> The benefits of this approach would be that this common operation
>> would be done extremely quickly. If the index fits entirely in
>> memory, and I just want to get the primary keys (i.e. get the list of
>> friends who did X), the disk isn't even touched. In addition, for
>> BTREE indexes, I would just need ONE binary search, because the
>> entries have been sorted in ascending order.
>>
>> Does MySQL have something like this? And if not, perhaps you can
>> add it in the next version? It would really boost MySQL's support for
>> social networking apps tremendously. Alternative, how can I add this
>> to my MySQL? Any advice would be appreciated.
>>
>> Sincerely,
>> Gregory Magarshak
>> Qbix
>>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 29.03.2011 22:17:29 von Peter Brawley

> Why not optimize the IN ( ... ) to do the same type of thing?

If the argument to IN() is a list of values, it'll be OK. If it's a
SELECT, in 5.0 it will be slower than molasses (see "The unbearable
slowness of IN()" at http://www.artfulsoftware.com/queries.php.

> I always tried to avoid joins because I am planning to horizontally
partition my data.

A severe & unfortunate constraint. Can't help you there.

PB

-----

On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
> Yes, this would be fine. But often, the list of friends is obtained
> from a social network like facebook, and is not stored internally.
> Basically, I obtain the friend list in a request to facebook, and then
> see which of those users have created things. So would I have to
> create a temporary table and insert all those uids just to make a
> join? Why not optimize the IN ( ... ) to do the same type of thing?
>
> There is also a second problem: I want to use MySQL Cluster, because I
> expect to have many users. Would it be efficient to use JOIN between
> the friends table and the articles table? Both tables are partitioned
> by user_id as the primary key, so the join would have to hit many
> different nodes. I always tried to avoid joins because I am planning
> to horizontally partition my data. But if MySQL cluster can handle
> this join transparently and split it up based on the partition, then
> that's fine. Do you have any info on this?
>
> Greg
>
> On 3/29/11 2:10 PM, Peter Brawley wrote:
>> > How can I quickly find all the articles written by this user's
>> friends, and not just random articles?
>>
>> Taking the simplest possible case, with table
>> friends(userID,friendID) where each friendID refers to a userID in
>> another row, the friends of userID u are ...
>>
>> select friendID from user where userID=u;
>>
>> so articles by those friends of u are ...
>>
>> select a.* from article a join ( select friendID from user where
>> userID=u ) f on a.userID=f.friendID;
>>
>> PB
>>
>> -----
>>
>> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>> Hey there. My company writes a lot of social applications, and there
>>> is one operation that is very common, but I don't know if MySQL
>>> supports it in a good way. I thought I'd write to this list for two
>>> reasons:
>>>
>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>> about it
>>>
>>> 2) Propose to MySQL developers a simple algorithm which would
>>> greatly improve MySQL support for social networking apps.
>>>
>>> Here is the situation. Let's say I have built a social
>>> networking application where people create and edit some item
>>> (article, photo, music mix, whatever). Now, a typical user logs in,
>>> and this user has 3000 friends. How can I quickly find all the
>>> articles written by this user's friends, and not just random articles?
>>>
>>> Ideally, I would want to write something like this:
>>>
>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>> ..., 349580)
>>>
>>> basically, execute a query with a huge IN ( ... ). Maybe if this
>>> would exceed the buffer size for the MySQL wire protocol, I would
>>> break up the list into several lists, and execute several queries,
>>> and union the results together myself.
>>>
>>> But my point is, this is very common for social networking apps.
>>> Every app wants to show "the X created by your friends", or "friends
>>> of yours (given some list from a social network) who have taken
>>> action X".
>>>
>>> Here is how I would do it if I had raw access to the MySQL index
>>> in memory:
>>>
>>> a) Sort the list of entries in the IN, in ascending order.
>>>
>>> b) Do *ONE* binary search through the index (assuming it's a
>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>> something, I would have to look up each one individually.
>>>
>>> The benefits of this approach would be that this common
>>> operation would be done extremely quickly. If the index fits
>>> entirely in memory, and I just want to get the primary keys (i.e.
>>> get the list of friends who did X), the disk isn't even touched. In
>>> addition, for BTREE indexes, I would just need ONE binary search,
>>> because the entries have been sorted in ascending order.
>>>
>>> Does MySQL have something like this? And if not, perhaps you can
>>> add it in the next version? It would really boost MySQL's support
>>> for social networking apps tremendously. Alternative, how can I add
>>> this to my MySQL? Any advice would be appreciated.
>>>
>>> Sincerely,
>>> Gregory Magarshak
>>> Qbix
>>>
>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 29.03.2011 22:53:16 von Sander de Bruijne

Hi Gregory,

Are you sure you'd like to do this using MySQL? What would happen if you
start using sharding?

Maybe you could consider using a stack (stored in a tool like Redis?).
Whenever some user adds some item, you add primary key of the new item
to the "network updates" stack of each friend of the user (and remove
the last one). This way, your reads will be fast and you don't need
complex joins over multiple shards. Just one of my first ideas which
came up.

Any thoughts?

Best regards,
Sander

On 03/29/2011 07:50 PM, Gregory Magarshak wrote:
> Hey there. My company writes a lot of social applications, and there
> is one operation that is very common, but I don't know if MySQL
> supports it in a good way. I thought I'd write to this list for two
> reasons:
>
> 1) Maybe MySQL has a good way to do this, and I just don't know
> about it
>
> 2) Propose to MySQL developers a simple algorithm which would
> greatly improve MySQL support for social networking apps.
>
> Here is the situation. Let's say I have built a social networking
> application where people create and edit some item (article, photo,
> music mix, whatever). Now, a typical user logs in, and this user has
> 3000 friends. How can I quickly find all the articles written by this
> user's friends, and not just random articles?
>
> Ideally, I would want to write something like this:
>
> SELECT * FROM article WHERE user_id IN (345789, 324875, 398, ...,
> 349580)
>
> basically, execute a query with a huge IN ( ... ). Maybe if this
> would exceed the buffer size for the MySQL wire protocol, I would
> break up the list into several lists, and execute several queries, and
> union the results together myself.
>
> But my point is, this is very common for social networking apps.
> Every app wants to show "the X created by your friends", or "friends
> of yours (given some list from a social network) who have taken action
> X".
>
> Here is how I would do it if I had raw access to the MySQL index
> in memory:
>
> a) Sort the list of entries in the IN, in ascending order.
>
> b) Do *ONE* binary search through the index (assuming it's a BTREE
> index) and get them all in one pass. If it's a HASH index or
> something, I would have to look up each one individually.
>
> The benefits of this approach would be that this common operation
> would be done extremely quickly. If the index fits entirely in memory,
> and I just want to get the primary keys (i.e. get the list of friends
> who did X), the disk isn't even touched. In addition, for BTREE
> indexes, I would just need ONE binary search, because the entries have
> been sorted in ascending order.
>
> Does MySQL have something like this? And if not, perhaps you can
> add it in the next version? It would really boost MySQL's support for
> social networking apps tremendously. Alternative, how can I add this
> to my MySQL? Any advice would be appreciated.
>
> Sincerely,
> Gregory Magarshak
> Qbix
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 14:29:35 von Gregory Magarshak

Thanks for your insight! But I'm still worried about the performance of
IN ( big list of values ). Can you tell me how it is implemented?

Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119, ...,
387945)

1) If I put 200 values there, does it do 200 individual SELECTs
internally, and union them? Or does it notice that c has a "UNIQUE"
index and thus at most one row can be returned per SELECT, and does them
all at once?

2) If I want to get just the primary key, or join with another table
based on just the primary key, does this query ever touch the disk
(assuming the index is in memory, which I think it always is -- correct
me if I'm wrong about that).

The way I would recommend doing it (for BTREE indexes, anyway) is to
sort the values in ascending order, and do the search in one pass
through the index. The index is already in memory, and it would be
straightforward to modify a binary search algorithm to find the rows
corresponding to monotonically ascending values of the primary key, all
in one pass.

Even if the binary search algorithm is run 200 or 2000 times for a list,
it would still be faster than hitting the disk. (Even though the CPU
cache performance would be worse.)

Can you let me know the specifics of it, and especially how I can avoid
hitting the I/O bottlenecks?

Thank you,
Greg

On 3/29/11 4:17 PM, Peter Brawley wrote:
> > Why not optimize the IN ( ... ) to do the same type of thing?
>
> If the argument to IN() is a list of values, it'll be OK. If it's a
> SELECT, in 5.0 it will be slower than molasses (see "The unbearable
> slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>
> > I always tried to avoid joins because I am planning to horizontally
> partition my data.
>
> A severe & unfortunate constraint. Can't help you there.
>
> PB
>
> -----
>
> On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>> Yes, this would be fine. But often, the list of friends is obtained
>> from a social network like facebook, and is not stored internally.
>> Basically, I obtain the friend list in a request to facebook, and
>> then see which of those users have created things. So would I have to
>> create a temporary table and insert all those uids just to make a
>> join? Why not optimize the IN ( ... ) to do the same type of thing?
>>
>> There is also a second problem: I want to use MySQL Cluster, because
>> I expect to have many users. Would it be efficient to use JOIN
>> between the friends table and the articles table? Both tables are
>> partitioned by user_id as the primary key, so the join would have to
>> hit many different nodes. I always tried to avoid joins because I am
>> planning to horizontally partition my data. But if MySQL cluster can
>> handle this join transparently and split it up based on the
>> partition, then that's fine. Do you have any info on this?
>>
>> Greg
>>
>> On 3/29/11 2:10 PM, Peter Brawley wrote:
>>> > How can I quickly find all the articles written by this user's
>>> friends, and not just random articles?
>>>
>>> Taking the simplest possible case, with table
>>> friends(userID,friendID) where each friendID refers to a userID in
>>> another row, the friends of userID u are ...
>>>
>>> select friendID from user where userID=u;
>>>
>>> so articles by those friends of u are ...
>>>
>>> select a.* from article a join ( select friendID from user where
>>> userID=u ) f on a.userID=f.friendID;
>>>
>>> PB
>>>
>>> -----
>>>
>>> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>> Hey there. My company writes a lot of social applications, and
>>>> there is one operation that is very common, but I don't know if
>>>> MySQL supports it in a good way. I thought I'd write to this list
>>>> for two reasons:
>>>>
>>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>>> about it
>>>>
>>>> 2) Propose to MySQL developers a simple algorithm which would
>>>> greatly improve MySQL support for social networking apps.
>>>>
>>>> Here is the situation. Let's say I have built a social
>>>> networking application where people create and edit some item
>>>> (article, photo, music mix, whatever). Now, a typical user logs in,
>>>> and this user has 3000 friends. How can I quickly find all the
>>>> articles written by this user's friends, and not just random articles?
>>>>
>>>> Ideally, I would want to write something like this:
>>>>
>>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>>> ..., 349580)
>>>>
>>>> basically, execute a query with a huge IN ( ... ). Maybe if
>>>> this would exceed the buffer size for the MySQL wire protocol, I
>>>> would break up the list into several lists, and execute several
>>>> queries, and union the results together myself.
>>>>
>>>> But my point is, this is very common for social networking
>>>> apps. Every app wants to show "the X created by your friends", or
>>>> "friends of yours (given some list from a social network) who have
>>>> taken action X".
>>>>
>>>> Here is how I would do it if I had raw access to the MySQL
>>>> index in memory:
>>>>
>>>> a) Sort the list of entries in the IN, in ascending order.
>>>>
>>>> b) Do *ONE* binary search through the index (assuming it's a
>>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>>> something, I would have to look up each one individually.
>>>>
>>>> The benefits of this approach would be that this common
>>>> operation would be done extremely quickly. If the index fits
>>>> entirely in memory, and I just want to get the primary keys (i.e.
>>>> get the list of friends who did X), the disk isn't even touched. In
>>>> addition, for BTREE indexes, I would just need ONE binary search,
>>>> because the entries have been sorted in ascending order.
>>>>
>>>> Does MySQL have something like this? And if not, perhaps you
>>>> can add it in the next version? It would really boost MySQL's
>>>> support for social networking apps tremendously. Alternative, how
>>>> can I add this to my MySQL? Any advice would be appreciated.
>>>>
>>>> Sincerely,
>>>> Gregory Magarshak
>>>> Qbix
>>>>
>>
>>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 14:32:09 von Gregory Magarshak

By the way, sorry ... I wanted to clarify one thing:

I am trying to FILTER by the unique index (fb_uid) in this case, but
JOIN on the primary key (id) and I don't need to use any other fields. I
am guessing that the MySQL indexes map indexed fields (fb_uid) to the
primary key (id) so I wouldn't have to touch the disk. Am I right about
that?

On 3/31/11 8:29 AM, Gregory Magarshak wrote:
> Thanks for your insight! But I'm still worried about the performance
> of IN ( big list of values ). Can you tell me how it is implemented?
>
> Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119,
> ..., 387945)
>
> 1) If I put 200 values there, does it do 200 individual SELECTs
> internally, and union them? Or does it notice that c has a "UNIQUE"
> index and thus at most one row can be returned per SELECT, and does
> them all at once?
>
> 2) If I want to get just the primary key, or join with another table
> based on just the primary key, does this query ever touch the disk
> (assuming the index is in memory, which I think it always is --
> correct me if I'm wrong about that).
>
> The way I would recommend doing it (for BTREE indexes, anyway) is to
> sort the values in ascending order, and do the search in one pass
> through the index. The index is already in memory, and it would be
> straightforward to modify a binary search algorithm to find the rows
> corresponding to monotonically ascending values of the primary key,
> all in one pass.
>
> Even if the binary search algorithm is run 200 or 2000 times for a
> list, it would still be faster than hitting the disk. (Even though the
> CPU cache performance would be worse.)
>
> Can you let me know the specifics of it, and especially how I can
> avoid hitting the I/O bottlenecks?
>
> Thank you,
> Greg
>
> On 3/29/11 4:17 PM, Peter Brawley wrote:
>> > Why not optimize the IN ( ... ) to do the same type of thing?
>>
>> If the argument to IN() is a list of values, it'll be OK. If it's a
>> SELECT, in 5.0 it will be slower than molasses (see "The unbearable
>> slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>>
>> > I always tried to avoid joins because I am planning to horizontally
>> partition my data.
>>
>> A severe & unfortunate constraint. Can't help you there.
>>
>> PB
>>
>> -----
>>
>> On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>>> Yes, this would be fine. But often, the list of friends is obtained
>>> from a social network like facebook, and is not stored internally.
>>> Basically, I obtain the friend list in a request to facebook, and
>>> then see which of those users have created things. So would I have
>>> to create a temporary table and insert all those uids just to make a
>>> join? Why not optimize the IN ( ... ) to do the same type of thing?
>>>
>>> There is also a second problem: I want to use MySQL Cluster, because
>>> I expect to have many users. Would it be efficient to use JOIN
>>> between the friends table and the articles table? Both tables are
>>> partitioned by user_id as the primary key, so the join would have to
>>> hit many different nodes. I always tried to avoid joins because I am
>>> planning to horizontally partition my data. But if MySQL cluster can
>>> handle this join transparently and split it up based on the
>>> partition, then that's fine. Do you have any info on this?
>>>
>>> Greg
>>>
>>> On 3/29/11 2:10 PM, Peter Brawley wrote:
>>>> > How can I quickly find all the articles written by this user's
>>>> friends, and not just random articles?
>>>>
>>>> Taking the simplest possible case, with table
>>>> friends(userID,friendID) where each friendID refers to a userID in
>>>> another row, the friends of userID u are ...
>>>>
>>>> select friendID from user where userID=u;
>>>>
>>>> so articles by those friends of u are ...
>>>>
>>>> select a.* from article a join ( select friendID from user where
>>>> userID=u ) f on a.userID=f.friendID;
>>>>
>>>> PB
>>>>
>>>> -----
>>>>
>>>> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>>> Hey there. My company writes a lot of social applications, and
>>>>> there is one operation that is very common, but I don't know if
>>>>> MySQL supports it in a good way. I thought I'd write to this list
>>>>> for two reasons:
>>>>>
>>>>> 1) Maybe MySQL has a good way to do this, and I just don't
>>>>> know about it
>>>>>
>>>>> 2) Propose to MySQL developers a simple algorithm which would
>>>>> greatly improve MySQL support for social networking apps.
>>>>>
>>>>> Here is the situation. Let's say I have built a social
>>>>> networking application where people create and edit some item
>>>>> (article, photo, music mix, whatever). Now, a typical user logs
>>>>> in, and this user has 3000 friends. How can I quickly find all the
>>>>> articles written by this user's friends, and not just random
>>>>> articles?
>>>>>
>>>>> Ideally, I would want to write something like this:
>>>>>
>>>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>>>> ..., 349580)
>>>>>
>>>>> basically, execute a query with a huge IN ( ... ). Maybe if
>>>>> this would exceed the buffer size for the MySQL wire protocol, I
>>>>> would break up the list into several lists, and execute several
>>>>> queries, and union the results together myself.
>>>>>
>>>>> But my point is, this is very common for social networking
>>>>> apps. Every app wants to show "the X created by your friends", or
>>>>> "friends of yours (given some list from a social network) who have
>>>>> taken action X".
>>>>>
>>>>> Here is how I would do it if I had raw access to the MySQL
>>>>> index in memory:
>>>>>
>>>>> a) Sort the list of entries in the IN, in ascending order.
>>>>>
>>>>> b) Do *ONE* binary search through the index (assuming it's a
>>>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>>>> something, I would have to look up each one individually.
>>>>>
>>>>> The benefits of this approach would be that this common
>>>>> operation would be done extremely quickly. If the index fits
>>>>> entirely in memory, and I just want to get the primary keys (i.e.
>>>>> get the list of friends who did X), the disk isn't even touched.
>>>>> In addition, for BTREE indexes, I would just need ONE binary
>>>>> search, because the entries have been sorted in ascending order.
>>>>>
>>>>> Does MySQL have something like this? And if not, perhaps you
>>>>> can add it in the next version? It would really boost MySQL's
>>>>> support for social networking apps tremendously. Alternative, how
>>>>> can I add this to my MySQL? Any advice would be appreciated.
>>>>>
>>>>> Sincerely,
>>>>> Gregory Magarshak
>>>>> Qbix
>>>>>
>>>
>>>
>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 18:06:42 von Johan De Meersman

----- Original Message -----
> From: "Gregory Magarshak"
> I am guessing that the MySQL indexes map indexed fields (fb_uid) to the
> primary key (id) so I wouldn't have to touch the disk. Am I right
> about that?

Correct for InnoDB, but MyISAM maps every index straight onto records. That's why secondary indices are slightly more performant on MyISAM tables - InnoDB needs to do the extra primary key lookup - which it of course is bloody good at :-)

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 18:20:24 von mos

At 07:29 AM 3/31/2011, you wrote:
>Thanks for your insight! But I'm still worried about the performance of IN
>( big list of values ). Can you tell me how it is implemented?
>
>Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119, ...,
>387945)
>
>1) If I put 200 values there, does it do 200 individual SELECTs
>internally, and union them?

No. It uses one Select statement.

>Or does it notice that c has a "UNIQUE" index and thus at most one row can
>be returned per SELECT, and does them all at once?

The IN() clause is very inefficient because MySQL will NOT use the index.
It will have to traverse the entire table looking for these values. That is
why a table join will be much faster than using IN().

>2) If I want to get just the primary key, or join with another table based
>on just the primary key, does this query ever touch the disk (assuming the
>index is in memory, which I think it always is -- correct me if I'm wrong
>about that).

It will get the information from the index and not have to access the
record data from disk. If the index is stored in memory, then it won't have
to go to disk (unless you also have a sort). That is why the query cache is
so important.

>The way I would recommend doing it (for BTREE indexes, anyway) is to sort
>the values in ascending order, and do the search in one pass through the
>index. The index is already in memory, and it would be straightforward to
>modify a binary search algorithm to find the rows corresponding to
>monotonically ascending values of the primary key, all in one pass.
>
>Even if the binary search algorithm is run 200 or 2000 times for a list,
>it would still be faster than hitting the disk. (Even though the CPU cache
>performance would be worse.)
>
>Can you let me know the specifics of it, and especially how I can avoid
>hitting the I/O bottlenecks?

Use a table join and make sure you have the indexes loaded into memory. See
http://dev.mysql.com/doc/refman/5.0/en/load-index.html. If using InnoDb
then its index cache scheme is quite good.

Mike

>Thank you,
>Greg
>
>On 3/29/11 4:17 PM, Peter Brawley wrote:
>> > Why not optimize the IN ( ... ) to do the same type of thing?
>>
>>If the argument to IN() is a list of values, it'll be OK. If it's a
>>SELECT, in 5.0 it will be slower than molasses (see "The unbearable
>>slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>>
>> > I always tried to avoid joins because I am planning to horizontally
>> partition my data.
>>
>>A severe & unfortunate constraint. Can't help you there.
>>
>>PB
>>
>>-----
>>
>>On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>>>Yes, this would be fine. But often, the list of friends is obtained from
>>>a social network like facebook, and is not stored internally. Basically,
>>>I obtain the friend list in a request to facebook, and then see which of
>>>those users have created things. So would I have to create a temporary
>>>table and insert all those uids just to make a join? Why not optimize
>>>the IN ( ... ) to do the same type of thing?
>>>
>>>There is also a second problem: I want to use MySQL Cluster, because I
>>>expect to have many users. Would it be efficient to use JOIN between the
>>>friends table and the articles table? Both tables are partitioned by
>>>user_id as the primary key, so the join would have to hit many different
>>>nodes. I always tried to avoid joins because I am planning to
>>>horizontally partition my data. But if MySQL cluster can handle this
>>>join transparently and split it up based on the partition, then that's
>>>fine. Do you have any info on this?
>>>
>>>Greg
>>>
>>>On 3/29/11 2:10 PM, Peter Brawley wrote:
>>>> > How can I quickly find all the articles written by this user's
>>>> friends, and not just random articles?
>>>>
>>>>Taking the simplest possible case, with table friends(userID,friendID)
>>>>where each friendID refers to a userID in another row, the friends of
>>>>userID u are ...
>>>>
>>>>select friendID from user where userID=u;
>>>>
>>>>so articles by those friends of u are ...
>>>>
>>>>select a.* from article a join ( select friendID from user where
>>>>userID=u ) f on a.userID=f.friendID;
>>>>
>>>>PB
>>>>
>>>>-----
>>>>
>>>>On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>>>Hey there. My company writes a lot of social applications, and there
>>>>>is one operation that is very common, but I don't know if MySQL
>>>>>supports it in a good way. I thought I'd write to this list for two reasons:
>>>>>
>>>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>>>> about it
>>>>>
>>>>> 2) Propose to MySQL developers a simple algorithm which would
>>>>> greatly improve MySQL support for social networking apps.
>>>>>
>>>>> Here is the situation. Let's say I have built a social networking
>>>>> application where people create and edit some item (article, photo,
>>>>> music mix, whatever). Now, a typical user logs in, and this user has
>>>>> 3000 friends. How can I quickly find all the articles written by this
>>>>> user's friends, and not just random articles?
>>>>>
>>>>> Ideally, I would want to write something like this:
>>>>>
>>>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398, ...,
>>>>> 349580)
>>>>>
>>>>> basically, execute a query with a huge IN ( ... ). Maybe if this
>>>>> would exceed the buffer size for the MySQL wire protocol, I would
>>>>> break up the list into several lists, and execute several queries,
>>>>> and union the results together myself.
>>>>>
>>>>> But my point is, this is very common for social networking apps.
>>>>> Every app wants to show "the X created by your friends", or "friends
>>>>> of yours (given some list from a social network) who have taken action X".
>>>>>
>>>>> Here is how I would do it if I had raw access to the MySQL index
>>>>> in memory:
>>>>>
>>>>> a) Sort the list of entries in the IN, in ascending order.
>>>>>
>>>>> b) Do *ONE* binary search through the index (assuming it's a
>>>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>>>> something, I would have to look up each one individually.
>>>>>
>>>>> The benefits of this approach would be that this common operation
>>>>> would be done extremely quickly. If the index fits entirely in
>>>>> memory, and I just want to get the primary keys (i.e. get the list of
>>>>> friends who did X), the disk isn't even touched. In addition, for
>>>>> BTREE indexes, I would just need ONE binary search, because the
>>>>> entries have been sorted in ascending order.
>>>>>
>>>>> Does MySQL have something like this? And if not, perhaps you can
>>>>> add it in the next version? It would really boost MySQL's support for
>>>>> social networking apps tremendously. Alternative, how can I add this
>>>>> to my MySQL? Any advice would be appreciated.
>>>>>
>>>>>Sincerely,
>>>>>Gregory Magarshak
>>>>>Qbix
>>>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=mos99@fastmail.fm

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 20:20:18 von Johan De Meersman

----- Original Message -----
> From: "mos"
>
> The IN() clause is very inefficient because MySQL will NOT use the
> index.
> It will have to traverse the entire table looking for these values.

Has that still not been remedied ?

> It will get the information from the index and not have to access the
> record data from disk. If the index is stored in memory, then it
> won't have to go to disk (unless you also have a sort). That is why the query
> cache is so important.

Indices don't go in the query cache, strictly speaking, they go in the MyISAM index cache or in the InnoDB pool.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 21:33:04 von mos

At 11:20 AM 3/31/2011, you wrote:
>At 07:29 AM 3/31/2011, you wrote:
>>Thanks for your insight! But I'm still worried about the performance of
>>IN ( big list of values ). Can you tell me how it is implemented?
>>
>>Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119, ...,
>>387945)
>>
>>1) If I put 200 values there, does it do 200 individual SELECTs
>>internally, and union them?
>
>No. It uses one Select statement.
>
>>Or does it notice that c has a "UNIQUE" index and thus at most one row
>>can be returned per SELECT, and does them all at once?
>
>The IN() clause is very inefficient because MySQL will NOT use the index.
>It will have to traverse the entire table looking for these values. That
>is why a table join will be much faster than using IN().

Oops. Sorry, the In() clause in MySQL 5.5 does use the index (not sure when
they implemented that). Even so, I still find it slow when the In() clause
has a lot of elements.
I should have used an Explain in front of my Select statement to see if the
index is used before I posted.

Mike

>>2) If I want to get just the primary key, or join with another table
>>based on just the primary key, does this query ever touch the disk
>>(assuming the index is in memory, which I think it always is -- correct
>>me if I'm wrong about that).
>
>It will get the information from the index and not have to access the
>record data from disk. If the index is stored in memory, then it won't
>have to go to disk (unless you also have a sort). That is why the query
>cache is so important.
>
>
>>The way I would recommend doing it (for BTREE indexes, anyway) is to sort
>>the values in ascending order, and do the search in one pass through the
>>index. The index is already in memory, and it would be straightforward to
>>modify a binary search algorithm to find the rows corresponding to
>>monotonically ascending values of the primary key, all in one pass.
>>
>>Even if the binary search algorithm is run 200 or 2000 times for a list,
>>it would still be faster than hitting the disk. (Even though the CPU
>>cache performance would be worse.)
>>
>>Can you let me know the specifics of it, and especially how I can avoid
>>hitting the I/O bottlenecks?
>
>Use a table join and make sure you have the indexes loaded into memory.
>See http://dev.mysql.com/doc/refman/5.0/en/load-index.html. If using
>InnoDb then its index cache scheme is quite good.
>
>
>Mike
>
>
>>Thank you,
>>Greg
>>
>>On 3/29/11 4:17 PM, Peter Brawley wrote:
>>> > Why not optimize the IN ( ... ) to do the same type of thing?
>>>
>>>If the argument to IN() is a list of values, it'll be OK. If it's a
>>>SELECT, in 5.0 it will be slower than molasses (see "The unbearable
>>>slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>>>
>>> > I always tried to avoid joins because I am planning to horizontally
>>> partition my data.
>>>
>>>A severe & unfortunate constraint. Can't help you there.
>>>
>>>PB
>>>
>>>-----
>>>
>>>On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>>>>Yes, this would be fine. But often, the list of friends is obtained
>>>>from a social network like facebook, and is not stored internally.
>>>>Basically, I obtain the friend list in a request to facebook, and then
>>>>see which of those users have created things. So would I have to create
>>>>a temporary table and insert all those uids just to make a join? Why
>>>>not optimize the IN ( ... ) to do the same type of thing?
>>>>
>>>>There is also a second problem: I want to use MySQL Cluster, because I
>>>>expect to have many users. Would it be efficient to use JOIN between
>>>>the friends table and the articles table? Both tables are partitioned
>>>>by user_id as the primary key, so the join would have to hit many
>>>>different nodes. I always tried to avoid joins because I am planning to
>>>>horizontally partition my data. But if MySQL cluster can handle this
>>>>join transparently and split it up based on the partition, then that's
>>>>fine. Do you have any info on this?
>>>>
>>>>Greg
>>>>
>>>>On 3/29/11 2:10 PM, Peter Brawley wrote:
>>>>> > How can I quickly find all the articles written by this user's
>>>>> friends, and not just random articles?
>>>>>
>>>>>Taking the simplest possible case, with table friends(userID,friendID)
>>>>>where each friendID refers to a userID in another row, the friends of
>>>>>userID u are ...
>>>>>
>>>>>select friendID from user where userID=u;
>>>>>
>>>>>so articles by those friends of u are ...
>>>>>
>>>>>select a.* from article a join ( select friendID from user where
>>>>>userID=u ) f on a.userID=f.friendID;
>>>>>
>>>>>PB
>>>>>
>>>>>-----
>>>>>
>>>>>On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>>>>Hey there. My company writes a lot of social applications, and there
>>>>>>is one operation that is very common, but I don't know if MySQL
>>>>>>supports it in a good way. I thought I'd write to this list for two reasons:
>>>>>>
>>>>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>>>>> about it
>>>>>>
>>>>>> 2) Propose to MySQL developers a simple algorithm which would
>>>>>> greatly improve MySQL support for social networking apps.
>>>>>>
>>>>>> Here is the situation. Let's say I have built a social
>>>>>> networking application where people create and edit some item
>>>>>> (article, photo, music mix, whatever). Now, a typical user logs in,
>>>>>> and this user has 3000 friends. How can I quickly find all the
>>>>>> articles written by this user's friends, and not just random articles?
>>>>>>
>>>>>> Ideally, I would want to write something like this:
>>>>>>
>>>>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>>>>> ..., 349580)
>>>>>>
>>>>>> basically, execute a query with a huge IN ( ... ). Maybe if this
>>>>>> would exceed the buffer size for the MySQL wire protocol, I would
>>>>>> break up the list into several lists, and execute several queries,
>>>>>> and union the results together myself.
>>>>>>
>>>>>> But my point is, this is very common for social networking apps.
>>>>>> Every app wants to show "the X created by your friends", or "friends
>>>>>> of yours (given some list from a social network) who have taken action X".
>>>>>>
>>>>>> Here is how I would do it if I had raw access to the MySQL index
>>>>>> in memory:
>>>>>>
>>>>>> a) Sort the list of entries in the IN, in ascending order.
>>>>>>
>>>>>> b) Do *ONE* binary search through the index (assuming it's a
>>>>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>>>>> something, I would have to look up each one individually.
>>>>>>
>>>>>> The benefits of this approach would be that this common
>>>>>> operation would be done extremely quickly. If the index fits
>>>>>> entirely in memory, and I just want to get the primary keys (i.e.
>>>>>> get the list of friends who did X), the disk isn't even touched. In
>>>>>> addition, for BTREE indexes, I would just need ONE binary search,
>>>>>> because the entries have been sorted in ascending order.
>>>>>>
>>>>>> Does MySQL have something like this? And if not, perhaps you can
>>>>>> add it in the next version? It would really boost MySQL's support
>>>>>> for social networking apps tremendously. Alternative, how can I add
>>>>>> this to my MySQL? Any advice would be appreciated.
>>>>>>
>>>>>>Sincerely,
>>>>>>Gregory Magarshak
>>>>>>Qbix
>>
>>
>>--
>>MySQL General Mailing List
>>For list archives: http://lists.mysql.com/mysql
>>To unsubscribe: http://lists.mysql.com/mysql?unsub=mos99@fastmail.fm
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=mos99@fastmail.fm

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: A common request

am 31.03.2011 22:16:29 von mussatto

On Thu, March 31, 2011 12:33, mos wrote:
> At 11:20 AM 3/31/2011, you wrote:
>>At 07:29 AM 3/31/2011, you wrote:
>>>Thanks for your insight! But I'm still worried about the performance of
>>>IN ( big list of values ). Can you tell me how it is implemented?
>>>
>>>Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119, ...,
>>>387945)
>>>
>>>1) If I put 200 values there, does it do 200 individual SELECTs
>>>internally, and union them?
>>
>>No. It uses one Select statement.
>>
>>>Or does it notice that c has a "UNIQUE" index and thus at most one row
>>>can be returned per SELECT, and does them all at once?
>>
>>The IN() clause is very inefficient because MySQL will NOT use the index.
>>It will have to traverse the entire table looking for these values. That
>>is why a table join will be much faster than using IN().
>
> Oops. Sorry, the In() clause in MySQL 5.5 does use the index (not sure
> when
> they implemented that). Even so, I still find it slow when the In()
> clause
> has a lot of elements.
> I should have used an Explain in front of my Select statement to see if
> the
> index is used before I posted.
Its in 5.0 series. When you made your original statement I went back and
checked since I uses it in place of multiple "OR"s
>
> Mike
>
>>>2) If I want to get just the primary key, or join with another table
>>>based on just the primary key, does this query ever touch the disk
>>>(assuming the index is in memory, which I think it always is -- correct
>>>me if I'm wrong about that).
>>
>>It will get the information from the index and not have to access the
>>record data from disk. If the index is stored in memory, then it won't
>>have to go to disk (unless you also have a sort). That is why the query
>>cache is so important.
>>
>>
>>>The way I would recommend doing it (for BTREE indexes, anyway) is to
>>> sort
>>>the values in ascending order, and do the search in one pass through the
>>>index. The index is already in memory, and it would be straightforward
>>> to
>>>modify a binary search algorithm to find the rows corresponding to
>>>monotonically ascending values of the primary key, all in one pass.
>>>
>>>Even if the binary search algorithm is run 200 or 2000 times for a list,
>>>it would still be faster than hitting the disk. (Even though the CPU
>>>cache performance would be worse.)
>>>
>>>Can you let me know the specifics of it, and especially how I can avoid
>>>hitting the I/O bottlenecks?
>>
>>Use a table join and make sure you have the indexes loaded into memory.
>>See http://dev.mysql.com/doc/refman/5.0/en/load-index.html. If using
>>InnoDb then its index cache scheme is quite good.
>>
>>
>>Mike
>>
>>
>>>Thank you,
>>>Greg
>>>
>>>On 3/29/11 4:17 PM, Peter Brawley wrote:
>>>> > Why not optimize the IN ( ... ) to do the same type of thing?
>>>>
>>>>If the argument to IN() is a list of values, it'll be OK. If it's a
>>>>SELECT, in 5.0 it will be slower than molasses (see "The unbearable
>>>>slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>>>>
>>>> > I always tried to avoid joins because I am planning to horizontally
>>>> partition my data.
>>>>
>>>>A severe & unfortunate constraint. Can't help you there.
>>>>
>>>>PB
>>>>
>>>>-----
>>>>
>>>>On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>>>>>Yes, this would be fine. But often, the list of friends is obtained
>>>>>from a social network like facebook, and is not stored internally.
>>>>>Basically, I obtain the friend list in a request to facebook, and then
>>>>>see which of those users have created things. So would I have to
>>>>> create
>>>>>a temporary table and insert all those uids just to make a join? Why
>>>>>not optimize the IN ( ... ) to do the same type of thing?
>>>>>
>>>>>There is also a second problem: I want to use MySQL Cluster, because I
>>>>>expect to have many users. Would it be efficient to use JOIN between
>>>>>the friends table and the articles table? Both tables are partitioned
>>>>>by user_id as the primary key, so the join would have to hit many
>>>>>different nodes. I always tried to avoid joins because I am planning
>>>>> to
>>>>>horizontally partition my data. But if MySQL cluster can handle this
>>>>>join transparently and split it up based on the partition, then that's
>>>>>fine. Do you have any info on this?
>>>>>
>>>>>Greg
>>>>>
>>>>>On 3/29/11 2:10 PM, Peter Brawley wrote:
>>>>>> > How can I quickly find all the articles written by this user's
>>>>>> friends, and not just random articles?
>>>>>>
>>>>>>Taking the simplest possible case, with table
>>>>>> friends(userID,friendID)
>>>>>>where each friendID refers to a userID in another row, the friends of
>>>>>>userID u are ...
>>>>>>
>>>>>>select friendID from user where userID=u;
>>>>>>
>>>>>>so articles by those friends of u are ...
>>>>>>
>>>>>>select a.* from article a join ( select friendID from user where
>>>>>>userID=u ) f on a.userID=f.friendID;
>>>>>>
>>>>>>PB
>>>>>>
>>>>>>-----
>>>>>>
>>>>>>On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>>>>>Hey there. My company writes a lot of social applications, and there
>>>>>>>is one operation that is very common, but I don't know if MySQL
>>>>>>>supports it in a good way. I thought I'd write to this list for two
>>>>>>> reasons:
>>>>>>>
>>>>>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>>>>>> about it
>>>>>>>
>>>>>>> 2) Propose to MySQL developers a simple algorithm which would
>>>>>>> greatly improve MySQL support for social networking apps.
>>>>>>>
>>>>>>> Here is the situation. Let's say I have built a social
>>>>>>> networking application where people create and edit some item
>>>>>>> (article, photo, music mix, whatever). Now, a typical user logs in,
>>>>>>> and this user has 3000 friends. How can I quickly find all the
>>>>>>> articles written by this user's friends, and not just random
>>>>>>> articles?
>>>>>>>
>>>>>>> Ideally, I would want to write something like this:
>>>>>>>
>>>>>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>>>>>> ..., 349580)
>>>>>>>
>>>>>>> basically, execute a query with a huge IN ( ... ). Maybe if
>>>>>>> this
>>>>>>> would exceed the buffer size for the MySQL wire protocol, I would
>>>>>>> break up the list into several lists, and execute several queries,
>>>>>>> and union the results together myself.
>>>>>>>
>>>>>>> But my point is, this is very common for social networking
>>>>>>> apps.
>>>>>>> Every app wants to show "the X created by your friends", or
>>>>>>> "friends
>>>>>>> of yours (given some list from a social network) who have taken
>>>>>>> action X".
>>>>>>>
>>>>>>> Here is how I would do it if I had raw access to the MySQL
>>>>>>> index
>>>>>>> in memory:
>>>>>>>
>>>>>>> a) Sort the list of entries in the IN, in ascending order.
>>>>>>>
>>>>>>> b) Do *ONE* binary search through the index (assuming it's a
>>>>>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>>>>>> something, I would have to look up each one individually.
>>>>>>>
>>>>>>> The benefits of this approach would be that this common
>>>>>>> operation would be done extremely quickly. If the index fits
>>>>>>> entirely in memory, and I just want to get the primary keys (i.e.
>>>>>>> get the list of friends who did X), the disk isn't even touched. In
>>>>>>> addition, for BTREE indexes, I would just need ONE binary search,
>>>>>>> because the entries have been sorted in ascending order.
>>>>>>>
>>>>>>> Does MySQL have something like this? And if not, perhaps you
>>>>>>> can
>>>>>>> add it in the next version? It would really boost MySQL's support
>>>>>>> for social networking apps tremendously. Alternative, how can I add
>>>>>>> this to my MySQL? Any advice would be appreciated.
>>>>>>>
>>>>>>>Sincerely,
>>>>>>>Gregory Magarshak
>>>>>>>Qbix
------
William R. Mussatto
Systems Engineer
http://www.csz.com
909-920-9154

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org