Slow query on MySQL4 server doing simple inner join of two InnoDB tables

am 21.01.2011 20:21:44 von Kendall Gifford

--00221532cba41efd41049a602944
Content-Type: text/plain; charset=ISO-8859-1

Hello everyone, I've got a database on an old Fedora Core 4 server running
MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question has
just two (InnoDB) tables:

messages (approx 2.5 million records)
recipients (approx 6.5 million records)

These track information about email messages. Each message "has many"
recipient records. The structure of the two tables (omitting irrelevant data
fields) are as follows:

+-------------+------------------+------+-----+------------- --------+----------------+
| Field | Type | Null | Key | Default |
Extra |
+-------------+------------------+------+-----+------------- --------+----------------+
| id | int(10) unsigned | | PRI | NULL |
auto_increment |
| sent_at | datetime | | MUL | 0000-00-00 00:00:00
| |
| ......................... OTHER FIELDS OMITTED FOR BREVITY
........................ |
+-------------+------------------+------+-----+------------- --------+----------------+

+-------------+------------------+------+-----+------------- --------+----------------+
| Field | Type | Null | Key | Default |
Extra |
+-------------+------------------+------+-----+------------- --------+----------------+
| id | int(10) unsigned | | PRI | NULL |
auto_increment |
| message_id | int(10) unsigned | | MUL | 0
| |
| employee_id | int(10) unsigned | YES | MUL | NULL
| |
| ......................... OTHER FIELDS OMITTED FOR BREVITY
........................ |
+-------------+------------------+------+-----+------------- --------+----------------+

I have the following query that is just too slow:

> SELECT messages.* FROM messages
> INNER JOIN recipients ON recipients.message_id = messages.id
> WHERE recipients.employee_id = X
> GROUP BY messages.id
> ORDER BY sent_at DESC
> LIMIT 0, 25;

This takes about 44 seconds on average. The query explanation is as follows:

+----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
| id | select_type | table | type | possible_keys |
key | key_len | ref | rows |
Extra |
+----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
| 1 | SIMPLE | recipients | ref | messages_fk,employee_idx |
employee_idx | 5 | const | 222640 |
Using where; Using temporary; Using filesort |
| 1 | SIMPLE | messages | eq_ref | PRIMARY |
PRIMARY | 4 | email_archive.recipients.message_id | 1
| |
+----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+

I've been doing some searching on the web and have no idea if/how this can
be sped up. Most searches these days reference MySQL 5.x which I'm just not
sure how much applies. I'm hoping that there is something obvious that I'm
missing, or that one of you experts knows what I might be able to change to
speed this query up.

Anyhow, thanks in advance for even so much as reading my message, let alone
replying :).

--
Kendall Gifford
zettabyte@gmail.com

--00221532cba41efd41049a602944--

Re: Slow query on MySQL4 server doing simple inner join of two InnoDBtables

am 21.01.2011 20:56:20 von Reindl Harald

--------------enigFD6F20C193A1B61D403C9D5C
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

you need hughe ram / innodb_buffer_pool for large datasets
in a perfect world the buffer_pool is as large as the data

how looks your current config?
how much RAM has the machine?

Am 21.01.2011 20:21, schrieb Kendall Gifford:
> Hello everyone, I've got a database on an old Fedora Core 4 server runn=
ing
> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question =
has
> just two (InnoDB) tables:
>=20
> messages (approx 2.5 million records)
> recipients (approx 6.5 million records)
>=20
> These track information about email messages. Each message "has many"
> recipient records. The structure of the two tables (omitting irrelevant=
data
> fields) are as follows:
>=20
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
> | Field | Type | Null | Key | Default |
> Extra |
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
> | id | int(10) unsigned | | PRI | NULL |
> auto_increment |
> | sent_at | datetime | | MUL | 0000-00-00 00:00:00
> | |
> | ......................... OTHER FIELDS OMITTED FOR BREVITY
> ....................... |
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
>=20
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
> | Field | Type | Null | Key | Default |
> Extra |
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
> | id | int(10) unsigned | | PRI | NULL |
> auto_increment |
> | message_id | int(10) unsigned | | MUL | 0
> | |
> | employee_id | int(10) unsigned | YES | MUL | NULL
> | |
> | ......................... OTHER FIELDS OMITTED FOR BREVITY
> ....................... |
> +-------------+------------------+------+-----+------------- --------+--=
--------------+
>=20
> I have the following query that is just too slow:
>=20
>> SELECT messages.* FROM messages
>> INNER JOIN recipients ON recipients.message_id =3D messages.id
>> WHERE recipients.employee_id =3D X
>> GROUP BY messages.id
>> ORDER BY sent_at DESC
>> LIMIT 0, 25;
>=20
> This takes about 44 seconds on average. The query explanation is as fol=
lows:
>=20
> +----+-------------+------------+--------+------------------ --------+--=
------------+---------+------------------------------------- +--------+---=
-------------------------------------------+
> | id | select_type | table | type | possible_keys |
> key | key_len | ref | rows |=

> Extra |
> +----+-------------+------------+--------+------------------ --------+--=
------------+---------+------------------------------------- +--------+---=
-------------------------------------------+
> | 1 | SIMPLE | recipients | ref | messages_fk,employee_idx |
> employee_idx | 5 | const | 222640 |=

> Using where; Using temporary; Using filesort |
> | 1 | SIMPLE | messages | eq_ref | PRIMARY |
> PRIMARY | 4 | email_archive.recipients.message_id | 1
> | |
> +----+-------------+------------+--------+------------------ --------+--=
------------+---------+------------------------------------- +--------+---=
-------------------------------------------+
>=20
> I've been doing some searching on the web and have no idea if/how this =
can
> be sped up. Most searches these days reference MySQL 5.x which I'm just=
not
> sure how much applies. I'm hoping that there is something obvious that =
I'm
> missing, or that one of you experts knows what I might be able to chang=
e to
> speed this query up.
>=20
> Anyhow, thanks in advance for even so much as reading my message, let a=
lone
> replying :).
>=20

--=20

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

--------------enigFD6F20C193A1B61D403C9D5C
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk055OQACgkQhmBjz394AnlOwwCdFxuqaCnGb5qaH2BUl/NP SXua
COgAn1IFXn0x77szBgaJcyJBjb5DlZ5L
=P979
-----END PGP SIGNATURE-----

--------------enigFD6F20C193A1B61D403C9D5C--

Re: Slow query on MySQL4 server doing simple inner join of two InnoDBtables

am 21.01.2011 22:01:49 von shawn.l.green

On 1/21/2011 14:21, Kendall Gifford wrote:
> Hello everyone, I've got a database on an old Fedora Core 4 server running
> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question has
> just two (InnoDB) tables:
>
> messages (approx 2.5 million records)
> recipients (approx 6.5 million records)
>
> These track information about email messages. Each message "has many"
> recipient records. The structure of the two tables (omitting irrelevant data
> fields) are as follows:
>
> +-------------+------------------+------+-----+------------- --------+----------------+
> | Field | Type | Null | Key | Default |
> Extra |
> +-------------+------------------+------+-----+------------- --------+----------------+
> | id | int(10) unsigned | | PRI | NULL |
> auto_increment |
> | sent_at | datetime | | MUL | 0000-00-00 00:00:00
> | |
> | ......................... OTHER FIELDS OMITTED FOR BREVITY
> ....................... |
> +-------------+------------------+------+-----+------------- --------+----------------+
>
> +-------------+------------------+------+-----+------------- --------+----------------+
> | Field | Type | Null | Key | Default |
> Extra |
> +-------------+------------------+------+-----+------------- --------+----------------+
> | id | int(10) unsigned | | PRI | NULL |
> auto_increment |
> | message_id | int(10) unsigned | | MUL | 0
> | |
> | employee_id | int(10) unsigned | YES | MUL | NULL
> | |
> | ......................... OTHER FIELDS OMITTED FOR BREVITY
> ....................... |
> +-------------+------------------+------+-----+------------- --------+----------------+
>
> I have the following query that is just too slow:
>
>> SELECT messages.* FROM messages
>> INNER JOIN recipients ON recipients.message_id = messages.id
>> WHERE recipients.employee_id = X
>> GROUP BY messages.id
>> ORDER BY sent_at DESC
>> LIMIT 0, 25;
>
> This takes about 44 seconds on average. The query explanation is as follows:
>
> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
> | id | select_type | table | type | possible_keys |
> key | key_len | ref | rows |
> Extra |
> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
> | 1 | SIMPLE | recipients | ref | messages_fk,employee_idx |
> employee_idx | 5 | const | 222640 |
> Using where; Using temporary; Using filesort |
> | 1 | SIMPLE | messages | eq_ref | PRIMARY |
> PRIMARY | 4 | email_archive.recipients.message_id | 1
> | |
> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
>
> I've been doing some searching on the web and have no idea if/how this can
> be sped up. Most searches these days reference MySQL 5.x which I'm just not
> sure how much applies. I'm hoping that there is something obvious that I'm
> missing, or that one of you experts knows what I might be able to change to
> speed this query up.
>
> Anyhow, thanks in advance for even so much as reading my message, let alone
> replying :).
>

You need to get rid of the GROUP BY to make this go faster. You can do
that by running two queries, one to pick the list of unique
recipients.message_id values that match your where condition then
another to actually retrieve the message data. Something like this

CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED, PRIMARY KEY
(message_id)) ENGINE=MEMORY;

INSERT IGNORE tmpMessages
SELECT message_id
FROM recipients
WHERE employee_id = X;

SELECT messages.* FROM messages
INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
ORDER BY sent_at DESC
LIMIT 0, 25;

By pre-selecting a limited set of message_id values from the recipients
table, you seriously reduce the number of rows that need to be scanned.
Also, the INSERT IGNORE technique is faster than the GROUP BY because it
uses an index to identify any duplicates instead of a scan of all
previous unique values.

Please let us all know if this is faster enough. (and don't forget to
drop the temp table once you are through using it)

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables

am 21.01.2011 22:57:52 von Kendall Gifford

--00221534d3b37c33f5049a6257ce
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 21, 2011 at 2:01 PM, Shawn Green (MySQL) <
shawn.l.green@oracle.com> wrote:

> On 1/21/2011 14:21, Kendall Gifford wrote:
>
>> Hello everyone, I've got a database on an old Fedora Core 4 server running
>> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question has
>> just two (InnoDB) tables:
>>
>> messages (approx 2.5 million records)
>> recipients (approx 6.5 million records)
>>
>> These track information about email messages. Each message "has many"
>> recipient records. The structure of the two tables (omitting irrelevant
>> data
>> fields) are as follows:
>>
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>> | Field | Type | Null | Key | Default |
>> Extra |
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>> | id | int(10) unsigned | | PRI | NULL |
>> auto_increment |
>> | sent_at | datetime | | MUL | 0000-00-00 00:00:00
>> | |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY
>> ....................... |
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>>
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>> | Field | Type | Null | Key | Default |
>> Extra |
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>> | id | int(10) unsigned | | PRI | NULL |
>> auto_increment |
>> | message_id | int(10) unsigned | | MUL | 0
>> | |
>> | employee_id | int(10) unsigned | YES | MUL | NULL
>> | |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY
>> ....................... |
>>
>> +-------------+------------------+------+-----+------------- --------+----------------+
>>
>> I have the following query that is just too slow:
>>
>> SELECT messages.* FROM messages
>>> INNER JOIN recipients ON recipients.message_id = messages.id
>>> WHERE recipients.employee_id = X
>>> GROUP BY messages.id
>>> ORDER BY sent_at DESC
>>> LIMIT 0, 25;
>>>
>>
>> This takes about 44 seconds on average. The query explanation is as
>> follows:
>>
>>
>> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
>> | id | select_type | table | type | possible_keys |
>> key | key_len | ref | rows |
>> Extra |
>>
>> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
>> | 1 | SIMPLE | recipients | ref | messages_fk,employee_idx |
>> employee_idx | 5 | const | 222640 |
>> Using where; Using temporary; Using filesort |
>> | 1 | SIMPLE | messages | eq_ref | PRIMARY |
>> PRIMARY | 4 | email_archive.recipients.message_id | 1
>> | |
>>
>> +----+-------------+------------+--------+------------------ --------+--------------+---------+-------------------------- -----------+--------+--------------------------------------- -------+
>>
>> I've been doing some searching on the web and have no idea if/how this can
>> be sped up. Most searches these days reference MySQL 5.x which I'm just
>> not
>> sure how much applies. I'm hoping that there is something obvious that I'm
>> missing, or that one of you experts knows what I might be able to change
>> to
>> speed this query up.
>>
>> Anyhow, thanks in advance for even so much as reading my message, let
>> alone
>> replying :).
>>
>>
> You need to get rid of the GROUP BY to make this go faster. You can do that
> by running two queries, one to pick the list of unique recipients.message_id
> values that match your where condition then another to actually retrieve the
> message data. Something like this
>
> CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED, PRIMARY KEY
> (message_id)) ENGINE=MEMORY;
>
> INSERT IGNORE tmpMessages
> SELECT message_id
> FROM recipients
> WHERE employee_id = X;
>
> SELECT messages.* FROM messages
> INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
>
> ORDER BY sent_at DESC
> LIMIT 0, 25;
>
> By pre-selecting a limited set of message_id values from the recipients
> table, you seriously reduce the number of rows that need to be scanned.
> Also, the INSERT IGNORE technique is faster than the GROUP BY because it
> uses an index to identify any duplicates instead of a scan of all previous
> unique values.
>
> Please let us all know if this is faster enough. (and don't forget to drop
> the temp table once you are through using it)
>
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc.
> Office: Blountville, TN
>

Thanks Shawn, I'm in the process of trying you're suggestion now. I'll let
you know how it goes.

--
Kendall Gifford
zettabyte@gmail.com

--00221534d3b37c33f5049a6257ce--

Re: Slow query on MySQL4 server doing simple inner join of two InnoDBtables

am 24.01.2011 11:40:29 von Joerg Bruehe

Hi everybody!

Shawn Green (MySQL) wrote:
> On 1/21/2011 14:21, Kendall Gifford wrote:
>> Hello everyone, I've got a database on an old Fedora Core 4 server
>> running
>> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question=

>> has
>> just two (InnoDB) tables:
>>
>> messages (approx 2.5 million records)
>> recipients (approx 6.5 million records)
>>
>> [[ ... see the original post for the schema details ... ]]
>>
>>
>> I have the following query that is just too slow:
>>
>>> SELECT messages.* FROM messages
>>> INNER JOIN recipients ON recipients.message_id =3D messages.id
>>> WHERE recipients.employee_id =3D X
>>> GROUP BY messages.id
>>> ORDER BY sent_at DESC
>>> LIMIT 0, 25;
>>
>> This takes about 44 seconds on average. [[...]]
>>
>=20
> You need to get rid of the GROUP BY to make this go faster. You can do
> that by running two queries, one to pick the list of unique
> recipients.message_id values that match your where condition then
> another to actually retrieve the message data. [[...]]

I don't want to contradict Shawn, but currently I fail to see the need
for the GROUP BY: Joining like this

messages INNER JOIN recipients ON recipients.message_id =3D messages.id=

WHERE recipients.employee_id =3D X

can return only one row, unless there are multiple "recipients" records
for the same values of "message_id" and "employee_id".

I don't know whether that can happen in the poster's application, and
whether it would cause trouble if the result line would occur multiple
times.

Regards,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, joerg.bruehe@oracle.com
ORACLE Deutschland B.V. & Co. KG, Komturstrasse 18a, D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven
Amtsgericht Muenchen: HRA 95603

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables

am 24.01.2011 22:20:27 von Kendall Gifford

--0022152d610938659f049a9e2bc3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 24, 2011 at 3:40 AM, Joerg Bruehe wrot=
e:

> Hi everybody!
>
>
> Shawn Green (MySQL) wrote:
> > On 1/21/2011 14:21, Kendall Gifford wrote:
> >> Hello everyone, I've got a database on an old Fedora Core 4 server
> >> running
> >> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question
> >> has
> >> just two (InnoDB) tables:
> >>
> >> messages (approx 2.5 million records)
> >> recipients (approx 6.5 million records)
> >>
> >> [[ ... see the original post for the schema details ... ]]
> >>
> >>
> >> I have the following query that is just too slow:
> >>
> >>> SELECT messages.* FROM messages
> >>> INNER JOIN recipients ON recipients.message_id =3D messages.id
> >>> WHERE recipients.employee_id =3D X
> >>> GROUP BY messages.id
> >>> ORDER BY sent_at DESC
> >>> LIMIT 0, 25;
> >>
> >> This takes about 44 seconds on average. [[...]]
> >>
> >
> > You need to get rid of the GROUP BY to make this go faster. You can do
> > that by running two queries, one to pick the list of unique
> > recipients.message_id values that match your where condition then
> > another to actually retrieve the message data. [[...]]
>
> I don't want to contradict Shawn, but currently I fail to see the need
> for the GROUP BY: Joining like this
>
> messages INNER JOIN recipients ON recipients.message_id =3D messages.id
> WHERE recipients.employee_id =3D X
>
> can return only one row, unless there are multiple "recipients" records
> for the same values of "message_id" and "employee_id".
>
> I don't know whether that can happen in the poster's application, and
> whether it would cause trouble if the result line would occur multiple
> times.
>
>
In my application, there CAN in fact be several "recipients" records with
both the same "message_id" foreign key value AND the same "employee_id"
value (some employees may be a recipient of a message several times over vi=
a
alternative addresses and/or aliases). However, as I rework things, I could
probably rework application logic nuke the GROUP BY and just cope, in code,
with these extra "messages" records in my result set. (Just FYI, the SQL
query is simply the default query as created by rails or, more specifically=
,
ActiveRecord 2.3.9 which I can/will-be optimizing).

I will additionally be moving this database to a new server. However, for
academic interest, I'll see if I can make time to post the query time(s)
once I change the app, before moving the database to a new (and better
configured) server.

Thanks for the help everybody.

>
> Regards,
> Jörg
>
>
--=20
Kendall Gifford
zettabyte@gmail.com

--0022152d610938659f049a9e2bc3--

Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables

am 24.01.2011 23:28:59 von Kendall Gifford

--90e6ba53a6a2546372049a9f20d9
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jan 24, 2011 at 2:20 PM, Kendall Gifford wrote:

>
>
> On Mon, Jan 24, 2011 at 3:40 AM, Joerg Bruehe wrote:
>
>> Hi everybody!
>>
>>
>> Shawn Green (MySQL) wrote:
>> > On 1/21/2011 14:21, Kendall Gifford wrote:
>> >> Hello everyone, I've got a database on an old Fedora Core 4 server
>> >> running
>> >> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question
>> >> has
>> >> just two (InnoDB) tables:
>> >>
>> >> messages (approx 2.5 million records)
>> >> recipients (approx 6.5 million records)
>> >>
>> >> [[ ... see the original post for the schema details ... ]]
>> >>
>> >>
>> >> I have the following query that is just too slow:
>> >>
>> >>> SELECT messages.* FROM messages
>> >>> INNER JOIN recipients ON recipients.message_id = messages.id
>> >>> WHERE recipients.employee_id = X
>> >>> GROUP BY messages.id
>> >>> ORDER BY sent_at DESC
>> >>> LIMIT 0, 25;
>> >>
>> >> This takes about 44 seconds on average. [[...]]
>> >>
>> >
>> > You need to get rid of the GROUP BY to make this go faster. You can do
>> > that by running two queries, one to pick the list of unique
>> > recipients.message_id values that match your where condition then
>> > another to actually retrieve the message data. [[...]]
>>
>> I don't want to contradict Shawn, but currently I fail to see the need
>> for the GROUP BY: Joining like this
>>
>> messages INNER JOIN recipients ON recipients.message_id = messages.id
>> WHERE recipients.employee_id = X
>>
>> can return only one row, unless there are multiple "recipients" records
>> for the same values of "message_id" and "employee_id".
>>
>> I don't know whether that can happen in the poster's application, and
>> whether it would cause trouble if the result line would occur multiple
>> times.
>>
>>
> In my application, there CAN in fact be several "recipients" records with
> both the same "message_id" foreign key value AND the same "employee_id"
> value (some employees may be a recipient of a message several times over via
> alternative addresses and/or aliases). However, as I rework things, I could
> probably rework application logic nuke the GROUP BY and just cope, in code,
> with these extra "messages" records in my result set. (Just FYI, the SQL
> query is simply the default query as created by rails or, more specifically,
> ActiveRecord 2.3.9 which I can/will-be optimizing).
>
> I will additionally be moving this database to a new server. However, for
> academic interest, I'll see if I can make time to post the query time(s)
> once I change the app, before moving the database to a new (and better
> configured) server.
>
>
Just an update for posterity, simply removing the GROUP BY clause of my
query above has, overall, no noticeable effect on performance. I suspect
server configuration, as pointed out by Reindl, is too much of a bottleneck
and is what I first need to change (working on that now). Perhaps the
removal of GROUP BY would/will be noticeable if the server configuration for
InnoDB tables wasn't so horrendous. I'll find out...

--
Kendall Gifford
zettabyte@gmail.com

--90e6ba53a6a2546372049a9f20d9--

RE: Slow query on MySQL4 server doing simple inner join of twoInnoDB tables

am 25.01.2011 02:43:20 von Gavin Towey

If you show the EXPLAIN SELECT .. output, and the table structure, someone =
will be able to give a more definite answer.

-----Original Message-----
From: Kendall Gifford [mailto:zettabyte@gmail.com]
Sent: Monday, January 24, 2011 2:29 PM
To: mysql@lists.mysql.com
Subject: Re: Slow query on MySQL4 server doing simple inner join of two Inn=
oDB tables

On Mon, Jan 24, 2011 at 2:20 PM, Kendall Gifford wrote=
:

>
>
> On Mon, Jan 24, 2011 at 3:40 AM, Joerg Bruehe wr=
ote:
>
>> Hi everybody!
>>
>>
>> Shawn Green (MySQL) wrote:
>> > On 1/21/2011 14:21, Kendall Gifford wrote:
>> >> Hello everyone, I've got a database on an old Fedora Core 4 server
>> >> running
>> >> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in questio=
n
>> >> has
>> >> just two (InnoDB) tables:
>> >>
>> >> messages (approx 2.5 million records)
>> >> recipients (approx 6.5 million records)
>> >>
>> >> [[ ... see the original post for the schema details ... ]]
>> >>
>> >>
>> >> I have the following query that is just too slow:
>> >>
>> >>> SELECT messages.* FROM messages
>> >>> INNER JOIN recipients ON recipients.message_id =3D messages.id
>> >>> WHERE recipients.employee_id =3D X
>> >>> GROUP BY messages.id
>> >>> ORDER BY sent_at DESC
>> >>> LIMIT 0, 25;
>> >>
>> >> This takes about 44 seconds on average. [[...]]
>> >>
>> >
>> > You need to get rid of the GROUP BY to make this go faster. You can do
>> > that by running two queries, one to pick the list of unique
>> > recipients.message_id values that match your where condition then
>> > another to actually retrieve the message data. [[...]]
>>
>> I don't want to contradict Shawn, but currently I fail to see the need
>> for the GROUP BY: Joining like this
>>
>> messages INNER JOIN recipients ON recipients.message_id =3D messages.id
>> WHERE recipients.employee_id =3D X
>>
>> can return only one row, unless there are multiple "recipients" records
>> for the same values of "message_id" and "employee_id".
>>
>> I don't know whether that can happen in the poster's application, and
>> whether it would cause trouble if the result line would occur multiple
>> times.
>>
>>
> In my application, there CAN in fact be several "recipients" records with
> both the same "message_id" foreign key value AND the same "employee_id"
> value (some employees may be a recipient of a message several times over =
via
> alternative addresses and/or aliases). However, as I rework things, I cou=
ld
> probably rework application logic nuke the GROUP BY and just cope, in cod=
e,
> with these extra "messages" records in my result set. (Just FYI, the SQL
> query is simply the default query as created by rails or, more specifical=
ly,
> ActiveRecord 2.3.9 which I can/will-be optimizing).
>
> I will additionally be moving this database to a new server. However, for
> academic interest, I'll see if I can make time to post the query time(s)
> once I change the app, before moving the database to a new (and better
> configured) server.
>
>
Just an update for posterity, simply removing the GROUP BY clause of my
query above has, overall, no noticeable effect on performance. I suspect
server configuration, as pointed out by Reindl, is too much of a bottleneck
and is what I first need to change (working on that now). Perhaps the
removal of GROUP BY would/will be noticeable if the server configuration fo=
r
InnoDB tables wasn't so horrendous. I'll find out...

--
Kendall Gifford
zettabyte@gmail.com

IMPORTANT: This email message is intended only for the use of the individua=
l to whom, or entity to which, it is addressed and may contain information =
that is privileged, confidential and exempt from disclosure under applicabl=
e law. If you are NOT the intended recipient, you are hereby notified that =
any use, dissemination, distribution or copying of this communication is st=
rictly prohibited. If you have received this communication in error, pleas=
e reply to the sender immediately and permanently delete this email. Thank =
you.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

InnoDB and rsync

am 25.01.2011 05:37:28 von Eric.Robinson

Is there a way to safely backup an InnoDB database using rsync?=20

Right now we have a very efficient and reliable way to backup 240+
separate instances of MySQL with MyISAM tables. The databases range in
size from .5GB to 16GB. During this time, users can still access the
system, so our customers can work 24x7. In the process, we also refresh
240+ slave instances with a perfect byte-for-byte replica of the master
databases.=20

The whole thing takes about 30 minutes.=20

Here's how we do it.=20

Late at night when the number of users on the system is low, we do the
following for each of the 240+ instances of MySQL...

1. Shut down the slave and remove all replication-related log files.

2. Perform an rsync of the master's data directory to the slave. Users
may be making changes to tables during this rsync.

3. Issue a FLUSH TABLES WITH READ LOCK on the master followed by a RESET
MASTER.

4. Perform a second rsync of the data directory from the master to the
slave to copy any user changes that happened during step 2. This usually
completes in a few seconds, often less than 1. If any users were trying
to insert records at this exact moment, their application may appear to
pause very briefly.

5. Start the slave.

When I'm all done, I have 240+ slave servers in perfect sync with their
masters, each having a 100% identical binary copy of its master's
database. Since these copies are truly identical, they can be used for a
second layer of backup to other media. =20

Like I said, the whole process takes about 30 minutes because the rsync
algorithm only copies the byte-level changes.

IS THERE ANY WAY TO SET UP SOMETHING THIS EASY AND EFFICIENT USING
INNODB?

I've been reading about InnoDB hot copy and other approaches, but none
of them seem to work as well as the approach I have worked out with
MyISAM. Unfortunately, my software wants to force us to switch to
InnoDB, so I'm really stuck. If we have to switch to InnoDB and we
cannot come up with a method for doing fast, rsync-style backups, it
will probably mean huge, costly, and unnecessary changes to our
infrastructure.

Any help will be GREATLY appreciated.

--
Eric Robinson

Disclaimer - January 24, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for mysql@lists.mysql.com. If you are not the named =
addressee you should not disseminate, distribute, copy or alter this =
email. Any views or opinions presented in this email are solely those of =
the author and might not represent those of Physicians' Managed Care or =
Physician Select Management. Warning: Although Physicians' Managed Care =
or Physician Select Management has taken reasonable precautions to =
ensure no viruses are present in this email, the company cannot accept =
responsibility for any loss or damage arising from the use of this email =
or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 25.01.2011 07:02:20 von Johan De Meersman

--0015174c462698632b049aa5756b
Content-Type: text/plain; charset=ISO-8859-1

I suspect the same trick might work with InnoDB (with pretty much the same
caveats), but you'd be best off setting innodb-file-per-table - I'm sure
you've already seen that the large datafiles are a hindrance to smooth
rsyncing :-)

Make sure to test extensively, though.

On Tue, Jan 25, 2011 at 5:37 AM, Robinson, Eric wrote:

> Is there a way to safely backup an InnoDB database using rsync?
>
> Right now we have a very efficient and reliable way to backup 240+
> separate instances of MySQL with MyISAM tables. The databases range in
> size from .5GB to 16GB. During this time, users can still access the
> system, so our customers can work 24x7. In the process, we also refresh
> 240+ slave instances with a perfect byte-for-byte replica of the master
> databases.
>
> The whole thing takes about 30 minutes.
>
> Here's how we do it.
>
> Late at night when the number of users on the system is low, we do the
> following for each of the 240+ instances of MySQL...
>
> 1. Shut down the slave and remove all replication-related log files.
>
> 2. Perform an rsync of the master's data directory to the slave. Users
> may be making changes to tables during this rsync.
>
> 3. Issue a FLUSH TABLES WITH READ LOCK on the master followed by a RESET
> MASTER.
>
> 4. Perform a second rsync of the data directory from the master to the
> slave to copy any user changes that happened during step 2. This usually
> completes in a few seconds, often less than 1. If any users were trying
> to insert records at this exact moment, their application may appear to
> pause very briefly.
>
> 5. Start the slave.
>
> When I'm all done, I have 240+ slave servers in perfect sync with their
> masters, each having a 100% identical binary copy of its master's
> database. Since these copies are truly identical, they can be used for a
> second layer of backup to other media.
>
> Like I said, the whole process takes about 30 minutes because the rsync
> algorithm only copies the byte-level changes.
>
> IS THERE ANY WAY TO SET UP SOMETHING THIS EASY AND EFFICIENT USING
> INNODB?
>
> I've been reading about InnoDB hot copy and other approaches, but none
> of them seem to work as well as the approach I have worked out with
> MyISAM. Unfortunately, my software wants to force us to switch to
> InnoDB, so I'm really stuck. If we have to switch to InnoDB and we
> cannot come up with a method for doing fast, rsync-style backups, it
> will probably mean huge, costly, and unnecessary changes to our
> infrastructure.
>
> Any help will be GREATLY appreciated.
>
> --
> Eric Robinson
>
>
> Disclaimer - January 24, 2011
> This email and any files transmitted with it are confidential and intended
> solely for mysql@lists.mysql.com. If you are not the named addressee you
> should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might
> not represent those of Physicians' Managed Care or Physician Select
> Management. Warning: Although Physicians' Managed Care or Physician Select
> Management has taken reasonable precautions to ensure no viruses are present
> in this email, the company cannot accept responsibility for any loss or
> damage arising from the use of this email or attachments.
> This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=vegivamp@tuxera.be
>
>

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0015174c462698632b049aa5756b--

Re: InnoDB and rsync

am 25.01.2011 09:05:05 von Reindl Harald

--------------enigF08804ED24F5436B8C83C750
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 25.01.2011 05:37, schrieb Robinson, Eric:
> Is there a way to safely backup an InnoDB database using rsync?=20

Not without stop mysqld
Foregt it, do not try it and stop searching if you do not waste time

If you understand how innodb works you will see that
this is not possible by design

your whole solution is crippled because why in the world
are you killing your salves and reinit them without any
reason daily?

you can get a fresgh backup of the master with the following method
but nobody do that daily - this is only for init/reinit a salve

* rsync on master in another local folder while running
* stop master
* remove bin-logs
* second rsync to get last changes
* start master
* rsync to final destination

> When I'm all done, I have 240+ slave servers in perfect sync with their=

> masters, each having a 100% identical binary copy of its master's
> database. Since these copies are truly identical, they can be used for =
a
> second layer of backup to other media.

why in the world do you not leave the slaves in peace and backup them
to another media because you can stop them as long you want for
a consistent backup and after starting the last changes from
the master are applied

> Right now we have a very efficient and reliable way to backup 240+
> separate instances of MySQL with MyISAM tables. The databases range in
> size from .5GB to 16GB. During this time, users can still access the
> system, so our customers can work 24x7. In the process, we also refresh=

> 240+ slave instances with a perfect byte-for-byte replica of the master=

> databases.=20
>=20
> The whole thing takes about 30 minutes.=20
>=20
> Here's how we do it.=20
>=20
> Late at night when the number of users on the system is low, we do the
> following for each of the 240+ instances of MySQL...
>=20
> 1. Shut down the slave and remove all replication-related log files.
>=20
> 2. Perform an rsync of the master's data directory to the slave. Users
> may be making changes to tables during this rsync.
>=20
> 3. Issue a FLUSH TABLES WITH READ LOCK on the master followed by a RESE=
T
> MASTER.
>=20
> 4. Perform a second rsync of the data directory from the master to the
> slave to copy any user changes that happened during step 2. This usuall=
y
> completes in a few seconds, often less than 1. If any users were trying=

> to insert records at this exact moment, their application may appear to=

> pause very briefly.
>=20
> 5. Start the slave.
>=20
> When I'm all done, I have 240+ slave servers in perfect sync with their=

> masters, each having a 100% identical binary copy of its master's
> database. Since these copies are truly identical, they can be used for =
a
> second layer of backup to other media. =20
>=20
> Like I said, the whole process takes about 30 minutes because the rsync=

> algorithm only copies the byte-level changes.
>=20
> IS THERE ANY WAY TO SET UP SOMETHING THIS EASY AND EFFICIENT USING
> INNODB?
>=20
> I've been reading about InnoDB hot copy and other approaches, but none
> of them seem to work as well as the approach I have worked out with
> MyISAM. Unfortunately, my software wants to force us to switch to
> InnoDB, so I'm really stuck. If we have to switch to InnoDB and we
> cannot come up with a method for doing fast, rsync-style backups, it
> will probably mean huge, costly, and unnecessary changes to our
> infrastructure.
>=20
> Any help will be GREATLY appreciated.
>=20
> --
> Eric Robinson
>=20
>=20
> Disclaimer - January 24, 2011=20
> This email and any files transmitted with it are confidential and inten=
ded solely for mysql@lists.mysql.com. If you are not the named addressee =
you should not disseminate, distribute, copy or alter this email. Any vie=
ws or opinions presented in this email are solely those of the author and=
might not represent those of Physicians' Managed Care or Physician Selec=
t Management. Warning: Although Physicians' Managed Care or Physician Sel=
ect Management has taken reasonable precautions to ensure no viruses are =
present in this email, the company cannot accept responsibility for any l=
oss or damage arising from the use of this email or attachments.=20
> This disclaimer was added by Policy Patrol: http://www.policypatrol.com=
/

--=20

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

--------------enigF08804ED24F5436B8C83C750
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk0+hDEACgkQhmBjz394Anm50ACfWQy31qN4510WFaK0Jj/o VJd9
rBUAn2Pl6uyiI7aRRhx5JhBom0II0SSK
=9DK8
-----END PGP SIGNATURE-----

--------------enigF08804ED24F5436B8C83C750--

RE: InnoDB and rsync

am 25.01.2011 15:00:29 von Eric.Robinson

> your whole solution is crippled because why in the world are=20
> you killing your salves and reinit them without any reason daily?

There is a very good reason: it is the phenomenon of row drift. The
master and slave can appear to be in good sync, but often it is not
actually the case. For this reason, most people agree that it is not
safe to rely on the slave server as the source for your backups. My
solution efficiently corrects row drift and makes sure the slaves are
100% binary replicas of the slaves, which can then be trusted as backup
sources. The whole thing is very fast and there is no downtime for
users, who can continue to work 24x7. I fail to see how this is
"crippled."

> why in the world do you not leave the slaves in peace and=20
> backup them to another media because you can stop them as=20
> long you want for a consistent backup and after starting the=20
> last changes from the master are applied

See my comment above. (But also we cannot stop them as long as we want
because the slaves are used for running reports. Using my approach, each
slave is down for about 30 seconds. The masters are not brought down at
all.)

> If you understand how innodb works you will see that this is=20
> not possible by design

I'm starting to worry that you may be right. I know FLUSH TABLES WITH
READ LOCK does not work as expected with InnoDB, but is there really no
way to put InnoDB into a state where all changes have been flushed to
disk and it is safe to rsync the directory? Is stopping the service
really the only way? (And even if I stop the service, is rsync totally
safe with InnoDB?)

--
Eric Robinson

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 25.01.2011 15:22:28 von Mattia Merzi

2011/1/25 Robinson, Eric :
>> your whole solution is crippled because why in the world are
>> you killing your salves and reinit them without any reason daily?
> There is a very good reason: it is the phenomenon of row drift. The
> master and slave can appear to be in good sync, but often it is not
> actually the case.

.... sounds interesting; have you got any document explaining
this phenomenon? AFAIK, the things that (silently) break replication are:
- non-deterministic functions in statement-based replication
- hand-made updates on the slave db
is this enough to justify a *daily* resync?!

However, this could be a solution for your problem (maybe)
http://www.pythian.com/news/5113/video-building-a-mysql-slav e-and-keeping-it-in-sync/

if you watch the movie, at ~40 minutes, you can see a slide
"What causes slave to get out of sync"...

Greetings,

Mattia.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: InnoDB and rsync

am 25.01.2011 15:33:27 von Johan De Meersman

--0015174c43c083651d049aac99a2
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jan 25, 2011 at 3:00 PM, Robinson, Eric wrote:

> > your whole solution is crippled because why in the world are
> > you killing your salves and reinit them without any reason daily?
>
> There is a very good reason: it is the phenomenon of row drift. The
>

Interesting. I never heard of that, and can't, at first glance, seem to find
a lot of useful things on Google. Could you explain what you mean ?

The one thing I can think of, would be the fact that your rows are not
guaranteed to be in the same disk blocks, or even necessarily in the same
data block of your file. This in itself doesn't really pose a problem for
backups, though, afaik ?

I'm starting to worry that you may be right. I know FLUSH TABLES WITH
> READ LOCK does not work as expected with InnoDB, but is there really no
>

It doesn't, exactly, no; but afaik no actual data will be written. Some
metadata may not be fully sync, but I do not believe a lot could happen that
the recovery when you start your slave can't fix. Still, the issue is there.

> really the only way? (And even if I stop the service, is rsync totally
> safe with InnoDB?)
>

As a stopped MySQL can't update the files or keep data in memory, that
should be safe, yes.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0015174c43c083651d049aac99a2--

Re: InnoDB and rsync

am 25.01.2011 15:44:15 von Reindl Harald

--------------enigEA82419C3F86107259ABBB91
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 25.01.2011 15:00, schrieb Robinson, Eric:
>> your whole solution is crippled because why in the world are=20
>> you killing your salves and reinit them without any reason daily?
>=20
> There is a very good reason: it is the phenomenon of row drift. The
> master and slave can appear to be in good sync, but often it is not
> actually the case.=20

There is nothing drifting and nobody cares if the files on both servers
are binary identical, the data must be consistent and it is

binlog-format =3D ROW

> For this reason, most people agree that it is not
> safe to rely on the slave server as the source for your backups.=20

sorry but these people have no plan

> My solution efficiently corrects row drift and makes sure the slaves=20
> are 100% binary replicas of the slaves

jesus christ nobody cares if they are binary replica as long
as the data is consistent and ident

> I fail to see how this is "crippled."

It is crippled because you do not understand the sense of
replication if you reinit it every day

> See my comment above. (But also we cannot stop them as long as we want
> because the slaves are used for running reports.=20

so start another slave on the machine with his own socket
for backups, i have running on all dedicated backup-servers
two instances - one is useable r/w and the other one without
tcp is the replication-slave, every hour the salve is stopped
and datadir mirrored to the r/w-instance

> Using my approach, each slave is down for about 30 seconds.=20
> The masters are not brought down at all.

and if you running a clean solution the salves are never down

> but is there really no way to put InnoDB into a state where all=20
> changes have been flushed to disk and it is safe to rsync the directory=
?=20

no, it is a database and not designed for access from external software
as long as the database is running

> Is stopping the service really the only way?=20

yes, and not only for innodb
try to copy oracle, postgresql, ms-sql :-)

if you do not stop the service you can be sure that the backup is
not useable or missing data, even if there would exist a mode
sync all to disk nobody would officially support copy datafiles
while the service is running, even with myisam nobody will do that

> And even if I stop the service, is rsync totally
> safe with InnoDB?

why not?

the server is down and you copy the whole datadir
what can be unsafe there?

--------------enigEA82419C3F86107259ABBB91
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk0+4b8ACgkQhmBjz394AnlFMACgh1DxWP5EakZnc/de0/oW afi5
ZX0AnAktulwe1uh+9i702etZs5iKVL/O
=Bzqp
-----END PGP SIGNATURE-----

--------------enigEA82419C3F86107259ABBB91--

Re: InnoDB and rsync

am 25.01.2011 15:56:26 von Johan De Meersman

--0015174c43c0be0862049aaceb32
Content-Type: text/plain; charset=ISO-8859-1

> jesus christ nobody cares if they are binary replica as long
> as the data is consistent and ident
>

Actually, I can see this being an issue if you're using LVM snapshot backups
or another similar technique - if the datafiles aren't all identical you
won't be able to restore to any machine from a single backup.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0015174c43c0be0862049aaceb32--

Re: InnoDB and rsync

am 25.01.2011 16:05:43 von Steve Musumeche

On 1/25/2011 8:00 AM, Robinson, Eric wrote:
>> your whole solution is crippled because why in the world are
>> you killing your salves and reinit them without any reason daily?
> There is a very good reason: it is the phenomenon of row drift. The
> master and slave can appear to be in good sync, but often it is not
> actually the case. For this reason, most people agree that it is not
> safe to rely on the slave server as the source for your backups. My
> solution efficiently corrects row drift and makes sure the slaves are
> 100% binary replicas of the slaves, which can then be trusted as backup
> sources. The whole thing is very fast and there is no downtime for
> users, who can continue to work 24x7. I fail to see how this is
> "crippled."
Why don't you use a Maatkit solution like mk-checksum to ensure that
your slaves have identical data with the master?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: InnoDB and rsync

am 25.01.2011 16:21:51 von Eric.Robinson

> Why don't you use a Maatkit solution like mk-checksum to=20
> ensure that your slaves have identical data with the master?

I looked at Maatkit a year or so ago. It looked pretty interesting, but
then I started reading the disclaimers carefully and they scared the
bejeepers out of me. Warnings about data corruption and whatnot. I'll
check it out again. (I was actually looking for it this morning but
could not remember the name, so thanks for the reminder.)

--
Eric Robinson

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for =
steve@internetretailconnection.com,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 25.01.2011 16:24:50 von Reindl Harald

--------------enig5258138AD4F0B639DFA73B84
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 25.01.2011 15:56, schrieb Johan De Meersman:
>> jesus christ nobody cares if they are binary replica as long
>> as the data is consistent and ident
>>
>=20
> Actually, I can see this being an issue if you're using LVM snapshot ba=
ckups
> or another similar technique - if the datafiles aren't all identical yo=
u
> won't be able to restore to any machine from a single backup.

Where exactly do you see any problem?

* the master writes his bin-log in row format
* the slaves writes his relay-log and managing inserts updates
* lvm makes a snapshot of relay-log / datafiles

There is none and if there could be one LVM must be broken
because a snapshot has to be consistent

So if you stop the slave, make the snahpshot and start the slave
again there are all buffers written to the vfs-layer and the snapshot
must have a defined state.

Only if the db-server is running and have some data in memory cache
you could have any troubles and that is why not copy the files
as long the server is running

Nobody out there makes a copy of database files while the server
is running, really nobody!

--=20

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

--------------enig5258138AD4F0B639DFA73B84
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk0+60IACgkQhmBjz394AnmguwCfXof+QYGGX3dw2Oa9/OXF Sban
7JIAnR6gZnqapj8k7j33OCKc0sA7kiPx
=foD/
-----END PGP SIGNATURE-----

--------------enig5258138AD4F0B639DFA73B84--

RE: InnoDB and rsync

am 25.01.2011 16:38:53 von Jerry Schwartz

>-----Original Message-----
>From: vegivamp@gmail.com [mailto:vegivamp@gmail.com] On Behalf Of Johan De
>Meersman
>Sent: Tuesday, January 25, 2011 9:56 AM
>To: Reindl Harald
>Cc: Robinson, Eric; mysql@lists.mysql.com
>Subject: Re: InnoDB and rsync
>
>> jesus christ nobody cares if they are binary replica as long
>> as the data is consistent and ident
>>
>
>Actually, I can see this being an issue if you're using LVM snapshot backups
>or another similar technique - if the datafiles aren't all identical you
>won't be able to restore to any machine from a single backup.
>
[JS] I don't get it. Isn't this like saying that you can't substitute a
dictionary with a red cover for a dictionary with a blue cover?

If you do a complete LVM restore, then you have complete and (hopefully)
consistent copy of your files, even if the volume images were not identical;
in fact, they almost certainly will NOT be identical at the disk-image level.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com

>
>--
>Bier met grenadyn
>Is als mosterd by den wyn
>Sy die't drinkt, is eene kwezel
>Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: InnoDB and rsync

am 25.01.2011 16:45:35 von Eric.Robinson

> > There is a very good reason: it is the phenomenon of row drift. The=20
> > master and slave can appear to be in good sync, but often it is not=20
> > actually the case.
>=20
> ... sounds interesting; have you got any document explaining=20
> this phenomenon? AFAIK, the things that (silently) break=20
> replication are:
> - non-deterministic functions in statement-based replication
> - hand-made updates on the slave db
> is this enough to justify a *daily* resync?!

I'm definitely no expert on this. All I know is that we used to
frequently experience situations where queries to the slaves would
return different recordsets than the same queries to the masters. Yet by
all other indications the servers were in sync. All the replication
threads were running and the row counts were identical, but the data in
the rows was sometimes different. I asked about this in the list and the
answers I got back were that the phenomenon was called row drift and was
fairly well known and not always easy (or sometimes even possible) to
eliminate because of bad programming practices in some off-the-shelf
applications. At that time, the consensus in the list was that it was
not safe to trust replication slaves for backup purposes. That's when I
came up with the idea of doing an rsync every night, which creates a
slave that is 100% reliable for using as a backup source and also
eliminates problems with row-drift. Since we started using that
technique, we don't get calls from users complaining that their reports
are showing bogus totals and such.=20

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Mattia Merzi,Reindl Harald,mysql@lists.mysql.com. If =
you are not the named addressee you should not disseminate, distribute, =
copy or alter this email. Any views or opinions presented in this email =
are solely those of the author and might not represent those of =
Physicians' Managed Care or Physician Select Management. Warning: =
Although Physicians' Managed Care or Physician Select Management has =
taken reasonable precautions to ensure no viruses are present in this =
email, the company cannot accept responsibility for any loss or damage =
arising from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: InnoDB and rsync

am 25.01.2011 16:56:33 von Eric.Robinson

> nobody cares if they are binary replica as long=20
> as the data is consistent and ident

Like I said, I'm no expert on this, but my approach seems like the only
way to 100% absolutely sure that the data on the slave is in fact
consistent and identical to the data on tha master.

> so start another slave on the machine with his own socket for=20
> backups

You say that like it doesn't mean a huge amount of additional work,
expense, and complexity. We currently have 240+ master MySQL instances
and are adding them at a rate of several per week.

Based on everything you've said so far, I still prefer my solution. I
just need a way to make the same thing work with InnoDB.

--
Eric Robinson

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 25.01.2011 17:28:47 von Reindl Harald

--------------enig450EB4BC26B7776D6230484A
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 25.01.2011 16:56, schrieb Robinson, Eric:

> You say that like it doesn't mean a huge amount of additional work,
> expense, and complexity. We currently have 240+ master MySQL instances
> and are adding them at a rate of several per week.

240 mysql-servers?
why there is no consolidation?

> Based on everything you've said so far, I still prefer my solution. I
> just need a way to make the same thing work with InnoDB.

this is simply impossible

--=20

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

--------------enig450EB4BC26B7776D6230484A
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk0++j8ACgkQhmBjz394AnkjlQCgnPxySGjIPOUv0NeXgZL1 J1LN
lZEAn11O7E9L4qxZGKNiaR8WmX2LbEnq
=kfXq
-----END PGP SIGNATURE-----

--------------enig450EB4BC26B7776D6230484A--

RE: InnoDB and rsync

am 25.01.2011 18:38:40 von Eric.Robinson

> 240 mysql-servers?
> why there is no consolidation?

I said 240+ mysql *instances*, not servers. It's actually just 3
physical servers (not counting standby cluster nodes).

> > just need a way to make the same thing work with InnoDB.
>=20
> this is simply impossible

That is very unfortunate.

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables

am 25.01.2011 19:56:40 von Kendall Gifford

--90e6ba53a6a2de9f0d049ab04649
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jan 24, 2011 at 6:43 PM, Gavin Towey wrote:

> If you show the EXPLAIN SELECT .. output, and the table structure, someone
> will be able to give a more definite answer.
>
>
Thanks for the reply Gavin. I actually did place this info in my very first
message on this thread, along with my basic table structure and server
version. Myself and others have just stopped keeping the full,
deeply-nested, quoted thread inside all subsequent messages which is why you
probably haven't seen it.

However, here is the EXPLAIN SELECT from the first message (reformatted for
email):

select_type: SIMPLE
table: recipients
type: ref
possible_keys: messages_fk, employee_idx
key: employee_idx
key_len: 5
ref: const
rows: 222640
Extra: Using where; Using temporary; Using filesort

select_type: SIMPLE
table: messages
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: email_archive.recipients.message_id
rows: 1
Extra:

Anyhow, having now copied these tables to another server (MySQL 5.1) and
done some tests (bumping up innodb_buffer_pool_size and playing with
innodb_flush_log_at_trx_commit for my writes and a few other knobs) it is
simply that these somewhat large tables need lots of RAM to perform well,
just as Reindl Harald originally pointed out.

Thanks again for the help everyone!

--
Kendall Gifford
zettabyte@gmail.com

--90e6ba53a6a2de9f0d049ab04649--

Re: InnoDB and rsync

am 25.01.2011 23:18:22 von Reindl Harald

--------------enig69C4C17C0077036439B62768
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 25.01.2011 18:38, schrieb Robinson, Eric:
>> 240 mysql-servers?
>> why there is no consolidation?
>=20
> I said 240+ mysql *instances*, not servers. It's actually just 3
> physical servers (not counting standby cluster nodes).

240 mysql-instances on 3 physical hosts?
what crazy setup is this please?

sorry but your smallest problem is really innodb per rsync

>>> just need a way to make the same thing work with InnoDB.
>>
>> this is simply impossible
>=20
> That is very unfortunate.

The whole world can work with replication-slaves and you are
the only one who installing an endless count of mysql-services
instead a hand of large instances - i would think that not all
others are ghost-drivers and nobody outside mysql would ever
think of backup a running db-server

--------------enig69C4C17C0077036439B62768
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk0/TC4ACgkQhmBjz394Ann5OgCdFKB74mDTKaJ5vVQFhaUd 4iU6
/FsAniv4rcClpjfE40sP+utvYihemJPg
=TtBe
-----END PGP SIGNATURE-----

--------------enig69C4C17C0077036439B62768--

RE: InnoDB and rsync

am 25.01.2011 23:35:53 von Eric.Robinson

> 240 mysql-instances on 3 physical hosts?
> what crazy setup is this please?

Processors average 90% idle, peaks are low, iowait is low, the system is
not swapping, response time is good, and our users are happy all around
the country. What is crazy about that?=20

> The whole world can work with replication-slaves and you are=20
> the only one who installing an endless count of=20
> mysql-services instead a hand of large instances=20

I don't know how the rest of the world does it, but we have been doing
it like this since 2006 and it has worked great and we have never
regretted having multiple instances of mysql. In fact, it is really
great because we can maintain each customer's service individually, stop
and start mysql without affecting other customers, turn query logs on
and off for each customer, customize performance parameters, and so on.
I can maintain a customer's database right in the middle of a production
day and the other customers won't even notice! It has been great being
able to do all these things. =20

> outside mysql would ever think of backup a running db-server
>=20

Then you're just not Googling very well. :-)

--Eric

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 26.01.2011 04:59:41 von shawn.l.green

On 1/25/2011 10:45, Robinson, Eric wrote:
>>> There is a very good reason: it is the phenomenon of row drift. The
>>> master and slave can appear to be in good sync, but often it is not
>>> actually the case.
>>
>> ... sounds interesting; have you got any document explaining
>> this phenomenon? AFAIK, the things that (silently) break
>> replication are:
>> - non-deterministic functions in statement-based replication
>> - hand-made updates on the slave db
>> is this enough to justify a *daily* resync?!
>
>
> I'm definitely no expert on this. All I know is that we used to
> frequently experience situations where queries to the slaves would
> return different recordsets than the same queries to the masters. Yet by
> all other indications the servers were in sync. All the replication
> threads were running and the row counts were identical, but the data in
> the rows was sometimes different. I asked about this in the list and the
> answers I got back were that the phenomenon was called row drift and was
> fairly well known and not always easy (or sometimes even possible) to
> eliminate because of bad programming practices in some off-the-shelf
> applications. At that time, the consensus in the list was that it was
> not safe to trust replication slaves for backup purposes. That's when I
> came up with the idea of doing an rsync every night, which creates a
> slave that is 100% reliable for using as a backup source and also
> eliminates problems with row-drift. Since we started using that
> technique, we don't get calls from users complaining that their reports
> are showing bogus totals and such.
>

I suspect that your queries were not as deterministic as you thought
they were. Do you have a sample of a query that produced different
results between the master and the slave? We shouldn't need the results,
just the query.

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: InnoDB and rsync

am 26.01.2011 05:07:23 von shawn.l.green

On 1/25/2011 09:00, Robinson, Eric wrote:
> ...
>
> I'm starting to worry that you may be right. I know FLUSH TABLES WITH
> READ LOCK does not work as expected with InnoDB, but is there really no
> way to put InnoDB into a state where all changes have been flushed to
> disk and it is safe to rsync the directory? Is stopping the service
> really the only way? (And even if I stop the service, is rsync totally
> safe with InnoDB?)
>

You need to quiesce the InnoDb background threads. One technique is
mentioned here:
http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table spaces.html

Look for the section talking about "clean" backups.
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: InnoDB and rsync

am 26.01.2011 06:18:02 von Eric.Robinson

> On 1/25/2011 10:45, Robinson, Eric wrote:
> >>> There is a very good reason: it is the phenomenon of row=20
> drift. The=20
> >>> master and slave can appear to be in good sync, but often=20
> it is not=20
> >>> actually the case.
> >>
> >> ... sounds interesting; have you got any document explaining this=20
> >> phenomenon? AFAIK, the things that (silently) break=20
> replication are:
> >> - non-deterministic functions in statement-based replication
> >> - hand-made updates on the slave db
> >> is this enough to justify a *daily* resync?!
> >
> >
> > I'm definitely no expert on this. All I know is that we used to
> > frequently experience situations where queries to the slaves would
> > return different recordsets than the same queries to the=20
> masters. Yet by
> > all other indications the servers were in sync. All the replication
> > threads were running and the row counts were identical, but=20
> the data in
> > the rows was sometimes different. I asked about this in the=20
> list and the
> > answers I got back were that the phenomenon was called row=20
> drift and was
> > fairly well known and not always easy (or sometimes even=20
> possible) to
> > eliminate because of bad programming practices in some off-the-shelf
> > applications. At that time, the consensus in the list was=20
> that it was
> > not safe to trust replication slaves for backup purposes.=20
> That's when I
> > came up with the idea of doing an rsync every night, which creates a
> > slave that is 100% reliable for using as a backup source and also
> > eliminates problems with row-drift. Since we started using that
> > technique, we don't get calls from users complaining that=20
> their reports
> > are showing bogus totals and such.
> >
>=20
> I suspect that your queries were not as deterministic as you thought=20
> they were. Do you have a sample of a query that produced different=20
> results between the master and the slave? We shouldn't need=20
> the results,=20
> just the query.
>=20

Sorry, no. The software is a canned medical application so we cannot
easily inspect the queries that could have been causing the problem.
Even though we could capture them in various ways (sniffer, proxy, query
logs) it would not be easy to isolate the culprits out of the tens of
thousands issued every day. And it was a year or more ago. We have not
had the problem since we started rsyncing. :-)

=20

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Shawn Green (MySQL),Mattia Merzi,Reindl =
Harald,mysql@lists.mysql.com. If you are not the named addressee you =
should not disseminate, distribute, copy or alter this email. Any views =
or opinions presented in this email are solely those of the author and =
might not represent those of Physicians' Managed Care or Physician =
Select Management. Warning: Although Physicians' Managed Care or =
Physician Select Management has taken reasonable precautions to ensure =
no viruses are present in this email, the company cannot accept =
responsibility for any loss or damage arising from the use of this email =
or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: InnoDB and rsync

am 26.01.2011 06:38:09 von Eric.Robinson

> > I'm starting to worry that you may be right. I know FLUSH=20
> TABLES WITH=20
> > READ LOCK does not work as expected with InnoDB, but is=20
> there really=20
> > no way to put InnoDB into a state where all changes have=20
> been flushed=20
> > to disk and it is safe to rsync the directory? Is stopping=20
> the service=20
> > really the only way? (And even if I stop the service, is=20
> rsync totally=20
> > safe with InnoDB?)
> >
>=20
> You need to quiesce the InnoDb background threads. One=20
> technique is mentioned here:
> http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
aces.html
>=20
> Look for the section talking about "clean" backups.

Now we're talkin. I'll check it out.

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Shawn Green (MySQL),Reindl =
Harald,mysql@lists.mysql.com. If you are not the named addressee you =
should not disseminate, distribute, copy or alter this email. Any views =
or opinions presented in this email are solely those of the author and =
might not represent those of Physicians' Managed Care or Physician =
Select Management. Warning: Although Physicians' Managed Care or =
Physician Select Management has taken reasonable precautions to ensure =
no viruses are present in this email, the company cannot accept =
responsibility for any loss or damage arising from the use of this email =
or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: InnoDB and rsync

am 26.01.2011 06:58:26 von Eric.Robinson

> > You need to quiesce the InnoDb background threads. One technique is=20
> > mentioned here:
> > http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
> aces.html
> >=20
> > Look for the section talking about "clean" backups.
>=20
> Now we're talkin. I'll check it out.
>=20
>=20

I read that section but it is not at all clear (1) how one quiesces the
InnoDB background threads, or (2) if there is a way to keep them
quiesced while the backup is in progress.

Disclaimer - January 25, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Robinson, Eric,Shawn Green (MySQL),Reindl =
Harald,mysql@lists.mysql.com. If you are not the named addressee you =
should not disseminate, distribute, copy or alter this email. Any views =
or opinions presented in this email are solely those of the author and =
might not represent those of Physicians' Managed Care or Physician =
Select Management. Warning: Although Physicians' Managed Care or =
Physician Select Management has taken reasonable precautions to ensure =
no viruses are present in this email, the company cannot accept =
responsibility for any loss or damage arising from the use of this email =
or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 26.01.2011 12:29:55 von Johan De Meersman

--0015174c110a0633e6049abe27ea
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Jan 26, 2011 at 6:58 AM, Robinson, Eric wrote:

> > > You need to quiesce the InnoDb background threads. One technique is
> > > mentioned here:
> > > http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
> > aces.html
> > >
> > > Look for the section talking about "clean" backups.
> >
> > Now we're talkin. I'll check it out.
> >
> >
>
> I read that section but it is not at all clear (1) how one quiesces the
> InnoDB background threads, or (2) if there is a way to keep them
> quiesced while the backup is in progress.
>

From what I see there:

> You can make a clean backup .ibd file using the following method:
>
> 1.
>
> Stop all activity from the *mysqld*server and commit all transactions.
> 2.
>
> Wait until SHOW ENGINE INNODB STATUSshows that there are no active transactions in the database, and the main
> thread status of InnoDB is Waiting for server activity. Then you can
> make a copy of the .ibd file.
>
> I would assume that "flush tables with read lock" would work for 1. and
then you wait for 2. to happen. Probably shouldn't take very long,
especially in the quiet moments.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--0015174c110a0633e6049abe27ea--

RE: InnoDB and rsync

am 26.01.2011 15:08:34 von Eric.Robinson

------_=_NextPart_001_01CBBD62.85B0523C
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

=20

> > You need to quiesce the InnoDb background threads. One
technique is
> > mentioned here:
> >
http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
=20
> aces.html
> >
> > Look for the section talking about "clean" backups.
>
> Now we're talkin. I'll check it out.
>
>
=09
=09
I read that section but it is not at all clear (1) how one
quiesces the
InnoDB background threads, or (2) if there is a way to keep them
quiesced while the backup is in progress.
=09

From what I see there:=20

You can make a clean backup .ibd file using the following
method:=20

1. Stop all activity from the mysqld
server and commit
all transactions.=20

2. Wait until SHOW ENGINE INNODB STATUS
shows that
there are no active transactions in the database, and the main thread
status of InnoDB is Waiting for server activity. Then you can make a
copy of the .ibd file.=20

=20

> I would assume that "flush tables with read lock" would work=20
> for 1. and then you wait for 2. to happen. Probably shouldn't=20
> take very long, especially in the quiet moments.
=20
=20
=20
Can anyone confirm that FLUSH TABLES WITH READ LOCK is sufficient to
"quiesce" the InnoDB background threads? When Googling this, I'm seeing
posts saying that even after a flush & lock, InnoDB keeps making changes
to certain files (metadata?) and that's why it is never really safe
(according to them) to copy the files from a running instance of MySQL.
Call me persistent (or just annoyingly thick-headed) but I am not fully
satisfied their explanations.
=20
=20
--Eric
=20
=20

Disclaimer - January 26, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Johan De Meersman,Shawn Green (MySQL),Reindl =
Harald,mysql@lists.mysql.com. If you are not the named addressee you =
should not disseminate, distribute, copy or alter this email. Any views =
or opinions presented in this email are solely those of the author and =
might not represent those of Physicians' Managed Care or Physician =
Select Management. Warning: Although Physicians' Managed Care or =
Physician Select Management has taken reasonable precautions to ensure =
no viruses are present in this email, the company cannot accept =
responsibility for any loss or damage arising from the use of this email =
or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

------_=_NextPart_001_01CBBD62.85B0523C--

RE: InnoDB and rsync

am 28.01.2011 17:04:10 von Eric.Robinson

> You need to quiesce the InnoDb background threads. One=20
> technique is mentioned here:
> http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
aces.html

Just refreshing this topic a bit. Can anyone confirm that FLUSH TABLES
WITH READ LOCK is sufficient to quiesce the InnoBD background threads
per Shawn's message above?=20

--
Eric Robinson

Disclaimer - January 28, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for mysql@lists.mysql.com,Shawn Green (MySQL). If you =
are not the named addressee you should not disseminate, distribute, copy =
or alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 28.01.2011 20:01:00 von Michael Dykman

FLUSH TABLES WITH READ LOCK does work consistently on MyISAM and my
experience confirms this. I do remember reading something on this
list eons ago that asserted that it is not necessarily effective on
InnoDB due to it's multi-versioning.. uncommited transactions might be
caught in an inconsistent state.

In one extreme instance, having a few terabytes of data across several
instances (on distinct hosts), I was required to do a full-refactoring
data migration with an absolute limitation on allowable downtime.
Among the technique which I used (and I can't take credit for this
one) was to use rsync on the live server for innodb files (this phase
took a very long time, but did not interfere with operations). The
result of this phase was, as you would expect, a set a seriously
broken files which were notheless very similar to the correct files.
When that phase was complete, I shut the server down and did another
rsync. It required perhaps a minute or 2, but the result was 100%
clean innodb data files which satisfied my downtime limitations.

FLUSH TABLES WITH READ LOCK might suffice if all transactions are
completed/rolled-back but I would stil advise that you scan SHOW
ENGINE INNODB STATUS but I would carefully experiment with that.

As for maat-kit, don't let the disclaimers discourage you. If you
read the disclaimers carefully on any product (at least those released
with the benefit(?) of legal advice), you would have a hard time
trusting any of it with your enterprise. The maat-kit team (and Baron
Schwartz in particular) and quite simply the *best* MySQL engineering
team out there, with the possible exception of the vendor. I would
not hesitate to trust them with my data.

- michael dykman

On Fri, Jan 28, 2011 at 11:04 AM, Robinson, Eric
wrote:
>> You need to quiesce the InnoDb background threads. One
>> technique is mentioned here:
>> http://dev.mysql.com/doc/refman/5.5/en/innodb-multiple-table sp
> aces.html
>
>
> Just refreshing this topic a bit. Can anyone confirm that FLUSH TABLES
> WITH READ LOCK is sufficient to quiesce the InnoBD background threads
> per Shawn's message above?
>
> --
> Eric Robinson
>
>
>
> Disclaimer - January 28, 2011
> This email and any files transmitted with it are confidential and intende=
d solely for mysql@lists.mysql.com,Shawn Green (MySQL). If you are not the =
named addressee you should not disseminate, distribute, copy or alter this =
email. Any views or opinions presented in this email are solely those of th=
e author and might not represent those of Physicians' Managed Care or Physi=
cian Select Management. Warning: Although Physicians' Managed Care or Physi=
cian Select Management has taken reasonable precautions to ensure no viruse=
s are present in this email, the company cannot accept responsibility for a=
ny loss or damage arising from the use of this email or attachments.
> This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: =A0 =A0http://lists.mysql.com/mysql?unsub=3Dmdykman@gmail=
..com
>
>

--=20
=A0- michael dykman
=A0- mdykman@gmail.com

=A0May the Source be with you.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

RE: InnoDB and rsync

am 28.01.2011 20:10:21 von Eric.Robinson

> In one extreme instance, having a few terabytes of data=20
> across several instances (on distinct hosts), I was required=20
> to do a full-refactoring data migration with an absolute=20
> limitation on allowable downtime.
> Among the technique which I used (and I can't take credit for this
> one) was to use rsync on the live server for innodb files=20
> (this phase took a very long time, but did not interfere with=20
> operations). The result of this phase was, as you would=20
> expect, a set a seriously broken files which were notheless=20
> very similar to the correct files.
> When that phase was complete, I shut the server down and did=20
> another rsync. It required perhaps a minute or 2, but the=20
> result was 100% clean innodb data files which satisfied my=20
> downtime limitations.
>=20
> FLUSH TABLES WITH READ LOCK might suffice if all=20
> transactions are completed/rolled-back but I would stil=20
> advise that you scan SHOW ENGINE INNODB STATUS but I would=20
> carefully experiment with that.
>=20

You just described almost the exact procedure that I described at the
beginning of this thread, except I use MyISAM so my question was whether
the same technique could work with InnoDB. It sounds like it very well
could if combined with SHOW ENGINE INNODB STATUS. I will definitely test
it to be sure.

> As for maat-kit, don't let the disclaimers discourage you. =20
> If you read the disclaimers carefully on any product (at=20
> least those released with the benefit(?) of legal advice),=20
> you would have a hard time trusting any of it with your=20
> enterprise. The maat-kit team (and Baron Schwartz in=20
> particular) and quite simply the *best* MySQL engineering=20
> team out there, with the possible exception of the vendor. I=20
> would not hesitate to trust them with my data.
>=20

I will definitely look at it again. Thanks.

--Eric

Disclaimer - January 28, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Michael Dykman,mysql@lists.mysql.com,Shawn Green =
(MySQL). If you are not the named addressee you should not disseminate, =
distribute, copy or alter this email. Any views or opinions presented in =
this email are solely those of the author and might not represent those =
of Physicians' Managed Care or Physician Select Management. Warning: =
Although Physicians' Managed Care or Physician Select Management has =
taken reasonable precautions to ensure no viruses are present in this =
email, the company cannot accept responsibility for any loss or damage =
arising from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 28.01.2011 21:31:19 von Reindl Harald

--------------enig7F06EE4E6D17376EE4A2EFD9
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 28.01.2011 17:04, schrieb Robinson, Eric:

> Just refreshing this topic a bit. Can anyone confirm that FLUSH TABLES
> WITH READ LOCK is sufficient to quiesce the InnoBD background threads
> per Shawn's message above?=20

Damned start your brain, read documentations and hear what peopole say

http://dev.mysql.com/doc/refman/5.1/en/alter-table-problems. html

> If you use ALTER TABLE on a transactional table or if you are using Win=
dows or
> OS/2, ALTER TABLE unlocks the table if you had done a LOCK TABLE on it.=
This
> is done because InnoDB and these operating systems cannot drop a table =
that is
> in use.

And you will build your business on a hotcopy with external tools beause
you do not trust replication? laughable!

Do what you want, but dont come back and cry if all goes down
You were told in which way you can use rsync with minimum downtime
or that replication can be used to stop only salves for a short time
and you are acting like a child "mama i will do what i said the whole tim=
e"

--------------enig7F06EE4E6D17376EE4A2EFD9
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk1DJ5gACgkQhmBjz394Ann3iACcDLqNSVWx5xhxWhHglzgj q6Rk
g1gAn0Liax4bw4q+aC1MJTv01/kOsHGI
=9eqW
-----END PGP SIGNATURE-----

--------------enig7F06EE4E6D17376EE4A2EFD9--

RE: InnoDB and rsync

am 28.01.2011 22:30:34 von Eric.Robinson

> And you will build your business on a hotcopy with external=20
> tools beause you do not trust replication? laughable!
>=20
> Do what you want, but dont come back and cry if all goes down=20
> You were told in which way you can use rsync with minimum=20
> downtime or that replication can be used to stop only salves=20
> for a short time and you are acting like a child "mama i will=20
> do what i said the whole time"

Our current model has been working well since 2006. We will be careful
to verify the reliability of any proposed changes.=20

Have a great day!

--
Eric Robinson

Disclaimer - January 28, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 28.01.2011 22:47:17 von Reindl Harald

--------------enig6315EC6DF3686A7AE6669F21
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 28.01.2011 22:30, schrieb Robinson, Eric:

> Our current model has been working well since 2006. We will be careful
> to verify the reliability of any proposed changes.=20
>=20
> Have a great day!

this is ok because MyISAM is so simple that you can even without any
flushes make a copy while the server is running and after a "repair table=
"
on the destination machine all tables are useable

With InnoDB it is much difficult because "table spaces"
Eeven with "innodb_file_per_table" there are dependencies
of the table-files and "ibdata1" in the main datadir

If there is only a minimal problem it is possible that mysqld
will not start and if this is a time where you need your backup
really you would like to die :-)

Even if there would be documentaed ways to rsync while the
server is running i would never ever do that because it is
so hot that a simple mysql-bug in a later release could break
it temporarliy and you would not notice this, so you sgould simply
go a safe way

* flush atbles
* rsync while mysqld is running
* stop mysqld
* second rsync

With this steps you can copy real big databases with a minimum downtime,=

to reduce this again select a local folder as target if your disks are fa=
st
and after mysqld is runnign again you can sync this copy to another machi=
ne
without stress

--------------enig6315EC6DF3686A7AE6669F21
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk1DOWUACgkQhmBjz394Anl/XgCfQ4CUUQsB5hWrpVozrguV caXD
7UMAn3DrrGM6GRug/kAwv3026j4T1+df
=0Pot
-----END PGP SIGNATURE-----

--------------enig6315EC6DF3686A7AE6669F21--

RE: InnoDB and rsync

am 28.01.2011 22:59:47 von Eric.Robinson

> * flush atbles
> * rsync while mysqld is running
> * stop mysqld
> * second rsync
>=20

Unless we can verify 100% that there is a safe way to do it without
shutting down MySQL, then I'm sure the approach you described above is
the one we will end up with. Thanks for your input.

--
Eric Robinson

Disclaimer - January 28, 2011=20
This email and any files transmitted with it are confidential and =
intended solely for Reindl Harald,mysql@lists.mysql.com. If you are not =
the named addressee you should not disseminate, distribute, copy or =
alter this email. Any views or opinions presented in this email are =
solely those of the author and might not represent those of Physicians' =
Managed Care or Physician Select Management. Warning: Although =
Physicians' Managed Care or Physician Select Management has taken =
reasonable precautions to ensure no viruses are present in this email, =
the company cannot accept responsibility for any loss or damage arising =
from the use of this email or attachments.=20
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: InnoDB and rsync

am 01.02.2011 03:11:20 von Eric Bergen

I skimmed over this thread and I think I can help clarify the innodb,
rsync, and lvm situation.

The basic issue with just running rsync on the files under a running
mysqld is that the rsync
will copy different parts of files at different points in time. This
means that it could sync things
to disk before and after rsync has read that part of a file which,
when restored, will be corrupted.
This is the worst case scenario.

The next best thing is to run a FLUSH TABLES WITH READ LOCK (global
read lock) and maintain
the lock while the rsync is running. This works fine for myisam tables
because all file activity to
myisam tables is stopped while a global read lock is held.

This isn't guaranteed to work with innodb. Internally to mysql flush
tables with read lock only stops
queries from acquiring write locks which let them modify tables. This
won't make innodb hold still
for a few different reasons.

First a select query in innodb can actually modify data files. A
select on a page with unmerged
records from the insert buffer will cause innodb to merge those
records before making the page
available for selects. This causes some disk i/o. If this happens
while rsync is running the resulting
backup can be corrupted. So even while holding a global read lock and
only running selects innodb
can write data.

The master thread(s) perform background tasks such as flushing dirty
pages, merging the insert buffer
and purging old records whenever innodb feels like there is spare i/o
capacity. These threads don't
know how to hold still during a global read lock and can corrupt a
backup if it were taken with rsync.

The safest way to create a backup without using something like
XtraBackup is to get a snapshot at
the filesystem level or below that at the block device level. This is
effectively what LVM does. When
you create a LVM snapshot it freezes that device at a single point in
time. When you restore the backup
innodb goes through it's recovery procedure as if the power went out.

Some possible solutions to this were discussed on the internals list a
few years ago. I'm not sure what
has been implemented since then. The list thread is here:
http://lists.mysql.com/internals/35527

-Eric

On Fri, Jan 28, 2011 at 1:59 PM, Robinson, Eric w=
rote:
>> * flush atbles
>> * rsync while mysqld is running
>> * stop mysqld
>> * second rsync
>>
>
> Unless we can verify 100% that there is a safe way to do it without
> shutting down MySQL, then I'm sure the approach you described above is
> the one we will end up with. Thanks for your input.
>
> --
> Eric Robinson
>
>
>
> Disclaimer - January 28, 2011
> This email and any files transmitted with it are confidential and intende=
d solely for Reindl Harald,mysql@lists.mysql.com. If you are not the named =
addressee you should not disseminate, distribute, copy or alter this email.=
Any views or opinions presented in this email are solely those of the auth=
or and might not represent those of Physicians' Managed Care or Physician S=
elect Management. Warning: Although Physicians' Managed Care or Physician S=
elect Management has taken reasonable precautions to ensure no viruses are =
present in this email, the company cannot accept responsibility for any los=
s or damage arising from the use of this email or attachments.
> This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: =A0 =A0http://lists.mysql.com/mysql?unsub=3Deric.bergen@g=
mail.com
>
>

--=20
Eric Bergen
eric.bergen@gmail.com
http://www.ebergen.net

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg