many-inserts go faster the second time

many-inserts go faster the second time

am 25.09.2009 09:44:31 von Bennett Haselton

I have a script that runs several times in the evening, and on each
run it adds several thousand entries to a table.

On the first run, it adds the entries rather slowly. But then on all
subsequent runs (usually about a minute or two later), the many
inserts go a lot faster. This is true regardless of how many entries
are added by each run -- whether the first and second run both add
50,000 or the first and second run both add 10,000, the first run
goes slowly and the second one goes fast. But by the following
evening, the first run is back to going slowly again.

It's as if in the minute or two following the first run of the
script, MySQL catches its breath and realizes, hey, that table is
getting a lot of entries added to it, so it waves some magic dust so
that the next time I add a lot of entries, it goes a lot
faster. (Hope I'm not losing anybody with the technical terminology
here.) Then by the next evening the optimization parameter has
exp^W^W^W^W the fairy dust has worn off.

Is this a familiar phenomenon to anyone? Know why it's
happening? And especially, is there any way I can tell MySQL to
optimize that table before the first script run, so that the first
run goes fast as well?

In general, does anybody have familiarity with the strategies for
speeding up the process of inserting a lot of rows at a time, and
knows which ones really do work and which ones don't? This page:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
says that in the case of a single client doing lots of inserts, these
would apply:
- using insert statements with multiple values lists
- change the "bulk_insert_buffer_size" variable
- writing the data to be inserted into a temporary file, and then
using the LOAD DATA INFILE syntax

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: many-inserts go faster the second time

am 25.09.2009 17:24:29 von Dan Nelson

In the last episode (Sep 25), Bennett Haselton said:
> I have a script that runs several times in the evening, and on each run it
> adds several thousand entries to a table.
>
> On the first run, it adds the entries rather slowly. But then on all
> subsequent runs (usually about a minute or two later), the many inserts go
> a lot faster. This is true regardless of how many entries are added by
> each run -- whether the first and second run both add 50,000 or the first
> and second run both add 10,000, the first run goes slowly and the second
> one goes fast. But by the following evening, the first run is back to
> going slowly again.
>
> It's as if in the minute or two following the first run of the script,
> MySQL catches its breath and realizes, hey, that table is getting a lot of
> entries added to it, so it waves some magic dust so that the next time I
> add a lot of entries, it goes a lot faster. (Hope I'm not losing anybody
> with the technical terminology here.) Then by the next evening the
> optimization parameter has exp^W^W^W^W the fairy dust has worn off.

More likely, this is a relatively unused table, and the first batch of
inserts pulls most of the index and some of the table data into RAM, which
makes for much faster lookups on the next run. What do top and iostat stats
show on both runs? I'd expect heavy disk usage and little CPU on the first
run, and light disk and heavier CPU usage on the second.

--
Dan Nelson
dnelson@allantgroup.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: many-inserts go faster the second time

am 25.09.2009 21:08:03 von Bennett Haselton

At 08:24 AM 9/25/2009, Dan Nelson wrote:
>In the last episode (Sep 25), Bennett Haselton said:
> > I have a script that runs several times in the evening, and on
> each run it
> > adds several thousand entries to a table.
> >
> > On the first run, it adds the entries rather slowly. But then on all
> > subsequent runs (usually about a minute or two later), the many
> inserts go
> > a lot faster. This is true regardless of how many entries are added by
> > each run -- whether the first and second run both add 50,000 or
> the first
> > and second run both add 10,000, the first run goes slowly and the
> second
> > one goes fast. But by the following evening, the first run is back to
> > going slowly again.
> >
> > It's as if in the minute or two following the first run of the script,
> > MySQL catches its breath and realizes, hey, that table is getting
> a lot of
> > entries added to it, so it waves some magic dust so that the next
> time I
> > add a lot of entries, it goes a lot faster. (Hope I'm not losing
> anybody
> > with the technical terminology here.) Then by the next evening the
> > optimization parameter has exp^W^W^W^W the fairy dust has worn off.
>
>More likely, this is a relatively unused table, and the first batch of
>inserts pulls most of the index and some of the table data into RAM, which
>makes for much faster lookups on the next run. What do top and
>iostat stats
>show on both runs? I'd expect heavy disk usage and little CPU on
>the first
>run, and light disk and heavier CPU usage on the second.

That's interesting, I can look at that next time I try it. But if
that's the case, wouldn't the first run go slowly at first, but then
pick up speed once all of the indexes etc. have been pulled into
memory? Because that's not what I'm seeing -- if I insert 50,000 in
the first run, it's slow all the way through, but then the second
50,000 get inserted quickly.

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: many-inserts go faster the second time

am 02.10.2009 11:53:37 von Joerg Bruehe

Hi Bennett, all!


Bennett Haselton wrote:
> At 08:24 AM 9/25/2009, Dan Nelson wrote:
>> In the last episode (Sep 25), Bennett Haselton said:
>> > I have a script that runs several times in the evening, and on e=
ach
>> run it
>> > adds several thousand entries to a table.
>> >
>> > On the first run, it adds the entries rather slowly. But then o=
n all
>> > subsequent runs (usually about a minute or two later), the many
>> inserts go
>> > a lot faster. This is true regardless of how many entries are a=
dded by
>> > each run -- whether the first and second run both add 50,000 or =
the
>> first
>> > and second run both add 10,000, the first run goes slowly and th=
e
>> second
>> > one goes fast. But by the following evening, the first run is b=
ack to
>> > going slowly again.
>> >
>> > It's as if in the minute or two following the first run of the s=
cript,
>> > MySQL catches its breath and realizes, hey, that table is gettin=
g a
>> lot of
>> > entries added to it, so it waves some magic dust so that the nex=
t
>> time I
>> > add a lot of entries, it goes a lot faster. (Hope I'm not losin=
g
>> anybody
>> > with the technical terminology here.) Then by the next evening t=
he
>> > optimization parameter has exp^W^W^W^W the fairy dust has worn o=
ff.
>>
>> More likely, this is a relatively unused table, and the first batc=
h of
>> inserts pulls most of the index and some of the table data into RA=
M,
>> which
>> makes for much faster lookups on the next run. What do top and io=
stat
>> stats
>> show on both runs? I'd expect heavy disk usage and little CPU on =
the
>> first
>> run, and light disk and heavier CPU usage on the second.
>=20
> That's interesting, I can look at that next time I try it. But if
> that's the case, wouldn't the first run go slowly at first, but the=
n
> pick up speed once all of the indexes etc. have been pulled into
> memory? Because that's not what I'm seeing -- if I insert 50,000 i=
n the
> first run, it's slow all the way through, but then the second 50,00=
0 get
> inserted quickly.

Your "fairy dust" is called "access pattern", evaluated by a LRU or
similar policy.

Don't forget you may have caching on two levels: database and operati=
ng
system. Both have their own cache aging mechanisms.
The details about caching and its effects will vary by the table hand=
ler
you are using, MyISAM structures and policies definitely from InnoDB =
ones.

Even if MySQL would not cache data and index pages, they would still
reside in the operating system's file I/O cache, so the next access t=
o
them will be faster than the first one - regardless whether you read
them or modify them.

However, sooner or later they will be removed from all caches because
they are not accessed until the next evening, whereas other pages wer=
e
accessed and needed space in RAM.
(Here, I ignore the case of a RAM which is larger than all data acces=
sed
for a day, it is too unlikely.)
In the evening, when your job is run again, this starts anew.


HTH,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB1610=
28


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: many-inserts go faster the second time

am 02.10.2009 21:28:09 von Bennett Haselton

At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>Hi Bennett, all!
>
>
>Bennett Haselton wrote:
> > At 08:24 AM 9/25/2009, Dan Nelson wrote:
> >> In the last episode (Sep 25), Bennett Haselton said:
> >> > I have a script that runs several times in the evening, and on each
> >> run it
> >> > adds several thousand entries to a table.
> >> >
> >> > On the first run, it adds the entries rather slowly. But then
> on all
> >> > subsequent runs (usually about a minute or two later), the many
> >> inserts go
> >> > a lot faster. This is true regardless of how many entries are
> added by
> >> > each run -- whether the first and second run both add 50,000 or the
> >> first
> >> > and second run both add 10,000, the first run goes slowly and the
> >> second
> >> > one goes fast. But by the following evening, the first run is
> back to
> >> > going slowly again.
> >> >
> >> > It's as if in the minute or two following the first run of the
> script,
> >> > MySQL catches its breath and realizes, hey, that table is getting a
> >> lot of
> >> > entries added to it, so it waves some magic dust so that the next
> >> time I
> >> > add a lot of entries, it goes a lot faster. (Hope I'm not losing
> >> anybody
> >> > with the technical terminology here.) Then by the next evening the
> >> > optimization parameter has exp^W^W^W^W the fairy dust has worn off.
> >>
> >> More likely, this is a relatively unused table, and the first batch of
> >> inserts pulls most of the index and some of the table data into RAM,
> >> which
> >> makes for much faster lookups on the next run. What do top and iostat
> >> stats
> >> show on both runs? I'd expect heavy disk usage and little CPU on the
> >> first
> >> run, and light disk and heavier CPU usage on the second.
> >
> > That's interesting, I can look at that next time I try it. But if
> > that's the case, wouldn't the first run go slowly at first, but then
> > pick up speed once all of the indexes etc. have been pulled into
> > memory? Because that's not what I'm seeing -- if I insert 50,000
> in the
> > first run, it's slow all the way through, but then the second
> 50,000 get
> > inserted quickly.
>
>Your "fairy dust" is called "access pattern", evaluated by a LRU or
>similar policy.
>
>Don't forget you may have caching on two levels: database and operating
>system. Both have their own cache aging mechanisms.
>The details about caching and its effects will vary by the table handler
>you are using, MyISAM structures and policies definitely from InnoDB ones.
>
>Even if MySQL would not cache data and index pages, they would still
>reside in the operating system's file I/O cache, so the next access to
>them will be faster than the first one - regardless whether you read
>them or modify them.
>
>However, sooner or later they will be removed from all caches because
>they are not accessed until the next evening, whereas other pages were
>accessed and needed space in RAM.
>(Here, I ignore the case of a RAM which is larger than all data accessed
>for a day, it is too unlikely.)
>In the evening, when your job is run again, this starts anew.

I understand that, but here's the problem:

If the speed increase were just the result of values being recently
accessed, then the speed increase should be a function of the number
of inserts that I've already done. So if I insert 10,000 rows in one
script run and then 10,000 rows in a second script run, and the
second script run is a lot faster, then the first 10,000 inserts were
enough to optimize everything. So that should mean if I do 50,000
inserts in a single script run, then the first 10,000 inserts should
be enough to speed everything up.

But that's not what I'm seeing. What I'm seeing is that if I do
10,000 inserts on the first run and 10,000 on the second, then the
first run is slow and the second run is fast. On the other hand if I
do 50,000 inserts on the first run and 50,000 on the second, then the
entire first run is slow and the entire second run is fast.

In any case, is there anything I can do to force MySQL to manually
pre-optimize the entire table (even if it takes some time to do so,
reading indexes into memory or whatever), other than kludgy solutions
like doing a dummy insert of several thousand rows and then deleting them?

-Bennett


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: many-inserts go faster the second time

am 02.10.2009 22:18:10 von mussatto

------=_20091002131810_29175
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit



On Fri, October 2, 2009 12:28, Bennett Haselton wrote:
> At 02:53
AM 10/2/2009, Joerg Bruehe wrote:
>>Hi Bennett, all!
>>
>>
>>Bennett Haselton wrote:
>> > At 08:24 AM 9/25/2009, Dan Nelson wrote:
>>
>> In the last episode (Sep 25), Bennett Haselton said:
>> >> > I have a script that runs several times in the
evening, and on each
>> >> run it
>> >>
> adds several thousand entries to a table.
>> >>
>
>> >> > On the first run, it adds the entries
rather slowly. But then
>> on all
>> >> >
subsequent runs (usually about a minute or two later), the many
>> >> inserts go
>> >> > a lot faster.
This is true regardless of how many entries are
>> added by
>> >> > each run -- whether the first and second run both
add 50,000 or the
>> >> first
>> >> >
and second run both add 10,000, the first run goes slowly and the
>> >> second
>> >> > one goes fast. But
by the following evening, the first run is
>> back to
>> >> > going slowly again.
>> >>
>
>> >> > It's as if in the minute or two following
the first run of the
>> script,
>> >> >
MySQL catches its breath and realizes, hey, that table is getting a
>> >> lot of
>> >> > entries added to
it, so it waves some magic dust so that the next
>> >>
time I
>> >> > add a lot of entries, it goes a lot
faster. (Hope I'm not losing
>> >> anybody
>>
>> > with the technical terminology here.) Then by the next
evening the
>> >> > optimization parameter has
exp^W^W^W^W the fairy dust has worn off.
>> >>
>> >> More likely, this is a relatively unused table, and
the first batch
>> of
>> >> inserts pulls most
of the index and some of the table data into RAM,
>> >>
which
>> >> makes for much faster lookups on the next
run. What do top and
>> iostat
>> >> stats
>> >> show on both runs? I'd expect heavy disk usage and
little CPU on the
>> >> first
>> >> run,
and light disk and heavier CPU usage on the second.
>> >
>> > That's interesting, I can look at that next time I try it.
But if
>> > that's the case, wouldn't the first run go
slowly at first, but then
>> > pick up speed once all of the
indexes etc. have been pulled into
>> > memory? Because
that's not what I'm seeing -- if I insert 50,000
>> in the
>> > first run, it's slow all the way through, but then the
second
>> 50,000 get
>> > inserted quickly.
>>
>>Your "fairy dust" is called "access
pattern", evaluated by a LRU or
>>similar policy.
>>
>>Don't forget you may have caching on two levels:
database and operating
>>system. Both have their own cache
aging mechanisms.
>>The details about caching and its effects
will vary by the table handler
>>you are using, MyISAM
structures and policies definitely from InnoDB
>> ones.
>>
>>Even if MySQL would not cache data and index
pages, they would still
>>reside in the operating system's file
I/O cache, so the next access to
>>them will be faster than the
first one - regardless whether you read
>>them or modify
them.
>>
>>However, sooner or later they will be
removed from all caches because
>>they are not accessed until
the next evening, whereas other pages were
>>accessed and
needed space in RAM.
>>(Here, I ignore the case of a RAM which
is larger than all data accessed
>>for a day, it is too
unlikely.)
>>In the evening, when your job is run again, this
starts anew.
>
> I understand that, but here's the
problem:
>
> If the speed increase were just the result
of values being recently
> accessed, then the speed increase
should be a function of the number
> of inserts that I've already
done. So if I insert 10,000 rows in one
> script run and then
10,000 rows in a second script run, and the
> second script run is
a lot faster, then the first 10,000 inserts were
> enough to
optimize everything. So that should mean if I do 50,000
> inserts
in a single script run, then the first 10,000 inserts should
> be
enough to speed everything up.
>
> But that's not what
I'm seeing. What I'm seeing is that if I do
> 10,000 inserts on
the first run and 10,000 on the second, then the
> first run is
slow and the second run is fast. On the other hand if I
> do
50,000 inserts on the first run and 50,000 on the second, then the
> entire first run is slow and the entire second run is fast.
>
> In any case, is there anything I can do to force MySQL
to manually
> pre-optimize the entire table (even if it takes some
time to do so,
> reading indexes into memory or whatever), other
than kludgy solutions
> like doing a dummy insert of several
thousand rows and then deleting them?
>
>
-Bennett
Maybe not dummies but rather a subset of the first
group.  Then do the rest of the first set. 

------=_20091002131810_29175--

Re: many-inserts go faster the second time

am 06.10.2009 17:26:58 von Joerg Bruehe

Hi Bennett, all,


sorry about the delay, I was somewhat busy.

Bennett Haselton wrote:
> At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>> Hi Bennett, all!
>>
>>
>> Bennett Haselton wrote:
>> > [[...]]
>>
>> Your "fairy dust" is called "access pattern", evaluated by a LRU o=
r
>> similar policy.
>>
>> Don't forget you may have caching on two levels: database and oper=
ating
>> system. Both have their own cache aging mechanisms.
>> The details about caching and its effects will vary by the table h=
andler
>> you are using, MyISAM structures and policies definitely from Inno=
DB
>> ones.
>>
>> Even if MySQL would not cache data and index pages, they would sti=
ll
>> reside in the operating system's file I/O cache, so the next acces=
s to
>> them will be faster than the first one - regardless whether you re=
ad
>> them or modify them.
>>
>> However, sooner or later they will be removed from all caches beca=
use
>> they are not accessed until the next evening, whereas other pages =
were
>> accessed and needed space in RAM.
>> (Here, I ignore the case of a RAM which is larger than all data ac=
cessed
>> for a day, it is too unlikely.)
>> In the evening, when your job is run again, this starts anew.
>=20
> I understand that, but here's the problem:
>=20
> If the speed increase were just the result of values being recently
> accessed, then the speed increase should be a function of the numbe=
r of
> inserts that I've already done. So if I insert 10,000 rows in one
> script run and then 10,000 rows in a second script run, and the sec=
ond
> script run is a lot faster, then the first 10,000 inserts were enou=
gh to
> optimize everything. So that should mean if I do 50,000 inserts in=
a
> single script run, then the first 10,000 inserts should be enough t=
o
> speed everything up.

That reasoning sounds valid, but I guess it is incomplete:
Which pages (data and index) would the first 10,000 row insert touch,
and which does the 50,000 row insert touch?

If your rows are ordered by some criteria (maybe not intentionally, j=
ust
by the way you get them), then it is possible that the first part of =
a
batch touches only part of the data pages, and so the last part doesn=
't
profit.

Yes, this is speculation. Also, I don't know the in-memory or on-disk
data structures good enough to prove my idea.

>=20
> But that's not what I'm seeing. What I'm seeing is that if I do 10=
,000
> inserts on the first run and 10,000 on the second, then the first r=
un is
> slow and the second run is fast. On the other hand if I do 50,000
> inserts on the first run and 50,000 on the second, then the entire =
first
> run is slow and the entire second run is fast.

Also, are you sure there is no action at the end of your insert batch
run that finishes the accesses, and as long as this hasn't been done
some aspect is missing?

>=20
> In any case, is there anything I can do to force MySQL to manually
> pre-optimize the entire table (even if it takes some time to do so,
> reading indexes into memory or whatever), other than kludgy solutio=
ns
> like doing a dummy insert of several thousand rows and then deletin=
g them?

If I knew one, I would tell you, but I have no knowledge there.
You might try a "SELECT *" or a "SELECT AVG(some_field)", but I won't
make any claim this helps.
I would not do any manipulation, the risk to miss the cleanup is simp=
ly
too high (and writing is more effort, reading should be enough I hope=
).


Regards,
Jörg

--=20
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@Sun.COM
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB1610=
28


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg