mysql using aio/raw device on linux
on 16.03.2011 16:45:53 by zhuchao
Hi guys,
One question: can the MySQL binlog use a raw device on Linux? Can we use asynchronous I/O for
binlog writing? Sequential non-qio fsync is slowing our throughput...
Thanks
--
Regards
Zhu Chao
Re: mysql using aio/raw device on linux
on 17.03.2011 08:14:09 by Johan De Meersman
----- Original Message -----
> From: "Chao Zhu"
>
> One Q: Can mysql binlog use raw device on Linux?
Mmm, good question. Don't really know; but I'm not convinced you'll get huge benefits from it, either. Modern filesystems tend to perform pretty close to raw throughput.
From a just-thinking-it-through point of view, I'd guess no - mysqld never seems to open binlogs for append, it always opens a new one. This may have something to do with the way replication works; not to mention the question of what'll happen if the log is full - it's not a circular buffer.
> Can we use asynch IO for binlog writing? sequential non-qio fsync is slowing our throughput...
Mmm... Theoretically, yes, you could use an async device (even nfs over UDP if you're so inclined) but async means that you're going to be losing some transactions if the server crashes.
You can also tweak http://dev.mysql.com/doc/refman/5.0/en/replication-options-binary-log.html#sysvar_sync_binlog - basically, this controls how often the binlog fsyncs. Same caveat applies, obviously: set this to ten, and you'll have ten times fewer fsyncs, but you risk losing up to ten transactions in a crash.
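As a rough illustration of the trade-off described above, here is a minimal Python sketch of a writer that calls fsync once every N commits. The class name `SyncEveryN` and the helper `demo_sync_every` are made up for this example and are not MySQL's implementation; the point is only to show how the number of fsync calls and the number of commits not yet forced to disk move in opposite directions.

```python
import os
import tempfile


class SyncEveryN:
    """Append records to a file and fsync once every `sync_every` writes.

    Hypothetical illustration only; this is not mysqld's code.
    sync_every=0 means never fsync explicitly (flushing is left to the OS).
    """

    def __init__(self, path, sync_every):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.sync_every = sync_every
        self.unsynced = 0      # writes handed to the OS since the last fsync
        self.fsync_calls = 0   # how many times fsync was called

    def commit(self, record):
        os.write(self.fd, record)
        self.unsynced += 1
        if self.sync_every and self.unsynced >= self.sync_every:
            os.fsync(self.fd)  # block until the OS reports the data as flushed
            self.fsync_calls += 1
            self.unsynced = 0

    def close(self):
        os.close(self.fd)


def demo_sync_every(sync_every, commits=25):
    """Return (fsync_calls, commits_not_yet_synced) after `commits` writes."""
    with tempfile.TemporaryDirectory() as tmp:
        log = SyncEveryN(os.path.join(tmp, "binlog.demo"), sync_every)
        for i in range(commits):
            log.commit(b"txn %d\n" % i)
        result = (log.fsync_calls, log.unsynced)
        log.close()
        return result
```

With 25 commits, `demo_sync_every(10)` performs two fsyncs and leaves five commits not yet synced, while `demo_sync_every(1)` performs 25 fsyncs and leaves none. The unsynced commits are the ones that would be at risk if the machine crashed at that moment.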
If your binlogs are async, then you also risk having slaves out of sync if your master crashes.
Personally, if your binlogs are slowing you down, I would recommend putting them on faster storage. Multiple small, fast disks in RAID10 are going to be very fast, or you could invest in solid state disks - not all that expensive anymore, really. Maybe even just a RAM disk - you'll lose data when the machine crashes (and need an initscript for save/load of the data on that disk), but not if just the mysqld crashes.
Weigh the benefits of each option very, very carefully against the risk of losing data before you go through with this.
--
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it is soon an ass
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: mysql using aio/raw device on linux
on 17.03.2011 23:00:47 by Karen Abgarian
Hi,
For the actual question, I agree with the points Johan mentioned. MySQL, to my knowledge, does not have an option to use raw devices for binary logs. Even if it had one, it would not deliver the benefits Chao is seeking. There is indeed a tradeoff between losing transactions and performance. If the goal is performance, the raw device would be slower, since every write would have to actually complete instead of leaving the block in the OS cache. The best result is probably achieved with a battery-backed cache: the server could be configured not to lose transactions and still do the work fast.
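The cost of making every write actually complete can be felt with a rough timing sketch. The Python below is only an analogy: it uses fsync on an ordinary file rather than a raw device, and the function name `timed_writes` and the 512-byte record size are arbitrary choices for illustration. Absolute numbers depend entirely on the storage underneath.

```python
import os
import tempfile
import time


def timed_writes(path, count, sync_each):
    """Write `count` 512-byte records; optionally fsync after each one.

    Returns (elapsed_seconds, file_size_in_bytes).
    """
    record = b"x" * 512
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(count):
            os.write(fd, record)   # normally returns once the OS cache has the data
            if sync_each:
                os.fsync(fd)       # wait until the OS reports the data as flushed
    finally:
        elapsed = time.perf_counter() - start
        os.close(fd)
    return elapsed, os.path.getsize(path)


def compare(count=200):
    """Time cached writes against write-plus-fsync on a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "write.demo")
        cached = timed_writes(path, count, sync_each=False)
        synced = timed_writes(path, count, sync_each=True)
    return cached, synced
```

Calling `compare()` and printing the two elapsed times shows how much of the commit latency on a given machine comes from waiting for the flush rather than from the write call itself.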
As for tweaking sync_binlog, I find it difficult to use values other than 0 and 1. With 0, the server simply skips the fsyncs, and the number of transactions lost is at the mercy of the OS cache. With 1, all transactions will always be on disk before returning to the user. I cannot make sense of the documentation's remark that this setting would lose 'at most one transaction', and I assume it is a mistake.
With a value of 10, say, what I expect to happen is that the server will attempt to do an fsync every 10 statements. Say 10 transactions are in the binary log buffer, and the server does an fsync. What happens to the other transactions that keep coming? If they commit in memory and return, the statement that sync_binlog syncs every 10 transactions is false. If they wait, the wait would be as long as the wait for the disk write, and the result is that all transactions will be waiting for disk writes.
If somebody can shed more light on this, I would like to hear it.
Tx
Karen.
Re: mysql using aio/raw device on linux
on 18.03.2011 03:10:18 by zhuchao
Thanks guys;
The reason I was looking for raw/AIO is mostly non-blocking writes.
What I mean is:
Even though a single write is not faster on a raw device, if MySQL supported raw
devices and asynchronous I/O writes, it could keep submitting write requests to
disk without waiting for the previous write to complete.
In that case, commit (write) throughput could be improved greatly, without
blocking or keeping users waiting. In our current test, we are using a SAN with
a huge cache, and each single write only takes 0.3 ms (yes, very fast, close to
a RAM disk, I guess). But the sequential, blocking fsync call is the bottleneck,
and it can't be parallelized.
That's the reason I was looking for such an option.
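The ceiling implied by a strictly serial fsync can be worked out with simple arithmetic. The small Python helper below is a back-of-the-envelope sketch; the function name `max_commits_per_second` and the `commits_per_fsync` parameter are assumptions made for illustration, and the model ignores everything except fsync latency.

```python
def max_commits_per_second(fsync_latency_ms, commits_per_fsync=1):
    """Upper bound on commit rate when fsync calls run strictly one at a time.

    fsync_latency_ms: time one fsync takes, in milliseconds.
    commits_per_fsync: how many commits share a single fsync (1 = no batching).
    """
    fsyncs_per_second = 1000.0 / fsync_latency_ms
    return fsyncs_per_second * commits_per_fsync
```

At 0.3 ms per fsync this gives roughly 3,333 commits per second when every commit has its own fsync, and roughly 33,333 if ten commits share each fsync. That gap is why batching or overlapping the writes matters more than making a single write faster.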
I was an Oracle DBA before, and Oracle has this kind of capability (AIO writes),
so lgwr can achieve very high throughput (tens of thousands of commits per
second, and it does group commit).
Sample Trace in Unix/Oracle lgwr:
/1: semtimedop(105, 0xFFFFFFFF7FFFC914, 1, 0xFFFFFFFF7FFFC900) = 0
/1: kaio(AIOWRITE, 261, 0x390D3CE00, 8704, 0x0F5FB0007BB2B218) = 0
/1: kaio(AIOWRITE, 261, 0x390C80000, 253952, 0x0F5FD2007BB2B4A8) = 0
/1: kaio(AIOWRITE, 261, 0x390D60400, 211456, 0x0F63B2007BB2B738) = 0
/1: kaio(AIOWRITE, 261, 0x390E8EC00, 182272, 0x0F66EC007BB2B9C8) = 0
/1: kaio(AIOWRITE, 261, 0x390F10A00, 230912, 0x0F69B4007BB2BC58) = 0
/1: kaio(AIOWRITE, 261, 0x391024A00, 91648, 0x0F6D3A007BB2BEE8) = 0
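To make the group commit idea mentioned above concrete, here is a minimal Python sketch in which one fsync covers every record written before it started. The class `GroupCommitLog` and the helper `demo_group_commit` are hypothetical names for this example; this is not how lgwr or mysqld is implemented, and it uses blocking fsync with threads rather than kernel asynchronous I/O.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Toy group commit: a single fsync makes all earlier writes durable."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.lock = threading.Lock()       # protects the counters below
        self.sync_lock = threading.Lock()  # allows only one fsync at a time
        self.written = 0      # sequence number of the last record written
        self.durable = 0      # highest sequence number covered by an fsync
        self.fsync_calls = 0

    def commit(self, record):
        # Step 1: append the record and remember its sequence number.
        with self.lock:
            os.write(self.fd, record)
            self.written += 1
            my_seq = self.written
        # Step 2: wait for our turn to sync, unless someone already covered us.
        with self.sync_lock:
            with self.lock:
                if self.durable >= my_seq:
                    return            # another thread's fsync included our record
                target = self.written  # everything written so far rides along
            os.fsync(self.fd)
            with self.lock:
                self.durable = target
                self.fsync_calls += 1

    def close(self):
        os.close(self.fd)


def demo_group_commit(threads=8, commits_per_thread=50):
    """Return (written, durable, fsync_calls) after concurrent commits."""
    with tempfile.TemporaryDirectory() as tmp:
        log = GroupCommitLog(os.path.join(tmp, "log.demo"))

        def worker():
            for _ in range(commits_per_thread):
                log.commit(b"txn\n")

        pool = [threading.Thread(target=worker) for _ in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        log.close()
        return log.written, log.durable, log.fsync_calls
```

Every commit still returns only after its record has been covered by an fsync, so nothing acknowledged is left unsynced, yet the number of fsync calls can be lower than the number of commits whenever several threads are waiting at once. How much lower depends on timing and on how slow the fsync is.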
Thx
--
Regards
Zhu Chao
Re: mysql using aio/raw device on linux
on 18.03.2011 06:28:16 by Claudio Nanni - TomTom
Just my two cents.
That's why it is Oracle.
Oracle is (almost) an operating system,
with its advanced implementation of device/file system management,
up to logical volume management; just consider ASM, for example.
MySQL is much simpler.
Maybe Oracle gurus could now bring some key benefits to MySQL by removing some
historical bottlenecks.
Cheers
Claudio
--
Claudio
Re: mysql using aio/raw device on linux
on 19.03.2011 01:22:45 by Karen Abgarian
Hi...
If we are to compare the MySQL binlog with Oracle's redo log, there are some known differences. First of all, the (somewhat) equivalent structure in Oracle for MySQL's binary log is not the redo log that the log writer writes to. It is the archived log. Their functions also differ: the redo log is needed for crash recovery, while the archived log is needed for media recovery.
Second, in Oracle, transactions always have to wait for commits. With that as the starting point, the optimizations for writing the redo are put in place. The approach is therefore "we have to write, so let us optimize it". As can be seen, it does not make sense to optimize redo writing unless we are targeting zero transaction loss. There is a large difference between zero and "some" in this context.
So, if we ever wanted to optimize MySQL's writing to binary logs, it is unclear what we would do that for. It cannot be protection against server crashes, because that could be resolved by using InnoDB or any other transactional storage engine. It is not protection against media failures either, because losing the binary log to a media failure is equivalent to losing the archived log in Oracle, for the same reasons.
The only thing that is somewhat impacted is replication on host crashes. However, to be precise about this, we would notice that, just as transactions in the OS cache can be lost, updates to the MySQL tables can be lost at the same time. This means that on server crashes the primary server should be abandoned and the service switched to a replica. If this is done, it does not matter that some transactions are lost (just as it does not matter when the same thing is done in Oracle during primary media failures, provided it is not configured for zero data loss between primary and standby).
Tx
Karen.