iostat with raid device...
on 08.04.2011 21:55:39 by Linux Raid Study
Hello,
I have a raid device /dev/md0 based on 4 devices sd[abcd].
When I write 4GB to /dev/md0, I see the following output from iostat...
Question:
Shouldn't I see writes/sec be the same for all four drives? Why does
/dev/sdd always have a higher Blk_wrtn/s value?
My stripe size is 1MB.
Thanks for any pointers...
avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 0.34 0.03 0.00 99.61
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.08 247.77 338.73 37478883 51237136
sda1 1.08 247.77 338.73 37478195 51237136
sdb 1.08 247.73 338.78 37472990 51245712
sdb1 1.08 247.73 338.78 37472302 51245712
sdc 1.10 247.82 338.66 37486670 51226640
sdc1 1.10 247.82 338.66 37485982 51226640
sdd 1.09 118.46 467.97 17918510 70786576
sdd1 1.09 118.45 467.97 17917822 70786576
md0 65.60 443.79 1002.42 67129812 151629440
Re: iostat with raid device...
on 09.04.2011 00:05:01 by Roberto Spadim
Another question: why does md show more tps? Disk elevators? Sector size?
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: iostat with raid device...
on 09.04.2011 00:10:45 by Linux Raid Study
Thanks for pointing this out... I did observe this but forgot to
mention it in the email. Can someone give some insight into this?
Thanks.
Re: iostat with raid device...
on 09.04.2011 01:46:29 by NeilBrown
On Fri, 8 Apr 2011 12:55:39 -0700 Linux Raid Study wrote:
> Hello,
>
> I have a raid device /dev/md0 based on 4 devices sd[abcd].
Would this be raid0? raid1? raid5? raid6? raid10?
It could make a difference.
>
> When I write 4GB to /dev/md0, I see following output from iostat...
Are you writing directly to /dev/md0, or to a filesystem mounted
from /dev/md0? It might be easier to explain in the second case, but your
text suggests the first case.
>
> Ques:
> Shouldn't I see write/sec to be same for all four drives? Why does
> /dev/sdd always have higher value for BlksWrtn/sec?
> My strip size is 1MB.
>
> thanks for any pointers...
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.02 0.00 0.34 0.03 0.00 99.61
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 1.08 247.77 338.73 37478883 51237136
> sda1 1.08 247.77 338.73 37478195 51237136
> sdb 1.08 247.73 338.78 37472990 51245712
> sdb1 1.08 247.73 338.78 37472302 51245712
> sdc 1.10 247.82 338.66 37486670 51226640
> sdc1 1.10 247.82 338.66 37485982 51226640
> sdd 1.09 118.46 467.97 17918510 70786576
> sdd1 1.09 118.45 467.97 17917822 70786576
> md0 65.60 443.79 1002.42 67129812 151629440
Doing the sums, for every 2 blocks written to md0 we see 3 blocks written to
some underlying device. That doesn't make much sense for a 4-drive array.
If we assume that the extra writes to sdd were from some other source, then
it is closer to a 3:4 ratio, which suggests raid5.
So I'm guessing that the array is newly created and is recovering the data on
sdd1 at the same time as you are doing the IO test.
This would agree with the observation that sd[abc] see a lot more reads than
sdd.
I'll let you figure out the tps number.... do the math to find out the
average blk/t number for each device.
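Working the numbers from the cumulative columns above (a rough check; iostat
"blocks" here are 512-byte sectors, and the member totals are assumed to
include the suspected recovery traffic as well as the test workload):

  member writes: 51237136 + 51245712 + 51226640 + 70786576 = 224496064 blocks
  md0 writes:    151629440 blocks -> members/md0 ~ 1.48, i.e. roughly 3:2
  minus sdd's ~19.5M extra blocks: ~204946624/151629440 ~ 1.35, close to 4:3
  avg request size, md0: (443.79 + 1002.42)/65.60 ~ 22 blocks  ~ 11 KB
  avg request size, sda: (247.77 + 338.73)/1.08  ~ 543 blocks ~ 272 KB

The last two lines also bear on the earlier tps question: md0 sees many small
requests, while the member devices see far fewer, much larger ones, which is
consistent with merging below the md layer.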
NeilBrown
Re: iostat with raid device...
on 09.04.2011 02:40:46 by Linux Raid Study
Hi Neil,
This is raid5. I have mounted /dev/md0 on /mnt and the filesystem is ext4.
The system is newly created. Steps:
mdadm for raid5
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/raid
Export /mnt/raid to remote PC using CIFS
Copy file from PC to the mounted drive
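Spelled out, that sequence would look something like the following (the member
device names and the 1MB chunk are assumptions carried over from earlier in
the thread; the CIFS export itself is omitted):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 /dev/sd[abcd]1
  mkfs.ext4 /dev/md0
  mkdir -p /mnt/raid
  mount /dev/md0 /mnt/raid
  # export /mnt/raid via Samba/CIFS, then copy the 4GB test file from the PC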
An update...
I just ran the test again (without reformatting the device) and
noticed that all 4 HDDs incremented their written-blocks counters equally.
This implies that when the RAID was configured the first time, raid5 was
doing its own work (recovery) in the background...
What I'm not sure of is: if the device is newly formatted, would RAID
recovery still happen? What else could explain the difference in the first
run of the IO benchmark?
Thanks.
Re: iostat with raid device...
on 09.04.2011 10:50:44 by Robin Hill
On Fri Apr 08, 2011 at 05:40:46PM -0700, Linux Raid Study wrote:
> What I'm not sure of is if the device is newly formatted, would raid
> recovery happen? What else could explain difference in the first run
> of IO benchmark?
>
When an array is first created, it's created in a degraded state - this
is the simplest way to make it available to the user instantly. The
final drive(s) are then automatically rebuilt, calculating the
parity/data information as normal for recovering a drive.
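Two quick ways to check whether that initial rebuild is still running
(standard md interfaces; output omitted here):

  cat /proc/mdstat          # shows a recovery/resync progress line while rebuilding
  mdadm --detail /dev/md0   # reports the array state and rebuild progress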
Cheers,
Robin
--
    ___
  ( ' }   |       Robin Hill                |
 / / )    | Little Jim says ....            |
// !!     |    "He fallen in de water !!"   |
Re: iostat with raid device...
on 11.04.2011 10:32:34 by Linux Raid Study
Hi Robin,
Thanks. So, the uneven (unequal) distribution of writes/sec numbers in
the iostat output is OK... is that correct?
Thanks.
Re: iostat with raid device...
on 11.04.2011 11:25:59 by Robin Hill
On Mon Apr 11, 2011 at 01:32:34 -0700, Linux Raid Study wrote:
> Thanks. So, the uneven (unequal) distribution of writes/sec numbers in
> the iostat output is OK... is that correct?
>
If it hadn't completed the initial recovery, yes. If it _had_ completed
the initial recovery then I'd expect writes to be balanced (barring
any differences in hardware).
Cheers,
Robin
--
    ___
  ( ' }   |       Robin Hill                |
 / / )    | Little Jim says ....            |
// !!     |    "He fallen in de water !!"   |
Re: iostat with raid device...
on 11.04.2011 11:36:50 by Linux Raid Study
The initial recovery should normally be done within the first few minutes
... this is a newly formatted disk, so there isn't any user data
there. So, if I run the IO benchmark after, say, 3-4 minutes, I
should be ok?
mdadm --create /dev/md0 --raid5....
mount /dev/md0 /mnt/raid
mkfs.ext4 /mnt/raid
...wait 3-4 min
run IO benchmark...
Am I correct?
Thanks.
Re: iostat with raid device...
on 11.04.2011 11:53:55 by Robin Hill
On Mon Apr 11, 2011 at 02:36:50AM -0700, Linux Raid Study wrote:
> The initial recovery should normally be done within the first few minutes
> ... this is a newly formatted disk, so there isn't any user data
> there. So, if I run the IO benchmark after, say, 3-4 minutes, I
> should be ok?
>
> mdadm --create /dev/md0 --raid5....
> mount /dev/md0 /mnt/raid
> mkfs.ext4 /mnt/raid
>
> ...wait 3-4 min
>
> run IO benchmark...
>
> Am I correct?
>
No, depending on the size of the drives, the initial recovery can take
hours or even days. For RAID5 with N drives, it needs to read the
entirety of (N-1) drives and write the entirety of the remaining drive
(whether there's any data there or not, the initial state of the drives is
unknown, so parity has to be calculated for the entire array).
Check /proc/mdstat and wait until the array has completed resync before
running any benchmarks.
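A minimal way to make a benchmark script wait for that, simply polling
/proc/mdstat as suggested above:

  while grep -Eq 'resync|recovery' /proc/mdstat; do
      sleep 60
  done
  # now run the IO benchmark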
Cheers,
Robin
--
    ___
  ( ' }   |       Robin Hill                |
 / / )    | Little Jim says ....            |
// !!     |    "He fallen in de water !!"   |
Re: iostat with raid device...
on 11.04.2011 12:18:08 by NeilBrown
On Mon, 11 Apr 2011 10:53:55 +0100 Robin Hill wrote:
> No, depending on the size of the drives, the initial recovery can take
> hours or even days. For RAID5 with N drives, it needs to read the
> entirety of (N-1) drives and write the entirety of the remaining drive
> (whether there's any data there or not, the initial state of the drives is
> unknown, so parity has to be calculated for the entire array).
>
> Check /proc/mdstat and wait until the array has completed resync before
> running any benchmarks.
Or run:
  mdadm --wait /dev/md0
or create the array with --assume-clean. But if the array is raid5, don't
trust the data if a device fails: use this only for testing.
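For the second option, the create command would look something like this
(member names and chunk size are again assumptions from earlier in the
thread):

  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 \
        --assume-clean /dev/sd[abcd]1

With --assume-clean the initial recovery is skipped entirely, so parity on
stripes that have never been written may be wrong - hence the warning above
about not trusting the data after a device failure.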
NeilBrown
Re: iostat with raid device...
on 12.04.2011 03:57:34 by Linux Raid Study
If I use --assume-clean in mdadm, I see performance is 10-15% lower
compared to the case where this option is not specified. When I run
without --assume-clean, I wait until mdadm reports "recovery_done" and
then run the IO benchmarks...
Is the perf drop expected?
Thanks.
Re: iostat with raid device...
on 12.04.2011 04:51:41 by NeilBrown
On Mon, 11 Apr 2011 18:57:34 -0700 Linux Raid Study wrote:
> If I use --assume-clean in mdadm, I see performance is 10-15% lower as
> compared to the case wherein this option is not specified. When I run
> without --assume_clean, I wait until mdadm prints "recovery_done" and
> then run IO benchmarks...
>
> Is perf drop expected?
No. And I cannot explain it.... unless the array is so tiny that it all fits
in the stripe cache (typically about 1Meg).
There really should be no difference.
NeilBrown
Re: iostat with raid device...
on 12.04.2011 21:36:35 by Linux Raid Study
Hello Neil,
For benchmarking purposes, I've configured an array of ~30GB.
stripe_cache_size is 1024 (so 1M).
BTW, I'm using the Windows copy (robocopy) utility to test perf, and I
believe the block size it uses is 32kB. But since everything gets written
through VFS, I'm not sure how to change stripe_cache_size to get optimal
performance with this setup...
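For scale, a rough calculation (assuming stripe_cache_size counts stripe
heads of one 4KB page per member device, which is how md sizes the cache):

  1024 entries x 4 KB x 4 devices ~ 16 MB of stripe cache

so a ~30GB array is nowhere near small enough to fit in the cache in the
sense mentioned above.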
Thanks.
Re: iostat with raid device...
on 13.04.2011 20:21:52 by Linux Raid Study
Let me reword my previous email...
I tried changing stripe_cache_size as follows, trying values
between 16 and 4096:
echo 512 > /sys/block/md0/md/stripe_cache_size
But I'm not seeing much difference in performance. I'm running on a
2.6.27sh kernel.
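One way to take the CIFS path out of the measurement and sweep the setting
locally (a sketch; dd is only a stand-in for the robocopy workload used in
the thread):

  for n in 256 512 1024 2048 4096; do
      echo $n > /sys/block/md0/md/stripe_cache_size
      sync; echo 3 > /proc/sys/vm/drop_caches    # cold page cache for each run
      dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=4096 conv=fdatasync 2>&1 | tail -1
      rm -f /mnt/raid/testfile
  done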
Any ideas...
Thanks for your help...
Re: iostat with raid device...
on 13.04.2011 23:00:11 by NeilBrown
On Wed, 13 Apr 2011 11:21:52 -0700 Linux Raid Study wrote:
> Let me reword my previous email...
>
> I tried changing stripe_cache_size as follows, trying values
> between 16 and 4096:
> echo 512 > /sys/block/md0/md/stripe_cache_size
>
> But I'm not seeing much difference in performance. I'm running on a
> 2.6.27sh kernel.
I wouldn't expect much difference.
>
> Any ideas...
On what exactly?
What exactly are you doing, what exactly are the results? What exactly don't
you understand?
Details help.
NeilBrown