Optimize RAID0 for max IOPS?

am 18.01.2011 22:01:12 von Wolfgang Denk

Hi,

I'm going to replace a h/w based RAID system (3ware 9650SE) with a plain
s/w RAID0, because the existing system appears to be seriously limited
in terms of the number of I/O operations per second.

Our workload is mixed read / write (something between 80% read / 20%
write and 50% / 50%), consisting of a very large number of usually
very small files.

There may be 20...50 million files, or more. 65% of the files are
smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
kB; 98.4% are smaller than 64 kB.

I will have 4 x 1 TB disks for this setup.

The plan is to build a RAID0 from the 4 devices, create a physical
volume and a volume group on the resulting /dev/md?, then create 2 or
3 logical volumes that will be used as XFS file systems.
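
Roughly, I plan something along these lines (device names, chunk size and
LV sizes below are only placeholders, not final values):

  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=64 \
        /dev/sda /dev/sdb /dev/sdc /dev/sdd
  pvcreate /dev/md0
  vgcreate vg_data /dev/md0
  lvcreate -L 1T -n lv_fs1 vg_data     # and similar for the other LVs
  mkfs.xfs /dev/vg_data/lv_fs1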

My goal is to optimize for maximum number of I/O operations per
second. [I am aware that using SSDs would be a nice thing, but that
would be too expensive.]

Is this a reasonable approach for such a task?

Should I do anything different to achieve maximum performance?

What are the tunables in this setup? [It seems the usual recipes are
more oriented toward maximizing data throughput for large, mostly
sequential accesses - I figure that things like increasing read-ahead
etc. will not help me much here?]

Thanks in advance.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quote from a recent meeting: "We are going to continue having these
meetings everyday until I find out why no work is getting done."

Re: Optimize RAID0 for max IOPS?

am 18.01.2011 23:18:50 von Roberto Spadim

it's an interesting question, i don't know what's best either,
but...
i haven't created a partition on a /dev/mdXXX device yet (linux 2.6.29),
maybe it's not possible

try partitioning all hard drives, making many partitions, and making a raid
on each one
another way could be lvm over mdXXX and trying to partition that (can lvm
be partitioned?)

another optimization is the per-disk elevator (at the linux level); you can
find it under /sys/ (try find -iname elevator, or find -iname scheduler, i
don't remember the file name)
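
for example (sdb is just an example device; the sysfs path is the standard
per-disk location):

  cat /sys/block/sdb/queue/scheduler        # shows e.g.: noop [cfq] deadline
  echo deadline > /sys/block/sdb/queue/scheduler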

linux raid0 has a nice read/write algorithm for hard disks (i think),
test it
the best solution is no partitions (since md will be made on the whole
disk and not on a partition, this makes the disk head position more
realistic than with partitions, making the read_balance algorithm better)


--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 00:15:50 von unknown

Hi,

[in German:] Schätzelein, Dein Problem sind die Platten, nicht der
Controller.

[in English:] Dude, the disks are your bottleneck.

On a 4-disk RAID0, software RAID can only outspeed this 3ware controller
with a really, really fast processor. The limiting factor is the disks'
access time. If SSDs are too expensive, then your current performance is
about the max you'll get (replacing the HWRAID controller might give a
little speed-up, but not very much).

All the best,
Stefan



Re: Optimize RAID0 for max IOPS?

am 19.01.2011 01:05:59 von Roberto Spadim

maybe removing hwraid and using swraid may reduce speed (depends on how
much cpu you use with hw and with sw)
what can we optimize? fewer I/Os per second by making each read/write to
the array as useful as possible. how? good read/write algorithms for raid
(for each device type: ssd, hd). but... like stefan said, disks are your
bottleneck


--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 08:04:32 von Wolfgang Denk

Dear Roberto Spadim,

In message you wrote:
>
> try partitioning all hard drives, making many partitions, and making a raid
> on each one
> another way could be lvm over mdXXX and trying to partition that (can lvm
> be partitioned?)

I do not intend to use any partitions. I will use LVM on the full
device /dev/mdX, and then use logical volumes.


Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
What was sliced bread the greatest thing since?

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 08:10:06 von Wolfgang Denk

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.

Irrtum.

> [in English:] Dude, the disks are your bottleneck.

Wrong. Testing the same workload with soft RAID versus the h/w RAID
solution gives a _significant_ performance difference.

I happen to know which benchmarks 3ware (and other RAID controller
manufacturers) are optimizing their firmware for - IOPS is not even
mentioned there.

> On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller
> with a really really fast processor. The limiting factor is the disk's
> access time. If SSDs are too expensive, then your actual performance is
> the max you'll get (maybe to replace the HWRAID controller might give a
> little speed-up, but not very much).

From some tests done before I expect to see a speed increase of >10.

Hey, even a single disk is performing better under this workload.

And a fast processor? Yes, I have it, but what for? It spends most of
its time (>90%, usually more) in iowait.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"To IBM, 'open' means there is a modicum of interoperability among
some of their equipment." - Harv Masterson

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 08:11:38 von Wolfgang Denk

Dear Roberto Spadim,

In message you wrote:
> maybe removing hwraid and using swraid may reduce speed (depends on how
> much cpu you use with hw and with sw)
> what can we optimize? fewer I/Os per second by making each read/write to
> the array as useful as possible. how? good read/write algorithms for raid
> (for each device type: ssd, hd). but... like stefan said, disks are your
> bottleneck

No, they are not. Run some benchmarks yourself if you don't believe
me.

Even a single disk drive is performing better than the hw RAID under
this workload.
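
For instance, something along these lines approximates our small-file
random I/O mix - the fio parameters here are purely illustrative:

  fio --name=smallfiles --directory=/mnt/test --rw=randrw --rwmixread=70 \
      --bs=4k --size=2G --numjobs=4 --direct=1 --runtime=60 --time_based \
      --group_reporting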

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
WARNING: This Product Attracts Every Other Piece of Matter in the
Universe, Including the Products of Other Manufacturers, with a Force
Proportional to the Product of the Masses and Inversely Proportional
to the Distance Between Them.

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 09:18:53 von unknown

Am 19.01.2011 08:11, schrieb Wolfgang Denk:
> Dear Roberto Spadim,
>
> In message you wrote:
>> maybe removing hwraid and using swraid may reduce speed (depend how
>> much cpu you use with hw and with sw)
>> what we can optimize? less I/O per seconds making as much useful
>> read/write data on array, how? good read/write algorithms for raid.
>> (for each device type, ssd, hd) but... like stefan, disks are your
>> bottleneck
>
> No, they are not. Run some benchmarks yourself if you don't believe
> me.

Lol - I wouldn't have answered in the first place if I didn't have any
expertise. So suit yourself - as you don't bring up any real numbers
(remember: you've got the weird setup, you asked, you don't have enough
money for the enterprise solution - so ...) nobody who has worked with 3ware
controllers will believe you.

>
> Even a single disk drive is performing better than the hw RAID under
> this workload.

Well - that is the problem - simulate YOUR workload. Actually I fear at
least one of your disks has a grown defect, which slows down / blocks
i/o. I haven't seen any 9650SE RAID being slower than the same config in
a software raid.



Re: Optimize RAID0 for max IOPS?

am 19.01.2011 09:29:45 von Jaap Crezee

On 01/19/11 09:18, Stefan /*St0fF*/ Hübner wrote:
> Am 19.01.2011 08:11, schrieb Wolfgang Denk:
> Lol - I wouldn't have answered in the first place if I didn't have any
> expertise. So suit yourself - as you don't bring up any real numbers
> (remember: you've got the weird setup, you asked, you don't have enough
> money for the enterprise solution - so ...) nobody who has worked with 3ware
> controllers will believe you.

Here's one: I switched from 3ware hardware based raid to linux software
raid and I am getting better throughputs. I had a 3ware PCI-X card (I don't
know which type by heart).
Okay, to be honest I did not have a (enterprise solution?)
battery-backup-unit. So probably no write caching...
Jaap

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 10:32:17 von Jan Kasprzak

Jaap Crezee wrote:
: Here's one: I switched from 3ware hardware based raid to linux software
: raid and I am getting better throughputs. I had a 3ware PCI-X card (I don't
: know which type by heart).
: Okay, to be honest I did not have a (enterprise solution?)
: battery-backup-unit. So probably no write caching...
:
A "me too": 3ware 9550SX with 8 drives, RAID-5. The performance
(especially latency) was very bad. After I switched to the md SW RAID
and lowered the TCQ depth in the 3ware controller to 16[*], the filesys=
tem
and latency feels much faster.

The only problem I had was a poor interaction of the CFQ
iosched with the RAID-5 rebuild process, but I have fixed this
by moving to deadline I/O scheduler.

	Another case was the LSI SAS 2008 (I admit it is a pretty low-end
HW RAID controller): 10 WD RE4 black 2TB disks in HW and SW RAID-10
configurations:

time mkfs.ext4 /dev/md0 # SW RAID
real 8m4.783s
user 0m9.255s
sys 2m30.107s

time mkfs.ext4 -F /dev/sdb # HW RAID
real 22m13.503s
user 0m9.763s
sys 2m51.371s

The problem with HW RAID is that today's computers can dedicate tens
of gigabytes to buffer cache, which allows the I/O scheduler to reorder
requests based on latency and other criteria, which no RAID controller
can match, because it cannot see which requests are latency-critical
and which are not.

	Also, the Linux I/O scheduler works really hard to keep all spindles
busy, while when you fill the TCQ of a HW RAID volume with requests
which map to one or a small number of physical disks, there is no way the
controller can tell "send me more requests, but not from this area
of the HW RAID volume".

[*] The 3ware driver is especially bad here, because its default queue
	depth is 1024, IIRC, which makes the whole I/O scheduler with its
	queue size of 512 a no-op. Think bufferbloat in the storage area.
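
For reference, both queue sizes can be inspected (and, for SCSI devices,
adjusted) through sysfs at runtime; sda below is only an example device:

  cat /sys/block/sda/queue/nr_requests        # I/O scheduler queue size
  cat /sys/block/sda/device/queue_depth       # queue depth the driver advertises
  echo 16 > /sys/block/sda/device/queue_depth # one way to cap the device queue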

--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 20:21:04 von Wolfgang Denk

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.
>
> [in English:] Dude, the disks are your bottleneck.
...

Maybe we can stop speculations about what might be the cause of the
problems in some setup I do NOT intend to use, and rather discuss the
questions I asked.

> > I will have 4 x 1 TB disks for this setup.
> >
> > The plan is to build a RAID0 from the 4 devices, create a physical
> > volume and a volume group on the resulting /dev/md?, then create 2 or
> > 3 logical volumes that will be used as XFS file systems.

Clarification: I'll run /dev/md* on the raw disks, without any
partitions on them.

> > My goal is to optimize for maximum number of I/O operations per
> > second. ...
> >
> > Is this a reasonable approach for such a task?
> >
> > Should I do anything different to achieve maximum performance?
> >
> > What are the tunables in this setup? [It seems the usual recipes are
> > more oriented toward maximizing data throughput for large, mostly
> > sequential accesses - I figure that things like increasing read-ahead
> > etc. will not help me much here?]

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system /
read ahead / buffer cache / ... parameters to look for?

Thanks.

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Boykottiert Microsoft - Kauft Eure Fenster bei OBI!

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 20:50:17 von Roberto Spadim

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system / read
ahead / buffer cache / ... parameters to look for?

let's see:
what's your disk's (ssd or sas or sata) best block size for read/write?
write this down as (A)
what's your workload? 50% write 50% read?

raid0 chunk size should be a multiple of (A)
*****filesystem block size should be a multiple of (A) for all disks
*****read-ahead should be a multiple of (A)
for example
/dev/sda 1kb
/dev/sdb 4kb

you shouldn't use 6kb... you should use 4kb, 8kb or 16kb (multiples of
both 1kb and 4kb)

check the i/o scheduler per disk too (ssd should use noop, disks should
use cfq, deadline or another...)
check the async and sync options for mounts in /etc/fstab; noatime reduces
a lot of i/o too, and you should optimize your application as well
hdparm each disk to use dma and the fastest i/o options
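
a rough sketch of the knobs above (device names, mount point and numbers
are only examples, not recommendations):

  blockdev --getra /dev/md0        # current read-ahead, in 512-byte sectors
  blockdev --setra 64 /dev/md0     # e.g. 32kB read-ahead for a small-file load
  echo deadline > /sys/block/sda/queue/scheduler   # repeat for each member disk
  hdparm -d1 /dev/sda              # DMA on (mostly relevant for older IDE disks)
  mount -o noatime /dev/vg_data/lv_fs1 /srv/data   # or put noatime in /etc/fstab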

are you using only a filesystem? are you using something more? samba?
mysql? apache? lvm?
each of these programs has some tuning, check their benchmarks


getting back....
what's a raid controller?
cpu + memory + disk controller + disks
but... it only runs raid software (it could run linux....)

if your computer is slower than the raid controller's cpu+memory+disk
controller, you will have a slower software raid than hardware raid
it's like load balancing the cpu/memory cost of disk i/o (use dedicated
hardware, or use your own hardware?)
got it?
a super fast xeon with ddr3 and optical fiber running software raid is
faster than a hardware raid using an arm (or fpga), ddrX memory
and a sas (fiber optical) connection to the disks

two solutions for the same problem
which is faster? benchmark it
i think that if your xeon runs a database and a heavily loaded apache,
a dedicated hardware raid can be faster, but a lightly loaded xeon can run
faster than a dedicated hardware raid




--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Optimize RAID0 for max IOPS?

am 19.01.2011 23:36:45 von unknown

@Roberto: I guess you're right. BUT: I have not seen 900MB/s coming from
(i.e. read access) a software raid, but I've seen it from a 9750 on an
LSI SASx28 backplane, running RAID6 over 16 disks (HDS722020ALA330). So
one might not be wrong assuming that on current raid controllers the
hardware/software matching and timing is way more optimized than what
mdraid might achieve at all.

The 9650 and 9690 are considerably slower, but I've seen 550MB/s throughput
from those, too (I don't recall the setup anymore, though).

The max reading I saw from a software raid was around 350MB/s - hence my
answers. And if people had problems with controllers which are 5 years
or older by now, the numbers are not really comparable...

Now again, there's the point that there are also parameters on the
controller that can be tweaked, and we'd need a simple way to recreate the
testing scenario. We may discuss and throw in further numbers and experience,
but not being able to recreate your specific scenario makes us talk past
each other...

stefan



Re: Optimize RAID0 for max IOPS?

am 20.01.2011 00:09:39 von Roberto Spadim

the problem....
if you use iostat or iotop
with software raid:
   you just see disk i/o
   you don't see memory (cache) i/o
when using hardware raid:
   you just see raid i/o (it can be a cache read or a real disk read)


if you check memory+disk i/o, you will get similar values; if not, you
will see high cpu usage
for example, you are using raidX with 10 disks on a hardware raid
change the hardware raid to export only disks (10 disks for linux)
make the same raidX with 10 disks
you will get slower i/o since there is a controller between disk and cpu
try it without the hardware raid cpu, just a (sas/sata) optimized
controller, or 10 one-port (sata/sas) controllers
you are still slower than the hardware controller (that's right!)

now let's remove the sata/sas channel, let's use a pci-express
revodrive or a pci-express texas ssd drive
you will get better values than a hardware raid, but... why? you
changed the hardware (ok, i know) but you moved the cpu closer to the disk
if you use disks with cache, you will get more speed (an ssd/memory-cached
harddisk is faster than a plain harddisk)

why is hardware faster than linux? i don't think it is...
it can achieve smaller latencies with a good memory cache
but if your computer uses ddr3 and your hardware raid controller uses i2c
memory, your ddr3 cache is faster...

how to benchmark? check disk i/o + memory cache i/o
if linux is faster, ok, you use more cpu and memory of your computer
if linux is slower, ok, you use less cpu and memory, but you will have it
on the hardware raid...
if you upgrade your memory and cpu, it can be faster than your hardware
raid controller; what's better for you?

want a better read/write solution for software raid? write new
read/write code, you can do it, linux is easier to code for than a
hardware raid!
want a better read/write solution for hardware raid? call your
hardware vendor and ask: please, I need a better firmware, could you
send me one?

got it?



--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Optimize RAID0 for max IOPS?

am 20.01.2011 00:18:22 von Roberto Spadim

a good idea....
why not start an opensource raid controller?
what do we need? a cpu, memory, a power supply with battery or capacitor,
sas/sata (disk interfaces), pci-express or another (computer) interface
it doesn't need an operating system, since it will only run one program
with some threads (ok, a small operating system to implement threads
easily)

we could use arm, fpga, intel core2duo, athlon, xeon, or another system...
instead of using a computer with an ethernet interface (nbd, nfs, samba or
another file/device sharing protocol, iscsi, ethernet sata), we need a
computer with a pci-express interface and a native operating system module


--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Optimize RAID0 for max IOPS?

am 20.01.2011 03:48:07 von Keld Simonsen

On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote:
> a good idea....
> why not start a opensource raid controller?
> what we need? a cpu, memory, power supply with battery or capacitor,
> sas/sata (disk interfaces), pci-express or another (computer
> interface)

Why? because of some differences in memory speed?

Normally software raid is faster than hardware raid, as witnessed by
many here on the list. The mention of max 350 MB/s on a SW raid
is not true; 350 MB/s is what I get out of a simple box with 4 slightly
oldish SATA drives. 16 new fast SATA drives in SW raid6 should easily go
beyond 1000 MB/s, given that there are no other bottlenecks in the system.

Linux SW raid gets fairly close to the theoretical maxima, given adequate
HW.


best regards
keld

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 20.01.2011 04:53:09 von Roberto Spadim

=) I know, but since we have so many proprietary firmwares, an opensource
firmware (like OpenBIOS) could be very nice :D hehehehe

I will use Linux RAID (I'm sure it's very good); it's really fast, and
works with hotswap too
(ok, there are some userspace programs to make it work even with
kernel hotswap problems, but when the kernel can release and replug a
device without problems we don't need userspace programs... userspace
checks each newly hotplugged volume, and if its uuid matches some raid
uuid it puts the device into the right raid array (I did it with a php
script =) hehehe) )
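(These days that job is usually left to udev calling mdadm in incremental
mode; a rough sketch of such a rule, file name and path purely
illustrative, in /etc/udev/rules.d/64-md-raid.rules:

SUBSYSTEM=="block", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

mdadm then reads the superblock UUID and attaches the new member to the
matching md array.)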


2011/1/20 Keld Jørn Simonsen :
> On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote:
>> a good idea....
>> why not start a opensource raid controller?
>> what we need? a cpu, memory, power supply with battery or capacitor,
>> sas/sata (disk interfaces), pci-express or another (computer
>> interface)
>
> Why? because of some differences in memory speed?
>
> Normally software raid is faster than hardware raid, as witnessed by
> many here on the list. The mentioning of max 350 MB/s on a SW raid
> is not true, 350 MB/s is what I get out of a simple box with 4 slightly
> oldish SATA drives. 16 new fast SATA drives in SW raid6 should easily
> go beyond 1000 MB/s, given that there are no other bottlenecks in the system.
>
> Linux SW raid goes fairly close to theoretical maxima, given adequate
> HW.
>
>
> best regards
> keld
>
>> it don?t need a operational system, since it will only run one progr=
am
>> with some threads (ok a small operational system to implement thread=
s
>> easly)
>>
>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another sys=
tem...
>> instead using a computer with ethernet interface (nbd nfs samba or
>> another file/device sharing iscsi ethernet sata), we need a computer
>> with pci-express interface and native operational system module
>>
>>
>> 2011/1/19 Roberto Spadim :
>> > the problem....
>> > if you use iostat, or iotop
>> > with software raid:
>> > =A0 you just see disk i/o
>> > =A0 you don?t see memory (cache) i/o
>> > when using hardware raid:
>> > =A0 you just see raid i/o (it can be a cache read or a real disk r=
ead)
>> >
>> >
>> > if you check memory+disk i/o, you will get similar values, if not,=
you
>> > will see high cpu usage
>> > for example you are using raidx with 10disks on a hardware raid
>> > change hardware raid to use only disks (10 disks for linux)
>> > make the same raidx with 10disks
>> > you will get a slower i/o since it have a controler between disk a=
nd cpu
>> > try it without hardware raid cpu, just a (sas/sata) optimized
>> > controller, or 10 (sata/sas) one port
>> > you still with a slow i/o then hardware controller (that?s right!)
>> >
>> > now let?s remove the sata/sas channel, let?s use a pci-express
>> > revodrive or pci-express texas ssd drive
>> > you will get better values then a hardware raid, but... why? you
>> > changed the hardware (ok, i know) but you make cpu more close to d=
isk
>> > if you use disks with cache, you will get more speed (a memory ssd
>> > harddisk is faster than a harddisk only disk)
>> >
>> > why hardware are more faster than linux? i don?t think they are...
>> > they can make smaller latencies with good memory cache
>> > but if you computer use ddr3 and your hardware raid controller use=
i2c
>> > memory, your ddr3 cache is faster...
>> >
>> > how to benchmark? check disk i/o+memory cache i/o
>> > if linux is faster ok, you use more cpu and memory of your compute=
r
>> > if linux is slower ok, you use less cpu and memory, but will have =
it
>> > on hardware raid...
>> > if you upgrade you memory and cpu, it can be faster than you hardw=
are
>> > raid controller, what?s better for you?
>> >
>> > want a better read/write solution for software raid? make a new
>> > read/write code, you can do it, linux is easier than hardware raid=
to
>> > code!
>> > want a better read/write solution for hardware raid? call your
>> > hardware seller and talk, please i need a better firmware, could y=
ou
>> > send me?
>> >
>> > got?
>> >
>> >
>> > 2011/1/19 Stefan /*St0fF*/ Hübner u.de>:
>> >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s comi=
ng from
>> >> (i.e. read access) a software raid, but I've seen it from a 9750 =
on a
>> >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330=
). =A0So
>> >> one might not be wrong assuming on current raid-controllers
>> >> hardware/software matching and timing is way more optimized than =
what
>> >> mdraid might get at all.
>> >>
>> >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s =
thruput
>> >> from those, also (I don't recall the setup anymore, tho).
>> >>
>> >> Max reading I saw from a software raid was around 350MB/s - so he=
nce my
>> >> answers. =A0And if people had problems with controllers which are=
5 years
>> >> or older by now, the numbers are not really comparable...
>> >>
>> >> Now again there's the point where there are also parameters on th=
e
>> >> controller that can be tweaked, and a simple way to recreate the =
testing
>> >> scenario. =A0We may discuss and throw in further numbers and expe=
rience,
>> >> but not being able to recreate your specific scenario makes us ta=
lk past
>> >> each other...
>> >>
>> >> stefan
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-r=
aid" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at =A0http://vger.kernel.org/majordomo-info.h=
tml
>> >>
>> >
>> >
>> >
>> > --
>> > Roberto Spadim
>> > Spadim Technology / SPAEmpresarial
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid=
" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"=
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>



--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 21.01.2011 20:34:57 von Wolfgang Denk

Dear Roberto,

In message m> you wrote:
> a good idea....
> why not start a opensource raid controller?
> what we need? a cpu, memory, power supply with battery or capacitor,
> sas/sata (disk interfaces), pci-express or another (computer
> interface)
> it don't need a operational system, since it will only run one program
> with some threads (ok a small operational system to implement threads
> easly)
>
> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...

You could even use a processor dedicated for such a job, like a
PPC440SPe or PPC460SX or similar, which provide hardware-offload
capabilities for the RAID calculations. These are even supported by
drivers in mainline Linux.

But again, these would not help to maximize IOPS - the optimization
goal there has always been maximum sequential throughput only
(and yes, I know exactly what I'm talking about; guess where the
aforementioned drivers are coming from).

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I don't see any direct evidence ... but, then, my crystal ball is in
dire need of an ectoplasmic upgrade. :-) -- Howard Smith
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 21.01.2011 21:03:52 von Roberto Spadim

=) I know
but everybody says software is slower, and the solution: use hardware.
ok
there's no opensource firmware for raid hardware

I prefer a good software/hardware solution; linux raid is a good
software solution for me =)
but why not try an opensource project? hehe
what we could do.... a virtual machine :P with only raid and nfs, or
make a dedicated cpu for raid (cpu affinity) and a portion of memory
only for raid cache (today I think the raid software doesn't have a cache, and it
shouldn't; caching is done by linux at the filesystem level, am I right?)


2011/1/21 Wolfgang Denk :
> Dear Roberto,
>
> In message com> you wrote:
>> a good idea....
>> why not start a opensource raid controller?
>> what we need? a cpu, memory, power supply with battery or capacitor,
>> sas/sata (disk interfaces), pci-express or another (computer
>> interface)
>> it don=B4t need a operational system, since it will only run one pro=
gram
>> with some threads (ok a small operational system to implement thread=
s
>> easly)
>>
>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another sys=
tem...
>
> You could evenuse a processor dedicated for such a job, like a
> PPC440SPe or PPC460SX or similar, which provide hardware-offload
> capabilities for the RAID calculations. =A0These are even supported b=
y
> drivers in mainline Linux.
>
> But again, thee would not helpo to maximize IOPS - goal for
> optimization has always been maximum sequential troughput only
> (and yes, I know exactly what I'm talking about; guess where the
> aforementioned drivers are coming from).
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, =A0 =A0 MD: Wolfgang Denk & Detlev Zu=
ndel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> I don't see any direct evidence ... =A0but, then, my crystal ball is =
in
> dire need of an ectoplasmic upgrade. :-) =A0 =A0 =A0 =A0 =A0 =A0 =A0-=
- Howard Smith
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"=
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>



--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 21.01.2011 21:04:22 von Roberto Spadim

Thanks, I never used a PPC440SPe; I will buy one as a hobby =)

2011/1/21 Roberto Spadim :
> =3D) i know
> but, every body tell software is slower, the solution - use hardware
> ok
> there=B4s no opensource firmware for raid hardware
>
> i preffer a good software/hardware solution, linux raid is a good
> software solution for me =3D)
> but, why not try a opensource project? hehe
> what we could do.... a virtual machine :P with only raid and nfs, or
> make a dedicated cpu for raid (cpu affinity) and a portion of memory
> only for raid cache (today i think raid software don=B4t have cache, =
it
> shoudn=B4t, cache is done by linux at filesystem level, i=B4m right?)
>
>
> 2011/1/21 Wolfgang Denk :
>> Dear Roberto,
>>
>> In message com> you wrote:
>>> a good idea....
>>> why not start a opensource raid controller?
>>> what we need? a cpu, memory, power supply with battery or capacitor=
,
>>> sas/sata (disk interfaces), pci-express or another (computer
>>> interface)
>>> it don=B4t need a operational system, since it will only run one pr=
ogram
>>> with some threads (ok a small operational system to implement threa=
ds
>>> easly)
>>>
>>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another sy=
stem...
>>
>> You could evenuse a processor dedicated for such a job, like a
>> PPC440SPe or PPC460SX or similar, which provide hardware-offload
>> capabilities for the RAID calculations. =A0These are even supported =
by
>> drivers in mainline Linux.
>>
>> But again, thee would not helpo to maximize IOPS - goal for
>> optimization has always been maximum sequential troughput only
>> (and yes, I know exactly what I'm talking about; guess where the
>> aforementioned drivers are coming from).
>>
>> Best regards,
>>
>> Wolfgang Denk
>>
>> --
>> DENX Software Engineering GmbH, =A0 =A0 MD: Wolfgang Denk & Detlev Z=
undel
>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, German=
y
>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.d=
e
>> I don't see any direct evidence ... =A0but, then, my crystal ball is=
in
>> dire need of an ectoplasmic upgrade. :-) =A0 =A0 =A0 =A0 =A0 =A0 =A0=
-- Howard Smith
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid=
" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 24.01.2011 15:40:08 von CoolCold

On Wed, Jan 19, 2011 at 10:21 PM, Wolfgang Denk wrote:
> Dear =3D?ISO-8859-15?Q?Stefan_/*St0fF*/_H=3DFCbner?=3D,
>
> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>
>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>> Controller.
>>
>> [in English:] Dude, the disks are your bottleneck.
> ...
>
> Maybe we can stop speculations about what might be the cause of the
> problems in some setup I do NOT intend to use, and rather discuss the
> questions I asked.
>
>> > I will have 4 x 1 TB disks for this setup.
>> >
>> > The plan is to build a RAID0 from the 4 devices, create a physical
>> > volume and a volume group on the resulting /dev/md?, then create 2=
or
>> > 3 logical volumes that will be used as XFS file systems.
>
> Clarrification: I'll run /dev/md* on the raw disks, without any
> partitions on them.
>
>> > My goal is to optimize for maximum number of I/O operations per
>> > second. ...
>> >
>> > Is this a reasonable approach for such a task?
>> >
>> > Should I do anything different to acchive maximum performance?
>> >
>> > What are the tunables in this setup? =A0[It seems the usual recipi=
es are
>> > more oriented in maximizing the data troughput for large, mostly
>> > sequential accesses - I figure that things like increasing read-ah=
ead
>> > etc. will not help me much here?]
>
> So can anybody help answering these questions:
>
> - are there any special options when creating the RAID0 to make it
> =A0perform faster for such a use case?
> - are there other tunables, any special MD / LVM / file system /
> =A0read ahead / buffer cache / ... parameters to look for?
XFS is known for its slow speed on metadata operations like updating
file attributes / removing files... but things are going to change after 2.6.35,
where delayed logging (delaylog) is used. Citing Dave Chinner:
< dchinner> Indeed, the biggest concurrency limitation has
traditionally been the transaction commit/journalling code, but that's
a lot more scalable now with delayed logging....

So, you may need to benchmark fs part.
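
If you want to try it, a quick test on an existing XFS mount could look
like this (mount point and logbsize value are only examples):

# mount -o remount,noatime,delaylog,logbsize=262144 /mnt/test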

>
> Thanks.
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, =A0 =A0 MD: Wolfgang Denk & Detlev Zu=
ndel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"=
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>



--
Best regards,
[COOLCOLD-RIPN]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 24.01.2011 16:25:02 von Justin Piszcz



On Mon, 24 Jan 2011, CoolCold wrote:

>> So can anybody help answering these questions:
>>
>> - are there any special options when creating the RAID0 to make it
>> =A0perform faster for such a use case?
>> - are there other tunables, any special MD / LVM / file system /
>> =A0read ahead / buffer cache / ... parameters to look for?
> XFS is known for it's slow speed on metadata operations like updating
> file attributes/removing files..but things gonna change after 2.6.35
> where delaylog is used. Citating Dave Chinner :
> < dchinner> Indeed, the biggest concurrency limitation has
> traditionally been the transaction commit/journalling code, but that's
> a lot more scalable now with delayed logging....
>
> So, you may need to benchmark fs part.

Some info on XFS benchmark with delaylog here:
http://comments.gmane.org/gmane.comp.file-systems.xfs.genera l/34379

Justin.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 24.01.2011 21:43:31 von Wolfgang Denk

Dear CoolCold,

In message you wrote:
>
> > So can anybody help answering these questions:
> >
> > - are there any special options when creating the RAID0 to make it
> > perform faster for such a use case?
> > - are there other tunables, any special MD / LVM / file system /
> > read ahead / buffer cache / ... parameters to look for?
> XFS is known for it's slow speed on metadata operations like updating
> file attributes/removing files..but things gonna change after 2.6.35
> where delaylog is used. Citating Dave Chinner :
> < dchinner> Indeed, the biggest concurrency limitation has
> traditionally been the transaction commit/journalling code, but that's
> a lot more scalable now with delayed logging....
>
> So, you may need to benchmark fs part.

Thanks a lot - much appreciated. The first reply that actually was on
topic...

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It is not best to swap horses while crossing the river.
- Abraham Lincoln
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 24.01.2011 21:48:27 von Wolfgang Denk

Dear Justin Piszcz,

In message you wrote:
>
> > So, you may need to benchmark fs part.
>
> Some info on XFS benchmark with delaylog here:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.genera l/34379

Thanks a lot for the pointer. I will try this out.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Madness takes its toll. Please have exact change.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 24.01.2011 22:57:13 von Wolfgang Denk

Dear Justin,

In message you wrote:
>
> Some info on XFS benchmark with delaylog here:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.genera l/34379

For the record: I tested both the "delaylog" and "logbsize=262144" on
two systems running Fedora 14 x86_64 (kernel version
2.6.35.10-74.fc14.x86_64).


Test No. Mount options
1 rw,noatime
2 rw,noatime,delaylog
3 rw,noatime,delaylog,logbsize=262144


System A: Gigabyte EP35C-DS3R Mainboard, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM
--------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks
(chunk size 16 kB, using S-ATA ports on main board), XFS

Test 1:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
A1 8G 844 96 153107 19 56427 11 2006 98 127174 15 369.4 6
Latency 13686us 1480ms 1128ms 14986us 136ms 74911us
Version 1.96 ------Sequential Create------ --------Random Create--------
A1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 104 0 +++++ +++ 115 0 89 0 +++++ +++ 111 0
Latency 326ms 171us 277ms 343ms 9us 360ms
1.96,1.96,A1,1,1295714835,8G,,844,96,153107,19,56427,11,2006 ,98,127174,15,369.4,6,16,,,,,104,0,+++++,+++,115,0,89,0,++++ +,+++,111,0,13686us,1480ms,1128ms,14986us,136ms,74911us,326m s,171us,277ms,343ms,9us,360ms

Test 2:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
A2 8G 417 46 67526 8 28251 5 1338 63 53780 5 236.0 4
Latency 38626us 1859ms 508ms 26689us 258ms 188ms
Version 1.96 ------Sequential Create------ --------Random Create--------
A2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 51 0 +++++ +++ 128 0 102 0 +++++ +++ 125 0
Latency 1526ms 169us 277ms 363ms 8us 324ms
1.96,1.96,A2,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63 ,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++ ,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us ,277ms,363ms,8us,324ms

Test 3:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
A3 8G 417 46 67526 8 28251 5 1338 63 53780 5 236.0 4
Latency 38626us 1859ms 508ms 26689us 258ms 188ms
Version 1.96 ------Sequential Create------ --------Random Create--------
A3 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 51 0 +++++ +++ 128 0 102 0 +++++ +++ 125 0
Latency 1526ms 169us 277ms 363ms 8us 324ms
1.96,1.96,A3,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63 ,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++ ,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us ,277ms,363ms,8us,324ms

System B: Supermicro H8DM8-2 Mainboard, Dual-Core AMD Opteron 2216 @ 2.4 GHz, 8 GB RAM
software RAID 6 using 6 x Seagate ST31000524NS S-ATA II disks
(chunk size 16 kB, using a Marvell MV88SX6081 8-port SATA II PCI-X Controller)
XFS

Test 1:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
B1 16G 403 98 198720 66 53287 49 1013 99 228076 91 545.0 31
Latency 43022us 127ms 126ms 29328us 105ms 66395us
Version 1.96 ------Sequential Create------ --------Random Create--------
B1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 97 1 +++++ +++ 96 1 96 1 +++++ +++ 95 1
Latency 326ms 349us 351ms 355ms 49us 363ms
1.96,1.96,B1,1,1295784794,16G,,403,98,198720,66,53287,49,101 3,99,228076,91,545.0,31,16,,,,,97,1,+++++,+++,96,1,96,1,++++ +,+++,95,1,43022us,127ms,126ms,29328us,105ms,66395us,326ms,3 49us,351ms,355ms,49us,363ms

Test 2:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
B2 16G 380 98 197319 68 54835 48 983 99 216812 89 527.8 31
Latency 47456us 227ms 280ms 24696us 38233us 80147us
Version 1.96 ------Sequential Create------ --------Random Create--------
B2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 91 1 +++++ +++ 115 1 73 1 +++++ +++ 96 1
Latency 355ms 2274us 833ms 750ms 1079us 400ms
1.96,1.96,B2,1,1295884032,16G,,380,98,197319,68,54835,48,983 ,99,216812,89,527.8,31,16,,,,,91,1,+++++,+++,115,1,73,1,++++ +,+++,96,1,47456us,227ms,280ms,24696us,38233us,80147us,355ms ,2274us,833ms,750ms,1079us,400ms

Test 3:

Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
B3 16G 402 99 175802 64 55639 48 1006 99 232748 87 543.7 32
Latency 43160us 426ms 164ms 13306us 40857us 65114us
Version 1.96 ------Sequential Create------ --------Random Create--------
B3 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 93 1 +++++ +++ 101 1 95 1 +++++ +++ 95 1
Latency 479ms 2281us 383ms 366ms 22us 402ms
1.96,1.96,B3,1,1295880202,16G,,402,99,175802,64,55639,48,100 6,99,232748,87,543.7,32,16,,,,,93,1,+++++,+++,101,1,95,1,+++ ++,+++,95,1,43160us,426ms,164ms,13306us,40857us,65114us,479m s,2281us,383ms,366ms,22us,402ms


I do not see any significant improvement in any of the parameters -
especially when compared to the serious performance degradation (down
to 44% for block write, 42% for block read) on system A.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
A supercomputer is a machine that runs an endless loop in 2 seconds.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 00:03:14 von Dave Chinner

On Mon, Jan 24, 2011 at 10:57:13PM +0100, Wolfgang Denk wrote:
> Dear Justin,
>
> In message you wrote:
> >
> > Some info on XFS benchmark with delaylog here:
> > http://comments.gmane.org/gmane.comp.file-systems.xfs.genera l/34379
>
> For the record: I tested both the "delaylog" and "logbsize=262144" on
> two systems running Fedora 14 x86_64 (kernel version
> 2.6.35.10-74.fc14.x86_64).
>
>
> Test No. Mount options
> 1 rw,noatime
> 2 rw,noatime,delaylog
> 3 rw,noatime,delaylog,logbsize=262144
>
>
> System A: Gigabyte EP35C-DS3R Mainbord, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM
> --------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks
> (chunk size 16 kB, using S-ATA ports on main board), XFS
>
> Test 1:
>
> Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> A1 8G 844 96 153107 19 56427 11 2006 98 127174 15 369.4 6
> Latency 13686us 1480ms 1128ms 14986us 136ms 74911us
> Version 1.96 ------Sequential Create------ --------Random Create--------
> A1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 104 0 +++++ +++ 115 0 89 0 +++++ +++ 111 0

Only 16 files? You need to test something that takes more than 5
milliseconds to run. Given that XFS can run at >20,000 creates/s for
a single threaded sequential create like this, perhaps you should
start at 100,000 files (maybe a million) so you get an idea of
sustained performance.
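
For instance (path and user are placeholders; bonnie++ takes the count
in units of 1024, so 128 means about 131k files):

# bonnie++ -d /mnt/test -n 128:65536:0:512 -u wd -g wd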

......

> I do not see any significant improvement in any of the parameters -
> especially when compared to the serious performance degradation (down
> to 44% for block write, 42% for block read) on system A.

delaylog does not affect the block IO path in any way, so something
else is going on there. You need to sort that out before drawing any
conclusions.

Similarly, you need to test something relevant to your workload, not
use a canned benchmark in the expectation that the results are in any
way meaningful to your real workload. Also, if you do use a stupid
canned benchmark, make sure you configure it to test something
relevant to what you are trying to compare...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 08:39:00 von Emmanuel Florac

On Tue, 25 Jan 2011 10:03:14 +1100 you wrote:

> Only 16 files?

IIRC this is 16 thousand files. Though this is not enough; I
generally use 80 to 160 (thousand) for tests.

--
------------------------------------------------------------------------
Emmanuel Florac      |   Direction technique
                     |   Intellique
                     |   +33 1 78 94 84 02
------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 09:36:43 von Dave Chinner

[ As a small note - if you are going to comment on the results table
from a previous message, please don't cut it from your response.
Context is important. I pasted the relevant part back in so i can
refer back to it in my response. ]

On Tue, Jan 25, 2011 at 08:39:00AM +0100, Emmanuel Florac wrote:
> On Tue, 25 Jan 2011 10:03:14 +1100 you wrote:
> > > Version 1.96       ------Sequential Create------ --------Random Create--------
> > > A1                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> > >               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> > >                  16   104   0 +++++ +++   115   0    89   0 +++++ +++   111   0
> >
> > Only 16 files?
>
> IIRC this is 16 thousand files. Though this is not enough; I
> generally use 80 to 160 (thousand) for tests.

Yes, you're right, the bonnie++ man page states that it is in units
of 1024 files. It would be nice if there were a "k" to signify that, so
people who aren't intimately familiar with its output format could see
exactly what was tested....

As it is, a create rate of 104 files/s (note the consistency of
units between 2 adjacent numbers!) indicates something else is
screwed, because my local test VM on RAID0 gets numbers like this:

Version 1.96       ------Sequential Create------ --------Random Create--------
test-4              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25507  90 +++++ +++ 30472  97 25281  93 +++++ +++ 29077  97
Latency             23864us     204us   21092us   18855us      82us     121us

IOWs, create rates of 25k/s and unlink of 30k/s and it is clearly
CPU bound.

Therein lies the difference: the original numbers have 0% CPU usage,
which indicates that the test is blocking. Something is causing the
reported test system to be blocked almost all the time.

/me looks closer.

Oh, despite $subject being "RAID0" the filesystems being tested are
on RAID5 and RAID6 with very small chunk sizes on slow SATA drives.
This is smelling like a case of barrier IOs on software raid on
cheap storage....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 13:45:09 von Wolfgang Denk

Dear Dave Chinner,

In message <20110125083643.GE28803@dastard> you wrote:
>
> Oh, despite $subject being "RAID0" the filesystems being tested are
> on RAID5 and RAID6 with very small chunk sizes on slow SATA drives.
> This is smelling like a case of barrier IOs on software raid on
> cheap storage....

Right. [Any way to avoid these, btw?] I got side-tracked by the
comments about the new (to me) delaylog mount option to xfs; as the
results were not exactly as expected I thought it might be interesting
to report them.

But as the subject says, my current topic is tuning RAID0 to avoid
exactly this type of bottleneck; or rather, looking for tunable options
on RAID0.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
PLEASE NOTE: Some Quantum Physics Theories Suggest That When the Con-
sumer Is Not Directly Observing This Product, It May Cease to Exist
or Will Exist Only in a Vague and Undetermined State.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 13:51:23 von Emmanuel Florac

On Tue, 25 Jan 2011 13:45:09 +0100
Wolfgang Denk wrote:

> > This is smelling like a case of barrier IOs on software raid on
> > cheap storage....
>
> Right. [Any way to avoid these, btw?]

Easy enough, use the "nobarrier" mount option.
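
E.g. (mount point is just an example):

# mount -o remount,nobarrier /mnt/tmp

Note that this is only safe if the disk write caches are disabled or
battery-backed.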

--
------------------------------------------------------------------------
Emmanuel Florac      |   Direction technique
                     |   Intellique
                     |   +33 1 78 94 84 02
------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 18:10:17 von Christoph Hellwig

On Tue, Jan 18, 2011 at 10:01:12PM +0100, Wolfgang Denk wrote:
> Hi,
>
> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain
> s/w RAID0, because the existing system appears to be seriously limited
> in terms of numbers of I/O operations per second.
>
> Our workload is mixed read / write (something between 80% read / 20%
> write and 50% / 50%), consisting of a very large number of usually
> very small files.
>
> There may be 20...50 millions of files, or more. 65% of the files are
> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> kB; 98.4% are smaller than 64 kB.

I don't think you even want a RAID0 in that case. For small IOPs
you're much better off with a simple concatenation of devices.

> The plan is to build a RAID0 from the 4 devices, create a physical
> volume and a volume group on the resulting /dev/md?, then create 2 or
> 3 logical volumes that will be used as XFS file systems.

Especially if you're running XFS, the concatenation will work beautifully
for this setup. Make sure that your AG boundaries align to the physical
devices, and they can be used completely independently for small IOPs.
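
A minimal sketch, under the assumption of four equal-sized disks
concatenated into one linear md device (names are placeholders): creating
the filesystem with one allocation group per disk puts each AG on its own
spindle:

# mkfs.xfs -d agcount=4 /dev/md0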

> Should I do anything different to acchive maximum performance?

Make sure to disable the disk write caches and if not using the newest
kernel also mount the filesystem with -o nobarrier. With lots of small
I/Os and metadata intensive workloads that's usually a lot faster.

Also, if you have a lot of log traffic, an external log device will
help a lot. It doesn't need to be large, but it will keep the
amount of seeks on the other devices down.
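
A rough sketch of both suggestions (device names are purely placeholders,
not a recommendation):

# hdparm -W 0 /dev/sda /dev/sdb /dev/sdd /dev/sde
# mkfs.xfs -l logdev=/dev/sdf1,size=128m /dev/md0
# mount -o logdev=/dev/sdf1,nobarrier /dev/md0 /data

The first line turns off the volatile write cache on each member disk;
the last two put the XFS log on a separate (hypothetical) device
/dev/sdf1.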

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 19:41:15 von Wolfgang Denk

Dear Christoph,

In message <20110125171017.GA24921@infradead.org> you wrote:
>
> > There may be 20...50 millions of files, or more. 65% of the files are
> > smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> > kB; 98.4% are smaller than 64 kB.
>
> I don't think you even want a RAID0 in that case. For small IOPs
> you're much better off with a simple concatenation of devices.

What exactly do you mean by "concatenation"? LVM striping?
At least the discussion here does not show any significant advantages
for this concept:
http://groups.google.com/group/ubuntu-user-community/web/pic k-your-pleasure-raid-0-mdadm-striping-or-lvm-striping

> > Should I do anything different to acchive maximum performance?
>
> Make sure to disable the disk write caches and if not using the newest
> kernel also mount the filesystem with -o nobarrier. With lots of small
> I/Os and metadata intensive workloads that's usually a lot faster.

Tests I've done recently indicate that, on the other hand, nobarrier causes
a serious degradation of read and write performance (down to some 40%
of the values before).

> Also if you have a lot of log traffic an external log devices will
> help a lot. It's doesn't need to be larger, but it will keep the
> amount of seeks on the other devices down.

Understood, thanks.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Never underestimate the bandwidth of a station wagon full of tapes.
-- Dr. Warren Jackson, Director, UTCS
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 25.01.2011 22:35:23 von Christoph Hellwig

On Tue, Jan 25, 2011 at 07:41:15PM +0100, Wolfgang Denk wrote:
> > I don't think you even want a RAID0 in that case. For small IOPs
> > you're much better off with a simple concatenation of devices.
>
> What exactly do you mean by "conatenation"? LVM striping?
> At least the discussion here does not show any significant advantages
> for this concept:
> http://groups.google.com/group/ubuntu-user-community/web/pic k-your-pleasure-raid-0-mdadm-striping-or-lvm-striping

No, concatenation means not using any striping, but just concatenating
the disks linearly, e.g.:

+-----------------------------------+
| Filesystem |
+--------+--------+--------+--------+
| Disk 1 | Disk 2 | Disk 3 | Disk 4 |
+--------+--------+--------+--------+

This can be done using the MD linear target, or simply
by having multiple PVs in a VG with LVM.
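
For illustration only (device names are placeholders):

# mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/sd[abde]

or, the LVM way, without any striping:

# pvcreate /dev/sda /dev/sdb /dev/sdd /dev/sde
# vgcreate vg0 /dev/sda /dev/sdb /dev/sdd /dev/sde
# lvcreate -l 100%FREE -n data vg0

LVM allocates linearly by default, so the LV fills one PV before moving
on to the next.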

>
> > Make sure to disable the disk write caches and if not using the newest
> > kernel also mount the filesystem with -o nobarrier. With lots of small
> > I/Os and metadata intensive workloads that's usually a lot faster.
>
> Tests if done recently indicate that on the other hand nobarrier causes
> a serious degradation of read and write performance (down to some 40%
> of the values before).

Do you have a pointer to your results?

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 26.01.2011 08:16:16 von Wolfgang Denk

Dear Christoph Hellwig,

In message <20110125213523.GA14375@infradead.org> you wrote:
>
> > What exactly do you mean by "conatenation"? LVM striping?
> > At least the discussion here does not show any significant advantages
> > for this concept:
> > http://groups.google.com/group/ubuntu-user-community/web/pic k-your-pleasure-raid-0-mdadm-striping-or-lvm-striping
>
> No, concatenation means not using any striping, but just concatenating
> the disk linearly, e.g.
>
> +-----------------------------------+
> | Filesystem |
> +--------+--------+--------+--------+
> | Disk 1 | Disk 2 | Disk 3 | Disk 4 |
> +--------+--------+--------+--------+
>
> This can be done using the using the MD linear target, or simply
> by having multiple PVs in a VG with LVM.

I will not have a single file system, but several, so I'd probably go
with LVM. But - when I then create an LV, possibly smaller than any
of the disks, will the data (and thus the traffic) really be distributed
over all drives, or will I not basically see the same results as
when using a single drive?

> > Tests if done recently indicate that on the other hand nobarrier causes
> > a serious degradation of read and write performance (down to some 40%
> > of the values before).
>
> Do you have a pointer to your results?

This was the first set of tests:

http://thread.gmane.org/gmane.linux.raid/31269/focus=31419

I've run some more tests on the system called 'B' in this list:


# lvcreate -L 32G -n test castor0
Logical volume "test" created
# mkfs.xfs /dev/mapper/castor0-test
meta-data=/dev/mapper/castor0-test isize=256 agcount=16, agsize=524284 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=8388544, imaxpct=25
= sunit=4 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=4096, version=2
= sectsz=512 sunit=4 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# mount /dev/mapper/castor0-test /mnt/tmp/
# mkdir /mnt/tmp/foo
# chown wd.wd /mnt/tmp/foo
# bonnie++ -d /mnt/tmp/foo -m xfs -u wd -g wd
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xfs 16G 425 98 182929 64 46956 41 955 97 201274 83 517.6 30
Latency 42207us 2377ms 195ms 33339us 86675us 84167us
Version 1.96 ------Sequential Create------ --------Random Create--------
xfs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 93 1 +++++ +++ 90 1 123 1 +++++ +++ 127 1
Latency 939ms 2279us 1415ms 307ms 1057us 724ms
1.96,1.96,xfs,1,1295938326,16G,,425,98,182929,64,46956,41,95 5,97,201274,83,517.6,30,16,,,,,93,1,+++++,+++,90,1,123,1,+++ ++,+++,127,1,42207us,2377ms,195ms,33339us,86675us,84167us,93 9ms,2279us,1415ms,307ms,1057us,724ms

[[Re-run with larger number of file creates / deletes]]

# bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs1 -u wd -g wd
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xfs1 16G 400 98 175931 63 46970 40 781 99 181044 73 524.2 30
Latency 48299us 2501ms 210ms 20693us 83729us 85349us
Version 1.96 ------Sequential Create------ --------Random Create--------
xfs1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128:65536:0/512 42 1 25607 99 71 1 38 1 8267 67 34 0
Latency 1410ms 2337us 2116ms 1240ms 44920us 4139ms
1.96,1.96,xfs1,1,1295942356,16G,,400,98,175931,63,46970,40,7 81,99,181044,73,524.2,30,128,65536,,,512,42,1,25607,99,71,1, 38,1,8267,67,34,0,48299us,2501ms,210ms,20693us,83729us,85349 us,1410ms,2337us,2116ms,1240ms,44920us,4139ms

[[Add delaylog,logbsize=262144]]

# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw)
# mount -o remount,noatime,delaylog,logbsize=262144 /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144)
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xfs1 16G 445 98 106201 43 35407 33 939 99 83545 42 490.4 30
Latency 43307us 4614ms 242ms 37420us 195ms 128ms
Version 1.96 ------Sequential Create------ --------Random Create--------
xfs1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128:65536:0/512 308 4 24121 99 2393 30 321 5 22929 99 331 6
Latency 34842ms 1288us 6634ms 87944ms 195us 12239ms
1.96,1.96,xfs1,1,1295968991,16G,,445,98,106201,43,35407,33,9 39,99,83545,42,490.4,30,128,65536,,,512,308,4,24121,99,2393, 30,321,5,22929,99,331,6,43307us,4614ms,242ms,37420us,195ms,1 28ms,34842ms,1288us,6634ms,87944ms,195us,12239ms


[[Note: Block write drops to 60%, block read drops to <50%]]

[[Add nobarriers]]

# mount -o remount,nobarriers /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)
# bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs2 -u wd -g wd
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xfs2 16G 427 98 193950 65 52848 45 987 99 198110 83 496.5 25
Latency 41543us 128ms 186ms 14678us 67639us 76024us
Version 1.96 ------Sequential Create------ --------Random Create--------
xfs2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
128:65536:0/512 352 6 24513 99 2604 32 334 5 24921 99 333 6
Latency 32152ms 2307us 4148ms 31036ms 493us 23065ms
1.96,1.96,xfs2,1,1295966513,16G,,427,98,193950,65,52848,45,9 87,99,198110,83,496.5,25,128,65536,,,512,352,6,24513,99,2604 ,32,334,5,24921,99,333,6,41543us,128ms,186ms,14678us,67639us ,76024us,32152ms,2307us,4148ms,31036ms,493us,23065ms


[[Much better. But now compare ext4]]

# mkfs.ext4 /dev/mapper/castor0-test
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=4 blocks, Stripe width=16 blocks
2097152 inodes, 8388608 blocks
419430 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
256 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mount /dev/mapper/castor0-test /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type ext4 (rw)
# mkdir /mnt/tmp/foo
# chown wd.wd /mnt/tmp/foo
# bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd
....
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ext4 16G 248 99 128657 49 61267 49 1026 97 236552 85 710.9 35
Latency 78833us 567ms 2586ms 37539us 61572us 88413us
Version 1.96 ------Sequential Create------ --------Random Create--------
ext4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 14841 52 +++++ +++ 23164 70 20409 78 +++++ +++ 23441 73
Latency 206us 2384us 2372us 2322us 78us 2335us
1.96,1.96,ext4,1,1295954392,16G,,248,99,128657,49,61267,49,1 026,97,236552,85,710.9,35,16,,,,,14841,52,+++++,+++,23164,70 ,20409,78,+++++,+++,23441,73,78833us,567ms,2586ms,37539us,61 572us,88413us,206us,2384us,2372us,2322us,78us,2335us

[[Only 2/3 of the speed of XFS for block write, but nearly 20% faster
for block read. But magnitudes faster for file creates / deletes!]]

[[add nobarrier]]

# mount -o remount,nobarrier /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type ext4.2 (rw,nobarrier)
# bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ext4.2 16G 241 99 125446 50 57726 55 945 97 215698 87 509.2 54
Latency 81198us 1085ms 2479ms 46401us 111ms 83051us
Version 1.96 ------Sequential Create------ --------Random Create--------
ext4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 12476 63 +++++ +++ 23990 66 21185 82 +++++ +++ 23039 82
Latency 440us 1019us 1094us 238us 25us 215us
1.96,1.96,ext4.2,1,1295996176,16G,,241,99,125446,50,57726,55 ,945,97,215698,87,509.2,54,16,,,,,12476,63,+++++,+++,23990,6 6,21185,82,+++++,+++,23039,82,81198us,1085ms,2479ms,46401us, 111ms,83051us,440us,1019us,1094us,238us,25us,215us

[[Again, degradation of about 10% for block read; with only minor
advantages for seq. delete and random create]]



Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
For those who like this sort of thing, this is the sort of thing they
like. - Abraham Lincoln
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 26.01.2011 09:32:49 von Stan Hoeppner

Wolfgang Denk put forth on 1/26/2011 1:16 AM:

> I will not have a single file system, but several, so I'd probably go
> with LVM. But - when I then create a LV, eventually smaller than any
> of the disks, will the data (and thus the traffic) be really distri-
> buted over all drives, or will I not basicly see the same results as
> when using a single drive?

If creating multiple filesystems then concatenation is probably not what you
want, for the reasons you suspect, if you want the IO spread across all 4 disks
for all operations on all filesystems.

> # lvcreate -L 32G -n test castor0
> Logical volume "test" created
> # mkfs.xfs /dev/mapper/castor0-test

Is this on that set of 4 low end Maxtor disks? Is the above LV sitting atop
RAID 0, RAID 5, or concatenation?

> [[Only 2/3 of the speed of XFS for block write, but nearly 20% faster
> for block read. But orders of magnitude faster for file creates / deletes!]]

Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4.
XFS was designed/optimized for parallel workloads, not single thread workloads
(although it can extract some concurrency from a single thread workload). XFS
really shines with parallel workloads (assuming the underlying hardware isn't
junk, and the mdraid/lvm configuration is sane). ext4 will probably always beat
XFS performance with single thread workloads, and I don't believe anyone is
surprised by that. For most moderate to heavy parallel workloads, XFS usually
trounces ext4 (and all other Linux filesystems).
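
Something along these lines should do it; -c sets the concurrency level in
bonnie++ 1.9x, and the value of 8 plus the -m labels are just suggestions:

# bonnie++ -d /mnt/tmp/foo -m xfs-c8 -u wd -g wd -c 8
# bonnie++ -d /mnt/tmp/foo -m ext4-c8 -u wd -g wd -c 8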

--
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 26.01.2011 09:42:24 von Wolfgang Denk

Dear Stan Hoeppner,

In message <4D3FDC31.3010502@hardwarefreak.com> you wrote:
>
> > # lvcreate -L 32G -n test castor0
> > Logical volume "test" created
> > # mkfs.xfs /dev/mapper/castor0-test
>
> Is this on that set of 4 low end Maxtor disks? Is the above LV sitting atop
> RAID 0, RAID 5, or concatenation?

No, this is the other system, using 6 x Seagate ST31000524NS on a
Marvell MV88SX6081 8-port SATA II PCI-X Controller.

LVM is sitting on top of a RAID6 here:

md2 : active raid6 sda[0] sdi[5] sdh[4] sde[3] sdd[2] sdb[1]
3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]

> Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4.

OK, will do.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The price of curiosity is a terminal experience.
- Terry Pratchett, _The Dark Side of the Sun_
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 26.01.2011 10:38:54 von Christoph Hellwig

On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote:
> I will not have a single file system, but several, so I'd probably go
> with LVM. But - when I then create a LV, possibly smaller than any
> of the disks, will the data (and thus the traffic) be really
> distributed over all drives, or will I not basically see the same
> results as when using a single drive?

Think about it: if you're doing small IOPs, they usually are smaller
than the stripe size and you will hit only one disk anyway. But with
a raid0 which disk you hit is relatively unpredictable. With a
concatenation aligned to the AGs, XFS will distribute processes writing
data to the different AGs and thus the different disks, and you can
reliably get performance out of them.
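
As a rough, untested sketch of that layout (device names are placeholders;
keep agcount a multiple of the number of disks so that AG boundaries line
up with disk boundaries):

# mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/sd[bcde]
# mkfs.xfs -d agcount=4 /dev/md0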

If you have multiple filesystems, the setup depends a lot on the
workloads you plan to put on the filesystems. If all of the filesystems
on it are busy at the same time, just assigning disks to filesystems
probably gives you the best performance. If they are busy at different
times, or some are not busy at all, you first want to partition the
disks into areas for each filesystem and then concatenate them into
volumes for each filesystem.
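
For, say, two filesystems that second layout could look like this (again
only an untested sketch with placeholder partition and array names):

# mdadm --create /dev/md10 --level=linear --raid-devices=4 /dev/sd[bcde]1
# mdadm --create /dev/md11 --level=linear --raid-devices=4 /dev/sd[bcde]2
# mkfs.xfs -d agcount=4 /dev/md10
# mkfs.xfs -d agcount=4 /dev/md11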


> [[Note: Block write: drop to 60%, Block read drops to <50%]]

How is the cpu load? delaylog trades I/O operations for cpu
utilization. Together with a raid6, which apparently is the system you
use here, it might overload your system.
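
One way to check, just as a suggestion: repeat the same bonnie++ run with
delayed logging switched off (if your kernel still accepts the nodelaylog
option) and watch the cpu from a second terminal; the -m label is a
placeholder:

# mount -o remount,nodelaylog /mnt/tmp
# vmstat 5
# bonnie++ -d /mnt/tmp/foo -m xfs-nodelaylog -u wd -g wd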

And btw, in future please state that you have numbers for a totally
different setup than the one you're asking questions for. Comparing a
raid6 setup to striping/concatenation is completely irrelevant.

>
> [[Add nobarriers]]
>
> # mount -o remount,nobarriers /mnt/tmp
> # mount | grep /mnt/tmp
> /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)

a) the option is called nobarrier
b) it looks like your mount implementation is really buggy as it shows
random options that weren't actually parsed and accepted by the
filesystem.
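
For reference, with the correct spelling the remount would be:

# mount -o remount,nobarrier /mnt/tmp

and checking /proc/mounts afterwards shows which options the filesystem
really accepted, independent of what mount copied into /etc/mtab.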

> [[Again, degradation of about 10% for block read; with only minor
> advantages for seq. delete and random create]]

I really don't trust the numbers. nobarrier sends down fewer I/O
requests, and avoids all kinds of queue stalls. How repeatable are these
benchmarks? Do you also see it using a less hacky benchmark than
bonnie++?
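
fio would be one option that gets closer to the small-file random workload
you described; all of the parameters below are only a suggestion, not a
tuned job:

# fio --name=randrw --directory=/mnt/tmp/foo --rw=randrw --rwmixread=70 \
      --bs=4k --size=2g --numjobs=8 --ioengine=libaio --direct=1 \
      --runtime=120 --time_based --group_reporting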

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Optimize RAID0 for max IOPS?

am 26.01.2011 10:41:12 von CoolCold

On Wed, Jan 26, 2011 at 12:38 PM, Christoph Hellwig wrote:
> On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote:
>> I will not have a single file system, but several, so I'd probably go
>> with LVM. But - when I then create a LV, possibly smaller than any
>> of the disks, will the data (and thus the traffic) be really
>> distributed over all drives, or will I not basically see the same
>> results as when using a single drive?
>
> Think about it: if you're doing small IOPs, they usually are smaller
> than the stripe size and you will hit only one disk anyway. But with
> a raid0 which disk you hit is relatively unpredictable. With a
> concatenation aligned to the AGs, XFS will distribute processes writing
> data to the different AGs and thus the different disks, and you can
> reliably get performance out of them.
>
> If you have multiple filesystems, the setup depends a lot on the
> workloads you plan to put on the filesystems. If all of the filesystems
> on it are busy at the same time, just assigning disks to filesystems
> probably gives you the best performance. If they are busy at different
> times, or some are not busy at all, you first want to partition the
> disks into areas for each filesystem and then concatenate them into
> volumes for each filesystem.
>
>
>> [[Note: Block write: drop to 60%, Block read drops to <50%]]
>
> How is the cpu load? delaylog trades I/O operations for cpu
> utilization. Together with a raid6, which apparently is the system you
> use here, it might overload your system.
>
> And btw, in future please state that you have numbers for a totally
> different setup than the one you're asking questions for. Comparing a
> raid6 setup to striping/concatenation is completely irrelevant.
>
>>
>> [[Add nobarriers]]
>>
>> # mount -o remount,nobarriers /mnt/tmp
>> # mount | grep /mnt/tmp
>> /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)
>
>  a) the option is called nobarrier
>  b) it looks like your mount implementation is really buggy as it shows
>     random options that weren't actually parsed and accepted by the
>     filesystem.
cat /proc/mounts may help, I guess

>
>> [[Again, degradation of about 10% for block read; with only minor
>> advantages for seq. delete and random create]]
>
> I really don't trust the numbers. nobarrier sends down fewer I/O
> requests, and avoids all kinds of queue stalls. How repeatable are these
> benchmarks? Do you also see it using a less hacky benchmark than
> bonnie++?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Best regards,
[COOLCOLD-RIPN]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html