raid over ethernet
on 29.01.2011 02:58:09 by Roberto Spadim
hi guys, i was thinking about raid over ethernet... is there a solution
to make a synchronous replica of my filesystem? no problem if my
primary server goes down, i can mount my replica, fsck it and continue
with the available data
i was reading about nbd, anyone have more ideas?
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
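The nbd+mdadm approach the post asks about can be sketched roughly like this (hostnames, the port and device names are invented for illustration; the commands need root and the nbd userland tools):

```shell
# On the backup machine: export a spare partition over the network.
nbd-server 10809 /dev/sdb1

# On the primary machine: attach the remote export as a local block device.
nbd-client backup-host 10809 /dev/nbd0

# Mirror the local disk with the network disk; the internal write-intent
# bitmap makes resyncs after a network drop incremental instead of full.
mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
      /dev/sda1 /dev/nbd0
```

If the primary dies, the backup host still holds a byte-level copy on its partition that can be assembled, fsck'd and mounted - essentially the failover the post describes.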
Re: raid over ethernet
on 29.01.2011 06:41:04 by jeromepoulin
DRBD: http://www.drbd.org/
Sent from my mobile device.
Jérôme Poulin
Solutions G.A.
On 2011-01-28, at 20:58, Roberto Spadim wrote:
> hi guys, i was thinking about raid over ethernet... there's a solution
> to make a syncronous replica of my filesystem? no problem if my
> primary server get down, i can mout my replica fsck it and continue
> with available data
> i was reading about nbd, anyone have more ideas?
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 29.01.2011 07:42:16 by Roberto Spadim
is it better than nbd+mdadm?
2011/1/29 Jérôme Poulin:
> DRBD: http://www.drbd.org/
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 29.01.2011 07:42:38 by Mikael Abrahamsson
On Fri, 28 Jan 2011, Roberto Spadim wrote:
> hi guys, i was thinking about raid over ethernet... there's a solution
> to make a syncronous replica of my filesystem? no problem if my
> primary server get down, i can mout my replica fsck it and continue
> with available data
> i was reading about nbd, anyone have more ideas?
Look into AoE (ATA over Ethernet).
--
Mikael Abrahamsson email: swmike@swm.pp.se
Re: raid over ethernet
on 29.01.2011 07:44:05 by Roberto Spadim
faster than nbd?
2011/1/29 Mikael Abrahamsson:
> Look into AoE (ATA over Ethernet).
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 29.01.2011 07:48:17 by Roberto Spadim
better than drbd?
2011/1/29 Roberto Spadim:
> faster than nbd?
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 29.01.2011 12:47:34 by Roberto Spadim
Managing the combination of nbd and mdraid is complicated.
complicated = drbd works?
2011/1/29 Peter Chacko:
> AoE is not routable. And has no replication. It's not used for DRBD or NBD.
> AoE is best if you want to implement the cheapest SAN in the local network.
> For the original purpose, DRBD is the best. Managing the combination of nbd
> and mdraid is complicated.
> thanks.
> Peter Chacko,
> Athinio data systems.
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 29.01.2011 14:29:21 by Alexander Schreiber
On Sat, Jan 29, 2011 at 04:42:16AM -0200, Roberto Spadim wrote:
> is it better than nbd+mdadm?
Definitely. We are using drbd replicated disks on a _lot_ of machines,
with all kinds of outside events: disk failures, network failures,
machine failures of various interesting variants. Despite this kind of
pounding, drbd turned out to be very robust, with data loss happening
very rarely (well, with some combined failures you are just plain
screwed - that's why one has backups).
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison
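For anyone evaluating this, a minimal drbd resource definition looks roughly like the following sketch - hostnames, devices and addresses are invented, and the syntax follows the drbd 8.x configuration format:

```
resource r0 {
  protocol C;                    # synchronous replication, as discussed above
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on beta {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

Protocol C is what makes the replica synchronous: a write is only acknowledged once it has reached both nodes.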
Re: raid over ethernet
on 29.01.2011 14:34:57 by Alexander Schreiber
On Sat, Jan 29, 2011 at 04:44:05AM -0200, Roberto Spadim wrote:
> faster than nbd?
I don't know how drbd compares in speed to nbd, but drbd is obviously
slower than plain disks, especially if you care about your data. In the
only sensible operating mode (synchronous writes to the underlying block
devices), the speed (both bandwidth and latency) depends on your disks
and your network connection (so you better get at least a Gigabit link).
Depending on your particular setup, you'll probably get 50-60% of the
plain disk performance for writes, while reads should be reasonably
close to the plain disk performance - drbd optimizes reads by just reading
from the local disk if it can.
Kind regards,
Alex.
raid over ethernet
on 29.01.2011 15:25:14 by denis
ouch, html. - my bad.
---------- Forwarded message ----------
From: Denis
Date: 2011/1/29
Subject: Re: raid over ethernet
To: Alexander Schreiber
Cc: Roberto Spadim, Mikael Abrahamsson, Linux-RAID
2011/1/29 Roberto Spadim:
>
> Managing the combination of nbd and mdraid is complicated.
>
> complicated = drbd works?
I have been using drbd for a long time and it's quite easy to
implement, manage and use. The main purpose of all applications I have
used it for was high availability, and it works just fine. And it's
really cool to see it integrated with heartbeat, which will manage to
mount the partition on one node or the other, according to your policy
and node availability.
2011/1/29 Alexander Schreiber:
>
> plain disk performance for writes, while reads should be reasonably
> close to the plain disk performance - drbd optimizes reads by just reading
> from the local disk if it can.
>
However, I have not used it in active-active fashion. Have you? If
yes, what is your overall experience?
Cheers,
--
Denis Anjos,
www.versatushpc.com.br
Re: raid over ethernet
on 29.01.2011 16:30:44 by Spelic
On 01/29/2011 07:44 AM, Roberto Spadim wrote:
> faster than nbd?
NBD is fast but has one problem: if you lose network connectivity for a
while (tcp drops) there is no recovery I am aware of. I think it unmaps
the disk until user intervention. Or this was the situation a couple of
years ago.
Actually for RAID this might even be a good point, but keep it in mind.
iSCSI seems an obvious alternative. And you can put anything under MD I
think, but DRBD (without MD) is probably better because it's made
exactly for that purpose.
Re: raid over ethernet
on 29.01.2011 19:34:12 by David Brown
On 29/01/11 07:42, Mikael Abrahamsson wrote:
> Look into AoE (ATA over Ethernet).
>
I think AoE is limited to fairly direct connections - it doesn't use IP,
and can't be routed (at least not easily - I'm sure it is possible if
you try hard enough). The alternative is iSCSI, which does use IP and
can therefore be routed and passed around over networks. AoE is
therefore slightly more efficient, and iSCSI more flexible.
If you are looking at making a raid1 with an iSCSI or AoE target as one
of the disks, consider using a write-intent bitmap and the
--write-mostly and --write-behind flags.
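As a sketch of that suggestion (device names are invented; --write-behind requires a bitmap and caps the number of outstanding lagging writes):

```shell
# Local disk first; the remote (iSCSI/AoE/nbd) disk is marked write-mostly
# so reads are served locally, and write-behind lets up to 256 writes to
# the slow remote leg complete asynchronously.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sda1 --write-mostly /dev/nbd0
```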
Re: raid over ethernet
on 29.01.2011 22:08:15 by Alexander Schreiber
On Sat, Jan 29, 2011 at 12:23:14PM -0200, Denis wrote:
> However, I have not used it with active-active fashion. Have you? if yes,
> what is your overall experience?
We are using drbd to provide mirrored disks for virtual machines running
under Xen. 99% of the time, the drbd devices run in primary/secondary
mode (aka active/passive), but they are switched to primary/primary
(aka active/active) for live migrations of domains, as that needs the
disks to be available on both nodes. From our experience, if the drbd
device is healthy, this is very reliable. No experience with running
drbd in primary/primary config for any extended period of time, though
(the live migrations are usually over after a few seconds to a minute at
most, then the drbd devices go back to primary/secondary).
Kind regards,
Alex.
Re: raid over ethernet
on 29.01.2011 22:54:55 by John Robinson
On 29/01/2011 21:08, Alexander Schreiber wrote:
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).
Now that is interesting, to me at least. More as a thought experiment
for now, I was wondering how one would go about setting up a small
cluster of commodity servers (maybe 8 machines) running Xen (or perhaps
now KVM) VMs, such that if one (or potentially two) of the machines
died, the VMs could be picked up by the other machines in the cluster,
and only using locally-attached SATA/SAS discs in each machine.
I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd
better start reading the Wikipedia pages on those and not talk about it
on this list...
Cheers,
John.
Re: raid over ethernet
on 30.01.2011 00:04:31 by Stan Hoeppner
John Robinson put forth on 1/29/2011 3:54 PM:
> Now that is interesting, to me at least. More as a thought experiment for now, I
> was wondering how one would go about setting up a small cluster of commodity
> servers (maybe 8 machines) running Xen (or perhaps now KVM) VMs, such that if
> one (or potentially two) of the machines died, the VMs could be picked up by the
> other machines in the cluster, and only using locally-attached SATA/SAS discs in
> each machine.
Doing N-way active replication with DRBD increases network utilization
substantially. With two DRBD active nodes you will have a maximum of _2_
simultaneous data streams, one in each direction. With 8 active nodes you will
have a maximum of _56_ simultaneous data streams. Your scenario requires all
nodes be active.
This may work for a hobby cluster or something with very low volume of data
being written to disk. This solution most likely won't scale for a cluster with
any amount of real traffic. GbE peaks at 100 MB/s. Therefore each node will
have only about 12 MB/s of bidirectional bandwidth for each other cluster member
if my math is correct. A single SATA disk runs about 80-120 MB/s, so your
network DRBD disk bandwidth is about 1/7th to 1/10th that of a single local
disk. In a 2 node cluster it's closer to 1:1. For your scenario to actually be
feasible, you'd need at least bonded quad GbE interfaces if not single 10 GbE
interfaces to get all the bandwidth you'd need.
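The arithmetic above can be spot-checked quickly (node count and the ~100 MB/s GbE figure are taken from the paragraph):

```shell
# 8 all-active nodes -> each node replicates to the 7 others, both directions.
nodes=8
link_mb_s=100                            # usable GbE throughput, roughly
streams=$(( nodes * (nodes - 1) ))       # simultaneous data streams
per_peer=$(( link_mb_s / (nodes - 1) ))  # MB/s left per cluster peer
echo "streams=$streams per_peer=${per_peer}MB/s"
# prints streams=56 per_peer=14MB/s - same ballpark as the ~12 MB/s above
```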
You'd be _MUCH_ better off using 2 active DRBD mirrored NFS servers with GFS2
filesystems and having the aforementioned 8 nodes do their data sharing via NFS.
In this setup each node only writes once (to NFS) dramatically reducing network
bandwidth required per node, with only 16 maximum data streams instead of 56.
If you need more bandwidth or IOPS than a single disk NFS server can produce,
simply RAID 4-10 disks on each NFS server via RAID 10, then mirror the two RAIDs
with DRBD.
You may need 2-4 GbE interfaces between the two NFS servers just for DRBD
traffic, but the cost of that is much less than having the same number of
interfaces in each of 8 cluster nodes. This will also give you much better
performance after a node or two fails and you have to boot their VM guests on
other hosts. Having fast central RAID storage will allow those guests to boot
much more quickly and without causing degraded performance on the other nodes
due to lack of disk bandwidth in your suggested model.
--
Stan
Re: raid over ethernet
on 30.01.2011 00:06:06 by Miles Fidelman
John Robinson wrote:
> Now that is interesting, to me at least. More as a thought experiment
> for now, I was wondering how one would go about setting up a small
> cluster of commodity servers (maybe 8 machines) running Xen (or
> perhaps now KVM) VMs, such that if one (or potentially two) of the
> machines died, the VMs could be picked up by the other machines in the
> cluster, and only using locally-attached SATA/SAS discs in each machine.
I do that now - albeit only on a 2-node cluster. DRBD works just fine
using locally attached drives.
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra
Re: raid over ethernet
on 30.01.2011 02:43:58 by Alexander Schreiber
On Sat, Jan 29, 2011 at 09:54:55PM +0000, John Robinson wrote:
> Now that is interesting, to me at least. More as a thought
> experiment for now, I was wondering how one would go about setting
> up a small cluster of commodity servers (maybe 8 machines) running
> Xen (or perhaps now KVM) VMs, such that if one (or potentially two)
> of the machines died, the VMs could be picked up by the other
> machines in the cluster, and only using locally-attached SATA/SAS
> discs in each machine.
>
> I guess I'm talking about RAIN or RAIS rather than RAID so maybe I'd
> better start reading the Wikipedia pages on those and not talk about
> it on this list...
For the "survive single node total machine failure" case your problem has
already been solved: http://code.google.com/p/ganeti/
We run a large number of clusters with that and the VMs routinely survive
disk failures and recover (come back from what looks like a power failure
to the VM) from node failure.
Kind regards,
Alex.
Re: raid over ethernet
on 31.01.2011 09:42:44 by denis
2011/1/29 Alexander Schreiber:
> We are using drbd to provide mirrored disks for virtual machines running
> under Xen. 99% of the time, the drbd devices run in primary/secondary
> mode (aka active/passive), but they are switched to primary/primary
> (aka active/active) for live migrations of domains, as that needs the
> disks to be available on both nodes. From our experience, if the drbd
> device is healthy, this is very reliable. No experience with running
> drbd in primary/primary config for any extended period of time, though
> (the live migrations are usually over after a few seconds to a minute at
> most, then the drbd devices go back to primary/secondary).
What filesystem are you using to enable the primary-primary mode? Have
you evaluated it against any other available option?
cheers!
--
Denis Anjos,
www.versatushpc.com.br
Re: raid over ethernet
on 31.01.2011 14:03:50 by Alexander Schreiber
On Mon, Jan 31, 2011 at 06:42:44AM -0200, Denis wrote:
> What filesystem are you using to enable the primary-primary mode? Have
> you evaluated it against any other available option?
The filesystem is whatever the VM is using, usually ext3. But the
filesystem doesn't matter in our use case at all, because:
- the backing store for drbd are logical volumes
- the drbd block devices are directly exported as block devices
  to the VMs
The filesystem is only active inside the VM - and the VM is not aware of
the drbd primary/secondary -> primary/primary -> primary/secondary dance
that happens "outside" to enable live migration.
Kind regards,
Alex.
Re: raid over ethernet
on 31.01.2011 15:45:31 by Roberto Spadim
i think the filesystem is a problem...
you can't have two writers on a filesystem that allows only one, or
you will get filesystem crashes (a lot of fsck repair... local cache
and other features); maybe gfs, ocfs or another cluster filesystem is a
better solution...
2011/1/31 Alexander Schreiber:
> The filesystem is whatever the VM is using, usually ext3. But the
> filesystem doesn't matter in our use case at all, because:
> - the backing store for drbd are logical volumes
> - the drbd block devices are directly exported as block devices
>   to the VMs
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: raid over ethernet
on 31.01.2011 17:15:26 by Alexander Schreiber
On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
> i think filesystem is a problem...
> you can't have two writers over a filesystem that allow only one, or
> you will have filesystem crash (a lot of fsck repair... local cache
> and other's features), maybe a gfs ocfs or another is a better
> solution...
No, for _our_ use case (replicated disks for VMs running under Xen
with live migration) the filesystem just _does_ _not_ _matter_ _at_
_all_. Due to the way Xen live migration works, there is only one
writer at any one time: the VM "owning" the virtual disk provided
by drbd.
To illustrate the point, a very short summary of what happens during
Xen live migration in our setup:
- VM is to be migrated from host A to host B, with the virtual block
device for the instance being provided by a drbd pair running on
those hosts
- host A/B are configured primary/secondary
- we reconfigure drbd to primary/primary
- start Xen live migration
- Xen creates a target VM on host B, this VM is not yet running
- Xen syncs live VM memory from host A to host B
- when most of the memory is synced over, Xen suspends execution of
the VM on host A
- Xen copies the remaining dirty VM memory from host A to host B
- Xen resumes VM execution on host B, destroys the source VM
on host A, Xen live migration is completed
- we reconfigure drbd on hosts A/B to secondary/primary
There is no concurrent access to the virtual block device here anywhere.
And the only reason we go primary/primary during live migration is that
for Xen to attach the disks to the target VM, they have to be available
and accessible on the target node - as well as on the source node where
they are currently attached to the source VM.
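The primary/secondary dance described above maps onto the drbd management tool roughly as follows (resource name r0 is invented; drbdadm is the standard drbd admin command):

```shell
# Host A is primary, host B secondary; promote B before migrating.
drbdadm primary r0      # run on host B -> now primary/primary

# ... Xen live migration of the VM from host A to host B ...

# After migration completes, demote the old source side.
drbdadm secondary r0    # run on host A -> back to secondary/primary
```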
Now, if you were doing things like, say, using a primary/primary drbd
setup for NFS servers serving in parallel from two hosts, then yes,
you'd have to take special steps with a proper parallel filesystem
to avoid corruption. But this is a completely different problem.
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                    -- Thomas A. Edison
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid over ethernet
am 31.01.2011 18:37:32 von Roberto Spadim
nice, you don't have two writers.
2011/1/31 Alexander Schreiber :
> On Mon, Jan 31, 2011 at 12:45:31PM -0200, Roberto Spadim wrote:
>> i think filesystem is a problem...
>> you can't have two writers over a filesystem that allows only one, or
>> you will have filesystem crashes (a lot of fsck repair... local cache
>> and other features), maybe gfs, ocfs or another is a better
>> solution...
>
> No, for _our_ use case (replicated disks for VMs running under Xen
> with live migration) the filesystem just _does_ _not_ _matter_ _at_
> _all_. Due to the way Xen live migration works, there is only one
> writer at any one time: the VM "owning" the virtual disk provided
> by drbd.
>
> [...]
--
Roberto Spadim
Spadim Technology / SPAEmpresarial