Thought about delayed sync

on 08.10.2011 20:03:10 by Wakko Warner

A few days ago, I thought about creating raid arrays w/o syncing. I
understand why sync is needed. Please correct me if I'm wrong in any of my
statements.

Currently, if someone uses large disks (1tb or larger), the initial sync can
take a long time, and until it has completed, the array isn't fully
protected. I noticed that the initial sync of a raid1 on a pair of 1tb disks
took hours to complete when there was no other activity.

Here is my thought. There is already a bitmap to indicate which blocks are
dirty. With it, when a disk drops out (accidentally or intentionally) and is
added back, a resync only syncs those blocks that the bitmap knows were
dirtied.

What if another bitmap could be utilized? This would be an "in use" bitmap.
Its purpose would be that there would never be an initial sync. When data is
written to an area that has not been synced, a sync of that region will
happen. Once the sync is complete, that region will be marked as synced in
the bitmap. Only the parts that have been written to will be synced. The
other data is of no consequence. As with the current bitmap, this would have
to be asked for.
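
To make the idea concrete, here is a rough Python sketch of the bookkeeping
described above. It is only a toy model and not how md works: the class name
InUseBitmap, its methods and the region size are all made up for
illustration, and the actual copying between members is left as a counter.

```python
class InUseBitmap:
    """Toy model of an "in use" bitmap: one flag per region of the array.

    Hypothetical names throughout; this is not md's data structure.
    """

    def __init__(self, array_size, region_size):
        self.region_size = region_size
        # Ceiling division so a partial last region still gets a flag.
        self.num_regions = -(-array_size // region_size)
        self.synced = [False] * self.num_regions
        self.regions_synced = 0  # how many region syncs were triggered

    def regions_for(self, offset, length):
        """Return the indices of all regions touched by a write."""
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        return range(first, last + 1)

    def sync_region(self, region):
        # Placeholder: a real implementation would make the region
        # consistent across all members before the write is allowed.
        self.regions_synced += 1

    def write(self, offset, length):
        """Sync any not-yet-synced region first, then mark it as synced."""
        for region in self.regions_for(offset, length):
            if not self.synced[region]:
                self.sync_region(region)
                self.synced[region] = True
        # The actual write to all members would happen here.


bitmap = InUseBitmap(array_size=1000, region_size=100)
bitmap.write(50, 100)   # touches regions 0 and 1, both get synced
bitmap.write(0, 10)     # region 0 is already synced, nothing to do
print(bitmap.regions_synced, bitmap.synced[:3])
```

Regions that are never written to keep their flag cleared, so they are never
synced at all, which is the whole point of the proposal.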

Let's say someone has been using this array for some time and a disk dropped
out and had to be replaced. Let's also say that the actual usage was about
25-30% of the array (of course, that would be wasted space). With the "in
use" bitmap, they would replace the disk and only the areas that had been
written to would be resynced over to the new disk. The rest, since it had
not been used, would not need to be.
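
As a rough illustration of the rebuild case, the Python sketch below turns
such a bitmap into the list of byte ranges that would have to be copied to
the replacement disk. The function name rebuild_extents and the
list-of-booleans representation are assumptions made for this example only.

```python
def rebuild_extents(in_use, region_size):
    """Merge consecutive in-use regions into (offset, length) extents.

    in_use is a list of booleans, one per region (hypothetical layout).
    """
    extents = []
    start = None  # index of the first region of the current run, if any
    for index, used in enumerate(in_use):
        if used and start is None:
            start = index
        elif not used and start is not None:
            extents.append((start * region_size,
                            (index - start) * region_size))
            start = None
    if start is not None:
        # The last run extends to the end of the array.
        extents.append((start * region_size,
                        (len(in_use) - start) * region_size))
    return extents


# Four regions of 100 bytes each; the third was never written to.
extents = rebuild_extents([True, True, False, True], region_size=100)
print(extents)
copied = sum(length for _, length in extents)
print(copied, "of", 4 * 100, "bytes would be copied to the new disk")
```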

A side effect of this would be that a check or a resync could use this to
check only the real data (e.g. on a weekly basis) and take less time.

Overall, depending on the usage, this can keep the wear and tear on a disk
down. I'm speaking from personal experience with my systems. I have arrays
that are not 100% or even 80% used. I have some production servers that
have extra space for expansion and are not fully used.

I'm sure this would take some time to implement if someone does this. As I
mentioned at the beginning, this was just a thought, but I think it could
benefit people if it were implemented.

I am on the list, but feel free to keep me in the CC.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Thought about delayed sync

on 09.10.2011 00:00:44 by Thomas Fjellstrom

On October 8, 2011, Wakko Warner wrote:
> A few days ago, I thought about creating raid arrays w/o syncing. I
> understand why sync is needed. Please correct me if I'm wrong in any of my
> statements.
>
> Currently, if someone uses large disks (1tb or larger), the initial sync
> can take a long time and until it has completed, the array isn't fully
> protected. I noted on a raid1 of a pair of 1tb disks took hours to
> complete when there was no activity.
>
> Here is my thought. There is already a bitmap to indicate which blocks are
> dirty. Thus by using that, a drop of a disk (accidental or intentional), a
> resync only syncs those blocks that the bitmap knows were dirtied.
>
> What if another bitmap could be utilized. This would be an "in use"
> bitmap. The purpose of this could be that there would never be an initial
> sync. When data is written to an area that has not been synced, a sync
> will happen of that region. Once the sync is complete, that region will
> be marked as synced in the bitmap. Only the parts that have been written
> to will be synced. The other data is of no consequence. As with the
> current bitmap, this would have to be asked for.
>
> Lets say someone has been using this array for some time and a disk dropped
> out and had to be replaced. Lets also say that the actual usage was about
> 25-30% of the array (of course, that would be wasted space). With the "in
> use" bitmap, they would replace the disk and only the areas that had been
> written to would be resynced over to the new disk. The rest, since it had
> not been used, would not need to be.
>
> A side effect of this would be that a check or a resync could use this to
> check the real data (IE on a weekly basis) and take less time.
>
> Over all, depending on the usage, this can keep the wear and tear on a disk
> down. I'm speaking of personal experience with my systems. I have arrays
> that are not 100% or even 80% used. I have some production servers that
> have extra space for expansion and not fully used.
>
> I'm sure this would take some time to implement if someone does this. As I
> mentioned at the beginning, this was just a thought, but I think it could
> benefit people if it were implemented.
>
> I am on the list, but feel free to keep me in the CC.

I think there's at least one, probably fatal, problem with that idea. There is
currently no reliable way for md to tell which areas are actually in use. That
is, once a section is written to for the first time, it will stay marked as in
use, even if it no longer is. "Now what about TRIM?" you ask? Not all file
systems support it, and I /think/ (based on a quick search of the list) mdraid
doesn't fully support TRIM either. LVM may not either. (A quick search also
suggested lvm2 doesn't pass on TRIM properly, or at all.)

I've been using the current bitmap support on my raid5 array for some time,
and it has made the few resyncs that were needed very fast compared to a
full resync. Instead of 15+ hours, they finished in 20 minutes or less. I call
that a win.

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Thought about delayed sync

on 09.10.2011 00:36:41 by NeilBrown

On Sat, 8 Oct 2011 14:03:10 -0400 Wakko Warner wrote:

> A few days ago, I thought about creating raid arrays w/o syncing. I
> understand why sync is needed. Please correct me if I'm wrong in any of my
> statements.
>
> Currently, if someone uses large disks (1tb or larger), the initial sync can
> take a long time and until it has completed, the array isn't fully
> protected. I noted on a raid1 of a pair of 1tb disks took hours to complete
> when there was no activity.
>
> Here is my thought. There is already a bitmap to indicate which blocks are
> dirty. Thus by using that, a drop of a disk (accidental or intentional), a
> resync only syncs those blocks that the bitmap knows were dirtied.
>
> What if another bitmap could be utilized. This would be an "in use" bitmap.
> The purpose of this could be that there would never be an initial sync.
> When data is written to an area that has not been synced, a sync will happen
> of that region. Once the sync is complete, that region will be marked as
> synced in the bitmap. Only the parts that have been written to will be
> synced. The other data is of no consequence. As with the current bitmap,
> this would have to be asked for.
>
> Lets say someone has been using this array for some time and a disk dropped
> out and had to be replaced. Lets also say that the actual usage was about
> 25-30% of the array (of course, that would be wasted space). With the "in
> use" bitmap, they would replace the disk and only the areas that had been
> written to would be resynced over to the new disk. The rest, since it had
> not been used, would not need to be.
>
> A side effect of this would be that a check or a resync could use this to
> check the real data (IE on a weekly basis) and take less time.
>
> Over all, depending on the usage, this can keep the wear and tear on a disk
> down. I'm speaking of personal experience with my systems. I have arrays
> that are not 100% or even 80% used. I have some production servers that
> have extra space for expansion and not fully used.
>
> I'm sure this would take some time to implement if someone does this. As I
> mentioned at the beginning, this was just a thought, but I think it could
> benefit people if it were implemented.
>
> I am on the list, but feel free to keep me in the CC.
>

I think you are suggesting this:

http://neil.brown.name/blog/20110216044002#5

??
Patches welcome :-)

NeilBrown


Re: Thought about delayed sync

on 09.10.2011 13:32:16 by alexander.kuehn

Quoting NeilBrown:

> http://neil.brown.name/blog/20110216044002#5

Using one bit per stripe looks like the natural choice to me, no?

Re: Thought about delayed sync

on 09.10.2011 13:56:39 by Wakko Warner

NeilBrown wrote:
> I think you are suggesting this:
>
> http://neil.brown.name/blog/20110216044002#5

You hit the nail on the head. Yes.

> Patches welcome :-)

I'd love to, but unfortunately, this is way beyond my capabilities.


Re: Thought about delayed sync

on 09.10.2011 14:04:07 by Wakko Warner

Thomas Fjellstrom wrote:
> On October 8, 2011, Wakko Warner wrote:
> > A few days ago, I thought about creating raid arrays w/o syncing. I
> > understand why sync is needed. Please correct me if I'm wrong in any of my
> > statements.
> >
> > Currently, if someone uses large disks (1tb or larger), the initial sync
> > can take a long time and until it has completed, the array isn't fully
> > protected. I noted on a raid1 of a pair of 1tb disks took hours to
> > complete when there was no activity.
> >
> > Here is my thought. There is already a bitmap to indicate which blocks are
> > dirty. Thus by using that, a drop of a disk (accidental or intentional), a
> > resync only syncs those blocks that the bitmap knows were dirtied.
> >
> > What if another bitmap could be utilized. This would be an "in use"
> > bitmap. The purpose of this could be that there would never be an initial
> > sync. When data is written to an area that has not been synced, a sync
> > will happen of that region. Once the sync is complete, that region will
> > be marked as synced in the bitmap. Only the parts that have been written
> > to will be synced. The other data is of no consequence. As with the
> > current bitmap, this would have to be asked for.
> >
> > Lets say someone has been using this array for some time and a disk dropped
> > out and had to be replaced. Lets also say that the actual usage was about
> > 25-30% of the array (of course, that would be wasted space). With the "in
> > use" bitmap, they would replace the disk and only the areas that had been
> > written to would be resynced over to the new disk. The rest, since it had
> > not been used, would not need to be.
> >
> > A side effect of this would be that a check or a resync could use this to
> > check the real data (IE on a weekly basis) and take less time.
> >
> > Over all, depending on the usage, this can keep the wear and tear on a disk
> > down. I'm speaking of personal experience with my systems. I have arrays
> > that are not 100% or even 80% used. I have some production servers that
> > have extra space for expansion and not fully used.
> >
> > I'm sure this would take some time to implement if someone does this. As I
> > mentioned at the beginning, this was just a thought, but I think it could
> > benefit people if it were implemented.
> >
> > I am on the list, but feel free to keep me in the CC.
>
> I think theres at least one, probably fatal problem with that idea. There is
> currently no reliable way for md to tell which areas are actually in use. That
> is, once a section is written to the first time, it will stay in use, even if
> it isn't. "Now what about TRIM?" you ask? Not all file systems support it, and
> I /think/ (based on a quick search of the list) mdraid doesn't fully support
> TRIM either. LVM may not either. (a quick search also suggested lvm2 doesn't
> pass on trim properly/at-all).

Actually, I was completely aware of this before I wrote my thought to the
list. I don't know exactly how md could be told. I thought about a program
that could read lvm data and tell MD what blocks are not in use. It could
go further and attempt to read the filesystem. TRIM is a nice idea, but as
you already mentioned, not all filesystems support it and not all layers
support passing it.

> I've been using the current bitmap support on my raid5 array for some time,
> and it has made the few resync's that were needed, very fast compared to a
> full resync. Instead of 15+ hours, they finished in 20 minutes or less. I call
> that a win.

Try this instead. Create a raid5 (or 6) on four 2tb drives. Add about 100gb
of data to it and replace one of the disks with a fresh disk. You'll notice
you have to resync the entire array. The current bitmap only tells which
blocks have changed, so a resync of an existing member is quick. But a new
member has no known in-sync blocks and has to resync the whole thing. I
know, I already had this happen to me last month.

On another note, I used this feature to clean the dust out of my disk array
in another system. Fail a drive, read the array to verify which drive I
physically failed, remove it, clean the dust off, add it back, wait for
resync to complete and then do another disk. Resync on that was quick for
the 750gb member. Without a bitmap, resync time on that system is 3 hours.

Thanks for your input though.


Re: Thought about delayed sync

on 09.10.2011 14:34:57 by Thomas Fjellstrom

On October 9, 2011, Wakko Warner wrote:
> Thomas Fjellstrom wrote:
> > On October 8, 2011, Wakko Warner wrote:
> > > A few days ago, I thought about creating raid arrays w/o syncing. I
> > > understand why sync is needed. Please correct me if I'm wrong in any
> > > of my statements.
> > >
> > > Currently, if someone uses large disks (1tb or larger), the initial
> > > sync can take a long time and until it has completed, the array isn't
> > > fully protected. I noted on a raid1 of a pair of 1tb disks took hours
> > > to complete when there was no activity.
> > >
> > > Here is my thought. There is already a bitmap to indicate which blocks
> > > are dirty. Thus by using that, a drop of a disk (accidental or
> > > intentional), a resync only syncs those blocks that the bitmap knows
> > > were dirtied.
> > >
> > > What if another bitmap could be utilized. This would be an "in use"
> > > bitmap. The purpose of this could be that there would never be an
> > > initial sync. When data is written to an area that has not been
> > > synced, a sync will happen of that region. Once the sync is complete,
> > > that region will be marked as synced in the bitmap. Only the parts
> > > that have been written to will be synced. The other data is of no
> > > consequence. As with the current bitmap, this would have to be asked
> > > for.
> > >
> > > Lets say someone has been using this array for some time and a disk
> > > dropped out and had to be replaced. Lets also say that the actual
> > > usage was about 25-30% of the array (of course, that would be wasted
> > > space). With the "in use" bitmap, they would replace the disk and
> > > only the areas that had been written to would be resynced over to the
> > > new disk. The rest, since it had not been used, would not need to be.
> > >
> > > A side effect of this would be that a check or a resync could use this
> > > to check the real data (IE on a weekly basis) and take less time.
> > >
> > > Over all, depending on the usage, this can keep the wear and tear on a
> > > disk down. I'm speaking of personal experience with my systems. I
> > > have arrays that are not 100% or even 80% used. I have some
> > > production servers that have extra space for expansion and not fully
> > > used.
> > >
> > > I'm sure this would take some time to implement if someone does this.
> > > As I mentioned at the beginning, this was just a thought, but I think
> > > it could benefit people if it were implemented.
> > >
> > > I am on the list, but feel free to keep me in the CC.
> >
> > I think theres at least one, probably fatal problem with that idea. There
> > is currently no reliable way for md to tell which areas are actually in
> > use. That is, once a section is written to the first time, it will stay
> > in use, even if it isn't. "Now what about TRIM?" you ask? Not all file
> > systems support it, and I /think/ (based on a quick search of the list)
> > mdraid doesn't fully support TRIM either. LVM may not either. (a quick
> > search also suggested lvm2 doesn't pass on trim properly/at-all).
>
> Actually, I was completely aware of this before I wrote my thought to the
> list. I don't know exactly how it could be told. I thought about a
> program that could read lvm data and tell MD what blocks are not in use.
> It could go further and attempt to read the filesystem. TRIM is a nice
> idea, but as you alread mentioned, not all filesystems support it and not
> all layers support passing it.
>
> > I've been using the current bitmap support on my raid5 array for some
> > time, and it has made the few resync's that were needed, very fast
> > compared to a full resync. Instead of 15+ hours, they finished in 20
> > minutes or less. I call that a win.
>
> Try this instead. Create a raid5 (or 6) on 4 2tb drives. Add about 100gb
> of data to it and replace one of the disks with a fresh disk. You'll
> notice you have to resync the entire array. The current bitmap only tells
> which blocks have changed and a resync of an existing member is quick.
> But a new member has no known in sync blocks and has to resync the whole
> thing. I know, I already had this happen to me last month.

Yeah, after reading the link to Neil's blog, it hit me how useful it could be.

> On another note, I used this feature to clean the dust out of my disk array
> in another system. Fail a drive, read the array to verify which drive I
> physically failed, remove it, clean the dust off, add it back, wait for
> resync to complete and then do another disk. Resync on that was quick for
> the 750gb member. Without a bitmap, resync time on that system is 3 hours.

Try it on a raid5 of seven 1TB drives. Fun times. I imagine it's much worse
with 2, 3 or 4 TB drives (though I imagine not many people have a bunch of
internal 4TB drives).

> Thanks for your input though.



Re: Thought about delayed sync

on 09.10.2011 15:44:54 by Wakko Warner

Thomas Fjellstrom wrote:
> On October 9, 2011, Wakko Warner wrote:
> > On another note, I used this feature to clean the dust out of my disk array
> > in another system. Fail a drive, read the array to verify which drive I
> > physically failed, remove it, clean the dust off, add it back, wait for
> > resync to complete and then do another disk. Resync on that was quick for
> > the 750gb member. Without a bitmap, resync time on that system is 3 hours.
>
> Try it on a 7 1TB drive raid5. fun times. I imagine its much worse with 2, 3
> or 4 TB drives. (though not many people have a bunch of internal 4TB drives I
> imagine).

Actually, I have, well sort of. I have an 8 drive raid6 (2tb disks). It
takes about the same time as a 4 drive raid5 with the same disks.


Re: Thought about delayed sync

on 10.10.2011 00:12:36 by NeilBrown

On Sun, 09 Oct 2011 13:32:16 +0200 Alexander Kühn
wrote:

>
> Quoting NeilBrown:
>
> > http://neil.brown.name/blog/20110216044002#5
>
> Using one bit per stripe looks like the natural choice to me, no?

It might seem natural, but that doesn't mean it is good.

My rule-of-thumb is that each bit should correspond to about one second of
resync time. So that is probably 50-100MB these days.
So a 4TB drive would have 40,000-80,000 bits, and the bitmap would be 5KB to
10KB in size.
I would probably rather have a 4KB bitmap, so each bit could correspond to 2-3
seconds, which is probably fine.
That might line up with the stripe size, or it might not. And whether it
does or not is largely irrelevant to the code.
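
The arithmetic behind that rule of thumb can be checked with a few lines of
Python. This is only a back-of-the-envelope sketch: the helper names are made
up, and decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a 4096-byte
"4KB" bitmap are assumed.

```python
TB = 10**12  # decimal units are an assumption of this sketch
MB = 10**6


def bitmap_size(device_bytes, bytes_per_bit):
    """Return (bits, bytes) needed when each bit covers bytes_per_bit."""
    bits = -(-device_bytes // bytes_per_bit)  # ceiling division
    return bits, -(-bits // 8)


def bytes_per_bit_for(device_bytes, bitmap_bytes):
    """Return how much of the device each bit covers for a fixed bitmap."""
    return -(-device_bytes // (bitmap_bytes * 8))


# One bit per second of resync at 100MB/s and at 50MB/s on a 4TB drive.
print(bitmap_size(4 * TB, 100 * MB))
print(bitmap_size(4 * TB, 50 * MB))

# Fixing the bitmap at 4096 bytes instead: coverage per bit, and the
# resync time per bit at the two assumed speeds.
chunk = bytes_per_bit_for(4 * TB, 4096)
print(chunk, chunk / (100 * MB), chunk / (50 * MB))
```

With these assumptions a 4096-byte bitmap covers roughly 122MB per bit, which
is somewhere between one and two and a half seconds of resync per bit at the
speeds quoted above.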

NeilBrown
