freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 17:28:53 by Pim Zandbergen
I replaced every 2TB drive of my 7-drive RAID-5 array with 3TB drives.
After the last replacement I could grow the array from 12 TB to 18 TB using
mdadm --grow /dev/md0 --size max
That worked:
md0: detected capacity change from 12002386771968 to 18003551059968
It worked for quite a while, until the machine had to be rebooted. Then it
shrank:
md0: detected capacity change from 0 to 4809411526656
The LVM volume group on this array would not be activated until I repeated
the mdadm command. It grew back to the original size.
md0: detected capacity change from 4809411526656 to 18003551059968
However, this caused major data loss, as everything beyond the perceived
4.8 TB size was wiped by the sync process.
This happened on Fedora 15, using kernel-2.6.38.6-27.fc15.x86_64 and
mdadm-3.2.2-6.fc15.x86_64.
The drives are Hitachi Deskstar 7K3000 HDS723030ALA640. The adapter is an
LSI Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
(LSI SAS 9211-8i). I had to buy this adapter as my old SAS1068 based card
would not support 3TB drives.
I can probably fix this by creating a fresh new array and then restoring
my backups, but now is the time to find the cause of this.
I can reproduce this on demand: I can grow the array again, and it will
shrink immediately after the next reboot.
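For what it's worth, here is a rough sketch of how one might try to reproduce
it without real hardware, using sparse loop devices (the paths, the md9 name
and the 3-member layout are made up, and it assumes the mdadm/kernel
combination still lets you grow a 0.90 array past the limit, which is exactly
what happened here):

for i in 0 1 2; do
    truncate -s 3T /var/tmp/member$i.img              # sparse 3TB backing file
    losetup /dev/loop$i /var/tmp/member$i.img
done
mdadm --create /dev/md9 --metadata=0.90 --level=5 --raid-devices=3 \
      --size=1953514496 --assume-clean /dev/loop[0-2]  # start below 2TiB per member
mdadm --grow /dev/md9 --size=max
mdadm --detail /dev/md9 | grep 'Dev Size'              # note the grown size
mdadm --stop /dev/md9
mdadm --assemble /dev/md9 /dev/loop[0-2]               # stands in for the reboot
mdadm --detail /dev/md9 | grep 'Dev Size'              # shrinks again if the bug is present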
What should I do to find the cause?
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 18:12:35 by Pim Zandbergen
On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
>
>
> What should I do to find the cause?
Additional information:
Both the original 2TB drives as well as the new 3TB drives were GPT
formatted with partition type FD00
This is information about the currently shrunk array:
# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Wed Feb 8 23:22:15 2006
Raid Level : raid5
Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Aug 30 21:50:50 2011
State : clean
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
Events : 0.3157574
Number Major Minor RaidDevice State
0 8 161 0 active sync /dev/sdk1
1 8 177 1 active sync /dev/sdl1
2 8 193 2 active sync /dev/sdm1
3 8 145 3 active sync /dev/sdj1
4 8 209 4 active sync /dev/sdn1
5 8 225 5 active sync /dev/sdo1
6 8 129 6 active sync /dev/sdi1
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 18:16:41 by Pim Zandbergen
More info:
# gdisk -l /dev/sdk
GPT fdisk (gdisk) version 0.7.2
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdk: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): BEEBC2FD-A959-4292-8115-AEFA06E0978E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 5860533134 2.7 TiB FD00 Linux RAID
# mdadm --examine /dev/sdk1
/dev/sdk1:
Magic : a92b4efc
Version : 0.90.03
UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
Creation Time : Wed Feb 8 23:22:15 2006
Raid Level : raid5
Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Thu Sep 1 18:11:08 2011
State : clean
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Checksum : 7698c20e - correct
Events : 3157574
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 161 0 active sync /dev/sdk1
0 0 8 161 0 active sync /dev/sdk1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 193 2 active sync /dev/sdm1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 209 4 active sync /dev/sdn1
5 5 8 225 5 active sync /dev/sdo1
6 6 8 129 6 active sync /dev/sdi1
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 18:31:12 by Doug Ledford
On 09/01/2011 12:12 PM, Pim Zandbergen wrote:
> On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
>>
>>
>> What should I do to find the cause?
>
> Additional information:
>
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
>
> This is information about the currently shrunk array:
>
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 0.90
Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
will not create this version of raid array by default. There is a
reason we have updated to a new superblock. Does this problem still
occur if you use a newer superblock format (one of the version 1.x
versions)?
> Creation Time : Wed Feb 8 23:22:15 2006
> Raid Level : raid5
> Array Size : 4696690944 (4479.11 GiB 4809.41 GB)
> Used Dev Size : 782781824 (746.52 GiB 801.57 GB)
This looks like some sort of sector count wrap, which might be related
to version 0.90 superblock usage. 3TB - 2.2TB (roughly the wrap point) =
800GB, which is precisely how much of each device you are using to
create a 4.8TB array.
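As a sanity check (shell arithmetic only, and assuming the wrap sits at 2^32
512-byte sectors, which matches the ~2.2TB figure):

echo $(( 18003551059968 / 6 ))                # 3000591843328 -> ~3.0 TB usable per member after the grow
echo $(( 2 ** 32 * 512 ))                     # 2199023255552 -> ~2.2 TB, the assumed wrap point
echo $(( 18003551059968 / 6 - 2**32 * 512 ))  # 801568587776  -> 801.57 GB, the "Used Dev Size" above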
> Raid Devices : 7
> Total Devices : 7
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Tue Aug 30 21:50:50 2011
> State : clean
> Active Devices : 7
> Working Devices : 7
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 1bf1b0e2:82d487c5:f6f36a45:766001d1
> Events : 0.3157574
>
> Number Major Minor RaidDevice State
> 0 8 161 0 active sync /dev/sdk1
> 1 8 177 1 active sync /dev/sdl1
> 2 8 193 2 active sync /dev/sdm1
> 3 8 145 3 active sync /dev/sdj1
> 4 8 209 4 active sync /dev/sdn1
> 5 8 225 5 active sync /dev/sdo1
> 6 8 129 6 active sync /dev/sdi1
>
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 18:48:34 by John Robinson
On 01/09/2011 17:16, Pim Zandbergen wrote:
> # gdisk -l /dev/sdk
[...]
> Number Start (sector) End (sector) Size Code Name
> 1 2048 5860533134 2.7 TiB FD00 Linux RAID
Partition type FD is only for metadata 0.90 arrays to be auto-assembled
by the kernel. This is now deprecated; you should be using partition
type DA (Non-FS data) and an initrd to assemble your arrays.
> # mdadm --examine /dev/sdk1
> /dev/sdk1:
> Magic : a92b4efc
> Version : 0.90.03
Metadata version 0.90 does not support devices over 2TiB. I think it's a
bug that you weren't warned at some point.
Cheers,
John.
--
John Robinson, yuiop IT services
0131 557 9577 / 07771 784 058
46/12 Broughton Road, Edinburgh EH7 4EE
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 19:03:36 by Robin Hill
On Thu Sep 01, 2011 at 06:12:35PM +0200, Pim Zandbergen wrote:
> On 09/01/2011 05:28 PM, Pim Zandbergen wrote:
> >
> >
> > What should I do to find the cause?
>
> Additional information:
>
> Both the original 2TB drives as well as the new 3TB drives were GPT
> formatted with partition type FD00
>
> This is information about the currently shrunk array:
>
>
> # mdadm --detail /dev/md0
> /dev/md0:
> Version : 0.90
> Creation Time : Wed Feb 8 23:22:15 2006
> Raid Level : raid5
>
Looks like there's a bug somewhere. The documentation says that 0.90
metadata doesn't support >2TB components for RAID levels 1 and above.
If this is still correct, mdadm should have prevented you growing the
array in the first place.
I'd suggest recreating the array with 1.x metadata instead and checking
whether that runs into the same issue.
Cheers,
Robin
--
 ___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 19:21:12 by Pim Zandbergen
On 1-9-2011 6:48, John Robinson wrote:
> you should be using partition type DA (Non-FS data)
using gdisk (GPT) or fdisk (MBR) ?
> and an initrd to assemble your arrays.
Booting from the array is not required. I guess the Fedora init scripts
will assemble the array from /etc/mdadm.conf
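For reference, a hedged sketch of what pinning that down can look like (the
ARRAY line is only what I would expect mdadm to emit for this array's UUID,
not captured output):

mdadm --detail --scan >> /etc/mdadm.conf
# expected to append something along the lines of:
# ARRAY /dev/md0 metadata=0.90 UUID=1bf1b0e2:82d487c5:f6f36a45:766001d1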
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 19:44:53 by Pim Zandbergen
On 09/01/2011 06:31 PM, Doug Ledford wrote:
> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
> will not create this version of raid array by default. There is a
> reason we have updated to a new superblock.
As you may have seen, the array was created in 2006, and has gone through
several similar grow procedures.
> Does this problem still occur if you use a newer superblock format
> (one of the version 1.x versions)?
I suppose not. But that would destroy the "evidence" of a possible bug.
For me, it's too late, but finding it could help others to prevent this
situation.
If there's anything I could do to help find it, now is the time.
If the people on this list know enough, I will proceed.
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 20:17:22 by Doug Ledford
On 09/01/2011 01:44 PM, Pim Zandbergen wrote:
> On 09/01/2011 06:31 PM, Doug Ledford wrote:
>> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
>> will not create this version of raid array by default. There is a
>> reason we have updated to a new superblock.
>
> As you may have seen, the array was created in 2006, and has gone through
> several similar grow procedures.
Even so, one of the original limitations of the 0.90 superblock was
maximum usable device size. I'm not entirely sure that growing a 0.90
superblock past 2TB wasn't the source of your problem and that the bug
that needs fixed is that mdadm should have refused to grow a 0.90
superblock based array beyond the 2TB limit. Neil would have to speak
to that.
>> Does this problem still occur if you use a newer superblock format
>> (one of the version 1.x versions)?
>
> I suppose not. But that would destroy the "evidence" of a possible bug.
> For me, it's too late, but finding it could help others to prevent this
> situation.
> If there's anything I could do to help find it, now is the time.
>
> If the people on this list know enough, I will proceed.
>
> Thanks,
> Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 20:52:21 by Pim Zandbergen
On 09/01/2011 08:17 PM, Doug Ledford wrote:
> the bug that needs fixed is that mdadm should have refused to grow a
> 0.90 superblock based array beyond the 2TB limit
Yes, that's exactly what I am aiming for.
I could file a bug on bugzilla.redhat.com if that would help.
I'm not sure whether I need to keep my hosed array around
in order to be able to reproduce things.
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 01.09.2011 21:41:42 by Doug Ledford
On 09/01/2011 02:52 PM, Pim Zandbergen wrote:
> On 09/01/2011 08:17 PM, Doug Ledford wrote:
>> the bug that needs fixed is that mdadm should have refused to grow a
>> 0.90 superblock based array beyond the 2TB limit
> Yes, that's exactly what I am aiming for.
>
> I could file a bug on bugzilla.redhat.com if that would help.
Feel free, it helps me track things.
> I'm not sure whether I need to keep my hosed array around
> in order to be able to reproduce things.
I don't think that's necessary at this point. It seems pretty obvious
what's going on and should be easy to reproduce.
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 07:32:30 by Simon Matthews
On Thu, Sep 1, 2011 at 10:44 AM, Pim Zandbergen wrote:
> On 09/01/2011 06:31 PM, Doug Ledford wrote:
>>
>> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15 will
>> not create this version of raid array by default. There is a reason we have
>> updated to a new superblock.
>
> As you may have seen, the array was created in 2006, and has gone through
> several similar grow procedures.
>
>> Does this problem still occur if you use a newer superblock format (one of
>> the version 1.x versions)?
>
> I suppose not. But that would destroy the "evidence" of a possible bug.
> For me, it's too late, but finding it could help others to prevent this
> situation.
> If there's anything I could do to help find it, now is the time.
>
> If the people on this list know enough, I will proceed.
>
> Thanks,
> Pim
I ran into this exact problem some weeks ago. I don't recall any error
or warning messages about growing the array to use 3TB partitions, and
Neil acknowledged that this was a bug. He also gave instructions on how
to recover from this situation and re-start the array using 1.0
metadata.
Here is Neil's comment from that thread:
----------------------------------------------------------------------
Oopps. That array is using 0.90 metadata which can only handle up to 2TB
devices. The 'resize' code should catch that you are asking the impossible,
but it doesn't it seems.
You need to simply recreate the array as 1.0.
i.e.
mdadm -S /dev/md5
mdadm -C /dev/md5 --metadata 1.0 -l1 -n2 --assume-clean
Then all should be happiness.
----------------------------------------------------------------------
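Mapped onto the 7-drive RAID-5 in this thread, that would presumably look
something like the sketch below; the device order, 64K chunk and
left-symmetric layout are taken from the earlier --examine output, and every
one of them must match the original array exactly for --assume-clean to be
safe:

mdadm -S /dev/md0
mdadm -C /dev/md0 --metadata 1.0 --level=5 --raid-devices=7 \
      --chunk=64 --layout=left-symmetric --assume-clean \
      /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdj1 /dev/sdn1 /dev/sdo1 /dev/sdi1
# 1.0 keeps the superblock at the end of each member, so the data stays
# exactly where 0.90 put it.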
Simon
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 10:53:59 by Pim Zandbergen
On 09/02/2011 07:32 AM, Simon Matthews wrote:
> He also gave instructions on how
> to recover from this situation and re-start the array using 1.0
> metadata.
If only I had been patient and had not tried to grow the array back...
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 11:02:02 by Pim Zandbergen
On 09/01/2011 07:21 PM, Pim Zandbergen wrote:
> On 1-9-2011 6:48, John Robinson wrote:
>> you should be using partition type DA (Non-FS data)
> using gdisk (GPT) or fdisk (MBR) ?
I tried gdisk, but it does not know about DA00.
I tried fdisk and created an array from the resulting partitions.
That would only use the first 2TB of the 3TB disks.
Then I tried fdisk, but used the whole disks for the array.
That seems to work, although mdadm gave a lot of warnings
about the fact that the drives were partitioned.
The partition table does not seem to be wiped, however.
Is the latter way the way it is supposed to be done now?
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 11:19:40 by Pim Zandbergen
On 09/01/2011 09:41 PM, Doug Ledford wrote:
>> I could file a bug on bugzilla.redhat.com if that would help.
>
> Feel free, it helps me track things.
https://bugzilla.redhat.com/show_bug.cgi?id=735306
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 12:33:35 by Mikael Abrahamsson
On Fri, 2 Sep 2011, Pim Zandbergen wrote:
> Is the latter way the way it is supposed to be done now?
I've used whole drives the past years, it's worked great. You avoid all
the hassle of handling partitions and alignment.
So yes, go for the whole device approach. I would make sure the partition
table is wiped and that I was using v1.2 superblocks (default by now).
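Presumably something along these lines (a sketch only; the device names are
the ones seen earlier in this thread, and --zap-all is destructive, so
double-check the targets first):

for d in /dev/sd[i-o]; do
    sgdisk --zap-all "$d"      # wipe the GPT and its protective MBR
done
mdadm --create /dev/md0 --metadata=1.2 --level=5 --raid-devices=7 \
      /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo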
--
Mikael Abrahamsson email: swmike@swm.pp.se
Re: freshly grown array shrinks after first reboot - major data loss
on 02.09.2011 13:06:06 by John Robinson
On 02/09/2011 10:19, Pim Zandbergen wrote:
> On 09/01/2011 09:41 PM, Doug Ledford wrote:
>>> I could file a bug on bugzilla.redhat.com if that would help.
>>
>> Feel free, it helps me track things.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=735306
I'm not sure whether it's just the --grow that should complain, or
perhaps the earlier step of
mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB
should also complain (even if it'll work with less than 2TiB in use, it
ought to tell the user they won't be able to grow the array).
Cheers,
John.
Re: freshly grown array shrinks after first reboot - major data loss
on 05.09.2011 12:47:16 by Pim Zandbergen
On 09/02/2011 12:33 PM, Mikael Abrahamsson wrote:
> So yes, go for the whole device approach. I would make sure the
> partition table is wiped and that I was using v1.2 superblocks
> (default by now).
Could I have both? That is, add the whole device to the array, yet have a
protective partition table?
I like the idea of having a protective partition table, similar to the EE type
that protects GPT partitions from non-GPT aware partitioning software or OS's.
It looks like the 1.2 superblock allows just that, as it starts 4k past
the start.
So, would it be wise to add the whole device to an array, using 1.2 metadata,
with a fake partition table (type DA)?
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 08.09.2011 03:10:49 by NeilBrown
On Thu, 01 Sep 2011 14:17:22 -0400 Doug Ledford wrote:
> On 09/01/2011 01:44 PM, Pim Zandbergen wrote:
> > On 09/01/2011 06:31 PM, Doug Ledford wrote:
> >> Why is your raid metadata using this old version? mdadm-3.2.2-6.fc15
> >> will not create this version of raid array by default. There is a
> >> reason we have updated to a new superblock.
> >
> > As you may have seen, the array was created in 2006, and has gone through
> > several similar grow procedures.
>
> Even so, one of the original limitations of the 0.90 superblock was
> maximum usable device size. I'm not entirely sure that growing a 0.90
> superblock past 2TB wasn't the source of your problem and that the bug
> that needs fixed is that mdadm should have refused to grow a 0.90
> superblock based array beyond the 2TB limit. Neil would have to speak
> to that.
I finally had time to look into this problem.
I'm ashamed to say there is a serious bug here that I should have found and
fixed some time ago, but didn't. However I don't understand why
you lost any data.
The 0.90 metadata uses an unsigned 32bit number to record the number of
kilobytes used per device. This should allow devices up to 4TB. I don't
know where the "2TB" came from. Maybe I thought something was signed? or
maybe I just didn't think.
However in 2.6.29 a bug was introduced in the handling of the count.
It is best to keep everything in the same units and the preferred units for
devices seems to be 512byte sectors so we changed md to record the available
size on a device in sectors. So for 0.90 metadata this is:
rdev->sectors = sb->size * 2;
Do you see the bug? It will multiply size (a u32) by 2 before casting it to
a sector_t, so we lose the high bit. This should have been
rdev->sectors = ((sector_t)sb->size)*2;
and will be after I submit a patch.
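A quick shell check (my own arithmetic, mimicking the 32-bit wrap with the
numbers from this report) lines up exactly with the shrunken size:

size_kib=2930265472                    # per-device size after the grow: 18003551059968 / 6 / 1024
echo $(( (size_kib * 2) % 2**32 ))     # 1565563648 sectors = 782781824 KiB, the shrunken "Used Dev Size"
echo $(( size_kib * 2 ))               # 5860530944 sectors, what the (sector_t) cast preserves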
However this should not lead to any data corruption. When you reassemble the
array after reboot it will be 2TB per device smaller than it should be
(which is exactly what we see: 18003551059968 - 4809411526656 == 2*2^40*(7-1))
so some data will be missing. But when you increase the size again it will
check the parity of the "new" space but as that is all correct it will not
change anything.
So your data *should* have been back exactly as it was. I am at a loss to
explain why it is not.
I will add a test to mdadm to discourage you from adding 4+TB devices to 0.90
metadata, or 2+TB devices for 3.0 and earlier kernels.
I might also add a test to discourage growing an array beyond 2TB on kernels
before 3.1. That is more awkward as mdadm doesn't really know how big you
are growing it to. You ask for 'max' and it just says 'max' to the kernel.
The kernel needs to do the testing - and currently it doesn't.
Anyway the following patch will be on its way to Linus in a day or two.
Thanks for your report, and my apologies for your loss.
NeilBrown
From 24e9c8d1a620159df73f9b4a545cae668b6285ef Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Thu, 8 Sep 2011 10:54:34 +1000
Subject: [PATCH] md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
0.90 metadata uses an unsigned 32bit number to count the number of
kilobytes used from each device.
This should allow up to 4TB per device.
However we multiply this by 2 (to get sectors) before casting to a
larger type, so sizes above 2TB get truncated.
Also we allow rdev->sectors to be larger than 4TB, so it is possible
for the array to be resized larger than the metadata can handle.
So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
used.
Reported-by: Pim Zandbergen
Signed-off-by: NeilBrown
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3742ce8..63f71cc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1138,8 +1138,11 @@ static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version
ret = 0;
}
rdev->sectors = rdev->sb_start;
+ /* Limit to 4TB as metadata cannot record more than that */
+ if (rdev->sectors >= (2ULL << 32))
+ rdev->sectors = (2ULL << 32) - 2;
- if (rdev->sectors < sb->size * 2 && sb->level > 1)
+ if (rdev->sectors < ((sector_t)sb->size) * 2 && sb->level > 1)
/* "this cannot possibly happen" ... */
ret = -EINVAL;
@@ -1173,7 +1176,7 @@ static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev)
mddev->clevel[0] = 0;
mddev->layout = sb->layout;
mddev->raid_disks = sb->raid_disks;
- mddev->dev_sectors = sb->size * 2;
+ mddev->dev_sectors = ((sector_t)sb->size) * 2;
mddev->events = ev1;
mddev->bitmap_info.offset = 0;
mddev->bitmap_info.default_offset = MD_SB_BYTES >> 9;
@@ -1415,6 +1418,11 @@ super_90_rdev_size_change(mdk_rdev_t *rdev, sector_t num_sectors)
rdev->sb_start = calc_dev_sboffset(rdev);
if (!num_sectors || num_sectors > rdev->sb_start)
num_sectors = rdev->sb_start;
+ /* Limit to 4TB as metadata cannot record more than that.
+ * 4TB == 2^32 KB, or 2*2^32 sectors.
+ */
+ if (num_sectors >= (2ULL << 32))
+ num_sectors = (2ULL << 32) - 2;
md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size,
rdev->sb_page);
md_super_wait(rdev->mddev);
Re: freshly grown array shrinks after first reboot - major data loss
on 08.09.2011 15:44:26 by Pim Zandbergen
On 8-9-2011 3:10, NeilBrown wrote:
> So your data*should* have been back exactly as it was. I am at a loss to
> explain why it is not.
The array contained an LVM VG that would not activate until grown back.
After growing back,
- one ext4 LV was perfectly intact
- one other could be fsck'd back to life without any damage
- a third one could be fsck'd back, leaving some stuff in lost+found
- three others were beyond repair.
The VG was as old as the array itself; the LV's were pretty fragmented.
It looked like the ext4 superblocks were shifted. I could see the superblock
with hexdump, but mount would not. fsck first had to repair the superblock
before anything else.
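(For anyone hitting the same symptom, the usual route is something like the
sketch below; the LV path is made up, and the backup-superblock location has
to come from your own filesystem, assuming mke2fs defaults still match how it
was originally created.)

mke2fs -n /dev/vg0/somelv          # -n only prints the layout, including backup superblock locations
e2fsck -b 32768 /dev/vg0/somelv    # re-run the check against one of those backup superblocks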
So my report that data was "wiped" by the sync process was incorrect.
> You ask for 'max' and it just says 'max' to the kernel.
> The kernel needs to do the testing - and currently it doesn't.
I hope/assume this is no problem for my newly created array.
>
> Thanks for your report, and my apologies for your loss.
No need to apologize; the limitation was documented, and I could have
upgraded the metadata without data loss, had I waited longer for advice.
And I did have off-site backups for the important stuff.
I'm just reporting this so others may be spared this experience.
Thanks,
Pim
Re: freshly grown array shrinks after first reboot - major data loss
on 09.09.2011 21:30:52 by Bill Davidsen
John Robinson wrote:
> On 02/09/2011 10:19, Pim Zandbergen wrote:
>> On 09/01/2011 09:41 PM, Doug Ledford wrote:
>>>> I could file a bug on bugzilla.redhat.com if that would help.
>>>
>>> Feel free, it helps me track things.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=735306
>
> I'm not sure whether it's just the --grow that should complain, or
> perhaps the earlier step of
> mdadm /dev/md/array-using-0.90-metadata --add /dev/3TB
> should also complain (even if it'll work with less than 2TiB in use,
> it ought to tell the user they won't be able to grow the array).
>
Perhaps Neil can confirm, but the limitation seems to be using 2TB as an
array member size; I am reasonably sure that if you had partitioned the
drives into two 1.5TB partitions you could have created the array just fine.
Note that this is just a speculation, not a suggestion to allow using
0.90 metadata, and I do realize that this array was created in the dark
ages, not being created new.
--
Bill Davidsen
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010