physical size of the device inconsistent with superblock, after RAID problems

on 15.02.2011 04:14:34 by Gavin Flower

Hi,

I would appreciate advice recovering from the following situation, after an aborted mdadm resizing operation and subsequent recovery actions:

/dev/md1: The filing system size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616
Either the superblock or the partition table is likely to be corrupt!

/dev/md1: UNEXPECTED INCONSISTENCY: RUN fsck manually
(i.e. without -a or -p options)

fsck.ext4 -f -n /dev/md1 output:

e2fsck 1.41.12 (17-May-2010)
The filesystem size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -9626 -(9728--9752) +(405344--405369)
Fix? no

/dev/md1: ********** WARNING: Filesystem still has errors **********

/dev/md1: 1693644/19202048 files (0.3% non-contiguous), 54273929/76799952 blocks

Note that the original size, according to mdadm, was not a multiple of 512KB, so I reshaped it to the largest multiple of 512KB below the original size using the --size option of mdadm. So my second attempt to reshape, using the 512KB chunk size, started okay. The previous chunk size was 64KB.
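
The rounding described above can be sketched as plain arithmetic. The per-device size used here is not quoted from mdadm. It is an assumption derived from the filesystem size e2fsck reports (76799952 blocks), an assumed ext4 block size of 4 KiB, and the three data drives' worth of capacity that a 5-drive RAID-6 provides. The helper name largest_multiple is purely illustrative.

```python
def largest_multiple(size_kib: int, chunk_kib: int) -> int:
    """Round size_kib down to the nearest multiple of chunk_kib."""
    return (size_kib // chunk_kib) * chunk_kib


# Assumed inputs (see the lead-in): none of these come from mdadm output
# for the array as it was before the reshape.
FS_BLOCKS = 76799952       # filesystem size reported by e2fsck
BLOCK_KIB = 4              # assumed ext4 block size in KiB
DATA_DEVICES = 5 - 2       # RAID-6 on 5 drives holds 3 drives' worth of data

old_per_device_kib = FS_BLOCKS * BLOCK_KIB // DATA_DEVICES
new_per_device_kib = largest_multiple(old_per_device_kib, 512)

print(old_per_device_kib % 64)    # 0: a whole number of 64KB chunks
print(old_per_device_kib % 512)   # 448: not a whole number of 512KB chunks
print(new_per_device_kib)         # 102399488
```

Under those assumptions the rounded per-device size comes out at 102399488 KiB, which agrees with the "Used Dev Size" shown in the mdadm --detail output later in this thread.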

Note I am using Fedora 14, up-to-date as of Friday February 11th, and that there are 5 x 500GB drives, with 3 RAID-6 arrays:
/dev/md0 swap
/dev/md1 mostly user data (the problematic one)
/dev/md2 distribution & O/S files
plus /boot on a non-RAID ext4 partition

Sequence of events:
Reshaped /dev/md1 using mdadm, without first reducing size of the ext4 filesystem.

The process of reshaping /dev/md1 was about 20% through when I killed it.

System appeared okay.

I rebooted a few minutes later, but shortly after I selected the kernel, it stopped, and I dropped into a shell.

With the help of Neil Brown, I made some progress and /dev/md1 reshaping appeared to have completed without error.

However, on the next reboot I got the INCONSISTENCY message.

Will it be safe to simply accept fsck's offer to fix, or are there other things I should do?

Thanks,
Gavin

--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: physical size of the device inconsistent with superblock, after RAID problems

on 18.02.2011 00:53:11 by Gavin Flower

Hi Neil,

My attempted post to ext3-users@redhat.com, had not been published there (even though I had emailed it 4 days ago!), as at a minute ago.

I finally bit the bullet and went ahead.

I accepted the fixes put forward by fsck associated with bitmap differences, and rebooted.

Still problems.

Still had the discrepancy in the filesystem size. So I ran the command:

resize2fs -p /dev/md1 76799616

I used the smaller of the 2 block counts, as:
(a) I needed to reduce the file system size, because I had already reduced the RAID size (I _SHOULD_ have done this first, before resizing the RAID), and
(b) it is reported as the 'physical' size of the device, so it is likely to be the correct value IMHO
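
To put the two block counts in perspective, here is a minimal sketch of how much the filesystem had to shrink. The 4 KiB block size is an assumption (the e2fsck output does not state it), and shrink_summary is a hypothetical helper, not part of any tool.

```python
SUPERBLOCK_BLOCKS = 76799952   # size recorded in the ext4 superblock
DEVICE_BLOCKS = 76799616       # physical size of /dev/md1 per e2fsck
BLOCK_KIB = 4                  # assumed ext4 block size in KiB


def shrink_summary(fs_blocks: int, dev_blocks: int, block_kib: int):
    """Return (excess blocks, excess KiB, excess as a fraction of the fs)."""
    excess = fs_blocks - dev_blocks
    return excess, excess * block_kib, excess / fs_blocks


excess_blocks, excess_kib, fraction = shrink_summary(
    SUPERBLOCK_BLOCKS, DEVICE_BLOCKS, BLOCK_KIB
)

print(excess_blocks)           # 336
print(excess_kib)              # 1344
print(f"{fraction:.6%}")       # share of the filesystem beyond the device end
```

So the filesystem extended only 336 blocks past the end of the device, which is why passing the smaller count to resize2fs trims such a small tail.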

The system then came up successfully after a reboot, and I was able to log in as normal.

There was no apparent loss of data, though I did not do an exhaustive systematic check. However, several users have logged on successfully, the machine is playing its part as gateway to the Internet, and squid appears to be providing its normal functionality.

Neil, your help and encouragement was/is greatly appreciated!

Thanks,
Gavin


Re: physical size of the device inconsistent with superblock, after RAID problems

on 18.02.2011 02:51:25 by NeilBrown

Excellent! I'm glad you found a way through.
As you didn't really trim very much from your device it is certainly possible
that no critical data was there. Quite possibly resize2fs would have told
you if there was (I certainly hope it would have done).

NeilBrown

Re: physical size of the device inconsistent with superblock, after RAID problems

on 18.02.2011 04:50:48 by Gavin Flower

Hi Neil,

Having about 26% spare capacity on md1 (the problematic RAID 6; see the df output below) probably (?) meant that nothing was likely to be lost by trimming a tiny fraction of a percent from the end.

However, since the md1 device actually resides on 5 real physical drives, reality is almost certainly more complicated, which is possibly the cause of the bitmap discrepancies (now I'm firmly outside my area of expertise!).

# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/md2       1097254408  27547660 1013969456   3% /
tmpfs             4097108       772    4096336   1% /dev/shm
/dev/sda1         1032088    129800     849860  14% /boot
/dev/md1        302377920 212244524   74773476  74% /data
# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Thu Dec  3 13:05:02 2009
     Raid Level : raid6
     Array Size : 307198464 (292.97 GiB 314.57 GB)
  Used Dev Size : 102399488 (97.66 GiB 104.86 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb 18 15:09:50 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : 6f1176ae:a0ad6cac:bfe78010:bc810f04
         Events : 0.3389728

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       66        2      active sync   /dev/sde2
       3       8       50        3      active sync   /dev/sdd2
       4       8       34        4      active sync   /dev/sdc2
#
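
As a sanity check, the figures in the mdadm output above can be cross-checked against each other and against the size given to resize2fs. This is only a sketch: the 4 KiB filesystem block size is an assumption, and raid6_capacity_kib is a hypothetical helper that applies the usual RAID-6 rule that two devices' worth of space goes to parity.

```python
RAID_DEVICES = 5
USED_DEV_SIZE_KIB = 102399488   # "Used Dev Size" from mdadm --detail
ARRAY_SIZE_KIB = 307198464      # "Array Size" from mdadm --detail
CHUNK_KIB = 512                 # "Chunk Size" from mdadm --detail
BLOCK_KIB = 4                   # assumed ext4 block size in KiB


def raid6_capacity_kib(devices: int, per_device_kib: int) -> int:
    """Usable RAID-6 capacity: all devices minus two devices' worth of parity."""
    return (devices - 2) * per_device_kib


capacity_kib = raid6_capacity_kib(RAID_DEVICES, USED_DEV_SIZE_KIB)
fs_blocks_that_fit = ARRAY_SIZE_KIB // BLOCK_KIB

print(capacity_kib == ARRAY_SIZE_KIB)        # True
print(USED_DEV_SIZE_KIB % CHUNK_KIB == 0)    # True: whole number of chunks
print(fs_blocks_that_fit)                    # 76799616
```

With a 4 KiB block size the array holds exactly 76799616 filesystem blocks, the same figure e2fsck reported as the physical size and the one passed to resize2fs.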

Cheers,
Gavin
