mdadm forces resync every boot

On 05.08.2011 17:05:53 by Daniel Frey

Hi all,

I've been fighting with my raid array (imsm - raid10) for several weeks
now. I've now replaced all four drives in my array as the constant
rebuilding caused a smart error to trip on the old drives; unfortunately
mdadm is still resyncing the array at every boot.

One thing I would like to clarify: does mdadm need to disassemble the
array before reboot? At this point, I can't tell if my system is
currently doing this. Googling around, it seems some say that this step
is unnecessary.
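
(For what it's worth, an explicit teardown from a shutdown script would be
something along the lines of

   mdadm --stop --scan    # stop every assembled md array that is not in use

though that can't touch an array that still holds the mounted root
filesystem, which is part of why I'm unsure whether the step matters here.)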

I've managed to update mdadm to 3.2.1 both in the initramfs and on the
local system, but the problem still persists.

The last thing my system does is remount root ro, which it does
successfully. However, at the next start:

[ 12.657829] md: md127 stopped.
[ 12.660652] md: bind
[ 12.660939] md: bind
[ 12.661212] md: bind
[ 12.661282] md: bind
[ 12.664972] md: md126 stopped.
[ 12.665284] md: bind
[ 12.665383] md: bind
[ 12.665476] md: bind
[ 12.665568] md: bind
[ 12.669218] md/raid10:md126: not clean -- starting background
reconstruction
[ 12.669221] md/raid10:md126: active with 4 out of 4 devices
[ 12.669241] md126: detected capacity change from 0 to 1000210432000
[ 12.678356] md: md126 switched to read-write mode.
[ 12.678390] md: resync of RAID array md126
[ 12.678393] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 12.678395] md: using maximum available idle IO bandwidth (but not
more than 200000 KB/sec) for resync.
[ 12.678399] md: using 128k window, over a total of 976768256 blocks.

and cat /proc/mdstat shows it resyncing:

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10]
md126 : active raid10 sda[3] sdb[2] sdc[1] sdd[0]
976768000 blocks super external:/md127/0 64K chunks 2 near-copies
[4/4] [UUUU]
[==>..................] resync = 12.2% (119256896/976768256)
finish=100.7min speed=141865K/sec

md127 : inactive sdd[3](S) sda[2](S) sdb[1](S) sdc[0](S)
9028 blocks super external:imsm

unused devices:

Once it finishes resyncing, the array is fine until the next power down.

Some other details:

# mdadm --detail-platform
Platform : Intel(R) Matrix Storage Manager
Version : 9.6.0.1014
RAID Levels : raid0 raid1 raid10 raid5
Chunk Sizes : 4k 8k 16k 32k 64k 128k
Max Disks : 7
Max Volumes : 2
I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
Port0 : /dev/sda (WD-WCAYUJ525606)
Port1 : /dev/sdb (WD-WCAYUJ525636)
Port2 : /dev/sdc (WD-WCAYUX093587)
Port3 : /dev/sdd (WD-WCAYUX092774)
Port4 : - non-disk device (TSSTcorp CDDVDW SH-S203B) -
Port5 : - no device attached -

# mdadm --detail --scan
ARRAY /dev/md/imsm0 metadata=imsm UUID=ec239ccc:22b7330b:0c4808ff:82dd176b
ARRAY /dev/md/HDD_0 container=/dev/md/imsm0 member=0
UUID=f61f87fc:1e85f04b:59e873c5:0afdb987

# ls /dev/md
HDD_0 HDD_0p1 HDD_0p2 HDD_0p3 HDD_0p4 imsm0

Everything seems to be working. Also, I can't reproduce the results in
Windows Vista x64 (dual-boot). When I go from linux -> Windows, Windows
detects the array as bad and reinitializes it as well, but if I reboot
from Windows back into Windows the array survives without being marked bad.

Can anyone shed some light on this? I've been bashing my head on my desk
for too long and have run out of ideas.

Dan

Re: mdadm forces resync every boot

On 06.08.2011 12:18:18 by Erwan Leroux

Can you post the content of /etc/mdadm/mdadm.conf?

I noticed that if the ARRAY line contains the name parameter, the raid
is not started properly:
ARRAY /dev/md/raid metadata=1.2
UUID=906ce226:19afa04f:12aab3c1:f91daa96 name=Serveur:raid

You just have to remove the parameter and everything goes back to normal.
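
With that same example line, the trimmed entry would simply be:

   ARRAY /dev/md/raid metadata=1.2 UUID=906ce226:19afa04f:12aab3c1:f91daa96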

Maybe that's your case, but you mentioned dual-boot, so perhaps your
problem is related to that instead.

Cordially,

Erwan Leroux




Re: mdadm forces resync every boot

On 07.08.2011 02:09:45 by Daniel Frey

On 08/06/11 03:18, Erwan Leroux wrote:
> Can you post the content of /etc/mdadm/mdadm.conf ?

Yep. This is my /etc/mdadm.conf:

ARRAY /dev/md/imsm0 metadata=imsm UUID=ec239ccc:22b7330b:0c4808ff:82dd176b
ARRAY /dev/md/HDD_0 container=/dev/md/imsm0 member=0
UUID=f61f87fc:1e85f04b:59e873c5:0afdb987


>
> I noticed that if the ARRAY line contains the name parameter, the raid
> is not started properly
> ARRAY /dev/md/raid metadata=1.2
> UUID=906ce226:19afa04f:12aab3c1:f91daa96 name=Serveur:raid

That doesn't seem to be the problem here.

>
> You just have to remove the parameter and everything goes back to normal.
>
> Maybe that's your case, but you mentioned dual-boot, so perhaps your
> problem is related to that instead.

I've done quite a bit more testing and it's definitely mdadm.

Starting from a clean array:
- Booting into Windows is fine, the array is not degraded
- Then rebooting into Windows again, the array is fine, not degraded
- Then rebooting into linux, the array is OK
- From there, rebooting into either Windows or linux, the array is
degraded and starts to rebuild.

It's definitely mdadm causing the array to rebuild itself.

I've rolled a kernel with dmraid-1.0.0-rc16 and the problem is
completely gone.

I've got two kernels on this machine now and I can boot between the
mdadm and dmraid kernels, provided I remember to update /etc/fstab
before I reboot. Even now, booting the mdadm kernel, the rebuilding
issue still happens. I haven't changed any of the rc scripts after
building the dmraid kernel; stuff just worked.

I'd much rather use mdadm, as dmraid was far harder to get to work for
some reason... well, it's mostly because I am not used to building my
own initramfs and was using one of gentoo's helpers (genkernel). I
managed to get the tools to use the most recent mdadm and dmraid, after
much head-scratching.

Dan


Re: mdadm forces resync every boot

On 09.08.2011 01:34:37 by NeilBrown

On Fri, 05 Aug 2011 08:05:53 -0700 Daniel Frey wrote:

> Hi all,
>
> I've been fighting with my raid array (imsm - raid10) for several weeks
> now. I've now replaced all four drives in my array as the constant
> rebuilding caused a smart error to trip on the old drives; unfortunately
> mdadm is still resyncing the array at every boot.
>
> One thing I would like to clarify: does mdadm need to disassemble the
> array before reboot? At this point, I can't tell if my system is
> currently doing this. Googling around, it seems some say that this step
> is unnecessary.

With md arrays using "native" metadata you don't need to be too careful
shutting down. This is probably what you found by googling.

With IMSM metadata it is a little easier to get it "wrong" though it should
normally work correctly.

There is a program "mdmon" which communicates with the kernel and updates the
metadata on the devices.

When there have been no writes for a little while, mdmon will notice and mark
the array as 'clean'. It will then mark it 'dirty' before the first write is
allowed to proceed.
On a clean shutdown of the array it will mark that array as 'clean'.

But for you, the system shuts down with the array marked 'dirty'. This
suggests that on your machine 'mdmon' is being killed while the array is
still active.

Presumably your root is on the IMSM RAID10 array? When the root filesystem
is marked 'read only' it will probably write to the filesystem to record that
a fsck is not needed. So the array will be 'dirty'. If you then halt before
mdmon has a chance to mark the array 'clean' you will get exactly the result
you see.

If you arrange that the shutdown script runs
mdadm --wait-clean --scan

after marking the root filesystem readonly, it will wait until all arrays are
recorded as 'clean'.
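
In a sysvinit/OpenRC style halt script the intended order is roughly this
(a sketch only; the exact script and helper names vary by distro, and
/proc, /sys and /dev need to stay mounted for mdadm to see the arrays):

   umount -a -t noproc,sysfs,devtmpfs,devpts,tmpfs   # unmount real filesystems, leave the virtual ones
   mount -o remount,ro /                             # remount the root filesystem read-only
   mdadm --wait-clean --scan                         # block until mdmon has marked every array 'clean'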

This should fix your problem.

What distro are you using? openSUSE has this command in /etc/init.d/reboot.

NeilBrown




Re: mdadm forces resync every boot

On 09.08.2011 21:34:30 by Daniel Frey

On 08/08/11 16:34, NeilBrown wrote:
>
> With md arrays using "native" metadata you don't need to be too careful
> shutting down. This is probably what you found by googling.
>
> With IMSM metadata it is a little easier to get it "wrong" though it should
> normally work correctly.
>
> There is a program "mdmon" which communicates with the kernel and updates the
> metadata on the devices.
>
> When there have been no writes for a little while, mdmon will notice and mark
> the array as 'clean'. It will then mark it 'dirty' before the first write is
> allowed to proceed.
> On a clean shutdown of the array it will mark that array as 'clean'.
>
> But for you, the system shuts down with the array marked 'dirty'. This
> suggests that on your machine 'mdmon' is being killed while the array is
> still active.
>
> Presumably your root is on the IMSM RAID10 array? When the root filesystem
> is marked 'read only' it will probably write to the filesystem to record that
> a fsck is not needed. So the array will be 'dirty'. If you then halt before
> mdmon has a chance to mark the array 'clean' you will get exactly the result
> you see.

That's correct, my root is on the imsm raid10 array, along with my
dual-boot of Windows Vista. If I didn't want to boot Vista, I'd just be
using native mdadm like I always have.

>
> If you arrange that the shutdown script runs
> mdadm --wait-clean --scan
>
> after marking the root filesystem readonly, it will wait until all arrays are
> recorded as 'clean'.

I've done some quick poking around and I do not see anything like that.
I'll look into it a little more and, if necessary, file a bug with the
distro.

>
> This should fix your problem.
>
> What distro are you using? openSUSE has this command in /etc/init.d/reboot.

I've used gentoo for as long as I can remember. I've never had an issue
with mdadm on gentoo until I started using the imsm raid, so it's very
possible that the above command is missing from the shutdown sequence.

Thanks Neil! With this bit of information I should be able to get it
resolved. I do have both mdadm and dmraid kernels on this machine now,
so testing shouldn't be too hard.

>
> NeilBrown

Re: mdadm forces resync every boot

On 10.08.2011 03:07:56 by Daniel Frey

On 08/08/11 16:34, NeilBrown wrote:
>
> If you arrange that the shutdown script runs
> mdadm --wait-clean --scan
>
> after marking the root filesystem readonly, it will wait until all arrays are
> recorded as 'clean'.
>

Neil,

I see from the description of that switch that it watches /proc/mdstat
for updates, but /proc was likely unmounted earlier in the shutdown
process, and root is now read-only.
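
Something like the following, dropped into the shutdown script just before
the mdadm call, would confirm whether /proc is still mounted at that point
(a hypothetical check, not something already in the scripts):

   mountpoint -q /proc && echo "/proc is still mounted" || echo "/proc is already gone"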

I've put it in the shutdown scripts and the system hangs on shutdown.
I've traced the hang to the command above, `mdadm --wait-clean --scan`.

Is it possible it's waiting for a write somewhere? I'm so close to
having it working now...

Dan

Re: mdadm forces resync every boot

On 10.08.2011 03:31:46 by NeilBrown

On Tue, 09 Aug 2011 18:07:56 -0700 Daniel Frey wrote:

> Neil,
>
> I see by the switches it is watching /proc/mdstat for updates, but /proc
> was likely unmounted earlier in the shutdown process, and root is now
> read-only.

It needs /proc, /sys, and /dev (assuming udev).
There is really no point in unmounting any of these as they are virtual.
Unmount everything else, remount root readonly, then "mdadm --wait-clean".

However, I would expect missing filesystems to cause an early failure rather
than a hang. If mdmon had been killed already, that might cause a bit of a
hang, but it should be limited to 5 seconds.
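
A quick way to check whether that has happened is something like

   ps -C mdmon -o pid,args    # lists any running mdmon instances

at that point in the shutdown; if nothing is listed, mdmon is already gone.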

Maybe run 'mdadm --wait-clean --scan' under 'strace' and see what it is
doing??
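
For example (root is read-only by then, so send the trace to the console or
a tmpfs rather than to a file on /):

   strace -f -tt mdadm --wait-clean --scan    # -f follows forked children, -tt adds timestamps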

NeilBrown

