grow fails with 2.6.34 git

grow fails with 2.6.34 git

on 15.04.2010 00:10:06 by James Braid

Trying to grow a 4 disk RAID 5 array to a 6 disk RAID 6 array - running
2.6.34-rc2 (also tried with latest git, same sysfs errors)

Using mdadm from git.

Here's the error I get when I try to perform the grow:

# ./mdadm --grow --backup-file=/root/backup.md4 --level=6
--raid-devices=6 /dev/md4
mdadm: Need to backup 768K of critical section..
mdadm: /dev/md4: Cannot get array details from sysfs

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md4 : active raid6 sde[0] sdg[5](S) sdh[6](S) sdc[3] sdd[2] sdf[1]
4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]

unused devices:

dmesg reports lots of sysfs errors:

[ 922.249484] ------------[ cut here ]------------
[ 922.249549] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xcc/0xe3()
[ 922.249609] Hardware name: GA-MA785GT-UD3H
[ 922.249673] sysfs: cannot create duplicate filename
'/devices/virtual/block/md4/md/stripe_cache_size'
[ 922.249783] Modules linked in: ppdev lp parport sco bridge stp bnep
rfcomm l2cap crc16 dahdi_echocan_oslec echo powernow_k8
cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative
uinput fuse ext3 jbd mbcache it87 hwmon_vid raid456 async_raid6_recov
async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod btusb
bluetooth pl2303 rfkill usbserial snd_hda_codec_nvhdmi
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep
snd_mixer_oss ata_generic snd_pcm snd_seq_midi ide_pci_generic
snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device r8169
firewire_ohci wcopenpci snd edac_core ahci i2c_piix4 ohci_hcd k10temp
soundcore mii atiixp firewire_core edac_mce_amd tpm_tis libata nvidia(P)
ide_core agpgart wctdm tpm dahdi snd_page_alloc tpm_bios i2c_core floppy
processor button crc_itu_t crc_ccitt evdev xfs exportfs sd_mod
crc_t10dif dm_mod thermal fan thermal_sys ehci_hcd usb_storage usbcore
nls_base scsi_mod
[ 922.254181] Pid: 10642, comm: mdadm Tainted: P W 2.6.34-rc2-amd64 #2
[ 922.254241] Call Trace:
[ 922.254302] [] ? warn_slowpath_common+0x76/0x8c
[ 922.254366] [] ? warn_slowpath_fmt+0x40/0x45
[ 922.254427] [] ? sysfs_add_one+0xcc/0xe3
[ 922.254490] [] ? sysfs_add_file_mode+0x4b/0x7d
[ 922.254553] [] ? internal_create_group+0xdd/0x16b
[ 922.259781] [] ? run+0x4fa/0x685 [raid456]
[ 922.259853] [] ? level_store+0x3b7/0x42e [md_mod]
[ 922.259922] [] ? md_attr_store+0x77/0x96 [md_mod]
[ 922.259988] [] ? sysfs_write_file+0xe3/0x11f
[ 922.260061] [] ? vfs_write+0xa4/0x101
[ 922.260122] [] ? sys_write+0x45/0x6b
[ 922.260183] [] ? system_call_fastpath+0x16/0x1b
[ 922.260243] ---[ end trace f9f1fcb8cad24d01 ]---
[ 922.260381] raid5: failed to create sysfs attributes for md4

After the grow failed, I stopped the array and restarted it. At that
point it appears to be continuing with the grow process? Is this correct?

# mdadm --stop /dev/md4
mdadm: stopped /dev/md4

# mdadm --assemble /dev/md4
mdadm: /dev/md4 has been started with 4 drives (out of 5) and 2 spares.

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md4 : active raid6 sde[0] sdh[5] sdg[6](S) sdc[3] sdd[2] sdf[1]
4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
[>....................] recovery = 0.0% (147712/1465138496)
finish=661.1min speed=36928K/sec

unused devices:


Re: grow fails with 2.6.34 git

on 15.04.2010 00:48:02 by Michael Evans

On Wed, Apr 14, 2010 at 3:10 PM, James Braid wrote:
> [...]
> After the grow failed, I stopped the array and restarted it. At that
> point it appears to be continuing with the grow process? Is this correct?
>
> # mdadm --stop /dev/md4
> mdadm: stopped /dev/md4
>
> # mdadm --assemble /dev/md4
> mdadm: /dev/md4 has been started with 4 drives (out of 5) and 2 spares.
>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md4 : active raid6 sde[0] sdh[5] sdg[6](S) sdc[3] sdd[2] sdf[1]
>       4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
>       [>....................]  recovery =  0.0% (147712/1465138496)
> finish=661.1min speed=36928K/sec

Yes, that /extremely/ slow growth progress is normal for this
conversion. Every critical section must be backed up and synced, and
only then will the reshape proceed. You may notice an incorrect number
of disks relative to the expected number; that will go away the next
time the array is assembled.
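
For reference, one way to keep an eye on the reshape and the reported
device count from the shell (a rough sketch; /dev/md4 and the 5-second
interval are just examples):

  watch -n 5 cat /proc/mdstat
  mdadm --detail /dev/md4 | grep -E 'Raid Devices|Layout|State'

The "Raid Devices" count reported there should settle at the expected
value once the array has been reassembled.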

Re: grow fails with 2.6.34 git

on 15.04.2010 01:27:51 by James Braid

Michael Evans wrote:
> Yes, that /extremely/ slow growth progress is normal for this
> conversion. Every critical section must be backed up synced, and only
> then will it proceed. You may notice an incorrect number of disks
> relative to the expected number. That will go away the next time the
> array is assembled.

Thanks. I'm not concerned about the speed; the fact that I had to stop
and restart the array before the reshape would begin is what concerns
me, as well as the odd sysfs errors.


Re: grow fails with 2.6.34 git

on 15.04.2010 01:38:09 by Michael Evans

On Wed, Apr 14, 2010 at 4:27 PM, James Braid wrote:
> [...]
> Thanks. I'm not concerned about the speed; the fact that I had to stop
> and restart the array before the reshape would begin is what concerns
> me, as well as the odd sysfs errors.

Probably some minor bug relating to sysfs entry creation; it looks
like it tried to create a duplicate entry, so it is probably just a
missing flow-control statement or validity check. That work looks
ancillary to the actual critical path anyway. Presuming your data
doesn't read as garbage at the moment, it should be intact when the
reshape completes as well.
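
One rough way to do that sort of spot check while the reshape runs (a
sketch only; it assumes /dev/md4 from this thread, an unused /mnt mount
point, and a filesystem that can be mounted read-only):

  mount -o ro /dev/md4 /mnt    # read-only, so nothing is written during the reshape
  dd if=/dev/md4 of=/dev/null bs=1M count=1024    # or just confirm raw reads return no I/O errors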

As for what caused the error, you'd have to ask someone who currently
writes/tests the git version. The system I use software raid on is
more or less 'production' in that it's the main file-server for the
house, so I tend to only use stable kernel versions.

Re: grow fails with 2.6.34 git

on 15.04.2010 03:04:01 by NeilBrown

On Wed, 14 Apr 2010 23:10:06 +0100
James Braid wrote:

> Trying to grow a 4 disk RAID 5 array to a 6 disk RAID 6 array - running
> 2.6.34-rc2 (also tried with latest git, same sysfs errors)
>
> Using mdadm from git.
>
> Here's the error I get when I try to perform the grow:
>
> # ./mdadm --grow --backup-file=/root/backup.md4 --level=6
> --raid-devices=6 /dev/md4
> mdadm: Need to backup 768K of critical section..
> mdadm: /dev/md4: Cannot get array details from sysfs

That is odd, and isn't really explained by the dmesg errors you mentioned.

I'll have to do some experimentation to see if I can reproduce your symptom.

The sysfs errors are more noise than a real issue; I'm currently working on
a patch to get rid of them.

>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md4 : active raid6 sde[0] sdg[5](S) sdh[6](S) sdc[3] sdd[2] sdf[1]
> 4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]

So it has converted your RAID5 to RAID6 with a special layout which places
all the Q blocks on the one disk. That disk is missing. So your data is
still safe, but the layout is somewhat unorthodox, and it didn't grow to 6
devices like you asked it to.

> After the grow failed, I stopped the array and restarted it. At that
> point it appears to be continuing with the grow process? Is this correct?
....
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md4 : active raid6 sde[0] sdh[5] sdg[6](S) sdc[3] sdd[2] sdf[1]
> 4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
> [>....................] recovery = 0.0% (147712/1465138496)
> finish=661.1min speed=36928K/sec

What is happening here is that the spare (sdh) is getting the Q blocks
written to it. When this completes you will have full 2-disk redundancy but
the layout will not be optimal and the array won't be any bigger.
To fix this you would:

mdadm --grow --backup-file=/root/backup.md4 --raid-devices=6 \
--layout=normalise /dev/md4

Hopefully this will not hit the same problem that you hit before.
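
For what it's worth, the progress can also be followed through the md
sysfs attributes (a sketch; paths assume md4, attribute names as exposed
under /sys/block/mdX/md on recent kernels):

  cat /sys/block/md4/md/sync_action       # "recovery" while sdh is filled, "reshape" during the normalise pass
  cat /sys/block/md4/md/reshape_position  # sector the reshape has reached, or "none"
  cat /sys/block/md4/md/raid_disks        # should read 6 once the grow to 6 devices has taken effect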

NeilBrown

Re: grow fails with 2.6.34 git

on 15.04.2010 04:22:48 by Michael Evans

On Wed, Apr 14, 2010 at 6:04 PM, Neil Brown wrote:
> What is happening here is that the spare (sdh) is getting the Q blocks
> written to it.  When this completes you will have full 2-disk redundancy but
> the layout will not be optimal and the array won't be any bigger.
> To fix this you would:
>
>   mdadm --grow --backup-file=/root/backup.md4 --raid-devices=6 \
>       --layout=normalise /dev/md4
>

Is there some way of telling if the layout is proper, or is that what
'level 6' indicates?

Also is it safe to run something like...

mdadm --grow --backup-file=/some/never/used/file --layout=normalize /dev/mdX

on any raid 5/6 device that isn't undergoing an active reshape? (would
it be a no-op if not required)

Re: grow fails with 2.6.34 git

on 15.04.2010 04:55:35 by NeilBrown

On Wed, 14 Apr 2010 19:22:48 -0700
Michael Evans wrote:

> [...]
> Is there some way of telling if the layout is proper, or is that what
> 'level 6' indicates?

mdadm -D should list the layout.
If it ends "-6" then it is a non-standard layout with Q on the last device
and the remainder of the devices laid out like a RAID5.
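
A quick check along those lines (a sketch; the layout name shown is only
an illustration of the trailing "-6"):

  mdadm --detail /dev/md4 | grep Layout
           Layout : left-symmetric-6

A name ending in "-6" is the RAID5-style layout with Q parked on the last
device; a plain name such as "left-symmetric" is a standard RAID6 layout.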

>
> Also is it safe to run something like...
>
> mdadm --grow --backup-file=/some/never/used/file --layout=normalize /dev/mdX
>
> on any raid 5/6 device that isn't undergoing an active reshape? (would
> it be a no-op if not required)

Yes. It quite literally looks at the layout name to see if it ends with "-6".
If it is not needed it will actually report an error:
   mdadm: layout normalize not understood for raid6.

I should probably fix that..... one day.

NeilBrown


Re: grow fails with 2.6.34 git

on 15.04.2010 17:09:03 by James Braid

On 15/04/10 02:04, Neil Brown wrote:
> On Wed, 14 Apr 2010 23:10:06 +0100
> James Braid wrote:
>> # cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md4 : active raid6 sde[0] sdg[5](S) sdh[6](S) sdc[3] sdd[2] sdf[1]
>> 4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
>
> So it has converted your RAID5 to RAID6 with a special layout which places
> all the Q blocks on the one disk. That disk is missing. So your data is
> still safe, but the layout is somewhat unorthodox, and it didn't grow to 6
> devices like you asked it to.

Yeah, I was a bit confused as to why that didn't work.

>> After the grow failed, I stopped the array and restarted it. At that
>> point it appears to be continuing with the grow process? Is this correct?
> ...
>> # cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md4 : active raid6 sde[0] sdh[5] sdg[6](S) sdc[3] sdd[2] sdf[1]
>> 4395415488 blocks level 6, 64k chunk, algorithm 18 [5/4] [UUUU_]
>> [>....................] recovery = 0.0% (147712/1465138496)
>> finish=661.1min speed=36928K/sec
>
> What is happening here is that the spare (sdh) is getting the Q blocks
> written to it. When this completes you will have full 2-disk redundancy but
> the layout will not be optimal and the array won't be any bigger.
> To fix this you would:
>
> mdadm --grow --backup-file=/root/backup.md4 --raid-devices=6 \
> --layout=normalise /dev/md4
>
> Hopefully this will not hit the same problem that you hit before.

This seems to be working OK - thanks Neil! The man pages cover this
quite well too.
