What exactly does echo check > /sys/block/mdX/md/sync_action do?

on 09.01.2011 23:48:05 by Christian Schmidt

Hi all,

As the subject says, I'm wondering what issuing the "check" command to a
raid array does.
The wiki says it starts a full read of the raid array. However I wonder
if all members, especially the parts of the drives containing the
redundancy information, will be read, and possibly the validity of the
redundancy data will be checked?

A possibly related question is: why did this member turn into "spare"
role? The system was fully functional and in daily use for about a year.
It was declared to be a four drive raid 5 with no spares. If I remember
level 5 correctly there is no single drive for the redundancy data to
avoid bottlenecks, right?

alpha md # mdadm --examine --verbose /dev/sdh2
/dev/sdh2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
Name : sysresccd:1
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 172eb49b:03e62242:614d7ed3:1fb25f65

Update Time : Sun Jan 9 19:55:09 2011
Checksum : a991f168 - correct
Events : 34

Layout : left-symmetric
Chunk Size : 512K

Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)

Too bad that 1.2 superblocks don't contain the full array information
like 0.90 did.

Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: What exactly does echo check > /sys/block/mdX/md/sync_action do?

on 10.01.2011 00:26:13 by NeilBrown

On Sun, 09 Jan 2011 23:48:05 +0100 Christian Schmidt
wrote:

> Hi all,
>
> As the subject says, I'm wondering what issuing the "check" command to a
> raid array does.
> The wiki says it starts a full read of the raid array. However I wonder
> if all members, especially the parts of the drives containing the
> redundancy information, will be read, and possibly the validity of the
> redundancy data will be checked?

May I suggest
man 4 md

and search for 'check'
???

md/sync_action
This can be used to monitor and control the resync/recovery process
of MD. In particular, writing "check" here will cause the
array to read all data blocks and check that they are consistent
(e.g. parity is correct, or all mirror replicas are the same).
Any discrepancies found are NOT corrected.

A count of problems found will be stored in md/mismatch_count.

Alternately, "repair" can be written which will cause the same
check to be performed, but any errors will be corrected.

Finally, "idle" can be written to stop the check/repair process.
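
Concretely, the scrub cycle the man page describes looks like this (a
sketch; mdX is a placeholder for the real array name, and root is
required — note that on current kernels the counter file in sysfs is
spelled mismatch_cnt):

```shell
echo check > /sys/block/mdX/md/sync_action   # start a read-only consistency check
cat /sys/block/mdX/md/mismatch_cnt           # stripes found inconsistent so far
echo idle > /sys/block/mdX/md/sync_action    # stop the check early if needed
```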


Does that answer your question?

A more recent man page says:

md arrays can be scrubbed by writing either check or repair to the file
md/sync_action in the sysfs directory for the device.

Requesting a scrub will cause md to read every block on every device in
the array, and check that the data is consistent. For RAID1 and
RAID10, this means checking that the copies are identical. For RAID4,
RAID5, RAID6 this means checking that the parity block is (or blocks
are) correct.

If a read error is detected during this process, the normal read-error
handling causes correct data to be found from other devices and to be
written back to the faulty device. In many cases this will effectively
fix the bad block.

If all blocks read successfully but are found to not be consistent,
then this is regarded as a mismatch.
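
If a check does leave a non-zero mismatch count, the same interface can
rewrite the inconsistent stripes (again a sketch; mdX is a placeholder
and root is required):

```shell
echo repair > /sys/block/mdX/md/sync_action   # recompute parity/copies and rewrite
cat /sys/block/mdX/md/mismatch_cnt            # count from the most recent run
```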


>
> A possibly related question is: why did this member turn into "spare"
> role? The system was fully functional and in daily use for about a year.
> It was declared to be a four drive raid 5 with no spares. If I remember
> level 5 correctly there is no single drive for the redundancy data to
> avoid bottlenecks, right?

One would need to see the history of the whole array, not just the current
state of a single device, to be able to guess the reason for the current
state.

And yes: RAID5 distributes the parity blocks to avoid bottlenecks.
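
For a 4-disk left-symmetric array like this one, the rotation can be
sketched as follows (the stripe/disk numbering here is conceptual, for
illustration, not md's internal bookkeeping):

```shell
# Parity moves one disk to the "left" each stripe, starting on the last disk.
n=4
for stripe in 0 1 2 3 4 5; do
    parity=$(( (n - 1) - stripe % n ))
    echo "stripe $stripe: parity on disk $parity"
done
```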

>
> alpha md # mdadm --examine --verbose /dev/sdh2
> /dev/sdh2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
> Name : sysresccd:1
> Creation Time : Sat Jul 17 02:57:27 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
> Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
> Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 172eb49b:03e62242:614d7ed3:1fb25f65
>
> Update Time : Sun Jan 9 19:55:09 2011
> Checksum : a991f168 - correct
> Events : 34
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : spare
> Array State : AAAA ('A' == active, '.' == missing)
>
> Too bad that 1.2 superblocks don't contain the full array information
> like 0.90 did.

The extra information that 0.90 stored was not (and could not be) reliable.

This device thinks that the array is functioning correctly with no
failed devices, and that this device is a spare - presumably a 5th device?
It doesn't know the names of the other devices (and if it thought it did, it
could easily be wrong as names changed). What do the other devices think of
the state of the array?
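
One way to gather that quickly (device names are taken from this thread,
so they may differ on another system; root is required):

```shell
# Compare each member's view of its own role and the array state.
for d in /dev/sdc2 /dev/sdd2 /dev/sdf2 /dev/sdh2; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Device Role|Array State|Events'
done
```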

NeilBrown


Re: What exactly does echo check > /sys/block/mdX/md/sync_action do?

on 10.01.2011 01:28:07 by Christian Schmidt

On 01/10/2011 12:26 AM, NeilBrown wrote:
> On Sun, 09 Jan 2011 23:48:05 +0100 Christian Schmidt
> wrote:
>
>> Hi all,
>>
>> As the subject says, I'm wondering what issuing the "check" command to a
>> raid array does.
>
> May I suggest
> man 4 md
>
> Does that answer your question?

Yes, indeed. Thanks.

>> A possibly related question is: why did this member turn into "spare"
>> role? The system was fully functional and in daily use for about a year.
>> It was declared to be a four drive raid 5 with no spares. If I remember
>> level 5 correctly there is no single drive for the redundancy data to
>> avoid bottlenecks, right?
>
> One would need to see the history of the whole array, not just the current
> state of a single device, to be able to guess the reason for the current
> state.
>
> And yes: RAID5 distributes the parity blocks to avoid bottlenecks.
>
>>
>> alpha md # mdadm --examine --verbose /dev/sdh2
>> /dev/sdh2:
>> Magic : a92b4efc
>> Version : 1.2
>> Feature Map : 0x0
>> Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
>> Name : sysresccd:1
>> Creation Time : Sat Jul 17 02:57:27 2010
>> Raid Level : raid5
>> Raid Devices : 4
>>
>> Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
>> Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
>> Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
>> Data Offset : 2048 sectors
>> Super Offset : 8 sectors
>> State : clean
>> Device UUID : 172eb49b:03e62242:614d7ed3:1fb25f65
>>
>> Update Time : Sun Jan 9 19:55:09 2011
>> Checksum : a991f168 - correct
>> Events : 34
>>
>> Layout : left-symmetric
>> Chunk Size : 512K
>>
>> Device Role : spare
>> Array State : AAAA ('A' == active, '.' == missing)
>>
>> Too bad that 1.2 superblocks don't contain the full array information
>> like 0.90 did.
>
> The extra information that 0.90 stored was not (and could not be) reliable.
>
> This device thinks that the array is functioning correctly with no
> failed devices, and that this device is a spare - presumably a 5th device?
> It doesn't know the names of the other devices (and if it thought it did, it
> could easily be wrong as names changed). What do the other devices think of
> the state of the array?

[~]>mdadm -Q --detail /dev/md3
/dev/md3:
Version : 1.02
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Array Size : 5857390080 (5586.04 GiB 5997.97 GB)
Used Dev Size : 1952463360 (1862.01 GiB 1999.32 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Update Time : Mon Jan 10 00:38:00 2011
State : clean, recovering
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Rebuild Status : 68% complete

Name : sysresccd:1
UUID : fa8fb033:6312742f:0524501d:5aa24a28
Events : 34

Number Major Minor RaidDevice State
0 8 34 0 active sync /dev/sdc2
1 8 50 1 active sync /dev/sdd2
2 8 82 2 active sync /dev/sdf2
4 8 114 3 active sync /dev/sdh2

So just "check" turns the array into rebuild mode and one of the drives
into a spare? That's unexpected.

Thanks,
Christian

Re: What exactly does echo check > /sys/block/mdX/md/sync_action do?

on 10.01.2011 01:43:29 by NeilBrown

On Mon, 10 Jan 2011 01:28:07 +0100 Christian Schmidt
wrote:


> > This device thinks that the array is functioning correctly with no
> > failed devices, and that this device is a spare - presumably a 5th device?
> > It doesn't know the names of the other devices (and if it thought it did, it
> > could easily be wrong as names changed). What do the other devices think of
> > the state of the array?
>
> [~]>mdadm -Q --detail /dev/md3
> /dev/md3:
> Version : 1.02
> Creation Time : Sat Jul 17 02:57:27 2010
> Raid Level : raid5
> Array Size : 5857390080 (5586.04 GiB 5997.97 GB)
> Used Dev Size : 1952463360 (1862.01 GiB 1999.32 GB)
> Raid Devices : 4
> Total Devices : 4
> Persistence : Superblock is persistent
>
> Update Time : Mon Jan 10 00:38:00 2011
> State : clean, recovering
> Active Devices : 4
> Working Devices : 4
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Rebuild Status : 68% complete
>
> Name : sysresccd:1
> UUID : fa8fb033:6312742f:0524501d:5aa24a28
> Events : 34
>
> Number Major Minor RaidDevice State
> 0 8 34 0 active sync /dev/sdc2
> 1 8 50 1 active sync /dev/sdd2
> 2 8 82 2 active sync /dev/sdf2
> 4 8 114 3 active sync /dev/sdh2
>
> So just "check" turns the array into rebuild mode and one of the drives
> into a spare? That's unexpected.

I very much doubt writing "check" is all that happened. Maybe seeing some
kernel logs would help.
What does
cat /proc/mdstat

show (assuming the check/recovery/whatever hasn't finished yet).
It should say "recovering" as I think the key word is copied into the
'State:' line above.

But writing "check" should not cause any drive to become a 'spare', and
should not trigger a 'rebuild' - just a 'check'.
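
The action the array is actually performing can be read back from the
same sysfs file (md3 here is the array from this thread):

```shell
cat /sys/block/md3/md/sync_action   # e.g. check, repair, resync, recover, or idle
cat /proc/mdstat                    # shows the same keyword with a progress bar
```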

NeilBrown


Re: What exactly does echo check > /sys/block/mdX/md/sync_action do?

on 10.01.2011 02:14:35 by Christian Schmidt

On 01/10/2011 01:43 AM, NeilBrown wrote:
> On Mon, 10 Jan 2011 01:28:07 +0100 Christian Schmidt
> wrote:
>
>
>>> This device thinks that the array is functioning correctly with no
>>> failed devices, and that this device is a spare - presumably a 5th device?
>>> It doesn't know the names of the other devices (and if it thought it did, it
>>> could easily be wrong as names changed). What do the other devices think of
>>> the state of the array?
>>
>> [~]>mdadm -Q --detail /dev/md3
>> /dev/md3:
>> Version : 1.02
>> Creation Time : Sat Jul 17 02:57:27 2010
>> Raid Level : raid5
>> Array Size : 5857390080 (5586.04 GiB 5997.97 GB)
>> Used Dev Size : 1952463360 (1862.01 GiB 1999.32 GB)
>> Raid Devices : 4
>> Total Devices : 4
>> Persistence : Superblock is persistent
>>
>> Update Time : Mon Jan 10 00:38:00 2011
>> State : clean, recovering
>> Active Devices : 4
>> Working Devices : 4
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Layout : left-symmetric
>> Chunk Size : 512K
>>
>> Rebuild Status : 68% complete
>>
>> Name : sysresccd:1
>> UUID : fa8fb033:6312742f:0524501d:5aa24a28
>> Events : 34
>>
>> Number Major Minor RaidDevice State
>> 0 8 34 0 active sync /dev/sdc2
>> 1 8 50 1 active sync /dev/sdd2
>> 2 8 82 2 active sync /dev/sdf2
>> 4 8 114 3 active sync /dev/sdh2
>>
>> So just "check" turns the array into rebuild mode and one of the drives
>> into a spare? That's unexpected.
>
> I very much doubt writing "check" is all that happened. Maybe seeing some
> kernel logs would help.

Here they are:

[ 235.503895] md: md3 stopped.
[ 235.505428] md: bind
[ 235.505557] md: bind
[ 235.505673] md: bind
[ 235.505804] md: bind
[ 235.510288] md/raid:md3: device sdc2 operational as raid disk 0
[ 235.510292] md/raid:md3: device sdh2 operational as raid disk 3
[ 235.510294] md/raid:md3: device sdf2 operational as raid disk 2
[ 235.510296] md/raid:md3: device sdd2 operational as raid disk 1
[ 235.510569] md/raid:md3: allocated 4280kB
[ 235.510604] md/raid:md3: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 235.510607] RAID conf printout:
[ 235.510609] --- level:5 rd:4 wd:4
[ 235.510611] disk 0, o:1, dev:sdc2
[ 235.510613] disk 1, o:1, dev:sdd2
[ 235.510614] disk 2, o:1, dev:sdf2
[ 235.510616] disk 3, o:1, dev:sdh2
[ 235.510652] md3: detected capacity change from 0 to 5997967441920
[ 236.204947] md3: unknown partition table
[ 1347.192343] md: data-check of RAID array md3
[ 1347.192346] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 1347.192347] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[ 1347.192352] md: using 128k window, over a total of 1952463360 blocks.
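
The speed limits quoted in that log are runtime tunables (values in
KB/sec; a sketch — adjusting them requires root):

```shell
cat /proc/sys/dev/raid/speed_limit_min   # the "minimum _guaranteed_" figure
cat /proc/sys/dev/raid/speed_limit_max   # the cap on idle-bandwidth usage
# e.g. to let a check run faster at the cost of foreground I/O:
# echo 100000 > /proc/sys/dev/raid/speed_limit_min
```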

Actually I rebooted the machine after a kernel update, which turned out
to change the drive names (I left an unrelated drive in a hotswap bay).
Also, I had an erroneous /etc/mdadm.conf which was still referring to
the old drive naming. When I realized this drive array wasn't started I
completely renamed the config file and ran
mdadm -A --scan
after which the array was found. I still have some issues opening crypto
volumes on the LVM, though, and tried to figure out whether I forgot the
key for one and never created the other, or whether something's wrong on
the underlying layer, so I started a check.
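
Roughly, the recovery sequence described above was (a sketch; paths and
the md3 name are from this thread, and the backup filename is made up):

```shell
mv /etc/mdadm.conf /etc/mdadm.conf.bad       # sideline the stale config
mdadm -A --scan                              # assemble arrays from scanned superblocks
echo check > /sys/block/md3/md/sync_action   # then start the consistency check
```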

> What does
> cat /proc/mdstat

It says:

md3 : active raid5 sdc2[0] sdh2[4] sdf2[2] sdd2[1]
      5857390080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=================>...]  check = 85.3% (1667391744/1952463360) finish=57.5min speed=82511K/sec

> show (assuming the check/recovery/whatever hasn't finished yet).
> It should say "recovering" as I think the key word is copied into the
> 'State:' line above.
>
> But writing "check" should not cause any drive to become a 'spare', and
> should not trigger a 'rebuild' - just a 'check'.

Well... so what is this raid actually doing? mdstat says check, mdadm -Q
--detail says recovering, and mdadm --examine on one of the drives says
spare (while no spares are listed at any other point).

mdadm --examine:

/dev/sdc2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
Name : sysresccd:1
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 801bb0ab:256d6f57:7e53e467:62094362

Update Time : Mon Jan 10 01:43:39 2011
Checksum : 5f661441 - correct
Events : 35

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing)

/dev/sdd2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
Name : sysresccd:1
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : d14e0126:4c8be6cd:418165b2:24bba827

Update Time : Mon Jan 10 01:43:39 2011
Checksum : 6015453f - correct
Events : 35

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing)

/dev/sdf2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
Name : sysresccd:1
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3b8a4934:40a3270d:7e285e98:07aec354

Update Time : Mon Jan 10 01:43:39 2011
Checksum : c0b232bd - correct
Events : 35

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing)

/dev/sdh2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : fa8fb033:6312742f:0524501d:5aa24a28
Name : sysresccd:1
Creation Time : Sat Jul 17 02:57:27 2010
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3904927887 (1862.01 GiB 1999.32 GB)
Array Size : 11714780160 (5586.04 GiB 5997.97 GB)
Used Dev Size : 3904926720 (1862.01 GiB 1999.32 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 172eb49b:03e62242:614d7ed3:1fb25f65

Update Time : Mon Jan 10 01:43:39 2011
Checksum : a8d4425a - correct
Events : 35

Layout : left-symmetric
Chunk Size : 512K

Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)