Raid failing, which command to remove the bad drive?

Raid failing, which command to remove the bad drive?

am 26.08.2011 22:13:01 von tlenz

I have 4 drives set up as 2 pairs. The first part has 3 partitions on
it and it seems 1 of those drives is failing (going to have to figure
out which drive it is too so I don't pull the wrong one out of the case)

It's been awhile since I had to replace a drive in the array and my
notes are a bit confusing. I'm not sure which I need to use to remove
the drive:


sudo mdadm --manage /dev/md0 --fail /dev/sdb
sudo mdadm --manage /dev/md0 --remove /dev/sdb
sudo mdadm --manage /dev/md1 --fail /dev/sdb
sudo mdadm --manage /dev/md1 --remove /dev/sdb
sudo mdadm --manage /dev/md2 --fail /dev/sdb
sudo mdadm --manage /dev/md2 --remove /dev/sdb

or

sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

I'm not sure if I fail the drive partition or whole drive for each.

-------------------------------------
The mails I got are:
-------------------------------------
A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[1] sda3[0]
459073344 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
24418688 blocks [2/1] [U_]

unused devices:
-------------------------------------
A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdb2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[1] sda3[0]
459073344 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
24418688 blocks [2/1] [U_]

unused devices:
-------------------------------------
A Fail event had been detected on md device /dev/md2.

It could be related to component device /dev/sdb3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[2](F) sda3[0]
459073344 blocks [2/1] [U_]

md3 : active raid1 sdd1[1] sdc1[0]
488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
24418688 blocks [2/1] [U_]

unused devices:
-------------------------------------

Re: Raid failing, which command to remove the bad drive?

am 26.08.2011 23:25:10 von mathias.buren

On 26 August 2011 21:13, Timothy D. Lenz wrote:
> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on it
> and it seems 1 of those drives is failing (going to have to figure out which
> drive it is too so I don't pull the wrong one out of the case)
>
> It's been awhile since I had to replace a drive in the array and my notes
> are a bit confusing. I'm not sure which I need to use to remove the drive:
>
>
>        sudo mdadm --manage /dev/md0 --fail /dev/sdb
>        sudo mdadm --manage /dev/md0 --remove /dev/sdb
>        sudo mdadm --manage /dev/md1 --fail /dev/sdb
>        sudo mdadm --manage /dev/md1 --remove /dev/sdb
>        sudo mdadm --manage /dev/md2 --fail /dev/sdb
>        sudo mdadm --manage /dev/md2 --remove /dev/sdb
>
> or
>
> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>
> I'm not sure if I fail the drive partition or whole drive for each.
>
> -------------------------------------
> The mails I got are:
> -------------------------------------
> A Fail event had been detected on md device /dev/md0.
>
> It could be related to component device /dev/sdb1.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
>      459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------
> A Fail event had been detected on md device /dev/md1.
>
> It could be related to component device /dev/sdb2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
>      459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------
> A Fail event had been detected on md device /dev/md2.
>
> It could be related to component device /dev/sdb3.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[2](F) sda3[0]
>      459073344 blocks [2/1] [U_]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------

Looks like your sda is failing, that's the smartctl -a /dev/sda output?

/Mathias

Re: Raid failing, which command to remove the bad drive?

am 27.08.2011 00:26:07 von tlenz

Um, no, that was the email that mdadm sends, I thought. And it says the
problem is sdb in each case. Though I was wondering why each one said
[U_] instead of [_U]. Here is the smartctl for sda, and below that will
be for sdb.
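
For reference on the [U_] question: in /proc/mdstat the letters follow the
array's slot order, with U for a working member and _ for a failed or missing
one. Since sda1/sda2/sda3 occupy slot 0 in each array here, [U_] does point at
the sdb half being the one that dropped out. Which device sits in which slot
can be checked with, for example:

sudo mdadm --detail /dev/md0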

=======================================================================
vorg@x64VDR:~$ sudo smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST3500320AS
Serial Number:    9QM7M86S
LU WWN Device Id: 5 000c50 01059c636
Firmware Version: SD1A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Aug 26 15:23:41 2011 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 650) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 119) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       83309768
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       13556066
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5406
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       13
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   065   045    Old_age   Always       -       33 (Min/Max 30/35)
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   058   033   000    Old_age   Always       -       83309768
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay
On 8/26/2011 2:25 PM, Mathias Burén wrote:
> smartctl -a /dev/sda
=======================================================================

vorg@x64VDR:~$ sudo smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               /1:0:0:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more
'-T permissive' options.

Re: Raid failing, which command to remove the bad drive?

am 27.08.2011 00:45:35 von NeilBrown

On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz" wrote:

> I have 4 drives set up as 2 pairs. The first part has 3 partitions on
> it and it seems 1 of those drives is failing (going to have to figure
> out which drive it is too so I don't pull the wrong one out of the case)
>
> It's been awhile since I had to replace a drive in the array and my
> notes are a bit confusing. I'm not sure which I need to use to remove
> the drive:
>
>
> sudo mdadm --manage /dev/md0 --fail /dev/sdb
> sudo mdadm --manage /dev/md0 --remove /dev/sdb
> sudo mdadm --manage /dev/md1 --fail /dev/sdb
> sudo mdadm --manage /dev/md1 --remove /dev/sdb
> sudo mdadm --manage /dev/md2 --fail /dev/sdb
> sudo mdadm --manage /dev/md2 --remove /dev/sdb

sdb is not a member of any of these arrays so all of these commands will fail.

The partitions are members of the arrays.
>
> or
>
> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2

sdb1 and sdb2 have already been marked as failed, so there is little point in
marking them as failed again.  Removing them makes sense though.


> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
marginal.
So if you want to remove sdb from the machine this is the correct thing to do.
Mark sdb3 as failed, then remove it from the array.

>
> I'm not sure if I fail the drive partition or whole drive for each.

You only fail things that aren't failed already, and you fail the thing that
mdstat or mdadm -D tells you is a member of the array.
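
Spelled out for the arrays in the mdstat output above, that amounts to
something like this (a sketch only -- confirm the member names with
mdadm -D before removing anything):

sudo mdadm --detail /dev/md2                              # check which partition is the active/faulty member
sudo mdadm /dev/md0 --remove /dev/sdb1                    # sdb1 is already marked (F), so just remove it
sudo mdadm /dev/md1 --remove /dev/sdb2                    # likewise for sdb2
sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3   # sdb3 is still active: fail it first, then remove it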

NeilBrown



>
> -------------------------------------
> The mails I got are:
> -------------------------------------
> A Fail event had been detected on md device /dev/md0.
>
> It could be related to component device /dev/sdb1.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
> 4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
> 459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
> 488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
> 24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------
> A Fail event had been detected on md device /dev/md1.
>
> It could be related to component device /dev/sdb2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
> 4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
> 459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
> 488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
> 24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------
> A Fail event had been detected on md device /dev/md2.
>
> It could be related to component device /dev/sdb3.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
> 4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[2](F) sda3[0]
> 459073344 blocks [2/1] [U_]
>
> md3 : active raid1 sdd1[1] sdc1[0]
> 488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
> 24418688 blocks [2/1] [U_]
>
> unused devices:
> -------------------------------------

Re: Raid failing, which command to remove the bad drive?

am 27.08.2011 00:45:36 von mathias.buren

On 26 August 2011 23:26, Timothy D. Lenz wrote:
> [Timothy's message with the full smartctl output for /dev/sda and /dev/sdb
> quoted in full -- see the previous message]

Indeed, sorry. 600 PB... where did you get that drive? ;)

/M

Re: Raid failing, which command to remove the bad drive?

am 27.08.2011 01:14:44 von tlenz

On 8/26/2011 3:45 PM, Mathias Burén wrote:
> On 26 August 2011 23:26, Timothy D. Lenz wrote:
>> [Timothy's message with the full smartctl output for /dev/sda and /dev/sdb
>> quoted in full -- see the previous messages]
>
>
> Indeed, sorry. 600 PB... where did you get that drive? ;)
>
> /M

What about those pre-fail messages on the other drive? Are they
something to worry about now?

Also, I ran the same thing on the 2 drives for md3 and got the same
pre-fail messages for both of those, plus one had this nice little note:

==> WARNING: There are known problems with these drives,
AND THIS FIRMWARE VERSION IS AFFECTED,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951

4 Seagate drives in this computer, this will make 3 failures since I put
them in. I think the drives are still in warranty. Last time I replaced
one it was good till something like 2012 or 2013. But any new drives
will be WD.

Re: Raid failing, which command to remove the bad drive?

am 01.09.2011 19:51:54 von tlenz

On 8/26/2011 3:45 PM, NeilBrown wrote:
> On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz" wrote:
>
>> I have 4 drives set up as 2 pairs. The first part has 3 partitions on
>> it and it seems 1 of those drives is failing (going to have to figure
>> out which drive it is too so I don't pull the wrong one out of the case)
>>
>> It's been awhile since I had to replace a drive in the array and my
>> notes are a bit confusing. I'm not sure which I need to use to remove
>> the drive:
>>
>>
>> sudo mdadm --manage /dev/md0 --fail /dev/sdb
>> sudo mdadm --manage /dev/md0 --remove /dev/sdb
>> sudo mdadm --manage /dev/md1 --fail /dev/sdb
>> sudo mdadm --manage /dev/md1 --remove /dev/sdb
>> sudo mdadm --manage /dev/md2 --fail /dev/sdb
>> sudo mdadm --manage /dev/md2 --remove /dev/sdb
>
> sdb is not a member of any of these arrays so all of these commands will fail.
>
> The partitions are members of the arrays.
>>
>> or
>>
>> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
>> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
>
> sd1 and sdb2 have already been marked as failed so there is little point in
> marking them as failed again. Removing them makes sense though.
>
>
>> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>
> sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
> marginal.
> So if you want to remove sdb from the machine this the correct thing to do.
> Mark sdb3 as failed, then remove it from the array.
>
>>
>> I'm not sure if I fail the drive partition or whole drive for each.
>
> You only fail things that aren't failed already, and you fail the thing that
> mdstat or mdadm -D tells you is a member of the array.
>
> NeilBrown
>
>
>
>>
>> -------------------------------------
>> The mails I got are:
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md0.
>>
>> It could be related to component device /dev/sdb1.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>> 4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>> 459073344 blocks [2/2] [UU]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>> 488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>> 24418688 blocks [2/1] [U_]
>>
>> unused devices:
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md1.
>>
>> It could be related to component device /dev/sdb2.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>> 4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>> 459073344 blocks [2/2] [UU]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>> 488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>> 24418688 blocks [2/1] [U_]
>>
>> unused devices:
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md2.
>>
>> It could be related to component device /dev/sdb3.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>> 4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[2](F) sda3[0]
>> 459073344 blocks [2/1] [U_]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>> 488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>> 24418688 blocks [2/1] [U_]
>>
>> unused devices:
>> -------------------------------------


Got another problem. Removed the drive and tried to start it back up and
now get Grub Error 2. I'm not sure if, when I did the mirrors, something
went wrong with installing grub on the second drive, or if it has to do
with [U_], which points to sda in that report instead of [_U].

I know I pulled the correct drive. I had it labeled sdb, it's the second
drive in the bios bootup drive check and it's the second connector on
the board. And when I put just it in instead of the other, I got the
noise again. I think last time a drive failed it was one of these two
drives because I remember recopying grub.

I do have another computer set up the same way that I could put this
remaining drive on to get grub fixed, but it's a bit of a pain to get
the other computer hooked back up and I will have to dig through my
notes about getting grub set up without messing up the array and stuff. I
do know that both computers have been updated to grub 2.



Re: Raid failing, which command to remove the bad drive?

am 02.09.2011 07:24:01 von Simon Matthews

On Thu, Sep 1, 2011 at 10:51 AM, Timothy D. Lenz wrote:
> [quoted thread trimmed -- see the previous messages; after removing the bad
> drive the machine now fails to boot with Grub Error 2]


How did you install Grub on the second drive? I have seen some
instructions on the web that would not allow the system to boot if the
first drive failed or was removed.

Re: Raid failing, which command to remove the bad drive?

am 02.09.2011 17:42:20 von tlenz

On 9/1/2011 10:24 PM, Simon Matthews wrote:
> On Thu, Sep 1, 2011 at 10:51 AM, Timothy D. Lenz wrote:
>>
>>
>> On 8/26/2011 3:45 PM, NeilBrown wrote:
>>>
>>> On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz"
>>> wrote:
>>>
>>>> I have 4 drives set up as 2 pairs. The first part has 3 partitions on
>>>> it and it seems 1 of those drives is failing (going to have to figure
>>>> out which drive it is too so I don't pull the wrong one out of the case)
>>>>
>>>> It's been awhile since I had to replace a drive in the array and my
>>>> notes are a bit confusing. I'm not sure which I need to use to remove
>>>> the drive:
>>>>
>>>>
>>>> sudo mdadm --manage /dev/md0 --fail /dev/sdb
>>>> sudo mdadm --manage /dev/md0 --remove /dev/sdb
>>>> sudo mdadm --manage /dev/md1 --fail /dev/sdb
>>>> sudo mdadm --manage /dev/md1 --remove /dev/sdb
>>>> sudo mdadm --manage /dev/md2 --fail /dev/sdb
>>>> sudo mdadm --manage /dev/md2 --remove /dev/sdb
>>>
>>> sdb is not a member of any of these arrays so all of these commands will
>>> fail.
>>>
>>> The partitions are members of the arrays.
>>>>
>>>> or
>>>>
>>>> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
>>>> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
>>>
>>> sd1 and sdb2 have already been marked as failed so there is little point
>>> in
>>> marking them as failed again. Removing them makes sense though.
>>>
>>>
>>>> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>>>
>>> sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
>>> marginal.
>>> So if you want to remove sdb from the machine this the correct thing to
>>> do.
>>> Mark sdb3 as failed, then remove it from the array.
>>>
>>>>
>>>> I'm not sure if I fail the drive partition or whole drive for each.
>>>
>>> You only fail things that aren't failed already, and you fail the thing
>>> that
>>> mdstat or mdadm -D tells you is a member of the array.
>>>
>>> NeilBrown
>>>
>>>
>>>
>>>>
>>>> -------------------------------------
>>>> The mails I got are:
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md0.
>>>>
>>>> It could be related to component device /dev/sdb1.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>> 4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[1] sda3[0]
>>>> 459073344 blocks [2/2] [UU]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>> 488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>> 24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md1.
>>>>
>>>> It could be related to component device /dev/sdb2.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>> 4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[1] sda3[0]
>>>> 459073344 blocks [2/2] [UU]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>> 488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>> 24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md2.
>>>>
>>>> It could be related to component device /dev/sdb3.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>> 4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[2](F) sda3[0]
>>>> 459073344 blocks [2/1] [U_]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>> 488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>> 24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:
>>>> -------------------------------------
>>
>>
>> Got another problem. Removed the drive and tried to start it back up and now
>> get Grub Error 2. I'm not sure if when I did the mirrors if something when
>> wrong with installing grub on the second drive< or if is has to do with [U_]
>> which points to sda in that report instead of [_U].
>>
>> I know I pulled the correct drive. I had it labled sdb, it's the second
>> drive in the bios bootup drive check and it's the second connector on the
>> board. And when I put just it in instead of the other, I got the noise
>> again. I think last time a drive failed it was one of these two drives
>> because I remember recopying grub.
>>
>> I do have another computer setup the same way, that I could put this
>> remaining drive on to get grub fixed, but it's a bit of a pain to get the
>> other computer hooked back up and I will have to dig through my notes about
>> getting grub setup without messing up the array and stuff. I do know that
>> both computers have been updated to grub 2
>
>
> How did you install Grub on the second drive? I have seen some
> instructions on the web that would not allow the system to boot if the
> first drive failed or was removed.
>


I think this is how I did it, at least it is what I had in my notes:

grub-install /dev/sda && grub-install /dev/sdb

And this is from my notes also. It was from an IRC chat. Don't know if
it was the raid channel or the grub channel:

[14:02] Vorg: No. First, what is the output of grub-install
--version?
[14:02] (GNU GRUB 1.98~20100115-1)
[14:04] Vorg: Ok, then run "grub-install /dev/sda &&
grub-install /dev/sdb" (where sda and sdb are the members of the array)

Re: Raid failing, which command to remove the bad drive?

am 03.09.2011 13:35:39 von Simon Matthews

On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz wrote:
>
>>
>> How did you install Grub on the second drive? I have seen some
>> instructions on the web that would not allow the system to boot if the
>> first drive failed or was removed.
>>
>
>
> I think this is how I did it, at least it is what I had in my notes:
>
> grub-install /dev/sda && grub-install /dev/sdb
>
> And this is from my notes also. It was from an IRC chat. Don't know if it
> was the raid channel or the grub channel:
>
> [14:02] Vorg: No. First, what is the output of grub-install
> --version?
> [14:02]  (GNU GRUB 1.98~20100115-1)
> [14:04] Vorg: Ok, then run "grub-install /dev/sda && grub-install
> /dev/sdb" (where sda and sdb are the members of the array)
>

Which is exactly my point. You installed grub on /dev/sdb such that it
would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
hard drive that was /dev/sdb is now /dev/sda, but Grub is still
looking for its files on the non-existent /dev/sdb.

Re: Raid failing, which command to remove the bad drive?

am 03.09.2011 14:17:23 von Robin Hill


On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:

> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz wrote:
> >
> >>
> >> How did you install Grub on the second drive? I have seen some
> >> instructions on the web that would not allow the system to boot if the
> >> first drive failed or was removed.
> >>
> >
> >
> > I think this is how I did it, at least it is what I had in my notes:
> >
> > grub-install /dev/sda && grub-install /dev/sdb
> >
> > And this is from my notes also. It was from an IRC chat. Don't know if it
> > was the raid channel or the grub channel:
> >
> > [14:02] Vorg: No. First, what is the output of grub-install
> > --version?
> > [14:02]  (GNU GRUB 1.98~20100115-1)
> > [14:04] Vorg: Ok, then run "grub-install /dev/sda && grub-install
> > /dev/sdb" (where sda and sdb are the members of the array)
> >
>
> Which is exactly my point. You installed grub on /dev/sdb such that it
> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
> looking for its files on the non-existent /dev/sdb.
>
The way I do it is to run grub, then for each drive do:
device (hd0) /dev/sdX
root (hd0,0)
setup (hd0)

That should set up each drive to boot up as the first drive.
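
Spelled out for the two-drive mirror in this thread, that session would look
roughly like the following (GRUB legacy shell, assuming /boot is on the first
partition of each disk):

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> quit

Mapping each disk to (hd0) before running setup is what makes either drive
bootable on its own when the other one is missing.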

Cheers,
Robin
--
     ___
    ( ' }     |       Robin Hill           |
   / / )      | Little Jim says ....       |
  // !!       | "He fallen in de water !!" |


Re: Raid failing, which command to remove the bad drive?

am 03.09.2011 19:03:33 von Simon Matthews

On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill wrote:
> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>
>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz wrote:
>> > [quoted thread trimmed -- see previous messages]
>>
>> Which is exactly my point. You installed grub on /dev/sdb such that it
>> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>> looking for its files on the non-existent /dev/sdb.
>>
> The way I do it is to run grub, then for each drive do:
>     device (hd0) /dev/sdX
>     root (hd0,0)
>     setup (hd0)
>
> That should set up each drive to boot up as the first drive.
>

How about (after installing grub on /dev/sda):
dd if=/dev/sda of=/dev/sdb bs=466 count=1

Simon

Re: Raid failing, which command to remove the bad drive?

am 03.09.2011 19:04:49 von Simon Matthews

On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews wrote:
> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill wrote:
>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>
>>> [quoted thread trimmed -- see previous messages]
>>>
>> The way I do it is to run grub, then for each drive do:
>>     device (hd0) /dev/sdX
>>     root (hd0,0)
>>     setup (hd0)
>>
>> That should set up each drive to boot up as the first drive.
>>
>
> How about (after installing grub on /dev/sda):
> dd if=/dev/sda of=/dev/sdb bs=466 count=1

ooops, that should be bs=446, NOT bs=466
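
The 446 matters because only the first 446 bytes of the MBR hold boot code;
bytes 446-511 (the partition table and the 0x55AA signature) are left alone.
A sketch, assuming a standard MBR layout on both disks:

dd if=/dev/sda of=/dev/sdb bs=446 count=1   # copy boot code only, not the partition table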

Simon

Re: Raid failing, which command to remove the bad drive?

am 03.09.2011 20:45:10 von tlenz

On 9/3/2011 5:17 AM, Robin Hill wrote:
> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>
>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz wrote:
>>>
>>>>
>>>> How did you install Grub on the second drive? I have seen some
>>>> instructions on the web that would not allow the system to boot if the
>>>> first drive failed or was removed.
>>>>
>>>
>>>
>>> I think this is how I did it, at least it is what I had in my notes:
>>>
>>> grub-install /dev/sda && grub-install /dev/sdb
>>>
>>> And this is from my notes also. It was from an IRC chat. Don't know if it
>>> was the raid channel or the grub channel:
>>>
>>> [14:02] Vorg: No. First, what is the output of grub-install
>>> --version?
>>> [14:02] (GNU GRUB 1.98~20100115-1)
>>> [14:04] Vorg: Ok, then run "grub-install /dev/sda && grub-install
>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>
>>
>> Which is exactly my point. You installed grub on /dev/sdb such that it
>> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>> looking for its files on the non-existent /dev/sdb.
>>
> The way I do it is to run grub, then for each drive do:
> device (hd0) /dev/sdX
> root (hd0,0)
> setup (hd0)
>
> That should set up each drive to boot up as the first drive.
>
> Cheers,
> Robin


That is how I was trying to do it when I first set it up and was having
problems with it not working. The grub people said not to do it that way
because of a greater potential for problems.

The way I read the line I think I used, "&&" is used to put two commands
on one line, so it should have done both. But if I did that from
user vorg instead of user root, I would have needed sudo before both
grub-install commands. I can't remember now what I did.
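
For what it's worth, "&&" runs the second command only if the first one
succeeds, and sudo covers only the command it directly prefixes, so from a
normal user account each install needs its own sudo. A sketch, not
necessarily what was actually typed at the time:

# install grub's boot sector on both mirror members; stop if the first fails
sudo grub-install /dev/sda && sudo grub-install /dev/sdb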

The second drive is the one that died and was removed, but I guess if
sda wasn't bootable, it could have been booting off of sdb the whole time.

Re: Raid failing, which command to remove the bad drive?

on 05.09.2011 10:57:08 by CoolCold

On Sat, Sep 3, 2011 at 4:17 PM, Robin Hill wrote:
> The way I do it is to run grub, then for each drive do:
>    device (hd0) /dev/sdX
>    root (hd0,0)
>    setup (hd0)
>
> That should set up each drive to boot up as the first drive.
$me does the same way, it works.
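
For completeness, the same recipe as one scripted grub legacy session; the
(hd0,0) part assumes the boot files sit on the first partition of each disk,
so adjust it to the real layout:

# map each mirror member to (hd0) in turn, so the boot sector written on it
# loads its files from whichever disk the BIOS ends up booting first
grub --batch <<EOF
device (hd0) /dev/sda
root (hd0,0)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
EOF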

>
> Cheers,
>    Robin
> --
>    Robin Hill | Little Jim says .... "He fallen in de water !!"
>



--
Best regards,
[COOLCOLD-RIPN]

Re: Raid failing, which command to remove the bad drive?

on 09.09.2011 23:54:38 by Bill Davidsen

Timothy D. Lenz wrote:
>
>
> On 8/26/2011 3:45 PM, NeilBrown wrote:
>
> Got another problem. Removed the drive and tried to start it back up
> and now get Grub Error 2. I'm not sure if when I did the mirrors if
> something went wrong with installing grub on the second drive, or if
> is has to do with [U_] which points to sda in that report instead of
> [_U].
>
> I know I pulled the correct drive. I had it labeled sdb, it's the
> second drive in the bios bootup drive check and it's the second
> connector on the board. And when I put just it in instead of the
> other, I got the noise again. I think last time a drive failed it was
> one of these two drives because I remember recopying grub.
>
> I do have another computer setup the same way, that I could put this
> remaining drive on to get grub fixed, but it's a bit of a pain to get
> the other computer hooked back up and I will have to dig through my
> notes about getting grub setup without messing up the array and stuff.
> I do know that both computers have been updated to grub 2
>
I like to check the table of device names vs. make/model/serial number
before going too far. I get the output of blkdevtrk to be sure.
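
For anyone without that tool handy, the stock utilities give the same
device-name-to-serial mapping (the second command assumes smartmontools is
installed; /dev/sdb here is the suspect drive):

# udev's persistent names encode vendor/model/serial and link to the sdX node
ls -l /dev/disk/by-id/ | grep -v part

# read the model and serial number straight off the suspect drive
smartctl -i /dev/sdb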


--
Bill Davidsen
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010




Re: Raid failing, which command to remove the bad drive?

on 10.09.2011 00:01:30 by Bill Davidsen

Simon Matthews wrote:
> On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews
> wrote:
>
>> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill wrote:
>>
>>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>>
>>>
>>>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz wrote:
>>>>
>>>>>
>>>>>> How did you install Grub on the second drive? I have seen some
>>>>>> instructions on the web that would not allow the system to boot if the
>>>>>> first drive failed or was removed.
>>>>>>
>>>>>>
>>>>>
>>>>> I think this is how I did it, at least it is what I had in my notes:
>>>>>
>>>>> grub-install /dev/sda && grub-install /dev/sdb
>>>>>
>>>>> And this is from my notes also. It was from an IRC chat. Don't know if it
>>>>> was the raid channel or the grub channel:
>>>>>
>>>>> [14:02] Vorg: No. First, what is the output of grub-install
>>>>> --version?
>>>>> [14:02] (GNU GRUB 1.98~20100115-1)
>>>>> [14:04] Vorg: Ok, then run "grub-install /dev/sda && grub-install
>>>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>>>
>>>>>
>>>> Which is exactly my point. You installed grub on /dev/sdb such that it
>>>> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>>>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>>>> looking for its files on the non-existent /dev/sdb.
>>>>
>>>>
>>> The way I do it is to run grub, then for each drive do:
>>> device (hd0) /dev/sdX
>>> root (hd0,0)
>>> setup (hd0)
>>>
>>> That should set up each drive to boot up as the first drive.
>>>
>>>
>> How about (after installing grub on /dev/sda):
>> dd if=/dev/sda of=/dev/sdb bs=466 count=1
>>
> ooops, that should be bs=446, NOT bs=466
>

Which is why you use grub commands, because a typo can wipe out your
drive. It may or may not have done so in this case, but there's no reason
to do stuff like that.
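
To make the risk concrete: the partition table starts at byte 446, so the
bs=466 typo would have copied part of sda's partition table (the first entry
plus a few bytes of the second) over sdb's, which corrupts the table whenever
the two disks aren't partitioned identically. If raw-MBR copying is done at
all, saving the target's sector first makes a slip recoverable (the backup
path is just an example):

# keep a copy of sdb's current MBR before touching it
dd if=/dev/sdb of=/root/sdb-mbr.backup bs=512 count=1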

--
Bill Davidsen
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010




Re: Raid failing, which command to remove the bad drive?

on 12.09.2011 22:56:10 by tlenz

On 9/9/2011 3:01 PM, Bill Davidsen wrote:
> Simon Matthews wrote:
>> On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews
>> wrote:
>>> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill wrote:
>>>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>>>
>>>>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz
>>>>> wrote:
>>>>>>> How did you install Grub on the second drive? I have seen some
>>>>>>> instructions on the web that would not allow the system to boot
>>>>>>> if the
>>>>>>> first drive failed or was removed.
>>>>>>>
>>>>>>
>>>>>> I think this is how I did it, at least it is what I had in my notes:
>>>>>>
>>>>>> grub-install /dev/sda && grub-install /dev/sdb
>>>>>>
>>>>>> And this is from my notes also. It was from an IRC chat. Don't
>>>>>> know if it
>>>>>> was the raid channel or the grub channel:
>>>>>>
>>>>>> [14:02] Vorg: No. First, what is the output of grub-install
>>>>>> --version?
>>>>>> [14:02] (GNU GRUB 1.98~20100115-1)
>>>>>> [14:04] Vorg: Ok, then run "grub-install /dev/sda &&
>>>>>> grub-install
>>>>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>>>>
>>>>> Which is exactly my point. You installed grub on /dev/sdb such that it
>>>>> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>>>>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>>>>> looking for its files on the non-existent /dev/sdb.
>>>>>
>>>> The way I do it is to run grub, then for each drive do:
>>>> device (hd0) /dev/sdX
>>>> root (hd0,0)
>>>> setup (hd0)
>>>>
>>>> That should set up each drive to boot up as the first drive.
>>>>
>>> How about (after installing grub on /dev/sda):
>>> dd if=/dev/sda of=/dev/sdb bs=466 count=1
>> ooops, that should be bs=446, NOT bs=466
>
> Which is why you use grub commands, because a typo can wipe out your
> drive. May or may not have in this case, but there's no reason to do
> stuff like that.
>

Found the problem:

[13:06] Vorg: That error is from grub legacy.
[13:08] Vorg: Grub2 doesn't use error numbers. "grub error 2"
is from grub legacy.

I had updated the boot drives to the new Grub. Checked in the BIOS and it
was set to boot from SATA 3, then SATA 4 AND THEN SATA 1 :(. The second
pair are data drives and were never meant to have grub.
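
A quick way to see which of the four disks actually carry boot code before
trusting the boot order again (a crude check, but the embedded string is
usually enough to spot either grub flavour; this assumes the disks are still
sda through sdd):

# dump each disk's MBR and look for a GRUB signature string
for d in /dev/sd[abcd]; do
    echo -n "$d: "
    dd if=$d bs=512 count=1 2>/dev/null | strings | grep -i -m1 grub || echo "no GRUB string"
done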