problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver


on 21.10.2010 09:31:59 by Louis-David Mitterrand

Hi,

I am setting up a new Dell T610 server with 8 WD Caviar Black SATA3 1TB
disks on an LSISAS2008 controller:

Oct 21 09:12:37 grml kernel: [ 83.377388] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)

My layout is as follows:

- small unencrypted raid1 boot partition on /dev/md0

- dm-crypt main partition on /dev/md1 (actually /dev/mapper/cmd1)

A recent grml64 is used to create the partitions, install the system and
run lilo.

When running lilo I get these errors from the controller:

Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!
Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!
Oct 21 08:57:12 grml kernel: [40833.209839] mpt2sas0: diag reset: SUCCESS
Oct 21 08:57:12 grml ata_id[3570]: HDIO_GET_IDENTITY failed for '/dev/sde'
Oct 21 08:57:13 grml kernel: [40833.407078] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
Oct 21 08:57:13 grml kernel: [40833.407084] mpt2sas0: Dell PERC H200 Integrated: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1E)
Oct 21 08:57:13 grml kernel: [40833.407087] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)

etc., and then all disks are kicked out of /dev/md1:

Oct 21 08:57:20 grml kernel: [40840.361581] md: super_written gets error=-5, uptodate=0
Oct 21 08:57:20 grml kernel: [40840.361586] md/raid:md1: Disk failure on sdd2, disabling device.
Oct 21 08:57:20 grml kernel: [40840.361587] <1>md/raid:md1: Operation continuing on 0 devices.

But on another attempt /dev/md0 was stopped as well.
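
[Editor's note] For anyone triaging a similar log, the controller faults and the md failures that follow them can be pulled out of a syslog file with a short script. This is only a minimal Python sketch and was not part of the original report. The function name `scan_log` and both regular expressions are assumptions derived solely from the log lines quoted above, and they do not cover every message mpt2sas or md can emit.

```python
import re

# Patterns inferred from the log lines quoted in this message (assumptions,
# not an exhaustive list of mpt2sas or md messages).
FAULT_RE = re.compile(r"(mpt2sas\d+): fault_state\((0x[0-9a-fA-F]+)\)")
MD_FAIL_RE = re.compile(r"md/raid:(md\d+): Disk failure on (\w+), disabling device")


def scan_log(lines):
    """Return (controller faults, md disk failures) found in syslog lines."""
    faults = []    # (controller, fault code) pairs
    failures = []  # (array, member device) pairs
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((m.group(1), m.group(2)))
        m = MD_FAIL_RE.search(line)
        if m:
            failures.append((m.group(1), m.group(2)))
    return faults, failures


# Two lines copied from the report above.
sample = [
    "Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!",
    "Oct 21 08:57:20 grml kernel: [40840.361586] md/raid:md1: Disk failure on sdd2, disabling device.",
]
faults, failures = scan_log(sample)
print(faults)
print(failures)
```

To scan a real file, pass an open file object instead of the sample list, e.g. `scan_log(open(path))`, where `path` is wherever your distribution writes kernel messages.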

Any suggestion on fixing that problem would be welcome. I can send more
complete logs.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver

on 21.10.2010 13:08:51 by Tim Small

On 21/10/10 08:31, Louis-David Mitterrand wrote:
> Hi,
>
> I am setting up a new Dell T610 server with 8 WD Black Caviar sata3 1TB
> disks on a LSISAS2008 controller:
>
> Oct 21 09:12:37 grml kernel: [ 83.377388] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
>
> My layout is as follows:
>
> - small un-encrypted raid1 boot partition on /dev/md0
>
> - dm-crypt main partition on /dev/md1 (actually /dev/mapper/cmd1)
>
> A recent grml64 is used to create the partitions, install the system and
> run lilo.
>
> When running lilo I get these errors from the controller:
>
> Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!
> Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!
>


> Any suggestion on fixing that problem would be welcome. I can send more
> complete logs.
>

Looks like a firmware bug - do you have the latest firmware? Drive
firmwares? Anything in the drive error logs (using smartctl)?

If not, then try opening a bug on the kernel bugzilla - LSI engineers
read that (and sometimes even fix things).
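
[Editor's note] Checking the drive error logs, as suggested above, could be scripted along the following lines. This is a hedged Python sketch, not something from the thread: the helper name `smartctl_error_cmds` is made up, and the device names /dev/sda to /dev/sdh are an assumption for an 8-disk box (the thread itself only mentions sdd and sde). The only smartctl option used is `-l error`, which prints the drive's error log.

```python
import string


def smartctl_error_cmds(n_disks=8):
    """Build one 'smartctl -l error' command per disk, /dev/sda onwards.

    Device naming is an assumption; adjust it to the actual system.
    """
    devices = ["/dev/sd" + letter for letter in string.ascii_lowercase[:n_disks]]
    return [["smartctl", "-l", "error", dev] for dev in devices]


# Print the commands rather than running them, so the sketch works
# on a machine without smartmontools installed.
for cmd in smartctl_error_cmds():
    print(" ".join(cmd))
```

To actually collect the logs, each command list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` as root.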

Otherwise, you could try replacing it with a straight SATA controller, if
that box doesn't have a SAS backplane - I've not been too impressed by
the quality of engineering for LSI controllers, and SATA-on-SAS in
general hasn't been very reliable IMO. Just go for a well supported
SATA controller (e.g. Sil 3132 etc.).

Tim.


--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309


Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver

on 21.10.2010 14:50:25 by Louis-David Mitterrand

On Thu, Oct 21, 2010 at 12:08:51PM +0100, Tim Small wrote:
>
> >Any suggestion on fixing that problem would be welcome. I can send more
> >complete logs.
>
> Looks like a firmware bug - do you have the latest firmware? Drive
> firmwares? Anything in the drive error logs (using smartctl)?
>
> If not, then try opening a bug on the kernel bugzilla - LSI
> engineers read that (and sometimes even fix things).
>
> Otherwise, you could try replacing it with a straight SATA controller,
> if that box doesn't have a SAS backplane - I've not been too
> impressed by the quality of engineering for LSI controllers, and
> SATA-on-SAS in general hasn't been very reliable IMO. Just go for a
> well supported SATA controller (e.g. Sil 3132 etc.).

Hi Tim and thanks for your feedback.

I was eventually able to "fix" the problem. After very carefully running
lilo on each disk with "raid-extra-boot=/dev/sdX" (instead of "mbr") I
rebooted into my live system with a freshly compiled 2.6.36 and the
problem vanished. lilo now runs fine even with "raid-extra-boot=mbr" and
several reboots have not triggered any further issue.

The firmware is at the latest version everywhere, so I guess the mpt2sas kernel
driver must have been improved between 2.6.35 and 2.6.36.
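
[Editor's note] The claim that only the kernel changed can be checked against the logs in this thread: the controller banner from the failing grml boot and the one from the 2.6.36 boot should report the same firmware. The sketch below is a hedged Python illustration, not part of the original mail. The helper name `controller_versions` and the regular expression are assumptions based on the banner format quoted in this thread.

```python
import re

# Matches the mpt2sas controller banner as it appears in this thread.
FW_RE = re.compile(
    r"FWVersion\(([\d.]+)\), ChipRevision\((0x[0-9a-fA-F]+)\), BiosVersion\(([\d.]+)\)"
)


def controller_versions(line):
    """Extract firmware, chip revision and BIOS version from a banner line."""
    m = FW_RE.search(line)
    if m is None:
        return None
    return {"firmware": m.group(1), "chip": m.group(2), "bios": m.group(3)}


# Banner from the failing boot (grml) and from the 2.6.36 boot (zenon).
before = ("Oct 21 08:57:13 grml kernel: [40833.407078] mpt2sas0: LSISAS2008: "
          "FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)")
after = ("Oct 21 14:25:47 zenon kernel: mpt2sas0: LSISAS2008: "
         "FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)")

# Prints whether the two boots report identical controller versions.
print(controller_versions(before) == controller_versions(after))
```

Identical versions on both boots are consistent with the guess that the driver, not the firmware, made the difference, although they do not prove it.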

For info here is part of the 2.6.36 boot log with a few ominous "!!" and
one "failure" but with no apparent consequence.

Cheers,

Oct 21 14:25:47 zenon kernel: mpt2sas version 06.100.00.00 loaded
Oct 21 14:25:47 zenon kernel: scsi0 : Fusion MPT SAS Host
Oct 21 14:25:47 zenon kernel: mpt2sas 0000:02:00.0: PCI INT A -> GSI 41 (level, low) -> IRQ 41
Oct 21 14:25:47 zenon kernel: mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16426776 kB)
Oct 21 14:25:47 zenon kernel: mpt2sas0: IO-APIC enabled: IRQ 41
Oct 21 14:25:47 zenon kernel: mpt2sas0: iomem(0x00000000df2b0000), mapped(0xffffc90000060000), size(65536)
Oct 21 14:25:47 zenon kernel: mpt2sas0: ioport(0x000000000000fc00), size(256)
Oct 21 14:25:47 zenon kernel: mpt2sas0: sending diag reset !!
Oct 21 14:25:47 zenon kernel: mpt2sas0: diag reset: SUCCESS
Oct 21 14:25:47 zenon kernel: mpt2sas0: Allocated physical memory: size(1091 kB)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Current Controller Queue Depth(467), Max Controller Queue Depth(3439)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Scatter Gather Elements per IO(128)
Oct 21 14:25:47 zenon kernel: mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Dell PERC H200 Integrated: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1E)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
Oct 21 14:25:47 zenon kernel: mpt2sas0: sending port enable !!
Oct 21 14:25:47 zenon kernel: mpt2sas0: host_add: handle(0x0001), sas_addr(0x5842b2b05020c600), phys(8)
Oct 21 14:25:47 zenon kernel: mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:4546/_scsih_add_device( )!
Oct 21 14:25:47 zenon kernel: mpt2sas0: port enable: SUCCESS
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD1002FAEX-0 1D05 PQ: 0 ANSI: 5
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: SATA: handle(0x0011), sas_addr(0x4433221107000000), phy(7), device_name(0x4ee25001c38204eb)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: SATA: enclosure_logical_id(0x5842b2b05020c600), slot(0)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: qdepth(32), tagged(1), simple(1), ordered(0), scsi_level(6), cmd_que(1)

etc..

Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver

on 22.10.2010 07:25:13 by unknown

On 21.10.2010 13:08, Tim Small wrote:
> Looks like a firmware bug - do you have the latest firmware? Drive
> firmwares? Anything in the drive error logs (using smartctl)?
>
> If not, then try opening a bug on the kernel bugzilla - LSI engineers
> read that (and sometimes even fix things).
>
> Otherwise, you could try replacing it with a straight SATA controller, if
> that box doesn't have a SAS backplane - I've not been too impressed by
> the quality of engineering for LSI controllers, and SATA-on-SAS in
> general hasn't been very reliable IMO. Just go for a well supported
> SATA controller (e.g. Sil 3132 etc.).
>
> Tim.
>
>
I'll have to object on the matter of SATA drives on SAS controllers. We
use 3ware/LSI 9650, 9690 and 9750 controllers a lot and have rarely had
any problems. The problems we did encounter came from hardware failures.
As for the LSISAS2008, it's good to hear that most problems got fixed with
later kernels. As we are trying to get our lower-cost storage systems
running on this controller (onboard a Supermicro MB), this shows which
way to go... Thank you for this information!

Stefan