Problems after last update (with CentOS 5.6)

#1 Tue, 04/12/2011 - 09:20
Blueforce

Hi,

After running yum update yesterday, which also updated CentOS 5.5 to 5.6, I get an error in the log files referring to our RAID controller, a 3ware Inc 9650SE (Model: 9650SE-2LP DISK). It reports a disk/RAID failure!

kernel: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

kernel: sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x2a) timed out, resetting card.

kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.

kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.

kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0.

kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=0.

kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0.
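The AEN messages above share a regular shape (`3w-9xxx: scsiN: AEN: SEVERITY (0xAA:0xBBBB): description`), so the ERROR-level events can be pulled out of a long syslog automatically. A minimal sketch, assuming the lines come from a syslog file such as `/var/log/messages` (the path and the helper names `parse_aen_events` and `errors_only` are hypothetical). The `WARNING: ... resetting card` lines use a different format and are deliberately not matched by this pattern.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the 3w-9xxx AEN format seen in the kernel log, e.g.
# "kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0."
AEN_RE = re.compile(
    r"3w-9xxx: (?P<host>scsi\d+): AEN: (?P<severity>\w+) "
    r"\((?P<code>0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+)\): (?P<message>.*)$"
)


class AenEvent(NamedTuple):
    host: str
    severity: str
    code: str
    message: str


def parse_aen_events(lines: Iterable[str]) -> List[AenEvent]:
    """Return every 3w-9xxx AEN event found in the given log lines."""
    events = []
    for line in lines:
        match = AEN_RE.search(line.rstrip("\n"))
        if match:
            events.append(AenEvent(**match.groupdict()))
    return events


def errors_only(events: Iterable[AenEvent]) -> List[AenEvent]:
    """Keep only ERROR-severity events (drive timeouts, degraded units, ...)."""
    return [e for e in events if e.severity == "ERROR"]


if __name__ == "__main__":
    sample = [
        "kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.",
        "kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0.",
        "kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=0.",
    ]
    for event in errors_only(parse_aen_events(sample)):
        print(event.code, event.message)
```

Grouping the ERROR events by the `port=` value in the message would show at a glance whether the timeouts always point at the same physical drive (port 0 in the log above).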

And the "SMART Drive Status" page in Webmin doesn’t pick up any status anymore; it just returns "No IDE or SCSI drives were found on your system." Before, I could see info about the RAID array and also about the disks in the array.

The box is (now) running CentOS 5.6 (Linux 2.6.18-238.5.1.el5 on x86_64), Webmin 1.540 and Virtualmin 3.84 Pro.

If you have any suggestion or thoughts about this, please let me know. Thanks!

Best regards, Leffe (Blueforce)

Tue, 04/12/2011 - 11:30
andreychek

I'm not sure why Webmin isn't seeing the status, but the issue with your RAID setup is definitely troubling :-)

It looks like one of the drives in your RAID may have failed.

I suspect that particular problem isn't related specifically to the CentOS update... it's probably more that the update taxed the disks. If one of them was getting ready to go, performing the update may have pushed it over the edge.

The job of the RAID controller is to hide the individual disks from the OS, so you may not be able to use the SMART tools on one specific disk so long as it's plugged into the RAID controller.

You may need to pull that one disk from the controller, and likely just replace it. You could certainly test it with the SMART tools after pulling it, but errors like the above aren't good :-)
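That said, smartmontools can often reach individual disks behind a 3ware card through the controller's character device, using smartctl's `-d 3ware,N` device type, where N is the port number. A minimal sketch that builds those commands, assuming a 9000-series card on the 3w-9xxx driver exposed as `/dev/twa0` (the device name for this particular box and the helper names `smartctl_command` and `query_ports` are assumptions):

```python
import subprocess
from typing import Dict, List


def smartctl_command(port: int, device: str = "/dev/twa0") -> List[str]:
    """Build a smartctl command that reads SMART data for one disk behind a 3ware card.

    `-d 3ware,N` tells smartctl to address the physical drive on controller
    port N via the controller device, rather than the RAID unit the OS sees.
    """
    if port < 0:
        raise ValueError("port must be non-negative")
    return ["smartctl", "-a", "-d", f"3ware,{port}", device]


def query_ports(ports: List[int], device: str = "/dev/twa0") -> Dict[int, str]:
    """Run smartctl for each port and return its stdout keyed by port (needs root)."""
    results = {}
    for port in ports:
        proc = subprocess.run(smartctl_command(port, device),
                              capture_output=True, text=True)
        results[port] = proc.stdout
    return results


if __name__ == "__main__":
    # A 9650SE-2LP has two ports, so ports 0 and 1 cover both disks.
    for port in (0, 1):
        print(" ".join(smartctl_command(port)))
```

Pulling the disk is still the definitive test, but this lets you compare both drives without taking the array offline.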

-Eric

Tue, 04/12/2011 - 11:44
ronald

You may have seen this thread already, but just in case, here is a similar issue:
http://us.generation-nt.com/answer/3w-9xxx-scsi0-warning-0x06-0x0037-cha...

Tue, 04/12/2011 - 12:10 (Reply to #3)
Blueforce

Hi Ronald,

Thanks for your feedback!

I can run "smartctl" from the command line and get the disk status; it reports this for the drives:

DISK 0

SMART overall-health self-assessment test result: PASSED

Error 2 occurred at disk power-on lifetime: 6255 hours (260 days + 15 hours)

DISK 1

SMART overall-health self-assessment test result: PASSED

No Errors Logged
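Comparing reports like these is easier when the relevant lines are extracted automatically. Note that PASSED is only the drive's own overall self-assessment; DISK 0 passes while still carrying an entry in its error log. A minimal sketch, assuming the input is the text printed by `smartctl -a` (the helper name `summarize_smart` is hypothetical), that pulls out the health verdict and the first logged error line:

```python
import re
from typing import NamedTuple, Optional


class SmartSummary(NamedTuple):
    health: Optional[str]            # e.g. "PASSED"; None if the line is missing
    errors_logged: bool              # True if any "Error N occurred at ..." line exists
    first_error_line: Optional[str]  # first such line in the output, if any


HEALTH_RE = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")
# One error-log entry per line; [ \t]* allows indentation without crossing lines.
ERROR_RE = re.compile(r"^[ \t]*(Error \d+ occurred at .*)$", re.MULTILINE)


def summarize_smart(output: str) -> SmartSummary:
    """Extract the overall-health verdict and error-log status from smartctl output."""
    health_match = HEALTH_RE.search(output)
    error_match = ERROR_RE.search(output)
    return SmartSummary(
        health=health_match.group(1) if health_match else None,
        errors_logged=error_match is not None,
        first_error_line=error_match.group(1) if error_match else None,
    )


if __name__ == "__main__":
    disk0 = (
        "SMART overall-health self-assessment test result: PASSED\n\n"
        "Error 2 occurred at disk power-on lifetime: 6255 hours (260 days + 15 hours)\n"
    )
    print(summarize_smart(disk0))
```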

And I'm almost sure I could get both the array status and the disk status from the "SMART Drive Status" page before.

The box is only 9 months old and the disks have been running fine the whole time without any errors. And yes, a disk can fail at any time, but the strange thing is that SMART reports both drives as PASSED! And why can I no longer use the "SMART Drive Status" page??

I will read the thread you posted - Thanks!

Best regards, Leffe

Tue, 04/12/2011 - 17:01 (Reply to #4)
Blueforce

Jamie solved our issue with SMART Drive Status not returning any information.

The SMART Drive Status module needs tw_cli to be installed. I installed it, and now SMART Drive Status is working again.
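Since the module silently depends on tw_cli being present, a quick check can save some head-scratching. A minimal sketch, assuming tw_cli is installed somewhere on `PATH` and the card is controller 0 (the helper names `find_tw_cli` and `controller_summary_command` are hypothetical; `/c0 show` is tw_cli's controller summary command):

```python
import shutil
import subprocess
from typing import List, Optional


def find_tw_cli() -> Optional[str]:
    """Return the full path to tw_cli if it is on PATH, else None."""
    return shutil.which("tw_cli")


def controller_summary_command(controller: int = 0) -> List[str]:
    """Build the tw_cli command that summarizes one controller (units and ports)."""
    return ["tw_cli", f"/c{controller}", "show"]


if __name__ == "__main__":
    path = find_tw_cli()
    if path is None:
        print("tw_cli not found; SMART Drive Status may not see the 3ware array")
    else:
        # Usually needs root; prints unit status and per-port drive status.
        result = subprocess.run(controller_summary_command(),
                                capture_output=True, text=True)
        print(result.stdout)
```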

/Leif
