sata

Vista + Loose SATA cable = Bad news

So... Here's what I have been doing during the past 5 hours:

I was noticing some odd behavior from my PC, which runs Fedora Linux as its primary OS but also has Windows Vista installed for gaming. The machine would boot without any indication of trouble, but once it had been up & running for about 5 minutes, the system would hang and the hard disk activity light on the case would stay permanently on. A soft reboot wouldn't fix the problem either - a complete shutdown was required. At first I thought it was an OS problem, so I rebooted into Vista but found it was affected too. I immediately thought, "hardware". I tried leaving the computer alone for an hour to see if it would eventually come out of the freeze, but it clearly wasn't doing anything with the disk: the system remained frozen and I could not hear the disk heads moving (and on a 10K RPM drive, those are pretty loud). I ran memtest86+ and did a 3 minute S.M.A.R.T. self-test on /dev/sda in Fedora, but oddly enough both came up clean.

Since my hardware seemed OK, I powered down the PC, opened the case and checked for loose cables. Sure enough, the problem was the SATA cable connecting my motherboard to my hard disk. After disconnecting it, blowing off some excess dust and reconnecting it, everything was fine. But that's not where the story ends.

By the time I had reproduced the problem, tested the RAM & hard disk and reconnected the SATA cable, I had done about 15 power cycles. Linux handled the whole situation pretty gracefully - it logged the specific SATA errors (Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK) and put the root filesystem into read-only mode. After reconnecting the cable, Fedora was up and running as if nothing had happened (it did do an automatic fsck upon booting, but the check came up clean). Vista, on the other hand, didn't take it so well - it informed me at startup that I needed to run CHKDSK, so I let it repair C:\ and it orphaned thousands and thousands of files... After CHKDSK completed I was (surprisingly) able to boot up, but many programs - including explorer.exe - were crashing. Judging by the number of orphaned files, I'm guessing that quite a few system files were missing or corrupted.
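If you want to confirm that Linux really has flipped a filesystem to read-only after errors like these, the mount options in /proc/mounts show it. Here is a minimal Python sketch of that check; the function names are my own, and it assumes the usual /proc/mounts layout of device, mount point, filesystem type and comma-separated options.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the options of the last entry mounted at mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if the filesystem at mount_point is mounted with the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

On a live system you would feed it the real file, for example `is_read_only(open("/proc/mounts").read())`.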

So, long story short, if you have any SATA problems and Vista starts orphaning tons of files during CHKDSK, save yourself some time by canceling the CHKDSK and making sure you have your Vista installation DVD handy.


NexStar3 eSATA+USB2.0 Enclosure

I recently bought a Vantec NexStar3 eSATA+USB2.0 enclosure plus a Western Digital 500GB SATA2 drive to go inside it. The enclosure and the drive are great, but when I put the two together I was having some really, really annoying problems.

My BIOS detected the drive fine in either AHCI or IDE mode, and I could boot into Linux (I tried Fedora, Ubuntu and PCLinuxOS). The drive was detected by the OS too, but it wasn't usable:

Aug 15 23:32:56 LinuxBox kernel: ata7.00: exception Emask 0x10 SAct 0xff SErr 0x780100 action 0x2
Aug 15 23:32:56 LinuxBox kernel: ata7.00: irq_stat 0x08000000
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/20:00:af:96:bc/00:00:05:00:00/40 tag 0 cdb 0x0 data 16384 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/80:08:d0:86:cc/00:00:05:00:00/40 tag 1 cdb 0x0 data 65536 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/7f:10:50:87:cc/00:00:05:00:00/40 tag 2 cdb 0x0 data 65024 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/08:18:e7:87:cc/00:00:05:00:00/40 tag 3 cdb 0x0 data 4096 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/07:20:66:7b:4c/00:00:12:00:00/40 tag 4 cdb 0x0 data 3584 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/02:28:e3:70:cc/00:00:1e:00:00/40 tag 5 cdb 0x0 data 1024 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/07:30:79:b1:d4/00:00:01:00:00/40 tag 6 cdb 0x0 data 3584 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/06:38:e5:70:cc/00:00:1e:00:00/40 tag 7 cdb 0x0 data 3072 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:3c:e5:70:cc/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7: soft resetting port
Aug 15 23:32:56 LinuxBox kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: configured for UDMA/133
Aug 15 23:32:56 LinuxBox kernel: ata7: EH complete
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] Write Protect is off
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 15 23:32:56 LinuxBox kernel: scsi 6:0:0:0: Direct-Access     DMI      WD1600BB-00GUC0  3.52 PQ: 0 ANSI: 0
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] Write Protect is off
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] Assuming drive cache: write through
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] Write Protect is off
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] Assuming drive cache: write through
Aug 15 23:32:56 LinuxBox kernel:  sdc: sdc1 sdc2 < sdc5 sdc6 > sdc3
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
Aug 15 23:32:56 LinuxBox kernel: sd 6:0:0:0: Attached scsi generic sg3 type 0
Aug 15 23:32:56 LinuxBox kernel: NET: Registered protocol family 10
Aug 15 23:32:56 LinuxBox kernel: lo: Disabled Privacy Extensions
Aug 15 23:32:56 LinuxBox kernel: floppy0: no floppy controllers found
Aug 15 23:32:56 LinuxBox kernel: No dock devices found.
Aug 15 23:32:56 LinuxBox kernel: device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
Aug 15 23:32:56 LinuxBox kernel: device-mapper: multipath: version 1.0.5 loaded
Aug 15 23:32:56 LinuxBox kernel: EXT3 FS on sda1, internal journal
Aug 15 23:32:56 LinuxBox kernel: ata7.00: exception Emask 0x10 SAct 0x2 SErr 0x580100 action 0x2
Aug 15 23:32:56 LinuxBox kernel: ata7.00: irq_stat 0x08000000
Aug 15 23:32:56 LinuxBox kernel: ata7.00: cmd 60/80:08:27:97:bc/00:00:05:00:00/40 tag 1 cdb 0x0 data 65536 in
Aug 15 23:32:56 LinuxBox kernel:          res 40/00:0c:27:97:bc/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
Aug 15 23:32:56 LinuxBox kernel: ata7: soft resetting port
Aug 15 23:32:56 LinuxBox kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 15 23:32:56 LinuxBox kernel: ata7.00: configured for UDMA/133
Aug 15 23:32:56 LinuxBox kernel: ata7: EH complete
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] Write Protect is off
Aug 15 23:32:56 LinuxBox kernel: sd 7:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

After a few minutes and tons of messages like that, a "soft reset" and "EH complete" would occur, and only then would a tiny bit of data get through. Right after that, the link froze up again with tons more errors.
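With this much noise in the log, it helps to count what each port is doing instead of reading every line. The following is a rough Python sketch, assuming syslog-style lines like the ones pasted above; the function name and the counter labels are hypothetical, not anything libata defines.

```python
import re
from collections import Counter

# Matches the "ataN:" or "ataN.00:" prefix seen in the kernel messages.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")
SPEED_RE = re.compile(r"SATA link up (\S+ Gbps)")


def summarize_ata_events(log_lines):
    """Count exceptions, soft resets and link-up speeds per ATA port."""
    summary = {}
    for line in log_lines:
        match = PORT_RE.search(line)
        if not match:
            continue  # not an ataN message (e.g. sd, res or unrelated lines)
        port, message = match.groups()
        counts = summary.setdefault(port, Counter())
        if message.startswith("exception"):
            counts["exceptions"] += 1
        elif message.startswith("soft resetting port"):
            counts["soft resets"] += 1
        else:
            speed = SPEED_RE.match(message)
            if speed:
                counts["link up " + speed.group(1)] += 1
    return summary
```

Running it over /var/log/messages (or wherever your distro keeps kernel messages) gives one counter per port, which makes it easy to see whether the link ever came up at anything other than 3.0 Gbps.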

So I googled, changed ports, swapped SATA cables and tried different eSATA cables, but nothing changed. I tried the drive internally and it was perfectly fine, operating at the full 3.0Gbps SATA II speed. After hours of searching I learned this:

  • The ICH8 SATA controller on the GA-965P-S3 (rev 1.0) doesn't actually support AHCI. Later revisions of the ICH8, for example the ones on the GA-965P-S3 rev 3.3, do though. /me grumbles...
  • The one time the drive did work without the link failing was when it was reset enough times and auto-negotiated the link down to 1.5Gbps - Perhaps electrical interference was the problem.
  • All SATA II Western Digital drives have a manual override for the SATA link speed using the jumpers: putting a jumper on the second column of pins will limit the drive to 1.5Gbps (regular SATA speed)

As you can probably guess, the first point is just a major annoyance and the last two solved the problem. My drive now boots up with a 1.5Gbps link, so there's no need to let it fail for a while before using it. As for the enclosure, it's great value - I got it for $40 CAD. It features SATA connectors inside and, as mentioned above, can be plugged in via USB or eSATA. Vantec even supplies you with an eSATA cable and a SATA --> eSATA bracket!
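To check that the jumper took effect, look at the "SATA link up" line in the kernel log. The SStatus value packs the negotiated speed into one of its nibbles. Below is a small Python sketch of the decoding. It assumes the kernel prints SStatus in hex and that the register uses the standard SATA layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11); the function names are my own. The "113" value in the usage note is what I would expect on a 1.5Gbps link, not something taken from the log above.

```python
# SPD field values from the SATA SStatus register layout (assumed here).
SPEEDS = {0: "no negotiated speed", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_sstatus(hex_text):
    """Split an SStatus value, given as a hex string, into its three fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,          # device detection / PHY state
        "spd": (value >> 4) & 0xF,   # negotiated interface speed
        "ipm": (value >> 8) & 0xF,   # interface power management state
    }


def link_speed(hex_text):
    """Return a human-readable link speed for an SStatus hex string."""
    return SPEEDS.get(decode_sstatus(hex_text)["spd"], "unknown")
```

For the "SStatus 123" in the log above, `link_speed("123")` gives "3.0 Gbps"; a jumpered drive should report something like "113", which decodes to "1.5 Gbps".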
