SATA, SW RAID, unstable system

Roman Paral roman.paral at FSV.CVUT.CZ
Tue Jan 31 10:18:23 CET 2006


Hello,

I could use a bit of advice.

I have a server with roughly the following configuration:
mb: Asus NCCH-DR, 2x CPU Xeon 3.4 GHz
disk: 2x WDC WD740GD SATA
controller: IDE interface: Intel Corp. 6300ESB SATA Storage Controller (rev 02)

Debian stable is installed, with a custom-compiled kernel 2.6.15 #1 SMP.

SATA-related options in the kernel config:

CONFIG_SCSI_SATA=y
CONFIG_SCSI_SATA_AHCI=y
CONFIG_SCSI_ATA_PIIX=y
CONFIG_SCSI_SATA_INTEL_COMBINED=y

SW RAID 1 is set up on all disk partitions.

Under disk access the machine started to freeze; after each reset and array
rebuild it stayed up a little longer.
So I had the whole disk read (dd if=/dev/sda of=/dev/null..), but that led
to the state shown in the /var/log/messages excerpt below.
The /proc/mdstat array status and the partial smartctl output were taken
after the reset.

Could this be a problem with the SATA driver, something misconfigured
somewhere, or even a HW problem with the disk? Do you have any experience
with this?

Thank you very much, Roman Paral


...
Jan  9 22:41:38 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan  9 22:41:38 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan  9 22:41:38 wifi2 last message repeated 2 times
Jan  9 22:42:08 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan  9 22:42:08 wifi2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Jan  9 22:42:08 wifi2 kernel: sda: Current: sense key=0xb
Jan  9 22:42:08 wifi2 kernel:     ASC=0x47 ASCQ=0x0
Jan  9 22:42:08 wifi2 kernel: end_request: I/O error, dev sda, sector 35150028
Jan  9 22:42:08 wifi2 kernel: ^IOperation continuing on 1 devices
Jan  9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan  9 22:42:08 wifi2 kernel: RAID1 conf printout:
Jan  9 22:42:08 wifi2 kernel:  --- wd:1 rd:2
Jan  9 22:42:08 wifi2 kernel:  disk 0, wo:0, o:1, dev:sdb3
Jan  9 22:42:08 wifi2 kernel:  disk 1, wo:1, o:0, dev:sda3
Jan  9 22:42:08 wifi2 kernel: RAID1 conf printout:
Jan  9 22:42:08 wifi2 kernel:  --- wd:1 rd:2
Jan  9 22:42:08 wifi2 kernel:  disk 0, wo:0, o:1, dev:sdb3
Jan  9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan  9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan  9 22:42:38 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan  9 22:42:38 wifi2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Jan  9 22:42:38 wifi2 kernel: sda: Current: sense key=0xb
Jan  9 22:42:38 wifi2 kernel:     ASC=0x47 ASCQ=0x0
Jan  9 22:42:38 wifi2 kernel: end_request: I/O error, dev sda, sector 8070134
Jan  9 22:42:38 wifi2 kernel: ^IOperation continuing on 1 devices
Jan  9 22:42:38 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan  9 22:42:38 wifi2 last message repeated 2 times
Jan  9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan  9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan  9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan  9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan  9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan  9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan  9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan  9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan  9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan  9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan  9 22:42:42 wifi2 kernel: RAID1 conf printout:
Jan  9 22:42:42 wifi2 kernel:  --- wd:1 rd:2
Jan  9 22:42:42 wifi2 kernel:  disk 0, wo:0, o:1, dev:sdb2
Jan  9 22:42:42 wifi2 kernel:  disk 1, wo:1, o:0, dev:sda2
Jan  9 22:42:42 wifi2 kernel: RAID1 conf printout:
Jan  9 22:42:42 wifi2 kernel:  --- wd:1 rd:2
Jan  9 22:42:42 wifi2 kernel:  disk 0, wo:0, o:1, dev:sdb2
...
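
As a reading aid for the excerpt above, here is a minimal sketch of how the
ATA status byte maps to the flag names the kernel prints. The bit masks are
the standard ATA status register bits; the function name decode_status is
made up for illustration, and the "only Busy is reported when the busy bit
is set" rule is an assumption chosen to match the log lines, not a copy of
the kernel code.

```python
# Standard ATA status register bits, labelled with the names seen in the log.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the flag names for an ATA status byte (hypothetical helper)."""
    # Assumption: while the busy bit is set the other bits are not meaningful,
    # so only "Busy" is reported, as in "status=0xd0 { Busy }".
    if status & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0xD0, 0x71):
        print(hex(value), "{", " ".join(decode_status(value)), "}")
```

Both status values from the log decode to the same flag lists the kernel
printed next to them.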


cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[0]
      3903680 blocks [2/1] [U_]

md2 : active raid1 sdb3[0]
      9767424 blocks [2/1] [U_]

md3 : active raid1 sdb5[0] sda5[1]
      45311232 blocks [2/2] [UU]

md4 : active raid1 sdb6[0] sda6[1]
      3903680 blocks [2/2] [UU]

md5 : active raid1 sdb7[0] sda7[1]
      5815424 blocks [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
      3903680 blocks [2/2] [UU]
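
The degraded arrays in the listing above are the ones with an underscore in
the last bracket ([U_] instead of [UU]). A small sketch for picking them out
automatically, assuming the mdstat layout shown here; the function name
degraded_arrays and the SAMPLE text are illustrative only.

```python
import re

# Three arrays copied from the listing above, used as sample input.
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdb2[0]
      3903680 blocks [2/1] [U_]

md2 : active raid1 sdb3[0]
      9767424 blocks [2/1] [U_]

md3 : active raid1 sdb5[0] sda5[1]
      45311232 blocks [2/2] [UU]
"""


def degraded_arrays(mdstat_text):
    """Map array name -> member status string for arrays missing a member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            if "_" in status.group(1):
                result[current] = status.group(1)
            current = None
    return result


if __name__ == "__main__":
    # On a live system the input would be open("/proc/mdstat").read().
    print(degraded_arrays(SAMPLE))
```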


smartctl -a -d ata /dev/sda

...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   120   112   021    Pre-fail  Always       -       4500
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       12
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       353
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   126   113   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   189   189   000    Old_age   Always       -       11
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0
...

smartctl -a -d ata /dev/sdb

...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   127   118   021    Pre-fail  Always       -       4183
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       43
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       369
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43
194 Temperature_Celsius     0x0022   128   113   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0
...
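
To compare the two SMART listings above more quickly, one could filter for
non-zero raw values in the reallocation, pending-sector and CRC counters.
This is only a sketch: the attribute selection (IDs 5, 196, 197, 198, 199)
is my own choice, the helper name nonzero_watched is hypothetical, and it
takes the raw value as the last field on each line.

```python
# Attribute IDs to watch; this selection is an assumption, not a standard.
WATCHED = {"5", "196", "197", "198", "199"}

# A few rows from the /dev/sda listing above, used as sample input.
SAMPLE_SDA = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always -       0
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always -       12
196 Reallocated_Event_Count 0x0032   189   189   000    Old_age   Always -       11
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always -       0
"""


def nonzero_watched(smart_table, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw != 0."""
    found = {}
    for line in smart_table.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] not in watched:
            continue
        raw = fields[-1]  # raw value is the last column in this layout
        if raw.isdigit() and int(raw) != 0:
            found[fields[1]] = int(raw)
    return found


if __name__ == "__main__":
    print(nonzero_watched(SAMPLE_SDA))
```

Run against the full listings, this would report the two reallocation
counters for sda and the single UDMA CRC error for sdb.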

