SATA, SW RAID, unstable system
Roman Paral
roman.paral at FSV.CVUT.CZ
Tue Jan 31 10:18:23 CET 2006
Hello,
I could use a bit of advice.
I have a server with roughly the following configuration:
mb: Asus NCCH-DR, 2x CPU Xeon 3.4 GHz
disk: 2x WDC WD740GD SATA
controller: IDE interface: Intel Corp. 6300ESB SATA Storage Controller (rev 02)
Debian stable is installed, with a custom-compiled kernel 2.6.15 #1 SMP.
SATA-related options in the kernel:
CONFIG_SCSI_SATA=y
CONFIG_SCSI_SATA_AHCI=y
CONFIG_SCSI_ATA_PIIX=y
CONFIG_SCSI_SATA_INTEL_COMBINED=y
Software RAID 1 is set up on all disk partitions.
The machine started to freeze on disk access; after each reset and array
rebuild it stayed up a little longer.
So I had the whole disk read (dd if=/dev/sda of=/dev/null..), but that led
to the state shown in the /var/log/messages excerpt below.
After it come the array state from /proc/mdstat, taken after the reset, and part of the smartctl output.
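
A plain dd stops at the first read error, so it does not show how many regions of the disk fail. A minimal sketch of a read test that keeps going and records the failing offsets could look like the following. The function name scan_readable and the 1 MiB chunk size are assumptions made for illustration, and on a machine that hangs on I/O errors, as described above, such a scan may hang as well.

```python
import os

def scan_readable(path, chunk_size=1024 * 1024):
    """Read path sequentially; return (bytes_read, offsets_of_failed_chunks)."""
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            os.lseek(fd, offset, os.SEEK_SET)
            try:
                data = os.read(fd, min(chunk_size, size - offset))
            except OSError:
                # Remember the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad_offsets
```

Called as scan_readable("/dev/sda") with root privileges, it returns the number of bytes read and a list of byte offsets whose chunk could not be read.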
Could this be a problem with the SATA driver, or something misconfigured somewhere,
or perhaps a hardware problem with the disk? Does anyone have experience with this?
Thanks a lot, Roman Paral
...
Jan 9 22:41:38 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan 9 22:41:38 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 9 22:41:38 wifi2 last message repeated 2 times
Jan 9 22:42:08 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan 9 22:42:08 wifi2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Jan 9 22:42:08 wifi2 kernel: sda: Current: sense key=0xb
Jan 9 22:42:08 wifi2 kernel: ASC=0x47 ASCQ=0x0
Jan 9 22:42:08 wifi2 kernel: end_request: I/O error, dev sda, sector 35150028
Jan 9 22:42:08 wifi2 kernel: ^IOperation continuing on 1 devices
Jan 9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 9 22:42:08 wifi2 kernel: RAID1 conf printout:
Jan 9 22:42:08 wifi2 kernel: --- wd:1 rd:2
Jan 9 22:42:08 wifi2 kernel: disk 0, wo:0, o:1, dev:sdb3
Jan 9 22:42:08 wifi2 kernel: disk 1, wo:1, o:0, dev:sda3
Jan 9 22:42:08 wifi2 kernel: RAID1 conf printout:
Jan 9 22:42:08 wifi2 kernel: --- wd:1 rd:2
Jan 9 22:42:08 wifi2 kernel: disk 0, wo:0, o:1, dev:sdb3
Jan 9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 9 22:42:08 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 9 22:42:38 wifi2 kernel: ata1: status=0xd0 { Busy }
Jan 9 22:42:38 wifi2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Jan 9 22:42:38 wifi2 kernel: sda: Current: sense key=0xb
Jan 9 22:42:38 wifi2 kernel: ASC=0x47 ASCQ=0x0
Jan 9 22:42:38 wifi2 kernel: end_request: I/O error, dev sda, sector 8070134
Jan 9 22:42:38 wifi2 kernel: ^IOperation continuing on 1 devices
Jan 9 22:42:38 wifi2 kernel: ATA: abnormal status 0xD0 on port 0x1F7
Jan 9 22:42:38 wifi2 last message repeated 2 times
Jan 9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan 9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan 9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan 9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan 9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan 9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan 9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan 9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan 9 22:42:42 wifi2 kernel: ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
Jan 9 22:42:42 wifi2 kernel: ata1: error=0x04 { DriveStatusError }
Jan 9 22:42:42 wifi2 kernel: RAID1 conf printout:
Jan 9 22:42:42 wifi2 kernel: --- wd:1 rd:2
Jan 9 22:42:42 wifi2 kernel: disk 0, wo:0, o:1, dev:sdb2
Jan 9 22:42:42 wifi2 kernel: disk 1, wo:1, o:0, dev:sda2
Jan 9 22:42:42 wifi2 kernel: RAID1 conf printout:
Jan 9 22:42:42 wifi2 kernel: --- wd:1 rd:2
Jan 9 22:42:42 wifi2 kernel: disk 0, wo:0, o:1, dev:sdb2
...
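
The status values in the log above are ATA status register bytes. As a rough sketch, they can be decoded by bit mask. The labels Busy, DriveReady, DeviceFault, SeekComplete and Error are the ones the kernel printed above; the labels for the remaining three bits (DataRequest, CorrectedError, Index) are assumed from the standard ATA status register layout. The function name decode_status is hypothetical.

```python
# ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),     # assumed label, not present in the log
    (0x04, "CorrectedError"),  # assumed label, not present in the log
    (0x02, "Index"),           # assumed label, not present in the log
    (0x01, "Error"),
]

def decode_status(status):
    """Return the list of flag names set in an ATA status byte."""
    # While Busy is set the other bits are not meaningful, which matches
    # the log printing only "{ Busy }" for 0xd0.
    if status & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]

print(decode_status(0xD0))
print(decode_status(0x71))
```

For 0x71 this gives the same four flags as the kernel message: the drive is ready but reports a device fault together with an error.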
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[0]
      3903680 blocks [2/1] [U_]
md2 : active raid1 sdb3[0]
      9767424 blocks [2/1] [U_]
md3 : active raid1 sdb5[0] sda5[1]
      45311232 blocks [2/2] [UU]
md4 : active raid1 sdb6[0] sda6[1]
      3903680 blocks [2/2] [UU]
md5 : active raid1 sdb7[0] sda7[1]
      5815424 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
      3903680 blocks [2/2] [UU]
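
In the listing above, [2/1] means the array expects two members and has one active, so md1 and md2 are running degraded without their sda partitions. A small sketch that picks out such arrays from /proc/mdstat text is shown here. The function name degraded_arrays is hypothetical, and the parser only handles the simple layout shown in this post.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result

# Sample taken from the mdstat output in this post.
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdb2[0]
      3903680 blocks [2/1] [U_]
md2 : active raid1 sdb3[0]
      9767424 blocks [2/1] [U_]
md3 : active raid1 sdb5[0] sda5[1]
      45311232 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
      3903680 blocks [2/2] [UU]
"""

print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat read with open("/proc/mdstat").read().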
smartctl -a -d ata /dev/sda
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   120   112   021    Pre-fail  Always       -       4500
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       12
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       353
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   126   113   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   189   189   000    Old_age   Always       -       11
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0
...
smartctl -a -d ata /dev/sdb
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   127   118   021    Pre-fail  Always       -       4183
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       43
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       369
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43
194 Temperature_Celsius     0x0022   128   113   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0
...
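
The two SMART listings differ mainly in the reallocation counters: sda reports 12 reallocated sectors and 11 reallocation events, while sdb reports none and only a single UDMA CRC error. A hedged sketch for pulling such nonzero counters out of smartctl attribute rows follows. The set of watched attributes is a choice made for this example, not something smartctl defines, and the function name nonzero_watched is hypothetical.

```python
# Attributes treated as warning signs in this sketch (an assumption).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and end with the raw value.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[-1]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

# Rows copied from the smartctl output in this post.
SDA_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 12
196 Reallocated_Event_Count 0x0032 189 189 000 Old_age Always - 11
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 0
"""
SDB_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0012 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age Always - 1
"""

print("sda:", nonzero_watched(SDA_ROWS))
print("sdb:", nonzero_watched(SDB_ROWS))
```

Because the parser only looks at the first two fields and the last one, it works on the attribute rows regardless of how the columns in between are spaced.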