Failing disk? - long
Vaclav Bilek
vbilek at seznam.cz
Monday June 13 15:18:26 CEST 2005
I recommend:
1. smartctl -t long /dev/hda
2. replace the ribbon cable
Most likely the disk is faulty...
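
For readers unfamiliar with the register dumps, here is a minimal Python sketch that decodes the status=0x51 and error=0x40 values from the kernel lines quoted below. The bit positions follow the ATA status and error registers; the label strings and the helper name `decode` are chosen for this illustration (they mirror the names the kernel printed where possible) and are not taken from any tool.

```python
# Bit masks of the ATA status register, highest bit first.
# Labels are illustrative; they mirror the kernel's wording where possible.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bit masks of the ATA error register, highest bit first.
ERROR_BITS = {
    0x80: "BadSector/InterfaceCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "CommandAborted",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, names):
    """Return the labels of all bits that are set in value."""
    return [label for mask, label in names.items() if value & mask]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```

The only error bit set is the uncorrectable-data bit, which points at an unreadable sector rather than at a transfer (CRC) problem, although a long self-test is still the way to confirm it.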
JaFi wrote:
> Hi everyone!
>
> I have FC2 on a server. The server has two mirrored disks. Yesterday the
> server froze. There was nothing in the log. After a restart the server came
> back up. During boot it printed:
>
> Jun 11 13:34:45 server kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1559594, sector=1559462
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: end_request: I/O error, dev hda, sector 1559462
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1559594, sector=1559470
> Jun 11 13:32:37 server fsck: ^B
>
> etc....
>
> After boot, the partition /dev/hda2 had dropped out of the array. So I
> figured it looked like a faulty disk. But I tried adding the partition back
> into the array anyway, and to my surprise it went through without a problem.
>
> In any case I also tried: smartctl -a /dev/hda
> and it printed this:
>
> smartctl version 5.21 Copyright (C) 2002-3 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model: WDC WD800JB-00CRA1
> Serial Number: WD-WCA8E7531992
> Firmware Version: 17.07W17
> Device is: Not in smartctl database [for details use: -P showall]
> ATA Version is: 5
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Mon Jun 13 13:35:04 2005 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x84) Offline data collection activity was
>                                         suspended by an interrupting
>                                         command from host.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (3120) seconds.
> Offline data collection
> capabilities:                    (0x3b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  58) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always   -           0
>   3 Spin_Up_Time            0x0007   103   094   021    Pre-fail  Always   -           3941
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always   -           127
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
>   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always   -           0
>   9 Power_On_Hours          0x0032   078   078   000    Old_age   Always   -           16446
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always   -           0
>  11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always   -           0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           108
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always   -           0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always   -           0
> 200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline  -           0
>
> SMART Error Log Version: 1
> ATA Error Count: 17 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Timestamp = decimal seconds since the previous disk power-on.
> Note: timestamp "wraps" after 2^32 msec = 49.710 days.
>
> Error 17 occurred at disk power-on lifetime: 13 hours
> When the command that caused the error occurred, the device was active
> or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 80 2a cc 17 e0
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
> -- -- -- -- -- -- -- -- --------- --------------------
> c8 00 80 26 cc 17 e0 00 102.100 READ DMA
> c8 00 88 1e cc 17 e0 00 99.450 READ DMA
> c8 00 90 16 cc 17 e0 00 96.850 READ DMA
> c8 00 98 0e cc 17 e0 00 94.400 READ DMA
> c8 00 a0 06 cc 17 e0 00 91.900 READ DMA
>
> Error 16 occurred at disk power-on lifetime: 13 hours
> When the command that caused the error occurred, the device was active
> or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 88 2a cc 17 e0
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
> -- -- -- -- -- -- -- -- --------- --------------------
> c8 00 88 1e cc 17 e0 00 99.450 READ DMA
> c8 00 90 16 cc 17 e0 00 96.850 READ DMA
> c8 00 98 0e cc 17 e0 00 94.400 READ DMA
> c8 00 a0 06 cc 17 e0 00 91.900 READ DMA
> c8 00 a8 fe cb 17 e0 00 89.400 READ DMA
>
> Error 15 occurred at disk power-on lifetime: 13 hours
> When the command that caused the error occurred, the device was active
> or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 90 2a cc 17 e0
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
> -- -- -- -- -- -- -- -- --------- --------------------
> c8 00 90 16 cc 17 e0 00 96.850 READ DMA
> c8 00 98 0e cc 17 e0 00 94.400 READ DMA
> c8 00 a0 06 cc 17 e0 00 91.900 READ DMA
> c8 00 a8 fe cb 17 e0 00 89.400 READ DMA
> c8 00 b0 f6 cb 17 e0 00 86.800 READ DMA
>
> Error 14 occurred at disk power-on lifetime: 13 hours
> When the command that caused the error occurred, the device was active
> or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 98 2a cc 17 e0
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
> -- -- -- -- -- -- -- -- --------- --------------------
> c8 00 98 0e cc 17 e0 00 94.400 READ DMA
> c8 00 a0 06 cc 17 e0 00 91.900 READ DMA
> c8 00 a8 fe cb 17 e0 00 89.400 READ DMA
> c8 00 b0 f6 cb 17 e0 00 86.800 READ DMA
> c8 00 b8 ee cb 17 e0 00 84.300 READ DMA
>
> Error 13 occurred at disk power-on lifetime: 13 hours
> When the command that caused the error occurred, the device was active
> or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 a0 2a cc 17 e0
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
> -- -- -- -- -- -- -- -- --------- --------------------
> c8 00 a0 06 cc 17 e0 00 91.900 READ DMA
> c8 00 a8 fe cb 17 e0 00 89.400 READ DMA
> c8 00 b0 f6 cb 17 e0 00 86.800 READ DMA
> c8 00 b8 ee cb 17 e0 00 84.300 READ DMA
> c8 00 c0 e6 cb 17 e0 00 81.700 READ DMA
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged. [Use the smartctl -t option to run these.]
>
>
> To be honest I don't really understand this, but I take it to mean that the
> disk is faulty.
>
> What do you think?
>
> Thanks for any advice.
>
> JF
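
One way to read the SMART error log above is to rebuild the failing sector address from the register dump. The following is a minimal Python sketch, assuming 28-bit LBA addressing (the DH value e0 has the LBA-mode bit set); the function name `lba28` is made up for this illustration. All five logged errors show the same SN, CL, CH and DH values, so they refer to the same sector.

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23,
    and the low nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after Error 17 in the quoted log: SN=2a CL=cc CH=17 DH=e0
failed = lba28(0x2A, 0xCC, 0x17, 0xE0)
print(failed)  # 1559594
```

The result matches the LBAsect=1559594 value in the kernel messages, so the drive's own error log and the kernel agree on which sector could not be read.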