Dying disk? - long post

Vaclav Bilek vbilek at seznam.cz
Mon Jun 13 15:18:26 CEST 2005


I recommend:

1. smartctl -t long /dev/hda
2. replace the IDE ribbon cable

Most likely the disk is faulty...
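
Step 1 can be scripted. What follows is only a minimal sketch in Python, assuming smartmontools is installed and the script runs as root. The helper names are made up for illustration, and the 58-minute wait is the extended self-test polling time that smartctl reports in the quoted output below.

```python
import subprocess
import time


def selftest_commands(device):
    """Return the two smartctl invocations: start the test, then read its log."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # print the self-test log
    ]


def run_long_test(device="/dev/hda", wait_minutes=58):
    """Start the long self-test, wait for it, and return the self-test log text."""
    start, show_log = selftest_commands(device)
    # smartctl's exit status is a bit mask, so a nonzero value does not
    # necessarily mean the command failed; it is deliberately not checked here.
    subprocess.run(start)
    time.sleep(wait_minutes * 60)  # the test itself runs inside the drive
    result = subprocess.run(show_log, capture_output=True, text=True)
    return result.stdout
```

Calling print(run_long_test("/dev/hda")) would then show whether the test finished cleanly or stopped at a read failure. The same two commands can of course be typed by hand.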


JaFi wrote:
> Hi everyone!
> 
> I run FC2 on a server. The server has two mirrored disks. Yesterday the
> server froze. There was nothing in the log. After a restart the server
> came back up. During boot it printed:
> 
> Jun 11 13:34:45 server kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1559594, sector=1559462
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: end_request: I/O error, dev hda, sector 1559462
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Jun 11 13:32:37 server fsck: ^B
> Jun 11 13:34:45 server kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1559594, sector=1559470
> Jun 11 13:32:37 server fsck: ^B
> 
> etc....
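
The status=0x51 and error=0x40 values in these kernel messages are bit masks of the ATA status and error registers. Here is a small Python sketch that decodes them. The bit names are listed as I recall the old IDE driver printing them; only the three status names and UncorrectableError are confirmed by the log itself, so treat the rest as an assumption.

```python
# Bit names, most significant bit (bit 7) first.
STATUS_BITS = ["Busy", "DriveReady", "DeviceFault", "SeekComplete",
               "DataRequest", "CorrectedError", "Index", "Error"]
ERROR_BITS = ["BadSector", "UncorrectableError", "MediaChanged",
              "SectorIdNotFound", "MediaChangeRequested", "DriveStatusError",
              "TrackZeroNotFound", "AddrMarkNotFound"]


def decode(value, names):
    """Return the names of all bits that are set in an 8-bit register value."""
    return [name for bit, name in zip(range(7, -1, -1), names)
            if value & (1 << bit)]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```

So the drive answered, finished the seek, and then reported that it could not read the data even after its own error correction.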
> 
> After boot, the partition /dev/hda2 had dropped out of the array. So I
> figured it looked like a faulty disk. But I tried adding the partition
> back into the array and, to my surprise, that went through without a problem.
> 
> In any case I also tried: smartctl -a /dev/hda
> and it printed this:
> 
> smartctl version 5.21 Copyright (C) 2002-3 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Device Model:     WDC WD800JB-00CRA1
> Serial Number:    WD-WCA8E7531992
> Firmware Version: 17.07W17
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   5
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon Jun 13 13:35:04 2005 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x84) Offline data collection activity was
>                                         suspended by an interrupting command from host.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (3120) seconds.
> Offline data collection
> capabilities:                    (0x3b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  58) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   103   094   021    Pre-fail  Always       -       3941
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       127
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16446
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       108
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
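
An attribute in this table counts as failed when its normalized VALUE is less than or equal to THRESH. A rough Python sketch of that check follows, using a few rows copied from the table above. The function name is made up, and the parsing assumes the one-row-per-line layout that smartctl prints before a mail client wraps it.

```python
# A subset of the attribute rows from the smartctl output above.
ROWS = """\
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   103   094   021    Pre-fail  Always       -       3941
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16446
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""


def failed_attributes(text):
    """Return names of attributes whose normalized VALUE is <= THRESH."""
    failed = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # skip headers, blank lines and wrapped fragments
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        if value <= thresh:
            failed.append(name)
    return failed


print(failed_attributes(ROWS))  # []
```

Nothing is at or below its threshold here, which is consistent with the PASSED health result, even though the error log that follows lists 17 ATA errors. A clean attribute table alone therefore does not clear the disk.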
> 
> SMART Error Log Version: 1
> ATA Error Count: 17 (device log contains only the most recent five errors)
>         CR = Command Register [HEX]
>         FR = Features Register [HEX]
>         SC = Sector Count Register [HEX]
>         SN = Sector Number Register [HEX]
>         CL = Cylinder Low Register [HEX]
>         CH = Cylinder High Register [HEX]
>         DH = Device/Head Register [HEX]
>         DC = Device Command Register [HEX]
>         ER = Error register [HEX]
>         ST = Status register [HEX]
> Timestamp = decimal seconds since the previous disk power-on.
> Note: timestamp "wraps" after 2^32 msec = 49.710 days.
> 
> Error 17 occurred at disk power-on lifetime: 13 hours
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 80 2a cc 17 e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 80 26 cc 17 e0 00     102.100  READ DMA
>   c8 00 88 1e cc 17 e0 00      99.450  READ DMA
>   c8 00 90 16 cc 17 e0 00      96.850  READ DMA
>   c8 00 98 0e cc 17 e0 00      94.400  READ DMA
>   c8 00 a0 06 cc 17 e0 00      91.900  READ DMA
> 
> Error 16 occurred at disk power-on lifetime: 13 hours
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 88 2a cc 17 e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 88 1e cc 17 e0 00      99.450  READ DMA
>   c8 00 90 16 cc 17 e0 00      96.850  READ DMA
>   c8 00 98 0e cc 17 e0 00      94.400  READ DMA
>   c8 00 a0 06 cc 17 e0 00      91.900  READ DMA
>   c8 00 a8 fe cb 17 e0 00      89.400  READ DMA
> 
> Error 15 occurred at disk power-on lifetime: 13 hours
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 90 2a cc 17 e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 90 16 cc 17 e0 00      96.850  READ DMA
>   c8 00 98 0e cc 17 e0 00      94.400  READ DMA
>   c8 00 a0 06 cc 17 e0 00      91.900  READ DMA
>   c8 00 a8 fe cb 17 e0 00      89.400  READ DMA
>   c8 00 b0 f6 cb 17 e0 00      86.800  READ DMA
> 
> Error 14 occurred at disk power-on lifetime: 13 hours
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 98 2a cc 17 e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 98 0e cc 17 e0 00      94.400  READ DMA
>   c8 00 a0 06 cc 17 e0 00      91.900  READ DMA
>   c8 00 a8 fe cb 17 e0 00      89.400  READ DMA
>   c8 00 b0 f6 cb 17 e0 00      86.800  READ DMA
>   c8 00 b8 ee cb 17 e0 00      84.300  READ DMA
> 
> Error 13 occurred at disk power-on lifetime: 13 hours
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 a0 2a cc 17 e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 a0 06 cc 17 e0 00      91.900  READ DMA
>   c8 00 a8 fe cb 17 e0 00      89.400  READ DMA
>   c8 00 b0 f6 cb 17 e0 00      86.800  READ DMA
>   c8 00 b8 ee cb 17 e0 00      84.300  READ DMA
>   c8 00 c0 e6 cb 17 e0 00      81.700  READ DMA
> 
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [Use the smartctl -t option to run these.]
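
The register dumps in the error log can be tied back to the kernel messages. In 28-bit LBA mode the sector address is spread over the SN, CL, CH registers and the low four bits of DH. A short Python sketch (the function name is mine) that puts them back together:

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA sector number."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after Error 17: SN=2a CL=cc CH=17 DH=e0
print(lba28(0x2A, 0xCC, 0x17, 0xE0))  # 1559594
```

That is exactly the LBAsect=1559594 from the kernel log, and all five logged errors carry the same SN/CL/CH values, so every one of them points at the same sector.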
> 
> 
> To be honest I don't know much about this, but as I understand it the
> disk is faulty.
> 
> What do you think?
> 
> Thanks for any advice.
> 
> JF

