Suspected HDD problem - how to find out?
Martin Dvorak
martin.dvorak at jasnet.cz
Tue Feb 10 14:23:10 CET 2004
I think it's better to test the WHOLE server; the motherboard could be
faulty, for example...
Probably the best approach is to announce a one-night outage and
furiously test all the components ;-)))
martik
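
One way to narrow things down before scheduling an outage is to time sequential reads and see whether the slowness clusters at particular offsets, which would point at the disk rather than the board. Below is a minimal Python sketch, not a tested diagnostic: the function names time_chunks and slow_chunks, the 1 MiB chunk size and the 10x-median cutoff are all assumptions chosen for illustration. Reading a raw device such as /dev/hda needs root, and chunks already in the page cache will look artificially fast.

```python
import time


def time_chunks(path, chunk_size=1024 * 1024, max_chunks=None):
    """Read `path` sequentially and return a list of (offset, seconds) per chunk."""
    timings = []
    offset = 0
    # buffering=0 so each read() maps to one request to the OS
    with open(path, "rb", buffering=0) as f:
        while max_chunks is None or len(timings) < max_chunks:
            start = time.perf_counter()
            data = f.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not data:  # end of file or device
                break
            timings.append((offset, elapsed))
            offset += len(data)
    return timings


def slow_chunks(timings, factor=10.0):
    """Return the (offset, seconds) entries slower than `factor` times the median."""
    if not timings:
        return []
    durations = sorted(t for _, t in timings)
    median = durations[len(durations) // 2]
    return [(off, t) for off, t in timings if t > factor * median]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python readtime.py /dev/hda
    for off, secs in slow_chunks(time_chunks(sys.argv[1])):
        print(f"offset {off}: {secs:.3f} s")
```

If the same offsets show up as slow on repeated runs after a reboot, that is a hint the drive is retrying reads there; slowness that moves around would fit the "something else in the box" theory better.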
Michal Samek wrote:
> Hello,
> lately I've been having some strange problems on a server (see the
> thread about jigdo being unkillable) and I'm starting to think the disk
> may be the cause. Samba also slows down badly when accessing certain
> data, and people occasionally complain that things run slowly (we
> mainly have a DOS accounting application shared on Samba drives).
>
> I've already replaced this Barracuda once (and a similar model has died
> on me several times elsewhere); it's some unlucky series. At the moment,
> though, it isn't in a state where I could tell from the logs that
> something is wrong. It is just very slow in places, as if the drive kept
> trying to read the problem spots over and over, but in the end it
> somehow fishes the data out.
>
> Can this situation be detected with SMART? Is it enough to enable
> logging and try smartctl -l to see whether it prints anything? So far it
> reports no errors, and the smartctl -a output doesn't make me much wiser
> either. I doubt the filesystem itself would behave like this (it's ext3
> on RH 7.3, kernel 2.4.18-4); at least I've never seen that anywhere.
>
> If anyone has an idea how to diagnose this while the server is running,
> thanks in advance.
>
> I'm attaching the output of smartctl -a:
> smartctl -a /dev/hda
> Device: ST380011A Supports ATA Version 6
> Drive supports S.M.A.R.T. and is enabled
> Check S.M.A.R.T. Passed.
>
> General Smart Values:
> Off-line data collection status: (0x82) Offline data collection activity
> completed without error
>
> Self-test execution status: ( 0) The previous self-test routine
> completed
> without error or no self-test has ever
> been run
>
> Total time to complete off-line
> data collection: ( 430) Seconds
>
> Offline data collection
> Capabilities: (0x5b)SMART EXECUTE OFF-LINE IMMEDIATE
> Automatic timer ON/OFF support
> Suspend Offline Collection upon new
> command
> Offline surface scan supported
> Self-test supported
>
> Smart Capablilities: (0x0003) Saves SMART data before entering
> power-saving mode
> Supports SMART auto save timer
>
> Error logging capability: (0x01) Error logging supported
>
> Short self-test routine
> recommended polling time: ( 1) Minutes
>
> Extended self-test routine
> recommended polling time: ( 58) Minutes
>
> Vendor Specific SMART Attributes with Thresholds:
> Revision Number: 10
> Attribute                    Flag    Value  Worst  Threshold  Raw Value
> (  1)Raw Read Error Rate     0x000f  066    065    006        96138821
> (  3)Spin Up Time            0x0003  099    098    000        0
> (  4)Start Stop Count        0x0032  100    100    020        16
> (  5)Reallocated Sector Ct   0x0033  100    100    036        0
> (  7)Seek Error Rate         0x000f  081    060    030        162853821
> (  9)Power On Hours          0x0032  097    097    000        3118
> ( 10)Spin Retry Count        0x0013  100    100    097        0
> ( 12)Power Cycle Count       0x0032  100    100    020        24
> (194)Temperature             0x0022  029    041    000        29
> (195)Hardware ECC Recovered  0x001a  066    065    000        96138821
> (197)Current Pending Sector  0x0012  100    100    000        0
> (198)Offline Uncorrectable   0x0010  100    100    000        0
> (199)UDMA CRC Error Count    0x003e  200    200    000        0
> (200)Unknown Attribute       0x0000  100    253    000        0
> (202)Unknown Attribute       0x0032  100    253    000        0
> SMART Error Log:
> SMART Error Logging Version: 1
> No Errors Logged
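
For reading the attribute table above: the drive reports an attribute as failed when its normalized Value drops to or below Threshold, so the gap between the two is usually more telling than the raw number. The raw values of attributes such as 1, 7 and 195 are vendor-specific, and a large number there is not by itself an error count. Here is a small Python sketch that parses rows in the layout quoted above and lists the margins. The function names and the SAMPLE rows (copied from the listing) are illustrative only, treating a threshold of 0 as "informational" is an assumption, and newer smartctl versions print a different column layout that this regular expression does not handle.

```python
import re

# One attribute row in the layout quoted above, for example:
# "(  1)Raw Read Error Rate 0x000f 066 065 006 96138821"
ROW = re.compile(
    r"\(\s*(\d+)\)\s*(.+?)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)

# Rows copied from the listing above (illustrative subset, mail quoting kept).
SAMPLE = """\
> (  1)Raw Read Error Rate 0x000f 066 065 006 96138821
> (  5)Reallocated Sector Ct 0x0033 100 100 036 0
> (  7)Seek Error Rate 0x000f 081 060 030 162853821
> ( 10)Spin Retry Count 0x0013 100 100 097 0
> (194)Temperature 0x0022 029 041 000 29
> (197)Current Pending Sector 0x0012 100 100 000 0
"""


def parse_attributes(text):
    """Return one dict per attribute row found in `text`; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m is None:
            continue
        ident, name, _flag, value, worst, threshold, raw = m.groups()
        rows.append({
            "id": int(ident),
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "threshold": int(threshold),
            "raw": int(raw),
        })
    return rows


def failed(rows):
    """Attributes whose value is at or below a non-zero threshold."""
    return [r for r in rows if r["threshold"] > 0 and r["value"] <= r["threshold"]]


def margins(rows):
    """(name, value - threshold) for non-zero thresholds, smallest margin first."""
    pairs = [(r["name"], r["value"] - r["threshold"])
             for r in rows if r["threshold"] > 0]
    return sorted(pairs, key=lambda p: p[1])


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("failed:", [r["name"] for r in failed(attrs)])
    for name, margin in margins(attrs):
        print(f"{margin:4d}  {name}")
```

For the posted data nothing comes out as failed, which agrees with the "Check S.M.A.R.T. Passed" line. The small margin on Spin Retry Count reflects its high threshold (097) rather than a falling value, since Value and Worst are both still 100. Reallocated Sector Ct and Current Pending Sector both have a raw value of 0, so the table itself shows no remapped or pending sectors.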