Suspected HDD problem - how to detect it?
Michal Samek
webmaster at tony.cz
Tue Feb 10 15:06:18 CET 2004
Hello,
lately I have been having some strange problems on a server (see the
thread about jigdo being unkillable), and I am starting to think the
disk may be the cause. Samba also slows down badly when accessing some
data, and people occasionally complain that things run slowly (we
mainly have DOS accounting software here, shared on Samba drives).
I have already replaced this Barracuda once (and a similar model has
died on me several times elsewhere); it seems to be an unlucky series.
At the moment, though, it is not in a state where I could tell from the
logs that something is wrong. It is just very slow in places, as if the
drive kept trying to read the problem spots over and over but
eventually managed to fish the data out.
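A minimal sketch of one way to look for such slow spots while the
server keeps running: read the disk in fixed-size chunks and print the
chunks that take unusually long. The function name slow_chunks, its
parameters and the time limit are illustrative only. It assumes GNU
date (for %N) and GNU dd, and it only reads from the source. Chunks
already in the page cache will look fast, so the result is indicative
at best.

```shell
# Read SRC in chunks of CHUNK_MB MiB and report chunks slower than LIMIT_MS.
# Read-only: dd writes only to /dev/null.
# Assumptions: GNU date (supports %N) and a dd that accepts bs=1M.
slow_chunks() {
    src=$1        # device or file to read, e.g. /dev/hda (needs root)
    chunk_mb=$2   # chunk size in MiB
    nchunks=$3    # how many chunks to read from the start
    limit_ms=$4   # report chunks that take longer than this many ms
    i=0
    while [ "$i" -lt "$nchunks" ]; do
        t0=$(date +%s%N)
        dd if="$src" of=/dev/null bs=1M \
           skip=$((i * chunk_mb)) count="$chunk_mb" 2>/dev/null || break
        t1=$(date +%s%N)
        ms=$(( (t1 - t0) / 1000000 ))   # nanoseconds to milliseconds
        if [ "$ms" -gt "$limit_ms" ]; then
            echo "chunk $i (offset $((i * chunk_mb)) MiB): ${ms} ms"
        fi
        i=$((i + 1))
    done
}
```

For example, slow_chunks /dev/hda 64 100 2000 would read the first
6400 MiB in 64 MiB chunks and print those that took more than two
seconds. The extra load will itself slow down Samba users while it
runs.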
Can this situation be detected with SMART? Is it enough to enable
logging and keep trying smartctl -l to see whether it prints anything?
So far it reports no errors, and the smartctl -a output does not tell
me much either. I doubt that the filesystem would behave like this (it
is ext3 on RH 7.3, kernel 2.4.18-4); at least I have never seen that
anywhere.
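If the installed smartctl is the smartmontools one, the drive's own
extended self-test is one way to exercise the surface and get something
into the logs. The sketch below assumes the smartmontools option syntax
(-t long, -l selftest, -l error, -H), which may differ from older
smartctl builds. The function names and the SMARTCTL override variable
are hypothetical, added only so the commands can be dry-run.

```shell
# Sketch only: assumes smartmontools option syntax.
# SMARTCTL is a hypothetical override, e.g. SMARTCTL=echo for a dry run.

# Start the extended (long) self-test; the drive runs it on its own
# in the background while the system stays up.
smart_start_test() {
    dev=$1
    "${SMARTCTL:-smartctl}" -t long "$dev"
}

# After the test has had time to finish, show the overall health
# verdict, the self-test log and the error log.
smart_show_logs() {
    dev=$1
    "${SMARTCTL:-smartctl}" -H "$dev"
    "${SMARTCTL:-smartctl}" -l selftest "$dev"
    "${SMARTCTL:-smartctl}" -l error "$dev"
}
```

Usage would be smart_start_test /dev/hda, then smart_show_logs /dev/hda
once the test has had time to complete (both as root).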
If anyone has an idea how to diagnose this while the server is in
production, thank you for it.
I am attaching the smartctl -a output:
smartctl -a /dev/hda
Device: ST380011A Supports ATA Version 6
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.
General Smart Values:
Off-line data collection status:  (0x82) Offline data collection activity
                                         completed without error
Self-test execution status:       (   0) The previous self-test routine
                                         completed without error or no
                                         self-test has ever been run
Total time to complete off-line
data collection:                  ( 430) Seconds
Offline data collection
Capabilities:                     (0x5b) SMART EXECUTE OFF-LINE IMMEDIATE
                                         Automatic timer ON/OFF support
                                         Suspend Offline Collection upon
                                         new command
                                         Offline surface scan supported
                                         Self-test supported
Smart Capablilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode
                                         Supports SMART auto save timer
Error logging capability:         (0x01) Error logging supported
Short self-test routine
recommended polling time:         (   1) Minutes
Extended self-test routine
recommended polling time:         (  58) Minutes
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f   066   065   006       96138821
(  3)Spin Up Time            0x0003   099   098   000       0
(  4)Start Stop Count        0x0032   100   100   020       16
(  5)Reallocated Sector Ct   0x0033   100   100   036       0
(  7)Seek Error Rate         0x000f   081   060   030       162853821
(  9)Power On Hours          0x0032   097   097   000       3118
( 10)Spin Retry Count        0x0013   100   100   097       0
( 12)Power Cycle Count       0x0032   100   100   020       24
(194)Temperature             0x0022   029   041   000       29
(195)Hardware ECC Recovered  0x001a   066   065   000       96138821
(197)Current Pending Sector  0x0012   100   100   000       0
(198)Offline Uncorrectable   0x0010   100   100   000       0
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   253   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged
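
For reading the attribute table above, a small sketch that prints how
far each normalized value sits above its threshold. It assumes the old
column layout shown here (id in parentheses, then name, flag, value,
worst, threshold, raw). smart_margin is a hypothetical helper name, not
part of smartctl.

```shell
# Read smartctl -a output on stdin and print, for each attribute line,
# the normalized value, worst value, threshold and the remaining margin.
# Assumption: attribute lines start with "(" and end with the four
# columns Value, Worst, Threshold, Raw, as in the output above.
smart_margin() {
    awk '
        /^ *\(/ && NF >= 6 {
            value  = $(NF-3) + 0
            worst  = $(NF-2) + 0
            thresh = $(NF-1) + 0
            # attribute id = the digits between the parentheses
            id = $0
            sub(/^ *\( */, "", id)
            sub(/\).*/, "", id)
            # flag attributes whose worst value has reached the threshold
            note = (thresh > 0 && worst <= thresh) ? " AT_OR_BELOW_THRESHOLD" : ""
            printf "%s value=%d worst=%d threshold=%d margin=%d%s\n", \
                   id, value, worst, thresh, value - thresh, note
        }'
}
```

Usage would be smartctl -a /dev/hda | smart_margin. In the table above
every value is still above its threshold, and the reallocated and
pending sector counts are 0.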
--
Michal Samek <webmaster at tony.cz>