Suspected HDD problem - how to find out?

Michal Samek webmaster at tony.cz
Tue Feb 10 15:06:18 CET 2004


Hello,
I have recently been having some odd problems on a server (see the
thread about jigdo being unkillable) and I am starting to think the disk
may be the cause. Samba also slows down badly when accessing some data,
and people occasionally complain that things run slowly (we mainly have
DOS accounting software here, shared on Samba drives).

I have already replaced this Barracuda once (and a similar model has
died on me several times elsewhere); it seems to be an unlucky series.
Right now, though, it is not in a state where I could tell from the logs
that something is wrong. It is just very slow in places, as if it kept
re-reading the problem spots over and over, but in the end it somehow
manages to fish the data out.
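
A rough way to look for such slow spots on a running system might be to
read the disk sequentially in fixed-size chunks and time each read. The
following is only a minimal sketch under stated assumptions: the function
name find_slow_chunks, the 1 MiB chunk size and the 0.5 second limit are
hypothetical choices made for illustration, not anything taken from a
real tool. Reads that are served from the kernel page cache will look
fast no matter what state the disk is in, and other load on the server
can make a healthy area look slow.

```python
import time


def find_slow_chunks(path, chunk_size=1024 * 1024, slow_seconds=0.5,
                     max_chunks=None):
    """Read `path` sequentially and time every chunk.

    Returns (slow, total_bytes), where `slow` is a list of
    (byte_offset, seconds) for chunks that took at least `slow_seconds`.
    The threshold is an arbitrary illustrative value.
    """
    slow = []
    offset = 0
    count = 0
    # buffering=0 disables Python's own buffering; the kernel page cache
    # still applies, so repeated runs over the same area are not comparable.
    with open(path, "rb", buffering=0) as f:
        while max_chunks is None or count < max_chunks:
            start = time.monotonic()
            data = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # end of file or device
            if elapsed >= slow_seconds:
                slow.append((offset, elapsed))
            offset += len(data)
            count += 1
    return slow, offset
```

Run as root, a call such as find_slow_chunks("/dev/hda", max_chunks=2048)
would time the first 2 GiB of the disk; offsets that show up repeatedly
across runs would be the ones worth a closer look.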

Can this situation be detected with SMART? Is it enough to enable
logging and keep trying smartctl -l to see whether it prints anything?
So far it reports no errors, and the smartctl -a output does not tell me
much either. I doubt a filesystem would behave like this (it is ext3 on
RH 7.3, kernel 2.4.18-4); at least I have never seen that anywhere.

If anyone has an idea how to diagnose this on a running system, thanks
in advance.

I am attaching the smartctl -a output:
 smartctl -a /dev/hda
Device: ST380011A  Supports ATA Version 6
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.

General Smart Values: 
Off-line data collection status: (0x82)	Offline data collection activity
					completed without error

Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run

Total time to complete off-line 
data collection: 		 ( 430) Seconds

Offline data collection 
Capabilities: 			 (0x5b)SMART EXECUTE OFF-LINE IMMEDIATE
					Automatic timer ON/OFF support
					Suspend Offline Collection upon new
					command
					Offline surface scan supported
					Self-test supported

Smart Capablilities:           (0x0003)	Saves SMART data before entering
					power-saving mode
					Supports SMART auto save timer

Error logging capability:        (0x01)	Error logging supported

Short self-test routine 
recommended polling time: 	 (   1) Minutes

Extended self-test routine 
recommended polling time: 	 (  58) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f   066   065   006       96138821
(  3)Spin Up Time            0x0003   099   098   000       0
(  4)Start Stop Count        0x0032   100   100   020       16
(  5)Reallocated Sector Ct   0x0033   100   100   036       0
(  7)Seek Error Rate         0x000f   081   060   030       162853821
(  9)Power On Hours          0x0032   097   097   000       3118
( 10)Spin Retry Count        0x0013   100   100   097       0
( 12)Power Cycle Count       0x0032   100   100   020       24
(194)Temperature             0x0022   029   041   000       29
(195)Hardware ECC Recovered  0x001a   066   065   000       96138821
(197)Current Pending Sector  0x0012   100   100   000       0
(198)Offline Uncorrectable   0x0010   100   100   000       0
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   253   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged
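
One way to make the attribute table above easier to read might be to
parse it and compare each normalized value with its threshold. This is a
minimal sketch: the regular expression is written only for the line
layout shown in this particular output, and the names parse_attributes
and at_or_below_threshold are hypothetical. It treats an attribute as
suspicious when its normalized value is at or below a non-zero
threshold, and it reads bit 0 of the flag as the pre-failure marker.

```python
import re

# Matches lines such as:
# (  1)Raw Read Error Rate     0x000f   066   065   006       96138821
ATTR_LINE = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_attributes(text):
    """Return one dict per attribute line found in `text`."""
    attrs = []
    for line in text.splitlines():
        m = ATTR_LINE.match(line.strip())
        if not m:
            continue  # headers and anything else that is not an attribute row
        attr_id, name, flag, value, worst, threshold, raw = m.groups()
        attrs.append({
            "id": int(attr_id),
            "name": name.strip(),
            "prefailure": bool(int(flag, 16) & 0x0001),
            "value": int(value),
            "worst": int(worst),
            "threshold": int(threshold),
            "raw": int(raw),
        })
    return attrs


def at_or_below_threshold(attrs):
    """Attributes whose normalized value has reached a non-zero threshold."""
    return [a for a in attrs
            if a["threshold"] > 0 and a["value"] <= a["threshold"]]
```

Fed the table above, at_or_below_threshold(parse_attributes(text))
returns an empty list, which matches the "Check S.M.A.R.T. Passed."
line; by itself that does not show the disk is healthy.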


-- 
Michal Samek <webmaster at tony.cz>


