Suspected HDD problem - how to detect it?

Martin Dvorak martin.dvorak at jasnet.cz
Tue Feb 10 14:23:10 CET 2004


I think it is better to test the WHOLE server; the motherboard, for
example, could be faulty...
Probably the best approach is to announce a one-night outage and
furiously test all the components ;-)))
martik

Michal Samek wrote:
> Hello,
> lately I have been having some odd problems on a server (see the thread
> about jigdo being unkillable), and I am starting to think the disk may be
> the cause. Samba also slows down badly when accessing some of the data,
> and people occasionally complain that things run slowly (we mainly run
> DOS accounting software shared on Samba drives).
> 
> I have already replaced this Barracuda once (and a similar model has
> died on me several times elsewhere); it seems to be an unlucky series.
> At the moment, though, it is not in a state where I could tell from the
> logs that something is wrong. It is just very slow in places, as if it
> kept re-reading the problematic spots over and over, but in the end it
> somehow manages to fish the data out.
> 
> Can this situation be detected with SMART? Is it enough to enable
> logging and keep checking smartctl -l to see whether it prints anything?
> So far it reports no errors, and I cannot make much sense of the
> smartctl -a output either. I doubt the filesystem would behave like
> this (it is ext3 on RH 7.3, kernel 2.4.18-4); at least I have never
> seen that anywhere.
> 
> If anyone has an idea how to diagnose this while the server is running,
> thanks in advance.
> 
> I am attaching the smartctl -a output:
>  smartctl -a /dev/hda
> Device: ST380011A  Supports ATA Version 6
> Drive supports S.M.A.R.T. and is enabled
> Check S.M.A.R.T. Passed.
> 
> General Smart Values: 
> Off-line data collection status: (0x82)	Offline data collection activity
> 					completed without error
> 
> Self-test execution status:      (   0)	The previous self-test routine completed
> 					without error or no self-test has ever 
> 					been run
> 
> Total time to complete off-line 
> data collection: 		 ( 430) Seconds
> 
> Offline data collection 
> Capabilities: 			 (0x5b)SMART EXECUTE OFF-LINE IMMEDIATE
> 					Automatic timer ON/OFF support
> 					Suspend Offline Collection upon new
> 					command
> 					Offline surface scan supported
> 					Self-test supported
> 
> Smart Capablilities:           (0x0003)	Saves SMART data before entering
> 					power-saving mode
> 					Supports SMART auto save timer
> 
> Error logging capability:        (0x01)	Error logging supported
> 
> Short self-test routine 
> recommended polling time: 	 (   1) Minutes
> 
> Extended self-test routine 
> recommended polling time: 	 (  58) Minutes
> 
> Vendor Specific SMART Attributes with Thresholds:
> Revision Number: 10
> Attribute                    Flag     Value Worst Threshold Raw Value
> (  1)Raw Read Error Rate     0x000f   066   065   006       96138821
> (  3)Spin Up Time            0x0003   099   098   000       0
> (  4)Start Stop Count        0x0032   100   100   020       16
> (  5)Reallocated Sector Ct   0x0033   100   100   036       0
> (  7)Seek Error Rate         0x000f   081   060   030       162853821
> (  9)Power On Hours          0x0032   097   097   000       3118
> ( 10)Spin Retry Count        0x0013   100   100   097       0
> ( 12)Power Cycle Count       0x0032   100   100   020       24
> (194)Temperature             0x0022   029   041   000       29
> (195)Hardware ECC Recovered  0x001a   066   065   000       96138821
> (197)Current Pending Sector  0x0012   100   100   000       0
> (198)Offline Uncorrectable   0x0010   100   100   000       0
> (199)UDMA CRC Error Count    0x003e   200   200   000       0
> (200)Unknown Attribute       0x0000   100   253   000       0
> (202)Unknown Attribute       0x0032   100   253   000       0
> SMART Error Log:
> SMART Error Logging Version: 1
> No Errors Logged
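
As a rough aid for reading the attribute table quoted above, here is a
small Python sketch that parses rows in that layout and lists the
attributes that look alarming. The rule it applies is an assumption made
for illustration, not something smartctl does itself: an attribute is
flagged when its normalized value or worst value has reached a non-zero
threshold, or when the raw count is non-zero for the reallocated, pending,
or uncorrectable sector attributes (IDs 5, 197, 198). The function names
and the list of watched IDs are hypothetical. A drive that is merely slow
because of read retries may well pass this check, as the quoted data does.

```python
import re

# Attribute rows copied from the smartctl -a output quoted above.
SMART_TABLE = """\
(  1)Raw Read Error Rate     0x000f   066   065   006       96138821
(  3)Spin Up Time            0x0003   099   098   000       0
(  4)Start Stop Count        0x0032   100   100   020       16
(  5)Reallocated Sector Ct   0x0033   100   100   036       0
(  7)Seek Error Rate         0x000f   081   060   030       162853821
(  9)Power On Hours          0x0032   097   097   000       3118
( 10)Spin Retry Count        0x0013   100   100   097       0
( 12)Power Cycle Count       0x0032   100   100   020       24
(194)Temperature             0x0022   029   041   000       29
(195)Hardware ECC Recovered  0x001a   066   065   000       96138821
(197)Current Pending Sector  0x0012   100   100   000       0
(198)Offline Uncorrectable   0x0010   100   100   000       0
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   253   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
"""

# (id)name  flag  value  worst  threshold  raw
ROW = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_attributes(text):
    """Parse attribute rows; lines that do not match the layout are skipped."""
    rows = []
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of a row.
        m = ROW.match(line.lstrip("> ").strip())
        if m is None:
            continue
        attr_id, name, flag, value, worst, threshold, raw = m.groups()
        rows.append({
            "id": int(attr_id),
            "name": name.strip(),
            "flag": flag,
            "value": int(value),
            "worst": int(worst),
            "threshold": int(threshold),
            "raw": int(raw),
        })
    return rows


def suspicious(rows, watch_raw=(5, 197, 198)):
    """Return (id, name, reason) tuples under the assumed rule described above."""
    findings = []
    for r in rows:
        # A threshold of 0 is treated here as "cannot fail" and skipped.
        if r["threshold"] > 0 and min(r["value"], r["worst"]) <= r["threshold"]:
            findings.append((r["id"], r["name"], "value or worst reached threshold"))
        elif r["id"] in watch_raw and r["raw"] != 0:
            findings.append((r["id"], r["name"], "non-zero raw sector count"))
    return findings


if __name__ == "__main__":
    attrs = parse_attributes(SMART_TABLE)
    for attr_id, name, reason in suspicious(attrs):
        print(f"{attr_id:3d} {name}: {reason}")
```

The large raw numbers for attributes 1, 7 and 195 are left out of the rule
on purpose: the script only compares normalized values against thresholds
and does not try to interpret vendor-specific raw counters.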

