smartd - before the disk dies

Vaclav Dvorsky hufhendr at sendmail.cz
Tue Jun 3 22:16:30 CEST 2003


Honza Houstek wrote:
>>The server also runs in a greenhouse (a glazed balcony in a panel
>>apartment block), and over the weekend I added another fan above the
>>power supply; the disk temperature is currently 45°C and sometimes drops
>>to 40°C at night.
>>
>>So is it possible that smartd works and no SMS arrived because everything
>>is fine? Does smartd even monitor, say, a critical temperature value, or
>>is it limited to read/write errors?
> 
> SMART has several attributes, each with a value. What exactly these
> values mean depends on the particular disk model and could probably be
> looked up somewhere. It is not important, though. For example, this is the
> output from the disk described above, in the computer in the "greenhouse":
> 
> Id=  1  Status=15  {Prefailure  Online }  Value= 61  Threshold=  6  Passed
> Id=  3  Status= 3  {Prefailure  Online }  Value= 99  Threshold=  0  Passed
> Id=  4  Status=50  {Advisory    Online }  Value=100  Threshold= 20  Passed
> Id=  5  Status=51  {Prefailure  Online }  Value=100  Threshold= 36  Passed
> Id=  7  Status=15  {Prefailure  Online }  Value= 76  Threshold= 30  Passed
> Id=  9  Status=50  {Advisory    Online }  Value= 99  Threshold=  0  Passed
> Id= 10  Status=19  {Prefailure  Online }  Value=100  Threshold= 97  Passed
> Id= 12  Status=50  {Advisory    Online }  Value=100  Threshold= 20  Passed
> Id=194  Status=34  {Advisory    Online }  Value= 54  Threshold=  0  Passed
> Id=195  Status=26  {Advisory    Online }  Value= 61  Threshold=  0  Passed
> Id=197  Status=18  {Advisory    Online }  Value=100  Threshold=  0  Passed
> Id=198  Status=16  {Advisory    OffLine}  Value=100  Threshold=  0  Passed
> Id=199  Status=62  {Advisory    Online }  Value=200  Threshold=  0  Passed
> Id=200  Status= 0  {Advisory    OffLine}  Value=100  Threshold=  0  Passed
> Id=202  Status=50  {Advisory    Online }  Value=100  Threshold=  0  Passed
> 
> What matters for each value is whether it is Advisory or Prefailure. When
> something happens to the value of an Advisory attribute, it need not mean
> anything, whereas for a Prefailure attribute the disk is almost certainly
> on its last legs.
> 
> "Something happens" means, above all, the value dropping below the
> Threshold.
> 
> On this particular disk the temperature is held in the attribute with
> Id 194 and is in degrees Celsius (54). Note that the item is Advisory and
> has Threshold 0 (so it would fail to be Passed only if the temperature
> dropped below 0, which will not happen even in a freezer, because the
> attribute is limited to the range 0-100).
> 
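
The rule described in the quoted text (Prefailure vs. Advisory, and a value
falling to the Threshold) could be sketched roughly as follows. This is only
an illustration: the line layout is assumed from the listing quoted above,
the names LINE_RE and classify are made up for this example, and the
comparison uses "less than or equal", which is how the smartmontools
documentation words the failure condition (slightly stricter than "below").

```python
import re

# Layout assumed from the "Id= ... Status= ... {...} Value= ... Threshold= ..."
# lines quoted above; other tools print attributes differently.
LINE_RE = re.compile(
    r"Id=\s*(\d+)\s+Status=\s*(\d+)\s+\{(\w+)\s+(\w+)\s*\}\s+"
    r"Value=\s*(\d+)\s+Threshold=\s*(\d+)"
)


def classify(line):
    """Return (id, kind, value, threshold, verdict) or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    attr_id, _status, kind, _mode, value, threshold = m.groups()
    attr_id, value, threshold = int(attr_id), int(value), int(threshold)
    if value > threshold:
        verdict = "ok"
    elif kind == "Prefailure":
        verdict = "failing"    # a Prefailure attribute reached its threshold
    else:
        verdict = "advisory"   # an Advisory attribute reached its threshold
    return attr_id, kind, value, threshold, verdict


if __name__ == "__main__":
    sample = "Id= 10  Status=19  {Prefailure  Online }  Value=100  Threshold= 97  Passed"
    print(classify(sample))
```

For the Id 10 line above this reports "ok", since 100 is still above 97; the
same line with Value= 97 would be reported as "failing".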

The listing for my disk looks a bit different; for item 194, for example,
I see the current temperature as 44°C. Worst is 57°C, which is the highest
temperature reached - that matches. I do not really understand the other
values. I just hope that smartd understands them and knows when to react.
I would like to know whether anyone has experience with this and whether
I have it set up correctly.

smartctl -a /dev/hda
Device: ST310215A  Supports ATA Version 4
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.

General Smart Values:
Off-line data collection status: (0x82) Offline data collection activity
                                         completed without error

Self-test execution status:      (  36) The self-test routine was interrupted
                                         by the host with a hard or soft reset

Total time to complete off-line
data collection:                 ( 422) Seconds

Offline data collection
Capabilities:                    (0x1b)SMART EXECUTE OFF-LINE IMMEDIATE
                                         Automatic timer ON/OFF support
                                         Suspend Offline Collection upon new
                                         command
                                         Offline surface scan supported
                                         Self-test supported

Smart Capablilities:           (0x0003) Saves SMART data before entering
                                         power-saving mode
                                         Supports SMART auto save timer

Error logging capability:        (0x01) Error logging supported

Short self-test routine
recommended polling time:        (   1) Minutes

Extended self-test routine
recommended polling time:        (  10) Minutes

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f   076   066   025       129355914
(  3)Spin Up Time            0x0003   073   070   000       0
(  4)Start Stop Count        0x0032   100   100   020       379
(  5)Reallocated Sector Ct   0x0033   100   100   036       0
(  7)Seek Error Rate         0x000f   078   060   030       60791053
(  9)Power On Hours          0x0032   093   093   000       6580
( 10)Spin Retry Count        0x0013   100   099   000       0
( 12)Power Cycle Count       0x0032   100   100   020       119
(194)Temperature             0x0022   044   057   000       44
(195)Hardware ECC Recovered  0x001a   073   064   000       70811798
(197)Current Pending Sector  0x0012   100   100   000       0
(198)Offline Uncorrectable   0x0010   100   100   000       0
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   100   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged
[root@Akira root]#
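
For anyone who wants to read the attribute table above by program, here is a
minimal sketch. It assumes the exact column layout of this listing (newer
smartctl versions print a different table), takes bit 0 of the Flag column
as the Prefailure marker and bit 1 as the Online marker, which matches the
Status values in the quoted listing, and treats the raw value of attribute
194 as degrees Celsius, as it appears to be on this disk. The names ATTR_RE,
parse_attributes and temperature_too_high, and any temperature limit passed
in, are hypothetical choices for this example and not something smartd
itself provides.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table in the layout shown
# above: (id)name  0xflag  value  worst  threshold  raw
ATTR_RE = re.compile(
    r"\(\s*(\d+)\)(.+?)\s+0x([0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)

SAMPLE = """\
(  1)Raw Read Error Rate     0x000f   076   066   025       129355914
(194)Temperature             0x0022   044   057   000       44
(198)Offline Uncorrectable   0x0010   100   100   000       0
"""


def parse_attributes(text):
    """Return a dict mapping attribute id to its parsed columns."""
    attrs = {}
    for line in text.splitlines():
        m = ATTR_RE.search(line)
        if m is None:
            continue  # not an attribute row
        flag = int(m.group(3), 16)
        attrs[int(m.group(1))] = {
            "name": m.group(2).strip(),
            "prefailure": bool(flag & 0x0001),  # bit 0: Prefailure, else Advisory
            "online": bool(flag & 0x0002),      # bit 1: updated online
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "threshold": int(m.group(6)),
            "raw": int(m.group(7)),
        }
    return attrs


def temperature_too_high(attrs, limit_c):
    """True if attribute 194 is present and its raw value exceeds limit_c."""
    temp = attrs.get(194)
    return temp is not None and temp["raw"] > limit_c


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for attr_id, a in sorted(attrs.items()):
        kind = "Prefailure" if a["prefailure"] else "Advisory"
        print(attr_id, a["name"], kind, a["value"], a["threshold"], a["raw"])
    print("too hot:", temperature_too_high(attrs, 55))
```

Since the Threshold for attribute 194 is 0 here, a separate limit on the raw
value is the only way a check like this could react to temperature at all.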

-- 
Vaclav Dvorsky
http://www.akira.cz
email: hufhendr at akira.cz, iso-8859-2
tel: +420608021530, PGP: 0xD38E2CA7, X.509 supported


