smartd - before the disk dies
Vaclav Dvorsky
hufhendr at sendmail.cz
Tue Jun 3 22:16:30 CEST 2003
Honza Houstek wrote:
>>That server also runs in a "greenhouse" (a glassed-in loggia of a
>>prefab apartment block). Over the weekend I added another fan above the
>>power supply; the disk temperature is currently 45°C, and at night it
>>sometimes drops to 40°C.
>>
>>So is it possible that smartd works and I did not get an SMS because
>>everything is fine? Does smartd watch, for example, a critical
>>temperature value, or does it only look at read/write errors?
>
> SMART has several indicators (attributes), each holding a value. What
> exactly these values mean depends on the specific disk model, and it
> could probably be looked up somewhere. That is not important, though.
> For example, this is the output for the disk described above, in the
> computer in the "greenhouse":
>
> Id= 1 Status=15 {Prefailure Online } Value= 61 Threshold= 6 Passed
> Id= 3 Status= 3 {Prefailure Online } Value= 99 Threshold= 0 Passed
> Id= 4 Status=50 {Advisory Online } Value=100 Threshold= 20 Passed
> Id= 5 Status=51 {Prefailure Online } Value=100 Threshold= 36 Passed
> Id= 7 Status=15 {Prefailure Online } Value= 76 Threshold= 30 Passed
> Id= 9 Status=50 {Advisory Online } Value= 99 Threshold= 0 Passed
> Id= 10 Status=19 {Prefailure Online } Value=100 Threshold= 97 Passed
> Id= 12 Status=50 {Advisory Online } Value=100 Threshold= 20 Passed
> Id=194 Status=34 {Advisory Online } Value= 54 Threshold= 0 Passed
> Id=195 Status=26 {Advisory Online } Value= 61 Threshold= 0 Passed
> Id=197 Status=18 {Advisory Online } Value=100 Threshold= 0 Passed
> Id=198 Status=16 {Advisory OffLine} Value=100 Threshold= 0 Passed
> Id=199 Status=62 {Advisory Online } Value=200 Threshold= 0 Passed
> Id=200 Status= 0 {Advisory OffLine} Value=100 Threshold= 0 Passed
> Id=202 Status=50 {Advisory Online } Value=100 Threshold= 0 Passed
>
> What matters for each value is whether it is Advisory or Prefailure. If
> something happens to the value of an Advisory attribute, it need not
> mean anything, whereas if it is a Prefailure attribute, the disk is
> almost certainly on its last legs.
>
> "Something happens" mainly means the value dropping below Threshold.
>
> Specifically, on this disk the temperature is held in the attribute with
> Id 194 and is in degrees Celsius (54). Note that the item is Advisory and
> has Threshold 0, so it would stop being Passed only if the temperature
> dropped below 0, which will not happen even in a freezer, since the
> attribute is limited to the range 0-100.
>
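The Advisory/Prefailure distinction above can be read straight off the Status number. Judging from the listing itself (every odd Status is Prefailure, every Status with bit 1 clear is OffLine), bit 0 of the flag word marks Prefailure and bit 1 marks Online collection; the same numbers appear in hex in the Flag column of smartctl further down (15 = 0x000f, 50 = 0x0032). The following is only a minimal Python sketch under that assumption; the names Attribute, kind, is_failing and PREFAILURE_BIT are hypothetical, and the pass/fail test follows the rule as stated in the quoted message (Value below Threshold), so treating a value exactly equal to Threshold as passing is an assumption here.

```python
from typing import NamedTuple

PREFAILURE_BIT = 0x01  # bit 0 set: Prefailure; clear: Advisory (inferred from the listing)
ONLINE_BIT = 0x02      # bit 1 set: updated Online; clear: OffLine only


class Attribute(NamedTuple):
    ident: int      # attribute Id, e.g. 194 for temperature on this drive
    status: int     # the "Status" flag word, printed in decimal above
    value: int      # current normalized value
    threshold: int  # vendor threshold for the normalized value


def kind(attr: Attribute) -> str:
    """Rebuild the label printed in braces, e.g. 'Prefailure Online'."""
    severity = "Prefailure" if attr.status & PREFAILURE_BIT else "Advisory"
    collection = "Online" if attr.status & ONLINE_BIT else "OffLine"
    return f"{severity} {collection}"


def is_failing(attr: Attribute) -> bool:
    """Rule from the quoted message: the value has dropped below Threshold."""
    return attr.value < attr.threshold


# A few rows copied from the listing quoted above.
rows = [
    Attribute(1, 15, 61, 6),
    Attribute(4, 50, 100, 20),
    Attribute(194, 34, 54, 0),
    Attribute(198, 16, 100, 0),
]

for a in rows:
    failing = is_failing(a)
    verdict = "FAILING" if failing else "Passed"
    # Per the quoted message, a failing Prefailure attribute is the serious case.
    note = "  <- serious" if failing and a.status & PREFAILURE_BIT else ""
    print(f"Id={a.ident:3} ({kind(a)}) Value={a.value:3} "
          f"Threshold={a.threshold:3} {verdict}{note}")
```

With Threshold 0, as on attribute 194, is_failing can never return True, which is the point made above about the temperature attribute.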
The output for my disk looks a bit different; for item 194, for example, I
see the current temperature as 44°C. Worst is 57°C, which is the highest
temperature reached; that fits. I do not really understand the other
values. I just hope smartd understands them and knows when to react. I
would like to know whether anyone has experience with this and whether I
have it set up correctly.
smartctl -a /dev/hda
Device: ST310215A Supports ATA Version 4
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.
General Smart Values:
Off-line data collection status: (0x82)   Offline data collection activity
                                          completed without error
Self-test execution status:      (  36)   The self-test routine was
                                          interrupted by the host with
                                          a hard or soft reset
Total time to complete off-line
data collection:                 ( 422)   Seconds
Offline data collection
Capabilities:                    (0x1b)   SMART EXECUTE OFF-LINE IMMEDIATE
                                          Automatic timer ON/OFF support
                                          Suspend Offline Collection upon new
                                          command
                                          Offline surface scan supported
                                          Self-test supported
Smart Capablilities:           (0x0003)   Saves SMART data before entering
                                          power-saving mode
                                          Supports SMART auto save timer
Error logging capability:        (0x01)   Error logging supported
Short self-test routine
recommended polling time:        (   1)   Minutes
Extended self-test routine
recommended polling time:        (  10)   Minutes
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag    Value Worst Threshold Raw Value
( 1)Raw Read Error Rate      0x000f  076   066   025       129355914
( 3)Spin Up Time             0x0003  073   070   000       0
( 4)Start Stop Count         0x0032  100   100   020       379
( 5)Reallocated Sector Ct    0x0033  100   100   036       0
( 7)Seek Error Rate          0x000f  078   060   030       60791053
( 9)Power On Hours           0x0032  093   093   000       6580
( 10)Spin Retry Count        0x0013  100   099   000       0
( 12)Power Cycle Count       0x0032  100   100   020       119
(194)Temperature             0x0022  044   057   000       44
(195)Hardware ECC Recovered  0x001a  073   064   000       70811798
(197)Current Pending Sector  0x0012  100   100   000       0
(198)Offline Uncorrectable   0x0010  100   100   000       0
(199)UDMA CRC Error Count    0x003e  200   200   000       0
(200)Unknown Attribute       0x0000  100   100   000       0
(202)Unknown Attribute       0x0032  100   253   000       0
SMART Error Log:
SMART Error Logging Version: 1
No Errors Logged
[root at Akira root]#
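To check a pasted table like the one above by hand, one could parse the attribute rows and flag anything suspicious. This is a minimal Python sketch, assuming the column order shown above (Id and name, hex Flag, Value, Worst, Threshold, Raw Value) and assuming that the raw value of attribute 194 is in °C, which holds for this drive according to the thread but not necessarily for other models. TEMP_LIMIT_C is a hypothetical limit picked for illustration, not a smartd setting, and parse, problems and Attr are illustrative names, not part of smartmontools. Because attribute 194 has Threshold 0 and so never trips the threshold rule, the temperature gets its own check.

```python
import re
from typing import NamedTuple

# One row of the attribute table, e.g. "(194)Temperature 0x0022 044 057 000 44".
ROW = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+0x([0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)

TEMP_LIMIT_C = 55  # hypothetical limit chosen for illustration


class Attr(NamedTuple):
    ident: int
    name: str
    flag: int       # same bits as "Status" in the quoted listing
    value: int
    worst: int
    threshold: int
    raw: int


def parse(text: str) -> list[Attr]:
    """Collect every line that looks like an attribute row; ignore the rest."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            ident, name, flag, value, worst, thr, raw = m.groups()
            attrs.append(Attr(int(ident), name.strip(), int(flag, 16),
                              int(value), int(worst), int(thr), int(raw)))
    return attrs


def problems(attrs: list[Attr]) -> list[str]:
    out = []
    for a in attrs:
        if a.value < a.threshold:  # rule from the quoted message
            kind = "Prefailure" if a.flag & 0x01 else "Advisory"
            out.append(f"{a.name}: {kind} value {a.value} below threshold {a.threshold}")
        # Threshold 0 never trips, so compare the raw temperature separately.
        if a.ident == 194 and a.raw > TEMP_LIMIT_C:
            out.append(f"{a.name}: {a.raw} °C exceeds {TEMP_LIMIT_C} °C")
    return out


sample = """\
( 1)Raw Read Error Rate 0x000f 076 066 025 129355914
( 5)Reallocated Sector Ct 0x0033 100 100 036 0
(194)Temperature 0x0022 044 057 000 44
"""
attrs = parse(sample)
print(len(attrs))       # 3
print(problems(attrs))  # []
```

The regex accepts both single spaces and aligned columns between fields, so it works whether or not the table was reflowed by the mail client.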
--
Vaclav Dvorsky
http://www.akira.cz
e-mail: hufhendr at akira.cz, iso-8859-2
tel: +420608021530, PGP: 0xD38E2CA7, X.509 supported
More information about the Linux mailing list