disks and SMART
Petr Baláš
petr at balas.cz
Sat May 8 11:32:21 CEST 2010
So,
the disks with serial numbers 9VP51DKE and 9VP4WW02
and the disks with serial numbers 5VP4YE6S and 5VP4Z4BJ do not work.
I am starting to get the impression that this is another case like the Seagate
1.5 TB disks that did not work in Linux SW RAID, where it took several
months before Seagate released a fixed firmware.
Seagate - the mess with the 1.5 TB disks and SW RAID, and now this.
WD - I have just sorted out a mess with a desperately slow (Win) server
and network dropouts by throwing a WD disk out of the SW RAID
and replacing it with a Seagate.
Samsung - I liked them until I had returned under warranty
about 80 percent of all the 1 TB disks that passed through my hands.
Is there still any usable hard disk manufacturer at all?
2010/5/8 Petr Baláš <petr at balas.cz>:
> Hello,
>
> I am putting together a small server here and disks keep dropping out of the SW RAID array.
> The array is RAID5 - /dev/sda3, /dev/sdb3, /dev/sdc3, /dev/sdd3
> Twice in a row (within a few hours) disks sda and sdd dropped out.
>
> SMART cheerfully claims that the disks are OK:
>
> localhost:~# smartctl -H /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> but the self-test results do not quite agree with that:
>
> localhost:~# smartctl -l selftest /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure        90%               61  1905068652
> # 2  Conveyance offline  Completed: read failure        90%               61  1905068652
>
>
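The mismatch above (an overall health verdict of PASSED next to self-tests that ended in a read failure) is easy to miss when only `smartctl -H` is monitored. A minimal sketch, assuming the self-test log has already been captured as text, of how the failed rows could be picked out. The names `ROW` and `failed_selftests` are hypothetical, and only statuses beginning with "Completed" are recognised, so aborted or in-progress tests are ignored here.

```python
import re

# One row of the self-test table. \s also matches newlines, so a row whose
# LBA column was wrapped onto the next line by a mail client still matches.
# Assumption: only statuses that begin with "Completed" are handled.
ROW = re.compile(
    r"#\s*(\d+)\s+(\S.*?)\s+(Completed[^\n]*?)\s+(\d+)%\s+(\d+)\s+(\d+|-)"
)


def failed_selftests(log_text):
    """Return the self-test rows whose status is not a clean completion."""
    failures = []
    for num, desc, status, remaining, hours, lba in ROW.findall(log_text):
        if status == "Completed without error":
            continue
        failures.append({
            "num": int(num),
            "test": desc,
            "status": status,
            "remaining_percent": int(remaining),
            "lifetime_hours": int(hours),
            # "-" means the drive reported no failing LBA
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return failures


# The two rows from the log above, with the same line wrapping as in the mail.
sample = """\
# 1 Extended offline Completed: read failure 90% 61
1905068652
# 2 Conveyance offline Completed: read failure 90% 61
1905068652
"""

for row in failed_selftests(sample):
    print(row["num"], row["test"], row["status"], row["first_error_lba"])
```

Both rows are reported as failures at the same first-error LBA, which is the information the `-H` verdict alone does not show.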
> What is your experience with how trustworthy the SMART output is?
>
> All the disks are Seagate ST31000528AS.
> Debian, 64-bit, custom kernel 2.6.33.3, disks set to AHCI in the BIOS.
> One faulty disk I could accept, but two at once surprises me a bit.
>
>
> The errors on the Linux side looked like this:
> ata1.00: qc timeout (cmd 0x2f)
> ata1: failed to read log page 10h (errno=-5)
> ata1.00: exception Emask 0x1 SAct 0x1ff SErr 0x0 action 0x6 frozen
> ata1.00: irq_stat 0x40000008
> ata1.00: failed command: READ FPDMA QUEUED
> ata1.00: cmd 60/00:00:0c:0c:8d/01:00:71:00:00/40 tag 0 ncq 131072 in
> res 40/00:20:0c:0b:8d/00:00:71:00:00/40 Emask 0x1 (device error)
> ata1.00: status: { DRDY }
> .....
> ata1: hard resetting link
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> ata1.00: qc timeout (cmd 0x2f)
> ata1: failed to read log page 10h (errno=-5)
> ata1.00: exception Emask 0x1 SAct 0x1ff SErr 0x0 action 0x6 frozen
> ata1.00: irq_stat 0x40000008
> ata1.00: failed command: READ FPDMA QUEUED
> ata1.00: cmd 60/00:00:0c:12:8d/01:00:71:00:00/40 tag 0 ncq 131072 in
> res 40/00:40:0c:0c:8d/00:00:71:00:00/40 Emask 0x1 (device error)
> ata1.00: status: { DRDY }
> .....
>
>
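The `cmd` lines in the kernel log carry the address of the failing read, which can be compared with the LBA in the self-test log. A minimal sketch of the decoding, assuming the twelve hex bytes are printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that for READ FPDMA QUEUED (0x60) the sector count sits in the feature fields and the tag in the upper bits of nsect. The function name `decode_cmd_taskfile` and the 512-byte sector size are assumptions made for illustration.

```python
import re

# Twelve hex bytes separated by "/" or ":" after the word "cmd".
TASKFILE = re.compile(r"cmd\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd_taskfile(line, sector_size=512):
    """Decode the LBA (and, for NCQ reads, the length) from a 'cmd' log line."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no command taskfile found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(x, 16) for x in re.split(r"[/:]", match.group(1)))

    # 48-bit LBA: the "hob" (high order byte) fields hold bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    result = {"command": command, "lba": lba}

    if command == 0x60:  # READ FPDMA QUEUED
        sectors = hob_feature << 8 | feature
        result["sectors"] = sectors
        result["bytes"] = sectors * sector_size
        result["tag"] = nsect >> 3
    return result


line = "ata1.00: cmd 60/00:00:0c:0c:8d/01:00:71:00:00/40 tag 0 ncq 131072 in"
decoded = decode_cmd_taskfile(line)
print(decoded)
```

For the first `cmd` line above this gives LBA 1905069068 and a length of 256 sectors, that is 131072 bytes, which agrees with the `ncq 131072` printed by the kernel. The address lies 416 sectors after the first-error LBA 1905068652 from the self-test log, so both tools appear to be pointing at the same region of the disk.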
> One more thing for completeness - this is the error log.
>
> localhost:~# smartctl -l error /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> ATA Error Count: 22 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 22 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 ff ff ff ef 00 1d+02:49:01.027 READ DMA EXT
> 27 00 00 00 00 00 e0 00 1d+02:49:01.027 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 1d+02:49:01.026 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 1d+02:49:01.006 SET FEATURES [Set transfer mode]
> 27 00 00 00 00 00 e0 00 1d+02:49:01.006 READ NATIVE MAX ADDRESS EXT
>
> Error 21 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 ff ff ff ef 00 1d+02:48:57.880 READ DMA EXT
> 27 00 00 00 00 00 e0 00 1d+02:48:57.880 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 1d+02:48:57.879 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 1d+02:48:57.859 SET FEATURES [Set transfer mode]
> 27 00 00 00 00 00 e0 00 1d+02:48:57.859 READ NATIVE MAX ADDRESS EXT
>
> Error 20 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 ff ff ff ef 00 1d+02:48:54.708 READ DMA EXT
> 27 00 00 00 00 00 e0 00 1d+02:48:54.707 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 1d+02:48:54.706 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 1d+02:48:54.687 SET FEATURES [Set transfer mode]
> 27 00 00 00 00 00 e0 00 1d+02:48:54.687 READ NATIVE MAX ADDRESS EXT
>
> Error 19 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 ff ff ff ef 00 1d+02:48:51.554 READ DMA EXT
> 27 00 00 00 00 00 e0 00 1d+02:48:51.552 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 1d+02:48:51.551 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 1d+02:48:51.551 SET FEATURES [Set transfer mode]
> 27 00 00 00 00 00 e0 00 1d+02:48:51.531 READ NATIVE MAX ADDRESS EXT
>
> Error 18 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
> When the command that caused the error occurred, the device was
> active or idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 e0 ff ff ff ef 00 1d+02:48:48.406 READ DMA EXT
> 27 00 00 00 00 00 e0 00 1d+02:48:48.405 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 1d+02:48:48.404 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 1d+02:48:48.384 SET FEATURES [Set transfer mode]
> 27 00 00 00 00 00 e0 00 1d+02:48:48.384 READ NATIVE MAX ADDRESS EXT
>
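Every entry in the error log reports the same address, 0x0fffffff, which differs from the LBA in the self-test log. A short arithmetic check, with hypothetical helper names and an assumed 512-byte sector size, shows that this value is exactly the largest 28-bit LBA, while the self-test LBA does not fit into 28 bits. One plausible reading (an assumption, the log itself does not say so) is that the address field printed here holds only 28 bits and is saturated for the 48-bit READ DMA EXT command, so it need not be the real location of the bad sector.

```python
MAX_LBA28 = 2**28 - 1  # largest address expressible in 28 bits


def fits_in_lba28(lba):
    """True if the LBA can be expressed with 28-bit addressing."""
    return 0 <= lba <= MAX_LBA28


def byte_offset(lba, sector_size=512):
    """Byte offset of a sector from the start of the disk."""
    return lba * sector_size


error_log_lba = 0x0FFFFFFF   # from the "Error: UNC at LBA" lines
selftest_lba = 1905068652    # from the self-test log

print(error_log_lba == MAX_LBA28)    # True
print(fits_in_lba28(selftest_lba))   # False
print(byte_offset(selftest_lba))     # 975395149824
```

The last value is roughly 975 GB into the disk, so the sector reported by the self-test lies near the end of a 1 TB drive.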
> --
> Petr Baláš - petr at balas dot cz
>
--
Petr Baláš - petr at balas dot cz