Disks and SMART

Petr Baláš petr at balas.cz
Sat May 8 11:32:21 CEST 2010


So,

the disks with serial numbers 9VP51DKE and 9VP4WW02
and the disks with serial numbers 5VP4YE6S and 5VP4Z4BJ do not work.
I am starting to get the impression that this is another case like the Seagate
1.5 TB disks that did not work in Linux SW RAID, where it took several
months before Seagate released a fixed firmware.

Seagate - the mess with the 1.5 TB disks and SW RAID, and now this.
WD - I have just sorted out a mess with a desperately slow (Windows) server
  and network dropouts by throwing a WD disk out of the SW RAID
  and replacing it with a Seagate.
Samsung - I liked them until I had returned under warranty
  roughly 80 percent of all the 1 TB disks that passed through my hands.

Is there still any usable hard disk manufacturer at all?
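
The mismatch in the quoted message below (overall health PASSED, yet self-tests
ending in read failures) can be checked mechanically on every member of the
array. A minimal sketch in Python, assuming the text of `smartctl -l selftest`
has already been captured into a string. The name failed_selftests and the rule
"status contains the word failure" are my own assumptions, not part of
smartmontools:

```python
import re

# One row of the self-test table, e.g.
# "# 1  Extended offline  Completed: read failure  90%  61  1905068652"
ROW = re.compile(
    r"#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\d+|-)"
)


def failed_selftests(log_text):
    """Return (num, test, status, first_error_lba) for each failed self-test."""
    # Drop mail quote markers and rejoin rows that a mail client has wrapped.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    failures = []
    for num, test, status, _remaining, _hours, lba in ROW.findall(flat):
        # Assumption: any status mentioning "failure" counts as a failed test.
        if "failure" in status:
            first_lba = None if lba == "-" else int(lba)
            failures.append((int(num), test, status, first_lba))
    return failures


if __name__ == "__main__":
    sample = (
        "# 1  Extended offline    Completed: read failure       90%        61\n"
        "     1905068652\n"
    )
    for entry in failed_selftests(sample):
        print(entry)
```

A disk for which this returns a non-empty list is suspect regardless of what
`smartctl -H` says.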



2010/5/8 Petr Baláš <petr at balas.cz>:
> Hello,
>
> I am putting together a small server here, and disks keep dropping out of its SW RAID array.
> The array is RAID5 - /dev/sda3, /dev/sdb3, /dev/sdc3, /dev/sdd3
> Twice in a row (within a few hours) the disks sda and sdd dropped out.
>
> SMART cheerfully claims that the disks are OK:
>
> localhost:~# smartctl -H /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8
> Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> but the self-test results do not really agree with that:
>
> localhost:~# smartctl -l selftest /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8
> Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%        61         1905068652
> # 2  Conveyance offline  Completed: read failure       90%        61         1905068652
>
>
> What is your experience with the trustworthiness of what SMART reports?
>
> All disks are Seagate ST31000528AS
> Debian, 64-bit, custom kernel 2.6.33.3, disks set to AHCI in the BIOS
> One faulty disk I could accept, but two at once surprise me a bit.
>
>
> The errors on the Linux side looked like this:
> ata1.00: qc timeout (cmd 0x2f)
> ata1: failed to read log page 10h (errno=-5)
> ata1.00: exception Emask 0x1 SAct 0x1ff SErr 0x0 action 0x6 frozen
> ata1.00: irq_stat 0x40000008
> ata1.00: failed command: READ FPDMA QUEUED
> ata1.00: cmd 60/00:00:0c:0c:8d/01:00:71:00:00/40 tag 0 ncq 131072 in
>         res 40/00:20:0c:0b:8d/00:00:71:00:00/40 Emask 0x1 (device error)
> ata1.00: status: { DRDY }
> .....
> ata1: hard resetting link
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> ata1.00: qc timeout (cmd 0x2f)
> ata1: failed to read log page 10h (errno=-5)
> ata1.00: exception Emask 0x1 SAct 0x1ff SErr 0x0 action 0x6 frozen
> ata1.00: irq_stat 0x40000008
> ata1.00: failed command: READ FPDMA QUEUED
> ata1.00: cmd 60/00:00:0c:12:8d/01:00:71:00:00/40 tag 0 ncq 131072 in
>         res 40/00:40:0c:0c:8d/00:00:71:00:00/40 Emask 0x1 (device error)
> ata1.00: status: { DRDY }
> .....
>
>
> For completeness - this is the error log.
>
> localhost:~# smartctl -l error /dev/sda
> smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8
> Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Error Log Version: 1
> ATA Error Count: 22 (device log contains only the most recent five errors)
>        CR = Command Register [HEX]
>        FR = Features Register [HEX]
>        SC = Sector Count Register [HEX]
>        SN = Sector Number Register [HEX]
>        CL = Cylinder Low Register [HEX]
>        CH = Cylinder High Register [HEX]
>        DH = Device/Head Register [HEX]
>        DC = Device Command Register [HEX]
>        ER = Error register [HEX]
>        ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 22 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 00 ff ff ff ef 00   1d+02:49:01.027  READ DMA EXT
>  27 00 00 00 00 00 e0 00   1d+02:49:01.027  READ NATIVE MAX ADDRESS EXT
>  ec 00 00 00 00 00 a0 00   1d+02:49:01.026  IDENTIFY DEVICE
>  ef 03 46 00 00 00 a0 00   1d+02:49:01.006  SET FEATURES [Set transfer mode]
>  27 00 00 00 00 00 e0 00   1d+02:49:01.006  READ NATIVE MAX ADDRESS EXT
>
> Error 21 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 00 ff ff ff ef 00   1d+02:48:57.880  READ DMA EXT
>  27 00 00 00 00 00 e0 00   1d+02:48:57.880  READ NATIVE MAX ADDRESS EXT
>  ec 00 00 00 00 00 a0 00   1d+02:48:57.879  IDENTIFY DEVICE
>  ef 03 46 00 00 00 a0 00   1d+02:48:57.859  SET FEATURES [Set transfer mode]
>  27 00 00 00 00 00 e0 00   1d+02:48:57.859  READ NATIVE MAX ADDRESS EXT
>
> Error 20 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 00 ff ff ff ef 00   1d+02:48:54.708  READ DMA EXT
>  27 00 00 00 00 00 e0 00   1d+02:48:54.707  READ NATIVE MAX ADDRESS EXT
>  ec 00 00 00 00 00 a0 00   1d+02:48:54.706  IDENTIFY DEVICE
>  ef 03 46 00 00 00 a0 00   1d+02:48:54.687  SET FEATURES [Set transfer mode]
>  27 00 00 00 00 00 e0 00   1d+02:48:54.687  READ NATIVE MAX ADDRESS EXT
>
> Error 19 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 00 ff ff ff ef 00   1d+02:48:51.554  READ DMA EXT
>  27 00 00 00 00 00 e0 00   1d+02:48:51.552  READ NATIVE MAX ADDRESS EXT
>  ec 00 00 00 00 00 a0 00   1d+02:48:51.551  IDENTIFY DEVICE
>  ef 03 46 00 00 00 a0 00   1d+02:48:51.551  SET FEATURES [Set transfer mode]
>  27 00 00 00 00 00 e0 00   1d+02:48:51.531  READ NATIVE MAX ADDRESS EXT
>
> Error 18 occurred at disk power-on lifetime: 51 hours (2 days + 3 hours)
>  When the command that caused the error occurred, the device was
> active or idle.
>
>  After command completion occurred, registers were:
>  ER ST SC SN CL CH DH
>  -- -- -- -- -- -- --
>  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
>  Commands leading to the command that caused the error were:
>  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>  -- -- -- -- -- -- -- --  ----------------  --------------------
>  25 00 e0 ff ff ff ef 00   1d+02:48:48.406  READ DMA EXT
>  27 00 00 00 00 00 e0 00   1d+02:48:48.405  READ NATIVE MAX ADDRESS EXT
>  ec 00 00 00 00 00 a0 00   1d+02:48:48.404  IDENTIFY DEVICE
>  ef 03 46 00 00 00 a0 00   1d+02:48:48.384  SET FEATURES [Set transfer mode]
>  27 00 00 00 00 00 e0 00   1d+02:48:48.384  READ NATIVE MAX ADDRESS EXT
>
> --
> Petr Baláš - petr at balas dot cz
>
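
A side note on the kernel messages quoted above: the "cmd" line can be decoded
to see which sectors the failed read touched. A minimal sketch in Python,
assuming the libata field order
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
and a 48-bit command. decode_taskfile is a hypothetical helper name, not an
existing tool:

```python
import re

# Twelve hex bytes separated by "/" or ":", following "cmd " or "res ".
TASKFILE = re.compile(r"(?:cmd|res) ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

# READ/WRITE FPDMA QUEUED carry the sector count in the feature registers.
NCQ_COMMANDS = (0x60, 0x61)


def decode_taskfile(line):
    """Decode a libata 'cmd ...' line into command, start LBA and sector count."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no taskfile dump found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     _device) = [int(byte, 16) for byte in re.split(r"[/:]", match.group(1))]
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    if command in NCQ_COMMANDS:
        sectors = hob_feature << 8 | feature
    else:
        sectors = hob_nsect << 8 | nsect
    # In 48-bit commands a count of 0 stands for 65536 sectors.
    return {"command": command, "lba": lba, "sectors": sectors or 65536}


if __name__ == "__main__":
    line = "ata1.00: cmd 60/00:00:0c:0c:8d/01:00:71:00:00/40 tag 0 ncq 131072 in"
    print(decode_taskfile(line))
```

For the first failed command this gives LBA 1905069068 and 256 sectors
(256 * 512 = 131072 bytes, matching "ncq 131072 in"). That is 416 sectors past
the LBA_of_first_error reported by the self-test (1905068652), which suggests,
but does not prove, that both point at the same damaged area. The "res" line is
less trustworthy here, since the kernel failed to read log page 10h.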



-- 
Petr Baláš - petr at balas dot cz


