Strange SATA resets

Petr Stehlik pstehlik at sophics.cz
Mon Jan 26 09:05:40 CET 2009


Hello,

I have a fairly new ASUS M3N78-CM board here with the nvidia NFORCE-MCP77
chipset (IDE interface: nVidia Corporation Unknown device 0ad0 (rev a2)),
with a 1 TB disk attached over SATA (ata1.00: ATA-8: ST31000340NS, SN05, max
UDMA/133), and every now and then strange messages show up at random in
the log:

[217856.445397] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
[217856.445397] ata1.00: cmd 61/08:00:57:10:90/00:00:14:00:00/40 tag 0 ncq 4096 out
[217856.445397]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[217856.445397] ata1.00: status: { DRDY }
[217856.445397] ata1.00: cmd 61/08:08:47:97:82/00:00:5a:00:00/40 tag 1 ncq 4096 out
[217856.445397]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[217856.445397] ata1.00: status: { DRDY }
[217856.445397] ata1.00: cmd 61/08:10:57:97:82/00:00:5a:00:00/40 tag 2 ncq 4096 out
[217856.445397]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[217856.445397] ata1.00: status: { DRDY }
[217856.445397] ata1: hard resetting link
[217863.009310] ata1: link is slow to respond, please be patient (ready=0)
[217863.545183] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[217863.545183] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[217863.545183] ata1.00: revalidation failed (errno=-5)
[217863.545183] ata1: failed to recover some devices, retrying in 5 secs
[217868.553271] ata1: hard resetting link
[217869.057277] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[217869.061268] ata1.00: configured for UDMA/133
[217869.061268] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[217869.061268] ata1: irq_stat 0x00400040, connection status changed
[217869.065280] ata1.00: configured for UDMA/133
[217869.065280] ata1: EH complete
[217869.066420] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
[217869.066473] sd 0:0:0:0: [sda] Write Protect is off
[217869.066497] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[217869.066514] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
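To make the "cmd" lines above easier to read, here is a minimal Python sketch that decodes them. It assumes the usual libata error-handler layout (command/feature:nsect:lbal:lbam:lbah/hob bytes/device) and the NCQ convention that the feature bytes carry the sector count and nsect carries the tag shifted left by 3. The function names decode_ncq_cmd and active_tags are made up for this illustration.

```python
import re

# Assumed layout of a libata "cmd" line:
# cmd CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)


def decode_ncq_cmd(line):
    """Decode one 'cmd' line of an NCQ command; return None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command = int(m.group(1), 16)
    feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) fields hold the upper three bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # For NCQ commands the sector count sits in the feature bytes
    # and the tag sits in bits 7:3 of nsect.
    sectors = (hob_feat << 8) | feat
    return {"command": command, "tag": nsect >> 3, "sectors": sectors, "lba": lba}


def active_tags(sact):
    """List the NCQ tags whose bits are set in the SAct value."""
    return [bit for bit in range(32) if (sact >> bit) & 1]
```

Under these assumptions the first "cmd" line decodes to command 0x61 (a queued write), tag 0, 8 sectors (4096 bytes, which agrees with "ncq 4096 out") at LBA 344985687, and SAct 0x7 means tags 0, 1 and 2 were outstanding when the port froze.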

Yet while I am watching the system and testing it, it manages to fill the
whole TB with data; I can read, write and run computations at the same
time and no problem pops up. But when I let it rest overnight, then
sometimes in the morning there is a SATA reset like this on the console.
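Since the resets only show up unattended, one way to pin down when they happen is to scan a saved kernel log for error-handling episodes. The following is only a sketch: it assumes log lines in the format shown above (a "[seconds]" timestamp followed by "ataN" or "ataN.MM"), and eh_episodes is a hypothetical helper name. The timestamps are seconds since boot, so lining them up with smartd or cron activity still needs the boot time.

```python
import re

# "[217856.445397] ata1.00: exception ..." -> timestamp, port "ata1", message
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?: (.*)$")


def eh_episodes(lines):
    """Return (port, start, end, hard_resets) for each span that begins with
    an 'exception' message and ends with 'EH complete' on the same port."""
    open_eps = {}   # port -> [start timestamp, hard reset count]
    finished = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # continuation lines ("res ...") and non-ata lines
        ts, port, msg = float(m.group(1)), m.group(2), m.group(3)
        if msg.startswith("exception"):
            # Only the first exception opens an episode; later ones belong to it.
            open_eps.setdefault(port, [ts, 0])
        elif msg.startswith("hard resetting link") and port in open_eps:
            open_eps[port][1] += 1
        elif msg.startswith("EH complete") and port in open_eps:
            start, resets = open_eps.pop(port)
            finished.append((port, start, ts, resets))
    return finished
```

Fed the excerpt above, this would report a single episode on ata1 lasting about 12.6 seconds with two hard resets.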

The kernel is Debian's 2.6.26-vserver-amd64 (the latest available; a newer
one probably won't appear for many months because of the lenny release).

Originally I could trigger similar bus-reset messages with a simple
S.M.A.R.T. test when the disk was jumpered to 1.5 Gbps (that behaved
consistently, i.e. it reset every time), but after I switched the disk
to 3 Gbps and trimmed a few options in smartd.conf it no longer
misbehaves that way. Now it only resets in this unpleasant way once in a
while, and only when I am not watching it and the system is idle.

Switching the SATA bus in the BIOS between normal and AHCI mode seems to
make no difference.

Does anyone happen to know whether something like this has been fixed in
newer kernels, or does anyone have a hint on how to stabilize this so that
it stops resetting?

Thanks

Petr




