Server freezes

Ján Koštial lucky at itservice.sk
Sun Feb 10 14:44:37 CET 2002


Greetings, everyone.
I would like to ask for advice: my server keeps freezing.

Background:
There was a working server with this configuration:
    AMD DURON 700, 128 MB RAM, MB ASUS KT7RAID,
    2x 30 GB MAXTOR IDE HDD, 7200 rpm
    Linux RedHat 6.1
It was recently upgraded:
    the disks were replaced with SCSI IBM DDYS-T18350N Ultra160 drives
    on an Adaptec AHA-19160 controller
    fresh installation of Linux RedHat 7.2, kernel 2.4.7-10

It has been freezing ever since, so I assume the SCSI setup is the cause.
Sometimes the server locks up hard, with no message, and only a hardware reset helps.
Sometimes only a particular process hangs; for example, it is still possible to log in from another console.
Sometimes the server prints something like this:
Oops: 0000
CPU:    0
EIP:    0010:[<c8807a53>]
EFLAGS:    00010297
eax:    0000001c    ebx: c7b80000    ecx: c7b80068    edx: 00000000
esi:    c7b80000    edi: 00000001    ebp: 00000000    esp: c0259e8c
ds: 0018    es: 0018    ss: 0018
process swapper (pid: 0, stackpage=c0259000)
Stack: 00000000 00000000 c13aae18 ...
            ...
Call Trace: [<c8819d00>] ...
            ...
Code: 8b 7c 15 04 ...
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
    
Switching the controller from 160 MB/s to 80 MB/s did not help either, and I would not want to leave it that way permanently anyway.
The disks are set to ID 3 and 4; the terminator is on the cable.
Please advise what to do about this.
    A different driver?
    How should the controller be configured correctly?
    How should the jumpers on the disks be set correctly?
    Is IRQ sharing on PCI a problem?

Thanks for any advice.

Lucky.


Some excerpts from messages:
Feb 10 11:47:56 localhost kernel: SCSI subsystem driver Revision: 1.00
Feb 10 11:47:56 localhost kernel: PCI: Found IRQ 11 for device 00:0b.0
Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.2
Feb 10 11:47:57 localhost netfs: Mounting other filesystems:  succeeded
Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.3
Feb 10 11:47:57 localhost kernel: (scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/11/0
Feb 10 11:47:57 localhost kernel: (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
Feb 10 11:47:57 localhost kernel: (scsi0) Downloading sequencer code... 396 instructions downloaded
Feb 10 11:47:57 localhost kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
Feb 10 11:47:57 localhost kernel:        <Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
Feb 10 11:47:57 localhost kernel:   Vendor: IBM       Model: DDYS-T18350N      Rev: S96H
Feb 10 11:47:57 localhost kernel:   Type:   Direct-Access                      ANSI SCSI revision: 03
Feb 10 11:47:57 localhost kernel:   Vendor: IBM       Model: DDYS-T18350N      Rev: S96H
Feb 10 11:47:57 localhost apmd[776]: Version 3.0final (APM BIOS 1.2, Linux driver 1.14)
Feb 10 11:47:57 localhost apmd: apmd startup succeeded
Feb 10 11:47:57 localhost kernel:   Type:   Direct-Access                      ANSI SCSI revision: 03
Feb 10 11:47:57 localhost kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
Feb 10 11:47:57 localhost kernel: Attached scsi disk sdb at scsi0, channel 0, id 4, lun 0
Feb 10 11:47:57 localhost kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 63.
Feb 10 11:47:57 localhost kernel: SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
Feb 10 11:47:57 localhost kernel: Partition check:
Feb 10 11:47:57 localhost kernel:  sda: sda1 sda2 < sda5 > sda3
Feb 10 11:47:57 localhost kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 63.
Feb 10 11:47:57 localhost kernel: SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
Feb 10 11:47:57 localhost kernel:  sdb: sdb1 sdb2 < sdb5 > sdb3
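
For the IRQ question, the sharing reported in the boot log can be collected mechanically. The following is only a minimal sketch in Python, assuming the kernel messages have exactly the "PCI: Found IRQ" and "PCI: Sharing IRQ" form seen in the excerpt above; the function name irq_sharing is hypothetical. It shows which devices the log places on the same IRQ, not whether that sharing is harmful.

```python
import re
from collections import defaultdict

# Patterns are assumptions derived only from the log lines quoted in this post.
FOUND_RE = re.compile(r"PCI: Found IRQ (\d+) for device ([0-9a-f:.]+)")
SHARING_RE = re.compile(r"PCI: Sharing IRQ (\d+) with ([0-9a-f:.]+)")

def irq_sharing(lines):
    """Map IRQ number -> PCI devices the log reports on that IRQ, in log order."""
    devices = defaultdict(list)
    for line in lines:
        for pattern in (FOUND_RE, SHARING_RE):
            m = pattern.search(line)
            if m:
                devices[int(m.group(1))].append(m.group(2))
    return dict(devices)

# Sample input: three lines copied from the excerpt above.
sample = [
    "Feb 10 11:47:56 localhost kernel: PCI: Found IRQ 11 for device 00:0b.0",
    "Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.2",
    "Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.3",
]
for irq, devs in irq_sharing(sample).items():
    print(f"IRQ {irq}: {', '.join(devs)}")
```

On the excerpt above this groups 00:0b.0 (the slot where the Adaptec controller was found, PCI 0/11/0) with 00:07.2 and 00:07.3 on IRQ 11.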

This was also found there:
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00 
Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x64, SSTAT2 0x0, STCNT 0x0
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 87 f6 00 00 08 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00 
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00 
Feb 10 12:07:41 localhost login(pam_unix)[1061]: session closed for user root
Feb 10 12:07:52 localhost login(pam_unix)[1204]: session opened for user root by LOGIN(uid=0)
Feb 10 12:07:52 localhost  -- root[1204]: ROOT LOGIN ON tty1
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 12 de 00 00 f8 00 
Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x58, SSTAT2 0x0, STCNT 0x0
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 13 d6 00 00 08 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00 
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00 
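
The timeout lines are easier to compare once the command bytes are decoded. The following is a minimal sketch in Python, assuming the bytes printed after "Read (10)" or "Write (10)" are the rest of a standard 10-byte CDB (flags, 4-byte big-endian LBA, group, 2-byte big-endian transfer length, control). The regular expression and the names decode_timeout and summarize are hypothetical and based only on the lines quoted above.

```python
import re
from collections import Counter

# Pattern is an assumption fitted to the timeout lines quoted in this post.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout : pid \d+, scsi(\d+), channel (\d+), "
    r"id (\d+), lun (\d+) (Read|Write) \(10\) ((?:[0-9a-f]{2} ?){9})"
)

def decode_timeout(line):
    """Return (target_id, op, lba, blocks), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    cdb = [int(b, 16) for b in m.group(6).split()]
    # cdb[0] = flags, cdb[1:5] = LBA, cdb[5] = group,
    # cdb[6:8] = transfer length in blocks, cdb[8] = control.
    lba = int.from_bytes(bytes(cdb[1:5]), "big")
    blocks = int.from_bytes(bytes(cdb[6:8]), "big")
    return int(m.group(3)), m.group(5), lba, blocks

def summarize(lines):
    """Count timed-out commands per (target_id, op)."""
    decoded = (decode_timeout(line) for line in lines)
    return Counter((d[0], d[1]) for d in decoded if d is not None)

# Sample input: the first two timeout lines from the excerpt above.
sample = [
    "Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00 ",
    "Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 87 f6 00 00 08 00 ",
]
for line in sample:
    print(decode_timeout(line))
```

Under that assumption, the first line decodes to a 248-block read at LBA 20743934, and the second read starts at 20743934 + 248, so the two timed-out reads were contiguous. Decoded the same way, most of the Write commands to id 3 and id 4 carry identical LBAs.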




