Server freezing
Ján Koštial
lucky at itservice.sk
Sun Feb 10 14:44:37 CET 2002
Hello everyone.
I'd like to ask for advice: my server keeps freezing.
Background:
I had a working server with this configuration:
AMD Duron 700, 128 MB RAM, ASUS KT7RAID motherboard,
2x 30 GB Maxtor IDE HDD, 7200 rpm
Red Hat Linux 6.1
It was recently upgraded:
the disks were replaced with IBM DDYS-T18350N Ultra160 SCSI drives
on an Adaptec AHA-19160 controller
fresh installation of Red Hat Linux 7.2, kernel 2.4.7-10
Since then it has been freezing; I suspect the SCSI setup is the cause.
Sometimes the server locks up hard, with no message at all, and only a hardware reset helps.
Sometimes only an individual process hangs, and it is still possible, for example, to log in from another console.
Sometimes the server prints something like this:
Oops: 0000
CPU: 0
EIP: 0010:[<c8807a53>]
EFLAGS: 00010297
eax: 0000001c ebx: c7b80000 ecx: c7b80068 edx: 00000000
esi: c7b80000 edi: 00000001 ebp: 00000000 esp: c0259e8c
ds: 0018 es: 0018 ss: 0018
process swapper (pid: 0, stackpage=c0259000)
Stack: 00000000 00000000 c13aae18 ...
...
Call Trace: [<c8819d00>] ...
...
Code: 8b 7c 15 04 ...
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
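The EIP (c8807a53) and the first Call Trace address (c8819d00) lie far above the core kernel image (which on i386 starts at 0xc0100000), which on a 2.4 kernel usually means they fall inside a loaded module. The standard tool for decoding 2.4 oopses is ksymoops; the Python sketch below only illustrates its core idea under that assumption: find the nearest symbol at or below an address in a listing in /proc/ksyms format (`address symbol [module]`). The helper names `parse_ksyms` and `resolve` are hypothetical, and because /proc/ksyms lists only exported symbols, the result is a rough hint rather than an exact location.

```python
from bisect import bisect_right


def parse_ksyms(lines):
    """Parse /proc/ksyms-style lines ('<hex address> <symbol> [module]').

    Returns a list of (address, symbol, module) tuples sorted by address.
    Symbols without a [module] tag are attributed to the core kernel.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        module = parts[2].strip("[]") if len(parts) > 2 else "kernel"
        entries.append((addr, parts[1], module))
    entries.sort()
    return entries


def resolve(entries, address):
    """Return (symbol, module, offset) for the nearest symbol at or below
    `address`, or None if the address lies below every known symbol."""
    addrs = [e[0] for e in entries]
    i = bisect_right(addrs, address) - 1
    if i < 0:
        return None
    addr, sym, mod = entries[i]
    return sym, mod, address - addr
```

Fed with the contents of /proc/ksyms captured while the same modules are loaded, `resolve(entries, 0xc8807a53)` would return the symbol name, the module it belongs to and the offset of EIP inside it; the Call Trace addresses can be looked up the same way.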
Switching the controller from 160 MB/s down to 80 MB/s did not help either (and I would not want to keep it that way permanently anyway).
The disks are set to SCSI IDs 3 and 4; the terminator is on the cable.
Please advise what to do about it.
A different driver?
How should the controller be configured?
How should the jumpers on the disks be set?
Is IRQ sharing on the PCI bus a problem?
Thanks for any advice.
Lucky.
Some excerpts from /var/log/messages:
Feb 10 11:47:56 localhost kernel: SCSI subsystem driver Revision: 1.00
Feb 10 11:47:56 localhost kernel: PCI: Found IRQ 11 for device 00:0b.0
Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.2
Feb 10 11:47:57 localhost netfs: Mounting other filesystems: succeeded
Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.3
Feb 10 11:47:57 localhost kernel: (scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/11/0
Feb 10 11:47:57 localhost kernel: (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
Feb 10 11:47:57 localhost kernel: (scsi0) Downloading sequencer code... 396 instructions downloaded
Feb 10 11:47:57 localhost kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
Feb 10 11:47:57 localhost kernel: <Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
Feb 10 11:47:57 localhost kernel: Vendor: IBM Model: DDYS-T18350N Rev: S96H
Feb 10 11:47:57 localhost kernel: Type: Direct-Access ANSI SCSI revision: 03
Feb 10 11:47:57 localhost kernel: Vendor: IBM Model: DDYS-T18350N Rev: S96H
Feb 10 11:47:57 localhost apmd[776]: Version 3.0final (APM BIOS 1.2, Linux driver 1.14)
Feb 10 11:47:57 localhost apmd: apmd startup succeeded
Feb 10 11:47:57 localhost kernel: Type: Direct-Access ANSI SCSI revision: 03
Feb 10 11:47:57 localhost kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
Feb 10 11:47:57 localhost kernel: Attached scsi disk sdb at scsi0, channel 0, id 4, lun 0
Feb 10 11:47:57 localhost kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 63.
Feb 10 11:47:57 localhost kernel: SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
Feb 10 11:47:57 localhost kernel: Partition check:
Feb 10 11:47:57 localhost kernel: sda: sda1 sda2 < sda5 > sda3
Feb 10 11:47:57 localhost kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 63.
Feb 10 11:47:57 localhost kernel: SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
Feb 10 11:47:57 localhost kernel: sdb: sdb1 sdb2 < sdb5 > sdb3
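To check the boot log for the points raised above (IRQ sharing and the negotiated transfer rate), a small parser can collect the relevant lines. This is a hypothetical Python sketch whose regular expressions are written against the exact message formats in the excerpt above; other kernel or driver versions may word them differently. It also converts the sector count into decimal megabytes: 35843670 × 512 bytes = 18,351,959,040 bytes, about 18352 MB, which matches the figure the kernel prints.

```python
import re

# Patterns match the message formats seen in the boot log excerpt above.
IRQ_FOUND = re.compile(r"PCI: Found IRQ (\d+) for device (\S+)")
IRQ_SHARE = re.compile(r"PCI: Sharing IRQ (\d+) with (\S+)")
SYNC = re.compile(r"\(scsi(\d+):(\d+):(\d+):(\d+)\) Synchronous at ([\d.]+) Mbyte/sec")
SIZE = re.compile(r"SCSI device (\w+): (\d+) (\d+)-byte hdwr sectors")


def summarize_boot(lines):
    """Collect IRQ users, negotiated sync rates and disk sizes from log lines."""
    irq_devices = {}  # IRQ number -> PCI addresses reported on it, in log order
    sync_rates = {}   # SCSI target id -> negotiated rate in Mbyte/sec
    sizes_mb = {}     # disk name -> capacity in decimal megabytes (10**6 bytes)
    for line in lines:
        m = IRQ_FOUND.search(line) or IRQ_SHARE.search(line)
        if m:
            irq_devices.setdefault(int(m.group(1)), []).append(m.group(2))
            continue
        m = SYNC.search(line)
        if m:
            sync_rates[int(m.group(3))] = float(m.group(5))
            continue
        m = SIZE.search(line)
        if m:
            sectors, sector_size = int(m.group(2)), int(m.group(3))
            sizes_mb[m.group(1)] = round(sectors * sector_size / 10**6)
    return irq_devices, sync_rates, sizes_mb
```

On the excerpt above it would report IRQ 11 as used by the SCSI adapter at 00:0b.0 (the "PCI 0/11/0" in the aic7xxx line) together with 00:07.2 and 00:07.3, and 80.0 Mbyte/sec for both targets 3 and 4.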
The log also contained the following:
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00
Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x64, SSTAT2 0x0, STCNT 0x0
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 87 f6 00 00 08 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
Feb 10 12:07:41 localhost login(pam_unix)[1061]: session closed for user root
Feb 10 12:07:52 localhost login(pam_unix)[1204]: session opened for user root by LOGIN(uid=0)
Feb 10 12:07:52 localhost -- root[1204]: ROOT LOGIN ON tty1
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 12 de 00 00 f8 00
Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x58, SSTAT2 0x0, STCNT 0x0
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 13 d6 00 00 08 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
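Each timeout message ends with the command that was aborted. Assuming, as the nine printed bytes suggest, that they are bytes 1 to 9 of the 10-byte READ(10)/WRITE(10) command descriptor block (the opcode itself being printed as its name), the SCSI block command layout places the logical block address in bytes 2 to 5 and the transfer length in blocks in bytes 7 and 8, both big-endian. The hypothetical Python helpers below (`decode_cdb10_tail`, `parse_timeouts`) decode those fields so that the aborted requests can be compared per target.

```python
import re

# Matches the timeout message format exactly as it appears in the log above.
ABORT = re.compile(
    r"aborting command due to timeout : pid \d+, scsi(\d+), channel (\d+), "
    r"id (\d+), lun (\d+) (Read|Write) \(10\) ((?:[0-9a-f]{2} ?){9})"
)


def decode_cdb10_tail(hex_bytes):
    """Decode bytes 1..9 of a READ(10)/WRITE(10) CDB given as hex text."""
    b = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is ignored
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    return {
        "flags": b[0],                            # byte 1: option bits (e.g. DPO, FUA)
        "lba": int.from_bytes(b[1:5], "big"),     # bytes 2-5: logical block address
        "group": b[5],                            # byte 6: reserved / group number
        "blocks": int.from_bytes(b[6:8], "big"),  # bytes 7-8: transfer length in blocks
        "control": b[8],                          # byte 9: control byte
    }


def parse_timeouts(lines):
    """Yield (target id, direction, lba, blocks) for each timed-out 10-byte command."""
    for line in lines:
        m = ABORT.search(line)
        if m:
            cdb = decode_cdb10_tail(m.group(6))
            yield int(m.group(3)), m.group(5), cdb["lba"], cdb["blocks"]
```

Applied to the first burst above, it shows for example that the first read on id 3 starts at block 0x013c86fe and covers 0xf8 (248) blocks, that the next read on id 3 begins exactly where that one ends, and that the timed-out writes on id 3 and id 4 target the same block addresses with the same lengths.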
More information about the Linux mailing list