Server freezing

Michal Vymazal gandalf at mbox.vol.cz
Sunday, February 10, 2002, 20:10:22 CET


Ján Koštial wrote:

> Hello everyone.
> I need some advice: my server keeps freezing.
> 
> Background:
> There was a working server with this configuration:
>     AMD DURON 700, 128MB RAM, MB ASUS KT7RAID,
>     2x 30GB HDD MAXTOR IDE, 7200 rpm
>     Linux RedHat6.1
> It was recently upgraded:
>     disks replaced with SCSI IBM DDYS-T18350N Ultra160
>     on an Adaptec AHA-19160 controller
>     fresh install of Linux RedHat7.2, kernel 2.4.7-10
> 
> It has been freezing ever since, so I suspect the SCSI is the cause.
> Sometimes the server locks up hard, with no message, and only a HW reset
> helps. Sometimes only a single process hangs; for example, it is still
> possible to log in from another console. Sometimes the server prints
> something like this:
> Oops: 0000
> CPU:    0
> EIP:    0010:[<c8807a53>]
> EFLAGS:    00010297
> eax:    0000001c    ebx: c7b80000    ecx: c7b80068    edx: 00000000
> esi:    c7b80000    edi: 00000001    ebp: 00000000    esp: c0259e8c
> ds: 0018    es: 0018    ss: 0018
> process swapper (pid: 0, stackpage=c0259000)
> Stack: 00000000 00000000 c13aae18 ...
>             ...
> Call Trace: [<c8819d00>] ...
>             ...
> Code: 8b 7c 15 04 ...
>  <0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>     
> Switching the controller from 160MB/s to 80MB/s did not help either, and I
> would not want to leave it that way permanently anyway. The disks are set
> to ID 3 and 4; the terminator is on the cable. Please advise what to do.
>     A different driver?
>     How should the controller be configured correctly?
>     How should the jumpers on the disks be set correctly?
>     Is IRQ sharing on PCI a problem?
> 
> Thanks for any advice.
> 
> Lucky.
> 
> 
> Some excerpts from messages:
> Feb 10 11:47:56 localhost kernel: SCSI subsystem driver Revision: 1.00
> Feb 10 11:47:56 localhost kernel: PCI: Found IRQ 11 for device 00:0b.0
> Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.2
> Feb 10 11:47:57 localhost netfs: Mounting other filesystems:  succeeded
> Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.3
> Feb 10 11:47:57 localhost kernel: (scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/11/0
> Feb 10 11:47:57 localhost kernel: (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
> Feb 10 11:47:57 localhost kernel: (scsi0) Downloading sequencer code... 396 instructions downloaded
> Feb 10 11:47:57 localhost kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
> Feb 10 11:47:57 localhost kernel:        <Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
> Feb 10 11:47:57 localhost kernel:   Vendor: IBM       Model: DDYS-T18350N  Rev: S96H
> Feb 10 11:47:57 localhost kernel:   Type:   Direct-Access                  ANSI SCSI revision: 03
> Feb 10 11:47:57 localhost kernel:   Vendor: IBM       Model: DDYS-T18350N  Rev: S96H
> Feb 10 11:47:57 localhost apmd[776]: Version 3.0final (APM BIOS 1.2, Linux driver 1.14)
> Feb 10 11:47:57 localhost apmd: apmd startup succeeded
> Feb 10 11:47:57 localhost kernel:   Type:   Direct-Access                  ANSI SCSI revision: 03
> Feb 10 11:47:57 localhost kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
> Feb 10 11:47:57 localhost kernel: Attached scsi disk sdb at scsi0, channel 0, id 4, lun 0
> Feb 10 11:47:57 localhost kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 63.
> Feb 10 11:47:57 localhost kernel: SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
> Feb 10 11:47:57 localhost kernel: Partition check:
> Feb 10 11:47:57 localhost kernel:  sda: sda1 sda2 < sda5 > sda3
> Feb 10 11:47:57 localhost kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 63.
> Feb 10 11:47:57 localhost kernel: SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
> Feb 10 11:47:57 localhost kernel:  sdb: sdb1 sdb2 < sdb5 > sdb3
> 
> This was in there as well:
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
> Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x64, SSTAT2 0x0, STCNT 0x0
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 87 f6 00 00 08 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
> Feb 10 12:07:41 localhost login(pam_unix)[1061]: session closed for user root
> Feb 10 12:07:52 localhost login(pam_unix)[1204]: session opened for user root by LOGIN(uid=0)
> Feb 10 12:07:52 localhost  -- root[1204]: ROOT LOGIN ON tty1
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 12 de 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
> Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x58, SSTAT2 0x0, STCNT 0x0
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 13 d6 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
> 
>
Hello,
Kernel 2.4.7-10 is not exactly a great one. Try looking in the updates for
RH 7.2.
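
A side note on reading the quoted timeout messages: the bytes printed after
"Read (10)" / "Write (10)" look like the rest of the 10-byte SCSI command
block. The sketch below decodes them, assuming the log leaves out the opcode
byte and prints the remaining 9 bytes in the standard READ(10)/WRITE(10)
layout (flags, 4-byte block address, reserved, 2-byte length, control). The
function name decode_rw10 is made up for this illustration.

```python
def decode_rw10(tail_hex):
    """Decode the 9 bytes logged after 'Read (10)' or 'Write (10)'.

    Assumption: the opcode byte is not printed, so the logged bytes are
    flags, LBA (4 bytes, big-endian), reserved, length (2 bytes), control.
    Returns (starting block address, number of blocks).
    """
    tail = bytes.fromhex(tail_hex)  # spaces between bytes are accepted
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(tail[1:5], "big")
    blocks = int.from_bytes(tail[6:8], "big")
    return lba, blocks


# The first two timed-out reads on id 3 from the quoted log.
first = decode_rw10("00 01 3c 86 fe 00 00 f8 00")
second = decode_rw10("00 01 3c 87 f6 00 00 08 00")
print(first)   # (20743934, 248)
print(second)  # (20744182, 8)

# The second read starts exactly where the first one ends.
print(first[0] + first[1] == second[0])  # True
```

Under that assumption the two reads are back to back and well within the
35843670 sectors reported for the disk, which suggests the commands that
timed out were ordinary ones rather than corrupted requests.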

-- 
Michal Vymazal
gandalf na mbox.vol.cz
Home Computer

