Server freezing
Michal Vymazal
gandalf at mbox.vol.cz
Sun Feb 10 20:10:22 CET 2002
Ján Koštial wrote:
> Greetings, everyone.
> I am asking for advice: my server keeps freezing.
>
> Background:
> There was a working server with this configuration:
> AMD DURON 700, 128MB RAM, MB ASUS KT7RAID,
> 2x 30GB HDD MAXTOR IDE, 7200 rpm.
> Linux RedHat6.1
> It recently went through an upgrade:
> disks replaced with SCSI IBM DDYS-T18350N Ultra160
> on an Adaptec AHA-19160 controller
> Fresh installation of Linux RedHat7.2, kernel 2.4.7-10
>
> Since then it has been freezing; I assume the SCSI is the cause.
> Sometimes the server freezes hard, with no message, and only a HW reset
> helps. Sometimes only a single process freezes; for example, it is still
> possible to log in from another console. Sometimes the server prints
> something like this:
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c8807a53>]
> EFLAGS: 00010297
> eax: 0000001c ebx: c7b80000 ecx: c7b80068 edx: 00000000
> esi: c7b80000 edi: 00000001 ebp: 00000000 esp: c0259e8c
> ds: 0018 es: 0018 ss: 0018
> process swapper (pid: 0, stackpage=c0259000)
> Stack: 00000000 00000000 c13aae18 ...
> ...
> Call Trace: [<c8819d00>] ...
> ...
> Code: 8b 7c 15 04 ...
> <0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>
> Switching the controller from 160MB/s to 80MB/s did not help either, and I
> would not want to leave it that way permanently anyway. The disks are set
> to ID 3 and 4, and the terminator is on the cable. Please advise what to do.
> A different driver?
> How should the controller be configured correctly?
> How should the jumpers on the disks be set correctly?
> Is IRQ sharing on PCI a problem?
>
> Thanks for any advice.
>
> Lucky.
>
>
> An excerpt from messages:
> Feb 10 11:47:56 localhost kernel: SCSI subsystem driver Revision: 1.00
> Feb 10 11:47:56 localhost kernel: PCI: Found IRQ 11 for device 00:0b.0
> Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.2
> Feb 10 11:47:57 localhost netfs: Mounting other filesystems: succeeded
> Feb 10 11:47:57 localhost kernel: PCI: Sharing IRQ 11 with 00:07.3
> Feb 10 11:47:57 localhost kernel: (scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/11/0
> Feb 10 11:47:57 localhost kernel: (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
> Feb 10 11:47:57 localhost kernel: (scsi0) Downloading sequencer code... 396 instructions downloaded
> Feb 10 11:47:57 localhost kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
> Feb 10 11:47:57 localhost kernel: <Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
> Feb 10 11:47:57 localhost kernel: Vendor: IBM Model: DDYS-T18350N Rev: S96H
> Feb 10 11:47:57 localhost kernel: Type: Direct-Access ANSI SCSI revision: 03
> Feb 10 11:47:57 localhost kernel: Vendor: IBM Model: DDYS-T18350N Rev: S96H
> Feb 10 11:47:57 localhost apmd[776]: Version 3.0final (APM BIOS 1.2, Linux driver 1.14)
> Feb 10 11:47:57 localhost apmd: apmd startup succeeded
> Feb 10 11:47:57 localhost kernel: Type: Direct-Access ANSI SCSI revision: 03
> Feb 10 11:47:57 localhost kernel: Attached scsi disk sda at scsi0, channel 0, id 3, lun 0
> Feb 10 11:47:57 localhost kernel: Attached scsi disk sdb at scsi0, channel 0, id 4, lun 0
> Feb 10 11:47:57 localhost kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 63.
> Feb 10 11:47:57 localhost kernel: SCSI device sda: 35843670 512-byte hdwr sectors (18352 MB)
> Feb 10 11:47:57 localhost kernel: Partition check:
> Feb 10 11:47:57 localhost kernel: sda: sda1 sda2 < sda5 > sda3
> Feb 10 11:47:57 localhost kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 63.
> Feb 10 11:47:57 localhost kernel: SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
> Feb 10 11:47:57 localhost kernel: sdb: sdb1 sdb2 < sdb5 > sdb3
>
> This also turned up there:
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
> Feb 10 12:03:18 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x64, SSTAT2 0x0, STCNT 0x0
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 87 f6 00 00 08 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c9 9e 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 ca 96 00 00 10 00
> Feb 10 12:07:41 localhost login(pam_unix)[1061]: session closed for user root
> Feb 10 12:07:52 localhost login(pam_unix)[1204]: session opened for user root by LOGIN(uid=0)
> Feb 10 12:07:52 localhost -- root[1204]: ROOT LOGIN ON tty1
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 12 de 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SCSISIGI 0x44, SEQADDR 0x62, SSTAT0 0x5, SSTAT1 0x3
> Feb 10 12:19:58 localhost kernel: (scsi0:0:3:0) SG_CACHEPTR 0x58, SSTAT2 0x0, STCNT 0x0
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 2b 13 d6 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ec fe 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ed f6 00 00 f8 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 15 ee ee 00 00 08 00
> Feb 10 12:19:58 localhost kernel: scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 01 10 39 8e 00 00 08 00
>
>
Hello,
Kernel 2.4.7-10 is not exactly the real deal. Try looking in the updates for
RH 7.2.
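
To judge whether a kernel update actually changes anything, it may help to
count the "aborting command due to timeout" messages per SCSI target before
and after. What follows is only a minimal Python sketch and not something
from the original thread: the function name parse_timeouts and the dict keys
are made up for illustration, and it assumes that the nine hex bytes printed
after "Read (10)" / "Write (10)" are CDB bytes 1 to 9 in the standard
READ(10)/WRITE(10) layout (flags, 4-byte LBA, group, 2-byte length, control).

```python
import re
from collections import Counter

# Assumed message shape, taken from the log excerpt quoted in this thread:
# "aborting command due to timeout : pid N, scsiH, channel C, id T, lun L
#  Read|Write (10) <nine hex bytes>"
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout : pid \d+, scsi(\d+), channel (\d+), "
    r"id (\d+), lun (\d+) (Read|Write) \(10\) "
    r"((?:[0-9a-f]{2} ){8}[0-9a-f]{2})"
)


def parse_timeouts(text):
    """Return one dict per timeout message found in text (hypothetical helper)."""
    # Drop mail quote markers, then undo line wrapping by collapsing whitespace.
    unquoted = re.sub(r"^>\s?", "", text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    events = []
    for host, channel, target, lun, op, hexbytes in TIMEOUT_RE.findall(flat):
        cdb = bytes.fromhex(hexbytes.replace(" ", ""))
        events.append({
            "host": int(host),
            "channel": int(channel),
            "target": int(target),
            "lun": int(lun),
            "op": op,
            # Assumption: bytes 1..4 of the printed tail are the big-endian LBA,
            # bytes 6..7 the transfer length in blocks.
            "lba": int.from_bytes(cdb[1:5], "big"),
            "blocks": int.from_bytes(cdb[6:8], "big"),
        })
    return events


SAMPLE = """\
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout :
> pid 0, scsi0, channel 0, id 3, lun 0 Read (10) 00 01 3c 86 fe 00 00 f8 00
> Feb 10 12:03:18 localhost kernel: scsi : aborting command due to timeout :
> pid 0, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 70 c8 a6 00 00 f8 00
"""

if __name__ == "__main__":
    events = parse_timeouts(SAMPLE)
    # Tally timeouts per SCSI target ID.
    print(Counter(e["target"] for e in events))
    for e in events:
        print(e["target"], e["op"], e["lba"], e["blocks"])
```

In the excerpt quoted above the timeouts show up for both id 3 and id 4, so
the tally is mainly useful as a before/after comparison once an updated
kernel is installed.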
--
Michal Vymazal
gandalf at mbox.vol.cz
Home Computer