Gentoo - AMD64 - Memory problem

havelka at valka.cz
Thu Apr 14 13:31:26 CEST 2005


Hi everyone,
  after several weeks of fighting with my new Gentoo machine I have
decided to ask you for help ... :)

  It looks like the machine has memory problems: either memory is
leaking somewhere, or something else bad is going on :(

  It is an AMD64 3000+, 1.5 GB of memory, 2x80 GB SATA RAID, 200 GB IDE,
1 Gbit network.

  Software: Gentoo, Apache, MySQL, Postfix, ProFTPD

> uname -a
Linux holly 2.6.9-gentoo-r1 #2 SMP Thu Feb 3 18:32:16 UTC 2005 x86_64
AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux

  Load: 4 million web pages per month, mostly PHP, JPG, GIF

  Problem: When I was running Apache 1.3.33, after some time, at random,
usually after about two hours, the server simply ran out of memory.
Everything was eaten up by Apache, which led to swapping, and the
oom_killer started killing off processes one by one. The bad fix: a cron
job that restarts Apache every hour. That more or less solved the
problem above, but from time to time memory still runs out in less than
an hour, and in any case it is just a workaround, not a solution. What
strikes me as odd is that every Apache process (with mod_perl and
mod_php, both as DSOs) takes up 100 MB of memory according to the
process listing. So it is enough to start and load fifteen Apache
processes, and all of the machine's memory is gone.

  So I did not hesitate, blamed the problem on the old Apache, and set
about emerging Apache 2.0.51. Everything went through, I started Apache,
and although it clearly ran considerably better and faster, pretty much
the same situation followed ...



Apr 13 12:22:26 holly kblockd/0: page allocation failure. order:1,
mode:0x20 Apr 13 12:22:26 holly
Apr 13 12:22:26 holly Call Trace:<IRQ>
<ffffffff8015a580>{__alloc_pages+832}
<ffffffff804189ad>{qdisc_restart+45} Apr 13 12:22:26 holly
<ffffffff8015a61e>{__get_free_pages+14}
<ffffffff8015dab6>{kmem_getpages+38}
Apr 13 12:22:26 holly <ffffffff8043985a>{tcp_v4_route_req+250}
<ffffffff8015e2ea>{cache_alloc_refill+666}
Apr 13 12:22:26 holly <ffffffff8015dfd6>{kmem_cache_alloc+54}
<ffffffff80405624>{sk_alloc+52}
Apr 13 12:22:26 holly <ffffffff8043d4f0>{tcp_create_openreq_child+48}
<ffffffff8043a047>{tcp_v4_syn_recv_sock+87}
Apr 13 12:22:26 holly <ffffffff8043dc9e>{tcp_check_req+606}
<ffffffff80376c0f>{ide_multwrite+271}
Apr 13 12:22:26 holly <ffffffff80377204>{__ide_do_rw_disk+1204}
<ffffffff80339d36>{as_move_to_dispatch+342}
Apr 13 12:22:26 holly <ffffffff80360d24>{rtl8169_start_xmit+196}
<ffffffff804189ad>{qdisc_restart+45}
Apr 13 12:22:26 holly <ffffffff8040cdef>{dev_queue_xmit+559}
<ffffffff8042261a>{ip_finish_output+362}
Apr 13 12:22:26 holly <ffffffff80422ab5>{ip_queue_xmit+1061}
<ffffffff8011311c>{handle_IRQ_event+44}
Apr 13 12:22:26 holly <ffffffff80433047>{tcp_transmit_skb+2007}
<ffffffff8041d44a>{ip_route_input+58}
Apr 13 12:22:26 holly <ffffffff804083e7>{skb_checksum+87}
<ffffffff8043a404>{tcp_v4_do_rcv+228}
Apr 13 12:22:26 holly <ffffffff8043ac5b>{tcp_v4_rcv+1643}
<ffffffff8041fa42>{ip_local_deliver+178}
Apr 13 12:22:26 holly <ffffffff8041fe7d>{ip_rcv+909}
<ffffffff8040d2fc>{netif_receive_skb+396}
Apr 13 12:22:26 holly <ffffffff8040d3c6>{process_backlog+150}
<ffffffff8040d4e4>{net_rx_action+132}
Apr 13 12:22:26 holly <ffffffff8013ccd1>{__do_softirq+113}
<ffffffff8033afc0>{as_work_handler+0}
Apr 13 12:22:26 holly <ffffffff8013cd85>{do_softirq+53}
<ffffffff80110c1f>{apic_timer_interrupt+99}
Apr 13 12:22:26 holly <EOI> <ffffffff8036dd60>{ide_outsw+0}
<ffffffff8048f9f5>{_spin_unlock_irqrestore+5}
Apr 13 12:22:26 holly <ffffffff80148670>{worker_thread+496}
<ffffffff80134000>{default_wake_function+0}
Apr 13 12:22:26 holly <ffffffff80134000>{default_wake_function+0}
<ffffffff8014c9e0>{keventd_create_kthread+0}
Apr 13 12:22:26 holly <ffffffff80148480>{worker_thread+0}
<ffffffff8014c9e0>{keventd_create_kthread+0}
Apr 13 12:22:28 holly <ffffffff8014c999>{kthread+217}
<ffffffff80110f27>{child_rip+8}
Apr 13 12:22:28 holly <ffffffff8014c9e0>{keventd_create_kthread+0}
<ffffffff8014c8c0>{kthread+0}
Apr 13 12:22:28 holly <ffffffff80110f1f>{child_rip+0}
Apr 13 12:28:15 holly oom-killer: gfp_mask=0x1d2
Apr 13 12:28:16 holly Node 0 DMA per-cpu:
Apr 13 12:28:16 holly cpu 0 hot: low 2, high 6, batch 1
Apr 13 12:28:16 holly cpu 0 cold: low 0, high 2, batch 1
Apr 13 12:28:16 holly Node 0 Normal per-cpu:
Apr 13 12:28:16 holly cpu 0 hot: low 32, high 96, batch 16
Apr 13 12:28:16 holly cpu 0 cold: low 0, high 32, batch 16
Apr 13 12:28:16 holly Node 0 HighMem per-cpu: empty
Apr 13 12:28:16 holly
Apr 13 12:28:16 holly Free pages:        1236kB (0kB HighMem)
Apr 13 12:28:16 holly Active:184487 inactive:183900 dirty:0 writeback:13
unstable:0 free:309 slab:4523 mapped:370096 pagetables:7200
Apr 13 12:28:16 holly Node 0 DMA free:12kB min:12kB low:24kB high:36kB
active:4356kB inactive:3972kB present:16384kB
Apr 13 12:28:16 holly protections[]: 0 0 0
Apr 13 12:28:16 holly Node 0 Normal free:1224kB min:1236kB low:2472kB
high:3708kB active:733592kB inactive:731628kB present:1556416kB
Apr 13 12:28:16 holly protections[]: 0 0 0
Apr 13 12:28:16 holly Node 0 HighMem free:0kB min:128kB low:256kB
high:384kB active:0kB inactive:0kB present:0kB
Apr 13 12:28:16 holly protections[]: 0 0 0
Apr 13 12:28:16 holly Node 0 DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12kB
Apr 13 12:28:16 holly Node 0 Normal: 0*4kB 3*8kB 1*16kB 11*32kB 7*64kB
3*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1224kB
Apr 13 12:28:16 holly Node 0 HighMem: empty
Apr 13 12:28:16 holly Swap cache: add 2057140, delete 2057113, find
1327263/1422184, race 20+53
Apr 13 12:28:16 holly Out of Memory: Killed process 23709 (apache2).
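
As a sanity check on the dump above, the following Python sketch re-adds the free-block lists ("N*SIZEkB") and converts two of the page counters to megabytes. It assumes 4 kB pages, which matches the smallest block size in the lists. The function names are hypothetical helpers, and the sketch only restates numbers from the log; it does not diagnose the cause.

```python
import re

# Matches entries such as "11*32kB": <count>*<block size in kB>
BLOCK = re.compile(r"(\d+)\*(\d+)kB")

def free_blocks(line):
    """Return {block_size_kb: count} for one free-list line of the dump."""
    return {int(size): int(count) for count, size in BLOCK.findall(line)}

def total_free_kb(line):
    """Sum of count * block size over all entries on the line."""
    return sum(size * count for size, count in free_blocks(line).items())

def largest_free_block_kb(line):
    """Largest block size that still has at least one free block."""
    sizes = [size for size, count in free_blocks(line).items() if count > 0]
    return max(sizes, default=0)

def pages_to_mb(pages, page_kb=4):
    """Convert a page count to MB, assuming 4 kB pages."""
    return pages * page_kb / 1024

normal = ("Node 0 Normal: 0*4kB 3*8kB 1*16kB 11*32kB 7*64kB "
          "3*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1224kB")
dma = ("Node 0 DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB "
       "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12kB")

print(total_free_kb(normal), largest_free_block_kb(normal))  # 1224 128
print(total_free_kb(dma), largest_free_block_kb(dma))        # 12 8
print(pages_to_mb(370096))  # "mapped" pages from the dump, in MB
print(pages_to_mb(4523))    # "slab" pages from the dump, in MB
```

The totals agree with the "= 1224kB" and "= 12kB" figures the kernel printed, and the mapped figure works out to roughly 1.4 GB against about 18 MB of slab.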


The server kept killing Apache processes until memory was freed (but it
took so long that a restart was the quicker fix). This situation
recurred at random several more times.

So I went looking on the internet for the root of the problem and found
this:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=131251#c24

which says that users with a 2.6.8 - 2.6.9 kernel (which is what I have)
on AMD64 are having problems with memory leaks, which looks like the
same problem I am seeing.

My question is: can this really be my problem? (After all, that is a
Red Hat forum and I run Gentoo.) Does anyone have similar experience?
Does Gentoo run on AMD64 for you without problems? And finally, would
moving to an older/newer (* delete as appropriate) kernel help me?

Since I consider myself a passable "maintainer", but by no means a great
Linux guru who compiles kernels and the like, I would rather ask first
before I try to arrange for someone to help me with this.

I will of course send any settings and config files, just tell me which
ones.
  Thanks
    Radek Havelka




