kernel 2.1.63

Thursday November 13 18:36:44 CET 1997

On Thu, 13 Nov 1997, Filip Zaludek wrote:

> %Subj% should be able to detect the F00FC7C8 bug.

In short, the hang is avoided by tripping the CPU up a second time while
it is already falling :)

In practice this roughly means making sure that an attempt to raise the
"invalid opcode" trap (which is where the CPU hangs) results in a page
fault instead (from which the CPU, surprisingly, returns to a healthy
state and does not hang).
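A small C sketch of the layout arithmetic behind this trick may help. It uses the offsets from the 2.1.63 patch quoted below (`twopage + 4096-7*8`, 8-byte gates, 256 vectors). The helper names are hypothetical. The sketch only checks that the gates for vectors 0 through 6, including the invalid-opcode vector 6, land in the first page, which the patch unmaps. Vector 14 (page fault) lands in the second page, which stays mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the 2.1.63 patch: a 4096-byte page, 8-byte gate
 * descriptors, 256 vectors, and the IDT copy starting 7 gates before
 * the end of the first page of the vmalloc'ed pair. */
#define PAGE_BYTES  4096u
#define GATE_SIZE   8u
#define IDT_ENTRIES 256u
#define IDT_OFFSET  (PAGE_BYTES - 7u * GATE_SIZE)

/* Hypothetical helper: offset of a vector's gate from the start of the
 * two-page area. */
size_t gate_offset(unsigned vector)
{
    assert(vector < IDT_ENTRIES);
    return IDT_OFFSET + (size_t)vector * GATE_SIZE;
}

/* Hypothetical helper: true if the vector's whole gate lies in the first
 * page, the one whose PTE the patch clears, so fetching it page-faults. */
bool gate_in_unmapped_page(unsigned vector)
{
    return gate_offset(vector) + GATE_SIZE <= PAGE_BYTES;
}
```

For example, `gate_offset(6)` is 4088, so the invalid-opcode gate ends exactly at the page boundary, and `gate_offset(7)` is 4096, the first byte of the mapped page.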

I am attaching three relevant posts from bugtraq:

--Pavel Kankovsky aka Peak (troja.mff.cuni.cz network administration)


Date: Wed, 12 Nov 1997 15:19:34 -0600
From: VaX#n8 <vax na LINKDEAD.PARANOIA.COM>
To: BUGTRAQ na NETSPACE.ORG
Subject: mode of the i586 F0 bug

I believe the instruction in assembler would be:
LOCK CMPXCHG8B EAX

Forgive me if my memory fails me.
The important thing to know here is that CMPXCHG8B compares the 64-bit
value in the concatenated registers EDX:EAX with the operand (ostensibly
a 64-bit datum in memory).
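The post only mentions the comparison. The full CMPXCHG8B m64 behaviour, as documented by Intel, is as follows. If EDX:EAX equals the memory operand, ECX:EBX is stored to memory and ZF is set. Otherwise the memory operand is loaded into EDX:EAX and ZF is cleared. Below is a plain, non-atomic C model of that behaviour. The struct and function names are hypothetical. The real instruction performs the read-modify-write as one operation, and with LOCK the bus is held for its duration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The four 32-bit registers CMPXCHG8B uses (hypothetical model type). */
struct regs64 {
    uint32_t eax, edx;  /* EDX:EAX: expected (comparand) value */
    uint32_t ebx, ecx;  /* ECX:EBX: replacement value */
};

/* Non-atomic model of CMPXCHG8B m64; returns the resulting ZF. */
bool cmpxchg8b_model(struct regs64 *r, uint64_t *m64)
{
    assert(r != NULL && m64 != NULL);
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;
    if (*m64 == expected) {
        /* Equal: store ECX:EBX into memory and set ZF. */
        *m64 = ((uint64_t)r->ecx << 32) | r->ebx;
        return true;
    }
    /* Not equal: load the memory operand into EDX:EAX and clear ZF. */
    r->eax = (uint32_t)*m64;
    r->edx = (uint32_t)(*m64 >> 32);
    return false;
}
```

The model needs an address to dereference, which is why a register "operand" such as EAX has no meaning for this instruction.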

The deal with F0 0F C7 C8 (F0 is the LOCK prefix, 0F C7 the CMPXCHG8B
opcode) is that the last byte, C8, is the part of the instruction called
the MOD R/M byte.  This byte is composed of three fields: Mod,
Reg/Opcode, and R/M.  The Opcode field, bits 3-5, is used up by the
instruction in this case, giving it a fixed value of "001".  The Mod
(bits 6-7) and R/M (bits 0-2) fields combine to form an addressing mode;
here Mod = 11 selects a register operand and R/M = 000 selects EAX.
Since the comparison is supposed to be made against 8 bytes of memory,
a register operand makes this an invalid instruction.
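Here is a minimal C sketch of the field split described above. The names `decode_modrm` and `cmpxchg8b_modrm_is_register` are hypothetical. The sketch decodes C8 into Mod = 11, Reg/Opcode = 001, R/M = 000 (the register form, EAX). It contrasts that byte with 08, the memory form `[EAX]` that CMPXCHG8B expects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three fields of a ModR/M byte. */
struct modrm {
    unsigned mod;  /* bits 7-6: addressing mode (3 = register operand) */
    unsigned reg;  /* bits 5-3: register, or opcode extension (/digit) */
    unsigned rm;   /* bits 2-0: register, or base of the memory operand */
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = {
        .mod = (byte >> 6) & 0x3u,
        .reg = (byte >> 3) & 0x7u,
        .rm  = byte & 0x7u,
    };
    return m;
}

/* For the ModR/M byte following 0F C7, Reg/Opcode must be 1 (CMPXCHG8B
 * is 0F C7 /1). Returns true when Mod selects a register operand, which
 * CMPXCHG8B does not accept, so the CPU should raise invalid opcode. */
bool cmpxchg8b_modrm_is_register(uint8_t byte)
{
    struct modrm m = decode_modrm(byte);
    assert(m.reg == 1);
    return m.mod == 3;
}
```

For example, `cmpxchg8b_modrm_is_register(0xC8)` is true, while `cmpxchg8b_modrm_is_register(0x08)` (Mod = 00, R/M = 000, i.e. `[EAX]`) is false.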

Combining CMPXCHG8B with LOCK is okay; LOCK asserts the LOCK# pin,
guaranteeing exclusive use of the memory bus during the instruction.
This is mostly useful on multiple-processor machines.

It is the opinion of an anonymous source at an Austin-based processor
manufacturer that the Intel hardware designers forgot to unlock the
bus before trying to load the descriptor for the appropriate
exception handler, which would explain why locking it into the
L1 cache helps.  I suppose the hardware does unlock it before actually
executing the handler.  It is unclear to me why completely disabling the
L1 cache in the BIOS is rumored to help.

This really belongs on an Intel hardware list and not on bugtraq,
but since several shreds of fact have been repeated here, I think it
merits a more complete explanation.
Sorry to prolong this already tedious thread.

I wonder if anyone has taken on the obvious (but somewhat laborious)
task of exhaustively testing invalid instructions with a LOCK prefix.

It has been said that the bug might have been discovered by an Intel
competitor, implying that they wanted to damage Intel.  Posting to
Linux newsgroups would be an unnecessarily indirect method of doing that.
I find it equally possible that it could have been someone simply
experimenting with an assembler, or who happened to execute a mangled
binary and had the tenacity to track down the bug.  I wouldn't blame
anyone for not signing their name to a bug potentially as serious as
this one, and remaining anonymous shows remarkable ego control.
After all, you never know who your next employer will be :)
--
VaX#n8

------

Date: Wed, 12 Nov 1997 22:37:10 +0000
From: Alan Cox <alan na LXORGUK.UKUU.ORG.UK>
To: BUGTRAQ na NETSPACE.ORG
Subject: Re: mode of the i586 F0 bug

> manufacturer that the Intel hardware designers forgot to unlock the
> bus before trying to load the descriptor for the appropriate
> exception handler, which would explain why locking it into the
> L1 cache helps.  I suppose the hardware does unlock it before actually

It would also explain how the real fix works.

If you take a BSDI box before and after the patch and compare the
MMU tables via /dev/mem etc., you'll find a pair of funny pages where
the interrupt descriptor table has moved.

Odder still, the low part of it doesn't have a PTE. What seems to be done
is to put the low descriptors into an invalid page and take a page fault
when the CPU tries to handle the fault from the lock cmpxchg8b.

The Linux code is based on this observation and does the same trick. The
page fault handler then checks the fault, sees a kernel mode fault on
the descriptor block[1], and works out what the real fault was. It then
calls the relevant kernel function instead of doing normal page fault
processing. We could probably just remap the page at that point, but it's
faster to call the functions by hand than to map and remap the page
(which causes TLB flushes).

Hopefully that info and the 2.1.63 Linux patch are enough to get the fix
into other free OSes too. And if anyone can find a way to break the Linux
2.1.63 fix, we'd all love to know. Hopefully a complete official Intel
workaround will appear shortly and we can switch to that.

Alan
[1] This is important, or we might by chance take a fault for a user
process at the same address and do a trap instead.
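The check Alan describes, including the footnote's kernel-mode condition, can be sketched in C as follows. `f00f_fault_vector` and its signature are hypothetical and are not the kernel's code. The sketch assumes, as the footnote implies, that the CPU reports the descriptor fetch as a supervisor-mode access. That means bit 2 (U/S) of the page-fault error code is clear. The vector is recovered with the same `(address - idt2) / 8` arithmetic as the patch below.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_SIZE   8u
#define IDT_ENTRIES 256u

/* Bit 2 (U/S) of the x86 page-fault error code: set when the faulting
 * access was a user-mode access, clear for a supervisor-mode access. */
#define PF_USER 0x4ul

/* Hypothetical helper: returns the vector whose gate the CPU was fetching
 * when the page fault hit, or -1 if this is not a supervisor-mode fault
 * inside the relocated IDT (and so needs normal fault handling). */
int f00f_fault_vector(uintptr_t fault_addr, uintptr_t idt_base,
                      unsigned long error_code)
{
    assert(idt_base != 0);
    if (error_code & PF_USER)
        return -1;  /* user-mode fault that merely hits the same address */
    if (fault_addr < idt_base ||
        fault_addr >= idt_base + (uintptr_t)IDT_ENTRIES * GATE_SIZE)
        return -1;
    return (int)((fault_addr - idt_base) / GATE_SIZE);
}
```

Dispatching the recovered vector (e.g. 6 to an invalid-opcode handler) is left out here. In the patch that is the `if (handler==...)` chain in fault.c.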

------

Date: Wed, 12 Nov 1997 18:45:15 -0600
From: Aleph One <aleph1 na DFW.NET>
To: BUGTRAQ na NETSPACE.ORG
Subject: Linux F00F Patch

These are the relevant parts of the Linux kernel 2.1.63 patch that fix the
Pentium bug Alan mentioned.

Aleph One / aleph1 na dfw.net
http://underground.org/
KeyID 1024/948FD6B5
Fingerprint EE C9 E8 AA CB AF 09 61  8C 39 EA 47 A8 6A B8 01

diff -u --recursive --new-file v2.1.62/linux/arch/i386/kernel/setup.c linux/arch/i386/kernel/setup.c

--- v2.1.62/linux/arch/i386/kernel/setup.c      Tue Sep 23 16:48:46 1997
+++ linux/arch/i386/kernel/setup.c      Wed Nov 12 11:09:56 1997
@@ -42,6 +42,7 @@
 char x86_mask = 0;             /* set by kernel/head.S */
 int x86_capability = 0;                /* set by kernel/head.S */
 int fdiv_bug = 0;              /* set if Pentium(TM) with FP bug */
+int pentium_f00f_bug = 0;      /* set if Pentium(TM) with F00F bug */
 int have_cpuid = 0;             /* set if CPUID instruction works */

 char x86_vendor_id[13] = "unknown";
@@ -359,6 +360,7 @@
                                        "fdiv_bug\t: %s\n"
                                        "hlt_bug\t\t: %s\n"
                                       "sep_bug\t\t: %s\n"
+                                      "pentium_f00f_bug\t\t: %s\n"
                                        "fpu\t\t: %s\n"
                                        "fpu_exception\t: %s\n"
                                        "cpuid\t\t: %s\n"
@@ -367,6 +369,7 @@
                                        CD(fdiv_bug) ? "yes" : "no",
                                        CD(hlt_works_ok) ? "no" : "yes",
                                       sep_bug ? "yes" : "no",
+                                      pentium_f00f_bug ? "yes" : "no",
                                        CD(hard_math) ? "yes" : "no",
                                        (CD(hard_math) && ignore_irq13)
                                          ? "yes" : "no",
diff -u --recursive --new-file v2.1.62/linux/arch/i386/kernel/traps.c linux/arch/i386/kernel/traps.c
--- v2.1.62/linux/arch/i386/kernel/traps.c      Sun Sep  7 13:10:42 1997
+++ linux/arch/i386/kernel/traps.c      Wed Nov 12 11:09:56 1997
@@ -413,6 +413,51 @@

 #endif /* CONFIG_MATH_EMULATION */

+static struct
+{
+       short limit __attribute__((packed));
+       void * addr __attribute__((packed));
+       short __pad __attribute__((packed));
+} idt_d;
+
+void * idt2;
+
+__initfunc(void trap_init_f00f_bug(void))
+{
+       pgd_t * pgd;
+       pmd_t * pmd;
+       pte_t * pte;
+       unsigned long twopage;
+
+       printk("moving IDT ... ");
+
+       twopage = (unsigned long) vmalloc (2*PAGE_SIZE);
+
+       idt2 = (void *)(twopage + 4096-7*8);
+
+       memcpy(idt2,&idt,sizeof(idt));
+
+       idt_d.limit = 256*8-1;
+       idt_d.addr = idt2;
+       idt_d.__pad = 0;
+
+        __asm__ __volatile__("\tlidt %0": "=m" (idt_d));
+
+       /*
+        * Unmap lower page:
+        */
+       pgd = pgd_offset(current->mm, twopage);
+       pmd = pmd_offset(pgd, twopage);
+       pte = pte_offset(pmd, twopage);
+
+       pte_clear(pte);
+       flush_tlb_all();
+
+       printk(" ... done\n");
+}
+
+
+
 __initfunc(void trap_init(void))
 {
        int i;
diff -u --recursive --new-file v2.1.62/linux/arch/i386/mm/fault.c linux/arch/i386/mm/fault.c
--- v2.1.62/linux/arch/i386/mm/fault.c  Wed Oct 15 16:04:23 1997
+++ linux/arch/i386/mm/fault.c  Wed Nov 12 11:09:55 1997
@@ -74,6 +74,25 @@
        return 0;
 }

+asmlinkage void divide_error(void);
+asmlinkage void debug(void);
+asmlinkage void nmi(void);
+asmlinkage void int3(void);
+asmlinkage void overflow(void);
+asmlinkage void bounds(void);
+asmlinkage void invalid_op(void);
+
+asmlinkage void do_divide_error (struct pt_regs *, unsigned long);
+asmlinkage void do_debug (struct pt_regs *, unsigned long);
+asmlinkage void do_nmi (struct pt_regs *, unsigned long);
+asmlinkage void do_int3 (struct pt_regs *, unsigned long);
+asmlinkage void do_overflow (struct pt_regs *, unsigned long);
+asmlinkage void do_bounds (struct pt_regs *, unsigned long);
+asmlinkage void do_invalid_op (struct pt_regs *, unsigned long);
+
+extern int * idt2;
+extern int pentium_f00f_bug;
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -170,6 +189,46 @@
                goto out;
        }

+       printk("<%p/%p>\n", idt2, (void *)address);
+       /*
+        * Pentium F0 0F C7 C8 bug workaround:
+        */
+       if ( pentium_f00f_bug && (address >= (unsigned long)idt2) &&
+                       (address < (unsigned long)idt2+256*8) ) {
+
+               void (*handler) (void);
+               int nr = (address-(unsigned long)idt2)/8;
+               unsigned long low, high;
+
+               low = idt[nr].a;
+               high = idt[nr].b;
+
+               handler = (void (*) (void)) ((low&0x0000ffff) | (high&0xffff0000));
+               printk("<handler %p... ", handler);
+               unlock_kernel();
+
+               if (handler==divide_error)
+                       do_divide_error(regs,error_code);
+               else if (handler==debug)
+                       do_debug(regs,error_code);
+               else if (handler==nmi)
+                       do_nmi(regs,error_code);
+               else if (handler==int3)
+                       do_int3(regs,error_code);
+               else if (handler==overflow)
+                       do_overflow(regs,error_code);
+               else if (handler==bounds)
+                       do_bounds(regs,error_code);
+               else if (handler==invalid_op)
+                       do_invalid_op(regs,error_code);
+               else {
+                       printk("INVALID HANDLER!\n");
+                       for (;;) __cli();
+               }
+               printk("... done>\n");
+               goto out;
+       }
+
        /* Are we prepared to handle this kernel fault?  */
        if ((fixup = search_exception_table(regs->eip)) != 0) {
                printk(KERN_DEBUG "%s: Exception at [<%lx>] cr2=%lx (fixup: %lx)\n",
@@ -193,6 +252,7 @@
                flush_tlb();
                goto out;
        }
+
        if (address < PAGE_SIZE)
                printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
        else
diff -u --recursive --new-file v2.1.62/linux/include/asm-i386/bugs.h linux/include/asm-i386/bugs.h
--- v2.1.62/linux/include/asm-i386/bugs.h       Thu Sep 11 09:02:24 1997
+++ linux/include/asm-i386/bugs.h       Wed Nov 12 11:09:55 1997
@@ -166,6 +166,32 @@
        }
 }

+/*
+ * All current models of Pentium and Pentium with MMX technology CPUs
+ * have the F0 0F bug, which lets nonprivileged users lock up the system:
+ */
+
+extern int pentium_f00f_bug;
+
+__initfunc(static void check_pentium_f00f(void))
+{
+       /*
+        * Pentium and Pentium MMX
+        */
+       printk("checking for F00F bug ...");
+       if(x86==5 && !memcmp(x86_vendor_id, "GenuineIntel", 12))
+       {
+               extern void trap_init_f00f_bug(void);
+
+               printk(KERN_INFO "\nIntel Pentium/[MMX] F0 0F bug detected - turning on workaround.\n");
+               pentium_f00f_bug = 1;
+               trap_init_f00f_bug();
+       } else {
+               printk(KERN_INFO " no F0 0F bug in this CPU, great!\n");
+               pentium_f00f_bug = 0;
+       }
+}
+
 __initfunc(static void check_bugs(void))
 {
        check_tlb();
@@ -173,5 +199,6 @@
        check_hlt();
        check_popad();
        check_amd_k6();
+       check_pentium_f00f();
        system_utsname.machine[1] = '0' + x86;
 }
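The fault.c hunk above recovers the handler address from the two 32-bit words of an IDT gate with `(low&0x0000ffff) | (high&0xffff0000)`. Below is a standalone sketch of that decoding. The `struct gate` type and the `make_gate` helper are hypothetical and were added only so the round trip can be checked. The values in the example are illustrative: 0x8E00 is the attribute half-word of a present, DPL 0, 32-bit interrupt gate.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit x86 interrupt/trap gate viewed as two 32-bit words, in the
 * same a/b split the patch uses for idt[nr]. */
struct gate {
    uint32_t a;  /* bits 15-0: offset 15-0; bits 31-16: segment selector */
    uint32_t b;  /* bits 15-0: P, DPL, type; bits 31-16: offset 31-16 */
};

/* Rebuild the handler offset the same way the fault.c hunk does. */
uint32_t gate_handler_offset(struct gate g)
{
    return (g.a & 0x0000ffffu) | (g.b & 0xffff0000u);
}

/* Hypothetical helper: pack an offset, a selector and the attribute
 * half-word (P, DPL, type) into a gate. */
struct gate make_gate(uint32_t offset, uint16_t selector, uint16_t attrs)
{
    struct gate g;
    assert((attrs & 0x8000u) != 0);  /* P (present) bit must be set */
    g.a = ((uint32_t)selector << 16) | (offset & 0x0000ffffu);
    g.b = (offset & 0xffff0000u) | attrs;
    return g;
}
```

For example, `gate_handler_offset(make_gate(0xC0123456u, 0x0010u, 0x8E00u))` yields `0xC0123456`. The offset is split across both words, which is why the patch has to combine `idt[nr].a` and `idt[nr].b`.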