x86: vmm: Finalize to owning_proc, not cur_proc.
authorBarret Rhoden <brho@cs.berkeley.edu>
Fri, 13 Jan 2017 22:19:13 +0000 (17:19 -0500)
committerBarret Rhoden <brho@cs.berkeley.edu>
Wed, 18 Jan 2017 15:00:03 +0000 (10:00 -0500)
commit56fbd8e93bd3438b186cc144a5e529018109f6e1
treea830df0c0b8195e9d26cb386e71d372984962f6f
parente88301d338aaeb6814653a9a9c57e8bb5ee2d49b
x86: vmm: Finalize to owning_proc, not cur_proc.

Similar to a previous bug, x86_finalize_vmtf() assumed the TF belonged to
cur_proc, but it actually belongs to owning_proc.  If we finalize a TF on a
core that runs proc's/kthreads concurrently, then we could have a situation
where cur_proc != owning_proc.  Then we'd try finding a GPC for the other
process, instead of the VMM.  Yikes!

This was relatively easy to make happen regularly: run vmrunkernel as an
SCP under strace from ssh.  I think I triggered it with perf at some point
too.

Here's the main debugging info that pointed me the right way:

couldn't find a gpc, p 284, guest_pcoreid 0

kernel panic at kern/arch/x86/vmm/vmm.c:206, from core 0: assertion failed:
gpc
Entering Nanwan's Dungeon on Core 0 (Ints off):
Type 'help' for a list of commands.
ROS(Core 0)> ps
     PID Name                 State      Parent
-------------------------------------------------
      15 /bin/cs              WAITING         0
      12 /bin/ipconfig        WAITING         0
       1 bash                 WAITING         0
     269 /bin/dropbear        WAITING         0
     284 strace               RUNNABLE_S    275
     274 /bin/dropbear        WAITING       269
     270 /bin/bash            WAITING         1
     285 vmrunkernel          RUNNING_S     284
     275 -sh                  WAITING       274
ROS(Core 0)> bt
Stack Backtrace on Core 0:
#01 [<0xffffffffc201ed74>] in mon_backtrace
#02 [<0xffffffffc201fd77>] in monitor
#03 [<0xffffffffc200ca1a>] in _panic
#04 [<0xffffffffc2134e9c>] in unload_guest_pcore
#05 [<0xffffffffc21320d8>] in arch_finalize_ctx
#06 [<0xffffffffc205d1bb>] in copy_current_ctx_to
#07 [<0xffffffffc204d70c>] in __notify
#08 [<0xffffffffc205d71f>] in process_routine_kmsg
#09 [<0xffffffffc2051665>] in proc_restartcore

Note that the lookup was using PID 284 (strace), but the VM was 285.

Signed-off-by: Barret Rhoden <brho@cs.berkeley.edu>
kern/arch/x86/trap.c
kern/src/syscall.c
kern/src/trap.c