Barret Rhoden

Updated 2012-11-14

This document explains the basic ideas behind our "kernel messages" (KMSGs) and
some of the arcane bits behind the implementation.  These were formerly called
active messages, since they were an implementation of the low-level hardware

Our kernel messages are just work that is shipped remotely, delayed in time, or
both.  They currently consist of a function pointer and a few arguments.  Kernel
messages of a given type will be executed in order, with guaranteed delivery.
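
As a rough illustration of "a function pointer and a few arguments, executed in
order," here is a user-space sketch of one per-core FIFO of messages.  The struct
layout, the three-argument signature, and every name below (kernel_message,
kmsg_fn_t, kmsg_enqueue(), kmsg_dequeue(), kmsg_run_all(), record_arg0()) are
assumptions for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a kmsg is a function pointer plus a few arguments. */
typedef void (*kmsg_fn_t)(uint32_t srcid, long a0, long a1, long a2);

struct kernel_message {
	struct kernel_message *next;
	uint32_t srcid;			/* sending core */
	kmsg_fn_t fn;			/* work to run on the target */
	long arg0, arg1, arg2;
};

/* One per-core list: append at the tail, remove from the head (FIFO). */
struct kmsg_queue {
	struct kernel_message *head, *tail;
};

static void kmsg_enqueue(struct kmsg_queue *q, struct kernel_message *m)
{
	m->next = NULL;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

static struct kernel_message *kmsg_dequeue(struct kmsg_queue *q)
{
	struct kernel_message *m = q->head;

	if (m) {
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
	}
	return m;
}

/* Run every queued message, oldest first. */
static void kmsg_run_all(struct kmsg_queue *q)
{
	struct kernel_message *m;

	while ((m = kmsg_dequeue(q)))
		m->fn(m->srcid, m->arg0, m->arg1, m->arg2);
}

/* Example handler: records arg0 so the run order can be checked. */
static long trace[8];
static int ntrace;

static void record_arg0(uint32_t srcid, long a0, long a1, long a2)
{
	(void)srcid;
	(void)a1;
	(void)a2;
	if (ntrace < 8)
		trace[ntrace++] = a0;
}
```

In this sketch, appending at the tail and always draining from the head is all
it takes to get in-order execution for one list; delivery is guaranteed as long
as nothing is ever dropped once it is enqueued.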

Initially, they were meant to be a way to immediately execute code on another
core (once interrupts are enabled), in the order in which the messages were
sent.  This is insufficient (and wasn't what we wanted for the task,
incidentally).  We simply want to do work on another core, but not necessarily
instantly.  And not necessarily on another core.

Currently, there are two types, distinguished by which per-core list they are
sent to: immediate and routine.  Routine messages are often referred to as RKMs.
Immediate messages will get executed as soon as possible (once interrupts are
enabled).  Routine messages will be executed at convenient points in the kernel,
such as when the kernel is about to pop back to userspace (proc_restartcore())
or when it is smp_idle()ing.  Routine messages are necessary when their function
does not return, such as __launch_kthread.  They should also be used if the work
is not worth fully interrupting the kernel.  (An IPI will still be sent, but the
work will be delayed.)  Finally, they should be used if their work could affect
currently executing kernel code (like a syscall).
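
A minimal sketch of the two per-core lists, assuming a single-threaded model with
no locking.  send_kernel_message(), handle_kmsg_ipi(), and process_routine_kmsg()
are names from this document, but the signatures here are hypothetical, as are
struct per_core, KMSG_IMMEDIATE, KMSG_ROUTINE, and add_arg().  Incrementing a
counter stands in for sending the IPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: one core's two kmsg lists, no locking. */
enum kmsg_type { KMSG_IMMEDIATE, KMSG_ROUTINE };

struct kmsg {
	struct kmsg *next;
	void (*fn)(long arg);
	long arg;
};

struct kmsg_list {
	struct kmsg *head, *tail;
};

struct per_core {
	struct kmsg_list immed;		/* drained by the IPI handler */
	struct kmsg_list routine;	/* drained only at convenient points */
	int ipis;			/* stands in for IPIs received */
};

static void list_append(struct kmsg_list *l, struct kmsg *m)
{
	m->next = NULL;
	if (l->tail)
		l->tail->next = m;
	else
		l->head = m;
	l->tail = m;
}

static struct kmsg *list_pop(struct kmsg_list *l)
{
	struct kmsg *m = l->head;

	if (m) {
		l->head = m->next;
		if (!l->head)
			l->tail = NULL;
	}
	return m;
}

/* The type flag picks the list; an IPI is sent either way. */
static void send_kernel_message(struct per_core *dst, struct kmsg *m,
				enum kmsg_type type)
{
	list_append(type == KMSG_IMMEDIATE ? &dst->immed : &dst->routine, m);
	dst->ipis++;
}

/* Interrupt context: only immediates run here, and they must return. */
static void handle_kmsg_ipi(struct per_core *pc)
{
	struct kmsg *m;

	while ((m = list_pop(&pc->immed)))
		m->fn(m->arg);
}

/* Called only where the kernel doesn't need to return to its caller,
 * e.g. proc_restartcore() or smp_idle(). */
static void process_routine_kmsg(struct per_core *pc)
{
	struct kmsg *m;

	while ((m = list_pop(&pc->routine)))
		m->fn(m->arg);
}

static long ran_sum;

static void add_arg(long arg)
{
	ran_sum += arg;
}
```

Both types cause an IPI in this model; the only difference is which list the
interrupt handler is allowed to drain.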

For example, some older KMSGs such as __startcore used to never return and
would pop directly into user space.  This complicated the KMSG code quite a bit.
While these functions now return, they still can't be immediate messages.  Proc
management KMSGs change the cur_ctx out from under a syscall, which can lead to
a number of problems.

Immediate kernel messages are executed in interrupt context, with interrupts
disabled.  Routine messages are only executed from places in the code where the
kernel doesn't care if the functions don't return or otherwise cause trouble.
This means RKMs are never run from interrupt context in the kernel (or from a
trap taken by kernel code itself).  We don't have a 'process context' like Linux
does; instead it's more of a 'default context'.  That's where RKMs run, and
they run with IRQs disabled.

RKMs can enable IRQs, or otherwise cause IRQs to be enabled.  __launch_kthread
is a good example: it runs a kthread, which may have had IRQs enabled.

With RKMs, there are no concerns about the kernel holding locks or otherwise
"interrupting" its own execution.  Routine messages are a little different from
just trapping into the kernel, since the functions don't have to return and may
result in clobbering the kernel stack.  Also note that this behavior depends on
where we call process_routine_kmsg().  Don't call it somewhere you need to
return to.

An example of an immediate message would be a TLB_shootdown: check current,
flush if applicable, and return.  It doesn't harm the kernel at all.  Another
example would be certain debug routines.
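
A hypothetical immediate handler in the spirit of the TLB_shootdown example.  The
pid-based check, struct proc, current, and tlb_flush_range() (which only counts
calls here) are stand-ins, not the kernel's real interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the process loaded on this core, and a flush counter. */
struct proc {
	int pid;
};

static struct proc *current;	/* process whose address space is loaded */
static int tlb_flushes;

static void tlb_flush_range(uintptr_t start, uintptr_t end)
{
	(void)start;
	(void)end;
	tlb_flushes++;		/* real code would invalidate the range */
}

/* Immediate kmsg: check current, flush if applicable, and return.  It
 * takes no locks and never blocks, so interrupting arbitrary kernel code
 * with it is harmless. */
static void __tlb_shootdown(uint32_t srcid, long pid, long start, long end)
{
	(void)srcid;
	if (current && current->pid == pid)
		tlb_flush_range((uintptr_t)start, (uintptr_t)end);
}
```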

KMSGs have a long history tied to process management code.  The main issues were
related to which KMSG functions return and which ones mess with local state (like
clobbering cur_ctx or the owning_proc).  Returning was a big deal because you
can't just arbitrarily abandon a kernel context (locks or refcnts could be held,
etc.).  This is why immediates must return.  Likewise, there are certain
invariants about what a core is doing that shouldn't be changed by an IRQ
handler (which is what an immediate message really is).  See the old proc
management commits (look for changes to __startcore) if you want more info.

Other Uses:

Kernel messages will also be the basis for the alarm system.  An alarm is just
a way of expressing work that needs to be done.  That said, the k_msg struct will
probably receive a timestamp field, among other things.  Routine messages will
also replace the old workqueue, which hasn't really been used in 40 months or
so.

To Return or Not:

Routine k_msgs do not have to return.  Immediate messages must.  The distinction
is in how they are sent (send_kernel_message() will take a flag), so be careful.

To retain some sort of sanity, the functions that do not return must adhere to
some rules.  At some point they need to end in a place where they check routine
messages or enable interrupts.  Simply calling smp_idle() will do this.  The
idea is that routine messages will get processed as soon as the kernel is able
to do so, at a convenient place.
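
One way to model a non-returning routine message in user space is with
setjmp()/longjmp(): here a stand-in smp_idle() abandons the current stack and
restarts the idle loop, which then checks routine messages again.  This is only a
sketch; the queue, send_routine(), run_idle_loop(), launch_then_idle(), and the
longjmp()-based smp_idle() are hypothetical stand-ins for resetting the kernel
stack.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical single-core model, not the kernel's code. */
typedef void (*kmsg_fn_t)(void);

#define QLEN 8
static kmsg_fn_t routine_q[QLEN];
static unsigned q_head, q_tail;
static jmp_buf idle_env;
static int idle_entries;

static void send_routine(kmsg_fn_t fn)
{
	assert(q_tail - q_head < QLEN);
	routine_q[q_tail++ % QLEN] = fn;
}

static void process_routine_kmsg(void)
{
	while (q_head != q_tail) {
		/* Dequeue first: the function may never come back. */
		kmsg_fn_t fn = routine_q[q_head++ % QLEN];

		fn();
	}
}

/* Stand-in for smp_idle(): abandon the current stack and restart the
 * idle loop, which checks routine messages again. */
static void smp_idle(void)
{
	longjmp(idle_env, 1);
}

/* Idle loop; returning stands in for halting once nothing is queued. */
static void run_idle_loop(void)
{
	(void)setjmp(idle_env);
	idle_entries++;
	process_routine_kmsg();
}

static int launched, finished;

/* An ordinary routine kmsg: it returns. */
static void finish_work(void)
{
	finished++;
}

/* A non-returning routine kmsg, in the spirit of __launch_kthread: it
 * does its work and then ends in smp_idle() instead of returning. */
static void launch_then_idle(void)
{
	launched++;
	send_routine(finish_work);
	smp_idle();
}
```

Note that process_routine_kmsg() removes each message before calling it; a
function that never returns must not be left on the list to run a second time.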

Missing Routine Messages:

It's important that the kernel always checks for routine messages before leaving
the kernel, either to halt the core or to pop into userspace.  There is a race
involved with messages getting posted after we check the list, but before we
pop/halt.  To cover that window, the sender also sends an IPI.  This IPI will
force us back into the kernel at some point in the code before
process_routine_kmsg(), thus keeping us from missing the RKM.
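
The race and its fix can be modelled on a single core, assuming that an IPI
arriving before the core halts stays pending and cuts the next halt short (as a
pending interrupt does on real hardware).  All names here (post_rkm(),
halt_core(), idle_once(), ipi_pending) are hypothetical; the in_window hook
simulates a message being posted between the list check and the halt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-core model.  'ipi_pending' plays the role of a
 * latched interrupt: an IPI that arrives before the halt is not lost. */
static int queued_rkms;
static int processed_rkms;
static bool ipi_pending;

/* Sender side: post the RKM, then IPI the target core. */
static void post_rkm(void)
{
	queued_rkms++;
	ipi_pending = true;
}

static void process_routine_kmsg(void)
{
	while (queued_rkms > 0) {
		queued_rkms--;
		processed_rkms++;
	}
}

/* Returns true if the core really went to sleep.  With an IPI pending,
 * the halt is cut short and control returns to the kernel. */
static bool halt_core(void)
{
	if (ipi_pending) {
		ipi_pending = false;
		return false;
	}
	return true;
}

/* One pass of the idle path: check the list, then halt.  'in_window', if
 * non-NULL, runs between the check and the halt to model the race. */
static bool idle_once(void (*in_window)(void))
{
	process_routine_kmsg();
	if (in_window)
		in_window();
	return halt_core();
}
```

In this model the first pass does not sleep, because the IPI from the late
sender is still pending; the next pass through process_routine_kmsg() picks up
the RKM before the core actually halts.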

In the future, if we know the kernel code on a particular core is not attempting
to halt/pop, then we could avoid sending this IPI.  This is the essence of the
optimization in send_kernel_message() where we don't IPI ourselves.  A more
formal/thorough way to do this would be useful, both to avoid bugs and to
improve cross-core KMSG performance.
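
A sketch of the self-IPI optimization, with hypothetical names
(send_routine_kmsg(), send_ipi(), and this_core standing in for looking up the
current core ID) and counters in place of real lists and IPIs.

```c
#include <assert.h>

#define NUM_CORES 4

/* Hypothetical: counters stand in for per-core lists and real IPIs. */
static int this_core;		/* stands in for looking up our core ID */
static int routine_msgs[NUM_CORES];
static int ipis_sent[NUM_CORES];

static void send_ipi(int dst)
{
	ipis_sent[dst]++;
}

/* Post a routine kmsg to dst.  Skipping the self-IPI relies on the
 * sender being kernel code that will reach process_routine_kmsg()
 * before it halts or pops to userspace. */
static void send_routine_kmsg(int dst)
{
	assert(dst >= 0 && dst < NUM_CORES);
	routine_msgs[dst]++;
	if (dst != this_core)
		send_ipi(dst);
}
```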

IRQ Trickiness:

You cannot enable interrupts in the handle_kmsg_ipi() handler, either in the
code or in any immediate kmsg.  Since we send the EOI before running the handler
(on x86), another IPI could cause us to reenter the handler, which would spin on
the lock the previous context is holding (nested IRQ stacks).  Using irqsave
locks is not sufficient, since they assume IRQs are not turned on in the middle
of their operation (such as in the body of an immediate kmsg).
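
The reentry problem can be shown with a hypothetical single-core model: the
handler holds a per-core lock while running an immediate kmsg, and
would_deadlock records the point where a real spinlock would spin forever.  An
irqsave lock would not help, since it disables IRQs when it is acquired, but
nothing stops the kmsg body from re-enabling them while the lock is held.  Only
handle_kmsg_ipi() is a name from this document; its body here, deliver_ipi(),
good_immediate(), and bad_immediate() are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-core model, not the kernel's code. */
static bool irqs_on;		/* the CPU's interrupt-enable flag */
static bool kmsg_lock_held;	/* lock protecting the immediate list */
static bool would_deadlock;	/* set where a real spinlock would spin */

static void handle_kmsg_ipi(void (*immediate)(void));

/* The CPU delivers an IPI only while IRQs are on, disables them on entry,
 * and restores them when the handler returns. */
static void deliver_ipi(void (*immediate)(void))
{
	if (!irqs_on)
		return;		/* real hardware would keep it pending */
	irqs_on = false;
	handle_kmsg_ipi(immediate);
	irqs_on = true;
}

static void handle_kmsg_ipi(void (*immediate)(void))
{
	/* The EOI was already sent, so a new IPI can arrive as soon as
	 * anything turns IRQs back on. */
	if (kmsg_lock_held) {
		would_deadlock = true;	/* spinning here would never end */
		return;
	}
	kmsg_lock_held = true;
	if (immediate)
		immediate();
	kmsg_lock_held = false;
}

/* A well-behaved immediate kmsg: does its work with IRQs still off. */
static void good_immediate(void)
{
}

/* A broken immediate kmsg: enables IRQs mid-handler, and an IPI lands. */
static void bad_immediate(void)
{
	irqs_on = true;
	deliver_ipi(NULL);
	irqs_on = false;
}
```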

Other Notes:

Unproven hunch, but the main performance bottleneck with multiple senders and
receivers of k_msgs will be the slab allocator.  We use the slab so we can
dynamically create the k_msgs: we can pass them around easily, delay them easily
(alarms), and, most importantly, we can't deadlock by running out of room in a
static buffer.
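
To make the static-buffer concern concrete, here is a hypothetical comparison of
a fixed ring against per-message allocation, with malloc()/free() standing in for
the slab allocator.  The deadlock scenario in the comments is one illustrative
way a full static buffer could wedge two cores, assumed for this sketch; all
names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical comparison; malloc()/free() stand in for the slab. */
struct kmsg {
	struct kmsg *next;
	void (*fn)(long arg);
	long arg;
};

/* Option 1: a fixed ring.  When it is full the sender must fail or wait,
 * and waiting can deadlock, e.g. if the receiver is itself waiting for
 * room to send a message back. */
#define RING_SIZE 4
static struct kmsg ring[RING_SIZE];
static unsigned ring_head, ring_tail;

static int ring_send(void (*fn)(long), long arg)
{
	if (ring_tail - ring_head == RING_SIZE)
		return -1;	/* out of room */
	ring[ring_tail % RING_SIZE] = (struct kmsg){NULL, fn, arg};
	ring_tail++;
	return 0;
}

/* Option 2: allocate each message.  The only limit is memory, and a
 * message can be handed around or held for later (e.g. by an alarm). */
static struct kmsg *dyn_head, *dyn_tail;

static int dyn_send(void (*fn)(long), long arg)
{
	struct kmsg *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	*m = (struct kmsg){NULL, fn, arg};
	if (dyn_tail)
		dyn_tail->next = m;
	else
		dyn_head = m;
	dyn_tail = m;
	return 0;
}

/* Run and free every allocated message, oldest first. */
static void dyn_run_all(void)
{
	while (dyn_head) {
		struct kmsg *m = dyn_head;

		dyn_head = m->next;
		m->fn(m->arg);
		free(m);
	}
	dyn_tail = NULL;
}

static long total;

static void add(long arg)
{
	total += arg;
}
```

The price of option 2 is an allocation and a free per message, which is why the
allocator is the suspected bottleneck.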

Architecture Dependence:

Some details will differ based on architectural support.  For instance,
immediate messages can be implemented with true active messages.  Other systems
with maskable IPI vectors can use a different IPI for routine messages, and that
interrupt can get masked whenever we enter the kernel (note that this means
making every trap gate an interrupt gate), and we unmask that interrupt when we
want to process routine messages.
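
A sketch of the alternative design described above (a separate, maskable IPI
vector for routine messages), using booleans in place of a real interrupt
controller.  Every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: routine kmsgs get their own IPI vector, masked
 * while in the kernel. */
static bool routine_vec_masked;
static bool routine_vec_pending;
static int queued_rkms, processed_rkms;

static void process_routine_kmsg(void)
{
	while (queued_rkms > 0) {
		queued_rkms--;
		processed_rkms++;
	}
}

/* Handler for the routine vector: only ever runs while unmasked. */
static void routine_ipi_handler(void)
{
	process_routine_kmsg();
}

/* Sender posts an RKM and raises the routine vector on the target. */
static void send_routine_ipi(void)
{
	queued_rkms++;
	if (routine_vec_masked)
		routine_vec_pending = true;	/* held until unmasked */
	else
		routine_ipi_handler();
}

/* Every kernel entry (all gates being interrupt gates) masks the vector. */
static void kernel_entry(void)
{
	routine_vec_masked = true;
}

/* At a convenient point (e.g. before popping to userspace or halting),
 * unmask; a pending routine IPI is delivered right away. */
static void unmask_routine_vec(void)
{
	routine_vec_masked = false;
	if (routine_vec_pending) {
		routine_vec_pending = false;
		routine_ipi_handler();
	}
}
```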

However, since the main part of kmsgs is arch-independent, I've consolidated all
of it in one location until we need separate parts of the implementation.