kernel/sched: Fix rare SMP deadlock
It was possible with pathological timing (see below) for the scheduler
to pick a cycle of threads on each CPU and enter the context switch
path on all of them simultaneously.
Example:
* CPU0 is idle, CPU1 is running thread A
* CPU1 makes high priority thread B runnable
* CPU1 reaches a schedule point (or returns from an interrupt) and
decides to run thread B instead
* CPU0 simultaneously takes its IPI and returns, selecting thread A
Now both CPUs enter wait_for_switch() to spin, waiting for the context
switch code on the other thread to finish and mark the thread
runnable. So we have a deadlock, each CPU is spinning waiting for the
other!
Actually, in practice this seems not to happen on existing hardware
platforms, it's only exercisable in emulation. The reason is that the
hardware IPI time is much faster than the software paths required to
reach a schedule point or interrupt exit, so CPU1 always selects the
newly scheduled thread and no deadlock appears. I tried for a bit to
make this happen with a cycle of three threads, but it's complicated
to get right and I still couldn't get the timing to hit correctly. In
qemu, though, the IPI is implemented as a Unix signal sent to the
thread running the other CPU, which is far slower and opens the window
to see this happen.
The solution is simple enough: don't store the _current thread in the
run queue until we are on the tail end of the context switch path,
after wait_for_switch() and going to reach the end in guaranteed time.
Note that this requires changing a little logic to handle the yield
case: because we can no longer rely on _current's position in the run
queue to suppress it, we need to do the priority comparison directly
based on the existing "swap_ok" flag (which has always meant
"yielded", and maybe should be renamed).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
3 files changed