4ff457113e90a3e51de5b9f792f4ae3608e92382 - third_party/github/zephyrproject-rtos/zephyr

commit	4ff457113e90a3e51de5b9f792f4ae3608e92382	[log] [tgz]
author	Andy Ross <andrew.j.ross@intel.com>	Mon Feb 08 08:28:54 2021 -0800
committer	Anas Nashif <anas.nashif@intel.com>	Sun Feb 14 16:22:45 2021 -0500
tree	b3185faa115a237f52160dcb045c9e8da58766ea
parent	91946ef21c4a1653be862dbaeae90c55aeadd932 [diff]

kernel/sched: Fix rare SMP deadlock It was possible with pathological timing (see below) for the scheduler to pick a cycle of threads on each CPU and enter the context switch path on all of them simultaneously. Example: * CPU0 is idle, CPU1 is running thread A * CPU1 makes high priority thread B runnable * CPU1 reaches a schedule point (or returns from an interrupt) and decides to run thread B instead * CPU0 simultaneously takes its IPI and returns, selecting thread A Now both CPUs enter wait_for_switch() to spin, waiting for the context switch code on the other thread to finish and mark the thread runnable. So we have a deadlock, each CPU is spinning waiting for the other! Actually, in practice this seems not to happen on existing hardware platforms, it's only exercisable in emulation. The reason is that the hardware IPI time is much faster than the software paths required to reach a schedule point or interrupt exit, so CPU1 always selects the newly scheduled thread and no deadlock appears. I tried for a bit to make this happen with a cycle of three threads, but it's complicated to get right and I still couldn't get the timing to hit correctly. In qemu, though, the IPI is implemented as a Unix signal sent to the thread running the other CPU, which is far slower and opens the window to see this happen. The solution is simple enough: don't store the _current thread in the run queue until we are on the tail end of the context switch path, after wait_for_switch() and going to reach the end in guaranteed time. Note that this requires changing a little logic to handle the yield case: because we can no longer rely on _current's position in the run queue to suppress it, we need to do the priority comparison directly based on the existing "swap_ok" flag (which has always meant "yielded", and maybe should be renamed). Signed-off-by: Andy Ross <andrew.j.ross@intel.com>