kernel/sched: Address thread abort termination delay issue on SMP
It's possible for a thread to abort itself simultaneously with an
external abort from another thread. In fact in our test suite this is
a common thing, as ztest will abort its own spawend threads at the end
of a test, as they tend to be exiting on their own.
When that happens, the thread marks itself DEAD and does all its
scheduler bookeeping, but it is STILL RUNNING on its own stack until
it makes its way to its final swap. The external context would see
that "dead" metadata and return from k_thread_abort(), allowing the
next test to reuse and spawn the same thread struct while the old
context was still running. Obviously that's bad.
Unfortunately, this is impossible to address completely without
modifying every SMP architecture to add a API-visible hook to every
swap that signals completion. In practice the best we can do is add a
delay. But note the optimization: almost always, the scheduler IPI
catches the running thread and kills it from interrupt context
(i.e. on a different stack). When that happens, we know that the
interrupted thread will never be resumed (because it's dead) and can
elide the delay. We only pay the cost when we actually detect a race.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2 files changed