kernel/work: Fix race with delayed work item cancellation
The call to unschedule_locked() would return true ("successfully
unscheduled") even when the underlying z_abort_timeout() failed.
That failure happens when the callback has already been unpended and
is in progress, complete, or about to run (remember that timeout
callbacks are unsynchronized). Reporting success in that case led to
state bugs and races against the callback behavior.
Correctly detect that case and propagate the error to the caller.
Fixes #51872
Signed-off-by: Andy Ross <andyross@google.com>
diff --git a/kernel/work.c b/kernel/work.c
index 6ef001f..08c9d47 100644
--- a/kernel/work.c
+++ b/kernel/work.c
@@ -917,10 +917,13 @@
 	bool ret = false;
 	struct k_work *work = &dwork->work;
-	/* If scheduled, try to cancel. */
+	/* If scheduled, try to cancel. If it fails, that means the
+	 * callback has been dequeued and will inevitably run (or has
+	 * already run), so treat that as "undelayed" and return
+	 * false.
+	 */
 	if (flag_test_and_clear(&work->flags, K_WORK_DELAYED_BIT)) {
-		z_abort_timeout(&dwork->timeout);
-		ret = true;
+		ret = z_abort_timeout(&dwork->timeout) == 0;
 	}
 	return ret;
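
For illustration only, the behavioral difference can be sketched outside the kernel. This is a minimal host-side model, not the Zephyr implementation: every mock_* name and MOCK_DELAYED_BIT are hypothetical stand-ins, and mock_abort_timeout() only assumes the contract the patch relies on (0 when a still-pending timeout is removed, a negative errno otherwise).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
#define MOCK_DELAYED_BIT 0x1u

struct mock_timeout {
	bool pending; /* true while the timeout is still queued */
};

struct mock_dwork {
	unsigned int flags;
	struct mock_timeout timeout;
};

/* Assumed contract: 0 if the timeout was still pending and got removed,
 * -EINVAL if it had already been dequeued (callback ran or will run).
 */
static int mock_abort_timeout(struct mock_timeout *to)
{
	if (!to->pending) {
		return -EINVAL;
	}
	to->pending = false;
	return 0;
}

/* Returns whether the bit was set, and clears it either way. */
static bool mock_flag_test_and_clear(unsigned int *flags, unsigned int bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags &= ~bit;
	return was_set;
}

/* Pre-patch shape: the abort result is ignored, so success is
 * reported whenever the delayed flag was set.
 */
static bool mock_unschedule_old(struct mock_dwork *dw)
{
	bool ret = false;

	if (mock_flag_test_and_clear(&dw->flags, MOCK_DELAYED_BIT)) {
		(void)mock_abort_timeout(&dw->timeout);
		ret = true;
	}
	return ret;
}

/* Post-patch shape: success is reported only if the abort succeeded. */
static bool mock_unschedule_new(struct mock_dwork *dw)
{
	bool ret = false;

	if (mock_flag_test_and_clear(&dw->flags, MOCK_DELAYED_BIT)) {
		ret = mock_abort_timeout(&dw->timeout) == 0;
	}
	return ret;
}
```

In this model, the racy state is "delayed flag still set, timeout already dequeued": the old shape reports a successful unschedule there, while the new shape reports false so the caller knows the callback has run or will run.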