kernel/queue: Fix spurious NULL exit condition when using timeouts

The queue loop when CONFIG_POLL is in used has an inherent race
between the return of k_poll() and the inspection of the list where no
lock can be held.  Other contending readers of the same queue can
sneak in and steal the item out of the list before the current thread
gets to the sys_sflist_get() call, and the current loop will (if it
has a timeout) spuriously return NULL before the timeout expires.

It's not even a hard race to exercise.  Consider three threads at
different priorities: High (which can be an ISR too), Mid, and Low:

1. Mid and Low both enter k_queue_get() and sleep inside k_poll() on
   an empty queue.

2. High comes along and calls k_queue_insert().  The queue code then
   wakes up Mid, and reschedules, but because High is still running Mid
   doesn't get to run yet.

3. High inserts a SECOND item.  The queue then unpends the next thread
   in the list (Low), and readies it to run.  But as before, it won't
   be scheduled yet.

4. Now High sleeps (or if it's an interrupt, exits), and Mid gets to
   run.  It dequeues and returns the item it was delivered normally.

5. But Mid is still running!  So it re-enters the loop it's sitting in
   and calls k_queue_get() again, which sees and returns the second
   item in the queue synchronously.  Then it calls it a third time and
   goes to sleep because the queue is empty.

6. Finally, Low wakes up to find an empty queue, and returns NULL
   despite the fact that the timeout hadn't expired.

The fix is simple enough: check the timeout expiration inside the loop
so we don't return early.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
1 file changed