arch: POSIX: Fix race with unused threads

Fix a race which seems to have been presenting itself
very sporadically on loaded systems.
The race seems to have caused tests/kernel/sched/schedule_api
to fail at random on native_posix.

The case is a bit convoluted:
When the kernel calls z_new_thread(), the POSIX arch saves
the new thread entry call in that new Zephyr thread stack
together with a bit of extra info for the POSIX arch.
And spawns a new pthread (posix_thread_starter()) which
will eventually (after the Zephyr kernel swapped to it),
call that entry function.
(Note that in principle a thread spawned by pthreads may
be arbitrarily delayed)
The POSIX arch does not try to synchronize to that new
pthread (because why should it) until the first time the
Zephyr kernel tries to swap to that thread.
But, the kernel may never try to swap to it.
And therefore that thread's posix_thread_starter() may never
have got to run before the thread was aborted, and its
Zephyr stack reused for something else by the Zephyr app.

As posix_thread_starter() was relaying on looking into that
thread stack, it may now be looking into another thread stack
or anything else.

So, this commit fixes it by having posix_thread_starter()
get the input it always needs not from the Zephyr stack,
but from its own pthread_create() parameter pointing to a
structure kept by the POSIX arch.

Note that if the thread was aborted before reaching that point
posix_thread_starter() will NOT call the Zephyr thread entry
function, but just cleanup.

With this change all "asynchronous" parts of the POSIX arch
should relay only on the POSIX arch own structures.

Signed-off-by: Alberto Escolar Piedras <alpi@oticon.com>
1 file changed