kernel/swap: Add SMP "wait for switch" synchronization

On SMP, there is an inherent race when swapping: the old thread adds
itself back to the run queue before calling into the arch layer to do
the context switch.  The former is properly synchronized under the
scheduler lock, and the later operates with interrupts locally
disabled.  But until somewhere in the middle of arch_switch(), the old
thread (that is in the run queue!) does not have complete saved state
that can be restored.

So it's possible for another CPU to grab a thread before it is saved
and try to restore its unsaved register contents (which are garbage --
typically whatever state it had at the last interrupt).

Fix this by leveraging the "swapped_from" pointer already passed to
arch_switch() as a synchronization primitive.  When the switch
implementation writes the new handle value, we know the switch is
complete.  Then we can wait for that in z_swap() and at interrupt
exit.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
diff --git a/kernel/thread.c b/kernel/thread.c
index 527980d..b57dc31 100644
--- a/kernel/thread.c
+++ b/kernel/thread.c
@@ -525,6 +525,11 @@
 	arch_new_thread(new_thread, stack, stack_size, entry, p1, p2, p3,
 			  prio, options);
 
+#ifdef CONFIG_SMP
+	/* switch handle must be non-null except when inside z_swap() */
+	new_thread->switch_handle = new_thread;
+#endif
+
 #ifdef CONFIG_THREAD_USERSPACE_LOCAL_DATA
 #ifndef CONFIG_THREAD_USERSPACE_LOCAL_DATA_ARCH_DEFER_SETUP
 	/* don't set again if the arch's own code in arch_new_thread() has