kernel/sched: Use new barrier and spin APIs

The switch_handle field in the thread struct is used as an atomic flag
between CPUs in SMP, and has long been known to technically require
memory barriers for correct operation.  We have an API for that now,
so put them in:

* The code immediately before arch_switch() needs a write barrier to
  ensure that thread state written by the scheduler is seen to happen
  before the outgoing thread is flagged with a valid switch handle.

* The loop in z_sched_switch_spin() needs a read barrier at the end,
  to make sure the calling context doesn't load state from before the
  other CPU stored the switch handle.

Also, that same spot in z_sched_switch_spin() was spinning with
interrupts masked, which means it needs a call to arch_spin_relax() to
avoid an FPU state deadlock on some architectures.

Signed-off-by: Andy Ross <andyross@google.com>
diff --git a/kernel/include/kswap.h b/kernel/include/kswap.h
index 354ef06..fa1e538 100644
--- a/kernel/include/kswap.h
+++ b/kernel/include/kswap.h
@@ -8,6 +8,7 @@
 
 #include <ksched.h>
 #include <zephyr/spinlock.h>
+#include <zephyr/sys/barrier.h>
 #include <kernel_arch_func.h>
 
 #ifdef CONFIG_STACK_SENTINEL
@@ -48,10 +49,6 @@
  * treat this because the scheduler lock can't be released by the
  * switched-to thread, which is going to (obviously) be running its
  * own code and doesn't know it was switched out.
- *
- * Note: future SMP architectures may need a fence/barrier or cache
- * invalidation here.  Current ones don't, and sadly Zephyr doesn't
- * have a framework for that yet.
  */
 static inline void z_sched_switch_spin(struct k_thread *thread)
 {
@@ -59,8 +56,13 @@
 	volatile void **shp = (void *)&thread->switch_handle;
 
 	while (*shp == NULL) {
-		k_busy_wait(1);
+		arch_spin_relax();
 	}
+	/* Read barrier: don't allow any subsequent loads in the
+	 * calling code to reorder before we saw switch_handle go
+	 * non-null.
+	 */
+	barrier_dmem_fence_full();
 #endif
 }
 
@@ -154,8 +156,12 @@
 		void *newsh = new_thread->switch_handle;
 
 		if (IS_ENABLED(CONFIG_SMP)) {
-			/* Active threads MUST have a null here */
+			/* Active threads must have a null here.  And
+			 * it must be seen before the scheduler lock
+			 * is released!
+			 */
 			new_thread->switch_handle = NULL;
+			barrier_dmem_fence_full(); /* write barrier */
 		}
 		k_spin_release(&sched_spinlock);
 		arch_switch(newsh, &old_thread->switch_handle);