arch/x86: Fix stack alignment for user threads

The x86_64 SysV ABI requires 16 byte alignment for the stack pointer
during execution of normal code.  That means that on entry to an
ABI-compatible C function (which is reached via a CALL instruction
that pushes the return address) the RSP register must be MISaligned by
exactly 8 bytes.  The kernel mode thread setup got this right, but we
missed the equivalent condition in userspace entry.

The end result was a misaligned stack, which is surprisingly robust
for most use.  But recent toolchains have starting doing some more
elaborate vectorization, and the resulting SSE instructions started
failing in userspace on the misaliged loads.

Note that there's a comment about optimization: we're doing the stack
alignment in the "wrong place" and are needlessly wasting bytes in
some cases.  We should see the raw stack boundaries where we are
setting up RSP values.  Add a FIXME to this effect, but don't touch
anything as this patch is a targeted bugfix.

Also fix a somewhat embarassing 32-bit-ism that would have truncated
the address of a userspace stack that we tried to put above 4G.

Fixes #31018

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2 files changed