
 .. _architecture_porting_guide:

 Architecture Porting Guide
 ##########################

 An architecture port is needed to enable Zephyr to run on an :abbr:`ISA
 (instruction set architecture)` or an :abbr:`ABI (Application Binary
 Interface)` that is not currently supported.

 The following are examples of ISAs and ABIs that Zephyr supports:

 * x86_32 ISA with System V ABI
 * ARMv7-M ISA with Thumb2 instruction set and ARM Embedded ABI (aeabi)
 * ARCv2 ISA

 For information on Kconfig configuration, see
 :ref:`setting_configuration_values`. Architectures use a Kconfig configuration
 scheme similar to boards.

 An architecture port can be divided into several parts; most are required and
 some are optional:

 * **The early boot sequence**: each architecture has different steps it must
   take when the CPU comes out of reset (required).

 * **Interrupt and exception handling**: each architecture handles asynchronous
   and unrequested events in a specific manner (required).

 * **Thread context switching**: the Zephyr context switch is dependent on the
   ABI and each ISA has a different set of registers to save (required).

 * **Thread creation and termination**: a thread's initial stack frame is ABI-
   and architecture-dependent, and thread abortion may be as well (required).

 * **Device drivers**: most often, the system clock timer and the interrupt
   controller are tied to the architecture (some required, some optional).

 * **Utility libraries**: some common kernel APIs rely on an
   architecture-specific implementation for performance reasons (required).

 * **CPU idling/power management**: most architectures implement instructions
   for putting the CPU to sleep (partly optional, but highly desirable).

 * **Fault management**: for implementing architecture-specific debug help and
   handling of fatal errors in threads (partly optional).

 * **Linker scripts and toolchains**: architecture-specific details will most
   likely be needed in the build system and when linking the image (required).

 Early Boot Sequence
 *******************

 The goal of the early boot sequence is to take the system from the state it is
 in after reset to a state where it can run C code, and thus the common kernel
 initialization sequence. Most architectures need very few steps, while
 some require a bit more work.

 Common steps for all architectures:

 * Set up an initial stack.
 * If running an :abbr:`XIP (eXecute-In-Place)` kernel, copy initialized data
   from ROM to RAM.
 * If not using an ELF loader, zero the BSS section.
 * Jump to :code:`_Cstart()`, the early kernel initialization.

   * :code:`_Cstart()` is responsible for context switching out of the fake
     context running at startup into the main thread.
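
 As a rough illustration of the data-copy and BSS-zeroing steps, the following
 host-runnable sketch models the image regions with a hypothetical
 ``struct boot_image``. The struct and the function name are assumptions made
 for this example only; a real port would typically take the region bounds from
 linker-script symbols and may perform these steps in assembly.

 .. code-block:: C

     #include <stddef.h>
     #include <stdint.h>
     #include <string.h>

     /* Hypothetical description of the regions involved in early boot. */
     struct boot_image {
         const uint8_t *data_rom; /* load address of initialized data (ROM) */
         uint8_t *data_ram;       /* run address of initialized data (RAM) */
         size_t data_size;        /* size of the initialized data, in bytes */
         uint8_t *bss_start;      /* start of the BSS section in RAM */
         size_t bss_size;         /* size of the BSS section, in bytes */
     };

     /* Prepare RAM so that C code can run: copy .data, then clear .bss. */
     void boot_prepare_c_runtime(const struct boot_image *img)
     {
         memcpy(img->data_ram, img->data_rom, img->data_size);
         memset(img->bss_start, 0, img->bss_size);
     }

 Once both regions are prepared, initialized globals hold their expected values
 and uninitialized globals read as zero, which is what C code assumes.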

 Some examples of architecture-specific steps that have to be taken:

 * If given control in real mode on x86_32, switch to 32-bit protected mode.
 * Set up the segment registers on x86_32 to handle boot loaders that leave them
   in an unknown or broken state.
 * Initialize a board-specific watchdog on Cortex-M3/4.
 * Switch stacks from MSP to PSP on Cortex-M.
 * Use a different approach than calling into :code:`_Swap()` on Cortex-M to
   prevent race conditions.
 * Set up FIRQ and regular IRQ handling on ARCv2.

 Interrupt and Exception Handling
 ********************************

 Each architecture defines interrupt and exception handling differently.

 When a device wants to signal the processor that there is some work to be done
 on its behalf, it raises an interrupt. When a thread does an operation that is
 not handled by the serial flow of the software itself, it raises an exception.
 Both interrupts and exceptions pass control to a handler. The handler is
 known as an :abbr:`ISR (Interrupt Service Routine)` in the case of
 interrupts. The handler performs the work required by the exception or the
 interrupt.  For interrupts, that work is device-specific. For exceptions, it
 depends on the exception, but most often the core kernel itself is responsible
 for providing the handler.

 The kernel has to perform some work in addition to the work the handler itself
 performs. For example:

 * Prior to handing control to the handler:

   * Save the currently executing context.
   * Possibly get out of power saving mode, which includes waking up
     devices.
   * Update the kernel uptime if getting out of tickless idle mode.

 * After getting control back from the handler:

   * Decide whether to perform a context switch.
   * When performing a context switch, restore the context of the thread being
     switched in.

 This work is conceptually the same across architectures, but the details are
 completely different:

 * The registers to save and restore.
 * The processor instructions to perform the work.
 * The numbering of the exceptions.
 * etc.

 It thus needs an architecture-specific implementation, called the
 interrupt/exception stub.

 Another issue is that the kernel defines the signature of ISRs as:

 .. code-block:: C

     void (*isr)(void *parameter)

 Architectures do not have a consistent or native way of handling parameters to
 an ISR. As such there are two commonly used methods for handling the
 parameter.

 * Using some architecture-defined mechanism, the parameter value is forced in
   the stub. This is commonly found in x86-based architectures.

 * The parameters to the ISR are inserted and tracked via a separate table
   requiring the architecture to discover at runtime which interrupt is
   executing. A common interrupt handler demuxer is installed for all entries of
   the real interrupt vector table, which then fetches the device's ISR and
   parameter from the separate table. This approach is commonly used in the ARC
   and ARM architectures via the :kconfig:option:`CONFIG_GEN_ISR_TABLES` implementation.
   You can find examples of the stubs by looking at :code:`_interrupt_enter()` in
   x86, :code:`_IntExit()` in ARM, :code:`_isr_wrapper()` in ARM, or the full
   implementation description for ARC in :zephyr_file:`arch/arc/core/isr_wrapper.S`.
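
 The table-based approach can be sketched as follows. This is a simplified,
 host-runnable model and not the :kconfig:option:`CONFIG_GEN_ISR_TABLES`
 implementation: ``NUM_IRQS``, ``sw_isr_table``, ``isr_table_connect()``,
 ``isr_demux()`` and ``counting_isr()`` are hypothetical names, and the active
 interrupt number is passed in as an argument instead of being read from
 hardware.

 .. code-block:: C

     #include <stddef.h>

     #define NUM_IRQS 4 /* hypothetical number of interrupt lines */

     /* One entry per interrupt line: the ISR and the parameter to pass it. */
     struct isr_table_entry {
         void (*isr)(void *parameter);
         void *parameter;
     };

     struct isr_table_entry sw_isr_table[NUM_IRQS];

     /* Record the ISR and its parameter for the given interrupt line. */
     void isr_table_connect(unsigned int irq, void (*isr)(void *),
                            void *parameter)
     {
         if (irq < NUM_IRQS) {
             sw_isr_table[irq].isr = isr;
             sw_isr_table[irq].parameter = parameter;
         }
     }

     /*
      * Common handler installed in every hardware vector. It looks up the
      * ISR and parameter for the active line and calls the ISR.
      */
     void isr_demux(unsigned int irq)
     {
         if (irq >= NUM_IRQS || sw_isr_table[irq].isr == NULL) {
             return; /* out-of-range or unconnected line: nothing to call */
         }
         sw_isr_table[irq].isr(sw_isr_table[irq].parameter);
     }

     /* Example ISR: increments the counter its parameter points to. */
     void counting_isr(void *parameter)
     {
         (*(int *)parameter)++;
     }

 For instance, after ``isr_table_connect(2, counting_isr, &count)``, each call
 to ``isr_demux(2)`` increments ``count``, while unconnected lines are ignored.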

 Each architecture also has to implement primitives for interrupt control:

 * locking interrupts: :c:macro:`irq_lock()`, :c:macro:`irq_unlock()`.
 * registering interrupts: :c:macro:`IRQ_CONNECT()`.
 * programming the priority, if possible: :c:func:`irq_priority_set`.
 * enabling/disabling interrupts: :c:macro:`irq_enable()`, :c:macro:`irq_disable()`.

 .. note::

   :c:macro:`IRQ_CONNECT` is a macro that uses assembler and/or linker script
   tricks to connect interrupts at build time, saving boot time and text size.

 The vector table should contain a handler for each interrupt and exception that
 can possibly occur. The handler can be as simple as a spinning loop. However,
 we strongly suggest that handlers at least print some debug information. The
 information helps figure out what went wrong when hitting an exception that
 is a fault, like divide-by-zero or invalid memory access, or an interrupt that
 is not expected (:dfn:`spurious interrupt`). See the ARM implementation in
 :zephyr_file:`arch/arm/core/aarch32/cortex_m/fault.c` for an example.

 Thread Context Switching
 ************************

 Multi-threading is the basic reason for having a kernel at all. Zephyr supports
 two types of threads: preemptible and cooperative.

 Two crucial concepts when writing an architecture port are the following:

 * Cooperative threads run at a higher priority than preemptible ones, and
   always preempt them.

 * After handling an interrupt, if a cooperative thread was interrupted, the
   kernel always goes back to running that thread, since it is not preemptible.

 A context switch can happen in several circumstances:

 * When a thread executes a blocking operation, such as taking a semaphore that
   is currently unavailable.

 * When a preemptible thread unblocks a thread of higher priority by releasing
   the object on which it was blocked.

 * When an interrupt unblocks a thread of higher priority than the one currently
   executing, if the currently executing thread is preemptible.

 * When a thread runs to completion.

 * When a thread causes a fatal exception and is removed from the running
   threads, for example by referencing invalid memory.

 The context switching code must thus be able to handle all these cases.

 The kernel keeps the next thread to run in a "cache", and thus the context
 switching code only has to fetch from that cache to select which thread to run.

 There are two types of context switches: :dfn:`cooperative` and :dfn:`preemptive`.

 * A *cooperative* context switch happens when a thread willingly gives up
   control to another thread. There are two cases where this happens:

   * When a thread explicitly yields.
   * When a thread tries to take an object that is currently unavailable and is
     willing to wait until the object becomes available.

 * A *preemptive* context switch happens when an ISR or a thread performs an
   operation that schedules a thread of higher priority than the one currently
   running, provided the currently running thread is preemptible.
   An example of such an operation is releasing an object on which the thread
   of higher priority was waiting.

 .. note::

   Control is never taken from a cooperative thread while it is the
   running thread.

 A cooperative context switch is always done by having a thread call the
 :code:`_Swap()` kernel internal symbol. When :code:`_Swap` is called, the
 kernel logic knows that a context switch has to happen: :code:`_Swap` does not
 check to see if a context switch must happen. Rather, :code:`_Swap` decides
 what thread to context switch in. :code:`_Swap` is called by the kernel logic
 when an object being operated on is unavailable, and by some thread
 yielding/sleeping primitives.

 .. note::

   On x86 and Nios2, :code:`_Swap` is generic enough and the architecture
   flexible enough that :code:`_Swap` can be called when exiting an interrupt
   to provoke the context switch. This should not be taken as a rule, since
   neither the ARM Cortex-M nor the ARCv2 port does this.

 Since :code:`_Swap` is cooperative, the caller-saved registers from the ABI are
 already on the stack. There is no need to save them in the k_thread structure.

 A context switch can also be performed preemptively. This happens upon exiting
 an ISR, in the kernel interrupt exit stub:

 * :code:`_interrupt_enter` on x86 after the handler is called.
 * :code:`_IntExit` on ARM.
 * :code:`_firq_exit` and :code:`_rirq_exit` on ARCv2.

 In this case, the context switch must only be invoked when the interrupted
 thread was preemptible, not when it was a cooperative one, and only when the
 current interrupt is not nested.

 The kernel also has the concept of "locking the scheduler". This is a concept
 similar to locking the interrupts, but lighter-weight since interrupts can
 still occur. If a thread has locked the scheduler, it is temporarily
 non-preemptible.

 So, the decision logic to invoke the context switch when exiting an interrupt
 is simple:

 * If the interrupted thread is not preemptible, do not invoke it.
 * Else, fetch the cached thread from the ready queue, and:

   * If the cached thread is not the current thread, invoke the context switch.
   * Else, do not invoke it.
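
 The decision logic above, together with the earlier requirement that the
 current interrupt is not nested, can be modelled in a few lines. This is only
 a sketch: ``struct thread`` and ``irq_exit_should_switch()`` are hypothetical
 names, and a real stub usually implements the same checks in assembly.

 .. code-block:: C

     #include <stdbool.h>

     /* Hypothetical, minimal thread descriptor used for illustration. */
     struct thread {
         /* false for cooperative threads and while the scheduler is locked */
         bool preemptible;
     };

     /*
      * Return true if the interrupt exit stub should context switch.
      *
      * current: the thread that was interrupted
      * cached:  the next thread to run, as cached by the scheduler
      * nested:  true if this interrupt preempted another interrupt
      */
     bool irq_exit_should_switch(const struct thread *current,
                                 const struct thread *cached, bool nested)
     {
         if (nested) {
             return false; /* return to the outer ISR first */
         }
         if (!current->preemptible) {
             return false; /* never take control from such a thread */
         }
         return cached != current;
     }

 Only the combination of a non-nested interrupt, a preemptible interrupted
 thread, and a different cached thread results in a context switch.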

 This is simple, but crucial: if this is not implemented correctly, the kernel
 will not function as intended and will experience bizarre crashes, mostly due
 to stack corruption.

 .. note::

   If running a coop-only system, i.e. if :kconfig:option:`CONFIG_NUM_PREEMPT_PRIORITIES`
   is 0, no preemptive context switch ever happens. The interrupt code can be
   optimized to not take any scheduling decision when this is the case.

 Thread Creation and Termination
 *******************************

 To start a new thread, a stack frame must be constructed so that the context
 switch can pop it the same way it would pop one from a thread that had been
 context switched out. This is to be implemented in an architecture-specific
 :code:`_new_thread` internal routine.

 The thread entry point is also not to be called directly, i.e. it should not be
 set as the :abbr:`PC (program counter)` for the new thread. Rather it must be
 wrapped in :code:`_thread_entry`. This means that the PC in the stack
 frame shall be set to :code:`_thread_entry`, and the thread entry point shall
 be passed as the first parameter to :code:`_thread_entry`. The specifics of
 this depend on the ABI.
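
 To make this concrete, the sketch below builds an initial frame at the top of
 a stack so that the saved PC points at a wrapper and the entry point is the
 wrapper's first parameter. The frame layout, ``thread_entry_wrapper()``,
 ``example_entry()`` and ``new_thread_frame()`` are all assumptions made for
 this host-runnable example; a real layout mirrors what the port's context
 switch code pops, and a real wrapper does not return to its caller.

 .. code-block:: C

     #include <stddef.h>
     #include <stdint.h>

     typedef void (*thread_entry_t)(void *p1, void *p2, void *p3);

     /* Hypothetical initial frame; real layouts are dictated by the ABI. */
     struct init_stack_frame {
         uintptr_t pc;   /* address the first context switch resumes at */
         uintptr_t arg0; /* first parameter: the thread entry point */
         uintptr_t arg1; /* remaining parameters: p1, p2 and p3 */
         uintptr_t arg2;
         uintptr_t arg3;
     };

     /* Stand-in for the common wrapper that calls the entry point. */
     void thread_entry_wrapper(thread_entry_t entry, void *p1, void *p2,
                               void *p3)
     {
         entry(p1, p2, p3);
     }

     /* Example thread entry point that does nothing. */
     void example_entry(void *p1, void *p2, void *p3)
     {
         (void)p1;
         (void)p2;
         (void)p3;
     }

     /* Build the frame at the top of the stack; return the initial SP. */
     struct init_stack_frame *new_thread_frame(char *stack_top,
                                               thread_entry_t entry,
                                               void *p1, void *p2, void *p3)
     {
         struct init_stack_frame *frame =
             (struct init_stack_frame *)(void *)(stack_top - sizeof(*frame));

         frame->pc = (uintptr_t)thread_entry_wrapper;
         frame->arg0 = (uintptr_t)entry;
         frame->arg1 = (uintptr_t)p1;
         frame->arg2 = (uintptr_t)p2;
         frame->arg3 = (uintptr_t)p3;
         return frame;
     }

 The sketch assumes that ``stack_top`` is suitably aligned for the frame and
 that the stack grows downward.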

 The need for an architecture-specific thread termination implementation depends
 on the architecture. There is a generic implementation, but it might not work
 for a given architecture.

 One reason that has been encountered for having an architecture-specific
 implementation of thread termination is that aborting a thread might be
 different if aborting because of a graceful exit or because of an exception.
 This is the case for ARM Cortex-M, where the CPU has to be taken out of handler
 mode if the thread triggered a fatal exception, but not if the thread
 gracefully exits its entry point function.

 This means implementing an architecture-specific version of
 :c:func:`k_thread_abort`, and setting the Kconfig option
 :kconfig:option:`CONFIG_ARCH_HAS_THREAD_ABORT` as needed for the architecture (e.g. see
 :zephyr_file:`arch/arm/core/aarch32/cortex_m/Kconfig`).

 Thread Local Storage
 ********************

 To enable thread local storage on a new architecture:

 #. Implement :c:func:`arch_tls_stack_setup` to set up the TLS storage area in
    the stack. Refer to the toolchain documentation on how the storage area needs
    to be structured. Some helper functions can be used:

    * Function :c:func:`z_tls_data_size` returns the size
      needed for thread local variables (excluding any extra data required by
      toolchain and architecture).
    * Function :c:func:`z_tls_copy` prepares the TLS storage area for
      thread local variables. This only copies the variables themselves and
      does not set up architecture- and/or toolchain-specific data.

 #. In the context switching, grab the ``tls`` field inside the new thread's
    ``struct k_thread`` and put it into an appropriate register (or some
    other variable) for access to the TLS storage area. Refer to toolchain
    and architecture documentation on which registers to use.
 #. In Kconfig, add ``select ARCH_HAS_THREAD_LOCAL_STORAGE`` to the
    Kconfig entry for the new architecture.
 #. Run the ``tests/kernel/threads/tls`` test to make sure the new code works.

 Device Drivers
 **************

 The kernel requires very few hardware devices to function. In theory, the only
 required device is the interrupt controller, since the kernel can run without a
 system clock. In practice, to get access to most, if not all, of the sanity
 check test suite, a system clock is needed as well. Since these two are usually
 tied to the architecture, they are part of the architecture port.

 Interrupt Controllers
 =====================

 There can be significant differences between the interrupt controllers and the
 interrupt concepts across architectures.

 For example, x86 has the concept of an :abbr:`IDT (Interrupt Descriptor Table)`
 and different interrupt controllers. The position of an interrupt in the IDT
 determines its priority.

 On the other hand, the ARM Cortex-M has the :abbr:`NVIC (Nested Vectored
 Interrupt Controller)` as part of the architecture definition. There is no need
 for an IDT-like table that is separate from the NVIC vector table. The position
 in the table has nothing to do with priority of an IRQ: priorities are
 programmable per-entry.

 The ARCv2 has its interrupt unit as part of the architecture definition, which
 is somewhat similar to the NVIC. However, where ARC defines interrupts as
 having a one-to-one mapping between exception and interrupt numbers (i.e.
 exception 1 is IRQ1, and device IRQs start at 16), ARM has IRQ0 being
 equivalent to exception 16 (and weirdly enough, exception 1 can be seen as
 IRQ-15).

 All these differences mean that very little, if anything, can be shared between
 architectures with regards to interrupt controllers.

 System Clock
 ============

 x86 has APIC timers and the HPET as part of its architecture definition. ARM
 Cortex-M has the SYSTICK exception. Finally, ARCv2 has the timer0/1 device.

 Kernel timeouts are handled in the context of the system clock timer driver's
 interrupt handler.


 Console Over Serial Line
 ========================

 There is one other device that is almost a requirement for an architecture
 port, since it is so useful for debugging. It is a simple polling, output-only,
 serial port driver on which to send the console (:code:`printk`,
 :code:`printf`) output.

 It is not required, and a RAM console (:kconfig:option:`CONFIG_RAM_CONSOLE`)
 can be used to send all output to a circular buffer that can be read
 by a debugger instead.
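
 A polling, output-only driver can be very small. The sketch below assumes a
 hypothetical memory-mapped UART with a status register and a transmit data
 register; ``struct uart_regs``, ``UART_STATUS_TX_READY`` and both functions
 are invented for this example and do not correspond to any particular device
 or to the Zephyr driver API.

 .. code-block:: C

     #include <stdint.h>

     /* Hypothetical register layout of a simple UART. */
     struct uart_regs {
         volatile uint32_t status;  /* bit 0 set: transmitter can accept data */
         volatile uint32_t tx_data; /* writing here sends one character */
     };

     #define UART_STATUS_TX_READY (1u << 0)

     /* Busy-wait until the transmitter is ready, then send one character. */
     void console_poll_out(struct uart_regs *uart, unsigned char c)
     {
         while ((uart->status & UART_STATUS_TX_READY) == 0) {
             /* spin: no interrupts are needed for a polling driver */
         }
         uart->tx_data = c;
     }

     /* Send a NUL-terminated string one character at a time. */
     void console_write(struct uart_regs *uart, const char *s)
     {
         while (*s != '\0') {
             console_poll_out(uart, (unsigned char)*s);
             s++;
         }
     }

 Because the driver never relies on interrupts, it can produce output very
 early in the boot sequence and from fault handlers.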

 Utility Libraries
 *****************

 The kernel depends on a few functions that can be implemented with very few
 instructions or in a lock-less manner in modern processors. Those are thus
 expected to be implemented as part of an architecture port.

 * Atomic operators.

   * If instructions do exist for a given architecture, the implementation is
     configured using the :kconfig:option:`CONFIG_ATOMIC_OPERATIONS_ARCH` Kconfig
     option.

   * If instructions do not exist for a given architecture,
     a generic version that wraps :c:func:`irq_lock` and :c:func:`irq_unlock`
     around non-atomic operations exists. It is configured using the
     :kconfig:option:`CONFIG_ATOMIC_OPERATIONS_C` Kconfig option.

 * Find-least-significant-bit-set and find-most-significant-bit-set.

   * If instructions do not exist for a given architecture, it is always
     possible to implement these functions as generic C functions.

 It is possible to use compiler built-ins to implement these, but make sure
 they use the required compiler barriers.
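
 As an example of the generic C fallback for the bit-search functions, the
 sketch below uses simple shift loops. The function names are hypothetical, and
 the 1-based result with 0 meaning "no bit set" is an assumption of this
 example; check the kernel API documentation for the exact contract a port must
 honor.

 .. code-block:: C

     #include <stdint.h>

     /*
      * Return the 1-based position of the least significant set bit,
      * or 0 if no bit is set.
      */
     unsigned int generic_find_lsb_set(uint32_t op)
     {
         unsigned int pos = 1;

         if (op == 0) {
             return 0;
         }
         while ((op & 1u) == 0) {
             op >>= 1;
             pos++;
         }
         return pos;
     }

     /*
      * Return the 1-based position of the most significant set bit,
      * or 0 if no bit is set.
      */
     unsigned int generic_find_msb_set(uint32_t op)
     {
         unsigned int pos = 0;

         while (op != 0) {
             op >>= 1;
             pos++;
         }
         return pos;
     }

 An architecture with a count-leading-zeros or bit-scan instruction can replace
 each loop with a single instruction, which is why these functions belong to
 the architecture port.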

 CPU Idling/Power Management
 ***************************

 The kernel provides support for CPU power management with two functions:
 :c:func:`arch_cpu_idle` and :c:func:`arch_cpu_atomic_idle`.

 :c:func:`arch_cpu_idle` can be as simple as calling the power saving
 instruction for the architecture with interrupts unlocked, for example
 :code:`hlt` on x86, :code:`wfi` or :code:`wfe` on ARM, :code:`sleep` on ARC.
 This function can be called in a loop within a context that does not care if it
 gets interrupted before going to sleep. There are
 basically two scenarios when it is correct to use this function:

 * In a single-threaded system, in the only thread when the thread is not used
   for doing real work after initialization, i.e. it is sitting in a loop doing
   nothing for the duration of the application.

 * In the idle thread.

 :c:func:`arch_cpu_atomic_idle`, on the other hand, must be able to atomically
 re-enable interrupts and invoke the power saving instruction. It can thus be
 used in real application code, again in single-threaded systems.

 Normally, idling the CPU should be left to the idle thread, but in some very
 special scenarios, these APIs can be used by applications.

 Both functions must exist for a given architecture. However, the implementation
 can be simply the following steps, if desired:

 #. unlock interrupts
 #. NOP

 However, a real implementation is strongly recommended.

 Fault Management
 ****************

 In the event of an unhandled CPU exception, the architecture
 code must call into :c:func:`z_fatal_error`.  This function dumps
 out architecture-agnostic information and makes a policy
 decision on what to do next by invoking :c:func:`k_sys_fatal_error_handler`.
 This function can be overridden to implement application-specific
 policies that could include locking interrupts and spinning forever
 (the default implementation) or even powering off the
 system (if supported).

 Toolchain and Linking
 *********************

 Toolchain support has to be added to the build system.

 Some architecture-specific definitions are needed in :zephyr_file:`include/zephyr/toolchain/gcc.h`.
 See what exists in that file for currently supported architectures.

 Each architecture also needs its own linker script, even if most sections can
 be derived from the linker scripts of other architectures. Some sections might
 be specific to the new architecture, for example the SCB section on ARM and the
 IDT section on x86.

 Memory Management
 *****************

 If the target platform enables paging and requires drivers to memory-map
 their I/O regions, :kconfig:option:`CONFIG_MMU` needs to be enabled and the
 following API implemented:

 - :c:func:`arch_mem_map`
 - :c:func:`arch_mem_unmap`
 - :c:func:`arch_page_phys_get`

 Stack Objects
 *************

 The presence of memory protection hardware affects how stack objects are
 created. All architecture ports must specify the required alignment of the
 stack pointer, which is some combination of CPU and ABI requirements. This
 is defined in architecture headers with :c:macro:`ARCH_STACK_PTR_ALIGN` and
 is typically something small like 4, 8, or 16 bytes.

 Two types of thread stacks exist:

 - "kernel" stacks defined with :c:macro:`K_KERNEL_STACK_DEFINE()` and related
   APIs, which can host kernel threads running in supervisor mode or
   used as the stack for interrupt/exception handling. These have significantly
   relaxed alignment requirements and use less reserved data. No memory is
   reserved for privilege elevation stacks.

 - "thread" stacks which typically use more memory, but are capable of hosting
   threads running in user mode, as well as any use-cases for kernel stacks.

 If :kconfig:option:`CONFIG_USERSPACE` is not enabled, "thread" and "kernel" stacks are
 equivalent.

 Additional macros may be defined in the architecture layer to specify
 the alignment of the base of stack objects, any reserved data inside the
 stack object not used for the thread's stack buffer, and how to round up
 stack sizes to support user mode threads. In the absence of definitions
 some defaults are assumed:

 - :c:macro:`ARCH_KERNEL_STACK_RESERVED`: default no reserved space
 - :c:macro:`ARCH_THREAD_STACK_RESERVED`: default no reserved space
 - :c:macro:`ARCH_KERNEL_STACK_OBJ_ALIGN`: default align to
   :c:macro:`ARCH_STACK_PTR_ALIGN`
 - :c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN`: default align to
   :c:macro:`ARCH_STACK_PTR_ALIGN`
 - :c:macro:`ARCH_THREAD_STACK_SIZE_ADJUST`: default round up to
   :c:macro:`ARCH_STACK_PTR_ALIGN`

 All stack creation macros are defined in terms of these.
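
 The effect of the defaults can be illustrated with a small host-runnable
 model. ``STACK_PTR_ALIGN`` and ``STACK_RESERVED`` are hypothetical stand-ins
 for the architecture macros, with values chosen only for this example, and
 ``stack_object_size()`` is a simplified approximation of how the stack
 creation macros combine them, not their actual definition.

 .. code-block:: C

     #include <stddef.h>

     #define STACK_PTR_ALIGN 8u /* stand-in for the stack pointer alignment */
     #define STACK_RESERVED  0u /* default: no reserved space */

     /* Round size up to the next multiple of align (a power of two). */
     size_t stack_round_up(size_t size, size_t align)
     {
         return (size + align - 1) & ~(align - 1);
     }

     /* Bytes occupied by a stack object for a requested buffer size. */
     size_t stack_object_size(size_t requested)
     {
         return stack_round_up(STACK_RESERVED + requested, STACK_PTR_ALIGN);
     }

 With these values, a request for 100 bytes yields a 104-byte stack object,
 while a request for 128 bytes is left unchanged.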

 Stack objects all have the following layout, with some regions potentially
 zero-sized depending on configuration. There are always two main parts:
 reserved memory at the beginning, and then the stack buffer itself. The
 bounds of some areas can only be determined at runtime in the context of
 its associated thread object. Other areas are entirely computable at build
 time.

 Some architectures may need to carve-out reserved memory at runtime from the
 stack buffer, instead of unconditionally reserving it at build time, or to
 supplement an existing reserved area (as is the case with the ARM FPU).
 Such carve-outs will always be tracked in ``thread.stack_info.start``.
 The region specified by ``thread.stack_info.start`` and
 ``thread.stack_info.size`` is always fully accessible by a user mode thread.
 ``thread.stack_info.delta`` denotes an offset which can be used to compute
 the initial stack pointer from the very end of the stack object, taking into
 account storage for TLS and ASLR random offsets.

 ::

 	+---------------------+ <- thread.stack_obj
 	| Reserved Memory     | } K_(THREAD|KERNEL)_STACK_RESERVED
 	+---------------------+
 	| Carved-out memory   |
 	|.....................| <- thread.stack_info.start
 	| Unused stack buffer |
 	|                     |
 	|.....................| <- thread's current stack pointer
 	| Used stack buffer   |
 	|                     |
 	|.....................| <- Initial stack pointer. Computable
 	| ASLR Random offset  |      with thread.stack_info.delta
 	+---------------------| <- thread.userspace_local_data
 	| Thread-local data   |
 	+---------------------+ <- thread.stack_info.start +
 	                             thread.stack_info.size


 At present, Zephyr does not support stacks that grow upward.
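
 The relationship between the fields in the diagram can be expressed as simple
 address arithmetic. In the sketch below, ``struct stack_info`` is a
 hypothetical mirror of the fields described above (not the kernel's own
 definition), and the two helper functions are invented for illustration.

 .. code-block:: C

     #include <stddef.h>
     #include <stdint.h>

     /* Hypothetical mirror of the stack_info fields described above. */
     struct stack_info {
         uintptr_t start; /* lowest address of the stack buffer */
         size_t size;     /* size of the stack buffer, in bytes */
         size_t delta;    /* bytes at the top used for TLS and random offset */
     };

     /* Stacks grow downward: the initial SP sits delta bytes below the end. */
     uintptr_t initial_stack_pointer(const struct stack_info *info)
     {
         return info->start + info->size - info->delta;
     }

     /* Bytes still unused between the buffer start and the current SP. */
     size_t stack_space_unused(const struct stack_info *info, uintptr_t sp)
     {
         return (size_t)(sp - info->start);
     }

 For a buffer starting at ``0x1000`` with a size of ``0x400`` and a delta of
 ``0x20``, the initial stack pointer is ``0x13e0``.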

 No Memory Protection
 ====================

 If no memory protection is in use, then the defaults are sufficient.

 HW-based stack overflow detection
 =================================

 This option uses hardware features to generate a fatal error if a thread
 in supervisor mode overflows its stack. This is useful for debugging, although
 for a couple of reasons, you can't reliably make any assertions about the state
 of the system after this happens:

 * The kernel could have been inside a critical section when the overflow
   occurs, leaving important global data structures in a corrupted state.

 * For systems that implement stack protection using a guard memory region,
   it's possible to overshoot the guard and corrupt adjacent data structures
   before the hardware detects this situation.

 To enable the :kconfig:option:`CONFIG_HW_STACK_PROTECTION` feature, the system must
 provide some kind of hardware-based stack overflow protection, and enable the
 :kconfig:option:`CONFIG_ARCH_HAS_STACK_PROTECTION` option.

 Two forms of HW-based stack overflow detection are supported: dedicated
 CPU features for this purpose, or special read-only guard regions immediately
 preceding stack buffers.

 :kconfig:option:`CONFIG_HW_STACK_PROTECTION` only catches stack overflows for
 supervisor threads. This is not required to catch stack overflow from user
 threads; :kconfig:option:`CONFIG_USERSPACE` is orthogonal.

 This feature only detects supervisor mode stack overflows, including stack
 overflows when handling system calls. It doesn't guarantee that the kernel has
 not been corrupted. Any stack overflow in supervisor mode should be treated as
 a fatal error, with no assertions about the integrity of the overall system
 possible.

 Stack overflows in user mode are recoverable (from the kernel's perspective)
 and require no special configuration; :kconfig:option:`CONFIG_HW_STACK_PROTECTION`
 only applies to catching overflows when the CPU is in supervisor mode.

 CPU-based stack overflow detection
 ----------------------------------

 If we are detecting stack overflows in supervisor mode via special CPU
 registers (like ARM's SPLIM), then the defaults are sufficient.


 Guard-based stack overflow detection
 ------------------------------------

 We are detecting supervisor mode stack overflows via a special memory protection
 region located immediately before the stack buffer that generates an exception
 on write. Reserved memory will be used for the guard region.

 :c:macro:`ARCH_KERNEL_STACK_RESERVED` should be defined to the minimum size
 of a memory protection region. On most ARM CPUs this is 32 bytes.
 :c:macro:`ARCH_KERNEL_STACK_OBJ_ALIGN` should also be set to the required
 alignment for this region.

 MMU-based systems should not reserve RAM for the guard region and instead
 simply leave a non-present virtual page below every stack when it is mapped
 into the address space. The stack object will still need to be properly aligned
 and sized to page granularity.

 ::

    +-----------------------------+ <- thread.stack_obj
    | Guard reserved memory       | } K_KERNEL_STACK_RESERVED
    +-----------------------------+
    | Guard carve-out             |
    |.............................| <- thread.stack_info.start
    | Stack buffer                |
    .                             .

 Guard carve-outs for kernel stacks are uncommon and should be avoided if
 possible. They tend to be needed for two situations:

 * The same stack may be re-purposed to host a user thread, in which case
   the guard is unnecessary and shouldn't be unconditionally reserved.
   This is the case when privilege elevation stacks are not inside the stack
   object.

 * The required guard size is variable and depends on context. For example, some
   ARM CPUs have lazy floating point stacking during exceptions and may
   decrement the stack pointer by a large value without writing anything,
   completely overshooting a minimally-sized guard and corrupting adjacent
   memory. Rather than unconditionally reserving a larger guard, the extra
   memory is carved out if the thread uses floating point.

 User mode enabled
 =================

 Enabling user mode activates two new requirements:

 * A separate fixed-sized privilege mode stack, specified by
   :kconfig:option:`CONFIG_PRIVILEGED_STACK_SIZE`, must be allocated that the user
   thread cannot access. It is used as the stack by the kernel when handling
   system calls. If stack guards are implemented, a stack guard region must
   be able to be placed before it, with support for carve-outs if necessary.

 * The memory protection hardware must be able to program a region that exactly
   covers the thread's stack buffer, tracked in ``thread.stack_info``. This
   implies that :c:macro:`ARCH_THREAD_STACK_SIZE_ADJUST()` will need to round
   up the requested stack size so that a region may cover it, and that
   :c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN()` is also specified per the
   granularity of the memory protection hardware.

 This becomes more complicated if the memory protection hardware requires that
 all memory regions be sized to a power of two, and aligned to their own size.
 This is common on older MPUs and is indicated by
 :kconfig:option:`CONFIG_MPU_REQUIRES_POWER_OF_TWO_ALIGNMENT`.

 ``thread.stack_info`` always tracks the user-accessible part of the stack
 object; it must always be correct to program a memory protection region with
 user access using the range stored within.

 Non power-of-two memory region requirements
 -------------------------------------------

 On systems without power-of-two region requirements, the reserved memory area
 for thread stacks defined by :c:macro:`K_THREAD_STACK_RESERVED` may be used to
 contain the privilege mode stack. The layout could be something like:

 ::

    +------------------------------+ <- thread.stack_obj
    | Other platform data          |
    +------------------------------+
    | Guard region (if enabled)    |
    +------------------------------+
    | Guard carve-out (if needed)  |
    |..............................|
    | Privilege elevation stack    |
    +------------------------------| <- thread.stack_obj +
    | Stack buffer                 |      K_THREAD_STACK_RESERVED =
    .                              .      thread.stack_info.start

 The guard region and any carve-out (if needed) would be configured as a
 read-only region when the thread is created.

 * If the thread is a supervisor thread, the privilege elevation region is just
   extra stack memory. An overflow will eventually crash into the guard region.

 * If the thread is running in user mode, a memory protection region will be
   configured to allow user threads access to the stack buffer, but nothing
   before or after it. An overflow in user mode will crash into the privilege
   elevation stack, which the user thread has no access to. An overflow when
   handling a system call will crash into the guard region.

 On an MMU system there should be no physical guards; the privilege mode stack
 will be mapped into kernel memory, and the stack buffer in the user part of
 memory, each with non-present virtual guard pages below them to catch runtime
 stack overflows.

 Other platform data may be stored before the guard region, but this is highly
 discouraged if such data could be stored in ``thread.arch`` somewhere.

 :c:macro:`ARCH_THREAD_STACK_RESERVED` will need to be defined to capture
 the size of the reserved region containing platform data, privilege elevation
 stacks, and guards. It must be appropriately sized such that an MPU region
 to grant user mode access to the stack buffer can be placed immediately
 after it.
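
 As a rough illustration of the sizing rule above, the following sketch adds up
 the pieces of the reserved region and rounds the total up so that an MPU region
 could start immediately after it. All ``DEMO_`` names and the sizes are
 hypothetical and chosen only for this example; a real port would derive them
 from its own MPU granularity and Kconfig options.

 .. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical sizes, for illustration only. */
    #define DEMO_MPU_ALIGN        32U   /* assumed MPU region granularity */
    #define DEMO_PLATFORM_DATA    0U    /* other platform data, ideally none */
    #define DEMO_GUARD_SIZE       32U   /* guard region, if enabled */
    #define DEMO_PRIV_STACK_SIZE  1024U /* privilege elevation stack */

    /* Round x up to a multiple of align (align must be a power of two). */
    #define DEMO_ROUND_UP(x, align) (((x) + (align) - 1U) & ~((align) - 1U))

    /* Stand-in for the value ARCH_THREAD_STACK_RESERVED would capture. */
    #define DEMO_STACK_RESERVED                                          \
        DEMO_ROUND_UP(DEMO_PLATFORM_DATA + DEMO_GUARD_SIZE +             \
                      DEMO_PRIV_STACK_SIZE, DEMO_MPU_ALIGN)

    /* Address where the stack buffer begins within a stack object. */
    static inline uintptr_t demo_stack_buffer_start(uintptr_t stack_obj)
    {
        return stack_obj + DEMO_STACK_RESERVED;
    }

 With these assumed values the reserved region is 1056 bytes, which is a
 multiple of the assumed 32-byte MPU granularity.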

 Power-of-two memory region requirements
 ---------------------------------------

 Thread stack objects must be sized and aligned to the same power of two,
 without any reserved memory to allow efficient packing in memory. Thus,
 any guards in the thread stack must be completely carved out, and the
 privilege elevation stack must be allocated elsewhere.

 :c:macro:`ARCH_THREAD_STACK_SIZE_ADJUST()` and
 :c:macro:`ARCH_THREAD_STACK_OBJ_ALIGN()` should both be defined to
 :c:macro:`Z_POW2_CEIL()`. :c:macro:`K_THREAD_STACK_RESERVED` must be 0.
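
 The rounding involved can be sketched with a small runtime helper. This is not
 Zephyr's implementation of :c:macro:`Z_POW2_CEIL()`; ``demo_pow2_ceil`` is a
 hypothetical function that only shows the effect of rounding a requested stack
 size up to the next power of two, which is then used as both the object size
 and its alignment.

 .. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper: smallest power of two greater than or equal to x.
     * Returns 1 for x == 0, and 0 if the result does not fit in size_t.
     */
    static size_t demo_pow2_ceil(size_t x)
    {
        /* Largest power of two representable in size_t. */
        const size_t max_pow2 = (SIZE_MAX >> 1) + 1;
        size_t p = 1;

        if (x > max_pow2) {
            return 0;
        }

        while (p < x) {
            p <<= 1;
        }

        return p;
    }

 For example, a requested size of 1000 bytes would become a 1024-byte object
 aligned to 1024 bytes, while a request of 1025 bytes would double to 2048.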

 For the privilege stacks, :kconfig:option:`CONFIG_GEN_PRIV_STACKS` must be
 enabled. For every thread stack found in the system, a corresponding
 fixed-size kernel stack used for handling system calls is generated. The address
 of the privilege stacks can be looked up quickly at runtime based on the
 thread stack address using :c:func:`z_priv_stack_find()`. These stacks are
 laid out the same way as other kernel-only stacks.

 ::

    +-----------------------------+ <- z_priv_stack_find(thread.stack_obj)
    | Reserved memory             | } K_KERNEL_STACK_RESERVED
    +-----------------------------+
    | Guard carve-out (if needed) |
    |.............................|
    | Privilege elevation stack   |
    |                             |
    +-----------------------------+ <- z_priv_stack_find(thread.stack_obj) +
                                         K_KERNEL_STACK_RESERVED +
                                         CONFIG_PRIVILEGED_STACK_SIZE

    +-----------------------------+ <- thread.stack_obj
    | MPU guard carve-out         |
    | (supervisor mode only)      |
    |.............................| <- thread.stack_info.start
    | Stack buffer                |
    .                             .

 The guard carve-out in the thread stack object is only used if the thread is
 running in supervisor mode. If the thread drops to user mode, there is no guard
 and the entire object is used as the stack buffer, with full access to the
 associated user mode thread and ``thread.stack_info`` updated appropriately.
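
 The effect on the stack bounds can be sketched as follows. The structure and
 function names are hypothetical stand-ins for the start and size recorded in
 ``thread.stack_info``, and the sketch assumes the guard is smaller than the
 stack object.

 .. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the start/size pair in thread.stack_info. */
    struct demo_stack_info {
        uintptr_t start;
        size_t size;
    };

    /*
     * Compute the usable stack buffer for a power-of-two stack object.
     * Assumes guard_size <= obj_size.
     */
    static struct demo_stack_info demo_stack_bounds(uintptr_t stack_obj,
                                                    size_t obj_size,
                                                    size_t guard_size,
                                                    bool user_mode)
    {
        struct demo_stack_info info;

        if (user_mode) {
            /* No guard: the whole object is the stack buffer. */
            info.start = stack_obj;
            info.size = obj_size;
        } else {
            /* Supervisor mode: the guard is carved out of the object. */
            info.start = stack_obj + guard_size;
            info.size = obj_size - guard_size;
        }

        return info;
    }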

 User Mode Threads
 *****************

 To support user mode threads, several kernel-to-arch APIs need to be
 implemented, and the system must enable the :kconfig:option:`CONFIG_ARCH_HAS_USERSPACE`
 option. Please see the documentation for each of these functions for more
 details:

 * :c:func:`arch_buffer_validate` to test whether the current thread has
   access permissions to a particular memory region

 * :c:func:`arch_user_mode_enter` which will irreversibly drop a supervisor
   thread to user mode privileges. The stack must be wiped.

 * :c:func:`arch_syscall_oops` which generates a kernel oops when system
   call parameters can't be validated, in such a way that the oops appears to be
   generated from where the system call was invoked in the user thread

 * :c:func:`arch_syscall_invoke0` through
   :c:func:`arch_syscall_invoke6` which invoke a system call with the
   appropriate number of arguments, all of which must be passed in registers
   during the privilege elevation.

 * :c:func:`arch_is_user_context` which returns nonzero if the CPU is currently
   running in user mode

 * :c:func:`arch_mem_domain_max_partitions_get` which indicates the max
   number of regions for a memory domain. MMU systems have an unlimited number,
   whereas MPU systems have constraints on this.
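
 To illustrate the kind of check a buffer validation routine performs, here is
 a minimal sketch that tests a buffer against a table of permitted regions. It
 is not the signature or implementation of :c:func:`arch_buffer_validate`;
 ``demo_region``, ``demo_regions`` and ``demo_buffer_validate`` are
 hypothetical, and a real port would consult its MPU or MMU configuration for
 the current thread instead of a static table.

 .. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical description of one region the thread may access. */
    struct demo_region {
        uintptr_t start;
        size_t size;
        bool writable;
    };

    /* Example table: one read-write RAM region, one read-only flash region. */
    static const struct demo_region demo_regions[] = {
        { 0x20000000, 0x1000, true },
        { 0x08000000, 0x4000, false },
    };

    /*
     * Returns 0 if [addr, addr + size) lies entirely within one permitted
     * region with suitable permissions, nonzero otherwise.
     */
    static int demo_buffer_validate(const struct demo_region *regions,
                                    size_t count, uintptr_t addr,
                                    size_t size, bool write)
    {
        /* Reject buffers that would wrap around the address space. */
        if (size > UINTPTR_MAX - addr) {
            return -1;
        }

        for (size_t i = 0; i < count; i++) {
            const struct demo_region *r = &regions[i];

            if (addr < r->start || size > r->size ||
                addr - r->start > r->size - size) {
                continue; /* not fully contained in this region */
            }
            if (write && !r->writable) {
                continue; /* contained, but write access is not allowed */
            }
            return 0;
        }

        return -1;
    }

 The containment test is written with subtractions rather than ``addr + size``
 so that it cannot overflow for buffers near the top of the address space.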

 Some architectures may need to update software memory management structures
 or modify hardware registers on another CPU when memory domain APIs are invoked.
 If so, :kconfig:option:`CONFIG_ARCH_MEM_DOMAIN_SYNCHRONOUS_API` must be selected by the
 architecture and some additional APIs must be implemented. This is common
 on MMU systems and uncommon on MPU systems:

 * :c:func:`arch_mem_domain_thread_add`

 * :c:func:`arch_mem_domain_thread_remove`

 * :c:func:`arch_mem_domain_partition_add`

 * :c:func:`arch_mem_domain_partition_remove`

 Please see the doxygen documentation of these APIs for details.

 In addition to implementing these APIs, there are some other tasks as well:

 * :c:func:`_new_thread` needs to spawn threads with :c:macro:`K_USER` in
   user mode

 * On context switch, the outgoing thread's stack memory should be marked
   inaccessible to user mode by making the appropriate configuration changes in
   the memory management hardware. The incoming thread's stack memory should
   likewise be marked as accessible. This ensures that threads can't mess with
   other thread stacks.

 * On context switch, the system needs to switch between memory domains for
   the incoming and outgoing threads.

 * Thread stack areas must include a kernel stack region. This should be
   inaccessible to user threads at all times. This stack will be used when
   system calls are made. This should be fixed size for all threads, and must
   be large enough to handle any system call.

 * A software interrupt or some kind of privilege elevation mechanism needs to
   be established. This is closely tied to how the ``arch_syscall_invoke*()``
   functions are implemented. On system call, the appropriate handler function
   needs to be looked up in ``_k_syscall_table``. Bad system call IDs should jump to the
   :c:enum:`K_SYSCALL_BAD` handler. Upon completion of the system call, care
   must be taken not to leak any register state back to user mode.
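
 The handler lookup described in the last item can be sketched as a bounds
 check followed by a table call. Everything named ``demo_`` or ``DEMO_`` below
 is hypothetical: the real table is generated by the build system, handlers
 take more arguments, and the bad-ID handler would raise a kernel oops rather
 than return a value.

 .. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical two-argument handler type, simplified for illustration. */
    typedef uintptr_t (*demo_syscall_handler_t)(uintptr_t arg1, uintptr_t arg2);

    enum demo_syscall_id {
        DEMO_SYSCALL_ADD,
        DEMO_SYSCALL_BAD,
        DEMO_SYSCALL_LIMIT
    };

    static uintptr_t demo_handler_add(uintptr_t a, uintptr_t b)
    {
        return a + b;
    }

    /* Stand-in for the bad system call handler; a real one would oops. */
    static uintptr_t demo_handler_bad(uintptr_t a, uintptr_t b)
    {
        (void)a;
        (void)b;
        return (uintptr_t)-1;
    }

    static const demo_syscall_handler_t demo_syscall_table[DEMO_SYSCALL_LIMIT] = {
        [DEMO_SYSCALL_ADD] = demo_handler_add,
        [DEMO_SYSCALL_BAD] = demo_handler_bad,
    };

    /* Route out-of-range IDs to the bad handler before indexing the table. */
    static uintptr_t demo_syscall_dispatch(uintptr_t id, uintptr_t a,
                                           uintptr_t b)
    {
        if (id >= DEMO_SYSCALL_LIMIT) {
            id = DEMO_SYSCALL_BAD;
        }

        return demo_syscall_table[id](a, b);
    }

 Checking the ID before the table access matters because the ID comes from
 user mode and must be treated as untrusted input.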

 GDB Stub
 ********

 To enable GDB stub for remote debugging on a new architecture:

 #. Create a new ``gdbstub.h`` header file under the appropriate architecture
    include directory (``include/arch/<arch>/gdbstub.h``).

    * Create a new struct ``struct gdb_ctx`` as the GDB context.

      * Must define a member named ``exception`` of type ``unsigned int`` to
        store the GDB exception reason. This value needs to be set before
        entering :c:func:`z_gdb_main_loop`.

      * Architecture can define as many members as needed for GDB stub to
        function.

      * Pointer to this struct needs to be passed to :c:func:`z_gdb_main_loop`,
        where this pointer will be passed to other GDB stub functions.

 #. Implement functions for entering and exiting the GDB stub main loop.

    * If the architecture relies on interrupts to service breakpoints,
      interrupt service routines (ISR) need to be implemented, which
      will serve as the entry point to GDB stub main loop.

    * These functions need to save and restore context so code execution
      can continue as if no breakpoints have been encountered.

    * These functions need to call :c:func:`z_gdb_main_loop` after saving
      execution context to go into the GDB stub main loop to receive commands
      from GDB.

    * Before calling :c:func:`z_gdb_main_loop`, :c:member:`gdb_ctx.exception`
      must be set to specify the exception reason.

 #. Implement necessary functions to support GDB stub functionality:

    * :c:func:`arch_gdb_init`

      * This needs to initialize necessary bits to support GDB stub functionality,
        for example, setting up the GDB context and connecting debug interrupts.

      * This must stop code execution via an architecture-specific method (e.g.
        raising debug interrupts). This allows GDB to connect during boot.

    * :c:func:`arch_gdb_continue`

      * This function is called when GDB sends a ``c`` or ``continue`` command
        to continue code execution.

    * :c:func:`arch_gdb_step`

      * This function is called when GDB sends a ``si`` or ``stepi`` command
        to execute one machine instruction, before returning to GDB prompt.

    * Hardware register read/write functions:

      * Since the GDB stub is running on the target, manipulation of hardware
        registers needs to be cached to avoid affecting the execution of the
        GDB stub. Think of it as context switching, where the execution context
        is changed to the GDB stub, so the register values of the running
        thread before the context switch need to be stored. Manipulation of
        register values must only be done to this cached copy. The updated
        values will then be written to hardware registers before switching
        back to the previously running thread.

      * :c:func:`arch_gdb_reg_readall`

        * This collects all hardware register values that would appear in
          ``g``/``G`` packets, which will be sent back to GDB. The format of
          the G-packet is architecture specific. Consult the GDB documentation
          on what is expected.

        * Note that, for most architectures, a valid G-packet must be returned
          and sent to GDB. If a packet with an incorrect length is sent to
          GDB, GDB will abort the debugging session.

      * :c:func:`arch_gdb_reg_writeall`

        * This takes a G-packet sent by GDB and populates the hardware
          registers with values from the G-packet.

      * :c:func:`arch_gdb_reg_readone`

        * This reads the value of one hardware register and sends
          the result to GDB.

      * :c:func:`arch_gdb_reg_writeone`

        * This writes the value of one hardware register received from GDB.

    * Breakpoints:

      * :c:func:`arch_gdb_add_breakpoint` and
        :c:func:`arch_gdb_remove_breakpoint`

      * GDB may decide to use software breakpoints, which modify
        the memory at the breakpoint locations to replace the instruction
        with software breakpoint or trap instructions. GDB will then
        restore the memory content once execution reaches the breakpoints.
        GDB supports this by default and there is usually no need to
        handle software breakpoints in the architecture code (where
        breakpoint type is ``0``).

      * Hardware breakpoints (type ``1``) are required if the code is
        in ROM or flash that cannot be modified at runtime. Consult
        the architecture datasheet on how to enable hardware breakpoints.

      * If hardware breakpoints are not supported by the architecture,
        there is no need to implement these in architecture code.
        GDB will then rely on software breakpoints.

 #. For architectures where certain memory regions are not accessible,
    an array named :c:var:`gdb_mem_region_array` of type
    :c:struct:`gdb_mem_region` needs to be defined to specify regions
    that are accessible. For each array item:

    * :c:member:`gdb_mem_region.start` specifies the start of a memory
      region.

    * :c:member:`gdb_mem_region.end` specifies the end of a memory
      region.

    * :c:member:`gdb_mem_region.attributes` specifies the permission
      of a memory region.

      * :c:macro:`GDB_MEM_REGION_RO`: region is read-only.

      * :c:macro:`GDB_MEM_REGION_RW`: region is read-write.

    * :c:member:`gdb_mem_region.alignment` specifies read/write alignment
      of a memory region. Use ``0`` if there is no alignment requirement
      and read/write can be done byte-by-byte.
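
 The way such a region table might be consulted before a memory access can be
 sketched as follows. The structure mirrors the member names listed above, but
 ``demo_mem_region``, the ``DEMO_`` attribute values, the example addresses and
 ``demo_mem_access_ok`` are all hypothetical. In particular, treating ``end``
 as an exclusive bound is an assumption made for this sketch only.

 .. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical attribute values, for illustration only. */
    #define DEMO_MEM_REGION_RO 0x1U
    #define DEMO_MEM_REGION_RW 0x3U

    /* Hypothetical mirror of the members described above. */
    struct demo_mem_region {
        uintptr_t start;
        uintptr_t end;       /* assumed exclusive in this sketch */
        uint16_t attributes;
        uint8_t alignment;   /* 0 means byte-by-byte access is fine */
    };

    static const struct demo_mem_region demo_mem_region_array[] = {
        { 0x08000000, 0x08040000, DEMO_MEM_REGION_RO, 0 },
        { 0x20000000, 0x20010000, DEMO_MEM_REGION_RW, 4 },
    };

    /* Returns true if the access fits one region and respects its rules. */
    static bool demo_mem_access_ok(uintptr_t addr, size_t len, bool write)
    {
        const size_t count =
            sizeof(demo_mem_region_array) / sizeof(demo_mem_region_array[0]);

        if (len == 0) {
            return false;
        }

        for (size_t i = 0; i < count; i++) {
            const struct demo_mem_region *r = &demo_mem_region_array[i];

            if (addr < r->start || addr >= r->end || len > r->end - addr) {
                continue; /* access is not fully inside this region */
            }
            if (write && r->attributes != DEMO_MEM_REGION_RW) {
                continue; /* region is read-only */
            }
            if (r->alignment != 0 &&
                (addr % r->alignment != 0 || len % r->alignment != 0)) {
                continue; /* violates the region's access alignment */
            }
            return true;
        }

        return false;
    }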

 API Reference
 *************

 Timing
 ======

 .. doxygengroup:: arch-timing

 Threads
 =======

 .. doxygengroup:: arch-threads

 .. doxygengroup:: arch-tls

 Power Management
 ================

 .. doxygengroup:: arch-pm

 Symmetric Multi-Processing
 ==========================

 .. doxygengroup:: arch-smp

 Interrupts
 ==========

 .. doxygengroup:: arch-irq

 Userspace
 =========

 .. doxygengroup:: arch-userspace

 Memory Management
 =================

 .. doxygengroup:: arch-mmu

 Miscellaneous Architecture APIs
 ===============================

 .. doxygengroup:: arch-misc

 GDB Stub APIs
 =============

 .. doxygengroup:: arch-gdbstub