kernel/sched: Add "thread_usage" API for thread runtime cycle monitoring

This is an alternate backend that does what THREAD_RUNTIME_STATS is
doing currently, but with a few advantages:

* Correctly synchronized: you can't race against a running thread
  (potentially on another CPU!) while querying its usage.

* Realtime results: you get the right answer always, up to timer
  precision, even if a thread has been running for a while
  uninterrupted and hasn't updated its total.

* Portable, no need for per-architecture code at all for the simple
  case. (It leverages the USE_SWITCH layer to do this, so won't work
  on older architectures)

* Faster/smaller: minimizes use of 64 bit math; lower overhead in
  thread struct (keeps the scratch "started" time in the CPU struct
  instead).  One 64 bit counter per thread and a 32 bit scratch
  register in the CPU struct.

* Standalone.  It's a core (but optional) scheduler feature, no
  dependence on para-kernel configuration like the tracing
  infrastructure.

* More precise: allows architectures to optionally call a trivial
  zero-argument/no-result cdecl function out of interrupt entry to
  avoid accounting for ISR runtime in thread totals.  No configuration
  needed here, if it's called then you get proper ISR accounting, and
  if not you don't.

For right now, pending unification, it's added side-by-side with the
older API and left as a z_*() internal symbol.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
6 files changed