Add a CFI tester to CHECK_ABI.

This uses the x86 trap flag and libunwind to test that CFI works at each
instruction. For now, it just uses the system libunwind found via
pkg-config and disables unwind tests if it is unavailable. We'll probably
want to stick a copy into //third_party and perhaps try the LLVM one
later.
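
As a rough illustration of the trap flag mechanism on its own (no
libunwind), the sketch below sets TF around a few instructions and counts
the resulting SIGTRAPs. It assumes Linux on x86-64 with GCC or Clang
inline assembly; CountSingleSteps, OnTrap and g_steps are hypothetical
names for this sketch and are not part of the change.

```cpp
#include <assert.h>
#include <signal.h>
#include <string.h>

// Number of SIGTRAPs observed. Written by the handler, read by the caller on
// the same thread.
static volatile sig_atomic_t g_steps = 0;

// Signal handler: only bumps the counter, which is async-signal-safe.
static void OnTrap(int) { g_steps = g_steps + 1; }

// CountSingleSteps (hypothetical helper) runs a few instructions with the trap
// flag set and returns how many SIGTRAPs were delivered. It returns -1 on
// error or on platforms this sketch does not cover.
int CountSingleSteps() {
#if defined(__x86_64__) && defined(__linux__)
  struct sigaction action, old_action;
  memset(&action, 0, sizeof(action));
  sigemptyset(&action.sa_mask);
  action.sa_handler = OnTrap;
  if (sigaction(SIGTRAP, &action, &old_action) != 0) {
    return -1;
  }

  g_steps = 0;
  __asm__ volatile(
      "leaq -128(%%rsp), %%rsp\n\t"  // Skip the red zone before pushing.
      "pushfq\n\t"
      "orq $0x100, (%%rsp)\n\t"      // Set TF (bit 8 of RFLAGS).
      "popfq\n\t"
      "nop\n\t"                      // Single-step traps begin here.
      "nop\n\t"
      "nop\n\t"
      "pushfq\n\t"
      "andq $-0x101, (%%rsp)\n\t"    // Clear TF; -0x101 is ~0x100.
      "popfq\n\t"
      "leaq 128(%%rsp), %%rsp\n\t"   // Restore the stack pointer.
      :
      :
      : "cc", "memory");

  int steps = g_steps;
  // Put back whatever SIGTRAP disposition was installed before.
  sigaction(SIGTRAP, &old_action, nullptr);
  return steps;
#else
  return -1;
#endif
}
```

The real tester does the equivalent of this inside abi_test_trampoline
and unwinds with libunwind from the handler instead of counting. Running
a sketch like this under a debugger will likely misbehave, since the
debugger also consumes SIGTRAP.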

This tester caught two bugs in P-256 CFI annotations already:
I47b5f9798b3bcee1748e537b21c173d312a14b42 and
I9f576d868850312d6c14d1386f8fbfa85021b347

An earlier design used PTRACE_SINGLESTEP with libunwind's remote
unwinding features. ptrace is a mess around stop signals (see group-stop
discussion in ptrace(2)) and the trap flag approach is 10x faster, so I
went with the trap flag. The question of which is more future-proof is
complex:

- There are two libunwinds with the same API,
  https://www.nongnu.org/libunwind/ and LLVM's. This currently uses the
  system nongnu.org one for convenience. In the future, LLVM's should be
  easier to bundle (less complex build) and appears to even support
  Windows, but I haven't tested this. Moreover, setting the trap flag
  keeps the test single-process, which is less complex on Windows. That
  suggests the trap flag design and switching to LLVM later. However...

- Not all architectures have a trap flag settable by userspace. As far
  as I can tell, ARMv8's PSTATE.SS can only be set from the kernel. If
  we stick with nongnu.org libunwind, we can use PTRACE_SINGLESTEP and
  remote unwinding. Or we implement it for LLVM. Another thought is for
  the ptracer to bounce SIGTRAP back into the process, to share the
  local unwinding code.

- ARMv7 has no trap flag at all and PTRACE_SINGLESTEP fails. Debuggers
  single-step by injecting breakpoints instead. However, ARMv8's trap
  flag seems to work in both AArch32 and AArch64 modes, so we may be
  able to condition it on a 64-bit kernel.

Sadly, neither strategy works with Intel SDE. Adding flags to cpucap
vectors as we do with ARM would help, but it would not emulate CPUs
newer than the host CPU. For now, I've just had SDE tests disable the
unwind tests.

Annoyingly, CMake does not allow object libraries to have dependencies,
so make test_support a proper static library. Rename the target to
test_support_lib to avoid
https://gitlab.kitware.com/cmake/cmake/issues/17785

Update-Note: This adds a new optional test dependency, but it's disabled
by default (define BORINGSSL_HAVE_LIBUNWIND), so consumers do not need
to do anything. We'll probably want to adjust this in the future.

Bug: 181
Change-Id: I817263d7907aff0904a9cee83f8b26747262cc0c
Reviewed-on: https://boringssl-review.googlesource.com/c/33966
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/test/CMakeLists.txt b/crypto/test/CMakeLists.txt
index 0b1eab8..d2e4cdf 100644
--- a/crypto/test/CMakeLists.txt
+++ b/crypto/test/CMakeLists.txt
@@ -1,7 +1,7 @@
 add_library(
-  test_support
+  test_support_lib
 
-  OBJECT
+  STATIC
 
   abi_test.cc
   file_test.cc
@@ -10,7 +10,12 @@
   wycheproof_util.cc
 )
 
-add_dependencies(test_support global_target)
+if (LIBUNWIND_FOUND)
+  target_compile_options(test_support_lib PRIVATE ${LIBUNWIND_CFLAGS_OTHER})
+  target_include_directories(test_support_lib PRIVATE ${LIBUNWIND_INCLUDE_DIRS})
+  target_link_libraries(test_support_lib ${LIBUNWIND_LDFLAGS})
+endif()
+add_dependencies(test_support_lib global_target)
 
 add_library(
   boringssl_gtest_main
diff --git a/crypto/test/abi_test.cc b/crypto/test/abi_test.cc
index 890aa15..e86f2f4 100644
--- a/crypto/test/abi_test.cc
+++ b/crypto/test/abi_test.cc
@@ -14,12 +14,41 @@
 
 #include "abi_test.h"
 
+#include <stdarg.h>
+#include <stdio.h>
+
+#include <algorithm>
+#include <array>
+
+#include <openssl/buf.h>
+#include <openssl/mem.h>
 #include <openssl/rand.h>
+#include <openssl/span.h>
+
+#if defined(OPENSSL_LINUX) && defined(SUPPORTS_ABI_TEST) && \
+    defined(BORINGSSL_HAVE_LIBUNWIND)
+#define UNWIND_TEST_SIGTRAP
+
+#define UNW_LOCAL_ONLY
+#include <errno.h>
+#include <fcntl.h>
+#include <libunwind.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#endif  // LINUX && SUPPORTS_ABI_TEST && HAVE_LIBUNWIND
 
 
 namespace abi_test {
+
 namespace internal {
 
+static bool g_unwind_tests_enabled = false;
+
 std::string FixVAArgsString(const char *str) {
   std::string ret = str;
   size_t idx = ret.find(',');
@@ -37,26 +66,429 @@
 }
 
 #if defined(SUPPORTS_ABI_TEST)
-crypto_word_t RunTrampoline(Result *out, crypto_word_t func,
-                            const crypto_word_t *argv, size_t argc) {
-  CallerState state;
-  RAND_bytes(reinterpret_cast<uint8_t *>(&state), sizeof(state));
-
-  // TODO(davidben): Use OS debugging APIs to single-step |func| and test that
-  // CFI and SEH annotations are correct.
-  CallerState state2 = state;
-  crypto_word_t ret = abi_test_trampoline(func, &state2, argv, argc);
-
-  *out = Result();
-#define CALLER_STATE_REGISTER(type, name)                    \
-  if (state.name != state2.name) {                           \
-    out->errors.push_back(#name " was not restored"); \
+// ForEachMismatch calls |func| for each register where |a| and |b| differ.
+template <typename Func>
+static void ForEachMismatch(const CallerState &a, const CallerState &b,
+                            const Func &func) {
+#define CALLER_STATE_REGISTER(type, name) \
+  if (a.name != b.name) {                 \
+    func(#name);                          \
   }
   LOOP_CALLER_STATE_REGISTERS()
 #undef CALLER_STATE_REGISTER
+}
+
+// ReadUnwindResult adds the results of the most recent unwind test to |out|.
+static void ReadUnwindResult(Result *out);
+
+crypto_word_t RunTrampoline(Result *out, crypto_word_t func,
+                            const crypto_word_t *argv, size_t argc,
+                            bool unwind) {
+  CallerState state;
+  RAND_bytes(reinterpret_cast<uint8_t *>(&state), sizeof(state));
+
+  unwind &= g_unwind_tests_enabled;
+  CallerState state2 = state;
+  crypto_word_t ret = abi_test_trampoline(func, &state2, argv, argc, unwind);
+
+  *out = Result();
+  ForEachMismatch(state, state2, [&](const char *reg) {
+    out->errors.push_back(std::string(reg) + " was not restored after return");
+  });
+  if (unwind) {
+    ReadUnwindResult(out);
+  }
   return ret;
 }
+#endif  // SUPPORTS_ABI_TEST
+
+#if defined(UNWIND_TEST_SIGTRAP)
+// On Linux, we test unwind metadata using libunwind and |SIGTRAP|. We run the
+// function under test with the trap flag set. This results in |SIGTRAP|s on
+// every instruction. We then handle these signals and verify with libunwind.
+
+// HandleEINTR runs |func| and returns the result, retrying the operation on
+// |EINTR|.
+template <typename Func>
+static auto HandleEINTR(const Func &func) -> decltype(func()) {
+  decltype(func()) ret;
+  do {
+    ret = func();
+  } while (ret < 0 && errno == EINTR);
+  return ret;
+}
+
+static bool ReadFileToString(std::string *out, const char *path) {
+  out->clear();
+
+  int fd = HandleEINTR([&] { return open(path, O_RDONLY); });
+  if (fd < 0) {
+    return false;
+  }
+
+  for (;;) {
+    char buf[1024];
+    ssize_t ret = HandleEINTR([&] { return read(fd, buf, sizeof(buf)); });
+    if (ret < 0) {
+      close(fd);
+      return false;
+    }
+    if (ret == 0) {
+      close(fd);
+      return true;
+    }
+    out->append(buf, static_cast<size_t>(ret));
+  }
+}
+
+static bool IsBeingDebugged() {
+  std::string status;
+  if (!ReadFileToString(&status, "/proc/self/status")) {
+    perror("error reading /proc/self/status");
+    return false;
+  }
+  std::string key = "\nTracerPid:\t";
+  size_t idx = status.find(key);
+  if (idx == std::string::npos) {
+    return false;
+  }
+  idx += key.size();
+  return idx < status.size() && status[idx] != '0';
+}
+
+// IsAncestorStackFrame returns true if |a_sp| is an ancestor stack frame of
+// |b_sp|.
+static bool IsAncestorStackFrame(unw_word_t a_sp, unw_word_t b_sp) {
+#if defined(OPENSSL_X86_64)
+  // The stack grows down, so ancestor stack frames have higher addresses.
+  return a_sp > b_sp;
+#else
+#error "unknown architecture"
 #endif
+}
+
+static int CallerStateFromUNWCursor(CallerState *out, unw_cursor_t *cursor) {
+  // |CallerState| uses |crypto_word_t|, while libunwind uses |unw_word_t|, but
+  // both are defined as |uint*_t| from stdint.h, so we can assume the types
+  // match.
+#if defined(OPENSSL_X86_64)
+  int ret = 0;
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_RBX, &out->rbx);
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_RBP, &out->rbp);
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_R12, &out->r12);
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_R13, &out->r13);
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_R14, &out->r14);
+  ret = ret < 0 ? ret : unw_get_reg(cursor, UNW_X86_64_R15, &out->r15);
+  return ret;
+#else
+#error "unknown architecture"
+#endif
+}
+
+// Implement some string formatting utilities. Ideally we would use |snprintf|,
+// but this is called in a signal handler and |snprintf| is not async-signal-
+// safe.
+
+static std::array<char, DECIMAL_SIZE(unw_word_t) + 1> WordToDecimal(
+    unw_word_t v) {
+  std::array<char, DECIMAL_SIZE(unw_word_t) + 1> ret;
+  size_t len = 0;
+  do {
+    ret[len++] = '0' + v % 10;
+    v /= 10;
+  } while (v != 0);
+  for (size_t i = 0; i < len / 2; i++) {
+    std::swap(ret[i], ret[len - 1 - i]);
+  }
+  ret[len] = '\0';
+  return ret;
+}
+
+static std::array<char, sizeof(unw_word_t) * 2 + 1> WordToHex(unw_word_t v) {
+  static const char kHex[] = "0123456789abcdef";
+  std::array<char, sizeof(unw_word_t) * 2 + 1> ret;
+  for (size_t i = sizeof(unw_word_t) - 1; i < sizeof(unw_word_t); i--) {
+    uint8_t b = v & 0xff;
+    v >>= 8;
+    ret[i * 2] = kHex[b >> 4];
+    ret[i * 2 + 1] = kHex[b & 0xf];
+  }
+  ret[sizeof(unw_word_t) * 2] = '\0';
+  return ret;
+}
+
+static void StrCatSignalSafeImpl(bssl::Span<char> out) {}
+
+template <typename... Args>
+static void StrCatSignalSafeImpl(bssl::Span<char> out, const char *str,
+                                 Args... args) {
+  BUF_strlcat(out.data(), str, out.size());
+  StrCatSignalSafeImpl(out, args...);
+}
+
+template <typename... Args>
+static void StrCatSignalSafe(bssl::Span<char> out, Args... args) {
+  if (out.empty()) {
+    return;
+  }
+  out[0] = '\0';
+  StrCatSignalSafeImpl(out, args...);
+}
+
+static int UnwindToSignalFrame(unw_cursor_t *cursor) {
+  for (;;) {
+    int ret = unw_is_signal_frame(cursor);
+    if (ret < 0) {
+      return ret;
+    }
+    if (ret != 0) {
+      return 0;  // Found the signal frame.
+    }
+    ret = unw_step(cursor);
+    if (ret < 0) {
+      return ret;
+    }
+  }
+}
+
+// IPToString returns a human-readable representation of |ip|, using debug
+// information from |ctx| if available. |ip| must be the address of |ctx|'s
+// signal frame. This function is async-signal-safe.
+static std::array<char, 256> IPToString(unw_word_t ip, unw_context_t *ctx) {
+  std::array<char, 256> ret;
+  // Use a new cursor. The caller's cursor has already been unwound, but
+  // |unw_get_proc_name| is slow so we do not wish to call it all the time.
+  unw_cursor_t cursor;
+  // Work around a bug in libunwind. See
+  // https://git.savannah.gnu.org/gitweb/?p=libunwind.git;a=commit;h=819bf51bbd2da462c2ec3401e8ac9153b6e725e3
+  OPENSSL_memset(&cursor, 0, sizeof(cursor));
+  unw_word_t off;
+  if (unw_init_local(&cursor, ctx) != 0 ||
+      UnwindToSignalFrame(&cursor) != 0 ||
+      unw_get_proc_name(&cursor, ret.data(), ret.size(), &off) != 0) {
+    StrCatSignalSafe(bssl::MakeSpan(ret), "0x", WordToHex(ip).data());
+    return ret;
+  }
+  size_t len = strlen(ret.data());
+  // Print the offset in decimal, to match gdb's disassembly output and ease
+  // debugging.
+  StrCatSignalSafe(bssl::MakeSpan(ret).subspan(len), "+",
+                   WordToDecimal(off).data(), " (0x", WordToHex(ip).data(),
+                   ")");
+  return ret;
+}
+
+static pthread_t g_main_thread;
+
+// g_in_trampoline is true if we are in an instrumented |abi_test_trampoline|
+// call, in the region that triggers |SIGTRAP|.
+static bool g_in_trampoline = false;
+// g_unwind_function_done, if |g_in_trampoline| is true, is whether the function
+// under test has returned. It is undefined otherwise.
+static bool g_unwind_function_done;
+// g_trampoline_state, if |g_in_trampoline| is true, is the state the function
+// under test must preserve. It is undefined otherwise.
+static CallerState g_trampoline_state;
+// g_trampoline_sp, if |g_in_trampoline| is true, is the stack pointer of the
+// trampoline frame. It is undefined otherwise.
+static unw_word_t g_trampoline_sp;
+
+// kMaxUnwindErrors is the maximum number of unwind errors reported per
+// function. If a function's unwind tables are wrong, we are otherwise likely to
+// repeat the same error at multiple addresses.
+static constexpr size_t kMaxUnwindErrors = 10;
+
+// Errors are saved in a signal handler. We use a static buffer to avoid
+// allocation.
+static size_t num_unwind_errors = 0;
+static char unwind_errors[kMaxUnwindErrors][512];
+
+template <typename... Args>
+static void AddUnwindError(Args... args) {
+  if (num_unwind_errors >= kMaxUnwindErrors) {
+    return;
+  }
+  StrCatSignalSafe(unwind_errors[num_unwind_errors], args...);
+  num_unwind_errors++;
+}
+
+template <typename... Args>
+[[noreturn]] static void FatalError(Args... args) {
+  // We cannot use |snprintf| here because it is not async-signal-safe.
+  char buf[512];
+  StrCatSignalSafe(buf, args..., "\n");
+  write(STDERR_FILENO, buf, strlen(buf));
+  abort();
+}
+
+static void TrapHandler(int sig) {
+  // Note this is a signal handler, so only async-signal-safe functions may be
+  // used here. See signal-safety(7). libunwind promises local unwind is
+  // async-signal-safe.
+
+  // |pthread_equal| is not listed as async-signal-safe, but this is clearly an
+  // oversight.
+  if (!pthread_equal(g_main_thread, pthread_self())) {
+    FatalError("SIGTRAP on background thread");
+  }
+
+  unw_context_t ctx;
+  int ret = unw_getcontext(&ctx);
+  unw_cursor_t cursor;
+  // Work around a bug in libunwind which breaks rax and rdx recovery. This
+  // breaks functions which temporarily use rax as the CFA register. See
+  // https://git.savannah.gnu.org/gitweb/?p=libunwind.git;a=commit;h=819bf51bbd2da462c2ec3401e8ac9153b6e725e3
+  OPENSSL_memset(&cursor, 0, sizeof(cursor));
+  ret = ret < 0 ? ret : unw_init_local(&cursor, &ctx);
+  ret = ret < 0 ? ret : UnwindToSignalFrame(&cursor);
+  unw_word_t sp, ip;
+  ret = ret < 0 ? ret : unw_get_reg(&cursor, UNW_REG_SP, &sp);
+  ret = ret < 0 ? ret : unw_get_reg(&cursor, UNW_REG_IP, &ip);
+  if (ret < 0) {
+    FatalError("Error initializing unwind cursor: ", unw_strerror(ret));
+  }
+
+  const unw_word_t kStartAddress =
+      reinterpret_cast<unw_word_t>(&abi_test_unwind_start);
+  const unw_word_t kReturnAddress =
+      reinterpret_cast<unw_word_t>(&abi_test_unwind_return);
+  const unw_word_t kStopAddress =
+      reinterpret_cast<unw_word_t>(&abi_test_unwind_stop);
+  if (!g_in_trampoline) {
+    if (ip != kStartAddress) {
+      FatalError("Unexpected SIGTRAP at ", IPToString(ip, &ctx).data());
+    }
+
+    // Save the current state and begin.
+    g_in_trampoline = true;
+    g_unwind_function_done = false;
+    g_trampoline_sp = sp;
+    ret = CallerStateFromUNWCursor(&g_trampoline_state, &cursor);
+    if (ret < 0) {
+      FatalError("Error getting initial caller state: ", unw_strerror(ret));
+    }
+  } else {
+    if (sp == g_trampoline_sp || g_unwind_function_done) {
+      // |g_unwind_function_done| should imply |sp| is |g_trampoline_sp|, but
+      // clearing the trap flag in x86 briefly displaces the stack pointer.
+      //
+      // Also note we check both |ip| and |sp| below, in case the function under
+      // test is also |abi_test_trampoline|.
+      if (ip == kReturnAddress && sp == g_trampoline_sp) {
+        g_unwind_function_done = true;
+      }
+      if (ip == kStopAddress && sp == g_trampoline_sp) {
+        // |SIGTRAP| is fatal again.
+        g_in_trampoline = false;
+      }
+    } else if (IsAncestorStackFrame(sp, g_trampoline_sp)) {
+      // This should never happen. We went past |g_trampoline_sp| without
+      // stopping at |kStopAddress|.
+      AddUnwindError("stack frame is before caller at ",
+                     IPToString(ip, &ctx).data());
+      g_in_trampoline = false;
+    } else if (num_unwind_errors < kMaxUnwindErrors) {
+      for (;;) {
+        ret = unw_step(&cursor);
+        if (ret < 0) {
+          AddUnwindError("error unwinding from ", IPToString(ip, &ctx).data(),
+                         ": ", unw_strerror(ret));
+          break;
+        }
+        if (ret == 0) {
+          AddUnwindError("could not unwind to starting frame from ",
+                         IPToString(ip, &ctx).data());
+          break;
+        }
+
+        unw_word_t cur_sp;
+        ret = unw_get_reg(&cursor, UNW_REG_SP, &cur_sp);
+        if (ret < 0) {
+          AddUnwindError("error recovering stack pointer unwinding from ",
+                         IPToString(ip, &ctx).data(), ": ", unw_strerror(ret));
+          break;
+        }
+        if (IsAncestorStackFrame(cur_sp, g_trampoline_sp)) {
+          AddUnwindError("unwound past starting frame from ",
+                         IPToString(ip, &ctx).data());
+          break;
+        }
+        if (cur_sp == g_trampoline_sp) {
+          // We found the parent frame. Check the return address.
+          unw_word_t cur_ip;
+          ret = unw_get_reg(&cursor, UNW_REG_IP, &cur_ip);
+          if (ret < 0) {
+            AddUnwindError("error recovering return address unwinding from ",
+                           IPToString(ip, &ctx).data(), ": ",
+                           unw_strerror(ret));
+          } else if (cur_ip != kReturnAddress) {
+            AddUnwindError("wrong return address unwinding from ",
+                           IPToString(ip, &ctx).data());
+          }
+
+          // Check the remaining registers.
+          CallerState state;
+          ret = CallerStateFromUNWCursor(&state, &cursor);
+          if (ret < 0) {
+            AddUnwindError("error recovering registers unwinding from ",
+                           IPToString(ip, &ctx).data(), ": ",
+                           unw_strerror(ret));
+          } else {
+            ForEachMismatch(state, g_trampoline_state, [&](const char *reg) {
+              AddUnwindError(reg, " was not recovered unwinding from ",
+                             IPToString(ip, &ctx).data());
+            });
+          }
+          break;
+        }
+      }
+    }
+  }
+}
+
+static void ReadUnwindResult(Result *out) {
+  for (size_t i = 0; i < num_unwind_errors; i++) {
+    out->errors.emplace_back(unwind_errors[i]);
+  }
+  if (num_unwind_errors == kMaxUnwindErrors) {
+    out->errors.emplace_back("(additional errors omitted)");
+  }
+  num_unwind_errors = 0;
+}
+
+static void EnableUnwindTestsImpl() {
+  if (IsBeingDebugged()) {
+    // Unwind tests drive logic via |SIGTRAP|, which conflicts with debuggers.
+    fprintf(stderr, "Debugger detected. Disabling unwind tests.\n");
+    return;
+  }
+
+  g_main_thread = pthread_self();
+
+  struct sigaction trap_action;
+  OPENSSL_memset(&trap_action, 0, sizeof(trap_action));
+  sigemptyset(&trap_action.sa_mask);
+  trap_action.sa_handler = TrapHandler;
+  if (sigaction(SIGTRAP, &trap_action, NULL) != 0) {
+    perror("sigaction");
+    abort();
+  }
+
+  g_unwind_tests_enabled = true;
+}
+
+#else
+// TODO(davidben): Implement an SEH-based unwind-tester.
+#if defined(SUPPORTS_ABI_TEST)
+static void ReadUnwindResult(Result *) {}
+#endif
+static void EnableUnwindTestsImpl() {}
+#endif  // UNWIND_TEST_SIGTRAP
 
 }  // namespace internal
+
+void EnableUnwindTests() { internal::EnableUnwindTestsImpl(); }
+
+bool UnwindTestsEnabled() { return internal::g_unwind_tests_enabled; }
+
 }  // namespace abi_test
diff --git a/crypto/test/abi_test.h b/crypto/test/abi_test.h
index c1ef8f1..23f3aa5 100644
--- a/crypto/test/abi_test.h
+++ b/crypto/test/abi_test.h
@@ -113,11 +113,15 @@
 };
 
 // RunTrampoline runs |func| on |argv|, recording ABI errors in |out|. It does
-// not perform any type-checking.
+// not perform any type-checking. If |unwind| is true and unwind tests have been
+// enabled, |func| is single-stepped under an unwind test.
 crypto_word_t RunTrampoline(Result *out, crypto_word_t func,
-                            const crypto_word_t *argv, size_t argc);
+                            const crypto_word_t *argv, size_t argc,
+                            bool unwind);
 
-// CheckImpl runs |func| on |args|, recording ABI errors in |out|.
+// CheckImpl runs |func| on |args|, recording ABI errors in |out|. If |unwind|
+// is true and unwind tests have been enabled, |func| is single-stepped under an
+// unwind test.
 //
 // It returns the value as a |crypto_word_t| to work around problems when |R| is
 // void. |args| is wrapped in a |DeductionGuard| so |func| determines the
@@ -125,7 +129,7 @@
 // instance, if |func| takes const int *, and the caller passes an int *, the
 // compiler will complain the deduced types do not match.
 template <typename R, typename... Args>
-inline crypto_word_t CheckImpl(Result *out, R (*func)(Args...),
+inline crypto_word_t CheckImpl(Result *out, bool unwind, R (*func)(Args...),
                                typename DeductionGuard<Args>::Type... args) {
   static_assert(sizeof...(args) <= 10,
                 "too many arguments for abi_test_trampoline");
@@ -135,7 +139,7 @@
       (crypto_word_t)args...,
   };
   return RunTrampoline(out, reinterpret_cast<crypto_word_t>(func), argv,
-                       sizeof...(args));
+                       sizeof...(args), unwind);
 }
 #else
 // To simplify callers when ABI testing support is unavoidable, provide a backup
@@ -143,14 +147,15 @@
 // call |func| directly.
 template <typename R, typename... Args>
 inline typename std::enable_if<!std::is_void<R>::value, crypto_word_t>::type
-CheckImpl(Result *out, R (*func)(Args...),
+CheckImpl(Result *out, bool /* unwind */, R (*func)(Args...),
           typename DeductionGuard<Args>::Type... args) {
   *out = Result();
   return func(args...);
 }
 
 template <typename... Args>
-inline crypto_word_t CheckImpl(Result *out, void (*func)(Args...),
+inline crypto_word_t CheckImpl(Result *out, bool /* unwind */,
+                               void (*func)(Args...),
                                typename DeductionGuard<Args>::Type... args) {
   *out = Result();
   func(args...);
@@ -169,13 +174,14 @@
 std::string FixVAArgsString(const char *str);
 
 // CheckGTest behaves like |CheckImpl|, but it returns the correct type and
-// raises GTest assertions on failure.
+// raises GTest assertions on failure. If |unwind| is true and unwind tests are
+// enabled, |func| is single-stepped under an unwind test.
 template <typename R, typename... Args>
 inline R CheckGTest(const char *va_args_str, const char *file, int line,
-                    R (*func)(Args...),
+                    bool unwind, R (*func)(Args...),
                     typename DeductionGuard<Args>::Type... args) {
   Result result;
-  crypto_word_t ret = CheckImpl(&result, func, args...);
+  crypto_word_t ret = CheckImpl(&result, unwind, func, args...);
   if (!result.ok()) {
     testing::Message msg;
     msg << "ABI failures in " << FixVAArgsString(va_args_str) << ":\n";
@@ -195,9 +201,17 @@
 template <typename R, typename... Args>
 inline R Check(Result *out, R (*func)(Args...),
                typename internal::DeductionGuard<Args>::Type... args) {
-  return (R)internal::CheckImpl(out, func, args...);
+  return (R)internal::CheckImpl(out, false, func, args...);
 }
 
+// EnableUnwindTests enables unwind tests, if supported. If not supported, it
+// does nothing.
+void EnableUnwindTests();
+
+// UnwindTestsEnabled returns true if unwind tests are enabled and false
+// otherwise.
+bool UnwindTestsEnabled();
+
 }  // namespace abi_test
 
 // CHECK_ABI calls the first argument on the remaining arguments and returns the
@@ -206,26 +220,73 @@
 //
 // |CHECK_ABI| does return the value and thus may replace any function call,
 // provided it takes only simple parameters. However, it is recommended to test
-// ABI separately from functional tests of assembly. A future unwind testing
-// extension will single-step the function, which is inefficient.
+// ABI separately from functional tests of assembly. Fully instrumenting a
+// function for ABI checking requires single-stepping the function, which is
+// inefficient.
 //
 // Functional testing requires coverage of input values, while ABI testing only
 // requires branch coverage. Most of our assembly is constant-time, so usually
 // only a few instrumented calls are necessray.
-#define CHECK_ABI(...) \
-  abi_test::internal::CheckGTest(#__VA_ARGS__, __FILE__, __LINE__, __VA_ARGS__)
+#define CHECK_ABI(...)                                                   \
+  abi_test::internal::CheckGTest(#__VA_ARGS__, __FILE__, __LINE__, true, \
+                                 __VA_ARGS__)
+
+// CHECK_ABI_NO_UNWIND behaves like |CHECK_ABI| but disables unwind testing.
+#define CHECK_ABI_NO_UNWIND(...)                                          \
+  abi_test::internal::CheckGTest(#__VA_ARGS__, __FILE__, __LINE__, false, \
+                                 __VA_ARGS__)
 
 
 // Internal functions.
 
 #if defined(SUPPORTS_ABI_TEST)
+struct Uncallable {
+  Uncallable() = delete;
+};
+
+extern "C" {
+
 // abi_test_trampoline loads callee-saved registers from |state|, calls |func|
 // with |argv|, then saves the callee-saved registers into |state|. It returns
-// the result of |func|. We give |func| type |crypto_word_t| to avoid tripping
-// MSVC's warning 4191.
-extern "C" crypto_word_t abi_test_trampoline(
-    crypto_word_t func, abi_test::internal::CallerState *state,
-    const crypto_word_t *argv, size_t argc);
+// the result of |func|. If |unwind| is non-zero, this function triggers unwind
+// instrumentation.
+//
+// We give |func| type |crypto_word_t| to avoid tripping MSVC's warning 4191.
+crypto_word_t abi_test_trampoline(crypto_word_t func,
+                                  abi_test::internal::CallerState *state,
+                                  const crypto_word_t *argv, size_t argc,
+                                  crypto_word_t unwind);
+
+// abi_test_unwind_start points at the instruction that starts unwind testing in
+// |abi_test_trampoline|. This is the value of the instruction pointer at the
+// first |SIGTRAP| during unwind testing.
+//
+// This symbol is not a function and should not be called.
+void abi_test_unwind_start(Uncallable);
+
+// abi_test_unwind_return points at the instruction immediately after the call in
+// |abi_test_trampoline|. When unwinding the function under test, this is the
+// expected address in the |abi_test_trampoline| frame. After this address, the
+// unwind tester should ignore |SIGTRAP| until |abi_test_unwind_stop|.
+//
+// This symbol is not a function and should not be called.
+void abi_test_unwind_return(Uncallable);
+
+// abi_test_unwind_stop is the value of the instruction pointer at the final
+// |SIGTRAP| during unwind testing.
+//
+// This symbol is not a function and should not be called.
+void abi_test_unwind_stop(Uncallable);
+
+// abi_test_bad_unwind_wrong_register preserves the ABI, but annotates the wrong
+// register in CFI metadata.
+void abi_test_bad_unwind_wrong_register(void);
+
+// abi_test_bad_unwind_temporary preserves the ABI, but temporarily corrupts the
+// storage space for a saved register, breaking unwind.
+void abi_test_bad_unwind_temporary(void);
+
+}  // extern "C"
 #endif  // SUPPORTS_ABI_TEST
 
 
diff --git a/crypto/test/asm/trampoline-x86_64.pl b/crypto/test/asm/trampoline-x86_64.pl
index 432bcc8..d41aadf 100755
--- a/crypto/test/asm/trampoline-x86_64.pl
+++ b/crypto/test/asm/trampoline-x86_64.pl
@@ -124,15 +124,17 @@
 my $stack_params_skip = $win64 ? scalar(@inp) : 0;
 my $num_stack_params = $win64 ? $max_params : $max_params - scalar(@inp);
 
-my ($func, $state, $argv, $argc) = @inp;
+my ($func, $state, $argv, $argc, $unwind) = @inp;
 my $code = <<____;
 .text
 
 # abi_test_trampoline loads callee-saved registers from |state|, calls |func|
 # with |argv|, then saves the callee-saved registers into |state|. It returns
-# the result of |func|.
+# the result of |func|. If |unwind| is non-zero, this function triggers unwind
+# instrumentation.
 # uint64_t abi_test_trampoline(void (*func)(...), CallerState *state,
-#                              const uint64_t *argv, size_t argc);
+#                              const uint64_t *argv, size_t argc,
+#                              uint64_t unwind);
 .type	abi_test_trampoline, \@abi-omnipotent
 .globl	abi_test_trampoline
 .align	16
@@ -143,12 +145,16 @@
 	#   8 bytes - align
 	#   $caller_state_size bytes - saved caller registers
 	#   8 bytes - scratch space
+	#   8 bytes - saved copy of \$unwind (SysV-only)
 	#   8 bytes - saved copy of \$state
 	#   8 bytes - saved copy of \$func
 	#   8 bytes - if needed for stack alignment
 	#   8*$num_stack_params bytes - parameters for \$func
 ____
 my $stack_alloc_size = 8 + $caller_state_size + 8*3 + 8*$num_stack_params;
+if (!$win64) {
+  $stack_alloc_size += 8;
+}
 # SysV and Windows both require the stack to be 16-byte-aligned. The call
 # instruction offsets it by 8, so stack allocations must be 8 mod 16.
 if ($stack_alloc_size % 16 != 8) {
@@ -158,13 +164,25 @@
 my $stack_params_offset = 8 * $stack_params_skip;
 my $func_offset = 8 * $num_stack_params;
 my $state_offset = $func_offset + 8;
-my $scratch_offset = $state_offset + 8;
+# On Win64, unwind is already passed in memory. On SysV, it is passed in a
+# register and we must reserve stack space for it.
+my ($unwind_offset, $scratch_offset);
+if ($win64) {
+  $unwind_offset = $stack_alloc_size + 5*8;
+  $scratch_offset = $state_offset + 8;
+} else {
+  $unwind_offset = $state_offset + 8;
+  $scratch_offset = $unwind_offset + 8;
+}
 my $caller_state_offset = $scratch_offset + 8;
 $code .= <<____;
 	subq	\$$stack_alloc_size, %rsp
 .cfi_adjust_cfa_offset	$stack_alloc_size
 .Labi_test_trampoline_prolog_alloc:
 ____
+$code .= <<____ if (!$win64);
+	movq	$unwind, $unwind_offset(%rsp)
+____
 # Store our caller's state. This is needed because we modify it ourselves, and
 # also to isolate the test infrastruction from the function under test failing
 # to save some register.
@@ -198,7 +216,7 @@
 foreach (@inp) {
   $code .= <<____;
 	dec	%r11
-	js	.Lcall
+	js	.Largs_done
 	movq	(%r10), $_
 	addq	\$8, %r10
 ____
@@ -207,7 +225,7 @@
 	leaq	$stack_params_offset(%rsp), %rax
 .Largs_loop:
 	dec	%r11
-	js	.Lcall
+	js	.Largs_done
 
 	# This block should be:
 	#    movq (%r10), %rtmp
@@ -223,10 +241,42 @@
 	addq	\$8, %rax
 	jmp	.Largs_loop
 
-.Lcall:
+.Largs_done:
 	movq	$func_offset(%rsp), %rax
+	movq	$unwind_offset(%rsp), %r10
+	testq	%r10, %r10
+	jz	.Lno_unwind
+
+	# Set the trap flag.
+	pushfq
+	orq	\$0x100, 0(%rsp)
+	popfq
+
+	# Run an instruction to trigger a breakpoint immediately before the
+	# call.
+	nop
+.globl	abi_test_unwind_start
+abi_test_unwind_start:
+
+	call	*%rax
+.globl	abi_test_unwind_return
+abi_test_unwind_return:
+
+	# Clear the trap flag. Note this assumes the trap flag was clear on
+	# entry. We do not support instrumenting an unwind-instrumented
+	# |abi_test_trampoline|.
+	pushfq
+	andq	\$-0x101, 0(%rsp)	# -0x101 is ~0x100
+	popfq
+.globl	abi_test_unwind_stop
+abi_test_unwind_stop:
+
+	jmp	.Lcall_done
+
+.Lno_unwind:
 	call	*%rax
 
+.Lcall_done:
 	# Store what \$func did our state, so our caller can check.
 	movq  $state_offset(%rsp), $state
 ____
@@ -275,6 +325,49 @@
 ____
 }
 
+$code .= <<____;
+# abi_test_bad_unwind_wrong_register preserves the ABI, but annotates the wrong
+# register in CFI metadata.
+# void abi_test_bad_unwind_wrong_register(void);
+.type	abi_test_bad_unwind_wrong_register, \@abi-omnipotent
+.globl	abi_test_bad_unwind_wrong_register
+.align	16
+abi_test_bad_unwind_wrong_register:
+.cfi_startproc
+	pushq	%r12
+.cfi_push	%r13	# This should be %r12
+	popq	%r12
+.cfi_pop	%r12
+	ret
+.cfi_endproc
+.size	abi_test_bad_unwind_wrong_register,.-abi_test_bad_unwind_wrong_register
+
+# abi_test_bad_unwind_temporary preserves the ABI, but temporarily corrupts the
+# storage space for a saved register, breaking unwind.
+# void abi_test_bad_unwind_temporary(void);
+.type	abi_test_bad_unwind_temporary, \@abi-omnipotent
+.globl	abi_test_bad_unwind_temporary
+.align	16
+abi_test_bad_unwind_temporary:
+.cfi_startproc
+	pushq	%r12
+.cfi_push	%r12
+
+	inc	%r12
+	movq	%r12, (%rsp)
+	# Unwinding from here is incorrect.
+
+	dec	%r12
+	movq	%r12, (%rsp)
+	# Unwinding is now fixed.
+
+	popq	%r12
+.cfi_pop	%r12
+	ret
+.cfi_endproc
+.size	abi_test_bad_unwind_temporary,.-abi_test_bad_unwind_temporary
+____
+
 if ($win64) {
   # Add unwind metadata for SEH.
   #
diff --git a/crypto/test/gtest_main.cc b/crypto/test/gtest_main.cc
index f19b830..aeec0f5 100644
--- a/crypto/test/gtest_main.cc
+++ b/crypto/test/gtest_main.cc
@@ -35,16 +35,15 @@
   testing::InitGoogleTest(&argc, argv);
   bssl::SetupGoogleTest();
 
-#if !defined(OPENSSL_WINDOWS)
+  bool unwind_tests = true;
   for (int i = 1; i < argc; i++) {
+#if !defined(OPENSSL_WINDOWS)
     if (strcmp(argv[i], "--fork_unsafe_buffering") == 0) {
       RAND_enable_fork_unsafe_buffering(-1);
     }
-  }
 #endif
 
 #if defined(TEST_ARM_CPUS)
-  for (int i = 1; i < argc; i++) {
     if (strncmp(argv[i], "--cpu=", 6) == 0) {
       const char *cpu = argv[i] + 6;
       uint32_t armcap;
@@ -69,9 +68,17 @@
       printf("Simulating CPU '%s'\n", cpu);
       *armcap_ptr = armcap;
     }
-  }
 #endif  // TEST_ARM_CPUS
 
+    if (strcmp(argv[i], "--no_unwind_tests") == 0) {
+      unwind_tests = false;
+    }
+  }
+
+  if (unwind_tests) {
+    abi_test::EnableUnwindTests();
+  }
+
   // Run the entire test suite under an ABI check. This is less effective than
   // testing the individual assembly functions, but will catch issues with
   // rarely-used registers.