Replace aes_nohw with a bitsliced implementation.

aes_nohw is currently one of several variable-time table-based
implementations in C or assembly (armv4, x86, and x86_64). Replace all
of these with a C bitsliced implementation, with 32-bit, 64-bit, and
128-bit (SSE2) variants. This is based on the algorithms described in:

This makes our AES implementation constant-time in all build
configurations.
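As background on what "bitsliced" means here: rather than working on
whole bytes, the state is transposed so that each word holds the same bit
position from many bytes. Every AES step then becomes a fixed sequence of
AND/XOR/OR/shift operations on those words, with no secret-dependent
memory access. The sketch below shows only the transposition idea, on an
8x8 bit matrix (eight bytes into eight bit planes), using the common
swap-and-mask trick. The helper names are hypothetical, and this is not
the state layout aes_nohw actually uses.

```c
#include <stdint.h>

// Hypothetical helper ("swapmove"): exchange the bits of x selected by
// mask with the bits `shift` positions above them.
static uint64_t swap_bits(uint64_t x, uint64_t mask, unsigned shift) {
  uint64_t t = (x ^ (x >> shift)) & mask;
  return x ^ t ^ (t << shift);
}

// Transpose an 8x8 bit matrix held in a uint64_t, where bit 8*r + c is
// row r, column c. If row r holds byte r of the input, row c of the result
// is bit plane c: bit c of each of the eight input bytes.
static uint64_t transpose8x8(uint64_t x) {
  x = swap_bits(x, 0x00AA00AA00AA00AAULL, 7);   // transpose each 2x2 block
  x = swap_bits(x, 0x0000CCCC0000CCCCULL, 14);  // swap 2x2 blocks in 4x4s
  x = swap_bits(x, 0x00000000F0F0F0F0ULL, 28);  // swap the 4x4 blocks
  return x;
}

// Hypothetical helper: pack eight bytes into rows, return the bit planes.
static uint64_t bytes_to_planes(const uint8_t in[8]) {
  uint64_t x = 0;
  for (int r = 0; r < 8; r++) {
    x |= (uint64_t)in[r] << (8 * r);
  }
  return transpose8x8(x);
}
```

For example, the bytes {0xff, 0, 0, 0, 0, 0, 0, 0} become
0x0101010101010101: each plane has only its first bit set, since only the
first byte had any bits set. A single XOR on one plane word then touches
the same bit of all eight bytes at once.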

There were far too many benchmarks to put in the commit message.
Instead, please refer to this fancy spreadsheet:

Parallel modes on x86 and x86_64 do fine due to the SSE2 code. AES-GCM
actually gets faster. The 64-bit (4x) bitsliced implementation is less
effective at speeding up parallel modes but still helps. The 32-bit (2x)
bitsliced implementation helps even less.
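The 2x and 4x labels can be read as simple arithmetic. Assuming the
bitsliced state is stored as eight words, one per bit position of a byte,
each 128-bit block contributes 16 bits to every word, so a W-bit word
holds W/16 blocks: 2 for 32-bit, 4 for 64-bit, and 8 for SSE2. A mode
that can only supply one block at a time still pays for a whole batch,
which is one way to see the non-parallel slowdown. The helpers below are
hypothetical and only encode that assumption.

```c
#include <stddef.h>

// Assumption: eight state words, each 128-bit block contributing 16 bits
// (one per byte) to every word, so a batch holds word_bits / 16 blocks.
static size_t blocks_per_batch(size_t word_bits) {
  return word_bits / 16;
}

// Fraction of a batch doing useful work when a mode can only supply
// `available` independent blocks at a time (CBC encryption supplies 1).
static double batch_utilization(size_t word_bits, size_t available) {
  size_t batch = blocks_per_batch(word_bits);
  size_t used = available < batch ? available : batch;
  return (double)used / (double)batch;
}
```

Under this model, a single-block operation on the SSE2 variant uses one
eighth of the work it performs.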

Non-parallel modes, sadly, take a *dramatic* performance hit. I tried a
constant-time table lookup for comparison, but bitslicing was still
better. This implementation performs comparably to the table in
BearSSL's documentation, which suggests I didn't do anything obviously
wrong. (Note BearSSL's table for 'ct' corresponds to a 32-bit bitsliced
implementation compiled for 64-bit. Compiling this implementation for
64-bit matches, but compiling it for 32-bit seems to be considerably
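For readers unfamiliar with the constant-time table lookup mentioned
above: a common way to write one is to read every entry and keep the
wanted one with a mask, so the memory access pattern does not depend on
the secret index. This is a generic sketch (the function name is
hypothetical, and it is not the code that was benchmarked). Touching all
256 entries for every S-box lookup is what makes this approach slow.

```c
#include <stdint.h>

// Look up table[index] while reading every entry, so the access pattern
// is independent of the secret index. A sketch only: production code must
// also keep the compiler from turning the mask back into a branch.
static uint8_t ct_table_lookup(const uint8_t table[256], uint8_t index) {
  uint8_t result = 0;
  for (uint32_t i = 0; i < 256; i++) {
    // diff is 0 only when i == index. Then (diff - 1) >> 8 is 0x00FFFFFF,
    // whose low byte is 0xff; otherwise diff - 1 < 256 and the shift gives 0.
    uint32_t diff = i ^ index;
    uint8_t mask = (uint8_t)((diff - 1) >> 8);
    result = (uint8_t)(result | (table[i] & mask));
  }
  return result;
}
```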

Assumptions that may make this palatable:

- AES-GCM is by far the most important AES mode, and we perform okay
  with it. Modern things aren't built out of CBC.

- A nontrivial chunk of Chrome users on Windows don't have SSSE3 and
  would be affected by this change. They would get the SSE2 version
  which performs well for AES-GCM *and* is constant-time.

- ARM devices are primarily mobile, and mobile hardware is replaced
  much faster. Chrome for Android has required NEON for several years
  now, so it would not run this code. (Aside from

- aarch64 mandates NEON, so it would not run this code.

- QUIC packet number encryption does use a one-off block operation, but
  only once per packet.

- Arguably this is undoing a performance gain that we never earned. That
  said, it was a dramatic performance gain in places.
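Why AES-GCM fares well while CBC does not: GCM encrypts in counter mode,
and each counter block depends only on its position, so a full batch can
be built and encrypted at once. CBC encryption computes
C[i] = E(P[i] XOR C[i-1]), so each block waits on the previous one and
only one slot of a batch is used. A sketch of building such a batch,
assuming the 96-bit-IV layout where the last 4 bytes hold a big-endian
32-bit counter (fill_ctr_batch is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Fill `n` consecutive counter blocks: the first 12 bytes are copied from
// iv_block, and the last 4 bytes hold a big-endian 32-bit counter that
// wraps modulo 2^32 (GCM's inc32). Each block depends only on its index,
// so all n can be encrypted together in one bitsliced batch.
static void fill_ctr_batch(uint8_t out[][16], size_t n,
                           const uint8_t iv_block[12], uint32_t counter) {
  for (size_t i = 0; i < n; i++) {
    uint32_t c = counter + (uint32_t)i;  // unsigned wraparound = inc32
    memcpy(out[i], iv_block, 12);
    out[i][12] = (uint8_t)(c >> 24);
    out[i][13] = (uint8_t)(c >> 16);
    out[i][14] = (uint8_t)(c >> 8);
    out[i][15] = (uint8_t)c;
  }
}
```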

As an alternative, we could just check in the SSE2 version and drop the
x86 and x86_64 table-based assembly, but this still leaves the generic
code with cache-timing side channels.

Change-Id: I0f4b4467a49790509503c529d7c0940318096a00
Commit-Queue: Adam Langley <>
Reviewed-by: Adam Langley <>
12 files changed
tree: dae189630ad5bae7961494590158ea2f4175be46


BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: