Replace CBC_MAC_ROTATE_IN_PLACE with an N lg N rotation.

Really the only thing we should be doing with these ciphers is hastening
their demise, but it was the weekend and this seemed like fun.

EVP_tls_cbc_copy_mac needs to rotate a buffer by a secret amount. (It
extracts the MAC, but rotated.) We have two codepaths for this. If
CBC_MAC_ROTATE_IN_PLACE is defined (always on), we make some assumptions
about cache lines, play games with volatile, and hope that doesn't leak
anything. Otherwise, we do O(N^2) work to constant-time select the
rotation indices.
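
For illustration, the O(N^2) approach might look roughly like the sketch
below. This is not the code in the tree: ct_eq_mask and
ct_rotate_left_quadratic are hypothetical names, and the sketch assumes
the compiler does not turn the masking back into branches. Every output
byte scans every input byte and keeps the one whose index matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 0xff if a == b and 0x00 otherwise,
 * without branching on either value. */
static uint8_t ct_eq_mask(size_t a, size_t b) {
  size_t x = a ^ b;
  /* The top bit of (x | -x) is set exactly when x is non-zero. */
  size_t nonzero = (x | ((size_t)0 - x)) >> (sizeof(size_t) * 8 - 1);
  return (uint8_t)(nonzero - 1);
}

/* Hypothetical sketch: out[i] = in[(i + secret_offset) % n], where
 * 0 <= secret_offset < n. The memory access pattern and the loop counts
 * depend only on the public length n. */
static void ct_rotate_left_quadratic(uint8_t *out, const uint8_t *in,
                                     size_t n, size_t secret_offset) {
  for (size_t i = 0; i < n; i++) {
    /* target is secret and lies in [0, 2n-2]. Comparing against both j
     * and j + n avoids a modulo on a secret value. */
    size_t target = i + secret_offset;
    uint8_t acc = 0;
    for (size_t j = 0; j < n; j++) {
      uint8_t hit =
          (uint8_t)(ct_eq_mask(j, target) | ct_eq_mask(j + n, target));
      acc |= (uint8_t)(hit & in[j]);
    }
    out[i] = acc;
  }
}
```

The inner loop runs N times for each of the N output bytes, which is
where the quadratic cost comes from.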

But we can do O(N lg N). Rotate by powers of two and constant-time
select by the offset's bit positions. (Hand-wavy lower bound: an array
position has N possible values, so, armed with only a constant-time
select, we need O(lg N) work to resolve it. There are N array positions,
so O(N lg N).)
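
A minimal sketch of the idea, not the actual patch: MAX_MAC_SIZE,
ct_select_8 and ct_rotate_left are hypothetical names chosen for this
example, and it assumes the compiler leaves the masking branch-free.
Each pass rotates by a fixed, public power of two and then keeps either
the rotated or the unrotated byte, depending on one secret bit of the
offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound for this sketch; 64 bytes covers a SHA-512 MAC. */
#define MAX_MAC_SIZE 64

/* Returns a if mask is 0xff and b if mask is 0x00, without branching. */
static uint8_t ct_select_8(uint8_t mask, uint8_t a, uint8_t b) {
  return (uint8_t)((mask & a) | (~mask & b));
}

/* Rotates buf[0..n) left by secret_offset, i.e. afterwards
 * buf[i] == old_buf[(i + secret_offset) % n], for 0 <= secret_offset < n.
 * Only n is treated as public; every index used below is derived from
 * n, i and shift alone. */
static void ct_rotate_left(uint8_t *buf, size_t n, size_t secret_offset) {
  uint8_t tmp[MAX_MAC_SIZE];
  assert(n <= MAX_MAC_SIZE); /* n is public, so this check leaks nothing */

  size_t bit = 0;
  for (size_t shift = 1; shift < n; shift <<= 1, bit++) {
    /* mask is 0xff when this bit of the secret offset is set. */
    uint8_t mask = (uint8_t)((size_t)0 - ((secret_offset >> bit) & 1));
    for (size_t i = 0; i < n; i++) {
      /* (i + shift) % n involves public values only. */
      tmp[i] = ct_select_8(mask, buf[(i + shift) % n], buf[i]);
    }
    memcpy(buf, tmp, n);
  }
}
```

There are about lg N passes of N selects each. Because the offset is
below N, every set bit of it is covered by some shift smaller than N,
and the selected rotations add up to the full offset.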

A microbenchmark of EVP_tls_cbc_copy_mac shows this is 27% faster than
the old one, but still 32% slower than the in-place version.

in-place:
Did 15724000 CopyFromMAC operations in 20000744us (786170.8 ops/sec)
N^2:
Did 8443000 CopyFromMAC operations in 20001582us (422116.6 ops/sec)
N lg N:
Did 10718000 CopyFromMAC operations in 20000763us (535879.6 ops/sec)

This results in the following for the CBC ciphers. I measured
AES-128-CBC-SHA1 and AES-256-CBC-SHA384 which are, respectively, the
cipher where the other bits are the fastest and the cipher where N is
largest.

in-place:
Did 2634000 AES-128-CBC-SHA1 (16 bytes) open operations in 10000739us (263380.5 ops/sec): 4.2 MB/s
Did 1424000 AES-128-CBC-SHA1 (1350 bytes) open operations in 10002782us (142360.4 ops/sec): 192.2 MB/s
Did 531000 AES-128-CBC-SHA1 (8192 bytes) open operations in 10002460us (53086.9 ops/sec): 434.9 MB/s
N^2:
Did 2529000 AES-128-CBC-SHA1 (16 bytes) open operations in 10001474us (252862.7 ops/sec): 4.0 MB/s
Did 1392000 AES-128-CBC-SHA1 (1350 bytes) open operations in 10006659us (139107.4 ops/sec): 187.8 MB/s
Did 528000 AES-128-CBC-SHA1 (8192 bytes) open operations in 10001276us (52793.3 ops/sec): 432.5 MB/s
N lg N:
Did 2531000 AES-128-CBC-SHA1 (16 bytes) open operations in 10003057us (253022.7 ops/sec): 4.0 MB/s
Did 1390000 AES-128-CBC-SHA1 (1350 bytes) open operations in 10003287us (138954.3 ops/sec): 187.6 MB/s
Did 531000 AES-128-CBC-SHA1 (8192 bytes) open operations in 10002448us (53087.0 ops/sec): 434.9 MB/s

in-place:
Did 1249000 AES-256-CBC-SHA384 (16 bytes) open operations in 10001767us (124877.9 ops/sec): 2.0 MB/s
Did 879000 AES-256-CBC-SHA384 (1350 bytes) open operations in 10009244us (87818.8 ops/sec): 118.6 MB/s
Did 344000 AES-256-CBC-SHA384 (8192 bytes) open operations in 10025897us (34311.1 ops/sec): 281.1 MB/s
N^2:
Did 1072000 AES-256-CBC-SHA384 (16 bytes) open operations in 10008090us (107113.3 ops/sec): 1.7 MB/s
Did 780000 AES-256-CBC-SHA384 (1350 bytes) open operations in 10007787us (77939.3 ops/sec): 105.2 MB/s
Did 333000 AES-256-CBC-SHA384 (8192 bytes) open operations in 10016332us (33245.7 ops/sec): 272.3 MB/s
N lg N:
Did 1168000 AES-256-CBC-SHA384 (16 bytes) open operations in 10007671us (116710.5 ops/sec): 1.9 MB/s
Did 836000 AES-256-CBC-SHA384 (1350 bytes) open operations in 10001536us (83587.2 ops/sec): 112.8 MB/s
Did 339000 AES-256-CBC-SHA384 (8192 bytes) open operations in 10018522us (33837.3 ops/sec): 277.2 MB/s

TLS CBC performance isn't as important as it was before, and the costs
aren't that high, so avoid making assumptions about cache lines. (If we
care much about CBC open performance, we probably should get the malloc
out of EVP_tls_cbc_digest_record at the end.)

Change-Id: Ib8d8271be4b09e5635062cd3b039e1e96f0d9d3d
Reviewed-on: https://boringssl-review.googlesource.com/11003
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>