Fix allocation size in BN_mod_exp_mont_consttime.

powerbuf's layout is:
- num_powers values mod m, stored transposed
- one value mod m, tmp
- one value mod m, am
- (mont5-only) an extra copy of m

powerbuf_len broadly computed this, but where tmp + am would be
sizeof(BN_ULONG) * top * 2, it used
sizeof(BN_ULONG) * max(top * 2, num_powers).

(In the actual code it's written as a ternary op and some
multiplications are factored out.)

That is, it allocated enough room for tmp + am OR an extra row in the
num_powers table, as if each entry were top + 1 words long instead of
top, with the space overlapping. This expression dates to upstream's
https://github.com/openssl/openssl/commit/361512da0d900ba276096cbd152e304d402aca65,
though the exact layout has shifted over the years as mont5 evolved.
(Originally, it only contained one extra value mod m.)

At the time, this was necessary because bn_mul_mont_gather5 actually
overreads the table by one row! Although it only uses top * 32 words, it
requires the table to have (top + 1) * 32 words. This is because the
computation was scheduled so that the .Louter4x loop would read and mask
off the next iteration's value while incorporating the previous
iteration:

There were masked reads from $bp into XMM registers at the start of the
loop:
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L541

The XMM logic is interleaved throughout and does not move to a
general-purpose register, $m0, until much later. $m is not read again
until after the jump.
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L700

Meanwhile, the loop is already reading $m0 at the start of the
iteration.
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L551

The whole thing is bootstrapped by similar code just above it:
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L531

In the final iteration, we read one extra row into $m0 but never use it.
That is the overread.

I also confirmed this by rewinding our x86_64-mont5.pl to this state,
hacking things up until it built, and then hacking up
BN_mod_exp_mont_consttime to place the table in its own allocation, with
no extra slop using C11 aligned_alloc. This was so valgrind could
accurately instrument the bounds. When I did that, valgrind was clean if
I allocated (top + 1) * num_powers, but flagged an out-of-bounds read at
top * num_powers.

This no longer applies. After
https://github.com/openssl/openssl/commit/25d14c6c29b53907bf614b9964d43cd98401a7fc,
bn_mul_mont_gather5's scheduling is far less complicated. .Louter4x now
begins with a masked read, setting up $m0, and then it incorporates $m0
into the product. The same valgrind strategy confirmed this. Thus, I
don't believe this extra row is needed and we can allocate the buffer
straightforwardly.

Change-Id: I6c1ee8d5ebdb66eb4e5fec63d2140814c13ae146
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55231
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>
1 file changed
tree: a74904a6068ffc143d9fc18335c1e26ea192062c
  1. .github/
  2. crypto/
  3. decrepit/
  4. fuzz/
  5. include/
  6. rust/
  7. ssl/
  8. third_party/
  9. tool/
  10. util/
  11. .clang-format
  12. .gitignore
  13. API-CONVENTIONS.md
  14. BREAKING-CHANGES.md
  15. BUILDING.md
  16. CMakeLists.txt
  17. codereview.settings
  18. CONTRIBUTING.md
  19. FUZZING.md
  20. go.mod
  21. go.sum
  22. INCORPORATING.md
  23. LICENSE
  24. OpenSSLConfig.cmake
  25. PORTING.md
  26. README.md
  27. SANDBOXING.md
  28. sources.cmake
  29. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: