commit | de434576d7412b95a8eb90d613fc9f01e2d7166b | [log] [tgz] |
---|---|---|
author | David Benjamin <davidben@google.com> | Tue Nov 15 20:34:28 2022 -0500 |
committer | Boringssl LUCI CQ <boringssl-scoped@luci-project-accounts.iam.gserviceaccount.com> | Mon Nov 28 23:17:25 2022 +0000 |
tree | a74904a6068ffc143d9fc18335c1e26ea192062c | |
parent | 3ae0778f3d60a078356a683ee3a9a825008c6711 [diff] |
Fix allocation size in BN_mod_exp_mont_consttime. powerbuf's layout is: - num_powers values mod m, stored transposed - one value mod m, tmp - one value mod m, am - (mont5-only) an extra copy of m powerbuf_len broadly computed this, but where tmp + am would be sizeof(BN_ULONG) * top * 2, it used sizeof(BN_ULONG) * max(top * 2, num_powers). (In the actual code it's written as a ternary op and some multiplications are factored out.) That is, it allocated enough room for tmp + am OR an extra row in the num_powers table, as if each entry were top + 1 words long instead of top, with the space overlapping. This expression dates to upstream's https://github.com/openssl/openssl/commit/361512da0d900ba276096cbd152e304d402aca65, though the exact layout has shifted over the years as mont5 evolved. (Originally, it only contained one extra value mod m.) At the time, this was necessary because bn_mul_mont_gather5 actually overreads the table by one row! Although it only uses top * 32 words, it requires the table to have (top + 1) * 32 words. This is because the computation was scheduled so that the .Louter4x loop would read and mask off the next iteration's value while incorporating the previous iteration: There were masked reads from $bp into XMM registers at the start of the loop: https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L541 The XMM logic is interleaved throughout and does not move to a general-purpose register, $m0, until much later. $m is not read again until after the jump. https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L700 Meanwhile, the loop is already reading $m0 at the start of the iteration. https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L551 The whole thing is bootstrapped by similar code just above it: https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L531 In the final iteration, we read one extra row into $m0 but never use it. That is the overread. I also confirmed this by rewinding our x86_64-mont5.pl to this state, hacking things up until it built, and then hacking up BN_mod_exp_mont_consttime to place the table in its own allocation, with no extra slop using C11 aligned_alloc. This was so valgrind could accurately instrument the bounds. When I did that, valgrind was clean if I allocated (top + 1) * num_powers, but flagged an out-of-bounds read at top * num_powers. This no longer applies. After https://github.com/openssl/openssl/commit/25d14c6c29b53907bf614b9964d43cd98401a7fc, bn_mul_mont_gather5's scheduling is far less complicated. .Louter4x now begins with a masked read, setting up $m0, and then it incorporates $m0 into the product. The same valgrind strategy confirmed this. Thus, I don't believe this extra row is needed and we can allocate the buffer straightforwardly. Change-Id: I6c1ee8d5ebdb66eb4e5fec63d2140814c13ae146 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55231 Reviewed-by: Bob Beck <bbe@google.com> Commit-Queue: Bob Beck <bbe@google.com>
BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.
Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.
Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.
BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.
Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.
Project links:
There are other files in this directory which might be helpful: