commit	de434576d7412b95a8eb90d613fc9f01e2d7166b	[log] [tgz]
author	David Benjamin <davidben@google.com>	Tue Nov 15 20:34:28 2022 -0500
committer	Boringssl LUCI CQ <boringssl-scoped@luci-project-accounts.iam.gserviceaccount.com>	Mon Nov 28 23:17:25 2022 +0000
tree	a74904a6068ffc143d9fc18335c1e26ea192062c
parent	3ae0778f3d60a078356a683ee3a9a825008c6711 [diff]

Fix allocation size in BN_mod_exp_mont_consttime.

powerbuf's layout is:
- num_powers values mod m, stored transposed
- one value mod m, tmp
- one value mod m, am
- (mont5-only) an extra copy of m

powerbuf_len broadly computed this, but where tmp + am would be
sizeof(BN_ULONG) * top * 2, it used
sizeof(BN_ULONG) * max(top * 2, num_powers).

(In the actual code it's written as a ternary op and some
multiplications are factored out.)

That is, it allocated enough room for tmp + am OR an extra row in the
num_powers table, as if each entry were top + 1 words long instead of
top, with the space overlapping. This expression dates to upstream's
https://github.com/openssl/openssl/commit/361512da0d900ba276096cbd152e304d402aca65,
though the exact layout has shifted over the years as mont5 evolved.
(Originally, it only contained one extra value mod m.)

At the time, this was necessary because bn_mul_mont_gather5 actually
overreads the table by one row! Although it only uses top * 32 words, it
requires the table to have (top + 1) * 32 words. This is because the
computation was scheduled so that the .Louter4x loop would read and mask
off the next iteration's value while incorporating the previous
iteration:

There were masked reads from $bp into XMM registers at the start of the
loop:
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L541

The XMM logic is interleaved throughout and does not move to a
general-purpose register, $m0, until much later. $m is not read again
until after the jump.
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L700

Meanwhile, the loop is already reading $m0 at the start of the
iteration.
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L551

The whole thing is bootstrapped by similar code just above it:
https://github.com/openssl/openssl/blob/361512da0d900ba276096cbd152e304d402aca65/crypto/bn/asm/x86_64-mont5.pl#L531

In the final iteration, we read one extra row into $m0 but never use it.
That is the overread.

I also confirmed this by rewinding our x86_64-mont5.pl to this state,
hacking things up until it built, and then hacking up
BN_mod_exp_mont_consttime to place the table in its own allocation, with
no extra slop using C11 aligned_alloc. This was so valgrind could
accurately instrument the bounds. When I did that, valgrind was clean if
I allocated (top + 1) * num_powers, but flagged an out-of-bounds read at
top * num_powers.

This no longer applies. After
https://github.com/openssl/openssl/commit/25d14c6c29b53907bf614b9964d43cd98401a7fc,
bn_mul_mont_gather5's scheduling is far less complicated. .Louter4x now
begins with a masked read, setting up $m0, and then it incorporates $m0
into the product. The same valgrind strategy confirmed this. Thus, I
don't believe this extra row is needed and we can allocate the buffer
straightforwardly.

Change-Id: I6c1ee8d5ebdb66eb4e5fec63d2140814c13ae146
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55231
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Bob Beck <bbe@google.com>

crypto/fipsmodule/bn/exponentiation.c[diff]

1 file changed

tree: a74904a6068ffc143d9fc18335c1e26ea192062c

README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful:

PORTING.md: how to port OpenSSL-using code to BoringSSL.
BUILDING.md: how to build BoringSSL
INCORPORATING.md: how to incorporate BoringSSL into a project.
API-CONVENTIONS.md: general API conventions for BoringSSL consumers and developers.
STYLE.md: rules and guidelines for coding style.
include/openssl: public headers with API documentation in comments. Also available online.
FUZZING.md: information about fuzzing BoringSSL.
CONTRIBUTING.md: how to contribute to BoringSSL.
BREAKING-CHANGES.md: notes on potentially-breaking changes.
SANDBOXING.md: notes on using BoringSSL in a sandboxed environment.