Align rsaz_avx2_preferred with x86_64-mont5.pl.

x86_64-mont5.pl checks for both BMI1 and BMI2, because the MULX path
also uses the ANDN instruction. Some history here from upstream:

a5bb5bca52f57021a4017521c55a6b3590bbba7a, dated 2013-10-03, added the
MULX path to x86_64-mont5.pl. At the time, the cpuid check was
BMI2+ADX. (MULX comes from BMI2.)

37de2b5c1e370b493932552556940eb89922b027, dated 2013-10-09, made
BN_mod_exp_mont_consttime prefer the MULX mont5 code over the AVX2 rsaz
code, with a matching BMI2+ADX cpuid check.

8fc8f486f7fa098c9fbb6a6ae399e3c6856e0d87, dated 2016-01-25, tweaked some
code to use the ANDN instruction, from BMI1. Correspondingly, it changed
the cpuid check to be BMI1+BMI2+ADX. The BN_mod_exp_mont_consttime check
was left unchanged.

This CL fixes our version of the BN_mod_exp_mont_consttime check to
match the assembly, by also checking BMI1. (This should be a no-op.
Presumably any processor with BMI2 also has BMI1.)

Change-Id: Ib0cacc7e2be840d970460eef4dd9ded7fb24231c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51547
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/fipsmodule/bn/rsaz_exp.h b/crypto/fipsmodule/bn/rsaz_exp.h
index 2f0c2c0..b8150f1 100644
--- a/crypto/fipsmodule/bn/rsaz_exp.h
+++ b/crypto/fipsmodule/bn/rsaz_exp.h
@@ -47,9 +47,10 @@
 
 OPENSSL_INLINE int rsaz_avx2_preferred(void) {
   const uint32_t *cap = OPENSSL_ia32cap_get();
-  static const uint32_t kBMI2AndADX = (1 << 8) | (1 << 19);
-  if ((cap[2] & kBMI2AndADX) == kBMI2AndADX) {
-    // If BMI2 and ADX are available, x86_64-mont5.pl is faster.
+  static const uint32_t kBMI1BMI2AndADX = (1 << 3) | (1 << 8) | (1 << 19);
+  if ((cap[2] & kBMI1BMI2AndADX) == kBMI1BMI2AndADX) {
+    // If BMI1, BMI2, and ADX are available, x86_64-mont5.pl is faster. See the
+    // .Lmulx4x_enter and .Lpowerx5_enter branches.
     return 0;
   }
   return (cap[2] & (1 << 5)) != 0;  // AVX2