ChaCha20-Poly1305 for Armv8 (AArch64)

This work builds on the CL opened by Vlad Krasnov
(https://boringssl-review.googlesource.com/c/boringssl/+/44364). That
CL was thoroughly reviewed by David Benjamin but was not merged because
of some outstanding comments, which this work addresses:
- The flag check in the final reduction of poly1305 was changed from
  `eq` to `cs`.
- The CFI prologues and epilogues of open/seal were modified as
  recommended by David.
- Pointer Authentication instructions were added to the functions
  exported from the assembly code, as pointed out by David.
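
To illustrate the `eq` versus `cs` point, here is a minimal Python model of
the final reduction. It assumes the reduction is a three-limb
subtract-with-borrow of the modulus followed by a conditional select; the
function and parameter names are hypothetical and this is not the assembly
itself. On AArch64, `cs` after a subtraction means no borrow occurred, while
`eq` means the last result word was zero.

```python
MASK64 = (1 << 64) - 1
P = (1 << 130) - 5  # Poly1305 modulus

# Modulus as three 64-bit limbs: T2:T1:T0 = 3 : FF..FF : FF..FB
T0, T1, T2 = P & MASK64, (P >> 64) & MASK64, P >> 128


def sub_with_borrow(a, b, borrow):
    """Return ((a - b - borrow) mod 2^64, borrow_out)."""
    d = a - b - borrow
    return d & MASK64, 1 if d < 0 else 0


def final_reduce(acc0, acc1, acc2, select_on):
    """Model of the final reduction; select_on is "cs" or "eq" (illustrative)."""
    d0, b = sub_with_borrow(acc0, T0, 0)
    d1, b = sub_with_borrow(acc1, T1, b)
    d2, b = sub_with_borrow(acc2, T2, b)
    if select_on == "cs":
        take_diff = b == 0    # carry set: no borrow, so acc >= P
    elif select_on == "eq":
        take_diff = d2 == 0   # only checks that the top difference word is zero
    else:
        raise ValueError("select_on must be 'cs' or 'eq'")
    return (d0, d1, d2) if take_diff else (acc0, acc1, acc2)


def to_limbs(x):
    """Split an integer below 2^192 into three 64-bit limbs, low limb first."""
    return x & MASK64, (x >> 64) & MASK64, x >> 128


def to_int(limbs):
    """Inverse of to_limbs."""
    return limbs[0] | (limbs[1] << 64) | (limbs[2] << 128)


# Typical case: both conditions agree.
typical = P + 10
# Edge case: acc2 = 4 and no borrow from the low 128 bits.
edge = 5 * (1 << 128) - 1
```

In this model the two conditions agree unless the top word of the
difference is non-zero, which requires acc2 = 4 and no borrow from the low
128 bits; for `edge`, the `cs` variant returns `edge - P` while the `eq`
variant leaves the accumulator unreduced.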

Testing:
- The current tests against ChaCha20-Poly1305 continue to pass.
- More test vectors were produced using a Python script to try to
  demonstrate that using `eq` instead of `cs` was a bug. They passed as
  well, but none of them left the most significant word non-zero
  after the reduction, which is what would have exposed the bug. An
  argument for why such a vector is unlikely to be found is detailed
  below.
- `objdump -W|Wf|WF` was used to confirm the value of the CFA and the
  locations of the registers relative to the CFA were as expected. See
  https://www.imperialviolet.org/2017/01/18/cfi.html.
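
The Python script mentioned above is not part of this commit message. As a
rough sketch of the kind of big-integer reference such a script could be
built on, the following computes a Poly1305 tag directly from the
definition in RFC 8439. `poly1305_mac` is a hypothetical name chosen here,
and the key, message and tag are the test vector from RFC 8439, section
2.5.2.

```python
P = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF  # clamping mask for r (RFC 8439)


def poly1305_mac(key: bytes, msg: bytes) -> bytes:
    """Reference Poly1305 using Python big integers (not constant time)."""
    if len(key) != 32:
        raise ValueError("Poly1305 key must be 32 bytes")
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Each block gets a 0x01 byte appended before being read as an integer.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P
    # Only the low 128 bits of acc + s form the tag.
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")


key = bytes.fromhex(
    "85d6be7857556d337f4452fe42d506a8"
    "0103808afb0db2fd4abff6af4149f51b"
)
msg = b"Cryptographic Forum Research Group"
tag = poly1305_mac(key, msg)
```

A reference like this always reduces fully modulo P, so comparing its tags
with the output of the assembly only reveals a faulty final reduction if
the assembly's intermediate accumulator happens to hit the rare case
described in the explanation below.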

Performance:
|        Size | Before (MB/s) | After (MB/s) | Improvement |
|-------------|---------------|--------------|-------------|
|    16 bytes |          30.5 |         43.3 |       1.42x |
|   256 bytes |         220.7 |        361.5 |       1.64x |
|  1350 bytes |         285.9 |        639.4 |       2.24x |
|  8192 bytes |         329.6 |        798.3 |       2.42x |
| 16384 bytes |         331.9 |        814.9 |       2.46x |

Explanation of the unlikelihood of finding a test vector:
* the modulus is in t2:t1:t0 = 3 : FF..FF : FF..FB, each being a 64 bit
  word; i.e. t2 = 3, t1 = all 1s.
* acc2 <= 4 after the previous reduction.
* The subtraction acc1 - t1 almost always produces borrow = 1, since
  t1 is all FFs.
* So for almost all test vectors we have acc2 <= 4 and borrow = 1,
  thus (t2 = acc2 - t2 - borrow) will be 0 whenever acc >= modulus,
  because that case then requires acc2 = 4. **It would be highly
  unlikely to find a test vector with t2 > 0 after that final
  reduction:** crafting one requires acc and r to have high values
  before their multiplication, while ensuring that after the reduction
  (see Note) of their product the resulting accumulator has acc2 = 4,
  all 1s in acc1 and in most of acc0, so that no borrow occurs from
  acc1:acc0 - t1:t0.
* Note: the reduction is essentially carried out by folding the top
  64+62 bits over once, then folding them again shifted left by 2,
  which adds 5 times those bits in total.
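
The folding described in the note, and the size of the target that a
bug-revealing vector would have to hit, can be sketched in Python with big
integers. `fold` is a hypothetical helper, and the input bound below is an
assumption made for illustration (accumulator top word at most 4, one
padded block added, r clamped below 2^124), not something taken from the
assembly.

```python
P = (1 << 130) - 5  # Poly1305 modulus


def fold(x: int) -> int:
    """One folding step: x = lo + hi * 2^130 is congruent to lo + 5 * hi mod P."""
    lo = x & ((1 << 130) - 1)
    hi = x >> 130               # the top 64+62 bits of a 256-bit product
    return lo + hi + (hi << 2)  # add hi once, then again shifted left by 2


# Assumed bound on the product before folding: (acc + block) * r with
# acc < 5 * 2^128, block < 2^129 and r < 2^124.
PRODUCT_BOUND = 7 * (1 << 252)

# Largest value under that bound: folding preserves the residue mod P and
# leaves a top 64-bit word (bits 128 and up) of at most 4.
worst = PRODUCT_BOUND - 1
assert fold(worst) % P == worst % P
assert fold(worst) >> 128 <= 4

# With acc2 = 4, the top word of acc - P stays non-zero only when the low
# 128 bits do not borrow, i.e. when they are >= t1:t0 = 2^128 - 5.
NO_BORROW_VALUES = (1 << 128) - (P & ((1 << 128) - 1))
```

Under these assumptions only `NO_BORROW_VALUES` of the 2^128 possible
values for acc1:acc0 avoid the borrow, which is why random or lightly
crafted vectors are not expected to reach the case where `eq` and `cs`
differ.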

Change-Id: If7d86b7a9b74ec3615ac2d7a97f80100dbfaee7f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/51885
Reviewed-by: Adam Langley <alangley@gmail.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
3 files changed
tree: 08f39b1e4c9102dc42567268d3aff7dbe7228504
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently, BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: