Add AVX2/BMI2 implementation of SHA1
author     Jussi Kivilinna <jussi.kivilinna@iki.fi>
           Fri, 5 Apr 2019 14:39:22 +0000 (17:39 +0300)
committer  Jussi Kivilinna <jussi.kivilinna@iki.fi>
           Fri, 5 Apr 2019 14:39:22 +0000 (17:39 +0300)
commit     b982900bfe6403e95a157271d8d811c9c573af9e
tree       a449944c826137a7eae8861d516a2f69c59224bf
parent     ced7508c857c0cc37da2299a393e5b167dd28e54

* cipher/Makefile.am: Add 'sha1-avx2-bmi2-amd64.S'.
* cipher/hash-common.h (MD_BLOCK_CTX_BUFFER_SIZE): New.
(gcry_md_block_ctx): Change buffer length to MD_BLOCK_CTX_BUFFER_SIZE.
* cipher/sha1-avx-amd64.S: Add missing .size for transform function.
* cipher/sha1-ssse3-amd64.S: Add missing .size for transform function.
* cipher/sha1-avx-bmi2-amd64.S: Add missing .size for transform
function; tweak implementation for a small (~1%) speed increase.
* cipher/sha1-avx2-bmi2-amd64.S: New.
* cipher/sha1.c (USE_AVX2, _gcry_sha1_transform_amd64_avx2_bmi2)
(do_sha1_transform_amd64_avx2_bmi2): New.
(sha1_init) [USE_AVX2]: Enable the AVX2/BMI2 implementation if both
AVX2 and BMI2 are supported by the hardware.
(sha1_final): Merge processing of the last two blocks when an extra
block is needed.
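The sha1_init change selects a transform implementation at run time from the CPU feature flags. The C sketch below illustrates that kind of dispatch. The enum, the function names and the priority order (AVX2+BMI2, then AVX+BMI2, AVX, SSSE3, generic C) are hypothetical and not taken from sha1.c, and the GCC/Clang builtin __builtin_cpu_supports stands in for libgcrypt's own hardware feature detection.

```c
#include <assert.h>

/* Hypothetical labels for the transform variants named in this commit;
   libgcrypt's sha1.c uses its own flags and function names. */
enum sha1_impl
{
  SHA1_IMPL_GENERIC,
  SHA1_IMPL_SSSE3,
  SHA1_IMPL_AVX,
  SHA1_IMPL_AVX_BMI2,
  SHA1_IMPL_AVX2_BMI2
};

/* Return the preferred variant for the given feature flags.
   Assumed priority: AVX2+BMI2 > AVX+BMI2 > AVX > SSSE3 > generic C. */
static enum sha1_impl
sha1_select_impl (int has_ssse3, int has_avx, int has_avx2, int has_bmi2)
{
  /* AVX2 extends AVX, so a CPU reporting AVX2 also reports AVX. */
  assert (!has_avx2 || has_avx);

  if (has_avx2 && has_bmi2)
    return SHA1_IMPL_AVX2_BMI2;
  if (has_avx && has_bmi2)
    return SHA1_IMPL_AVX_BMI2;
  if (has_avx)
    return SHA1_IMPL_AVX;
  if (has_ssse3)
    return SHA1_IMPL_SSSE3;
  return SHA1_IMPL_GENERIC;
}

/* Query the running CPU.  __builtin_cpu_supports is a GCC/Clang builtin,
   used here only as a stand-in for libgcrypt's HW feature detection. */
static enum sha1_impl
sha1_detect_impl (void)
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
  __builtin_cpu_init ();
  return sha1_select_impl (__builtin_cpu_supports ("ssse3"),
                           __builtin_cpu_supports ("avx"),
                           __builtin_cpu_supports ("avx2"),
                           __builtin_cpu_supports ("bmi2"));
#else
  return SHA1_IMPL_GENERIC;
#endif
}
```

A caller would run sha1_detect_impl() once at init time and store the result (or a function pointer derived from it) in the hash context, so the per-block path carries no feature checks.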
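The sha1_final change follows from how SHA-1 padding works: the buffered tail is followed by a 0x80 byte, zero fill and a 64-bit big-endian bit count, and when the tail is 56 bytes or longer these no longer fit in a single 64-byte block. The sketch below builds the padded tail into a 128-byte area and returns the block count, so both blocks can be handed to one multi-block transform call instead of two. The function name and signature are hypothetical; this is not libgcrypt's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_BLOCK_SIZE 64

/* Build SHA-1 padding for the final, partial block (hypothetical helper).
   tail/taillen: buffered bytes not yet hashed (taillen < 64).
   total_len:    total message length in bytes.
   out:          128-byte area, enough for two blocks.
   Returns the number of 64-byte blocks written to out (1 or 2). */
static size_t
sha1_pad_final (const uint8_t *tail, size_t taillen, uint64_t total_len,
                uint8_t out[2 * SHA1_BLOCK_SIZE])
{
  /* Bit count wraps modulo 2^64, matching SHA-1's 64-bit length field. */
  uint64_t bitlen = total_len * 8;
  size_t nblocks;
  int i;

  assert (taillen < SHA1_BLOCK_SIZE);

  /* The 0x80 byte plus 8 length bytes must fit after the data. */
  nblocks = (taillen < SHA1_BLOCK_SIZE - 8) ? 1 : 2;

  /* memmove so the tail may already sit at the start of out. */
  memmove (out, tail, taillen);
  out[taillen] = 0x80;
  memset (out + taillen + 1, 0,
          nblocks * SHA1_BLOCK_SIZE - taillen - 1 - 8);

  /* Big-endian bit count in the last 8 bytes of the last block. */
  for (i = 0; i < 8; i++)
    out[nblocks * SHA1_BLOCK_SIZE - 1 - i] = (uint8_t) (bitlen >> (8 * i));

  return nblocks;
}
```

For example, a 56-byte tail yields two blocks, with 0x80 at offset 56 and the bit count in bytes 120 to 127; passing both blocks to the transform at once saves one separate transform call per finalization.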
--

Benchmarks on Intel Haswell (4.0 GHz):

Before (AVX/BMI2):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.970 ns/B     983.2 MiB/s      3.88 c/B

After (AVX/BMI2, ~1% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.960 ns/B     993.1 MiB/s      3.84 c/B

After (AVX2/BMI2, ~9% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.890 ns/B      1071 MiB/s      3.56 c/B
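The three columns in each table are related by simple arithmetic: MiB/s is 10^9 / (ns/B) / 2^20, and at the 4.0 GHz clock c/B is (ns/B) × 4.0. The helper functions below (names hypothetical) reproduce those conversions and the quoted speedups from the ns/B figures.

```c
#include <assert.h>

/* Throughput in MiB/s (1 MiB = 2^20 bytes) from a per-byte time in ns. */
static double
ns_per_byte_to_mib_per_s (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte: GHz is cycles per nanosecond. */
static double
ns_per_byte_to_cycles_per_byte (double ns_per_byte, double ghz)
{
  return ns_per_byte * ghz;
}

/* Throughput ratio of "after" over "before"; 1.09 means ~9% faster. */
static double
speedup (double before_ns_per_byte, double after_ns_per_byte)
{
  assert (after_ns_per_byte > 0.0);
  return before_ns_per_byte / after_ns_per_byte;
}
```

With these, 0.970 / 0.890 ≈ 1.09 gives the ~9% quoted for AVX2/BMI2, measured against the original AVX/BMI2 figure; against the tweaked AVX/BMI2 result (0.960 ns/B) the gain is about 8%. Small mismatches, such as 993.4 computed vs. 993.1 MiB/s listed for 0.960 ns/B, presumably come from the ns/B column being rounded.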

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/hash-common.h
cipher/sha1-avx-amd64.S
cipher/sha1-avx-bmi2-amd64.S
cipher/sha1-avx2-bmi2-amd64.S [new file with mode: 0644]
cipher/sha1-ssse3-amd64.S
cipher/sha1.c
configure.ac