Add AVX and AVX2/BMI implementations for SHA-256
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tue, 17 Dec 2013 13:35:38 +0000 (15:35 +0200)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Wed, 18 Dec 2013 15:00:04 +0000 (17:00 +0200)
commit a5c2bbfe0db515d739ab683297903c77b1eec124
tree ef6d9ba8d35b6e621aee58e91431d0fd446e940e
parent e4e458465b124e25b6aec7a60174bf1ca32dc5fd

* LICENSES: Add 'cipher/sha256-avx-amd64.S' and
'cipher/sha256-avx2-bmi2-amd64.S'.
* cipher/Makefile.am: Add 'sha256-avx-amd64.S' and
'sha256-avx2-bmi2-amd64.S'.
* cipher/sha256-avx-amd64.S: New.
* cipher/sha256-avx2-bmi2-amd64.S: New.
* cipher/sha256-ssse3-amd64.S: Use 'lea' instead of 'add' in a few
places for a tiny speed improvement.
* cipher/sha256.c (USE_AVX, USE_AVX2): New.
(SHA256_CONTEXT) [USE_AVX, USE_AVX2]: Add 'use_avx' and 'use_avx2'.
(sha256_init, sha224_init) [USE_AVX, USE_AVX2]: Initialize above
new context members.
[USE_AVX] (_gcry_sha256_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha256_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Use AVX2 assembly if enabled.
(transform) [USE_AVX]: Use AVX assembly if enabled.
* configure.ac: Add 'sha256-avx-amd64.lo' and
'sha256-avx2-bmi2-amd64.lo'.
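
The selection logic implied by the 'use_avx' and 'use_avx2' context members can be sketched roughly as follows. This is only an illustration, not the code in cipher/sha256.c: the 'sketch_*' names, the 'F_*' feature bits, the 'use_ssse3' member and the position of the SSSE3 fallback are assumptions made for the example. What it takes from this commit is that AVX2 is tried before AVX, that the AVX2 path also needs BMI2, and that the AVX path is limited to Intel CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SHA256_CONTEXT; only the selection flags
   are modelled here. */
typedef struct
{
  unsigned int use_ssse3:1;  /* assumed to exist alongside the new flags */
  unsigned int use_avx:1;
  unsigned int use_avx2:1;
} sketch_ctx;

typedef enum { IMPL_C, IMPL_SSSE3, IMPL_AVX, IMPL_AVX2 } sketch_impl;

/* Hypothetical feature bits; the real code obtains them from the
   library's hardware feature detection. */
enum { F_SSSE3 = 1, F_AVX = 2, F_AVX2 = 4, F_BMI2 = 8, F_INTEL_CPU = 16 };

/* Set the per-context flags once at init time. */
static void
sketch_init (sketch_ctx *ctx, unsigned int features)
{
  assert (ctx != NULL);
  ctx->use_ssse3 = (features & F_SSSE3) != 0;
  /* The AVX code relies on SHLD, so it is enabled on Intel CPUs only. */
  ctx->use_avx = (features & F_AVX) && (features & F_INTEL_CPU);
  /* The AVX2 code also uses rorx, so BMI2 must be present as well. */
  ctx->use_avx2 = (features & F_AVX2) && (features & F_BMI2);
}

/* Pick the implementation for one transform call: widest first. */
static sketch_impl
sketch_select (const sketch_ctx *ctx)
{
  assert (ctx != NULL);
  if (ctx->use_avx2)
    return IMPL_AVX2;
  if (ctx->use_avx)
    return IMPL_AVX;
  if (ctx->use_ssse3)
    return IMPL_SSSE3;
  return IMPL_C;
}
```

Keeping the decision in context flags means the feature test is done once per context rather than once per block.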
--

This patch adds fast AVX and AVX2/BMI2 implementations of SHA-256 by Intel
Corporation. The assembly source is licensed under the 3-clause BSD license
and is thus compatible with LGPL2.1+. The original source can be accessed at:
 http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs

The implementation is described in the white paper
 "Fast SHA-256 Implementations on Intel® Architecture Processors"
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html

Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
      faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
      slower than RORQ, so the AVX implementation is (for now) limited
      to Intel CPUs.
Note: AVX2 implementation also uses the BMI2 instruction rorx, thus the
      additional HWF flag.
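
The SHLD trick mentioned in the first note can be modelled in plain C as below. This is a sketch of the arithmetic identity only, not the assembly from this patch: the function names are made up for the example, and 32-bit words are used because SHA-256 operates on 32-bit words (the same identity holds at 64 bits). A double-precision shift left with the same register as both operands is a rotate left, and a rotate right by n equals a rotate left by (width - n).

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit "SHLD dst, src, n": shift dst left by n and fill
   the vacated low bits with the top n bits of src. */
static uint32_t
shld32 (uint32_t dst, uint32_t src, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (dst << n) | (src >> (32 - n));
}

/* Plain rotate right, for comparison. */
static uint32_t
ror32 (uint32_t x, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (x >> n) | (x << (32 - n));
}

/* Rotate right by n expressed as "SHLD x, x, 32 - n". */
static uint32_t
ror32_via_shld (uint32_t x, unsigned int n)
{
  assert (n > 0 && n < 32);
  return shld32 (x, x, 32 - n);
}
```

Both forms produce the same result for every rotate count from 1 to 31, so the choice between them is purely a matter of instruction speed on a given CPU.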

Benchmarks:

cpu                C-lang       SSSE3        AVX/AVX2     C vs        SSSE3 vs
                                                          AVX/AVX2    AVX/AVX2
Intel i5-4570       13.86 c/B    10.27 c/B     8.70 c/B    1.59x       1.18x
Intel i5-2450M      17.25 c/B    12.36 c/B    10.31 c/B    1.67x       1.19x
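
The last two columns are ratios of cycles-per-byte figures: the C (or SSSE3) figure divided by the AVX/AVX2 figure in the same row. A small helper, with a name chosen only for this example, makes the arithmetic explicit:

```c
#include <assert.h>

/* Speedup of a faster implementation over a baseline, both given in
   cycles per byte (lower is better). */
static double
speedup (double baseline_cpb, double fast_cpb)
{
  assert (fast_cpb > 0.0);
  return baseline_cpb / fast_cpb;
}
```

For the i5-4570 row, speedup (13.86, 8.70) gives the C vs AVX/AVX2 figure and speedup (10.27, 8.70) gives the SSSE3 vs AVX/AVX2 figure.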

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
LICENSES
cipher/Makefile.am
cipher/sha256-avx-amd64.S [new file with mode: 0644]
cipher/sha256-avx2-bmi2-amd64.S [new file with mode: 0644]
cipher/sha256-ssse3-amd64.S
cipher/sha256.c
configure.ac