Optimize Keccak 64-bit absorb functions
authorJussi Kivilinna <jussi.kivilinna@iki.fi>
Sat, 31 Oct 2015 19:29:56 +0000 (21:29 +0200)
committerJussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 1 Nov 2015 19:47:06 +0000 (21:47 +0200)
commit2857cb89c6dc1c02266600bc1fd2967a3cd5cf88
tree1d5ca6d7135264461757cfdd90b051fc98f5f0ed
parent07e4839e75a7bca3a6c0a94aecfe75efe61d7ff2
Optimize Keccak 64-bit absorb functions

* cipher/keccak.c [USE_64BIT] [__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
* cipher/keccak.c [USE_64BIT] [!__x86_64__] (absorb_lanes64_8)
(absorb_lanes64_4, absorb_lanes64_2, absorb_lanes64_1): New.
[USE_64BIT] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT] (keccak_absorb_lanes64): Remove.
[USE_64BIT_SHLD] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_SHLD] (keccak_absorb_lanes64_shld): Remove.
[USE_64BIT_BMI2] (KECCAK_F1600_ABSORB_FUNC_NAME): New.
[USE_64BIT_BMI2] (keccak_absorb_lanes64_bmi2): Remove.
* cipher/keccak_permute_64.h (KECCAK_F1600_ABSORB_FUNC_NAME): New.
--

Optimize 64-bit absorb functions for small speed-up. After this
change, 64-bit BMI2 implementation matches speed of fastest results
from SUPERCOP for Intel Haswell CPUs (long messages).

Benchmark on Intel Haswell @ 3.2 Ghz:

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128       |      2.32 ns/B     411.7 MiB/s      7.41 c/B
 SHAKE256       |      2.84 ns/B     336.2 MiB/s      9.08 c/B
 SHA3-224       |      2.69 ns/B     354.9 MiB/s      8.60 c/B
 SHA3-256       |      2.84 ns/B     336.0 MiB/s      9.08 c/B
 SHA3-384       |      3.69 ns/B     258.4 MiB/s     11.81 c/B
 SHA3-512       |      5.30 ns/B     179.9 MiB/s     16.97 c/B

After:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHAKE128       |      2.27 ns/B     420.6 MiB/s      7.26 c/B
 SHAKE256       |      2.79 ns/B     341.4 MiB/s      8.94 c/B
 SHA3-224       |      2.64 ns/B     361.7 MiB/s      8.44 c/B
 SHA3-256       |      2.79 ns/B     341.5 MiB/s      8.94 c/B
 SHA3-384       |      3.65 ns/B     261.4 MiB/s     11.68 c/B
 SHA3-512       |      5.27 ns/B     181.0 MiB/s     16.87 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/keccak.c
cipher/keccak_permute_64.h