chacha20: add AVX2/AMD64 assembly implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 11 May 2014 09:08:46 +0000 (12:08 +0300)
commit a39ee7555691d18cae97560f130aaf952bfbd278
tree 593cc30752310037c063f5af76fecc42adc94901
parent def7d4cad386271c6d4e2f10aabe0cb4abd871e4

* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'.
* cipher/chacha20-avx2-amd64.S: New.
* cipher/chacha20.c (USE_AVX2): New macro.
[USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New.
(chacha20_do_setkey): Select AVX2 implementation if there is HW
support.
(selftest): Increase size of buf by 256.
* configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'.
--

Add an AVX2-optimized implementation of ChaCha20, based on the implementation
by Andrew Moon.
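
The ChangeLog entry for chacha20_do_setkey says the AVX2 code is selected when
the hardware supports it. A minimal sketch of that kind of runtime dispatch
follows. It is not the code from this commit: the HWF_* bit values, the
struct, the stub block functions and chacha20_select_impl are all hypothetical
names chosen for illustration, and the preference order (AVX2, then SSSE3,
then generic C) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real flag names and values may differ. */
#define HWF_SSSE3 (1u << 0)
#define HWF_AVX2  (1u << 1)

/* Assumed shape of a block function: process 'bytes' of input with 'state'. */
typedef unsigned int (*chacha20_blocks_fn)(uint32_t *state,
                                           const unsigned char *src,
                                           unsigned char *dst, size_t bytes);

/* Stubs standing in for the generic C, SSSE3 and AVX2 routines. */
static unsigned int blocks_generic(uint32_t *state, const unsigned char *src,
                                   unsigned char *dst, size_t bytes)
{ (void)state; (void)src; (void)dst; (void)bytes; return 0; }

static unsigned int blocks_ssse3(uint32_t *state, const unsigned char *src,
                                 unsigned char *dst, size_t bytes)
{ (void)state; (void)src; (void)dst; (void)bytes; return 0; }

static unsigned int blocks_avx2(uint32_t *state, const unsigned char *src,
                                unsigned char *dst, size_t bytes)
{ (void)state; (void)src; (void)dst; (void)bytes; return 0; }

/* Illustrative descriptor pairing a label with its block function. */
struct chacha20_impl {
  const char *name;
  chacha20_blocks_fn blocks;
};

static const struct chacha20_impl impl_generic = { "generic", blocks_generic };
static const struct chacha20_impl impl_ssse3   = { "ssse3",   blocks_ssse3 };
static const struct chacha20_impl impl_avx2    = { "avx2",    blocks_avx2 };

/* Pick the widest implementation the reported feature bits allow. */
const struct chacha20_impl *chacha20_select_impl(unsigned int hw_features)
{
  if (hw_features & HWF_AVX2)
    return &impl_avx2;
  if (hw_features & HWF_SSSE3)
    return &impl_ssse3;
  return &impl_generic;
}
```

Doing the selection once at key setup, as the ChangeLog describes, keeps the
feature test out of the per-block path; later calls simply go through the
stored function pointer.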

SSSE3 (Intel Haswell):

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.742 ns/B    1284.8 MiB/s      2.38 c/B
     STREAM dec |     0.741 ns/B    1286.5 MiB/s      2.37 c/B

AVX2:

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.393 ns/B    2428.0 MiB/s      1.26 c/B
     STREAM dec |     0.392 ns/B    2433.6 MiB/s      1.25 c/B
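
The two tables can be compared with a little arithmetic. The helper names in
this sketch are hypothetical, and the clock frequency it derives is an
inference from the reported columns, not a figure stated in the commit.

```c
/* How many times faster the second measurement is than the first,
   given two costs in nanoseconds per byte. */
double speedup(double slow_ns_per_byte, double fast_ns_per_byte)
{
  return slow_ns_per_byte / fast_ns_per_byte;
}

/* cycles/byte divided by ns/byte gives cycles per nanosecond, i.e. GHz. */
double implied_ghz(double cycles_per_byte, double ns_per_byte)
{
  return cycles_per_byte / ns_per_byte;
}
```

With the STREAM enc figures above, speedup(0.742, 0.393) comes to roughly 1.89,
and implied_ghz(2.38, 0.742) to roughly 3.2, which suggests both runs were
measured at about the same clock.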

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/chacha20-avx2-amd64.S [new file with mode: 0644]
cipher/chacha20.c
configure.ac