poly1305: add AMD64/AVX2 optimized implementation
authorJussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 11 May 2014 17:52:27 +0000 (20:52 +0300)
committerJussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 16 May 2014 17:54:54 +0000 (20:54 +0300)
commit98f021961ee65669037bc8bb552a69fd78f610fc
treeb95d026328b20f3d0104d749471b8f3649cd23bd
parent297532602ed2d881d8fdc393d1961068a143a891
poly1305: add AMD64/AVX2 optimized implementation

* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'.
* cipher/poly1305-avx2-amd64.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX2)
(POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE)
(POLY1305_AVX2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed.
* cipher/poly1305.c [POLY1305_USE_AVX2]
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if
AVX2 supported by CPU.
* configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'.
--

Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt

Benchmarks on Intel i5-4570 (haswell):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.448 ns/B    2129.5 MiB/s      1.43 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.205 ns/B    4643.5 MiB/s     0.657 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/poly1305-avx2-amd64.S [new file with mode: 0644]
cipher/poly1305-internal.h
cipher/poly1305.c
configure.ac