poly1305: add AMD64/SSE2 optimized implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 11 May 2014 17:18:49 +0000 (20:18 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Mon, 12 May 2014 17:32:50 +0000 (20:32 +0300)
commit 297532602ed2d881d8fdc393d1961068a143a891
tree 9fc6e7cfd4f685cf52102a39a6c361e6ed160499
parent e813958419b0ec4439e6caf07d3b2234cffa2bfa

* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.
--

Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
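
The ChangeLog entries above describe a common pattern: a feature macro enables an optimized backend, the "largest" size macros grow to fit it, and the init function selects an ops table. The following is a minimal sketch of that pattern only. Every identifier and constant in it (the DEMO_ and demo_ names, the block sizes) is a hypothetical stand-in and is not taken from libgcrypt's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All identifiers and values below are hypothetical stand-ins,
   not libgcrypt's actual definitions. */

#define DEMO_REF_BLOCKSIZE   16   /* illustrative value */
#define DEMO_SSE2_BLOCKSIZE  64   /* illustrative value */

/* Size buffers for the largest backend that was compiled in. */
#ifdef DEMO_USE_SSE2
# define DEMO_LARGEST_BLOCKSIZE DEMO_SSE2_BLOCKSIZE
#else
# define DEMO_LARGEST_BLOCKSIZE DEMO_REF_BLOCKSIZE
#endif

/* Per-backend description; a real ops table would also carry
   function pointers for init, blocks and finish. */
typedef struct
{
  const char *name;
  size_t block_size;
} demo_ops_t;

static const demo_ops_t demo_ref_ops = { "ref", DEMO_REF_BLOCKSIZE };
#ifdef DEMO_USE_SSE2
static const demo_ops_t demo_sse2_ops = { "sse2", DEMO_SSE2_BLOCKSIZE };
#endif

typedef struct
{
  const demo_ops_t *ops;
  unsigned char buffer[DEMO_LARGEST_BLOCKSIZE];
  size_t leftover;
} demo_ctx_t;

/* Pick the backend once at init time; later calls go through ctx->ops. */
static void
demo_init (demo_ctx_t *ctx)
{
  assert (ctx != NULL);
  memset (ctx, 0, sizeof *ctx);
  ctx->ops = &demo_ref_ops;
#ifdef DEMO_USE_SSE2
  ctx->ops = &demo_sse2_ops;   /* optimized backend overrides the default */
#endif
}
```

Building with -DDEMO_USE_SSE2 switches both the buffer size and the selected table, so the buffer is always large enough for whichever backend demo_init chooses.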

Benchmarks on Intel i5-4570 (Haswell):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.844 ns/B    1130.2 MiB/s      2.70 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.448 ns/B    2129.5 MiB/s      1.43 c/B

Benchmarks on Intel i5-2450M (Sandy Bridge):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |      1.25 ns/B     763.0 MiB/s      3.12 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.605 ns/B    1575.9 MiB/s      1.51 c/B
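
To make the tables above easier to compare, the columns can be related with a little arithmetic. The helpers below are a sketch for that purpose only; the function names are hypothetical and are not part of libgcrypt or its benchmark tool. They assume 1 MiB = 1048576 bytes, and that cycles/byte divided by nanosecs/byte gives the clock rate in GHz that the benchmark assumed.

```c
#include <assert.h>

/* Hypothetical helper: how many times faster the new code is,
   given old and new cost per byte in the same unit. */
static double
speedup (double old_cost, double new_cost)
{
  assert (old_cost > 0.0 && new_cost > 0.0);
  return old_cost / new_cost;
}

/* Hypothetical helper: convert nanoseconds per byte to MiB per second. */
static double
ns_per_byte_to_mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return (1e9 / ns_per_byte) / 1048576.0;
}

/* Hypothetical helper: clock rate in GHz implied by one table row. */
static double
implied_ghz (double cycles_per_byte, double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return cycles_per_byte / ns_per_byte;
}
```

With the figures above, speedup (0.844, 0.448) is about 1.88 for the i5-4570 and speedup (1.25, 0.605) is about 2.07 for the i5-2450M. The throughput column matches the conversion up to rounding of the ns/B values.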

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/poly1305-internal.h
cipher/poly1305-sse2-amd64.S [new file with mode: 0644]
cipher/poly1305.c
configure.ac