chacha20: add SSE2/AMD64 optimized implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 16 May 2014 18:28:26 +0000 (21:28 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 16 May 2014 18:41:14 +0000 (21:41 +0300)
commit 323b1eb80ff3396d83fedbe5bba9a4e6c412d192
tree 1d8aac7a5f8c04a93a8c115410b15b0f85af5169
parent 98f021961ee65669037bc8bb552a69fd78f610fc

* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'.
* cipher/chacha20-sse2-amd64.S: New.
* cipher/chacha20.c (USE_SSE2): New.
[USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New.
(chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks
function.
* configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'.
--

Add Andrew Moon's public-domain SSE2 implementation of ChaCha20. The
original source is available at: https://github.com/floodyberry/chacha-opt
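
The ChangeLog entry for chacha20_do_setkey describes choosing the blocks
function at key setup. A minimal sketch of that function-pointer dispatch
pattern follows. It is not the libgcrypt code: all names, the function
signature, and the stub bodies are hypothetical, and a run-time flag stands
in for the compile-time USE_SSE2 macro so the sketch stays portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: every name below is hypothetical and the
   block functions are stubs, not the libgcrypt implementations. */

enum example_impl { EXAMPLE_IMPL_GENERIC, EXAMPLE_IMPL_SSE2 };

/* Signature assumed for illustration: process nblks 64-byte blocks.
   The stubs return a tag so the selection is observable. */
typedef enum example_impl (*example_blocks_fn) (uint32_t *state,
                                                const uint8_t *src,
                                                uint8_t *dst,
                                                size_t nblks);

struct example_ctx
{
  uint32_t state[16];        /* ChaCha20 state: sixteen 32-bit words */
  example_blocks_fn blocks;  /* chosen once, at key setup */
};

/* Stand-in for a portable C implementation. */
static enum example_impl
example_blocks_generic (uint32_t *state, const uint8_t *src,
                        uint8_t *dst, size_t nblks)
{
  (void) state; (void) src; (void) dst; (void) nblks;
  return EXAMPLE_IMPL_GENERIC;
}

/* Stand-in for an assembly routine such as the one this commit adds. */
static enum example_impl
example_blocks_sse2 (uint32_t *state, const uint8_t *src,
                     uint8_t *dst, size_t nblks)
{
  (void) state; (void) src; (void) dst; (void) nblks;
  return EXAMPLE_IMPL_SSE2;
}

/* Default to the generic routine, then override it when the optimized
   one is usable.  use_sse2 is an assumption of this sketch. */
static void
example_setkey_select (struct example_ctx *ctx, int use_sse2)
{
  ctx->blocks = example_blocks_generic;
  if (use_sse2)
    ctx->blocks = example_blocks_sse2;
}
```

With this layout the encrypt path calls ctx->blocks without knowing which
implementation was selected, so further optimized variants only need
another branch in the setup function.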

Benchmark on Intel i5-4570 (Haswell),
with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3":

Old:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      1.97 ns/B     483.8 MiB/s      6.31 c/B
     STREAM dec |      1.97 ns/B     483.6 MiB/s      6.31 c/B

New:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.931 ns/B    1024.7 MiB/s      2.98 c/B
     STREAM dec |     0.930 ns/B    1025.0 MiB/s      2.98 c/B
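
The three columns in the tables above are related by simple arithmetic,
and the cycles/byte figures give the speedup directly. The helpers below
are a hypothetical sketch for checking those relations, not part of the
benchmark tool. Because the printed figures are rounded, recomputed values
only match the tables approximately.

```c
/* Hypothetical helpers for cross-checking the benchmark columns. */

/* Throughput in MiB/s implied by a cost in nanoseconds per byte. */
static double
example_mib_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Clock rate in GHz implied by cycles/byte and ns/byte. */
static double
example_implied_ghz (double cycles_per_byte, double ns_per_byte)
{
  return cycles_per_byte / ns_per_byte;
}

/* Speedup factor from the old and new cycles/byte figures. */
static double
example_speedup (double old_cpb, double new_cpb)
{
  return old_cpb / new_cpb;
}
```

For the STREAM enc rows, example_speedup (6.31, 2.98) is a little over 2.1,
and example_implied_ghz (2.98, 0.931) comes out at about 3.2.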

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/chacha20-sse2-amd64.S [new file with mode: 0644]
cipher/chacha20.c
configure.ac