Add AMD64 assembly implementation of Salsa20
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sat, 26 Oct 2013 12:00:48 +0000 (15:00 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Mon, 28 Oct 2013 14:12:19 +0000 (16:12 +0200)
commit 5a3d43485efdc09912be0967ee0a3ce345b3b15a
tree ff8e937e2d010ae8e015707f5665915dabe1e915
parent e214e8392671dd30e9c33260717b5e756debf3bf

* cipher/Makefile.am: Add 'salsa20-amd64.S'.
* cipher/salsa20-amd64.S: New.
* cipher/salsa20.c (USE_AMD64): New macro.
[USE_AMD64] (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup)
(_gcry_salsa20_amd64_encrypt_blocks): New prototypes.
[USE_AMD64] (salsa20_keysetup, salsa20_ivsetup, salsa20_core): New.
[!USE_AMD64] (salsa20_core): Change 'src' to non-constant, update block
counter in 'salsa20_core' and return burn stack depth.
[!USE_AMD64] (salsa20_keysetup, salsa20_ivsetup): New.
(salsa20_do_setkey): Move generic key setup to 'salsa20_keysetup'.
(salsa20_setkey): Fix burn stack depth.
(salsa20_setiv): Move generic IV setup to 'salsa20_ivsetup'.
(salsa20_do_encrypt_stream) [USE_AMD64]: Process large buffers in AMD64
implementation.
(salsa20_do_encrypt_stream): Move stack burning to this function...
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): ...from these
functions.
* configure.ac [x86-64]: Add 'salsa20-amd64.lo'.
--
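
The "process large buffers in AMD64 implementation" entry above can be pictured roughly as follows. This is a minimal sketch, not the code in cipher/salsa20.c: `toy_ctx`, `toy_keystream_block`, `toy_bulk_blocks` and `toy_encrypt_stream` are hypothetical names, the keystream is a placeholder rather than Salsa20, and the handling of keystream bytes left over from an earlier call is left out. It only shows the split between whole 64-byte blocks, which go to a bulk routine, and a short tail, which goes through the per-block path.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64 /* Salsa20 produces keystream in 64-byte blocks */

/* Hypothetical context: a bare block counter standing in for cipher state. */
struct toy_ctx { uint64_t counter; };

/* Placeholder keystream, NOT Salsa20: byte i of block n is (n * 131 + i). */
static void toy_keystream_block(struct toy_ctx *ctx, uint8_t pad[BLOCK_SIZE])
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        pad[i] = (uint8_t)(ctx->counter * 131u + i);
    ctx->counter++; /* the block routine itself advances the counter */
}

/* Stand-in for an optimized bulk routine that only accepts whole blocks. */
static void toy_bulk_blocks(struct toy_ctx *ctx, const uint8_t *src,
                            uint8_t *dst, size_t nblks)
{
    uint8_t pad[BLOCK_SIZE];

    while (nblks--) {
        toy_keystream_block(ctx, pad);
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            dst[i] = src[i] ^ pad[i];
        src += BLOCK_SIZE;
        dst += BLOCK_SIZE;
    }
}

/* Dispatch: whole blocks go to the bulk routine, the tail is done here. */
static void toy_encrypt_stream(struct toy_ctx *ctx, const uint8_t *src,
                               uint8_t *dst, size_t len)
{
    size_t nblks = len / BLOCK_SIZE;

    if (nblks > 0) {
        toy_bulk_blocks(ctx, src, dst, nblks);
        src += nblks * BLOCK_SIZE;
        dst += nblks * BLOCK_SIZE;
        len -= nblks * BLOCK_SIZE;
    }
    if (len > 0) {
        uint8_t pad[BLOCK_SIZE];

        toy_keystream_block(ctx, pad);
        for (size_t i = 0; i < len; i++)
            dst[i] = src[i] ^ pad[i];
    }
}
```

With a 150-byte buffer, two blocks take the bulk path and the remaining 22 bytes take the tail path; running the same function again over the output with a fresh context restores the input.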

This patch adds a fast AMD64 assembly implementation of Salsa20. The
implementation is based on public domain code by D. J. Bernstein, available at
http://cr.yp.to/snuffle.html (amd64-xmm6). It gains extra speed by processing
four blocks in parallel with the help of SSE2 instructions.
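
The four-block parallelism can be pictured with a small portable C model. This is only a sketch and not the code in cipher/salsa20-amd64.S: each `uint32_t[LANES]` array stands in for one 128-bit XMM register holding the same state word of four different blocks, and the loop over lanes is what a single SSE2 instruction would do in one step. The names `quarter_round4` and `salsa20_core4` are made up for this illustration, and the 64-bit block counter is assumed to sit in state words 8 and 9, as in the Salsa20 specification.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4 /* four 32-bit lanes, as in one 128-bit XMM register */

static inline uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Salsa20 quarter-round, applied to the same words of four blocks at once. */
static void quarter_round4(uint32_t a[LANES], uint32_t b[LANES],
                           uint32_t c[LANES], uint32_t d[LANES])
{
    for (int l = 0; l < LANES; l++) {
        b[l] ^= rotl32(a[l] + d[l], 7);
        c[l] ^= rotl32(b[l] + a[l], 9);
        d[l] ^= rotl32(c[l] + b[l], 13);
        a[l] ^= rotl32(d[l] + c[l], 18);
    }
}

/* Compute four keystream blocks for counters n, n+1, n+2, n+3.
   state: 16 input words, block counter n in words 8 (low) and 9 (high).
   rounds: 20 for Salsa20, 12 for Salsa20/12. */
static void salsa20_core4(const uint32_t state[16],
                          uint32_t out[LANES][16], int rounds)
{
    uint32_t in[16][LANES], x[16][LANES];
    uint64_t base = ((uint64_t)state[9] << 32) | state[8];

    /* Broadcast every state word to all lanes, then give each lane
       its own block counter. */
    for (int i = 0; i < 16; i++)
        for (int l = 0; l < LANES; l++)
            in[i][l] = state[i];
    for (int l = 0; l < LANES; l++) {
        uint64_t ctr = base + (uint64_t)l;
        in[8][l] = (uint32_t)ctr;
        in[9][l] = (uint32_t)(ctr >> 32);
    }
    memcpy(x, in, sizeof x);

    for (int r = 0; r < rounds; r += 2) {
        /* column round */
        quarter_round4(x[0], x[4], x[8], x[12]);
        quarter_round4(x[5], x[9], x[13], x[1]);
        quarter_round4(x[10], x[14], x[2], x[6]);
        quarter_round4(x[15], x[3], x[7], x[11]);
        /* row round */
        quarter_round4(x[0], x[1], x[2], x[3]);
        quarter_round4(x[5], x[6], x[7], x[4]);
        quarter_round4(x[10], x[11], x[8], x[9]);
        quarter_round4(x[15], x[12], x[13], x[14]);
    }

    /* Feed-forward addition, written back as one 16-word block per lane. */
    for (int i = 0; i < 16; i++)
        for (int l = 0; l < LANES; l++)
            out[l][i] = x[i][l] + in[i][l];
}
```

Because the four lanes never interact, the work per round stays the same while four blocks are produced, which is where a vectorized implementation can gain its speed. The final loop corresponds to the transposition step a real SIMD version needs before the keystream can be XORed with the input.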

Benchmark results on Intel Core i5-4570 (3.2 GHz):

Before:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      3.88 ns/B     246.0 MiB/s     12.41 c/B
     STREAM dec |      3.88 ns/B     246.0 MiB/s     12.41 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      2.46 ns/B     387.9 MiB/s      7.87 c/B
     STREAM dec |      2.46 ns/B     387.7 MiB/s      7.87 c/B

After:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.985 ns/B     967.8 MiB/s      3.15 c/B
     STREAM dec |     0.987 ns/B     966.5 MiB/s      3.16 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.636 ns/B    1500.5 MiB/s      2.03 c/B
     STREAM dec |     0.636 ns/B    1499.2 MiB/s      2.04 c/B
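
The columns in these tables are related by simple arithmetic, which the following sketch spells out. It assumes the 3.2 GHz clock given in the heading; the helper names are made up for this note. Since the ns/B figures are already rounded, recomputed values can differ from the printed ones in the last digit.

```c
/* cycles/byte = (ns/byte) * (clock in GHz), since 1 GHz is one cycle per ns. */
static double cycles_per_byte(double ns_per_byte, double clock_ghz)
{
    return ns_per_byte * clock_ghz;
}

/* MiB/s = (1e9 ns per second) / (ns/byte) / (2^20 bytes per MiB). */
static double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Speedup factor between two ns/byte measurements. */
static double speedup(double before_ns, double after_ns)
{
    return before_ns / after_ns;
}
```

For example, `cycles_per_byte(3.88, 3.2)` gives about 12.42 and `cycles_per_byte(0.985, 3.2)` about 3.15, in line with the tables. `speedup(3.88, 0.985)` and `speedup(2.46, 0.636)` both come out at roughly 3.9, so the assembly implementation is close to four times faster for both round counts.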

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/salsa20-amd64.S [new file with mode: 0644]
cipher/salsa20.c
configure.ac