Add Serpent AVX2 implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 9 Jun 2013 13:37:42 +0000 (16:37 +0300)
commit e7ab4e1a7396f4609b9033207015b239ab4a5140
tree 97572bf1acf49e030d0ad1c361cb7c7415ebc4a5
parent 3289bca708bdd02c69a331095ac6ca9a1efd74cc

* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'.
* cipher/serpent-avx2-amd64.S: New file.
* cipher/serpent.c (USE_AVX2): New macro.
(serpent_context_t) [USE_AVX2]: Add 'use_avx2'.
[USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec)
(_gcry_serpent_avx2_cfb_dec): New prototypes.
(serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware
and set 'use_avx2'.
(_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths are tested.
* configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'.
--
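
The ChangeLog entry for serpent_setkey_internal describes a run-time check
that sets 'use_avx2' when the hardware supports AVX2. The following is a
minimal sketch of that idea, not the code from this commit: the struct
'sketch_context' and the functions 'cpu_has_avx2' and 'sketch_setkey' are
hypothetical names, and the GCC/Clang builtin '__builtin_cpu_supports' is
swapped in for whatever feature detection the library itself uses.

```c
#include <string.h>

/* Hypothetical stand-in for the cipher context; only the flag matters. */
struct sketch_context
{
  int use_avx2;
};

/* Return 1 when the running CPU reports AVX2 support, otherwise 0.
   Assumption: GCC or Clang on x86; any other target reports 0.  */
static int
cpu_has_avx2 (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2") ? 1 : 0;
#else
  return 0;
#endif
}

/* Record the capability once at key setup, so that the bulk functions
   only need to test a flag in the context.  */
static void
sketch_setkey (struct sketch_context *ctx)
{
  memset (ctx, 0, sizeof *ctx);
  ctx->use_avx2 = cpu_has_avx2 ();
}
```

Caching the result in the context keeps the detection cost out of the
per-call bulk paths.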

Add new AVX2 implementation of Serpent that processes 16 blocks in parallel.
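
As a rough illustration of what processing 16 blocks in parallel means for
the bulk functions, the sketch below routes a request of 'nblocks' blocks:
full groups of 16 go to the wide path, and the remainder is left for the
existing code. The names 'dispatch_stats' and 'dispatch_blocks' are
hypothetical, and the sketch only counts calls instead of invoking the
assembly routines.

```c
#include <stddef.h>

#define PARALLEL_BLOCKS 16 /* blocks handled per wide call, per the text */

/* Hypothetical tally of how a request was routed. */
struct dispatch_stats
{
  size_t wide_calls;    /* number of 16-block calls */
  size_t narrow_blocks; /* blocks left for the fallback path */
};

static struct dispatch_stats
dispatch_blocks (size_t nblocks, int use_avx2)
{
  struct dispatch_stats st = { 0, 0 };

  if (use_avx2)
    {
      /* Only full groups of 16 blocks can take the wide path. */
      while (nblocks >= PARALLEL_BLOCKS)
        {
          /* A real implementation would call the AVX2 routine here. */
          st.wide_calls++;
          nblocks -= PARALLEL_BLOCKS;
        }
    }

  /* Whatever remains is handled by the older code paths. */
  st.narrow_blocks = nblocks;
  return st;
}
```

With this routing, a request of fewer than 16 blocks never reaches the wide
path, which is consistent with the self-tests growing 'nblocks' so that the
AVX2 code paths are exercised.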

Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   1.00x   1.00x   2.10x   1.00x   2.16x   1.01x   1.00x   2.16x   2.18x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/serpent-avx2-amd64.S [new file with mode: 0644]
cipher/serpent.c
configure.ac