serpent: add SSE2 accelerated amd64 implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Thu, 23 May 2013 08:04:18 +0000 (11:04 +0300)
committer Werner Koch <wk@gnupg.org>
Thu, 23 May 2013 10:07:36 +0000 (12:07 +0200)
commit 2fd06e207dcea1d8a7f0e7e92f3359615a99421b
tree bbebbd2ec43e5b8ac7e40331f68fe27832ca01dd
parent c85501af8222913f0a1e20e77fceb88e93417925
serpent: add SSE2 accelerated amd64 implementation

* configure.ac (serpent): Add 'serpent-sse2-amd64.lo'.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add
'serpent-sse2-amd64.S'.
* cipher/cipher.c (gcry_cipher_open) [USE_SERPENT]: Register bulk
functions for CBC-decryption and CTR-mode.
* cipher/serpent.c (USE_SSE2): New macro.
[USE_SSE2] (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec):
New prototypes to assembler functions.
(serpent_setkey): Set 'serpent_init_done' before calling serpent_test.
(_gcry_serpent_ctr_enc): New function.
(_gcry_serpent_cbc_dec): New function.
(selftest_ctr_128): New function.
(selftest_cbc_128): New function.
(selftest): Call selftest_ctr_128 and selftest_cbc_128.
* cipher/serpent-sse2-amd64.S: New file.
* src/cipher.h (_gcry_serpent_ctr_enc): New prototype.
(_gcry_serpent_cbc_dec): New prototype.
--

[v2]: Converted to SSE2 to support all amd64 processors (SSE2 is a required
      feature of the AMD64 SysV ABI).

This patch adds a word-sliced SSE2 implementation of Serpent for amd64 to
speed up parallelizable workloads (CTR mode and CBC-mode decryption). The
implementation processes eight blocks in parallel, as two interleaved
four-block sets, to give the out-of-order scheduler independent work.
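
Why CBC decryption is parallelizable while CBC encryption is not can be
sketched in plain C. The example below is an illustration only, not the code
from this patch: toy_encrypt and toy_decrypt are hypothetical stand-ins for
the block cipher (they are not Serpent and not secure), and
cbc_decrypt_batched is a made-up name. It shows that each block decryption
depends only on its own ciphertext block, so up to eight of them can run
before the chaining XOR is applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* Serpent block size in bytes */
#define BATCH 8  /* blocks handled per pass, as described in the patch */

/* Hypothetical toy cipher (NOT Serpent, NOT secure): add, then rotate. */
static void toy_encrypt(uint8_t key, const uint8_t *in, uint8_t *out)
{
  for (size_t i = 0; i < BLOCK; i++) {
    uint8_t v = (uint8_t)(in[i] + key + i);
    out[i] = (uint8_t)((v << 3) | (v >> 5)); /* rotate left by 3 */
  }
}

static void toy_decrypt(uint8_t key, const uint8_t *in, uint8_t *out)
{
  for (size_t i = 0; i < BLOCK; i++) {
    uint8_t v = (uint8_t)((in[i] >> 3) | (in[i] << 5)); /* rotate right by 3 */
    out[i] = (uint8_t)(v - key - i);
  }
}

/* CBC encryption is serial: block b needs the ciphertext of block b-1. */
static void cbc_encrypt(uint8_t key, const uint8_t *iv, const uint8_t *pt,
                        uint8_t *ct, size_t nblocks)
{
  uint8_t chain[BLOCK];
  memcpy(chain, iv, BLOCK);
  for (size_t b = 0; b < nblocks; b++) {
    uint8_t tmp[BLOCK];
    for (size_t i = 0; i < BLOCK; i++)
      tmp[i] = pt[b * BLOCK + i] ^ chain[i];
    toy_encrypt(key, tmp, ct + b * BLOCK);
    memcpy(chain, ct + b * BLOCK, BLOCK);
  }
}

/* CBC decryption in batches. ct and pt must not overlap in this sketch. */
static void cbc_decrypt_batched(uint8_t key, const uint8_t *iv,
                                const uint8_t *ct, uint8_t *pt,
                                size_t nblocks)
{
  uint8_t prev[BLOCK]; /* ciphertext block preceding the current batch */
  memcpy(prev, iv, BLOCK);
  for (size_t b = 0; b < nblocks;) {
    size_t n = (nblocks - b < BATCH) ? nblocks - b : BATCH;
    uint8_t tmp[BATCH][BLOCK];

    /* Step 1: n mutually independent block decryptions. This is the
       part a SIMD implementation can compute side by side. */
    for (size_t j = 0; j < n; j++)
      toy_decrypt(key, ct + (b + j) * BLOCK, tmp[j]);

    /* Step 2: XOR each result with the previous ciphertext block. */
    for (size_t j = 0; j < n; j++) {
      const uint8_t *chain = (j == 0) ? prev : ct + (b + j - 1) * BLOCK;
      for (size_t i = 0; i < BLOCK; i++)
        pt[(b + j) * BLOCK + i] = tmp[j][i] ^ chain[i];
    }

    memcpy(prev, ct + (b + n - 1) * BLOCK, BLOCK);
    b += n;
  }
}
```

Encrypting a buffer with cbc_encrypt and passing the result to
cbc_decrypt_batched should return the original plaintext for any block
count, including counts that are not a multiple of eight. CTR mode has the
same property in both directions, since each keystream block depends only on
its counter value.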

Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge); each mode lists
encryption, then decryption:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   0.99x   1.00x   3.98x   1.00x   1.01x   1.00x   1.01x   4.04x   4.04x

Speed old vs. new on AMD Phenom II X6 1055T:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.02x   1.01x   1.00x   2.83x   1.00x   1.00x   1.00x   1.00x   2.72x   2.72x

Speed old vs. new on Intel Core2 Duo T8100:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   1.02x   0.97x   4.02x   0.98x   1.01x   0.98x   1.00x   3.82x   3.91x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/cipher.c
cipher/serpent-sse2-amd64.S [new file with mode: 0644]
cipher/serpent.c
configure.ac
src/cipher.h