Add ARM NEON assembly implementation of Serpent
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 27 Oct 2013 12:07:59 +0000 (14:07 +0200)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Mon, 28 Oct 2013 14:19:09 +0000 (16:19 +0200)
commit 2cb6e1f323d24359b1c5b113be5c2f79a2a4cded
tree 344c148104db2f032e5f227615b1ff6b39f910c4
parent 3ff9d2571c18cd7a34359f9c60a10d3b0f932b23

* cipher/Makefile.am: Add 'serpent-armv7-neon.S'.
* cipher/serpent-armv7-neon.S: New.
* cipher/serpent.c (USE_NEON): New macro.
(serpent_context_t) [USE_NEON]: Add 'use_neon'.
[USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec): New prototypes.
(serpent_setkey_internal) [USE_NEON]: Detect NEON support.
(_gcry_serpent_ctr_enc, _gcry_serpent_cfb_dec)
(_gcry_serpent_cbc_dec) [USE_NEON]: Use NEON implementations
to process eight blocks in parallel.
* configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'.
--

This patch adds an ARM NEON-optimized implementation of the Serpent
cipher to speed up parallelizable bulk operations.
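
The dispatch pattern behind "eight blocks in parallel" can be sketched
roughly as follows. This is an illustration only, not the code from
cipher/serpent.c: the names bulk_process, process_8_blocks,
process_1_block and struct bulk_stats are hypothetical, and the XOR
bodies are placeholders standing in for the real cipher routines. Only
the 'use_neon' flag and the eight-block width come from the ChangeLog
above.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE      16  /* Serpent block size in bytes */
#define PARALLEL_BLOCKS 8   /* blocks handled per wide call */

/* Hypothetical counters, used only to show which path was taken. */
struct bulk_stats { size_t wide_calls; size_t single_calls; };

/* Placeholder for an eight-block NEON routine (NOT real Serpent). */
static void process_8_blocks(unsigned char *out, const unsigned char *in,
                             struct bulk_stats *st)
{
  for (size_t i = 0; i < PARALLEL_BLOCKS * BLOCK_SIZE; i++)
    out[i] = in[i] ^ 0xAA;
  st->wide_calls++;
}

/* Placeholder for the generic one-block routine (NOT real Serpent). */
static void process_1_block(unsigned char *out, const unsigned char *in,
                            struct bulk_stats *st)
{
  for (size_t i = 0; i < BLOCK_SIZE; i++)
    out[i] = in[i] ^ 0xAA;
  st->single_calls++;
}

/* Use the wide path while at least eight blocks remain and the
   runtime flag is set, then finish the tail one block at a time. */
void bulk_process(unsigned char *out, const unsigned char *in,
                  size_t nblocks, int use_neon, struct bulk_stats *st)
{
  assert(out != NULL && in != NULL && st != NULL);

  if (use_neon)
    {
      while (nblocks >= PARALLEL_BLOCKS)
        {
          process_8_blocks(out, in, st);
          out += PARALLEL_BLOCKS * BLOCK_SIZE;
          in += PARALLEL_BLOCKS * BLOCK_SIZE;
          nblocks -= PARALLEL_BLOCKS;
        }
    }

  while (nblocks > 0)
    {
      process_1_block(out, in, st);
      out += BLOCK_SIZE;
      in += BLOCK_SIZE;
      nblocks--;
    }
}
```

With 19 input blocks and the flag set, this sketch makes two wide calls
and three single-block calls; with the flag clear, all 19 blocks take
the single-block path.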

Benchmarks on ARM Cortex-A8 (armhf, 1008 MHz):

Old:
 SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     43.53 ns/B     21.91 MiB/s     43.88 c/B
        CFB dec |     44.77 ns/B     21.30 MiB/s     45.13 c/B
        CTR enc |     45.21 ns/B     21.10 MiB/s     45.57 c/B
        CTR dec |     45.21 ns/B     21.09 MiB/s     45.57 c/B
New:
 SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     26.26 ns/B     36.32 MiB/s     26.47 c/B
        CFB dec |     26.21 ns/B     36.38 MiB/s     26.42 c/B
        CTR enc |     26.20 ns/B     36.40 MiB/s     26.41 c/B
        CTR dec |     26.20 ns/B     36.40 MiB/s     26.41 c/B
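
The three columns in these tables are related by simple arithmetic,
which also gives the speedup. The helpers below are a small sketch for
checking the figures; the function names are hypothetical and are not
part of libgcrypt or its benchmark tool. They assume 1 MiB = 1048576
bytes and the 1008 MHz clock stated above.

```c
#include <assert.h>

/* Throughput in MiB/s from a cost in nanoseconds per byte. */
double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from nanoseconds per byte at a clock given in MHz. */
double cycles_per_byte(double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* Speedup factor of the new implementation over the old one. */
double speedup(double old_ns_per_byte, double new_ns_per_byte)
{
  assert(new_ns_per_byte > 0.0);
  return old_ns_per_byte / new_ns_per_byte;
}
```

For CBC decryption, speedup(43.53, 26.26) comes to roughly 1.66, and
the other modes land between about 1.7 and 1.73.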

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/salsa20-armv7-neon.S
cipher/serpent-armv7-neon.S [new file with mode: 0644]
cipher/serpent.c
configure.ac