Improve performance of SHA-512/ARM/NEON implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tue, 17 Dec 2013 13:35:38 +0000 (15:35 +0200)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Wed, 18 Dec 2013 15:00:24 +0000 (17:00 +0200)
commit df629ba53a662427ebd3ddca90c3fe9ddd6511d3
tree 0e383e4186907a3f607343c4ef39116118bc363f
parent a5c2bbfe0db515d739ab683297903c77b1eec124

* cipher/sha512-armv7-neon.S (RT01q, RT23q, RT45q, RT67q): New.
(round_0_63, round_64_79): Remove.
(rounds2_0_63, rounds2_64_79): New.
(_gcry_sha512_transform_armv7_neon): Add 'nblks' input; Handle multiple
input blocks; Use new round macros.
* cipher/sha512.c [USE_ARM_NEON_ASM]
(_gcry_sha512_transform_armv7_neon): Add 'num_blks'.
(transform) [USE_ARM_NEON_ASM]: Pass nblks to assembly.
--
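
The interface change listed above (the transform taking a block count and
looping over input blocks itself) can be sketched in plain C. This is a
minimal illustration, not the actual libgcrypt code: all names here
(toy_compress, transform_one_block, transform_blocks, hash_per_call,
hash_batched) are hypothetical, and toy_compress is a trivial mixing step
standing in for the SHA-512 compression function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 128 /* SHA-512 block size in bytes */
#define STATE_WORDS 8   /* SHA-512 state: eight 64-bit words */

/* Toy stand-in for the compression function. This is NOT SHA-512; it only
   makes the result depend on the state and on every input byte in order. */
static void toy_compress(uint64_t state[STATE_WORDS],
                         const unsigned char *block)
{
  for (size_t i = 0; i < BLOCK_BYTES; i++)
    {
      uint64_t *w = &state[i % STATE_WORDS];
      *w = ((*w << 7) | (*w >> 57)) ^ block[i]; /* rotate left 7, mix byte */
    }
}

/* Old-style interface (hypothetical name): one block per call. */
static void transform_one_block(uint64_t state[STATE_WORDS],
                                const unsigned char *data)
{
  assert(state != NULL && data != NULL);
  toy_compress(state, data);
}

/* New-style interface (hypothetical name): the callee receives the block
   count and loops internally, so the caller makes a single call. */
static void transform_blocks(uint64_t state[STATE_WORDS],
                             const unsigned char *data, size_t nblks)
{
  assert(state != NULL && data != NULL);
  while (nblks--)
    {
      toy_compress(state, data);
      data += BLOCK_BYTES;
    }
}

/* Caller with the old interface: the loop lives in the caller. */
static void hash_per_call(uint64_t state[STATE_WORDS],
                          const unsigned char *data, size_t nblks)
{
  for (size_t b = 0; b < nblks; b++)
    transform_one_block(state, data + b * BLOCK_BYTES);
}

/* Caller with the new interface: pass nblks straight through. */
static void hash_batched(uint64_t state[STATE_WORDS],
                         const unsigned char *data, size_t nblks)
{
  transform_blocks(state, data, nblks);
}
```

Both callers produce the same state for the same input. The batched form
moves the per-block loop into the callee, which presumably lets an assembly
implementation avoid repeating call setup and state loads for every block.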

Benchmarks on ARM Cortex-A8:

C-language:     139.1 c/B
Old ARM/NEON:   34.30 c/B
New ARM/NEON:   24.46 c/B

New vs C:       5.68x
New vs Old:     1.40x
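
The speedup factors above are ratios of cycles per byte (baseline divided by
the new figure). A small sketch of that arithmetic follows; the function name
speedup is hypothetical. Computed from the rounded figures listed here, the
"New vs C" ratio comes out slightly above 5.68, so the reported value was
presumably truncated or derived from unrounded measurements.

```c
#include <assert.h>

/* Speedup factor from two cycles-per-byte figures: lower c/B is faster,
   so the baseline goes in the numerator. */
static double speedup(double baseline_cpb, double improved_cpb)
{
  assert(improved_cpb > 0.0);
  return baseline_cpb / improved_cpb;
}
```

For example, speedup(34.30, 24.46) gives the "New vs Old" factor and
speedup(139.1, 24.46) gives the "New vs C" factor.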

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/sha512-armv7-neon.S
cipher/sha512.c