cast5: add three rounds parallel handling to generic C implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 31 Mar 2019 15:26:58 +0000 (18:26 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sun, 31 Mar 2019 15:26:58 +0000 (18:26 +0300)
commit 4ec566b3689eff4a712eacfcbb4161eb243bb1df
tree 63a0a09a4d1f20ac8dd4237fb5466005a76421a5
parent 8a0e68be1020d0c359bf8191159ac1ebe32a5aa0

* cipher/cast5.c (do_encrypt_block_3, do_decrypt_block_3): New.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Use
new three block functions.
--
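
The idea behind the three-block functions can be sketched as follows. CBC
decryption, CFB decryption and CTR mode all run the block cipher on inputs
that are known up front, so several blocks can be processed with their
rounds interleaved and the CPU can overlap the otherwise serial dependency
chain of each Feistel round. This is only a minimal illustration under
stated assumptions: toy_f, toy_encrypt_block, toy_decrypt_block,
toy_decrypt_block_3 and toy_cbc_dec are hypothetical names, the round
function is a stand-in and not the CAST5 F function, and none of this is
the code in cipher/cast5.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROUNDS 16 /* CAST5 also uses 16 rounds */
#define BLK 8     /* 64-bit block, the same size as CAST5 */

/* Hypothetical stand-in round function; NOT the CAST5 F function. */
static uint32_t toy_f(uint32_t x, uint32_t k)
{
  uint32_t t = x + k;
  return ((t << 7) | (t >> 25)) ^ (t * 0x9E3779B1u);
}

static uint32_t load_be32(const unsigned char *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
  p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

/* Plain Feistel encryption of one block (used only for round-trip checks). */
static void toy_encrypt_block(const uint32_t rk[ROUNDS], unsigned char *out,
                              const unsigned char *in)
{
  uint32_t l = load_be32(in), r = load_be32(in + 4), t;
  int i;
  for (i = 0; i < ROUNDS; i++)
    { t = r; r = l ^ toy_f(r, rk[i]); l = t; }
  store_be32(out, l); store_be32(out + 4, r);
}

/* One block: every round has to wait for the result of the previous one. */
static void toy_decrypt_block(const uint32_t rk[ROUNDS], unsigned char *out,
                              const unsigned char *in)
{
  uint32_t l = load_be32(in), r = load_be32(in + 4), t;
  int i;
  for (i = ROUNDS - 1; i >= 0; i--)
    { t = l; l = r ^ toy_f(l, rk[i]); r = t; }
  store_be32(out, l); store_be32(out + 4, r);
}

/* Three blocks: the three round computations in each iteration are
   independent of each other, so they can execute in parallel. */
static void toy_decrypt_block_3(const uint32_t rk[ROUNDS], unsigned char *out,
                                const unsigned char *in)
{
  uint32_t l0 = load_be32(in),      r0 = load_be32(in + 4);
  uint32_t l1 = load_be32(in + 8),  r1 = load_be32(in + 12);
  uint32_t l2 = load_be32(in + 16), r2 = load_be32(in + 20);
  uint32_t t0, t1, t2;
  int i;
  for (i = ROUNDS - 1; i >= 0; i--)
    {
      t0 = l0; t1 = l1; t2 = l2;
      l0 = r0 ^ toy_f(t0, rk[i]);
      l1 = r1 ^ toy_f(t1, rk[i]);
      l2 = r2 ^ toy_f(t2, rk[i]);
      r0 = t0; r1 = t1; r2 = t2;
    }
  store_be32(out, l0);      store_be32(out + 4, r0);
  store_be32(out + 8, l1);  store_be32(out + 12, r1);
  store_be32(out + 16, l2); store_be32(out + 20, r2);
}

/* CBC decryption: P[i] = D(C[i]) ^ C[i-1].  All D(C[i]) are independent,
   so three are decrypted at once and the remainder one at a time.
   Assumes out and in do not overlap; iv is updated to the last
   ciphertext block so a later call can continue the chain. */
static void toy_cbc_dec(const uint32_t rk[ROUNDS], unsigned char iv[BLK],
                        unsigned char *out, const unsigned char *in,
                        size_t nblocks)
{
  unsigned char tmp[3 * BLK];
  size_t i;

  while (nblocks >= 3)
    {
      toy_decrypt_block_3(rk, tmp, in);
      for (i = 0; i < BLK; i++)
        out[i] = tmp[i] ^ iv[i];
      for (i = BLK; i < 3 * BLK; i++)
        out[i] = tmp[i] ^ in[i - BLK];
      memcpy(iv, in + 2 * BLK, BLK);
      in += 3 * BLK; out += 3 * BLK; nblocks -= 3;
    }
  while (nblocks > 0)
    {
      toy_decrypt_block(rk, tmp, in);
      for (i = 0; i < BLK; i++)
        out[i] = tmp[i] ^ iv[i];
      memcpy(iv, in, BLK);
      in += BLK; out += BLK; nblocks--;
    }
}
```

CBC encryption cannot be restructured this way, because each block's cipher
input depends on the previous ciphertext block; this is consistent with the
change touching only CTR, CBC decryption and CFB decryption.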

Benchmark on aarch64 (Cortex-A53, 816 MHz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     35.24 ns/B     27.07 MiB/s     28.75 c/B
        CFB dec |     34.62 ns/B     27.54 MiB/s     28.25 c/B
        CTR enc |     35.39 ns/B     26.95 MiB/s     28.88 c/B
After (~40%-50% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     23.05 ns/B     41.38 MiB/s     18.81 c/B
        CFB dec |     24.49 ns/B     38.94 MiB/s     19.98 c/B
        CTR dec |     24.57 ns/B     38.82 MiB/s     20.05 c/B

Benchmark on i386 (Haswell, 4000 MHz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      6.92 ns/B     137.7 MiB/s     27.69 c/B
        CFB dec |      6.83 ns/B     139.7 MiB/s     27.32 c/B
        CTR enc |      7.01 ns/B     136.1 MiB/s     28.03 c/B
After (~70% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      3.97 ns/B     240.1 MiB/s     15.89 c/B
        CFB dec |      3.96 ns/B     241.0 MiB/s     15.83 c/B
        CTR enc |      4.01 ns/B     237.8 MiB/s     16.04 c/B
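
The "faster" percentages can be recomputed from the mebibytes/sec columns
as the relative throughput gain. A minimal sketch, assuming a hypothetical
helper name speedup_pct that is not part of libgcrypt:

```c
/* Relative throughput gain in percent, given MiB/s before and after. */
static double speedup_pct(double before_mibs, double after_mibs)
{
  return (after_mibs / before_mibs - 1.0) * 100.0;
}
```

For the i386 CBC dec row, speedup_pct(137.7, 240.1) comes out at roughly
74%, in line with the "~70% faster" figure.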

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/cast5.c