AES-NI improvements for AMD64
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sat, 6 Jan 2018 16:53:20 +0000 (18:53 +0200)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tue, 9 Jan 2018 16:44:34 +0000 (18:44 +0200)
commit c9e9cb2eb6a1c659d3825ca627228b732f2f2152
tree 25b0ed20bfc6106781a3b18bc8b9236218010332
parent b3ec0f752c925cde36f560f0f9309ab6450bbfd9

* cipher/rijndael-aesni.c [__x86_64__] (aesni_prepare_7_15_variable)
(aesni_prepare_7_15, aesni_cleanup_7_15, do_aesni_enc_vec8)
(do_aesni_dec_vec8, do_aesni_ctr_8): New.
(_gcry_aes_aesni_ctr_enc, _gcry_aes_aesni_cfb_dec)
(_gcry_aes_aesni_cbc_dec, aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth) [__x86_64__]: Add 8 parallel blocks
processing.
--
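
The control flow behind "8 parallel blocks processing" can be sketched in portable C, using CTR mode as the example: a main loop consumes eight blocks per iteration and a tail loop handles the remaining blocks one at a time. This is only an illustration under stated assumptions, not the code in cipher/rijndael-aesni.c. All names below (toy_encrypt_block, ctr_inc, ctr_one, ctr_eight, ctr_enc_sketch, ctr_paths_agree) are hypothetical, and toy_encrypt_block is a placeholder mix that is NOT AES. In an AES-NI implementation the eight independent block encryptions would have their rounds interleaved so the pipelined AES instructions stay busy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16       /* AES block size in bytes */
#define MAX_BLOCKS 64  /* buffer limit for the self-check below */

/* Hypothetical placeholder for one block encryption. This is NOT AES;
   it only makes each output byte depend on the key and the input. */
static void toy_encrypt_block(const uint8_t key[BLOCK],
                              const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
  for (int i = 0; i < BLOCK; i++)
    out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + key[(i + 1) % BLOCK]);
}

/* Increment a 128-bit big-endian counter by one. */
static void ctr_inc(uint8_t ctr[BLOCK])
{
  for (int i = BLOCK - 1; i >= 0; i--)
    if (++ctr[i] != 0)
      break;
}

/* Single-block path: one keystream block, one counter step. */
static void ctr_one(const uint8_t key[BLOCK], uint8_t ctr[BLOCK],
                    uint8_t *out, const uint8_t *in)
{
  uint8_t ks[BLOCK];

  toy_encrypt_block(key, ctr, ks);
  ctr_inc(ctr);
  for (int i = 0; i < BLOCK; i++)
    out[i] = in[i] ^ ks[i];
}

/* Eight-block path: prepare eight counter values up front, encrypt them
   as eight independent blocks, then XOR with the input. */
static void ctr_eight(const uint8_t key[BLOCK], uint8_t ctr[BLOCK],
                      uint8_t *out, const uint8_t *in)
{
  uint8_t ctrs[8][BLOCK], ks[8][BLOCK];

  for (int b = 0; b < 8; b++)
    {
      memcpy(ctrs[b], ctr, BLOCK);
      ctr_inc(ctr);
    }
  for (int b = 0; b < 8; b++)   /* no data dependency between iterations */
    toy_encrypt_block(key, ctrs[b], ks[b]);
  for (int b = 0; b < 8; b++)
    for (int i = 0; i < BLOCK; i++)
      out[b * BLOCK + i] = in[b * BLOCK + i] ^ ks[b][i];
}

/* Main loop takes 8 blocks per iteration; the tail goes block by block. */
void ctr_enc_sketch(const uint8_t key[BLOCK], uint8_t ctr[BLOCK],
                    uint8_t *out, const uint8_t *in, size_t nblocks)
{
  for (; nblocks >= 8; nblocks -= 8, in += 8 * BLOCK, out += 8 * BLOCK)
    ctr_eight(key, ctr, out, in);
  for (; nblocks; nblocks--, in += BLOCK, out += BLOCK)
    ctr_one(key, ctr, out, in);
}

/* Returns 1 if the batched path and a pure one-block-at-a-time path give
   the same output and the same final counter for nblocks blocks. */
int ctr_paths_agree(size_t nblocks)
{
  uint8_t key[BLOCK], ctr_a[BLOCK], ctr_b[BLOCK];
  uint8_t in[MAX_BLOCKS * BLOCK];
  uint8_t out_a[MAX_BLOCKS * BLOCK], out_b[MAX_BLOCKS * BLOCK];

  assert(nblocks <= MAX_BLOCKS);
  for (int i = 0; i < BLOCK; i++)
    {
      key[i] = (uint8_t)(i * 7 + 1);
      ctr_a[i] = ctr_b[i] = 0xff;   /* force carries on the first step */
    }
  ctr_a[0] = ctr_b[0] = 0;
  for (size_t i = 0; i < nblocks * BLOCK; i++)
    in[i] = (uint8_t)i;

  ctr_enc_sketch(key, ctr_a, out_a, in, nblocks);
  for (size_t b = 0; b < nblocks; b++)
    ctr_one(key, ctr_b, out_b + b * BLOCK, in + b * BLOCK);

  return memcmp(out_a, out_b, nblocks * BLOCK) == 0
         && memcmp(ctr_a, ctr_b, BLOCK) == 0;
}
```

Calling ctr_paths_agree(19), for example, exercises two iterations of the eight-block loop followed by three single-block steps. The batching changes only how the work is scheduled, not the result.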

Benchmarks on Intel Core i7-4790K, 4.0 GHz (no turbo, no HT):

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.175 ns/B    5448.7 MiB/s     0.700 c/B
        CFB dec |     0.174 ns/B    5466.2 MiB/s     0.698 c/B
        CTR enc |     0.182 ns/B    5226.0 MiB/s     0.730 c/B
        OCB enc |     0.194 ns/B    4913.9 MiB/s     0.776 c/B
        OCB dec |     0.200 ns/B    4769.2 MiB/s     0.800 c/B
       OCB auth |     0.172 ns/B    5545.0 MiB/s     0.688 c/B

After (1.09x to 1.14x faster):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     0.157 ns/B    6075.6 MiB/s     0.628 c/B
        CFB dec |     0.158 ns/B    6034.1 MiB/s     0.632 c/B
        CTR enc |     0.159 ns/B    5979.4 MiB/s     0.638 c/B
        OCB enc |     0.175 ns/B    5447.1 MiB/s     0.700 c/B
        OCB dec |     0.183 ns/B    5203.9 MiB/s     0.733 c/B
       OCB auth |     0.156 ns/B    6101.3 MiB/s     0.625 c/B
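
The per-mode speedup can be rechecked by dividing each "Before" nanosecs/byte figure by the matching "After" figure. The sketch below does that arithmetic with the values copied from the two tables; the function and array names are illustrative and not part of libgcrypt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define N_MODES 6

static const char *const mode_names[N_MODES] = {
  "CBC dec", "CFB dec", "CTR enc", "OCB enc", "OCB dec", "OCB auth"
};

/* nanosecs/byte columns copied from the Before and After tables */
static const double before_ns[N_MODES] = {
  0.175, 0.174, 0.182, 0.194, 0.200, 0.172
};
static const double after_ns[N_MODES] = {
  0.157, 0.158, 0.159, 0.175, 0.183, 0.156
};

/* Speedup factor for mode i: time before divided by time after. */
double speedup(size_t i)
{
  assert(i < N_MODES);
  return before_ns[i] / after_ns[i];
}

/* Index of the mode with the smallest speedup. */
size_t smallest_gain(void)
{
  size_t best = 0;

  for (size_t i = 1; i < N_MODES; i++)
    if (speedup(i) < speedup(best))
      best = i;
  return best;
}

/* Index of the mode with the largest speedup. */
size_t largest_gain(void)
{
  size_t best = 0;

  for (size_t i = 1; i < N_MODES; i++)
    if (speedup(i) > speedup(best))
      best = i;
  return best;
}

/* Print one line per mode with its speedup factor. */
void print_speedups(void)
{
  for (size_t i = 0; i < N_MODES; i++)
    printf("%9s | %.3fx\n", mode_names[i], speedup(i));
}
```

With these figures, the smallest ratio belongs to OCB dec and the largest to CTR enc.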

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/rijndael-aesni.c