Optimizations for GCM Intel/PCLMUL implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 26 Apr 2019 16:29:32 +0000 (19:29 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 26 Apr 2019 16:29:37 +0000 (19:29 +0300)
commit af5f3fb08674608acf6617ea622ed0b9a2ee77a5
tree b54bd01927b19232a0fc57adc4f5f0f5545d65c3
parent b9be297bb8eba7a09fa8413261de1587adcfd381
Optimizations for GCM Intel/PCLMUL implementation

* cipher/cipher-gcm-intel-pclmul.c (reduction): New.
(gfmul_pclmul): Fold the one-bit left shift into the pclmul
operations; Use 'reduction' helper function.
[__x86_64__] (gfmul_pclmul_aggr4): Reorder instructions and adjust
register usage to free up registers; Use 'reduction' helper function;
Fold the one-bit left shift into the pclmul operations; Move loading
of the H values and the input blocks from the caller into this
function.
[__x86_64__] (gfmul_pclmul_aggr8): New.
(gcm_lsh): New.
(_gcry_ghash_setup_intel_pclmul): Left-shift H values by one;
Preserve XMM6-XMM15 registers on WIN64.
(_gcry_ghash_intel_pclmul) [__x86_64__]: Use the 8-block aggregated
reduction function.
--
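
The left-shift changes ('gcm_lsh', the setup-time shift of the H
values, and folding the shift into the pclmul operations) serve one
idea: with GHASH's bit-reflected representation, the carry-less
product of each block needs a one-bit left shift, and because
carry-less multiplication is linear, shifting H once at setup (modulo
the field polynomial) gives the same result as shifting every
per-block product. A simplified C model of that one-time shift; the
commit implements it with SSE instructions, and the name and two-limb
layout here are illustrative:

  #include <stdint.h>

  /* One-bit left shift of H across two 64-bit limbs, folding the
     shifted-out top bit back in via the bit-reflected reduction
     constant (cf. 'gcm_lsh', done once at key setup). */
  static void
  gcm_lsh_sketch (uint64_t h[2]) /* h[0] low limb, h[1] high limb */
  {
    uint64_t mask = (uint64_t)((int64_t)h[1] >> 63); /* top bit -> all-ones */
    uint64_t carry = h[0] >> 63;  /* bit crossing the limb boundary */

    h[0] = (h[0] << 1) ^ (mask & 1);
    h[1] = (h[1] << 1) | carry;
    h[1] ^= mask & 0xc200000000000000ULL;
  }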

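'gfmul_pclmul_aggr8' applies the standard aggregation identity: for
eight input blocks C1..C8,

  X <- (X ^ C1)*H^8 ^ C2*H^7 ^ ... ^ C8*H

so each block multiplies an independent, precomputed power of H, the
eight unreduced 256-bit products are XORed together, and the modular
reduction (the 'reduction' helper) runs once per eight blocks instead
of once per block. A portable C sketch of the same algebra; it uses
plain polynomial bit order rather than GCM's bit-reflected
representation, and the names ('clmul64', 'clmul128', 'reduce256',
'ghash_aggr8_sketch') are illustrative, not the commit's:

  #include <stdint.h>
  #include <string.h>

  /* Carry-less 64x64 -> 128-bit multiply; a portable stand-in for
     the PCLMULQDQ instruction. */
  static void
  clmul64 (uint64_t a, uint64_t b, uint64_t r[2])
  {
    r[0] = 0;
    r[1] = 0;
    for (int i = 0; i < 64; i++)
      if ((b >> i) & 1)
        {
          r[0] ^= a << i;
          if (i)
            r[1] ^= a >> (64 - i);
        }
  }

  /* Carry-less 128x128 -> 256-bit multiply (schoolbook over 64-bit
     limbs; the real code does this with four PCLMULQDQs). */
  static void
  clmul128 (const uint64_t a[2], const uint64_t b[2], uint64_t r[4])
  {
    memset (r, 0, 4 * sizeof r[0]);
    for (int i = 0; i < 2; i++)
      for (int j = 0; j < 2; j++)
        {
          uint64_t t[2];
          clmul64 (a[i], b[j], t);
          r[i + j] ^= t[0];
          r[i + j + 1] ^= t[1];
        }
  }

  /* Fold a 256-bit product down to 128 bits modulo the GCM
     polynomial x^128 + x^7 + x^2 + x + 1 (the job of the new
     'reduction' helper). */
  static void
  reduce256 (const uint64_t t[4], uint64_t r[2])
  {
    uint64_t v[4];
    memcpy (v, t, sizeof v);
    for (int k = 255; k >= 128; k--)
      if ((v[k / 64] >> (k % 64)) & 1)
        {
          /* x^k == x^(k-128) * (x^7 + x^2 + x + 1) */
          int e[4] = { k - 121, k - 126, k - 127, k - 128 };
          v[k / 64] ^= 1ULL << (k % 64);
          for (int j = 0; j < 4; j++)
            v[e[j] / 64] ^= 1ULL << (e[j] % 64);
        }
    r[0] = v[0];
    r[1] = v[1];
  }

  /* One aggregated step over eight blocks, mirroring the structure
     of gfmul_pclmul_aggr8; Hpow[0..7] hold H^8..H^1, precomputed at
     setup time. */
  static void
  ghash_aggr8_sketch (uint64_t X[2], const uint64_t C[8][2],
                      const uint64_t Hpow[8][2])
  {
    uint64_t acc[4], t[4];
    uint64_t x1[2] = { X[0] ^ C[0][0], X[1] ^ C[0][1] };

    clmul128 (x1, Hpow[0], acc);     /* (X ^ C1) * H^8 */
    for (int i = 1; i < 8; i++)
      {
        clmul128 (C[i], Hpow[i], t); /* C(i+1) * H^(8-i) */
        for (int j = 0; j < 4; j++)
          acc[j] ^= t[j];            /* accumulate, still unreduced */
      }
    reduce256 (acc, X);              /* one reduction per 8 blocks */
  }

Besides amortizing the reduction, the restructuring exposes eight
independent multiply chains to the out-of-order core; keeping the H
powers, input blocks and accumulator in XMM registers at once is what
the register-usage rework in the aggregated functions makes possible.
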
Benchmark on Intel Haswell (amd64):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.206 ns/B      4624 MiB/s     0.825 c/B      3998

After (+50% faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.137 ns/B      6953 MiB/s     0.548 c/B      3998
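
(For scale: 0.548 cycles/byte at 3998 MHz is 0.548 / 3.998 ~= 0.137
ns/B, i.e. roughly 6950 MiB/s, consistent with the table; the
improvement is 0.825 / 0.548 ~= 1.5x.)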

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/cipher-gcm-intel-pclmul.c