Optimizations for generic table-based GCM implementations
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sat, 27 Apr 2019 19:03:31 +0000 (22:03 +0300)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Sat, 27 Apr 2019 19:03:31 +0000 (22:03 +0300)
commit ecd02cdd61e8c690f48637656f0e1e08b750fe30
tree ba9a35eb7281169118e63abf0c884882dddbfc30
parent af5f3fb08674608acf6617ea622ed0b9a2ee77a5

* cipher/cipher-gcm.c [GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[32..63] values.
[GCM_TABLES_USE_U64] (do_ghash): Split processing of the two 64-bit
halves of the input into two separate loops; use precalculated M[] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[64..127] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_ghash): Use precalculated
M[] values.
[GCM_USE_TABLES] (bshift): Avoid conditional execution for mask
calculation.
* cipher/cipher-internal.h (gcry_cipher_handle): Double gcm_table size.
--
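
The table changes listed above can be illustrated with a minimal,
self-contained sketch. It is not the code from cipher/cipher-gcm.c: the
names elem, mulx, fill_tables, mul_tables and mul_ref are hypothetical, a
single struct array stands in for the real M[] layout, and the
multiply-by-x^8 step is a plain loop where an optimized implementation
would normally use a reduction table. The sketch shows a reduction mask
computed arithmetically rather than with a conditional, and a second
16-entry table holding the first one pre-multiplied by x^4, so that both
nibbles of an input byte are combined after a single 8-bit shift of the
accumulator.

```c
#include <stdint.h>
#include <string.h>

/* 128-bit GHASH element: hi = bytes 0..7, lo = bytes 8..15 (big-endian).
   Illustrative type, not the libgcrypt representation. */
typedef struct { uint64_t hi, lo; } elem;

/* Multiply by x in GCM's bit-reflected GF(2^128).  The mask is derived
   arithmetically; the conditional form would be
   mask = (v.lo & 1) ? 0xe1 << 56 : 0.  */
static elem mulx (elem v)
{
  uint64_t mask = (0 - (v.lo & 1)) & ((uint64_t) 0xe1 << 56);
  elem r = { (v.hi >> 1) ^ mask, (v.lo >> 1) | (v.hi << 63) };
  return r;
}

static elem xor2 (elem a, elem b)
{
  elem r = { a.hi ^ b.hi, a.lo ^ b.lo };
  return r;
}

/* m[0..15]: n * H for every 4-bit n (top bit of n is the x^0 term).
   m[16..31]: the same values pre-multiplied by x^4. */
static void fill_tables (elem h, elem m[32])
{
  int i, j;

  memset (m, 0, 32 * sizeof m[0]);
  m[8] = h;
  for (i = 4; i > 0; i /= 2)
    m[i] = mulx (m[2 * i]);
  for (i = 2; i < 16; i *= 2)
    for (j = 1; j < i; j++)
      m[i + j] = xor2 (m[i], m[j]);
  for (i = 0; i < 16; i++)
    m[16 + i] = mulx (mulx (mulx (mulx (m[i]))));
}

/* Compute x * H with one shift by 8 bits and two lookups per input byte. */
static elem mul_tables (const unsigned char x[16], const elem m[32])
{
  elem z = { 0, 0 };
  int i, k;

  for (i = 15; i >= 0; i--)
    {
      for (k = 0; k < 8; k++)   /* z = z * x^8 */
        z = mulx (z);
      z = xor2 (z, xor2 (m[x[i] >> 4], m[16 + (x[i] & 0xf)]));
    }
  return z;
}

/* Bit-by-bit reference multiplication, used only to check the tables. */
static elem mul_ref (const unsigned char x[16], elem h)
{
  elem z = { 0, 0 }, v = h;
  int i;

  for (i = 0; i < 128; i++)
    {
      if (x[i / 8] & (0x80 >> (i % 8)))
        z = xor2 (z, v);
      v = mulx (v);
    }
  return z;
}
```

Without the second table, each byte would need two separate 4-bit
shift-and-reduce steps, one per nibble. Doubling the table trades memory
for fewer such steps, which is consistent with the gcm_table size change
in cipher/cipher-internal.h.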

Benchmark on Intel Haswell (amd64, --disable-hwf all):

 Before:
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      2.79 ns/B     341.3 MiB/s     11.17 c/B      3998

 After (~36% faster):
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      2.05 ns/B     464.7 MiB/s      8.20 c/B      3998

Benchmark on Intel Haswell (win32, --disable-hwf all):

 Before:
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      4.90 ns/B     194.8 MiB/s     19.57 c/B      3997

 After (~37% faster):
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      3.58 ns/B     266.4 MiB/s     14.31 c/B      3999

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/cipher-gcm.c
cipher/cipher-internal.h