Add AVX2/vpgather bulk implementation of Twofish
authorJussi Kivilinna <jussi.kivilinna@iki.fi>
Wed, 4 Jan 2017 08:18:36 +0000 (10:18 +0200)
committerJussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 6 Jan 2017 10:48:20 +0000 (12:48 +0200)
commitc59a8ce51ceb9a80169c44ef86a67e95cf8528c3
tree79900afec0b7eaeb7b47d0de95159f11648da4d3
parent232a129b1f915fc54881506e4b07c89cf84932e6
Add AVX2/vpgather bulk implementation of Twofish

* cipher/Makefile.am: Add 'twofish-avx2-amd64.S'.
* cipher/twofish-avx2-amd64.S: New.
* cipher/twofish.c (USE_AVX2): New.
(TWOFISH_context) [USE_AVX2]: Add 'use_avx2' member.
(ASM_FUNC_ABI): New.
(twofish_setkey): Add check for AVX2 and fast VPGATHER HW features.
(_gcry_twofish_avx2_ctr_enc, _gcry_twofish_avx2_cbc_dec)
(_gcry_twofish_avx2_cfb_dec, _gcry_twofish_avx2_ocb_enc)
(_gcry_twofish_avx2_ocb_dec, _gcry_twofish_avx2_ocb_auth): New.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec, _gcry_twofish_cfb_dec)
(_gcry_twofish_ocb_crypt, _gcry_twofish_ocb_auth): Add AVX2 bulk
handling.
(selftest_ctr, selftest_cbc, selftest_cfb): Increase nblocks from
3+X to 16+X.
* configure.ac: Add 'twofish-avx2-amd64.lo'.
* src/g10lib.h (HWF_INTEL_FAST_VPGATHER): New.
* src/hwf-x86.c (detect_x86_gnuc): Add detection for
HWF_INTEL_FAST_VPGATHER.
* src/hwfeatures.c (HWF_INTEL_FAST_VPGATHER): Add
"intel-fast-vpgather" for HWF_INTEL_FAST_VPGATHER.
--

Benchmark on Intel Core i3-6100 (3.7 Ghz):

Before:
 TWOFISH        |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.25 ns/B     224.5 MiB/s     15.71 c/B
        ECB dec |      4.16 ns/B     229.5 MiB/s     15.38 c/B
        CBC enc |      4.53 ns/B     210.4 MiB/s     16.77 c/B
        CBC dec |      2.71 ns/B     351.6 MiB/s     10.04 c/B
        CFB enc |      4.60 ns/B     207.3 MiB/s     17.02 c/B
        CFB dec |      2.70 ns/B     353.5 MiB/s      9.98 c/B
        OFB enc |      4.25 ns/B     224.2 MiB/s     15.74 c/B
        OFB dec |      4.24 ns/B     225.0 MiB/s     15.68 c/B
        CTR enc |      2.72 ns/B     350.6 MiB/s     10.06 c/B
        CTR dec |      2.72 ns/B     350.7 MiB/s     10.06 c/B
        CCM enc |      7.25 ns/B     131.5 MiB/s     26.83 c/B
        CCM dec |      7.25 ns/B     131.5 MiB/s     26.83 c/B
       CCM auth |      4.57 ns/B     208.9 MiB/s     16.89 c/B
        GCM enc |      3.02 ns/B     315.3 MiB/s     11.19 c/B
        GCM dec |      3.02 ns/B     315.6 MiB/s     11.18 c/B
       GCM auth |     0.297 ns/B    3208.4 MiB/s      1.10 c/B
        OCB enc |      2.73 ns/B     349.7 MiB/s     10.09 c/B
        OCB dec |      2.82 ns/B     338.3 MiB/s     10.43 c/B
       OCB auth |      2.77 ns/B     343.7 MiB/s     10.27 c/B

After (CBC-dec & CFB-dec & CTR & OCB, ~1.5x faster):
 TWOFISH        |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.25 ns/B     224.2 MiB/s     15.74 c/B
        ECB dec |      4.15 ns/B     229.5 MiB/s     15.37 c/B
        CBC enc |      4.61 ns/B     206.8 MiB/s     17.06 c/B
        CBC dec |      1.75 ns/B     544.0 MiB/s      6.49 c/B
        CFB enc |      4.52 ns/B     211.0 MiB/s     16.72 c/B
        CFB dec |      1.72 ns/B     554.1 MiB/s      6.37 c/B
        OFB enc |      4.27 ns/B     223.3 MiB/s     15.80 c/B
        OFB dec |      4.28 ns/B     222.7 MiB/s     15.84 c/B
        CTR enc |      1.73 ns/B     549.9 MiB/s      6.42 c/B
        CTR dec |      1.75 ns/B     545.1 MiB/s      6.47 c/B
        CCM enc |      6.31 ns/B     151.2 MiB/s     23.34 c/B
        CCM dec |      6.42 ns/B     148.5 MiB/s     23.76 c/B
       CCM auth |      4.56 ns/B     208.9 MiB/s     16.89 c/B
        GCM enc |      1.90 ns/B     502.8 MiB/s      7.02 c/B
        GCM dec |      2.00 ns/B     477.8 MiB/s      7.38 c/B
       GCM auth |     0.300 ns/B    3178.6 MiB/s      1.11 c/B
        OCB enc |      1.76 ns/B     542.2 MiB/s      6.51 c/B
        OCB dec |      1.76 ns/B     540.7 MiB/s      6.53 c/B
       OCB auth |      1.76 ns/B     542.8 MiB/s      6.50 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/Makefile.am
cipher/twofish-avx2-amd64.S [new file with mode: 0644]
cipher/twofish.c
configure.ac
src/g10lib.h
src/hwf-x86.c
src/hwfeatures.c