libgcrypt.git
Jussi Kivilinna [Fri, 31 May 2019 14:27:25 +0000 (17:27 +0300)]
GCM: move look-up table to .data section and unshare between processes

* cipher/cipher-gcm.c (ATTR_ALIGNED_64): New.
(gcmR): Move to 'gcm_table' structure.
(gcm_table): New structure for look-up table with counters before and
after.
(gcmR): New macro.
(prefetch_table): Handle input with length not multiple of 256.
(do_prefetch_tables): Modify pre- and post-table counters to unshare
look-up table pages between processes.
--

GnuPG-bug-id: 4541
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
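The idea behind moving the table to .data can be sketched in C; the field and helper names below are illustrative, not the exact ones from cipher/cipher-gcm.c:

```c
#include <assert.h>
#include <stdint.h>

#define ATTR_ALIGNED_64 __attribute__ ((aligned (64)))

/* Counters placed before and after the look-up table share pages with
 * it; because they are written at run time, the whole object must live
 * in the writable .data section, and the first write triggers
 * copy-on-write so each process gets private table pages. */
static struct
{
  volatile uint32_t counter_head[16]; /* dirties the first page */
  uint64_t T[256][2];                 /* the GHASH look-up table */
  volatile uint32_t counter_tail[16]; /* dirties the last page   */
} gcm_table ATTR_ALIGNED_64;

static void
do_unshare_tables (void)
{
  /* One write per counter is enough to make the pages private. */
  gcm_table.counter_head[0]++;
  gcm_table.counter_tail[0]++;
}
```

Keeping the table in .rodata would let the kernel share identical read-only pages between processes, which is the cross-process cache-timing channel GnuPG-bug-id 4541 concerns.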
Jussi Kivilinna [Fri, 31 May 2019 14:18:09 +0000 (17:18 +0300)]
AES: move look-up tables to .data section and unshare between processes

* cipher/rijndael-internal.h (ATTR_ALIGNED_64): New.
* cipher/rijndael-tables.h (encT): Move to 'enc_tables' structure.
(enc_tables): New structure for encryption table with counters before
and after.
(encT): New macro.
(dec_tables): Add counters before and after encryption table; Move
from .rodata to .data section.
(do_encrypt): Change 'encT' to 'enc_tables.T'.
(do_decrypt): Change '&dec_tables' to 'dec_tables.T'.
* cipher/cipher-gcm.c (prefetch_table): Make inline; Handle input
with length not multiple of 256.
(prefetch_enc, prefetch_dec): Modify pre- and post-table counters
to unshare look-up table pages between processes.
--

GnuPG-bug-id: 4541
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 19 May 2019 11:33:07 +0000 (14:33 +0300)]
cipher/Makefile.am: add '-fcoverage-*' to instrumentation munging

* cipher/Makefile.am: Remove '-fcoverage-*' flag for mixed asm/C
i386+amd64 implementations.
--

The combination '-fprofile-instr-generate -fcoverage-mapping' was
causing a build error, as the former was removed by the munging while
the latter requires it.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 15 May 2019 17:31:08 +0000 (20:31 +0300)]
md: fix UBSAN warning

* cipher/md.c (gcry_md_list): Define 'context' as array of
PROPERLY_ALIGNED_TYPE.
(md_enable, _gcry_md_reset, _gcry_md_close, md_final, md_set_key)
(prepare_macpads, md_read, md_extract): Access md context through
'gcry_md_list->context' pointer instead of 'gcry_md_list->context.c'.
--

This commit fixes error output seen with undefined behavior sanitizer:
md.c:980:28: runtime error: index 184 out of bounds for type 'char [1]'
md.c:991:28: runtime error: index 368 out of bounds for type 'char [1]'
md.c:713:44: runtime error: index 184 out of bounds for type 'char [1]'
md.c:830:42: runtime error: index 368 out of bounds for type 'char [1]'

Issue was reported in dev.gnupg.org task T3247 and Cryptofuzz.

GnuPG-bug-id: 3247
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
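The undefined-behaviour pattern being fixed can be sketched as follows (struct and helper names are illustrative): indexing past a trailing 'char c[1]' member is what UBSAN flags; declaring the storage as an array of a maximally-aligned type and accessing it through a plain pointer avoids the out-of-bounds index while keeping the layout.

```c
#include <assert.h>
#include <stddef.h>

/* A maximally-aligned unit type, in the spirit of libgcrypt's
 * PROPERLY_ALIGNED_TYPE. */
typedef union
{
  long l;
  double d;
  void *p;
} PROPERLY_ALIGNED_TYPE;

/* Before the fix the context was a 'char c[1]' trailing member, and
 * expressions like 'r->context.c + off' indexed far past the declared
 * bound -- working in practice, but flagged by UBSAN.  Declaring the
 * storage as an aligned array and taking a pointer to the member
 * itself avoids the out-of-bounds index. */
struct digest_entry
{
  int algo;
  PROPERLY_ALIGNED_TYPE context[1]; /* real size chosen at malloc time */
};

static void *
entry_context (struct digest_entry *e)
{
  return e->context;
}
```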
Jussi Kivilinna [Tue, 14 May 2019 20:14:48 +0000 (23:14 +0300)]
Disable instrumentation on mixed Intel SSE C/assembly implementations

* cipher/Makefile.am: Make 'tiger.o' and 'tiger.lo' depend on Makefile;
Add instrumentation option munging.
* cipher/cipher-gcm-intel-pcmul.c (ALWAYS_INLINE)
(NO_INSTRUMENT_FUNCTION, ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE): New.
(reduction, gfmul_pclmul, gfmul_pclmul_aggr4, gfmul_pclmul_aggr8)
(gcm_lsh): Define with 'ASM_FUNC_ATTR_INLINE' instead of 'inline'.
(_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): Define with
'ASM_FUNC_ATTR'.
* cipher/crc-intel-pcmul.c (ALWAYS_INLINE, NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE): New.
(crc32_reflected_bulk, crc32_reflected_less_than_16, crc32_bulk)
(crc32_less_than_16): Define with 'ASM_FUNC_ATTR_INLINE' instead of
'inline'.
(_gcry_crc32_intel_pclmul, _gcry_crc24rfc2440_intel_pclmul): Define
with 'ASM_FUNC_ATTR'.
* cipher/rijndael-aesni.c (NO_INSTRUMENT_FUNCTION, ASM_FUNC_ATTR)
(ASM_FUNC_ATTR_INLINE, ASM_FUNC_ATTR_NOINLINE): New.
(aes_ocb_get_l, do_aesni_prepare_decryption, do_aesni_enc)
(do_aesni_dec, do_aesni_enc_vec4, do_aesni_dec_vec4, do_aesni_enc_vec8)
(do_aesni_dec_vec8, aesni_ocb_checksum): Define with
'ASM_FUNC_ATTR_INLINE' instead of 'inline'.
(do_aesni_ctr, do_aesni_ctr_4, do_aesni_ctr_8): Define with
'ASM_FUNC_ATTR_INLINE'.
(aesni_ocb_enc, aesni_ocb_dec): Define with 'ASM_FUNC_ATTR_NOINLINE'
instead of 'NO_INLINE'.
(_gcry_aes_aesni_do_setkey, _gcry_aes_aesni_prepare_decryption)
(_gcry_aes_aesni_encrypt, _gcry_aes_aesni_cfg_enc)
(_gcry_aes_aesni_cbc_enc, _gcry_aes_aesni_ctr_enc)
(_gcry_aes_aesni_decrypt, _gcry_aes_aesni_cfb_dec)
(_gcry_aes_aesni_cbc_dec, _gcry_aes_aesni_ocb_crypt)
(_gcry_aes_aesni_ocb_auth, _gcry_aes_aesni_xts_enc)
(_gcry_aes_aesni_xts_dec, _gcry_aes_aesni_xts_crypt): Define with
'ASM_FUNC_ATTR'.
* cipher/rijndael-ssse3-amd64.c (ALWAYS_INLINE, NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR, ASM_FUNC_ATTR_INLINE): New.
(aes_ocb_get_l, do_ssse3_prepare_decryption, do_vpaes_ssse3_enc)
(do_vpaes_ssse3_dec): Define with 'ASM_FUNC_ATTR_INLINE' instead of
'inline'.
(_gcry_aes_ssse3_do_setkey, _gcry_aes_ssse3_prepare_decryption)
(_gcry_aes_ssse3_encrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_dec)
(_gcry_aes_ssse3_cbc_dec, ssse3_ocb_enc, ssse3_ocb_dec)
(_gcry_aes_ssse3_ocb_crypt, _gcry_aes_ssse3_ocb_auth): Define with
'ASM_FUNC_ATTR'.
* cipher/sha1-intel-shaext.c (NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR): New.
(_gcry_sha1_transform_intel_shaext): Define with 'ASM_FUNC_ATTR'.
* cipher/sha256-intel-shaext.c (NO_INSTRUMENT_FUNCTION)
(ASM_FUNC_ATTR): New.
(_gcry_sha256_transform_intel_shaext): Define with 'ASM_FUNC_ATTR'.
* configure.ac (ENABLE_INSTRUMENTATION_MUNGING): New.
--

This commit disables instrumentation for mixed C/assembly implementations
for i386 and amd64 that make use of XMM registers. These implementations
use the C compiler as a thin assembly front-end and do not tolerate
instrumentation function calls inserted by the compiler, as those
functions may clobber the XMM registers.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
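The macros named above boil down to this sketch (the exact definitions in the tree may differ slightly):

```c
#include <assert.h>

/* Profiling/coverage instrumentation inserts calls such as
 * __cyg_profile_func_enter() that may clobber XMM registers, which the
 * surrounding inline assembly assumes are preserved -- so these
 * functions opt out of instrumentation entirely. */
#define NO_INSTRUMENT_FUNCTION __attribute__ ((no_instrument_function))
#define ALWAYS_INLINE __attribute__ ((always_inline))

#define ASM_FUNC_ATTR        NO_INSTRUMENT_FUNCTION
#define ASM_FUNC_ATTR_INLINE ASM_FUNC_ATTR ALWAYS_INLINE inline

static ASM_FUNC_ATTR_INLINE int
asm_style_add (int a, int b)
{
  /* Stand-in for a helper built around inline assembly. */
  return a + b;
}
```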
Jussi Kivilinna [Tue, 14 May 2019 16:43:08 +0000 (19:43 +0300)]
tests/basic: fix signed integer overflow

* tests/basic.c (check_ocb_cipher_largebuf_split): Cast to unsigned
when generating buffer values.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
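The overflow class being fixed looks like this sketch (the helper is hypothetical, not the actual tests/basic.c code): signed int arithmetic that can exceed INT_MAX is undefined behaviour, while the same computation done in unsigned wraps with defined modulo-2^N semantics.

```c
#include <assert.h>

/* Generating a pattern byte from a large index: doing 'i * 2 + off' in
 * signed int can overflow; casting to unsigned first makes the
 * arithmetic well defined. */
static unsigned char
pattern_byte (int i, int off)
{
  return (unsigned char) (((unsigned int) i * 2u + (unsigned int) off) & 0xff);
}
```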
Jussi Kivilinna [Tue, 14 May 2019 16:31:20 +0000 (19:31 +0300)]
tests: do not use GCC variadic macro extension for xgcry_control

* tests/t-common.h (xgcry_control): Use doubly nested parenthesis for
passing arguments for gcry_control instead of GCC specific variadic
macro extension.
* tests/aeswrap.c: Change xgcry_control to use doubly nested
parenthesis.
* tests/basic.c: Ditto.
* tests/bench-slope.c: Ditto.
* tests/benchmark.c: Ditto.
* tests/curves.c: Ditto.
* tests/dsa-rfc6979.c: Ditto.
* tests/fips186-dsa.c: Ditto.
* tests/fipsdrv.c: Ditto.
* tests/fipsrngdrv.c: Ditto.
* tests/gchash.c: Ditto.
* tests/hashtest.c: Ditto.
* tests/hmac.c: Ditto.
* tests/keygen.c: Ditto.
* tests/keygrip.c: Ditto.
* tests/mpitests.c: Ditto.
* tests/pkbench.c: Ditto.
* tests/pkcs1v2.c: Ditto.
* tests/prime.c: Ditto.
* tests/pubkey.c: Ditto.
* tests/random.c: Ditto.
* tests/rsacvt.c: Ditto.
* tests/t-convert.c: Ditto.
* tests/t-cv25519.c: Ditto.
* tests/t-ed25519.c: Ditto.
* tests/t-kdf.c: Ditto.
* tests/t-lock.c: Ditto.
* tests/t-mpi-bit.c: Ditto.
* tests/t-mpi-point.c: Ditto.
* tests/t-secmem.c: Ditto.
* tests/t-sexp.c: Ditto.
* tests/version.c: Ditto.
--

GnuPG-bug-id: 4499
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
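The doubly nested parenthesis trick works as in this sketch, with a hypothetical 'xctl' standing in for xgcry_control: the whole parenthesized argument list is passed as one macro argument and pasted after the wrapped function name, so no variadic macro extension is needed.

```c
#include <assert.h>
#include <stdarg.h>

static int last_cmd;

/* Stand-in for the function the macro wraps (gcry_control takes a
 * command enum plus command-specific arguments). */
static void
wrapped_control (int cmd, ...)
{
  last_cmd = cmd;
}

/* Invoked with double parenthesis: xctl ((CMD, ARG));
 * 'args' is "(CMD, ARG)" and expands to wrapped_control (CMD, ARG). */
#define xctl(args) wrapped_control args
```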
Jussi Kivilinna [Fri, 10 May 2019 19:33:57 +0000 (22:33 +0300)]
tests/basic: mark CFB and CFB8 as stream block cipher modes

* tests/basic.c (get_algo_mode_blklen): Return '1' for CFB and CFB8.
--

This commit marks CFB and CFB8 modes as stream ciphers so that they get
run through tests/basic.c's split-input-buffer testing (the input is
split and fed to the cipher in varying-sized parts).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 9 May 2019 18:43:52 +0000 (21:43 +0300)]
Fix message digest final function for MD4, MD5 and RMD160

* cipher/md4.c (md4_final): Use buffer offset '64 + 56' for bit count
on 'need one extra block' path.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (rmd160_final): Ditto.
* tests/basic.c (check_one_md_final): New.
(check_digest): Add new '*' test vectors and handle them with
check_one_md_final.
--

This commit fixes a bug introduced by commit "Optimizations for
digest final functions" (e76cd0e2b1f6025c1319576a5848815d1d231aeb)
in MD4, MD5 and RMD160, where the digest ended up being wrong for
input message sizes 64*x+56..64. The patch also adds a new test case
that runs the message digest algorithms with message lengths from 0
to 289.

Reported-by: Guido Vranken <guidovranken@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
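The corrected offset logic can be sketched as follows (illustrative helper; the real code writes the 64-bit bit count directly into a two-block context buffer):

```c
#include <assert.h>
#include <stddef.h>

/* MD4/MD5/RMD160 pad to 56 bytes mod 64, then append an 8-byte bit
 * count.  If the last block already holds 56..63 bytes, the padding
 * spills into an extra block and the count belongs at offset 64 + 56
 * of the two-block buffer.  The bug used offset 56 on that path,
 * corrupting digests for messages whose length % 64 falls in 56..63. */
static size_t
bitcount_offset (size_t msglen)
{
  size_t rem = msglen % 64;
  return (rem < 56) ? 56 : 64 + 56;
}
```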
Dmitry Eremin-Solenikov [Sat, 4 May 2019 21:37:03 +0000 (00:37 +0300)]
Fix carry overflow in Stribog in 512-bit addition

* cipher/stribog.c (transform_bits): Properly calculate carry flag.
* tests/basic.c (check_digests): Add two more test cases.
--

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
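A carry-correct 512-bit addition over 64-bit limbs can be sketched like this (little-endian limb order assumed; Stribog's transform_bits differs in detail). The subtlety the fix addresses is that the incoming carry can itself overflow the limb sum:

```c
#include <assert.h>
#include <stdint.h>

static void
add512 (uint64_t r[8], const uint64_t a[8], const uint64_t b[8])
{
  uint64_t carry = 0;
  int i;

  for (i = 0; i < 8; i++)
    {
      uint64_t s = a[i] + b[i];
      uint64_t c = (s < a[i]);     /* overflow of a + b */
      r[i] = s + carry;
      carry = c | (r[i] < s);      /* or overflow of adding the carry */
    }
}
```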
Jussi Kivilinna [Sat, 27 Apr 2019 20:15:27 +0000 (23:15 +0300)]
Add support for explicit_memset

* configure.ac: Add function check for 'explicit_memset'.
* src/misc.c (_gcry_fast_wipememory, _gcry_fast_wipememory2): Use
explicit_memset if available.
--

GnuPG-bug-id: D476
Reported-by: <devnexen@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
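The fallback pattern can be sketched as follows. explicit_memset() (a BSD function configure now probes for) is guaranteed not to be optimized away; where it is missing, a memset called through a volatile function pointer is one common way to achieve the same effect. The macro name here mirrors libgcrypt's wipememory:

```c
#include <assert.h>
#include <string.h>

#ifdef HAVE_EXPLICIT_MEMSET
# define wipememory(ptr, len) explicit_memset ((ptr), 0, (len))
#else
/* The compiler cannot prove the pointer still targets memset, so the
 * "dead store" of the wipe cannot be eliminated. */
static void *(*volatile memset_fn) (void *, int, size_t) = memset;
# define wipememory(ptr, len) memset_fn ((ptr), 0, (len))
#endif
```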
Jussi Kivilinna [Sat, 27 Apr 2019 19:35:43 +0000 (22:35 +0300)]
Fix CFI_PUSH/CFI_POP redefine build warning with AMD64 MPI

* mpi/amd64/func_abi.h: Move CFI macros into [__x86_64__] block.
* mpi/i386/syntax.h: Move CFI macros into [__i386__] block.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 27 Apr 2019 18:38:00 +0000 (21:38 +0300)]
Enable four block aggregated GCM Intel PCLMUL implementation on i386

* cipher/cipher-gcm-intel-pclmul.c (reduction): Change "%%xmm7" to
"%%xmm5".
(gfmul_pclmul_aggr4): Move outside [__x86_64__] block; Remove usage of
XMM8-XMM15 registers; Do not preload H-values and be_mask to reduce
register usage for i386.
(_gcry_ghash_setup_intel_pclmul): Enable calculation of H2, H3 and H4
on i386.
(_gcry_ghash_intel_pclmul): Adjust to above gfmul_pclmul_aggr4
changes; Move 'aggr4' code path outside [__x86_64__] block.
--

Benchmark on Intel Haswell (win32):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.446 ns/B      2140 MiB/s      1.78 c/B      3998

After (~2.38x faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.187 ns/B      5107 MiB/s     0.747 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 27 Apr 2019 16:33:28 +0000 (19:33 +0300)]
Prefetch GCM look-up tables

* cipher/cipher-gcm.c (prefetch_table, do_prefetch_tables)
(prefetch_tables): New.
(ghash_internal): Call prefetch_tables.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
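A minimal prefetcher in the spirit of this change (the stride and signature are assumptions): reading one byte per cache line pulls the whole table into cache before GHASH indexes into it, so data-dependent loads no longer produce clean hit/miss timing differences.

```c
#include <assert.h>
#include <stddef.h>

static void
prefetch_table (const void *tab, size_t len)
{
  const volatile unsigned char *vtab = tab;
  size_t i;

  /* Touch one byte per (assumed) 64-byte cache line... */
  for (i = 0; i < len; i += 64)
    (void) vtab[i];

  /* ...and the final byte, so lengths that are not a multiple of the
   * stride still have their last line loaded. */
  (void) vtab[len - 1];
}
```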
Jussi Kivilinna [Sat, 27 Apr 2019 19:03:31 +0000 (22:03 +0300)]
Optimizations for generic table-based GCM implementations

* cipher/cipher-gcm.c [GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[32..63] values.
[GCM_TABLES_USE_U64] (do_ghash): Split processing of two 64-bit halfs
of the input to two separate loops; Use precalculated M[] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_fillM): Precalculate
M[64..127] values.
[GCM_USE_TABLES && !GCM_TABLES_USE_U64] (do_ghash): Use precalculated
M[] values.
[GCM_USE_TABLES] (bshift): Avoid conditional execution for mask
calculation.
* cipher/cipher-internal.h (gcry_cipher_handle): Double gcm_table size.
--

Benchmark on Intel Haswell (amd64, --disable-hwf all):

 Before:
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      2.79 ns/B     341.3 MiB/s     11.17 c/B      3998

 After (~36% faster):
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      2.05 ns/B     464.7 MiB/s      8.20 c/B      3998

Benchmark on Intel Haswell (win32, --disable-hwf all):

 Before:
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      4.90 ns/B     194.8 MiB/s     19.57 c/B      3997

 After (~36% faster):
                     |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  GMAC_AES           |      3.58 ns/B     266.4 MiB/s     14.31 c/B      3999

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
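The branch-free mask trick behind the bshift change can be sketched as follows (the function shape is illustrative; 0xe1... is the standard GHASH reduction constant):

```c
#include <assert.h>
#include <stdint.h>

/* Instead of 'if (lsb) hi ^= R;', derive an all-ones/all-zeros mask
 * from the bit: 0 - 1 wraps to 0xffff...ffff, 0 - 0 stays 0.  The XOR
 * then executes unconditionally and the runtime no longer depends on
 * the secret bit. */
static uint64_t
bshift_step (uint64_t *hi, uint64_t lsb)
{
  uint64_t mask = 0 - (lsb & 1);
  *hi ^= mask & UINT64_C (0xe100000000000000);
  return mask;
}
```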
Jussi Kivilinna [Fri, 26 Apr 2019 16:29:32 +0000 (19:29 +0300)]
Optimizations for GCM Intel/PCLMUL implementation

* cipher/cipher-gcm-intel-pclmul.c (reduction): New.
(gfmul_pclmul): Fold the left shift into the pclmul operations; Use
'reduction' helper function.
[__x86_64__] (gfmul_pclmul_aggr4): Reorder instructions and adjust
register usage to free up registers; Use 'reduction' helper function;
Fold the left shift into the pclmul operations; Move loading of H
values and input from caller into this function.
[__x86_64__] (gfmul_pclmul_aggr8): New.
(gcm_lsh): New.
(_gcry_ghash_setup_intel_pclmul): Left-shift H values by one;
Preserve XMM6-XMM15 registers on WIN64.
(_gcry_ghash_intel_pclmul) [__x86_64__]: Use 8 block aggregated
reduction function.
--

Benchmark on Intel Haswell (amd64):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.206 ns/B      4624 MiB/s     0.825 c/B      3998

After (+50% faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 GMAC_AES           |     0.137 ns/B      6953 MiB/s     0.548 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 26 Apr 2019 16:29:19 +0000 (19:29 +0300)]
Move data pointer macro for 64-bit ARM assembly to common header

* cipher/asm-common-aarch64.h (GET_DATA_POINTER): New.
* cipher/chacha20-aarch64.S (GET_DATA_POINTER): Remove.
* cipher/cipher-gcm-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/crc-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/rijndael-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/sha1-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
* cipher/sha256-armv8-aarch64-ce.S (GET_DATA_POINTER): Remove.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 26 Apr 2019 16:29:08 +0000 (19:29 +0300)]
Add CFI unwind assembly directives for 64-bit ARM assembly

* cipher/asm-common-aarch64.h (CFI_STARTPROC, CFI_ENDPROC)
(CFI_REMEMBER_STATE, CFI_RESTORE_STATE, CFI_ADJUST_CFA_OFFSET)
(CFI_REL_OFFSET, CFI_DEF_CFA_REGISTER, CFI_REGISTER, CFI_RESTORE)
(DW_REGNO_SP, DW_SLEB128_7BIT, DW_SLEB128_28BIT, CFI_CFA_ON_STACK)
(CFI_REG_ON_STACK): New.
* cipher/camellia-aarch64.S: Add CFI directives.
* cipher/chacha20-aarch64.S: Add CFI directives.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Add CFI directives.
* cipher/crc-armv8-aarch64-ce.S: Add CFI directives.
* cipher/rijndael-aarch64.S: Add CFI directives.
* cipher/rijndael-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha1-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha256-armv8-aarch64-ce.S: Add CFI directives.
* cipher/twofish-aarch64.S: Add CFI directives.
* mpi/aarch64/mpih-add1.S: Add CFI directives.
* mpi/aarch64/mpih-mul1.S: Add CFI directives.
* mpi/aarch64/mpih-mul2.S: Add CFI directives.
* mpi/aarch64/mpih-mul3.S: Add CFI directives.
* mpi/aarch64/mpih-sub1.S: Add CFI directives.
* mpi/asm-common-aarch64.h: Include "../cipher/asm-common-aarch64.h".
(ELF): Remove.
--

This commit adds CFI directives that provide DWARF unwinding information,
allowing debuggers to produce backtraces through code in 64-bit ARM
assembly files.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 26 Apr 2019 16:28:11 +0000 (19:28 +0300)]
Add 64-bit ARMv8/CE PMULL implementation of CRC

* cipher/Makefile.am: Add 'crc-armv8-ce.c' and
'crc-armv8-aarch64-ce.S'.
* cipher/asm-common-aarch64.h [HAVE_GCC_ASM_CFI_DIRECTIVES]: Add CFI
helper macros.
* cipher/crc-armv8-aarch64-ce.S: New.
* cipher/crc-armv8-ce.c: New.
* cipher/crc.c (USE_ARM_PMULL): New.
(CRC_CONTEXT) [USE_ARM_PMULL]: Add 'use_pmull'.
[USE_ARM_PMULL] (_gcry_crc32_armv8_ce_pmull)
(_gcry_crc24rfc2440_armv8_ce_pmull): New prototypes.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init): Enable ARM PMULL
implementations if supported by HW features.
(crc32_write, crc24rfc2440_write) [USE_ARM_PMULL]: Use ARM PMULL
implementations if enabled.
* configure.ac: Add 'crc-armv8-ce.lo' and 'crc-armv8-aarch64-ce.lo'.
--

Benchmark on Cortex-A53 (at 1104 Mhz):

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC32RFC1510   |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC24RFC2440   |      2.72 ns/B     350.8 MiB/s      3.00 c/B

After (crc32 ~8.4x faster, crc24 ~6.8x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |     0.341 ns/B      2796 MiB/s     0.377 c/B
 CRC32RFC1510   |     0.342 ns/B      2792 MiB/s     0.377 c/B
 CRC24RFC2440   |     0.398 ns/B      2396 MiB/s     0.439 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
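The run-time dispatch pattern used by crc32_init and crc32_write can be sketched as follows (the HWF bit value and field layout are assumptions; 'use_pmull' is the flag named in the log above):

```c
#include <assert.h>
#include <stdint.h>

#define HWF_ARM_PMULL (1u << 5)   /* bit position assumed */

typedef struct
{
  uint32_t crc;
  unsigned int use_pmull;
} CRC_CONTEXT;

static void
crc32_init (CRC_CONTEXT *ctx, unsigned int hwfeatures)
{
  /* Record once at init whether the CPU exposes PMULL; the write path
   * then selects the accelerated implementation. */
  ctx->use_pmull = !!(hwfeatures & HWF_ARM_PMULL);
  ctx->crc = 0xffffffffu;  /* standard CRC32 pre-inversion */
}
```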
Jussi Kivilinna [Thu, 18 Apr 2019 16:23:26 +0000 (19:23 +0300)]
mpi: make stack unwinding work at i386 mpi functions

* mpi/i386/syntax.h: Include 'config.h'.
(CFI_STARTPROC, CFI_ENDPROC, CFI_ADJUST_CFA_OFFSET, CFI_REL_OFFSET)
(CFI_RESTORE, CFI_PUSH, CFI_POP): New.
* mpi/i386/mpih-add1.S: Add CFI directives.
* mpi/i386/mpih-lshift.S: Add CFI directives.
* mpi/i386/mpih-mul1.S: Add CFI directives.
* mpi/i386/mpih-mul2.S: Add CFI directives.
* mpi/i386/mpih-mul3.S: Add CFI directives.
* mpi/i386/mpih-rshift.S: Add CFI directives.
* mpi/i386/mpih-sub1.S: Add CFI directives.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 17 Apr 2019 21:20:42 +0000 (00:20 +0300)]
hwf-x86: make stack unwinding work at i386 cpuid functions

* src/hwf-x86.c (FORCE_FUNC_FRAME_POINTER): New.
[__i386__] (is_cpuid_available): Force use of stack frame pointer as
inline assembly modifies stack register; Add 'memory' constraint for
inline assembly.
[__i386__] (get_cpuid): Avoid push/pop instruction when preserving
%ebx register over cpuid.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 18 Apr 2019 15:53:35 +0000 (18:53 +0300)]
Limit and document Blowfish key lengths to 8-576 bits

* cipher/blowfish.c (BLOWFISH_KEY_MIN_BITS)
(BLOWFISH_KEY_MAX_BITS): New.
(do_bf_setkey): Check input key length to MIN_BITS and MAX_BITS.
* doc/gcrypt.texi: Update supported Blowfish key lengths.
* tests/basic.c (check_ecb_cipher): New, with Blowfish test vectors
for different key lengths.
(check_cipher_modes): Call 'check_ecb_cipher'.
--

As noted by Peter Wu, the Blowfish cipher implementation already supports
key lengths of 8 to 576 bits [1]. This change updates the documentation
to reflect that and adds new test vectors to check the handling of
different key lengths.

[1] https://lists.gnupg.org/pipermail/gcrypt-devel/2019-April/004680.html

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
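The added check amounts to this sketch (error handling simplified; libgcrypt would return GPG_ERR_INV_KEYLEN):

```c
#include <assert.h>

#define BLOWFISH_KEY_MIN_BITS 8
#define BLOWFISH_KEY_MAX_BITS 576

/* Reject key lengths outside the 8..576-bit range the S-box/P-array
 * key schedule actually supports. */
static int
bf_check_keylen (unsigned int keylen_bytes)
{
  unsigned int bits = keylen_bytes * 8;
  if (bits < BLOWFISH_KEY_MIN_BITS || bits > BLOWFISH_KEY_MAX_BITS)
    return -1; /* stands in for GPG_ERR_INV_KEYLEN */
  return 0;
}
```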
Jussi Kivilinna [Mon, 15 Apr 2019 16:46:53 +0000 (19:46 +0300)]
Add CFI unwind assembly directives for AMD64 assembly

* configure.ac (gcry_cv_gcc_asm_cfi_directives): New.
* cipher/asm-common-amd64.h (ADD_RIP, CFI_STARTPROC, CFI_ENDPROC)
(CFI_REMEMBER_STATE, CFI_RESTORE_STATE, CFI_ADJUST_CFA_OFFSET)
(CFI_REL_OFFSET, CFI_DEF_CFA_REGISTER, CFI_REGISTER, CFI_RESTORE)
(CFI_PUSH, CFI_POP, CFI_POP_TMP_REG, CFI_LEAVE, DW_REGNO)
(DW_SLEB128_7BIT, DW_SLEB128_28BIT, CFI_CFA_ON_STACK)
(CFI_REG_ON_STACK): New.
(ENTER_SYSV_FUNCPARAMS_0_4, EXIT_SYSV_FUNC): Add CFI directives.
* cipher/arcfour-amd64.S: Add CFI directives.
* cipher/blake2b-amd64-avx2.S: Add CFI directives.
* cipher/blake2s-amd64-avx.S: Add CFI directives.
* cipher/blowfish-amd64.S: Add CFI directives.
* cipher/camellia-aesni-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/camellia-aesni-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/cast5-amd64.S: Add CFI directives.
* cipher/chacha20-amd64-avx2.S: Add CFI directives.
* cipher/chacha20-amd64-ssse3.S: Add CFI directives.
* cipher/des-amd64.S: Add CFI directives.
* cipher/rijndael-amd64.S: Add CFI directives.
* cipher/rijndael-ssse3-amd64-asm.S: Add CFI directives.
* cipher/salsa20-amd64.S: Add CFI directives; Use 'asm-common-amd64.h'.
* cipher/serpent-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/serpent-sse2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/twofish-amd64.S: Add CFI directives.
* cipher/twofish-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/whirlpool-sse2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* mpi/amd64/func_abi.h: Include 'config.h'.
(CFI_STARTPROC, CFI_ENDPROC, CFI_ADJUST_CFA_OFFSET, CFI_REL_OFFSET)
(CFI_RESTORE, CFI_PUSH, CFI_POP): New.
(FUNC_ENTRY, FUNC_EXIT): Add CFI directives.
--

This commit adds CFI directives that provide DWARF unwinding information,
allowing debuggers to produce backtraces through code in AMD64 assembly
files.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Mon, 15 Apr 2019 19:09:24 +0000 (22:09 +0300)]
twofish-amd64: do not use xchg instruction

* cipher/twofish-amd64.S (g1g2_3): Swap ab and cd registers using
'movq' instructions instead of 'xchgq'.
--

Avoiding the xchg instruction improves three-block parallel performance
by ~3% on Intel Haswell.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 9 Apr 2019 17:04:19 +0000 (20:04 +0300)]
Use FreeBSD's elf_aux_info for detecting ARM HW features

* configure.ac: Add function check for 'elf_aux_info'.
* src/hwf-arm.c [HAVE_ELF_AUX_INFO]: Include 'sys/auxv.h'.
[HAVE_ELF_AUX_INFO && !HAVE_GETAUXVAL] (HAVE_GETAUXVAL)
(getauxval): New.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
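FreeBSD's elf_aux_info() fills a caller-provided buffer and returns an error code, so the commit wraps it to give it getauxval()'s return-value shape. The sketch below substitutes a mock for elf_aux_info so the wrapper can be exercised anywhere; the AT_HWCAP value 16 is conventional but an assumption here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock of FreeBSD's int elf_aux_info(int, void *, int): fills the
 * buffer and returns 0 on success. */
static int
mock_elf_aux_info (int type, void *buf, int buflen)
{
  unsigned long v = (type == 16 /* AT_HWCAP, assumed */) ? 0x1234UL : 0UL;
  if (buflen != (int) sizeof (unsigned long))
    return -1;
  memcpy (buf, &v, sizeof v);
  return 0;
}

/* getauxval()-shaped wrapper: returns the value, or 0 on failure. */
static unsigned long
my_getauxval (unsigned long type)
{
  unsigned long auxval = 0;
  if (mock_elf_aux_info ((int) type, &auxval, sizeof (auxval)))
    auxval = 0;
  return auxval;
}
```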
Jussi Kivilinna [Mon, 8 Apr 2019 17:44:08 +0000 (20:44 +0300)]
Use getauxval system function for detecting ARM HW features

* configure.ac: Add header check for 'sys/auxv.h'; Add function check
for 'getauxval'.
* src/hwf-arm.c [HAVE_SYS_AUXV_H && HAVE_GETAUXVAL]: Include
'sys/auxv.h'.
(HAS_SYS_AT_HWCAP): Enable AT_HWCAP if 'getauxval' is available, in
addition to __linux__.
(AT_HWCAP, AT_HWCAP2, HWCAP_NEON, HWCAP2_AES, HWCAP2_PMULL)
(HWCAP2_SHA1, HWCAP2_SHA2, HWCAP_ASIMD, HWCAP_AES)
(HWCAP_PMULL, HWCAP_SHA1, HWCAP_SHA2): Define these macros only if not
already defined.
(get_hwcap) [HAVE_SYS_AUXV_H && HAVE_GETAUXVAL]: Use 'getauxval' to
fetch HW capability flags.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Mon, 8 Apr 2019 14:32:36 +0000 (17:32 +0300)]
Disable SM3 in FIPS mode

* cipher/sm3.h (_gcry_digest_spec_sm3): Set flags.fips to zero.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 7 Apr 2019 14:53:19 +0000 (17:53 +0300)]
Tune SHA-512/AVX2 and SHA-256/AVX2 implementations

* cipher/sha256-avx2-bmi2-amd64.S (ONE_ROUND_PART1, ONE_ROUND_PART2)
(ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha256_transform_amd64_avx2): Exit early if the number of blocks
is zero; Write XFER to stack earlier and handle XFER writing in
FOUR_ROUNDS_AND_SCHED.
* cipher/sha512-avx2-bmi2-amd64.S (MASK_YMM_LO, MASK_YMM_LOx): New.
(ONE_ROUND_PART1, ONE_ROUND_PART2, ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha512_transform_amd64_avx2): Write XFER to stack earlier and
handle XFER writing in FOUR_ROUNDS_AND_SCHED.
--

Benchmark on Intel Haswell (4.0Ghz):

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.17 ns/B     439.0 MiB/s      8.68 c/B
 SHA512         |      1.56 ns/B     612.5 MiB/s      6.23 c/B

After (~4-6% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.05 ns/B     465.9 MiB/s      8.18 c/B
 SHA512         |      1.49 ns/B     640.3 MiB/s      5.95 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 17:10:32 +0000 (20:10 +0300)]
Add SHA512/224 and SHA512/256 algorithms

* cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping for SHA512/224
and SHA512/256.
(_gcry_mac_type_spec_hmac_sha512_256)
(_gcry_mac_type_spec_hmac_sha512_224): New.
* cipher/mac-internal.h (_gcry_mac_type_spec_hmac_sha512_256)
(_gcry_mac_type_spec_hmac_sha512_224): New.
* cipher/mac.c (mac_list, mac_list_algo101): Add SHA512/224 and
SHA512/256.
* cipher/md.c (digest_list, digest_list_algo301)
(prepare_macpads): Ditto.
* cipher/sha512.c (run_selftests): Ditto.
(sha512_init_common): Move common initialization here.
(sha512_init, sha384_init): Use common initialization function.
(sha512_224_init, sha512_256_init, _gcry_sha512_224_hash_buffer)
(_gcry_sha512_224_hash_buffers, _gcry_sha512_256_hash_buffer)
(_gcry_sha512_256_hash_buffers, selftests_sha512_224)
(selftests_sha512_256, sha512_224_asn, oid_spec_sha512_224)
(_gcry_digest_spec_sha512_224, sha512_256_asn, oid_spec_sha512_256)
(_gcry_digest_spec_sha512_256): New.
* doc/gcrypt.texi: Add SHA512/224 and SHA512/256; Add missing
HMAC-BLAKE2s and HMAC-BLAKE2b.
* src/cipher.h (_gcry_digest_spec_sha512_224)
(_gcry_digest_spec_sha512_256): New.
* src/gcrypt.h.in (GCRY_MD_SHA512_256, GCRY_MD_SHA512_224): New.
(GCRY_MAC_HMAC_SHA512_256, GCRY_MAC_HMAC_SHA512_224): New.
* tests/basic.c (check_digests): Add SHA512/224 and SHA512/256
test vectors.
--

This change adds truncated SHA512/224 and SHA512/256 algorithms
specified in FIPS 180-4.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 15:48:13 +0000 (18:48 +0300)]
Remove extra buffer flush at beginning of digest final functions

* cipher/md2.c (md2_final): Remove _gcry_md_block_write flush call
from entry.
* cipher/md4.c (md4_final): Ditto.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sha512.c (sha512_final): Ditto.
* cipher/sm3.c (sm3_final): Ditto.
* cipher/stribog.c (stribog_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 15:52:47 +0000 (18:52 +0300)]
Optimizations for digest final functions

* cipher/md4.c (md4_final): Avoid byte-by-byte buffer setting when
padding; Merge extra and last block processing.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sm3.c (sm3_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
* cipher/sha512.c (sha512_final): Avoid byte-by-byte buffer setting
when padding.
* cipher/stribog.c (stribog_final): Ditto.
* cipher/whirlpool.c (whirlpool_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 15:19:45 +0000 (18:19 +0300)]
tests/basic: add hash test for small block sizes

* tests/basic.c (check_one_md): Compare hashing buffers sizes from 1 to
129 as full buffer input and byte-by-byte input.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
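The invariant this test checks can be sketched with a toy streaming checksum standing in for the real digests: a correct implementation must produce identical results whether the input arrives in one write or byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t state; } toy_ctx;

static void toy_init (toy_ctx *c) { c->state = 0; }

static void
toy_write (toy_ctx *c, const unsigned char *p, size_t n)
{
  while (n--)
    c->state = c->state * 33 + *p++;
}

/* Hash the buffer in one call and byte-by-byte; return 1 if both
 * paths agree, as they must for any correct streaming hash. */
static int
split_matches_full (const unsigned char *buf, size_t len)
{
  toy_ctx full, split;
  size_t i;

  toy_init (&full);
  toy_write (&full, buf, len);

  toy_init (&split);
  for (i = 0; i < len; i++)
    toy_write (&split, buf + i, 1);

  return full.state == split.state;
}
```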
Jussi Kivilinna [Fri, 5 Apr 2019 14:38:39 +0000 (17:38 +0300)]
Burn stack in transform functions for SHA2 AMD64 implementations

* cipher/sha256-avx-amd64.S: Burn stack inside transform functions.
* cipher/sha256-avx2-bmi2-amd64.S: Ditto.
* cipher/sha256-ssse3-amd64.S: Ditto.
* cipher/sha512-avx-amd64.S: Ditto.
* cipher/sha512-avx2-bmi2-amd64.S: Ditto.
* cipher/sha512-ssse3-amd64.S: Ditto.
--

This change reduces per-call overhead for SHA256 & SHA512.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 14:37:42 +0000 (17:37 +0300)]
Burn stack in transform functions for SHA1 AMD64 implementations

* cipher/sha1-avx-amd64.S: Burn stack inside transform functions.
* cipher/sha1-avx-bmi2-amd64.S: Ditto.
* cipher/sha1-avx2-bmi2-amd64.S: Ditto.
* cipher/sha1-ssse3-amd64.S: Ditto.
--

This change reduces per-call overhead for SHA1.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 14:39:22 +0000 (17:39 +0300)]
Add AVX2/BMI2 implementation of SHA1

* cipher/Makefile.am: Add 'sha1-avx2-bmi2-amd64.S'.
* cipher/hash-common.h (MD_BLOCK_CTX_BUFFER_SIZE): New.
(gcry_md_block_ctx): Change buffer length to MD_BLOCK_CTX_BUFFER_SIZE.
* cipher/sha1-avx-amd64.S: Add missing .size for transform function.
* cipher/sha1-ssse3-amd64.S: Add missing .size for transform function.
* cipher/sha1-avx-bmi2-amd64.S: Add missing .size for transform
function; Tweak implementation for small ~1% speed increase.
* cipher/sha1-avx2-bmi2-amd64.S: New.
* cipher/sha1.c (USE_AVX2, _gcry_sha1_transform_amd64_avx2_bmi2)
(do_sha1_transform_amd64_avx2_bmi2): New.
(sha1_init) [USE_AVX2]: Enable AVX2 implementation if supported by
HW features.
(sha1_final): Merge processing of two last blocks when extra block is
needed.
--

Benchmarks on Intel Haswell (4.0 Ghz):

Before (AVX/BMI2):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.970 ns/B     983.2 MiB/s      3.88 c/B

After (AVX/BMI2, ~1% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.960 ns/B     993.1 MiB/s      3.84 c/B

After (AVX2/BMI2, ~9% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.890 ns/B      1071 MiB/s      3.56 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoblowfish: add three rounds parallel handling to generic C implementation
Jussi Kivilinna [Sun, 31 Mar 2019 15:30:25 +0000 (18:30 +0300)]
blowfish: add three rounds parallel handling to generic C implementation

* cipher/blowfish.c (BLOWFISH_ROUNDS): Remove.
[BLOWFISH_ROUNDS != 16] (function_F): Remove.
(F): Replace big-endian and little-endian version with single
endian-neutral version.
(R3, do_encrypt_3, do_decrypt_3): New.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Use new three block functions.
--
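
An endian-neutral F extracts the four S-box indices from the 32-bit input purely with shifts and masks, so the same code is correct on both byte orders. A sketch with a trivial stand-in for the S-boxes (not the real Blowfish tables):

```c
#include <stdint.h>

/* Stand-in S-box: a cheap deterministic function, NOT real Blowfish. */
static uint32_t sbox (int which, uint32_t v)
{
  return (v * 2654435761u) ^ (uint32_t) which;
}

/* Endian-neutral Blowfish-style F: bytes are extracted from the 32-bit
   word by shifting, so no byte-order-specific union/pointer tricks are
   needed.  All additions are naturally mod 2^32. */
static uint32_t F_sketch (uint32_t x)
{
  uint32_t a = (x >> 24) & 0xff, b = (x >> 16) & 0xff;
  uint32_t c = (x >> 8) & 0xff,  d = x & 0xff;
  return ((sbox (0, a) + sbox (1, b)) ^ sbox (2, c)) + sbox (3, d);
}
```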

Benchmark on aarch64 (cortex-a53, 816 Mhz):

Before:
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     29.58 ns/B     32.24 MiB/s     24.13 c/B
        CFB dec |     33.38 ns/B     28.57 MiB/s     27.24 c/B
        CTR enc |     34.18 ns/B     27.90 MiB/s     27.89 c/B
After (~60%-70% faster):
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     18.18 ns/B     52.45 MiB/s     14.84 c/B
        CFB dec |     19.67 ns/B     48.50 MiB/s     16.05 c/B
        CTR enc |     19.77 ns/B     48.25 MiB/s     16.13 c/B

Benchmark on i386 (haswell, 4000 Mhz):

Before:
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      6.10 ns/B     156.4 MiB/s     24.39 c/B
        CFB dec |      6.39 ns/B     149.2 MiB/s     25.56 c/B
        CTR enc |      6.73 ns/B     141.6 MiB/s     26.93 c/B
After (~80% faster):
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      3.46 ns/B     275.5 MiB/s     13.85 c/B
        CFB dec |      3.53 ns/B     270.4 MiB/s     14.11 c/B
        CTR enc |      3.56 ns/B     268.0 MiB/s     14.23 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agocast5: add three rounds parallel handling to generic C implementation
Jussi Kivilinna [Sun, 31 Mar 2019 15:26:58 +0000 (18:26 +0300)]
cast5: add three rounds parallel handling to generic C implementation

* cipher/cast5.c (do_encrypt_block_3, do_decrypt_block_3): New.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Use
new three block functions.
--

Benchmark on aarch64 (cortex-a53, 816 Mhz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     35.24 ns/B     27.07 MiB/s     28.75 c/B
        CFB dec |     34.62 ns/B     27.54 MiB/s     28.25 c/B
        CTR enc |     35.39 ns/B     26.95 MiB/s     28.88 c/B
After (~40%-50% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     23.05 ns/B     41.38 MiB/s     18.81 c/B
        CFB dec |     24.49 ns/B     38.94 MiB/s     19.98 c/B
        CTR enc |     24.57 ns/B     38.82 MiB/s     20.05 c/B

Benchmark on i386 (haswell, 4000 Mhz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      6.92 ns/B     137.7 MiB/s     27.69 c/B
        CFB dec |      6.83 ns/B     139.7 MiB/s     27.32 c/B
        CTR enc |      7.01 ns/B     136.1 MiB/s     28.03 c/B
After (~70% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      3.97 ns/B     240.1 MiB/s     15.89 c/B
        CFB dec |      3.96 ns/B     241.0 MiB/s     15.83 c/B
        CTR enc |      4.01 ns/B     237.8 MiB/s     16.04 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agocast5: read Kr four blocks at a time and shift for current round
Jussi Kivilinna [Sun, 31 Mar 2019 15:25:04 +0000 (18:25 +0300)]
cast5: read Kr four blocks at a time and shift for current round

* cipher/cast5.c (do_encrypt_block, do_decrypt_block): Read Kr as
32-bit words instead of bytes and shift value for each round.
--
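
The access-pattern change can be sketched as follows: the four per-round rotation amounts are fetched as one 32-bit word and shifted down one byte per round. This is a hypothetical illustration of the idea, not the real CAST5 key-schedule layout:

```c
#include <stdint.h>

/* Instead of one byte load per round, load four 5-bit rotation amounts
   packed into a 32-bit word and shift the word down as rounds proceed.
   The rotation body is a plain rotate-left, standing in for the real
   CAST5 round function. */
static uint32_t rounds_from_packed (uint32_t x, uint32_t kr_word)
{
  int i;
  for (i = 0; i < 4; i++)
    {
      unsigned amount = (kr_word >> 24) & 31;        /* this round's Kr */
      x = (x << amount) | (x >> ((32 - amount) & 31)); /* rotl, safe for 0 */
      kr_word <<= 8;                                 /* advance one round */
    }
  return x;
}
```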

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoAdd helper function for adding value to cipher block
Jussi Kivilinna [Sun, 31 Mar 2019 15:21:20 +0000 (18:21 +0300)]
Add helper function for adding value to cipher block

* cipher/cipher-internal.h (cipher_block_add): New.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc): Use new helper function
for CTR block increment.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/des.c (_gcry_3des_ctr_enc): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc): Ditto.
--
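
A helper of this kind treats the counter as a 128-bit big-endian integer and propagates the carry byte by byte. A hypothetical byte-wise sketch (the real cipher_block_add may work on wider words for speed):

```c
#include <stdint.h>

/* Add a value to a 16-byte big-endian counter block, propagating the
   carry from the least significant byte upward. */
static void block_add_be (unsigned char block[16], unsigned int add)
{
  unsigned int carry = add;
  int i;
  for (i = 15; i >= 0 && carry; i--)
    {
      carry += block[i];
      block[i] = carry & 0xff;
      carry >>= 8;
    }
}

/* Tiny self-check: start from zero, add twice, read back the low bytes. */
static unsigned int block_add_demo (unsigned int a, unsigned int b)
{
  unsigned char blk[16] = { 0 };
  block_add_be (blk, a);
  block_add_be (blk, b);
  return ((unsigned int) blk[13] << 16)
         | ((unsigned int) blk[14] << 8) | blk[15];
}
```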

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoOptimize OCB set_key and set_nonce
Jussi Kivilinna [Thu, 28 Mar 2019 21:25:52 +0000 (23:25 +0200)]
Optimize OCB set_key and set_nonce

* cipher/cipher-ocb.c (double_block): Change to input/output
host-endian block instead of big-endian buffer.
(double_block_cpy): Remove.
(bit_copy): Use fixed length copy and 'u64' for calculations.
(ocb_get_L_big): Handle block endian conversions for double_block.
(_gcry_cipher_ocb_setkey): Handle block endian conversions for
double_block.
(_gcry_cipher_ocb_set_nonce): Set full length of 'ktop' to zero; Drop
length parameter for bit_copy.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoAES-NI/OCB: Optimize last and first key XORing
Jussi Kivilinna [Thu, 28 Mar 2019 18:49:37 +0000 (20:49 +0200)]
AES-NI/OCB: Optimize last and first key XORing

* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
[__x86_64__]: Reorder and mix first and last key XORing with OCB offset
XOR operations.
--

OCB pre-XORing and post-XORing can be mixed and reordered with the
first- and last-round XORing of the AES cipher. This commit utilizes
this fact for additional optimization of AES-NI/OCB encryption
and decryption.
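
The reordering works because XOR is associative and commutative: the OCB offset whitening and the AES AddRoundKey XOR can be folded into a single value and applied once. A minimal sketch of the identity (hypothetical helper names, a 64-bit lane standing in for a 128-bit block):

```c
#include <stdint.h>

/* OCB pre-whitening first, then AES AddRoundKey: two separate XORs. */
static uint64_t first_round_separate (uint64_t p, uint64_t offset, uint64_t k0)
{
  return (p ^ offset) ^ k0;
}

/* Offset and round key folded into one value, XORed once. */
static uint64_t first_round_mixed (uint64_t p, uint64_t offset, uint64_t k0)
{
  return p ^ (offset ^ k0);
}
```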

Benchmark on Intel Haswell:

Before:
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998
  OCB dec |     0.170 ns/B      5617 MiB/s     0.679 c/B      3998

After (enc ~11% faster, dec ~6% faster):
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.157 ns/B      6065 MiB/s     0.629 c/B      3998
  OCB dec |     0.160 ns/B      5956 MiB/s     0.640 c/B      3998

For reference, CTR:
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  CTR enc |     0.157 ns/B      6090 MiB/s     0.626 c/B      3998
  CTR dec |     0.157 ns/B      6092 MiB/s     0.626 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoAES-NI/OCB: Perform checksumming inline with encryption
Jussi Kivilinna [Wed, 27 Mar 2019 21:10:31 +0000 (23:10 +0200)]
AES-NI/OCB: Perform checksumming inline with encryption

* cipher/rijndael-aesni.c (aesni_ocb_enc): Remove call to
'aesni_ocb_checksum', instead perform checksumming inline with offset
calculations.
--

This patch reverts the OCB checksumming split for encryption to avoid
performance issue seen on Intel CPUs.

Commit b42de67f34 "Optimizations for AES-NI OCB" changed the AES-NI/OCB
implementation to perform checksumming as a separate pass from
encryption and decryption. While this change improved performance for
buffer sizes of 16 to 4096 bytes (the buffer sizes used by
bench-slope), it introduced a performance anomaly for OCB encryption
on Intel processors. Below are large-buffer OCB encryption results on
Intel Haswell, where performance starts dropping for buffer sizes
larger than 32 KiB. Decryption does not suffer from the same
issue.

 MiB/s                Speed by Data Length (at 2 Ghz)
 2800 +-------------------------------------------------------------+
 2600 |-+  +          +       **.****.****+         +          +  +-|
      |                  **.**           *.****.****.****           |
 2400 |-+            *.**                               *.*****.****|
 2200 |-+         ***                                             +-|
 2000 |-+      *.*                                                +-|
      |       **                                                    |
 1800 |-+   **                                                    +-|
 1600 |-+ *.*                                                     +-|
 1400 |-+**                                                       +-|
      |**                                                           |
 1200 |*+  +          +         +         +         +          +  +-|
 1000 +-------------------------------------------------------------+
         1024       4096      16384     65536    262144     1048576
                           Data Length in Bytes

I've tested and reproduced this issue on Intel Ivy-Bridge, Haswell
and Skylake processors. Same performance drop on large buffers is not
seen on AMD Ryzen. Below is OCB decryption speed plot from Haswell for
reference, showing expected performance curve over increasing buffer
sizes.

 MiB/s                Speed by Data Length (at 2 Ghz)
 2800 +-------------------------------------------------------------+
 2600 |-+  +          +       **.****.****.****.****.****.*****.****|
      |                  **.**                                      |
 2400 |-+            *.**                                         +-|
 2200 |-+         ***                                             +-|
 2000 |-+      *.*                                                +-|
      |       **                                                    |
 1800 |-+   **                                                    +-|
 1600 |-+ *.*                                                     +-|
 1400 |-+**                                                       +-|
      |**                                                           |
 1200 |*+  +          +         +         +         +          +  +-|
 1000 +-------------------------------------------------------------+
         1024       4096      16384     65536    262144     1048576
                           Data Length in Bytes

After this patch, bench-slope shows a ~2% reduction in performance on
Intel Haswell:

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.171 ns/B      5581 MiB/s     0.683 c/B      3998

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoAES-NI/OCB: Use stack for temporary storage
Jussi Kivilinna [Wed, 27 Mar 2019 21:50:07 +0000 (23:50 +0200)]
AES-NI/OCB: Use stack for temporary storage

* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec): Use stack
allocated 'tmpbuf' instead of output buffer as temporary storage.
--

This change gives a (very) small performance improvement (~0.5%) when
the output buffer is unaligned.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agotests/basic: add large buffer testing for ciphers
Jussi Kivilinna [Tue, 26 Mar 2019 17:28:50 +0000 (19:28 +0200)]
tests/basic: add large buffer testing for ciphers

* tests/basic.c (check_one_cipher_core): Allocate buffers from heap.
(check_one_cipher): Add testing with large buffer (~65 KiB) in addition
to medium size buffer (~2 KiB).
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agochacha20-poly1305: fix wrong en/decryption on large input buffers
Jussi Kivilinna [Tue, 26 Mar 2019 17:27:00 +0000 (19:27 +0200)]
chacha20-poly1305: fix wrong en/decryption on large input buffers

* cipher/chacha20.c (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): Correctly use 'currlen' for chacha20
on the non-stitched code path.
--

This patch fixes a bug that was introduced by commit:
  "Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations"
  d6330dfb4b0e9fb3f8eef65ea13146060b804a97

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agodoc: add mention of aligning data to cachelines for best performance
Jussi Kivilinna [Sun, 24 Mar 2019 08:49:29 +0000 (10:49 +0200)]
doc: add mention of aligning data to cachelines for best performance

* doc/gcrypt.texi: Add mention of aligning data to cachelines for
best performance.
--

GnuPG-bug-id: 2388
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agorandom-drbg: do not use calloc for zero ctr
Jussi Kivilinna [Sun, 24 Mar 2019 08:23:34 +0000 (10:23 +0200)]
random-drbg: do not use calloc for zero ctr

* random/random-drbg.c (DRBG_CTR_NULL_LEN): Move to 'constants'
section.
(drbg_state_s): Remove 'ctr_null' member.
(drbg_ctr_generate): Add 'drbg_ctr_null'.
(drbg_sym_fini, drbg_sym_init): Remove 'drbg->ctr_null' usage.
--

GnuPG-bug-id: 3878
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agoAdd ARMv7/NEON accelerated GCM implementation
Jussi Kivilinna [Sat, 23 Mar 2019 14:15:49 +0000 (16:15 +0200)]
Add ARMv7/NEON accelerated GCM implementation

* cipher/Makefile.am: Add 'cipher-gcm-armv7-neon.S'.
* cipher/cipher-gcm-armv7-neon.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_NEON] (_gcry_ghash_setup_armv7_neon)
(_gcry_ghash_armv7_neon, ghash_setup_armv7_neon)
(ghash_armv7_neon): New.
(setupM) [GCM_USE_ARM_NEON]: Use armv7/neon implementation if have
HWF_ARM_NEON.
* cipher/cipher-internal.h (GCM_USE_ARM_NEON): New.
--

Benchmark on Cortex-A53 (816 Mhz):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES           |     34.81 ns/B     27.40 MiB/s     28.41 c/B

After (3.0x faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES           |     11.49 ns/B     82.99 MiB/s      9.38 c/B

Reported-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agoUse memset instead of setting buffers byte by byte
Jussi Kivilinna [Thu, 21 Mar 2019 18:43:46 +0000 (20:43 +0200)]
Use memset instead of setting buffers byte by byte

* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer setting loop with memset call.
* cipher/cipher-gcm.c (do_ghash_buf): Ditto.
* cipher/poly1305.c (poly1305_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agoUse buf_cpy instead of copying buffers byte by byte
Jussi Kivilinna [Thu, 21 Mar 2019 18:42:28 +0000 (20:42 +0200)]
Use buf_cpy instead of copying buffers byte by byte

* cipher/bufhelp.h (buf_cpy): Skip memcpy if length is zero.
* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer copy loops with buf_cpy call.
* cipher/cipher-cmac.c (_gcry_cmac_write): Ditto.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agoReduce overhead on generic hash write function
Jussi Kivilinna [Thu, 21 Mar 2019 17:43:05 +0000 (19:43 +0200)]
Reduce overhead on generic hash write function

* cipher/hash-common.c (_gcry_md_block_write): Remove recursive
function call; use buf_cpy for copying buffers; burn stack only once.
--
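
The de-recursed write loop has three phases: top up any partial block, transform full blocks directly from the input, and stash the tail. A sketch with a counting stand-in for the transform function (names and block size are illustrative, not libgcrypt's actual structures):

```c
#include <stddef.h>
#include <string.h>

enum { BLKLEN = 64 };

struct md_sketch { unsigned char buf[BLKLEN]; size_t count; size_t nblocks; };

/* Stand-in transform: just counts invocations. */
static void transform (struct md_sketch *h, const unsigned char *data)
{
  (void) data;
  h->nblocks++;
}

static void block_write (struct md_sketch *h, const unsigned char *in,
                         size_t len)
{
  if (h->count)                              /* top up a partial block */
    {
      size_t n = BLKLEN - h->count;
      if (n > len)
        n = len;
      memcpy (h->buf + h->count, in, n);
      h->count += n; in += n; len -= n;
      if (h->count == BLKLEN)
        { transform (h, h->buf); h->count = 0; }
    }
  while (len >= BLKLEN)                      /* full blocks from input */
    { transform (h, in); in += BLKLEN; len -= BLKLEN; }
  memcpy (h->buf, in, len);                  /* stash the tail */
  h->count += len;
}

/* Self-check: two writes, return blocks*1000 + leftover byte count. */
static size_t write_demo (size_t a, size_t b)
{
  struct md_sketch h = { { 0 }, 0, 0 };
  unsigned char d[256] = { 0 };
  block_write (&h, d, a);
  block_write (&h, d, b);
  return h.nblocks * 1000 + h.count;
}
```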

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agosha1-avx: use vmovdqa instead of movdqa
Jussi Kivilinna [Tue, 19 Mar 2019 20:02:28 +0000 (22:02 +0200)]
sha1-avx: use vmovdqa instead of movdqa

* cipher/sha1-avx-amd64.S: Replace 'movdqa' with 'vmovdqa'.
* cipher/sha1-avx-bmi2-amd64.S: Replace 'movdqa' with 'vmovdqa'.
--

Replace the SSE instruction 'movdqa' with the AVX instruction
'vmovdqa', as mixing SSE and AVX instructions can lead to bad
performance.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agodoc/gcrypt.texi: update HW feature list
Jussi Kivilinna [Tue, 19 Mar 2019 20:08:37 +0000 (22:08 +0200)]
doc/gcrypt.texi: update HW feature list

* doc/gcrypt.texi: Update HW feature list.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 months agoecc: Adjust debugging output
Daniel Kahn Gillmor [Wed, 20 Mar 2019 01:59:54 +0000 (21:59 -0400)]
ecc: Adjust debugging output

* cipher/ecc.c (ecc_check_secret_key): Adjust debugging output to use
full column titles.

--

Without this change, the debugging headers say "inf" and "nam".  With
this change, the alignment for all columns stays the same, but the
headers say "info" and "name", which are much more legible.

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
GnuPG-bug-id: 4414

6 months agofips: Only test check_binary_integrity when fips_mode is enabled.
NIIBE Yutaka [Mon, 25 Feb 2019 00:02:59 +0000 (09:02 +0900)]
fips: Only test check_binary_integrity when fips_mode is enabled.

* src/fips.c (_gcry_fips_run_selftests): Check the status of fips_mode
before calling check_binary_integrity.

--

GnuPG-bug-id: 4274
Reported-by: Pedro Monreal
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
7 months agoAdd 2-way path for SSSE3 version of ChaCha20
Jussi Kivilinna [Thu, 7 Feb 2019 18:50:02 +0000 (20:50 +0200)]
Add 2-way path for SSSE3 version of ChaCha20

* cipher/chacha20-amd64-ssse3.S (_gcry_chacha20_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): Add 2-way code paths.
* cipher/chacha20.c (_gcry_chacha20_poly1305_encrypt): Add
preprocessing of 2 blocks with SSSE3.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoDo not precalculate OCB offset L0+L1+L0
Jussi Kivilinna [Sun, 27 Jan 2019 10:55:22 +0000 (12:55 +0200)]
Do not precalculate OCB offset L0+L1+L0

* cipher/cipher-internal.h (gcry_cipher_handle): Remove OCB L0L1L0.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_setkey): Ditto.
* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth): Replace L0L1L0 use with L1.
--

This patch fixes the L0+L1+L0 thinko: L0 xor L1 xor L0 is the same as L1, since the two L0 XORs cancel.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoCalculate OCB L-tables when setting key instead of when setting nonce
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Calculate OCB L-tables when setting key instead of when setting nonce

* cipher/cipher-internal.h (gcry_cipher_handle): Mark areas of
u_mode.ocb that are and are not cleared by gcry_cipher_reset.
(_gcry_cipher_ocb_setkey): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Split
L-table generation to ...
(_gcry_cipher_ocb_setkey): ... this new function.
* cipher/cipher.c (cipher_setkey): Add handling for OCB mode.
(cipher_reset): Do not clear L-values for OCB mode.
--

OCB L-tables do not depend on the nonce value, only on the cipher key.
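
Each L-value is derived from the previous one by doubling in GF(2^128), which starts from E_K(0) and hence depends only on the key; this is what makes precomputation at set-key time possible. A host-endian sketch of the doubling step (hypothetical type and function names, not libgcrypt's actual double_block):

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128_t;

/* GF(2^128) doubling as used for OCB L-table generation: shift the
   128-bit value left by one bit and, if a bit shifted out of the top,
   XOR the reduction constant 0x87 into the low byte
   (polynomial x^128 + x^7 + x^2 + x + 1). */
static u128_t ocb_double (u128_t b)
{
  u128_t r;
  uint64_t carry = b.hi >> 63;           /* top bit that shifts out */
  r.hi = (b.hi << 1) | (b.lo >> 63);
  r.lo = (b.lo << 1) ^ (carry * 0x87);   /* conditional reduction */
  return r;
}
```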

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agochacha20-amd64-avx2: optimize output xoring
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
chacha20-amd64-avx2: optimize output xoring

* cipher/chacha20-amd64-avx2.S (STACK_TMP2): Remove.
(transpose_16byte_2x2, xor_src_dst): New.
(BUF_XOR_256_TO_128): Remove.
(_gcry_chaha20_amd64_avx2_blocks8)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): Replace
BUF_XOR_256_TO_128 with transpose_16byte_2x2/xor_src_dst; Reduce stack
usage; Better interleave chacha20 state merging and output xoring.
--

Benchmark on Intel i7-4790K:

Before:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |     0.314 ns/B      3035 MiB/s      1.26 c/B      3998
     STREAM dec |     0.314 ns/B      3037 MiB/s      1.26 c/B      3998
   POLY1305 enc |     0.451 ns/B      2117 MiB/s      1.80 c/B      3998
   POLY1305 dec |     0.441 ns/B      2162 MiB/s      1.76 c/B      3998

After:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |     0.309 ns/B      3086 MiB/s      1.24 c/B      3998
     STREAM dec |     0.309 ns/B      3083 MiB/s      1.24 c/B      3998
   POLY1305 enc |     0.445 ns/B      2141 MiB/s      1.78 c/B      3998
   POLY1305 dec |     0.436 ns/B      2188 MiB/s      1.74 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/bench-slope: prevent auto-mhz detection getting stuck
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/bench-slope: prevent auto-mhz detection getting stuck

* cipher/bench-slope.c (bench_ghz, bench_ghz_diff): New static
variables.
(AUTO_GHZ_TARGET_DIFF): New macro.
(do_slope_benchmark): Reduce target auto-mhz accuracy after
repeated failures.
(bench_print_result_csv, bench_print_result_std): Print auto-ghz
difference if 1 Mhz or more.
(do_slope_benchmark, bench_print_result_csv, bench_print_result_std)
(bench_print_result): Remove 'bench_ghz' parameter.
(cipher_bench_one, hash_bench_one, mac_bench_one)
(kdf_bench_one): Remove 'bench_ghz' variable.
--

This patch prevents auto-mhz detection from getting stuck on systems
with high load or unstable CPU frequency.
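
One way to guarantee termination is to relax the accuracy target after each failed measurement attempt, as the ChangeLog describes; a sketch with hypothetical constants (the real bench-slope values may differ):

```c
/* Compute the acceptable relative difference between consecutive
   auto-MHz measurements: start tight and double the tolerance after
   each failure, capped so detection always eventually succeeds. */
static double target_diff (int failures)
{
  double diff = 0.01;           /* initial target: 1% accuracy */
  while (failures-- > 0)
    diff *= 2;                  /* relax after each failed attempt */
  return diff > 0.5 ? 0.5 : diff;
}
```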

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/bench-slope: add missing cipher context reset
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/bench-slope: add missing cipher context reset

* tests/bench-slope.c (bench_encrypt_do_bench)
(bench_decrypt_do_bench): Add call to 'gcry_cipher_reset'.
--

Some non-AEAD results were negatively affected by the missing state
reset (~1% for aesni-ctr and chacha20-stream).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoAdd stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations

* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call
'_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call
'_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_cipher_poly1305_update_burn): New
prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
--

Benchmark on Intel Skylake (i5-6500, 3200 Mhz):

Before, 8-way AVX2:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.378 ns/B      2526 MiB/s      1.21 c/B
     STREAM dec |     0.373 ns/B      2560 MiB/s      1.19 c/B
   POLY1305 enc |     0.685 ns/B      1392 MiB/s      2.19 c/B
   POLY1305 dec |     0.686 ns/B      1390 MiB/s      2.20 c/B
  POLY1305 auth |     0.315 ns/B      3031 MiB/s      1.01 c/B

After, 8-way AVX2 (~36% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.503 ns/B      1896 MiB/s      1.61 c/B
   POLY1305 dec |     0.485 ns/B      1965 MiB/s      1.55 c/B

Benchmark on Intel Haswell (i7-4790K, 3998 Mhz):

Before, 8-way AVX2:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.318 ns/B      2999 MiB/s      1.27 c/B
     STREAM dec |     0.317 ns/B      3004 MiB/s      1.27 c/B
   POLY1305 enc |     0.586 ns/B      1627 MiB/s      2.34 c/B
   POLY1305 dec |     0.586 ns/B      1627 MiB/s      2.34 c/B
  POLY1305 auth |     0.271 ns/B      3524 MiB/s      1.08 c/B

After, 8-way AVX2 (~30% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.452 ns/B      2108 MiB/s      1.81 c/B
   POLY1305 dec |     0.440 ns/B      2167 MiB/s      1.76 c/B

Before, 4-way SSSE3:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.627 ns/B      1521 MiB/s      2.51 c/B
     STREAM dec |     0.626 ns/B      1523 MiB/s      2.50 c/B
   POLY1305 enc |     0.895 ns/B      1065 MiB/s      3.58 c/B
   POLY1305 dec |     0.896 ns/B      1064 MiB/s      3.58 c/B
  POLY1305 auth |     0.271 ns/B      3521 MiB/s      1.08 c/B

After, 4-way SSSE3 (~20% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.733 ns/B      1301 MiB/s      2.93 c/B
   POLY1305 dec |     0.726 ns/B      1314 MiB/s      2.90 c/B

Before, 1-way SSSE3:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |      1.56 ns/B     609.6 MiB/s      6.25 c/B
   POLY1305 dec |      1.56 ns/B     609.4 MiB/s      6.26 c/B

After, 1-way SSSE3 (~18% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |      1.31 ns/B     725.4 MiB/s      5.26 c/B
   POLY1305 dec |      1.31 ns/B     727.3 MiB/s      5.24 c/B

For comparison to other libraries (on Intel i7-4790K, 3998 Mhz):

bench-slope-openssl: OpenSSL 1.1.1  11 Sep 2018
Cipher:
 chacha20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.301 ns/B    3166.4 MiB/s      1.20 c/B
     STREAM dec |     0.300 ns/B    3174.7 MiB/s      1.20 c/B
   POLY1305 enc |     0.463 ns/B    2060.6 MiB/s      1.85 c/B
   POLY1305 dec |     0.462 ns/B    2063.8 MiB/s      1.85 c/B
  POLY1305 auth |     0.162 ns/B    5899.3 MiB/s     0.646 c/B

bench-slope-nettle: Nettle 3.4
Cipher:
 chacha         |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      1.65 ns/B     578.2 MiB/s      6.59 c/B
     STREAM dec |      1.65 ns/B     578.2 MiB/s      6.59 c/B
   POLY1305 enc |      2.05 ns/B     464.8 MiB/s      8.20 c/B
   POLY1305 dec |      2.05 ns/B     464.7 MiB/s      8.20 c/B
  POLY1305 auth |     0.404 ns/B    2359.1 MiB/s      1.62 c/B

bench-slope-botan: Botan 2.6.0
Cipher:
 ChaCha         |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc/dec |     0.855 ns/B    1116.0 MiB/s      3.42 c/B
   POLY1305 enc |      1.60 ns/B     595.4 MiB/s      6.40 c/B
   POLY1305 dec |      1.60 ns/B     595.8 MiB/s      6.40 c/B
  POLY1305 auth |     0.752 ns/B    1268.3 MiB/s      3.01 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoAdd SSSE3 optimized non-parallel ChaCha20 function
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Add SSSE3 optimized non-parallel ChaCha20 function

* cipher/chacha20-amd64-ssse3.S (ROTATE_SHUF, ROTATE, WORD_SHUF)
(QUARTERROUND4, _gcry_chacha20_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_amd64_ssse3_blocks1): New
prototype.
(chacha20_blocks): Rename to ...
(do_chacha20_blocks): ... this.
(chacha20_blocks): New.
(chacha20_encrypt_stream): Adjust for new chacha20_blocks function.
--

This patch provides an SSSE3-optimized version of the non-parallel
ChaCha20 core block function. On Intel Haswell the generic C function
runs at 6.9 cycles/byte; the new function runs at 5.2 cycles/byte,
thus being ~32% faster.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/basic: increase buffer size for check_one_cipher
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/basic: increase buffer size for check_one_cipher

* tests/basic.c (check_one_cipher_core)
(check_one_cipher): Increase buffer from 1040 to 1904 bytes.
--

This is for better test coverage of highly parallel cipher
implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/basic: check AEAD tags in check_one_cipher test
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/basic: check AEAD tags in check_one_cipher test

* tests/basic.c (get_algo_mode_taglen): New.
(check_one_cipher_core_reset): Check that tags are same with
AEAD modes.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agobuild: With LD_LIBRARY_PATH defined, use --disable-new-dtags.
NIIBE Yutaka [Tue, 15 Jan 2019 07:14:51 +0000 (16:14 +0900)]
build: With LD_LIBRARY_PATH defined, use --disable-new-dtags.

* configure.ac (LDADD_FOR_TESTS_KLUDGE): New for --disable-new-dtags.
* tests/Makefile.am (LDADD, t_lock_LDADD): Use LDADD_FOR_TESTS_KLUDGE.

--

GnuPG-bug-id: 4298
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agorandom: Fix previous commit for getentropy function.
NIIBE Yutaka [Tue, 15 Jan 2019 06:48:25 +0000 (15:48 +0900)]
random: Fix previous commit for getentropy function.

* random/rndlinux.c [__NR_getrandom] (_gcry_rndlinux_gather_random):
Check return value only for use of syscall.

--

The function returns 0 on success.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agorandom: Use getentropy when available on non-GNU/Linux systems.
NIIBE Yutaka [Tue, 15 Jan 2019 04:53:45 +0000 (13:53 +0900)]
random: Use getentropy when available on non-GNU/Linux systems.

* configure.ac: Detect getentropy.
* random/rndlinux.c [__linux__] (getentropy): Macro defined.
[HAVE_GETENTROPY] (_gcry_rndlinux_gather_random): Use getentropy.

--

GnuPG-bug-id: 4288
Reported-by: David Carlier
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agocamellia-aarch64: do not export look-up table globally
Jussi Kivilinna [Mon, 14 Jan 2019 20:14:24 +0000 (22:14 +0200)]
camellia-aarch64: do not export look-up table globally

* cipher/camellia-aarch64.S (_gcry_camellia_arm_tables): Remove
'.globl' export.
--

Reported-by: Martin Husemann <martin@NetBSD.org>
GnuPG-bug-id: 4317
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agoProcess CCM/EAX/GCM/Poly1305 AEAD cipher modes input in 24 KiB chunks
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
Process CCM/EAX/GCM/Poly1305 AEAD cipher modes input in 24 KiB chunks

* cipher/cipher-ccm.c (_gcry_cipher_ccm_encrypt)
(_gcry_cipher_ccm_decrypt): Process data in 24 KiB chunks.
* cipher/cipher-eax.c (_gcry_cipher_eax_encrypt)
(_gcry_cipher_eax_decrypt): Ditto.
* cipher/cipher-gcm.c (_gcry_cipher_gcm_encrypt)
(_gcry_cipher_gcm_decrypt): Ditto.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt): Ditto.
--

This patch changes AEAD modes to process input in 24 KiB chunks to
improve cache locality when processing large buffers.

The huge-buffer test in tests/benchmark shows a 0.7% improvement for
AES-CCM and AES-EAX, 6% for AES-GCM, and 4% for ChaCha20-Poly1305 on
an Intel Core i7-4790K.
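
The chunking arithmetic can be sketched as below; 24 KiB is small enough that a second (authentication) pass over the same chunk finds the data still cached. Helper names are hypothetical:

```c
#include <stddef.h>

enum { AEAD_CHUNK = 24 * 1024 };

/* Length of the next chunk to process: at most 24 KiB. */
static size_t next_chunk_len (size_t remaining)
{
  return remaining < AEAD_CHUNK ? remaining : AEAD_CHUNK;
}

/* Number of chunk iterations a buffer of 'len' bytes requires. */
static size_t count_passes (size_t len)
{
  size_t n = 0;
  while (len)
    {
      len -= next_chunk_len (len);
      n++;
    }
  return n;
}
```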

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agotests/benchmark: add Chacha20-Poly1305 benchmarking
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
tests/benchmark: add Chacha20-Poly1305 benchmarking

* tests/benchmark.c (cipher_bench): Add Chacha20-Poly1305.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agotests/benchmark: add --huge-buffers option for cipher tests
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
tests/benchmark: add --huge-buffers option for cipher tests

* tests/benchmark.c (huge_buffers, cipher_encrypt, cipher_decrypt): New.
(cipher_bench): Add 'max_inlen' to modes structure; add huge buffers
mode selection.
(main): Add '--huge-buffers'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
9 months agorandom: Add finalizer for rndjent.
NIIBE Yutaka [Wed, 19 Dec 2018 01:28:32 +0000 (10:28 +0900)]
random: Add finalizer for rndjent.

* random/rand-internal.h (_gcry_rndjent_fini): New.
* random/rndjent.c (_gcry_rndjent_fini): New.
* random/rndlinux.c (_gcry_rndlinux_gather_random): Call the finalizer
when GCRYCTL_CLOSE_RANDOM_DEVICE.

--

GnuPG-bug-id: 3731
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
9 months agosecmem: Prepare for easier debugging.
Werner Koch [Wed, 12 Dec 2018 07:34:10 +0000 (08:34 +0100)]
secmem: Prepare for easier debugging.

* src/secmem.c (_gcry_secmem_dump_stats): Factor code out to ...
(secmem_dump_stats_internal): New.
--

This makes it possible to insert calls to the dump function inside the
allocators during debug sessions, or to call secmem_dump_stats_internal
from gdb.

Signed-off-by: Werner Koch <wk@gnupg.org>
9 months agorijndael-aesni: interleave last CTR encryption round with xoring
Jussi Kivilinna [Sat, 1 Dec 2018 10:21:14 +0000 (12:21 +0200)]
rijndael-aesni: interleave last CTR encryption round with xoring

* cipher/rijndael-aesni.c (do_aesni_ctr_8): Interleave aesenclast
with input xoring.
--

The structure of the 'aesenclast' instruction allows reordering the
last encryption round and the xoring of the input block, for a small
~0.5% performance improvement.

Intel i7-4790K @ 4.0 GHz:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.159 ns/B      6002 MiB/s     0.636 c/B
        CTR dec |     0.159 ns/B      6001 MiB/s     0.636 c/B
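The reordering works because 'aesenclast' ends with a round-key XOR: for any last-round function f, f(state) ^ (key ^ block) == (f(state) ^ key) ^ block, so the input block can be folded into the round key before the final round runs. A toy model of the identity (f is an arbitrary stand-in, not the real SubBytes/ShiftRows):

```c
#include <stdint.h>

/* Arbitrary stand-in for the ShiftRows+SubBytes part of the last round. */
static uint32_t f(uint32_t s)
{
  return (s << 7 | s >> 25) * 2654435761u;
}

/* aesenclast-style last round: transform the state, then XOR the key. */
static uint32_t lastround(uint32_t state, uint32_t key)
{
  return f(state) ^ key;
}
```

Calling lastround(state, key ^ block) therefore yields keystream ^ block directly, which is exactly the CTR-mode output, without a separate XOR instruction after the round.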

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoUse explicit_bzero for wipememory
Jussi Kivilinna [Tue, 13 Nov 2018 20:08:50 +0000 (22:08 +0200)]
Use explicit_bzero for wipememory

* configure.ac (AC_CHECK_FUNCS): Check for 'explicit_bzero'.
* src/g10lib.h (wipememory2): Use _gcry_fast_wipememory if _SET is
zero.
(_gcry_fast_wipememory): New.
(_gcry_wipememory2): Rename to...
(_gcry_fast_wipememory2): ...this.
* src/misc.c (_gcry_wipememory): New.
(_gcry_wipememory2): Rename to...
(_gcry_fast_wipememory2): ...this.
(_gcry_fast_wipememory2) [HAVE_EXPLICIT_BZERO]: Use explicit_bzero if
SET is zero.
(_gcry_burn_stack): Use _gcry_fast_wipememory.
--
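The wipe strategy described above can be sketched as follows (names are illustrative, and HAVE_EXPLICIT_BZERO mirrors the new configure check): prefer explicit_bzero() where available, otherwise call memset through a volatile function pointer so the compiler cannot optimize the "dead" store away.

```c
#include <string.h>

/* Volatile pointer defeats dead-store elimination on the fallback path. */
static void *(*const volatile memset_fn)(void *, int, size_t) = memset;

static void fast_wipememory(void *ptr, size_t len)
{
#ifdef HAVE_EXPLICIT_BZERO
  explicit_bzero(ptr, len);   /* guaranteed not to be optimized out */
#else
  memset_fn(ptr, 0, len);
#endif
}
```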

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoAdd clang target pragma for mixed C/assembly x86-64 implementations
Jussi Kivilinna [Tue, 20 Nov 2018 19:16:08 +0000 (21:16 +0200)]
Add clang target pragma for mixed C/assembly x86-64 implementations

* cipher/cipher-gcm-intel-pclmul.c: Add target 'no-sse' attribute
pragma for clang.
* cipher/crc-intel-pclmul.c: Ditto.
* cipher/rijndael-aesni.c: Ditto.
* cipher/rijndael-ssse3-amd64.c: Ditto.
* cipher/sha1-intel-shaext.c: Ditto.
* cipher/sha256-intel-shaext.c: Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoOptimizations for AES-NI OCB
Jussi Kivilinna [Tue, 20 Nov 2018 19:16:08 +0000 (21:16 +0200)]
Optimizations for AES-NI OCB

* cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB
values L0L1 and L0L1L0; Swap dimensions for OCB L table.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and
L0L1L0 values.
(ocb_crypt): Process input in 24 KiB chunks for better cache locality
for checksumming.
* cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always
inlining functions, change all functions with 'inline' to use
ALWAYS_INLINE.
(NO_INLINE): New macro.
(aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
(aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these and
adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
(aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
(aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust
accordingly.
(aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
(aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust
accordingly.
(aesni_ocb_checksum): New.
(aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum
calculation as separate pass instead of inline; Use NO_INLINE.
(_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0.
* cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add
'use_avx2' and 'use_avx'.
* cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if
Intel AVX2 HW feature is available and 'use_avx' if Intel AVX HW
feature is available.
* tests/basic.c (do_check_ocb_cipher): New test vector; increase
size of temporary buffers for new test vector.
(check_ocb_cipher_largebuf_split): Make test plaintext non-uniform
for better checksum testing.
(check_ocb_cipher_checksum): New.
(check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
(check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf
test runs.
--

Benchmark on Haswell i7-4790K @ 4.0 GHz:

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        OCB enc |     0.175 ns/B      5436 MiB/s     0.702 c/B
        OCB dec |     0.184 ns/B      5184 MiB/s     0.736 c/B
       OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

After (enc +2% faster, dec +7% faster):
        OCB enc |     0.172 ns/B      5547 MiB/s     0.688 c/B
        OCB dec |     0.171 ns/B      5582 MiB/s     0.683 c/B
       OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B
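The offset parallelization can be illustrated like this (a sketch with made-up values, not the real 128-bit GF(2^128) code): OCB's serial chain is Offset_i = Offset_{i-1} ^ L[ntz(i)], but with pre-combined masks such as L0^L1 the offsets of a group of blocks can each be derived from the group's base value instead of from each other, shortening the dependency chain.

```c
#include <stdint.h>

/* Serial chain: Offset_i = Offset_{i-1} ^ L[ntz(i)], i is 1-based. */
static uint64_t next_offset(uint64_t prev, const uint64_t L[], unsigned i)
{
  return prev ^ L[__builtin_ctz(i)];  /* ntz(i) = count of trailing zeros */
}

/* Parallel derivation for blocks 4j+1..4j+4: the first three offsets
   depend only on the group base and pre-computed combined masks. */
static void offsets4(uint64_t base, const uint64_t L[], unsigned j,
                     uint64_t o[4])
{
  uint64_t L0L1   = L[0] ^ L[1];   /* pre-computed once per key */
  uint64_t L0L1L0 = L0L1 ^ L[0];
  o[0] = base ^ L[0];              /* ntz(4j+1) == 0 */
  o[1] = base ^ L0L1;              /* ntz(4j+2) == 1 */
  o[2] = base ^ L0L1L0;            /* ntz(4j+3) == 0 */
  o[3] = o[2] ^ L[__builtin_ctz(4 * j + 4)];  /* last one varies */
}
```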

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agodoc: Fix library initialization examples
Andreas Metzler [Sun, 18 Nov 2018 15:01:21 +0000 (16:01 +0100)]
doc: Fix library initialization examples

Signed-off-by: Andreas Metzler <ametzler@bebt.de>
10 months agorandom: Initialize variable as requested by valgrind
Werner Koch [Wed, 14 Nov 2018 13:14:23 +0000 (14:14 +0100)]
random: Initialize variable as requested by valgrind

* random/jitterentropy-base.c: Init.
--

The variable ec does not need initialization for proper functioning of
the analyzer code. However, valgrind complains about the uninitialized
variable. Thus, initialize it.

Original-repo: https://github.com/smuellerDD/jitterentropy-library.git
Original-commit: 9048af7f06fc1488904f54852e0a2f8da45a4745
Original-Author: Stephan Mueller <smueller@chronox.de>
Original-Date: Sun, 15 Jul 2018 19:14:02 +0200
Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agolibgcrypt.m4: Prefer gpgrt-config to SYSROOT support.
NIIBE Yutaka [Tue, 13 Nov 2018 01:30:39 +0000 (10:30 +0900)]
libgcrypt.m4: Prefer gpgrt-config to SYSROOT support.

* libgcrypt.m4: Move SYSROOT support after check of GPGRT_CONFIG.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update autogen.rc.
NIIBE Yutaka [Tue, 13 Nov 2018 00:36:37 +0000 (09:36 +0900)]
build: Update autogen.rc.

* autogen.rc: Remove obsolete --with-gpg-error-prefix option.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agoFix 'variable may be used uninitialized' warning for CTR mode
Jussi Kivilinna [Wed, 7 Nov 2018 17:12:29 +0000 (19:12 +0200)]
Fix 'variable may be used uninitialized' warning for CTR mode

* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Set N to BLOCKSIZE
before counter loop.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoFix inlining of ocb_get_l for x86 AES implementations
Jussi Kivilinna [Tue, 6 Nov 2018 18:27:34 +0000 (20:27 +0200)]
Fix inlining of ocb_get_l for x86 AES implementations

* cipher/rijndael-aesni.c (aes_ocb_get_l): New.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Use
'aes_ocb_get_l'.
* cipher/rijndael-ssse3-amd64.c (aes_ocb_get_l): New.
(ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Use
'aes_ocb_get_l'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agostdmem: free: only call _gcry_secmem_free if needed
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
stdmem: free: only call _gcry_secmem_free if needed

* src/stdmem.c (_gcry_private_free): Check if memory is secure before
calling _gcry_secmem_free to avoid unnecessarily taking secmem lock.
--

Unnecessarily taking the secmem lock for non-secure memory can result
in poor performance on multi-threaded workloads:
  https://lists.gnupg.org/pipermail/gcrypt-devel/2018-August/004535.html
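The shape of the fix can be sketched as follows (is_secure_ptr and secmem_free are illustrative stand-ins for _gcry_private_is_secure and _gcry_secmem_free): test the pointer first, and only enter the secure-memory allocator, and thus take its lock, when the pointer actually lies in a secure pool.

```c
#include <stdlib.h>

static int secure_frees;                 /* counts secmem_free calls */

/* Stand-in for _gcry_private_is_secure: here, first byte tags the pool. */
static int is_secure_ptr(const void *p)
{
  return ((const unsigned char *)p)[0] == 0xA5;
}

/* Stand-in for _gcry_secmem_free; the real one takes the secmem lock. */
static void secmem_free(void *p)
{
  secure_frees++;
  free(p);
}

static void private_free(void *p)
{
  if (!p)
    return;
  if (is_secure_ptr(p))
    secmem_free(p);  /* secure pool: lock is unavoidable */
  else
    free(p);         /* common case: no lock taken       */
}
```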

Reported-by: Christian Grothoff <grothoff@gnunet.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agosecmem: fix potential memory visibility issue
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
secmem: fix potential memory visibility issue

* configure.ac (gcry_cv_have_sync_synchronize): New check.
* src/secmem.c (pooldesc_s): Make next pointer volatile.
(memory_barrier): New.
(_gcry_secmem_malloc_internal): Insert memory barrier between
pool->next and mainpool.next assignments.
(_gcry_private_is_secure): Update comments.
--
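The publication pattern being fixed looks roughly like this (a sketch; HAVE_SYNC_SYNCHRONIZE mirrors the new configure check, and the struct is simplified): fully initialize the new pool, issue a full barrier, then link it into the list that other threads may walk without holding the lock.

```c
/* Full memory barrier, as probed for by the new configure check. */
static inline void memory_barrier(void)
{
#if defined(HAVE_SYNC_SYNCHRONIZE) || defined(__GNUC__)
  __sync_synchronize();
#endif
}

struct pool { struct pool *volatile next; int ready; };

static void publish_pool(struct pool *head, struct pool *p)
{
  p->ready = 1;       /* initialize everything first            */
  memory_barrier();   /* make the stores above visible first    */
  head->next = p;     /* only now can other threads discover p  */
}
```

Without the barrier, a compiler or CPU could reorder the link ahead of the initialization, letting another thread observe a half-built pool.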

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agowipememory: use memset for non-constant length or large buffer wipes
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
wipememory: use memset for non-constant length or large buffer wipes

* src/g10lib.h (CONSTANT_P): New.
(_gcry_wipememory2): New prototype.
(wipememory2): Use _gcry_wipememory2 if _len is not a constant
expression or the length is larger than 64 bytes.
(FASTWIPE_T, FASTWIPE_MULT, fast_wipememory2_unaligned_head): Remove.
(fast_wipememory2): Always handle buffer as unaligned.
* src/misc.c (__gcry_burn_stack): Move memset_ptr variable to...
(memset_ptr): ... here. New.
(_gcry_wipememory2): New.
--
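The dispatch can be sketched like this (names are illustrative; the real fast path is more elaborate): CONSTANT_P lets the preprocessor-visible macro keep the inline path only for small, compile-time-constant lengths, sending everything else to an out-of-line wipe that the compiler cannot elide.

```c
#include <string.h>

#define CONSTANT_P(v) __builtin_constant_p(v)

void slow_wipememory(void *ptr, int set, size_t len);  /* out-of-line */

#define wipememory2(ptr, set, len) \
  do { \
    if (!CONSTANT_P(len) || (len) > 64) \
      slow_wipememory((ptr), (set), (len)); \
    else \
      memset((ptr), (set), (len)); \
  } while (0)

/* Volatile stores keep the wipe from being optimized away. */
void slow_wipememory(void *ptr, int set, size_t len)
{
  volatile char *p = ptr;
  while (len--)
    *p++ = (char)set;
}
```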

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoChange buf_cpy and buf_xor* functions to use buf_put/buf_get helpers
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
Change buf_cpy and buf_xor* functions to use buf_put/buf_get helpers

* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS)
(bufhelp_int_s, buf_xor_1): Remove.
(buf_cpy, buf_xor, buf_xor_2dst, buf_xor_n_copy_2): Use
buf_put/buf_get helpers to handle unaligned memory accesses.
--
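A minimal sketch of such helpers (the real buf_get/buf_put family is per-endianness; this host-endian memcpy version only illustrates the unaligned-access idea): memcpy through a local temporary compiles to a single load/store on targets that permit unaligned access and to byte accesses elsewhere, with no undefined behavior from misaligned pointer casts.

```c
#include <stdint.h>
#include <string.h>

static inline uint32_t buf_get_he32(const void *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);   /* host endianness, any alignment */
  return v;
}

static inline void buf_put_he32(void *p, uint32_t v)
{
  memcpy(p, &v, sizeof v);
}

/* XOR of possibly unaligned buffers built on the helpers. */
static void buf_xor32(void *dst, const void *a, const void *b, size_t nwords)
{
  for (size_t i = 0; i < nwords; i++)
    buf_put_he32((char *)dst + 4 * i,
                 buf_get_he32((const char *)a + 4 * i) ^
                 buf_get_he32((const char *)b + 4 * i));
}
```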

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agorijndael: fix unused parameter warning
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
rijndael: fix unused parameter warning

* cipher/rijndael.c (do_setkey): Silence unused 'hd' warning.
--

This commit fixes the "unused parameter 'hd'" warning seen on
architectures that do not have alternative AES implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agompi/longlong.h: enable inline assembly for powerpc64
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
mpi/longlong.h: enable inline assembly for powerpc64

* mpi/longlong.h [__powerpc__ && W_TYPE_SIZE == 64]: Remove '#if 0'.
--

PowerPC64 inline assembly was tested on QEMU ('make check' pass).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoChange remaining users of _gcry_fips_mode to use fips_mode
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
Change remaining users of _gcry_fips_mode to use fips_mode

* src/fips.c (_gcry_fips_mode): Remove.
(_gcry_enforced_fips_mode, _gcry_inactivate_fips_mode)
(_gcry_is_fips_mode_inactive): Use fips_mode.
* src/g10lib.h (_gcry_fips_mode): Remove.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoaarch64: mpi: Distribute the header file as a part of source.
NIIBE Yutaka [Fri, 2 Nov 2018 09:54:02 +0000 (18:54 +0900)]
aarch64: mpi: Distribute the header file as a part of source.

* mpi/Makefile.am (EXTRA_libmpi_la_SOURCES): Add asm-common-aarch64.h.

--

Fixes-commit: ec0a2f25c0f64a7b65b373508ce9081e10461965
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Fix GCRYPT_HWF_MODULES.
NIIBE Yutaka [Fri, 2 Nov 2018 04:51:40 +0000 (13:51 +0900)]
build: Fix GCRYPT_HWF_MODULES.

* configure.ac (GCRYPT_HWF_MODULES): Add libgcrypt_la- prefix.

--

Before this change "make distcheck" fails because
src/.deps/hwf-x86.Plo remains.  Note that the distclean entry for the
file is libgcrypt_la-hwf-x86.Plo.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update gpg-error.m4 and libgcrypt.m4.
NIIBE Yutaka [Fri, 2 Nov 2018 03:06:11 +0000 (12:06 +0900)]
build: Update gpg-error.m4 and libgcrypt.m4.

* m4/gpg-error.m4: Update to 2018-11-02.
* src/libgcrypt.m4: Add AC_MSG_NOTICE.
Bump the version date.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update gpg-error.m4 and ksba.m4.
NIIBE Yutaka [Mon, 29 Oct 2018 03:51:19 +0000 (12:51 +0900)]
build: Update gpg-error.m4 and ksba.m4.

* m4/gpg-error.m4: Update to 2018-10-29.
* src/libgcrypt.m4: Follow the change of gpgrt-config.
Bump the version date.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agoFix missing global initialization in fips_is_operational
Jussi Kivilinna [Sat, 27 Oct 2018 12:48:29 +0000 (15:48 +0300)]
Fix missing global initialization in fips_is_operational

* src/g10lib.h (_gcry_global_any_init_done): New extern.
(fips_is_operational): Check for _gcry_global_any_init_done and call
_gcry_global_is_operational.
* src/global.c (any_init_done): Rename to ...
(_gcry_global_any_init_done): ... this and make externally available.
--

Commit b6e6ace324440f564df664e27f8276ef01f76795 "Add fast path for
_gcry_fips_is_operational" inadvertently replaced the call to
_gcry_global_is_operational with a call to _gcry_fips_is_operational
in the fips_is_operational macro. This could cause libgcrypt to miss
initialization. This patch restores the _gcry_global_is_operational
functionality in the fips_is_operational macro while keeping the fast
path that reduces call overhead for the gcry_* functions.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoMerge release info from 1.8.4
Werner Koch [Fri, 26 Oct 2018 18:04:44 +0000 (20:04 +0200)]
Merge release info from 1.8.4

--

10 months agorandom: use getrandom() on Linux where available
Daniel Kahn Gillmor [Wed, 5 Sep 2018 14:34:04 +0000 (10:34 -0400)]
random: use getrandom() on Linux where available

* random/rndlinux.c (_gcry_rndlinux_gather_random): Use the
getrandom() syscall on Linux if it exists, regardless of what kind of
entropy was requested.

--

This change avoids the serious usability problem of unnecessary
blocking on /dev/random when the kernel's PRNG is already seeded,
without introducing the risk of pulling from an uninitialized PRNG.
It only has an effect on Linux systems with a functioning getrandom()
syscall.  If that syscall is unavailable or fails, it should fall
through to the pre-existing behavior.
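The call pattern can be sketched as follows (fill_random is an illustrative name; assumes Linux with glibc 2.25+ for the getrandom() wrapper): try getrandom() first, and on ENOSYS signal the caller to fall through to the pre-existing /dev/random code path.

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom(), glibc >= 2.25 */
#include <unistd.h>

static ssize_t fill_random(void *buf, size_t len)
{
  /* Flags 0: blocks only until the kernel pool is initially seeded. */
  ssize_t n = getrandom(buf, len, 0);
  if (n < 0 && errno == ENOSYS)
    return -1;   /* old kernel: caller uses the device-file path */
  return n;
}
```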

GnuPG-bug-id: 3894
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
10 months agorandom: Make sure to re-open /dev/random after a fork
Werner Koch [Fri, 26 Oct 2018 11:22:16 +0000 (13:22 +0200)]
random: Make sure to re-open /dev/random after a fork

* random/rndlinux.c (_gcry_rndlinux_gather_random): Detect fork and
re-open devices.
--

This mitigates problems caused by ill-behaving software which closes
the standard fds but later dups them to /dev/null.
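The detection can be sketched like this (ensure_device and the statics are illustrative names): remember the pid that opened the device; when it no longer matches, a fork happened and the inherited fd can no longer be trusted, so drop it and re-open.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static int   rnd_fd = -1;
static pid_t rnd_pid;

static int ensure_device(const char *path)
{
  if (rnd_fd != -1 && getpid() != rnd_pid)
    {
      close(rnd_fd);   /* opened before the fork: do not trust it */
      rnd_fd = -1;
    }
  if (rnd_fd == -1)
    {
      rnd_fd = open(path, O_RDONLY);
      rnd_pid = getpid();
    }
  return rnd_fd;
}
```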

GnuPG-bug-id: 3491
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agoprimes: Avoid leaking bits of the prime test to pageable memory.
Werner Koch [Fri, 26 Oct 2018 10:57:30 +0000 (12:57 +0200)]
primes: Avoid leaking bits of the prime test to pageable memory.

* cipher/primegen.c (gen_prime): Allocate MODS in secure memory.
--

This increases the pressure on the secure memory by about 1400 bytes
but given that we can meanwhile increase the size of the secmem area,
this is acceptable.

GnuPG-bug-id: 3848
Signed-off-by: Werner Koch <wk@gnupg.org>