libgcrypt.git
Jussi Kivilinna [Fri, 26 Apr 2019 16:29:08 +0000 (19:29 +0300)]
Add CFI unwind assembly directives for 64-bit ARM assembly

* cipher/asm-common-aarch64.h (CFI_STARTPROC, CFI_ENDPROC)
(CFI_REMEMBER_STATE, CFI_RESTORE_STATE, CFI_ADJUST_CFA_OFFSET)
(CFI_REL_OFFSET, CFI_DEF_CFA_REGISTER, CFI_REGISTER, CFI_RESTORE)
(DW_REGNO_SP, DW_SLEB128_7BIT, DW_SLEB128_28BIT, CFI_CFA_ON_STACK)
(CFI_REG_ON_STACK): New.
* cipher/camellia-aarch64.S: Add CFI directives.
* cipher/chacha20-aarch64.S: Add CFI directives.
* cipher/cipher-gcm-armv8-aarch64-ce.S: Add CFI directives.
* cipher/crc-armv8-aarch64-ce.S: Add CFI directives.
* cipher/rijndael-aarch64.S: Add CFI directives.
* cipher/rijndael-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha1-armv8-aarch64-ce.S: Add CFI directives.
* cipher/sha256-armv8-aarch64-ce.S: Add CFI directives.
* cipher/twofish-aarch64.S: Add CFI directives.
* mpi/aarch64/mpih-add1.S: Add CFI directives.
* mpi/aarch64/mpih-mul1.S: Add CFI directives.
* mpi/aarch64/mpih-mul2.S: Add CFI directives.
* mpi/aarch64/mpih-mul3.S: Add CFI directives.
* mpi/aarch64/mpih-sub1.S: Add CFI directives.
* mpi/asm-common-aarch64.h: Include "../cipher/asm-common-aarch64.h".
(ELF): Remove.
--

This commit adds CFI directives that emit DWARF unwinding information, so that
a debugger can backtrace while executing code from 64-bit ARM assembly files.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 26 Apr 2019 16:28:11 +0000 (19:28 +0300)]
Add 64-bit ARMv8/CE PMULL implementation of CRC

* cipher/Makefile.am: Add 'crc-armv8-ce.c' and
'crc-armv8-aarch64-ce.S'.
* cipher/asm-common-aarch64.h [HAVE_GCC_ASM_CFI_DIRECTIVES]: Add CFI
helper macros.
* cipher/crc-armv8-aarch64-ce.S: New.
* cipher/crc-armv8-ce.c: New.
* cipher/crc.c (USE_ARM_PMULL): New.
(CRC_CONTEXT) [USE_ARM_PMULL]: Add 'use_pmull'.
[USE_ARM_PMULL] (_gcry_crc32_armv8_ce_pmull)
(_gcry_crc24rfc2440_armv8_ce_pmull): New prototypes.
(crc32_init, crc32rfc1510_init, crc24rfc2440_init): Enable ARM PMULL
implementations if supported by HW features.
(crc32_write, crc24rfc2440_write) [USE_ARM_PMULL]: Use ARM PMULL
implementations if enabled.
* configure.ac: Add 'crc-armv8-ce.lo' and 'crc-armv8-aarch64-ce.lo'.
--

Benchmark on Cortex-A53 (at 1104 MHz):

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC32RFC1510   |      2.89 ns/B     330.2 MiB/s      3.19 c/B
 CRC24RFC2440   |      2.72 ns/B     350.8 MiB/s      3.00 c/B

After (crc32 ~8.4x faster, crc24 ~6.8x faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 CRC32          |     0.341 ns/B      2796 MiB/s     0.377 c/B
 CRC32RFC1510   |     0.342 ns/B      2792 MiB/s     0.377 c/B
 CRC24RFC2440   |     0.398 ns/B      2396 MiB/s     0.439 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
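
For orientation, the checksum that the PMULL code accelerates can be written
as a plain bit-at-a-time loop. The sketch below is only a reference CRC-32,
assuming the common IEEE 802.3 parameters (reflected polynomial 0xEDB88320,
initial value and final XOR of 0xFFFFFFFF). It is not the PMULL folding method
added by this commit, and the name 'crc32_bitwise' is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not from libgcrypt): bit-at-a-time
   CRC-32 using the reflected IEEE 802.3 polynomial 0xEDB88320. */
uint32_t
crc32_bitwise (const void *data, size_t len)
{
  const unsigned char *p = data;
  uint32_t crc = 0xFFFFFFFFu;   /* assumed initial value */
  size_t i;
  int bit;

  for (i = 0; i < len; i++)
    {
      crc ^= p[i];
      for (bit = 0; bit < 8; bit++)
        {
          /* Shift out one bit; XOR in the polynomial if that bit was set. */
          crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
  return crc ^ 0xFFFFFFFFu;     /* assumed final XOR */
}
```

A loop like this handles one bit per step, which is why table-driven and
carry-less-multiply implementations are so much faster per byte.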
Jussi Kivilinna [Thu, 18 Apr 2019 16:23:26 +0000 (19:23 +0300)]
mpi: make stack unwinding work at i386 mpi functions

* mpi/i386/syntax.h: Include 'config.h'.
(CFI_STARTPROC, CFI_ENDPROC, CFI_ADJUST_CFA_OFFSET, CFI_REL_OFFSET)
(CFI_RESTORE, CFI_PUSH, CFI_POP): New.
* mpi/i386/mpih-add1.S: Add CFI directives.
* mpi/i386/mpih-lshift.S: Add CFI directives.
* mpi/i386/mpih-mul1.S: Add CFI directives.
* mpi/i386/mpih-mul2.S: Add CFI directives.
* mpi/i386/mpih-mul3.S: Add CFI directives.
* mpi/i386/mpih-rshift.S: Add CFI directives.
* mpi/i386/mpih-sub1.S: Add CFI directives.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 17 Apr 2019 21:20:42 +0000 (00:20 +0300)]
hwf-x86: make stack unwinding work at i386 cpuid functions

* src/hwf-x86.c (FORCE_FUNC_FRAME_POINTER): New.
[__i386__] (is_cpuid_available): Force use of stack frame pointer as
inline assembly modifies stack register; Add 'memory' constraint for
inline assembly.
[__i386__] (get_cpuid): Avoid push/pop instruction when preserving
%ebx register over cpuid.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 18 Apr 2019 15:53:35 +0000 (18:53 +0300)]
Limit and document Blowfish key lengths to 8-576 bits

* cipher/blowfish.c (BLOWFISH_KEY_MIN_BITS)
(BLOWFISH_KEY_MAX_BITS): New.
(do_bf_setkey): Check input key length against MIN_BITS and MAX_BITS.
* doc/gcrypt.texi: Update supported Blowfish key lengths.
* tests/basic.c (check_ecb_cipher): New, with Blowfish test vectors
for different key lengths.
(check_cipher_modes): Call 'check_ecb_cipher'.
--

As noted by Peter Wu, the Blowfish cipher implementation already supports key
lengths of 8 to 576 bits [1]. This change updates the documentation to reflect
that and adds new test vectors to check handling of different key lengths.

[1] https://lists.gnupg.org/pipermail/gcrypt-devel/2019-April/004680.html

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
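
The range check described above can be sketched as follows. The two macro
names come from the ChangeLog entry and the values 8 and 576 from the commit
title. The helper 'blowfish_keylen_ok' is hypothetical and only illustrates
the bounds; it is not the code in do_bf_setkey.

```c
#include <stddef.h>

#define BLOWFISH_KEY_MIN_BITS 8
#define BLOWFISH_KEY_MAX_BITS 576

/* Hypothetical helper: returns 1 if a key of KEYLEN bytes lies within
   the documented 8..576 bit range, 0 otherwise. */
int
blowfish_keylen_ok (size_t keylen)
{
  return keylen >= BLOWFISH_KEY_MIN_BITS / 8
         && keylen <= BLOWFISH_KEY_MAX_BITS / 8;
}
```

In bytes, the accepted range is 1 to 72.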
Jussi Kivilinna [Mon, 15 Apr 2019 16:46:53 +0000 (19:46 +0300)]
Add CFI unwind assembly directives for AMD64 assembly

* configure.ac (gcry_cv_gcc_asm_cfi_directives): New.
* cipher/asm-common-amd64.h (ADD_RIP, CFI_STARTPROC, CFI_ENDPROC)
(CFI_REMEMBER_STATE, CFI_RESTORE_STATE, CFI_ADJUST_CFA_OFFSET)
(CFI_REL_OFFSET, CFI_DEF_CFA_REGISTER, CFI_REGISTER, CFI_RESTORE)
(CFI_PUSH, CFI_POP, CFI_POP_TMP_REG, CFI_LEAVE, DW_REGNO)
(DW_SLEB128_7BIT, DW_SLEB128_28BIT, CFI_CFA_ON_STACK)
(CFI_REG_ON_STACK): New.
(ENTER_SYSV_FUNCPARAMS_0_4, EXIT_SYSV_FUNC): Add CFI directives.
* cipher/arcfour-amd64.S: Add CFI directives.
* cipher/blake2b-amd64-avx2.S: Add CFI directives.
* cipher/blake2s-amd64-avx.S: Add CFI directives.
* cipher/blowfish-amd64.S: Add CFI directives.
* cipher/camellia-aesni-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/camellia-aesni-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/cast5-amd64.S: Add CFI directives.
* cipher/chacha20-amd64-avx2.S: Add CFI directives.
* cipher/chacha20-amd64-ssse3.S: Add CFI directives.
* cipher/des-amd64.S: Add CFI directives.
* cipher/rijndael-amd64.S: Add CFI directives.
* cipher/rijndael-ssse3-amd64-asm.S: Add CFI directives.
* cipher/salsa20-amd64.S: Add CFI directives; Use 'asm-common-amd64.h'.
* cipher/serpent-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/serpent-sse2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha1-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha256-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-avx-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-avx2-bmi2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/sha512-ssse3-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/twofish-amd64.S: Add CFI directives.
* cipher/twofish-avx2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* cipher/whirlpool-sse2-amd64.S: Add CFI directives; Use
'asm-common-amd64.h'.
* mpi/amd64/func_abi.h: Include 'config.h'.
(CFI_STARTPROC, CFI_ENDPROC, CFI_ADJUST_CFA_OFFSET, CFI_REL_OFFSET)
(CFI_RESTORE, CFI_PUSH, CFI_POP): New.
(FUNC_ENTRY, FUNC_EXIT): Add CFI directives.
--

This commit adds CFI directives that emit DWARF unwinding information, so that
a debugger can backtrace while executing code from AMD64 assembly files.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
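
The helper macros named above are thin wrappers that expand to GNU assembler
'.cfi_*' directives when the toolchain supports them and to nothing otherwise.
The sketch below shows that pattern with the C preprocessor. The macro bodies
are an assumption about how such wrappers are typically written, not a copy of
asm-common-amd64.h. The STRINGIFY helpers and the two '*_text' functions are
hypothetical and exist only to make the expansions visible from C.

```c
/* Normally defined by configure when the assembler accepts CFI directives. */
#define HAVE_GCC_ASM_CFI_DIRECTIVES 1

#ifdef HAVE_GCC_ASM_CFI_DIRECTIVES
/* Assumed wrapper bodies: each macro expands to one '.cfi_*' directive. */
# define CFI_STARTPROC()            .cfi_startproc
# define CFI_ENDPROC()              .cfi_endproc
# define CFI_ADJUST_CFA_OFFSET(off) .cfi_adjust_cfa_offset off
#else
/* Without assembler support the macros expand to nothing. */
# define CFI_STARTPROC()
# define CFI_ENDPROC()
# define CFI_ADJUST_CFA_OFFSET(off)
#endif

/* Hypothetical helpers that turn an expansion into a C string. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

const char *
cfi_startproc_text (void)
{
  return STRINGIFY (CFI_STARTPROC ());
}

const char *
cfi_adjust_text (void)
{
  /* A 'pushq' moves the stack pointer by 8 bytes on AMD64. */
  return STRINGIFY (CFI_ADJUST_CFA_OFFSET (8));
}
```

In an assembly file the macros would be written next to the instructions they
describe, for example CFI_ADJUST_CFA_OFFSET(8) right after a 'pushq'.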
Jussi Kivilinna [Mon, 15 Apr 2019 19:09:24 +0000 (22:09 +0300)]
twofish-amd64: do not use xchg instruction

* cipher/twofish-amd64.S (g1g2_3): Swap ab and cd registers using
'movq' instructions instead of 'xchgq'.
--

Avoiding the xchg instruction improves three-block parallel performance
by ~3% on Intel Haswell.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 9 Apr 2019 17:04:19 +0000 (20:04 +0300)]
Use FreeBSD's elf_aux_info for detecting ARM HW features

* configure.ac: Add function check for 'elf_aux_info'.
* src/hwf-arm.c [HAVE_ELF_AUX_INFO]: Include 'sys/auxv.h'.
[HAVE_ELF_AUX_INFO && !HAVE_GETAUXVAL] (HAVE_GETAUXVAL)
(getauxval): New.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Mon, 8 Apr 2019 17:44:08 +0000 (20:44 +0300)]
Use getauxval system function for detecting ARM HW features

* configure.ac: Add header check for 'sys/auxv.h'; Add function check
for 'getauxval'.
* src/hwf-arm.c [HAVE_SYS_AUXV_H && HAVE_GETAUXVAL]: Include
'sys/auxv.h'.
(HAS_SYS_AT_HWCAP): Enable AT_HWCAP when 'getauxval' is available, in
addition to __linux__.
(AT_HWCAP, AT_HWCAP2, HWCAP_NEON, HWCAP2_AES, HWCAP2_PMULL)
(HWCAP2_SHA1, HWCAP2_SHA2, HWCAP_ASIMD, HWCAP_AES)
(HWCAP_PMULL, HWCAP_SHA1, HWCAP_SHA2): Define these macros only if not
already defined.
(get_hwcap) [HAVE_SYS_AUXV_H && HAVE_GETAUXVAL]: Use 'getauxval' to
fetch HW capability flags.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
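
A minimal sketch of the lookup described in this commit and the FreeBSD one
above: read AT_HWCAP through 'getauxval' on Linux or through 'elf_aux_info'
on FreeBSD. The wrapper 'query_hwcap' and its return convention are
hypothetical. The platform tests below use compiler macros rather than the
configure results (HAVE_GETAUXVAL, HAVE_ELF_AUX_INFO) that hwf-arm.c relies on.

```c
#if defined(__linux__) || defined(__FreeBSD__)
# include <sys/auxv.h>
#endif

/* Hypothetical wrapper: stores the AT_HWCAP value in *HWCAP and returns 0,
   or returns -1 when no lookup method is available on this platform. */
int
query_hwcap (unsigned long *hwcap)
{
#if defined(__linux__) && defined(AT_HWCAP)
  /* Linux: getauxval returns 0 when the entry is missing. */
  *hwcap = getauxval (AT_HWCAP);
  return 0;
#elif defined(__FreeBSD__) && defined(AT_HWCAP)
  /* FreeBSD: elf_aux_info copies the entry into a caller buffer. */
  unsigned long value = 0;

  if (elf_aux_info (AT_HWCAP, &value, sizeof value) != 0)
    return -1;
  *hwcap = value;
  return 0;
#else
  *hwcap = 0;
  return -1;
#endif
}
```

The returned word is a bit mask; on ARM its bits correspond to flags such as
HWCAP_NEON or HWCAP_AES listed in the ChangeLog entry.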
Jussi Kivilinna [Mon, 8 Apr 2019 14:32:36 +0000 (17:32 +0300)]
Disable SM3 in FIPS mode

* cipher/sm3.h (_gcry_digest_spec_sm3): Set flags.fips to zero.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 7 Apr 2019 14:53:19 +0000 (17:53 +0300)]
Tune SHA-512/AVX2 and SHA-256/AVX2 implementations

* cipher/sha256-avx2-bmi2-amd64.S (ONE_ROUND_PART1, ONE_ROUND_PART2)
(ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha256_transform_amd64_avx2): Exit early if number of blocks is
zero; Write XFER to stack earlier and handle XREF writing in
FOUR_ROUNDS_AND_SCHED.
* cipher/sha512-avx2-bmi2-amd64.S (MASK_YMM_LO, MASK_YMM_LOx): New.
(ONE_ROUND_PART1, ONE_ROUND_PART2, ONE_ROUND): New round function.
(FOUR_ROUNDS_AND_SCHED, FOUR_ROUNDS): Use new round function.
(_gcry_sha512_transform_amd64_avx2): Write XFER to stack earlier and
handle XREF writing in FOUR_ROUNDS_AND_SCHED.
--

Benchmark on Intel Haswell (4.0 GHz):

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.17 ns/B     439.0 MiB/s      8.68 c/B
 SHA512         |      1.56 ns/B     612.5 MiB/s      6.23 c/B

After (~4-6% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA256         |      2.05 ns/B     465.9 MiB/s      8.18 c/B
 SHA512         |      1.49 ns/B     640.3 MiB/s      5.95 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 17:10:32 +0000 (20:10 +0300)]
Add SHA512/224 and SHA512/256 algorithms

* cipher/mac-hmac.c (map_mac_algo_to_md): Add mapping for SHA512/224
and SHA512/256.
(_gcry_mac_type_spec_hmac_sha512_256)
(_gcry_mac_type_spec_hmac_sha512_224): New.
* cipher/mac-internal.h (_gcry_mac_type_spec_hmac_sha512_256)
(_gcry_mac_type_spec_hmac_sha512_224): New.
* cipher/mac.c (mac_list, mac_list_algo101): Add SHA512/224 and
SHA512/256.
* cipher/md.c (digest_list, digest_list_algo301)
(prepare_macpads): Ditto.
* cipher/sha512.c (run_selftests): Ditto.
(sha512_init_common): Move common initialization here.
(sha512_init, sha384_init): Use common initialization function.
(sha512_224_init, sha512_256_init, _gcry_sha512_224_hash_buffer)
(_gcry_sha512_224_hash_buffers, _gcry_sha512_256_hash_buffer)
(_gcry_sha512_256_hash_buffers, selftests_sha512_224)
(selftests_sha512_256, sha512_224_asn, oid_spec_sha512_224)
(_gcry_digest_spec_sha512_224, sha512_256_asn, oid_spec_sha512_256)
(_gcry_digest_spec_sha512_256): New.
* doc/gcrypt.texi: Add SHA512/224 and SHA512/256; Add missing
HMAC-BLAKE2s and HMAC-BLAKE2b.
* src/cipher.h (_gcry_digest_spec_sha512_224)
(_gcry_digest_spec_sha512_256): New.
* src/gcrypt.h.in (GCRY_MD_SHA512_256, GCRY_MD_SHA512_224): New.
(GCRY_MAC_HMAC_SHA512_256, GCRY_MAC_HMAC_SHA512_224): New.
* tests/basic.c (check_digests): Add SHA512/224 and SHA512/256
test vectors.
--

This change adds truncated SHA512/224 and SHA512/256 algorithms
specified in FIPS 180-4.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 15:48:13 +0000 (18:48 +0300)]
Remove extra buffer flush at beginning of digest final functions

* cipher/md2.c (md2_final): Remove _gcry_md_block_write flush call
from entry.
* cipher/md4.c (md4_final): Ditto.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sha512.c (sha512_final): Ditto.
* cipher/sm3.c (sm3_final): Ditto.
* cipher/stribog.c (stribog_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 15:52:47 +0000 (18:52 +0300)]
Optimizations for digest final functions

* cipher/md4.c (md4_final): Avoid byte-by-byte buffer setting when
padding; Merge extra and last block processing.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sm3.c (sm3_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
* cipher/sha512.c (sha512_final): Avoid byte-by-byte buffer setting
when padding.
* cipher/stribog.c (stribog_final): Ditto.
* cipher/whirlpool.c (whirlpool_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
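
The two ideas in this ChangeLog entry, padding with memset instead of a
byte-by-byte loop and handling the optional extra block together with the
last block, can be sketched as below. This is an illustration under stated
assumptions: 64-byte blocks, a 64-bit big-endian bit length as in SHA-1 and
SHA-256 (MD4 and MD5 store it little-endian), and a 128-byte buffer so that
both blocks fit. The name 'md_pad_final' is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. BUF holds COUNT (< 64) bytes of message tail and has
   room for 128 bytes. TOTAL_BITS is the whole message length in bits.
   Returns the number of 64-byte blocks (1 or 2) now ready in BUF, so the
   caller can run the transform once over all of them. */
unsigned int
md_pad_final (unsigned char buf[128], size_t count, uint64_t total_bits)
{
  /* An extra block is needed when 0x80 plus the 8 length bytes do not fit. */
  unsigned int nblocks = (count < 56) ? 1 : 2;
  size_t end = 64 * (size_t) nblocks;
  int i;

  buf[count++] = 0x80;                        /* mandatory padding byte */
  memset (buf + count, 0, end - 8 - count);   /* zero fill in one call */
  for (i = 0; i < 8; i++)                     /* big-endian bit length */
    buf[end - 8 + i] = (unsigned char) (total_bits >> (56 - 8 * i));
  return nblocks;
}
```

With the padded data laid out contiguously, a single transform call with a
block count of 1 or 2 replaces two separate calls.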
Jussi Kivilinna [Fri, 5 Apr 2019 15:19:45 +0000 (18:19 +0300)]
tests/basic: add hash test for small block sizes

* tests/basic.c (check_one_md): Compare hashing of buffer sizes from 1 to
129 as full-buffer input and as byte-by-byte input.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 14:38:39 +0000 (17:38 +0300)]
Burn stack in transform functions for SHA2 AMD64 implementations

* cipher/sha256-avx-amd64.S: Burn stack inside transform functions.
* cipher/sha256-avx2-bmi2-amd64.S: Ditto.
* cipher/sha256-ssse3-amd64.S: Ditto.
* cipher/sha512-avx-amd64.S: Ditto.
* cipher/sha512-avx2-bmi2-amd64.S: Ditto.
* cipher/sha512-ssse3-amd64.S: Ditto.
--

This change reduces per-call overhead for SHA256 & SHA512.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 14:37:42 +0000 (17:37 +0300)]
Burn stack in transform functions for SHA1 AMD64 implementations

* cipher/sha1-avx-amd64.S: Burn stack inside transform functions.
* cipher/sha1-avx-bmi2-amd64.S: Ditto.
* cipher/sha1-avx2-bmi2-amd64.S: Ditto.
* cipher/sha1-ssse3-amd64.S: Ditto.
--

This change reduces per-call overhead for SHA1.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Fri, 5 Apr 2019 14:39:22 +0000 (17:39 +0300)]
Add AVX2/BMI2 implementation of SHA1

* cipher/Makefile.am: Add 'sha1-avx2-bmi2-amd64.S'.
* cipher/hash-common.h (MD_BLOCK_CTX_BUFFER_SIZE): New.
(gcry_md_block_ctx): Change buffer length to MD_BLOCK_CTX_BUFFER_SIZE.
* cipher/sha1-avx-amd64.S: Add missing .size for transform function.
* cipher/sha1-ssse3-amd64.S: Add missing .size for transform function.
* cipher/sha1-avx-bmi2-amd64.S: Add missing .size for transform
function; Tweak implementation for small ~1% speed increase.
* cipher/sha1-avx2-bmi2-amd64.S: New.
* cipher/sha1.c (USE_AVX2, _gcry_sha1_transform_amd64_avx2_bmi2)
(do_sha1_transform_amd64_avx2_bmi2): New.
(sha1_init) [USE_AVX2]: Enable AVX2 implementation if supported by
HW features.
(sha1_final): Merge processing of two last blocks when extra block is
needed.
--

Benchmarks on Intel Haswell (4.0 GHz):

Before (AVX/BMI2):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.970 ns/B     983.2 MiB/s      3.88 c/B

After (AVX/BMI2, ~1% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.960 ns/B     993.1 MiB/s      3.84 c/B

After (AVX2/BMI2, ~9% faster):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |     0.890 ns/B      1071 MiB/s      3.56 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 31 Mar 2019 15:30:25 +0000 (18:30 +0300)]
blowfish: add three rounds parallel handling to generic C implementation

* cipher/blowfish.c (BLOWFISH_ROUNDS): Remove.
[BLOWFISH_ROUNDS != 16] (function_F): Remove.
(F): Replace big-endian and little-endian version with single
endian-neutral version.
(R3, do_encrypt_3, do_decrypt_3): New.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Use new three block functions.
--

Benchmark on aarch64 (Cortex-A53, 816 MHz):

Before:
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     29.58 ns/B     32.24 MiB/s     24.13 c/B
        CFB dec |     33.38 ns/B     28.57 MiB/s     27.24 c/B
        CTR enc |     34.18 ns/B     27.90 MiB/s     27.89 c/B
After (~60%-70% faster):
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     18.18 ns/B     52.45 MiB/s     14.84 c/B
        CFB dec |     19.67 ns/B     48.50 MiB/s     16.05 c/B
        CTR enc |     19.77 ns/B     48.25 MiB/s     16.13 c/B

Benchmark on i386 (Haswell, 4000 MHz):

Before:
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      6.10 ns/B     156.4 MiB/s     24.39 c/B
        CFB dec |      6.39 ns/B     149.2 MiB/s     25.56 c/B
        CTR enc |      6.73 ns/B     141.6 MiB/s     26.93 c/B
After (~80% faster):
 BLOWFISH       |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      3.46 ns/B     275.5 MiB/s     13.85 c/B
        CFB dec |      3.53 ns/B     270.4 MiB/s     14.11 c/B
        CTR enc |      3.56 ns/B     268.0 MiB/s     14.23 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
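
The three modes benchmarked above (CBC decryption, CFB decryption, CTR
encryption) share one property: the block-cipher calls for neighbouring
blocks do not depend on each other, so several can be in flight at once. The
sketch below shows this for CBC decryption with three blocks per iteration.
The cipher is a toy 64-bit permutation chosen to keep the example short; it
is not Blowfish and not secure, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t
rotl64 (uint64_t x, unsigned int n)
{
  return (x << n) | (x >> (64 - n));
}

static uint64_t
rotr64 (uint64_t x, unsigned int n)
{
  return (x >> n) | (x << (64 - n));
}

/* Toy stand-in for a 64-bit block cipher. NOT Blowfish, NOT secure. */
static uint64_t
toy_encrypt (uint64_t key, uint64_t x)
{
  return rotl64 (x ^ key, 13) + key;
}

static uint64_t
toy_decrypt (uint64_t key, uint64_t y)
{
  return rotr64 (y - key, 13) ^ key;
}

/* CBC encryption is inherently serial: each block needs the previous
   ciphertext block before it can start. */
void
toy_cbc_encrypt (uint64_t key, uint64_t iv, uint64_t *out,
                 const uint64_t *in, size_t nblocks)
{
  size_t i;

  for (i = 0; i < nblocks; i++)
    {
      iv = toy_encrypt (key, in[i] ^ iv);
      out[i] = iv;
    }
}

/* CBC decryption, three blocks per iteration. The three toy_decrypt calls
   only read ciphertext, so the CPU can overlap their execution. */
void
toy_cbc_decrypt (uint64_t key, uint64_t iv, uint64_t *out,
                 const uint64_t *in, size_t nblocks)
{
  while (nblocks >= 3)
    {
      uint64_t c0 = in[0], c1 = in[1], c2 = in[2];
      uint64_t p0 = toy_decrypt (key, c0);
      uint64_t p1 = toy_decrypt (key, c1);
      uint64_t p2 = toy_decrypt (key, c2);

      out[0] = p0 ^ iv;   /* chaining XORs happen after the cipher calls */
      out[1] = p1 ^ c0;
      out[2] = p2 ^ c1;
      iv = c2;
      in += 3;
      out += 3;
      nblocks -= 3;
    }
  while (nblocks > 0)     /* leftover blocks, one at a time */
    {
      uint64_t c = *in++;

      *out++ = toy_decrypt (key, c) ^ iv;
      iv = c;
      nblocks--;
    }
}
```

Encryption in CBC mode cannot be batched this way, which is why only the
decryption direction of CBC and CFB appears in the benchmark tables.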
Jussi Kivilinna [Sun, 31 Mar 2019 15:26:58 +0000 (18:26 +0300)]
cast5: add three rounds parallel handling to generic C implementation

* cipher/cast5.c (do_encrypt_block_3, do_decrypt_block_3): New.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Use
new three block functions.
--

Benchmark on aarch64 (Cortex-A53, 816 MHz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     35.24 ns/B     27.07 MiB/s     28.75 c/B
        CFB dec |     34.62 ns/B     27.54 MiB/s     28.25 c/B
        CTR enc |     35.39 ns/B     26.95 MiB/s     28.88 c/B
After (~40%-50% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     23.05 ns/B     41.38 MiB/s     18.81 c/B
        CFB dec |     24.49 ns/B     38.94 MiB/s     19.98 c/B
        CTR enc |     24.57 ns/B     38.82 MiB/s     20.05 c/B

Benchmark on i386 (Haswell, 4000 MHz):

Before:
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      6.92 ns/B     137.7 MiB/s     27.69 c/B
        CFB dec |      6.83 ns/B     139.7 MiB/s     27.32 c/B
        CTR enc |      7.01 ns/B     136.1 MiB/s     28.03 c/B
After (~70% faster):
 CAST5          |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |      3.97 ns/B     240.1 MiB/s     15.89 c/B
        CFB dec |      3.96 ns/B     241.0 MiB/s     15.83 c/B
        CTR enc |      4.01 ns/B     237.8 MiB/s     16.04 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 31 Mar 2019 15:25:04 +0000 (18:25 +0300)]
cast5: read Kr four blocks at time and shift for current round

* cipher/cast5.c (do_encrypt_block, do_decrypt_block): Read Kr as
32-bit words instead of bytes and shift value for each round.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
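
The idea of reading several per-round rotation bytes with one 32-bit load and
shifting the word between rounds can be sketched as below. The byte order
(big-endian load, current value taken from the top byte) is an assumption made
for the illustration and may differ from cast5.c. 'load_be32' and 'unpack_kr4'
are hypothetical names.

```c
#include <stdint.h>

/* Endian-neutral big-endian 32-bit load. */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: fetch four rotation amounts KR[0..3] with one word
   load, then peel them off one per round by shifting. */
void
unpack_kr4 (const unsigned char kr[4], unsigned int out[4])
{
  uint32_t w = load_be32 (kr);
  int i;

  for (i = 0; i < 4; i++)
    {
      out[i] = w >> 24;   /* value for the current round is the top byte */
      w <<= 8;            /* move the next round's value into place */
    }
}
```

One word load followed by shifts replaces four separate byte loads.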
Jussi Kivilinna [Sun, 31 Mar 2019 15:21:20 +0000 (18:21 +0300)]
Add helper function for adding value to cipher block

* cipher/cipher-internal.h (cipher_block_add): New.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc): Use new helper function
for CTR block increment.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/des.c (_gcry_3des_ctr_enc): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
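
A CTR block increment treats the cipher block as one big-endian integer and
adds a small value with carry propagation. The sketch below shows that
operation in portable C. It is an illustration only: 'block_add_be' is a
hypothetical name, and its signature is not claimed to match
'cipher_block_add'.

```c
#include <stddef.h>

/* Hypothetical stand-in for a counter-increment helper: adds ADD to the
   big-endian integer stored in BLOCK[0..BLOCKSIZE-1], wrapping on overflow. */
void
block_add_be (unsigned char *block, unsigned int add, size_t blocksize)
{
  unsigned long long carry = add;
  size_t i;

  /* Walk from the least significant (last) byte towards the first and
     stop as soon as there is nothing left to carry. */
  for (i = blocksize; i > 0 && carry != 0; i--)
    {
      carry += block[i - 1];
      block[i - 1] = (unsigned char) (carry & 0xff);
      carry >>= 8;
    }
}
```

Bulk CTR code can call such a helper once per batch with the number of blocks
processed, instead of incrementing the counter one block at a time.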
Jussi Kivilinna [Thu, 28 Mar 2019 21:25:52 +0000 (23:25 +0200)]
Optimize OCB set_key and set_nonce

* cipher/cipher-ocb.c (double_block): Change to input/output
host-endian block instead of big-endian buffer.
(double_block_cpy): Remove.
(bit_copy): Use fixed length copy and 'u64' for calculations.
(ocb_get_L_big): Handle block endian conversions for double_block.
(_gcry_cipher_ocb_setkey): Handle block endian conversions for
double_block.
(_gcry_cipher_ocb_set_nonce): Set full length of 'ktop' to zero; Drop
length parameter for bit_copy.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
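
OCB "doubling" is a multiplication by x in GF(2^128): shift the 128-bit block
left by one bit and, if a bit fell off the top, XOR 0x87 into the lowest byte.
With the block held as two host-endian 64-bit words this needs only a few
shifts. The sketch below assumes b[0] is the high half and b[1] the low half;
the name 'double_block_u64' and that layout are hypothetical and not taken
from cipher-ocb.c.

```c
#include <stdint.h>

/* Hypothetical sketch: double a 128-bit value in GF(2^128).
   B[0] holds the high 64 bits, B[1] the low 64 bits (host-endian). */
void
double_block_u64 (uint64_t b[2])
{
  uint64_t carry = b[0] >> 63;            /* bit shifted out of the top */

  b[0] = (b[0] << 1) | (b[1] >> 63);      /* carry low half into high half */
  b[1] = (b[1] << 1) ^ (carry * 0x87);    /* reduce by x^128+x^7+x^2+x+1 */
}
```

Working on host-endian words means the byte swap is paid once when a block is
loaded or stored rather than on every doubling.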
Jussi Kivilinna [Thu, 28 Mar 2019 18:49:37 +0000 (20:49 +0200)]
AES-NI/OCB: Optimize last and first key XORing

* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
[__x86_64__]: Reorder and mix first and last key XORing with OCB offset
XOR operations.
--

OCB pre-XORing and post-XORing can be mixed and reordered with
first and last round XORing of AES cipher. This commit utilizes
this fact for additional optimization of AES-NI/OCB encryption
and decryption.

Benchmark on Intel Haswell:

Before:
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998
  OCB dec |     0.170 ns/B      5617 MiB/s     0.679 c/B      3998

After (enc ~11% faster, dec ~6% faster):
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.157 ns/B      6065 MiB/s     0.629 c/B      3998
  OCB dec |     0.160 ns/B      5956 MiB/s     0.640 c/B      3998

For reference, CTR:
AES       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  CTR enc |     0.157 ns/B      6090 MiB/s     0.626 c/B      3998
  CTR dec |     0.157 ns/B      6092 MiB/s     0.626 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 27 Mar 2019 21:10:31 +0000 (23:10 +0200)]
AES-NI/OCB: Perform checksumming inline with encryption

* cipher/rijndael-aesni.c (aesni_ocb_enc): Remove call to
'aesni_ocb_checksum', instead perform checksumming inline with offset
calculations.
--

This patch reverts the OCB checksumming split for encryption to avoid a
performance issue seen on Intel CPUs.

Commit b42de67f34 "Optimizations for AES-NI OCB" changed the AES-NI/OCB
implementation to perform checksumming as a separate pass from encryption
and decryption. While this change improved performance for buffer
sizes 16 to 4096 bytes (buffer sizes used by bench-slope), it
introduced a performance anomaly with OCB encryption on Intel
processors. Below are large-buffer OCB encryption results on Intel
Haswell. There we can see that with buffer sizes larger than 32 KiB
performance starts dropping. Decryption does not suffer from the same
issue.

 MiB/s                Speed by Data Length (at 2 Ghz)
 2800 +-------------------------------------------------------------+
 2600 |-+  +          +       **.****.****+         +          +  +-|
      |                  **.**           *.****.****.****           |
 2400 |-+            *.**                               *.*****.****|
 2200 |-+         ***                                             +-|
 2000 |-+      *.*                                                +-|
      |       **                                                    |
 1800 |-+   **                                                    +-|
 1600 |-+ *.*                                                     +-|
 1400 |-+**                                                       +-|
      |**                                                           |
 1200 |*+  +          +         +         +         +          +  +-|
 1000 +-------------------------------------------------------------+
         1024       4096      16384     65536    262144     1048576
                           Data Length in Bytes

I've tested and reproduced this issue on Intel Ivy-Bridge, Haswell
and Skylake processors. The same performance drop on large buffers is not
seen on AMD Ryzen. Below is the OCB decryption speed plot from Haswell for
reference, showing the expected performance curve over increasing buffer
sizes.

 MiB/s                Speed by Data Length (at 2 Ghz)
 2800 +-------------------------------------------------------------+
 2600 |-+  +          +       **.****.****.****.****.****.*****.****|
      |                  **.**                                      |
 2400 |-+            *.**                                         +-|
 2200 |-+         ***                                             +-|
 2000 |-+      *.*                                                +-|
      |       **                                                    |
 1800 |-+   **                                                    +-|
 1600 |-+ *.*                                                     +-|
 1400 |-+**                                                       +-|
      |**                                                           |
 1200 |*+  +          +         +         +         +          +  +-|
 1000 +-------------------------------------------------------------+
         1024       4096      16384     65536    262144     1048576
                           Data Length in Bytes

After this patch, bench-slope shows a ~2% reduction in performance on
Intel Haswell:

Before:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.171 ns/B      5581 MiB/s     0.683 c/B      3998

After:
 AES      |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
  OCB enc |     0.174 ns/B      5468 MiB/s     0.697 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 27 Mar 2019 21:50:07 +0000 (23:50 +0200)]
AES-NI/OCB: Use stack for temporary storage

* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec): Use stack
allocated 'tmpbuf' instead of output buffer as temporary storage.
--

This change gives a (very) small performance improvement (~0.5%) when
the output buffer is unaligned.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 26 Mar 2019 17:28:50 +0000 (19:28 +0200)]
tests/basic: add large buffer testing for ciphers

* tests/basic.c (check_one_cipher_core): Allocate buffers from heap.
(check_one_cipher): Add testing with large buffer (~65 KiB) in addition
to medium size buffer (~2 KiB).
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 26 Mar 2019 17:27:00 +0000 (19:27 +0200)]
chacha20-poly1305: fix wrong en/decryption on large input buffers

* cipher/chacha20.c (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): Correctly use 'currlen' for chacha20
on the non-stitched code path.
--

This patch fixes a bug which was introduced by commit:
  "Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations"
  d6330dfb4b0e9fb3f8eef65ea13146060b804a97

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 24 Mar 2019 08:49:29 +0000 (10:49 +0200)]
doc: add mention about aligning data to cachelines for best performance

* doc/gcrypt.texi: Add mention about aligning data to cachelines for
best performance.
--

GnuPG-bug-id: 2388
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
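
For callers who want to follow this advice, one way to obtain a
cacheline-aligned buffer in standard C11 is 'aligned_alloc'. The sketch below
assumes a 64-byte cache line, which is common but not universal. The macro
ASSUMED_CACHELINE and the helper 'alloc_cacheline_aligned' are hypothetical
and not part of libgcrypt.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ASSUMED_CACHELINE 64   /* assumption: typical cache line size */

/* Hypothetical helper: allocate at least SIZE bytes starting on a
   cacheline boundary. Returns NULL on failure; release with free(). */
void *
alloc_cacheline_aligned (size_t size)
{
  /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
  size_t rounded = (size + ASSUMED_CACHELINE - 1)
                   / ASSUMED_CACHELINE * ASSUMED_CACHELINE;

  if (rounded < size)          /* size_t overflow while rounding up */
    return NULL;
  if (rounded == 0)
    rounded = ASSUMED_CACHELINE;
  return aligned_alloc (ASSUMED_CACHELINE, rounded);
}
```

The returned pointer can then be passed as the input or output buffer of the
cipher and digest functions.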
Jussi Kivilinna [Sun, 24 Mar 2019 08:23:34 +0000 (10:23 +0200)]
random-drbg: do not use calloc for zero ctr

* random/random-drbg.c (DRBG_CTR_NULL_LEN): Move to 'constants'
section.
(drbg_state_s): Remove 'ctr_null' member.
(drbg_ctr_generate): Add 'drbg_ctr_null'.
(drbg_sym_fini, drbg_sym_init): Remove 'drbg->ctr_null' usage.
--

GnuPG-bug-id: 3878
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 23 Mar 2019 14:15:49 +0000 (16:15 +0200)]
Add ARMv7/NEON accelerated GCM implementation

* cipher/Makefile.am: Add 'cipher-gcm-armv7-neon.S'.
* cipher/cipher-gcm-armv7-neon.S: New.
* cipher/cipher-gcm.c [GCM_USE_ARM_NEON] (_gcry_ghash_setup_armv7_neon)
(_gcry_ghash_armv7_neon, ghash_setup_armv7_neon)
(ghash_armv7_neon): New.
(setupM) [GCM_USE_ARM_NEON]: Use armv7/neon implementation if
HWF_ARM_NEON is available.
* cipher/cipher-internal.h (GCM_USE_ARM_NEON): New.
--

Benchmark on Cortex-A53 (816 MHz):

Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES           |     34.81 ns/B     27.40 MiB/s     28.41 c/B

After (3.0x faster):
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 GMAC_AES           |     11.49 ns/B     82.99 MiB/s      9.38 c/B

Reported-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 21 Mar 2019 18:43:46 +0000 (20:43 +0200)]
Use memset instead of setting buffers byte by byte

* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer setting loop with memset call.
* cipher/cipher-gcm.c (do_ghash_buf): Ditto.
* cipher/poly1305.c (poly1305_final): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 21 Mar 2019 18:42:28 +0000 (20:42 +0200)]
Use buf_cpy instead of copying buffers byte by byte

* cipher/bufhelp.h (buf_cpy): Skip memcpy if length is zero.
* cipher/cipher-ccm.c (do_cbc_mac): Replace buffer copy loops with buf_cpy call.
* cipher/cipher-cmac.c (_gcry_cmac_write): Ditto.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_authenticate): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 21 Mar 2019 17:43:05 +0000 (19:43 +0200)]
Reduce overhead on generic hash write function

* cipher/hash-common.c (_gcry_md_block_write): Remove recursive
function call; Use buf_cpy for copying buffers; Burn stack only once.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agosha1-avx: use vmovdqa instead of movdqa
Jussi Kivilinna [Tue, 19 Mar 2019 20:02:28 +0000 (22:02 +0200)]
sha1-avx: use vmovdqa instead of movdqa

* cipher/sha1-avx-amd64.S: Replace 'movdqa' with 'vmovdqa'.
* cipher/sha1-avx-bmi2-amd64.S: Replace 'movdqa' with 'vmovdqa'.
--

Replace SSE instruction 'movdqa' with AVX instruction 'vmovdqa' as
mixing SSE and AVX instructions can lead to bad performance.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agodoc/gcrypt.texi: update HW feature list
Jussi Kivilinna [Tue, 19 Mar 2019 20:08:37 +0000 (22:08 +0200)]
doc/gcrypt.texi: update HW feature list

* doc/gcrypt.texi: Update HW feature list.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 months agoecc: Adjust debugging output
Daniel Kahn Gillmor [Wed, 20 Mar 2019 01:59:54 +0000 (21:59 -0400)]
ecc: Adjust debugging output

* cipher/ecc.c (ecc_check_secret_key): Adjust debugging output to use
full column titles.

--

Without this change, the debugging headers say "inf" and "nam".  With
this change, the alignment for all columns stays the same, but the
headers say "info" and "name", which are much more legible.

Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
GnuPG-bug-id: 4414

6 months agofips: Only test check_binary_integrity when fips_mode is enabled.
NIIBE Yutaka [Mon, 25 Feb 2019 00:02:59 +0000 (09:02 +0900)]
fips: Only test check_binary_integrity when fips_mode is enabled.

* src/fips.c (_gcry_fips_run_selftests): Check the status of fips_mode
before calling check_binary_integrity.

--

GnuPG-bug-id: 4274
Reported-by: Pedro Monreal
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
7 months agoAdd 2-way path for SSSE3 version of ChaCha20
Jussi Kivilinna [Thu, 7 Feb 2019 18:50:02 +0000 (20:50 +0200)]
Add 2-way path for SSSE3 version of ChaCha20

* cipher/chacha20-amd64-ssse3.S (_gcry_chacha20_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): Add 2-way code paths.
* cipher/chacha20.c (_gcry_chacha20_poly1305_encrypt): Add
preprocessing of 2 blocks with SSSE3.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoDo not precalculate OCB offset L0+L1+L0
Jussi Kivilinna [Sun, 27 Jan 2019 10:55:22 +0000 (12:55 +0200)]
Do not precalculate OCB offset L0+L1+L0

* cipher/cipher-internal.h (gcry_cipher_handle): Remove OCB L0L1L0.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_setkey): Ditto.
* cipher/rijndael-aesni.c (aesni_ocb_enc, aesni_ocb_dec)
(_gcry_aes_aesni_ocb_auth): Replace L0L1L0 use with L1.
--

Patch fixes L0+L1+L0 thinko. This is same as L1 (L0 xor L1 xor L0).
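
The identity can be checked with a small sketch. The helper names
'xor3_block' and 'same_block' are hypothetical and are not part of
libgcrypt; the "+" in the OCB offsets is a 16-byte block xor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OCB_BLOCK 16

/* dst = a xor b xor c over one 16-byte block. */
static void xor3_block(unsigned char *dst, const unsigned char *a,
                       const unsigned char *b, const unsigned char *c)
{
  size_t i;

  assert(dst && a && b && c);
  for (i = 0; i < OCB_BLOCK; i++)
    dst[i] = a[i] ^ b[i] ^ c[i];
}

/* Returns 1 if both 16-byte blocks hold the same bytes. */
static int same_block(const unsigned char *a, const unsigned char *b)
{
  return memcmp(a, b, OCB_BLOCK) == 0;
}
```

For any blocks L0 and L1, 'xor3_block (out, L0, L1, L0)' leaves 'out'
equal to L1, because the two L0 terms cancel.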

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoCalculate OCB L-tables when setting key instead of when setting nonce
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Calculate OCB L-tables when setting key instead of when setting nonce

* cipher/cipher-internal.h (gcry_cipher_handle): Mark areas of
u_mode.ocb that are and are not cleared by gcry_cipher_reset.
(_gcry_cipher_ocb_setkey): New.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Split
L-table generation to ...
(_gcry_cipher_ocb_setkey): ... this new function.
* cipher/cipher.c (cipher_setkey): Add handling for OCB mode.
(cipher_reset): Do not clear L-values for OCB mode.
--

OCB L-tables do not depend on nonce value, but only on cipher key.
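
A sketch of why that holds, following the definitions in RFC 7253:
L_* is the encryption of the all-zero block under the key, and every
further L value is obtained by doubling in GF(2^128). The function names
and the table size below are assumptions for illustration; the block
cipher call that produces L_* is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OCB_BLOCK 16
#define OCB_NUM_L 8   /* hypothetical table size for this sketch */

/* Doubling as defined in RFC 7253: shift the 128-bit string left by one
   bit and, if the top bit was set, xor 0x87 into the last byte. */
static void ocb_double(unsigned char *out, const unsigned char *in)
{
  unsigned char carry = in[0] >> 7;
  size_t i;

  assert(out && in);
  for (i = 0; i < OCB_BLOCK - 1; i++)
    out[i] = (unsigned char)((in[i] << 1) | (in[i + 1] >> 7));
  out[OCB_BLOCK - 1] =
    (unsigned char)((in[OCB_BLOCK - 1] << 1) ^ (carry * 0x87));
}

/* Fill L_$ and L_0..L_{n-1} from L_* = E_K(0^128).  Only the key enters
   through l_star; the nonce is not involved at all. */
static void ocb_fill_l_table(unsigned char l[OCB_NUM_L][OCB_BLOCK],
                             unsigned char l_dollar[OCB_BLOCK],
                             const unsigned char l_star[OCB_BLOCK])
{
  size_t i;

  ocb_double(l_dollar, l_star);      /* L_$ = double(L_*) */
  ocb_double(l[0], l_dollar);        /* L_0 = double(L_$) */
  for (i = 1; i < OCB_NUM_L; i++)
    ocb_double(l[i], l[i - 1]);      /* L_i = double(L_{i-1}) */
}
```

Since the table is a pure function of L_*, it can be computed once per
key and kept across nonce changes and resets.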

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agochacha20-amd64-avx2: optimize output xoring
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
chacha20-amd64-avx2: optimize output xoring

* cipher/chacha20-amd64-avx2.S (STACK_TMP2): Remove.
(transpose_16byte_2x2, xor_src_dst): New.
(BUF_XOR_256_TO_128): Remove.
(_gcry_chacha20_amd64_avx2_blocks8)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): Replace
BUF_XOR_256_TO_128 with transpose_16byte_2x2/xor_src_dst; Reduce stack
usage; Better interleave chacha20 state merging and output xoring.
--

Benchmark on Intel i7-4790K:

Before:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |     0.314 ns/B      3035 MiB/s      1.26 c/B      3998
     STREAM dec |     0.314 ns/B      3037 MiB/s      1.26 c/B      3998
   POLY1305 enc |     0.451 ns/B      2117 MiB/s      1.80 c/B      3998
   POLY1305 dec |     0.441 ns/B      2162 MiB/s      1.76 c/B      3998

After:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
     STREAM enc |     0.309 ns/B      3086 MiB/s      1.24 c/B      3998
     STREAM dec |     0.309 ns/B      3083 MiB/s      1.24 c/B      3998
   POLY1305 enc |     0.445 ns/B      2141 MiB/s      1.78 c/B      3998
   POLY1305 dec |     0.436 ns/B      2188 MiB/s      1.74 c/B      3998

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/bench-slope: prevent auto-mhz detection getting stuck
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/bench-slope: prevent auto-mhz detection getting stuck

* tests/bench-slope.c (bench_ghz, bench_ghz_diff): New static
variables.
(AUTO_GHZ_TARGET_DIFF): New macro.
(do_slope_benchmark): Reduce target auto-mhz accuracy after
repeated failures.
(bench_print_result_csv, bench_print_result_std): Print auto-ghz
difference if 1 Mhz or more.
(do_slope_benchmark, bench_print_result_csv, bench_print_result_std)
(bench_print_result): Remove 'bench_ghz' parameter.
(cipher_bench_one, hash_bench_one, mac_bench_one)
(kdf_bench_one): Remove 'bench_ghz' variable.
--

This patch prevents auto-mhz detection getting stuck on systems with
high load or unstable CPU frequency.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/bench-slope: add missing cipher context reset
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/bench-slope: add missing cipher context reset

* tests/bench-slope.c (bench_encrypt_do_bench)
(bench_decrypt_do_bench): Add call to 'gcry_cipher_reset'.
--

Some non-AEAD results were negatively affected by missing state
reset (~1% for aesni-ctr and chacha20-stream).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoAdd stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations

* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUARTERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUARTERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call
'_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call
'_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_poly1305_update_burn): New
prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
--

Benchmark on Intel Skylake (i5-6500, 3200 Mhz):

Before, 8-way AVX2:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.378 ns/B      2526 MiB/s      1.21 c/B
     STREAM dec |     0.373 ns/B      2560 MiB/s      1.19 c/B
   POLY1305 enc |     0.685 ns/B      1392 MiB/s      2.19 c/B
   POLY1305 dec |     0.686 ns/B      1390 MiB/s      2.20 c/B
  POLY1305 auth |     0.315 ns/B      3031 MiB/s      1.01 c/B

After, 8-way AVX2 (~36% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.503 ns/B      1896 MiB/s      1.61 c/B
   POLY1305 dec |     0.485 ns/B      1965 MiB/s      1.55 c/B

Benchmark on Intel Haswell (i7-4790K, 3998 Mhz):

Before, 8-way AVX2:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.318 ns/B      2999 MiB/s      1.27 c/B
     STREAM dec |     0.317 ns/B      3004 MiB/s      1.27 c/B
   POLY1305 enc |     0.586 ns/B      1627 MiB/s      2.34 c/B
   POLY1305 dec |     0.586 ns/B      1627 MiB/s      2.34 c/B
  POLY1305 auth |     0.271 ns/B      3524 MiB/s      1.08 c/B

After, 8-way AVX2 (~30% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.452 ns/B      2108 MiB/s      1.81 c/B
   POLY1305 dec |     0.440 ns/B      2167 MiB/s      1.76 c/B

Before, 4-way SSSE3:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.627 ns/B      1521 MiB/s      2.51 c/B
     STREAM dec |     0.626 ns/B      1523 MiB/s      2.50 c/B
   POLY1305 enc |     0.895 ns/B      1065 MiB/s      3.58 c/B
   POLY1305 dec |     0.896 ns/B      1064 MiB/s      3.58 c/B
  POLY1305 auth |     0.271 ns/B      3521 MiB/s      1.08 c/B

After, 4-way SSSE3 (~20% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |     0.733 ns/B      1301 MiB/s      2.93 c/B
   POLY1305 dec |     0.726 ns/B      1314 MiB/s      2.90 c/B

Before, 1-way SSSE3:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |      1.56 ns/B     609.6 MiB/s      6.25 c/B
   POLY1305 dec |      1.56 ns/B     609.4 MiB/s      6.26 c/B

After, 1-way SSSE3 (~18% faster):
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
   POLY1305 enc |      1.31 ns/B     725.4 MiB/s      5.26 c/B
   POLY1305 dec |      1.31 ns/B     727.3 MiB/s      5.24 c/B

For comparison to other libraries (on Intel i7-4790K, 3998 Mhz):

bench-slope-openssl: OpenSSL 1.1.1  11 Sep 2018
Cipher:
 chacha20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.301 ns/B    3166.4 MiB/s      1.20 c/B
     STREAM dec |     0.300 ns/B    3174.7 MiB/s      1.20 c/B
   POLY1305 enc |     0.463 ns/B    2060.6 MiB/s      1.85 c/B
   POLY1305 dec |     0.462 ns/B    2063.8 MiB/s      1.85 c/B
  POLY1305 auth |     0.162 ns/B    5899.3 MiB/s     0.646 c/B

bench-slope-nettle: Nettle 3.4
Cipher:
 chacha         |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      1.65 ns/B     578.2 MiB/s      6.59 c/B
     STREAM dec |      1.65 ns/B     578.2 MiB/s      6.59 c/B
   POLY1305 enc |      2.05 ns/B     464.8 MiB/s      8.20 c/B
   POLY1305 dec |      2.05 ns/B     464.7 MiB/s      8.20 c/B
  POLY1305 auth |     0.404 ns/B    2359.1 MiB/s      1.62 c/B

bench-slope-botan: Botan 2.6.0
Cipher:
 ChaCha         |  nanosecs/byte   mebibytes/sec   cycles/byte
 STREAM enc/dec |     0.855 ns/B    1116.0 MiB/s      3.42 c/B
   POLY1305 enc |      1.60 ns/B     595.4 MiB/s      6.40 c/B
   POLY1305 dec |      1.60 ns/B     595.8 MiB/s      6.40 c/B
  POLY1305 auth |     0.752 ns/B    1268.3 MiB/s      3.01 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agoAdd SSSE3 optimized non-parallel ChaCha20 function
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
Add SSSE3 optimized non-parallel ChaCha20 function

* cipher/chacha20-amd64-ssse3.S (ROTATE_SHUF, ROTATE, WORD_SHUF)
(QUARTERROUND4, _gcry_chacha20_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_amd64_ssse3_blocks1): New
prototype.
(chacha20_blocks): Rename to ...
(do_chacha20_blocks): ... this.
(chacha20_blocks): New.
(chacha20_encrypt_stream): Adjust for new chacha20_blocks function.
--

This patch provides SSSE3 optimized version of non-parallel
ChaCha20 core block function. On Intel Haswell generic C function
runs at 6.9 cycles/byte. New function runs at 5.2 cycles/byte, thus
being ~32% faster.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/basic: increase buffer size for check_one_cipher
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/basic: increase buffer size for check_one_cipher

* tests/basic.c (check_one_cipher_core)
(check_one_cipher): Increase buffer from 1040 to 1904 bytes.
--

This is for better test coverage of highly parallel cipher
implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
7 months agotests/basic: check AEAD tags in check_one_cipher test
Jussi Kivilinna [Sun, 27 Jan 2019 09:19:56 +0000 (11:19 +0200)]
tests/basic: check AEAD tags in check_one_cipher test

* tests/basic.c (get_algo_mode_taglen): New.
(check_one_cipher_core_reset): Check that tags are same with
AEAD modes.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agobuild: With LD_LIBRARY_PATH defined, use --disable-new-dtags.
NIIBE Yutaka [Tue, 15 Jan 2019 07:14:51 +0000 (16:14 +0900)]
build: With LD_LIBRARY_PATH defined, use --disable-new-dtags.

* configure.ac (LDADD_FOR_TESTS_KLUDGE): New for --disable-new-dtags.
* tests/Makefile.am (LDADD, t_lock_LDADD): Use LDADD_FOR_TESTS_KLUDGE.

--

GnuPG-bug-id: 4298
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agorandom: Fix previous commit for getentropy function.
NIIBE Yutaka [Tue, 15 Jan 2019 06:48:25 +0000 (15:48 +0900)]
random: Fix previous commit for getentropy function.

* random/rndlinux.c [__NR_getrandom] (_gcry_rndlinux_gather_random):
Check return value only for use of syscall.

--

The function returns 0 on success.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agorandom: Use getentropy when available for not GNU/Linux.
NIIBE Yutaka [Tue, 15 Jan 2019 04:53:45 +0000 (13:53 +0900)]
random: Use getentropy when available for not GNU/Linux.

* configure.ac: Detect getentropy.
* random/rndlinux.c [__linux__] (getentropy): Macro defined.
[HAVE_GETENTROPY] (_gcry_rndlinux_gather_random): Use getentropy.

--

GnuPG-bug-id: 4288
Reported-by: David Carlier
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
8 months agocamellia-aarch64: do not export look-up table globally
Jussi Kivilinna [Mon, 14 Jan 2019 20:14:24 +0000 (22:14 +0200)]
camellia-aarch64: do not export look-up table globally

* cipher/camellia-aarch64.S (_gcry_camellia_arm_tables): Remove
'.globl' export.
--

Reported-by: Martin Husemann <martin@NetBSD.org>
GnuPG-bug-id: 4317
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agoProcess CCM/EAX/GCM/Poly1305 AEAD cipher modes input in 24 KiB chunks
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
Process CCM/EAX/GCM/Poly1305 AEAD cipher modes input in 24 KiB chunks

* cipher/cipher-ccm.c (_gcry_cipher_ccm_encrypt)
(_gcry_cipher_ccm_decrypt): Process data in 24 KiB chunks.
* cipher/cipher-eax.c (_gcry_cipher_eax_encrypt)
(_gcry_cipher_eax_decrypt): Ditto.
* cipher/cipher-gcm.c (_gcry_cipher_gcm_encrypt)
(_gcry_cipher_gcm_decrypt): Ditto.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt): Ditto.
--

Patch changes AEAD modes to process input in 24 KiB chunks to improve
cache locality when processing large buffers.

Huge buffer test in tests/benchmark shows 0.7% improvement for AES-CCM
and AES-EAX, 6% for AES-GCM and 4% for Chacha20-Poly1305 on Intel Core
i7-4790K.
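
The chunking idea can be sketched as follows. This is not the libgcrypt
code: 'demo_ctx', 'demo_crypt', 'demo_auth' and 'aead_encrypt_chunked' are
hypothetical stand-ins for a mode's encryption pass and authentication
pass. Both passes run over the same 24 KiB chunk before moving on, so the
second pass is more likely to find the chunk still in cache.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE (24 * 1024)  /* 24 KiB, as in the commit */

/* Hypothetical per-message state for the demo passes below. */
struct demo_ctx {
  unsigned long crypt_calls;
  unsigned long auth_calls;
  unsigned long checksum;
};

/* Pass 1 stand-in: "encrypt" by xoring with a fixed byte. */
static void demo_crypt(struct demo_ctx *ctx, unsigned char *out,
                       const unsigned char *in, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    out[i] = in[i] ^ 0x5a;
  ctx->crypt_calls++;
}

/* Pass 2 stand-in: "authenticate" by summing the ciphertext bytes. */
static void demo_auth(struct demo_ctx *ctx, const unsigned char *buf,
                      size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    ctx->checksum += buf[i];
  ctx->auth_calls++;
}

/* Run both passes chunk by chunk instead of once over the whole input. */
static void aead_encrypt_chunked(struct demo_ctx *ctx, unsigned char *out,
                                 const unsigned char *in, size_t len)
{
  assert(ctx && out && in);
  while (len > 0) {
    size_t n = len > CHUNK_SIZE ? CHUNK_SIZE : len;

    demo_crypt(ctx, out, in, n);
    demo_auth(ctx, out, n);
    out += n;
    in += n;
    len -= n;
  }
}
```

A 60 KiB input is handled as three chunks of 24, 24 and 12 KiB.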

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agotests/benchmark: add Chacha20-Poly1305 benchmarking
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
tests/benchmark: add Chacha20-Poly1305 benchmarking

* tests/benchmark.c (cipher_bench): Add Chacha20-Poly1305.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agotests/benchmark: add --huge-buffers option for cipher tests
Jussi Kivilinna [Wed, 2 Jan 2019 19:25:44 +0000 (21:25 +0200)]
tests/benchmark: add --huge-buffers option for cipher tests

* tests/benchmark.c (huge_buffers, cipher_encrypt, cipher_decrypt): New.
(cipher_bench): Add 'max_inlen' to modes structure; add huge buffers
mode selection.
(main): Add '--huge-buffers'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
8 months agorandom: Add finalizer for rndjent.
NIIBE Yutaka [Wed, 19 Dec 2018 01:28:32 +0000 (10:28 +0900)]
random: Add finalizer for rndjent.

* random/rand-internal.h (_gcry_rndjent_fini): New.
* random/rndjent.c (_gcry_rndjent_fini): New.
* random/rndlinux.c (_gcry_rndlinux_gather_random): Call the finalizer
when GCRYCTL_CLOSE_RANDOM_DEVICE.

--

GnuPG-bug-id: 3731
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
9 months agosecmem: Prepare for easier debugging.
Werner Koch [Wed, 12 Dec 2018 07:34:10 +0000 (08:34 +0100)]
secmem: Prepare for easier debugging.

* src/secmem.c (_gcry_secmem_dump_stats): Factor code out to ...
(secmem_dump_stats_internal): new.
--

This allows inserting a call to the dump function during debug sessions
inside of the allocators or calling secmem_dump_stats_internal from gdb.

Signed-off-by: Werner Koch <wk@gnupg.org>
9 months agorijndael-aesni: interleave last CTR encryption round with xoring
Jussi Kivilinna [Sat, 1 Dec 2018 10:21:14 +0000 (12:21 +0200)]
rijndael-aesni: interleave last CTR encryption round with xoring

* cipher/rijndael-aesni.c (do_aesni_ctr_8): Interleave aesenclast
with input xoring.
--

Structure of 'aesenclast' instruction allows reordering last
encryption round and xoring of input block for small ~0.5%
improvement in performance.

Intel i7-4790K @ 4.0 Ghz:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.159 ns/B      6002 MiB/s     0.636 c/B
        CTR dec |     0.159 ns/B      6001 MiB/s     0.636 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
9 months agoUse explicit_bzero for wipememory
Jussi Kivilinna [Tue, 13 Nov 2018 20:08:50 +0000 (22:08 +0200)]
Use explicit_bzero for wipememory

* configure.ac (AC_CHECK_FUNCS): Check for 'explicit_bzero'.
* src/g10lib.h (wipememory2): Use _gcry_fast_wipememory if _SET is
zero.
(_gcry_fast_wipememory): New.
(_gcry_wipememory2): Rename to...
(_gcry_fast_wipememory2): ...this.
* src/misc.c (_gcry_wipememory): New.
(_gcry_wipememory2): Rename to...
(_gcry_fast_wipememory2): ...this.
(_gcry_fast_wipememory2) [HAVE_EXPLICIT_BZERO]: Use explicit_bzero if
SET is zero.
(_gcry_burn_stack): Use _gcry_fast_wipememory.
--
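
A minimal sketch of the selection described above, under the assumption
that HAVE_EXPLICIT_BZERO is defined by configure when the function exists.
The name 'wipe_sketch' is hypothetical. The fallback calls memset through
a volatile function pointer so that the compiler cannot remove the call as
a dead store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Volatile function pointer: the compiler must load it at run time and
   therefore cannot prove that the call below is a removable memset. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

/* Overwrite LEN bytes at PTR with SET. */
static void wipe_sketch(void *ptr, int set, size_t len)
{
  assert(ptr != NULL || len == 0);
  if (len == 0)
    return;

#ifdef HAVE_EXPLICIT_BZERO
  if (set == 0) {
    /* Library-provided wipe that is not optimized away. */
    explicit_bzero(ptr, len);
    return;
  }
#endif

  memset_ptr(ptr, set, len);
}
```

Without HAVE_EXPLICIT_BZERO the sketch always takes the memset_ptr path.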

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
9 months agoAdd clang target pragma for mixed C/assembly x86-64 implementations
Jussi Kivilinna [Tue, 20 Nov 2018 19:16:08 +0000 (21:16 +0200)]
Add clang target pragma for mixed C/assembly x86-64 implementations

* cipher/cipher-gcm-intel-pclmul.c: Add target 'no-sse' attribute
pragma for clang.
* cipher/crc-intel-pclmul.c: Ditto.
* cipher/rijndael-aesni.c: Ditto.
* cipher/rijndael-ssse3-amd64.c: Ditto.
* cipher/sha1-intel-shaext.c: Ditto.
* cipher/sha256-intel-shaext.c: Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
9 months agoOptimizations for AES-NI OCB
Jussi Kivilinna [Tue, 20 Nov 2018 19:16:08 +0000 (21:16 +0200)]
Optimizations for AES-NI OCB

* cipher/cipher-internal.h (gcry_cipher_handle): New pre-computed OCB
values L0L1 and L0L1L0; Swap dimensions for OCB L table.
* cipher/cipher-ocb.c (_gcry_cipher_ocb_set_nonce): Setup L0L1 and
L0L1L0 values.
(ocb_crypt): Process input in 24KiB chunks for better cache locality
for checksumming.
* cipher/rijndael-aesni.c (ALWAYS_INLINE): New macro for always
inlining functions, change all functions with 'inline' to use
ALWAYS_INLINE.
(NO_INLINE): New macro.
(aesni_prepare_2_6_variable, aesni_prepare_7_15_variable): Rename to...
(aesni_prepare_2_7_variable, aesni_prepare_8_15_variable): ...these and
adjust accordingly (xmm7 moved from *_7_15 to *_2_7).
(aesni_prepare_2_6, aesni_prepare_7_15): Rename to...
(aesni_prepare_2_7, aesni_prepare_8_15): ...these and adjust
accordingly.
(aesni_cleanup_2_6, aesni_cleanup_7_15): Rename to...
(aesni_cleanup_2_7, aesni_cleanup_8_15): ...these and adjust
accordingly.
(aesni_ocb_checksum): New.
(aesni_ocb_enc, aesni_ocb_dec): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0; Do checksum
calculation as separate pass instead of inline; Use NO_INLINE.
(_gcry_aes_aesni_ocb_auth): Calculate OCB offsets in parallel
with help of pre-computed offsets L0+L1 and L0+L1+L0.
* cipher/rijndael-internal.h (RIJNDAEL_context_s) [USE_AESNI]: Add
'use_avx2' and 'use_avx'.
* cipher/rijndael.c (do_setkey) [USE_AESNI]: Set 'use_avx2' if
Intel AVX2 HW feature is available and 'use_avx' if Intel AVX HW
feature is available.
* tests/basic.c (do_check_ocb_cipher): New test vector; increase
size of temporary buffers for new test vector.
(check_ocb_cipher_largebuf_split): Make test plaintext non-uniform
for better checksum testing.
(check_ocb_cipher_checksum): New.
(check_ocb_cipher_largebuf): Call check_ocb_cipher_checksum.
(check_ocb_cipher): New expected tags for check_ocb_cipher_largebuf
test runs.
--

Benchmark on Haswell i7-4790K @ 4.0Ghz:

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        OCB enc |     0.175 ns/B      5436 MiB/s     0.702 c/B
        OCB dec |     0.184 ns/B      5184 MiB/s     0.736 c/B
       OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

After (enc +2% faster, dec +7% faster):
        OCB enc |     0.172 ns/B      5547 MiB/s     0.688 c/B
        OCB dec |     0.171 ns/B      5582 MiB/s     0.683 c/B
       OCB auth |     0.156 ns/B      6097 MiB/s     0.626 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
9 months agodoc: Fix library initialization examples
Andreas Metzler [Sun, 18 Nov 2018 15:01:21 +0000 (16:01 +0100)]
doc: Fix library initialization examples

Signed-off-by: Andreas Metzler <ametzler@bebt.de>
10 months agorandom: Initialize variable as requested by valgrind
Werner Koch [Wed, 14 Nov 2018 13:14:23 +0000 (14:14 +0100)]
random: Initialize variable as requested by valgrind

* random/jitterentropy-base.c: Init.
--

The variable ec does not need initialization for proper functioning of
the analyzer code. However, valgrind complains about the uninitialized
variable. Thus, initialize it.

Original-repo: https://github.com/smuellerDD/jitterentropy-library.git
Original-commit: 9048af7f06fc1488904f54852e0a2f8da45a4745
Original-Author: Stephan Mueller <smueller@chronox.de>
Original-Date: Sun, 15 Jul 2018 19:14:02 +0200
Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agolibgcrypt.m4: Prefer gpgrt-config to SYSROOT support.
NIIBE Yutaka [Tue, 13 Nov 2018 01:30:39 +0000 (10:30 +0900)]
libgcrypt.m4: Prefer gpgrt-config to SYSROOT support.

* libgcrypt.m4: Move SYSROOT support after check of GPGRT_CONFIG.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update autogen.rc.
NIIBE Yutaka [Tue, 13 Nov 2018 00:36:37 +0000 (09:36 +0900)]
build: Update autogen.rc.

* autogen.rc: Remove obsolete --with-gpg-error-prefix option.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agoFix 'variable may be used uninitialized' warning for CTR mode
Jussi Kivilinna [Wed, 7 Nov 2018 17:12:29 +0000 (19:12 +0200)]
Fix 'variable may be used uninitialized' warning for CTR mode

* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Set N to BLOCKSIZE
before counter loop.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoFix inlining of ocb_get_l for x86 AES implementations
Jussi Kivilinna [Tue, 6 Nov 2018 18:27:34 +0000 (20:27 +0200)]
Fix inlining of ocb_get_l for x86 AES implementations

* cipher/rijndael-aesni.c (aes_ocb_get_l): New.
(aesni_ocb_enc, aesni_ocb_dec, _gcry_aes_aesni_ocb_auth): Use
'aes_ocb_get_l'.
* cipher/rijndael-ssse3-amd64.c (aes_ocb_get_l): New.
(ssse3_ocb_enc, ssse3_ocb_dec, _gcry_aes_ssse3_ocb_auth): Use
'aes_ocb_get_l'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agostdmem: free: only call _gcry_secmem_free if needed
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
stdmem: free: only call _gcry_secmem_free if needed

* src/stdmem.c (_gcry_private_free): Check if memory is secure before
calling _gcry_secmem_free to avoid unnecessarily taking secmem lock.
--

Unnecessarily taking secmem lock on non-secure memory can result in poor
performance on multi-threaded workloads:
  https://lists.gnupg.org/pipermail/gcrypt-devel/2018-August/004535.html
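
The check can be sketched as below. The pool, the lock counter and all
function names are hypothetical stand-ins, not the src/stdmem.c code; the
point is only that the address-range test needs no lock, so ordinary
heap pointers never touch the secure-memory lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static unsigned char secure_pool[4096];  /* hypothetical secure pool */
static unsigned long secure_lock_taken;  /* simulated lock acquisitions */

/* Lock-free test: does P point into the secure pool? */
static int is_secure_ptr(const void *p)
{
  uintptr_t a = (uintptr_t)p;
  uintptr_t base = (uintptr_t)secure_pool;

  return a >= base && a < base + sizeof secure_pool;
}

/* Stand-in for the secure free; a real one would lock, release the
   block inside the pool, and unlock. */
static void secure_free_locked(void *p)
{
  assert(is_secure_ptr(p));
  secure_lock_taken++;
}

/* Only secure pointers take the locked path. */
static void private_free_sketch(void *p)
{
  if (p == NULL)
    return;
  if (is_secure_ptr(p))
    secure_free_locked(p);
  else
    free(p);
}
```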

Reported-by: Christian Grothoff <grothoff@gnunet.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agosecmem: fix potential memory visibility issue
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
secmem: fix potential memory visibility issue

* configure.ac (gcry_cv_have_sync_synchronize): New check.
* src/secmem.c (pooldesc_s): Make next pointer volatile.
(memory_barrier): New.
(_gcry_secmem_malloc_internal): Insert memory barrier between
pool->next and mainpool.next assignments.
(_gcry_private_is_secure): Update comments.
--
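
A sketch of the ordering issue, assuming a compiler that provides the
'__sync_synchronize' builtin (GCC and Clang do). The struct and function
names are hypothetical. A new pool must be fully linked before it becomes
reachable from the main pool, because readers walk the list without
taking a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool descriptor with a volatile next pointer. */
struct pool_desc {
  struct pool_desc *volatile next;
  size_t size;
};

static struct pool_desc main_pool;

/* Link POOL in front of the overflow list hanging off main_pool. */
static void publish_pool(struct pool_desc *pool)
{
  assert(pool != NULL);
  pool->next = main_pool.next;   /* 1. complete the new node */
  __sync_synchronize();          /* 2. full barrier between the stores */
  main_pool.next = pool;         /* 3. make it visible to readers */
}

/* Lock-free reader: count the pools reachable from main_pool. */
static size_t count_pools(void)
{
  const struct pool_desc *p;
  size_t n = 0;

  for (p = main_pool.next; p != NULL; p = p->next)
    n++;
  return n;
}
```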

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agowipememory: use memset for non-constant length or large buffer wipes
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
wipememory: use memset for non-constant length or large buffer wipes

* src/g10lib.h (CONSTANT_P): New.
(_gcry_wipememory2): New prototype.
(wipememory2): Use _gcry_wipememory2 if _len is not a constant expression
or the length is larger than 64 bytes.
(FASTWIPE_T, FASTWIPE_MULT, fast_wipememory2_unaligned_head): Remove.
(fast_wipememory2): Always handle buffer as unaligned.
* src/misc.c (__gcry_burn_stack): Move memset_ptr variable to...
(memset_ptr): ... here. New.
(_gcry_wipememory2): New.
--
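
The dispatch can be sketched like this. It assumes GCC or Clang for
'__builtin_constant_p'; the names 'wipe_out_of_line', 'wipe_inline' and
'wipe_sketch' are hypothetical and the bodies are simplified byte loops,
not the g10lib.h implementation.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
# define CONSTANT_P(x) __builtin_constant_p(x)
#else
# define CONSTANT_P(x) 0   /* unknown compiler: always use the call */
#endif

/* Out-of-line wipe for large or variable-length buffers. */
static void wipe_out_of_line(void *ptr, int set, size_t len)
{
  volatile unsigned char *p = ptr;

  assert(ptr != NULL || len == 0);
  while (len--)
    *p++ = (unsigned char)set;
}

/* Inline wipe meant for short buffers whose length is known at compile
   time, so the compiler can unroll the loop. */
static inline void wipe_inline(void *ptr, int set, size_t len)
{
  volatile unsigned char *p = ptr;

  while (len--)
    *p++ = (unsigned char)set;
}

/* Note: LEN is evaluated more than once, so pass a side-effect-free
   expression. */
#define wipe_sketch(ptr, set, len)                  \
  do {                                              \
    if (!CONSTANT_P(len) || (len) > 64)             \
      wipe_out_of_line((ptr), (set), (len));        \
    else                                            \
      wipe_inline((ptr), (set), (len));             \
  } while (0)
```

Which branch a given call takes depends on the compiler and optimization
level; both branches leave the buffer in the same state.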

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoChange buf_cpy and buf_xor* functions to use buf_put/buf_get helpers
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
Change buf_cpy and buf_xor* functions to use buf_put/buf_get helpers

* cipher/bufhelp.h (BUFHELP_FAST_UNALIGNED_ACCESS)
(bufhelp_int_s, buf_xor_1): Remove.
(buf_cpy, buf_xor, buf_xor_2dst, buf_xor_n_copy_2): Use
buf_put/buf_get helpers to handle unaligned memory accesses.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agorijndael: fix unused parameter warning
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
rijndael: fix unused parameter warning

* cipher/rijndael.c (do_setkey): Silence unused 'hd' warning.
--

This commit fixes "warning: unused parameter 'hd'" warning seen on
architectures that do not have alternative AES implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agompi/longlong.h: enable inline assembly for powerpc64
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
mpi/longlong.h: enable inline assembly for powerpc64

* mpi/longlong.h [__powerpc__ && W_TYPE_SIZE == 64]: Remove '#if 0'.
--

PowerPC64 inline assembly was tested on QEMU ('make check' pass).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoChange remaining users of _gcry_fips_mode to use fips_mode
Jussi Kivilinna [Mon, 5 Nov 2018 18:42:58 +0000 (20:42 +0200)]
Change remaining users of _gcry_fips_mode to use fips_mode

* src/fips.c (_gcry_fips_mode): Remove.
(_gcry_enforced_fips_mode, _gcry_inactivate_fips_mode)
(_gcry_is_fips_mode_inactive): Use fips_mode.
* src/g10lib.h (_gcry_fips_mode): Remove.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoaarch64: mpi: Distribute the header file as a part of source.
NIIBE Yutaka [Fri, 2 Nov 2018 09:54:02 +0000 (18:54 +0900)]
aarch64: mpi: Distribute the header file as a part of source.

* mpi/Makefile.am (EXTRA_libmpi_la_SOURCES): Add asm-common-aarch64.h.

--

Fixes-commit: ec0a2f25c0f64a7b65b373508ce9081e10461965
Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Fix GCRYPT_HWF_MODULES.
NIIBE Yutaka [Fri, 2 Nov 2018 04:51:40 +0000 (13:51 +0900)]
build: Fix GCRYPT_HWF_MODULES.

* configure.ac (GCRYPT_HWF_MODULES): Add libgcrypt_la- prefix.

--

Before this change "make distcheck" fails because
src/.deps/hwf-x86.Plo remains.  Note that the distclean entry for the
file is libgcrypt_la-hwf-x86.Plo.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update gpg-error.m4 and libgcrypt.m4.
NIIBE Yutaka [Fri, 2 Nov 2018 03:06:11 +0000 (12:06 +0900)]
build: Update gpg-error.m4 and libgcrypt.m4.

* m4/gpg-error.m4: Update to 2018-11-02.
* src/libgcrypt.m4: Add AC_MSG_NOTICE.
Bump the version date.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update gpg-error.m4 and ksba.m4.
NIIBE Yutaka [Mon, 29 Oct 2018 03:51:19 +0000 (12:51 +0900)]
build: Update gpg-error.m4 and ksba.m4.

* m4/gpg-error.m4: Update to 2018-10-29.
* src/libgcrypt.m4: Follow the change of gpgrt-config.
Bump the version date.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agoFix missing global initialization in fips_is_operational
Jussi Kivilinna [Sat, 27 Oct 2018 12:48:29 +0000 (15:48 +0300)]
Fix missing global initialization in fips_is_operational

* src/g10lib.h (_gcry_global_any_init_done): New extern.
(fips_is_operational): Check for _gcry_global_any_init_done and call
_gcry_global_is_operational.
* src/global.c (any_init_done): Rename to ...
(_gcry_global_any_init_done): ... this and make externally available.
--

Commit b6e6ace324440f564df664e27f8276ef01f76795 "Add fast path for
_gcry_fips_is_operational" inadvertently replaced function call to
_gcry_global_is_operational with call to _gcry_fips_is_operational
in fips_is_operational macro. This can cause libgcrypt to miss
initialization. This patch restores _gcry_global_is_operational
functionality to fips_is_operational macro while keeping fast-path
to reduce call-overhead to gcry_* functions.
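
A sketch of the macro shape this describes. All names are hypothetical
stand-ins for the g10lib.h symbols, and the exact fast-path condition is
an assumption: the full check runs until initialization has happened,
after which non-FIPS callers skip the function call.

```c
/* Hypothetical stand-ins for the library's global state. */
static int any_init_done;       /* set once global initialization ran */
static int fips_mode_enabled;   /* nonzero when FIPS mode is active */
static int operational_checks;  /* counts calls into the slow path */

/* Stand-in for the full check, which also performs any pending global
   initialization. */
static int global_is_operational(void)
{
  operational_checks++;
  any_init_done = 1;
  return 1;
}

/* Before initialization: always take the full check.  Afterwards: only
   call it when FIPS mode is on. */
#define fips_is_operational_sketch()                          \
  (!any_init_done ? global_is_operational()                   \
                  : (!fips_mode_enabled || global_is_operational()))
```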

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
10 months agoMerge release info from 1.8.4
Werner Koch [Fri, 26 Oct 2018 18:04:44 +0000 (20:04 +0200)]
Merge release info from 1.8.4

--

10 months agorandom: use getrandom() on Linux where available
Daniel Kahn Gillmor [Wed, 5 Sep 2018 14:34:04 +0000 (10:34 -0400)]
random: use getrandom() on Linux where available

* random/rndlinux.c (_gcry_rndlinux_gather_random): use the
getrandom() syscall on Linux if it exists, regardless of what kind of
entropy was requested.

--

This change avoids the serious usability problem of unnecessary
blocking on /dev/random when the kernel's PRNG is already seeded,
without introducing the risk of pulling from an uninitialized PRNG.
It only has an effect on Linux systems with a functioning getrandom()
syscall.  If that syscall is unavailable or fails, it should fall
through to the pre-existing behavior.
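
A minimal sketch of the try-then-fall-through pattern, not the
random/rndlinux.c code. The function name is hypothetical, and the
fallback here reads /dev/urandom purely as an assumption for the example;
the commit itself only says that the pre-existing device-based behavior
is kept.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* for the syscall() declaration on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
# include <sys/syscall.h>
#endif

/* Fill BUF with LEN random bytes.  Returns 0 on success, -1 on failure. */
static int gather_random_sketch(unsigned char *buf, size_t len)
{
  size_t off = 0;
  int fd;

#if defined(__linux__) && defined(SYS_getrandom)
  while (off < len) {
    long n = syscall(SYS_getrandom, buf + off, len - off, 0);

    if (n > 0) {
      off += (size_t)n;
      continue;
    }
    if (n < 0 && errno == EINTR)
      continue;          /* interrupted: retry */
    break;               /* ENOSYS or other error: fall through */
  }
  if (off == len)
    return 0;
#endif

  /* Fallback path: read the remaining bytes from a device. */
  fd = open("/dev/urandom", O_RDONLY);
  if (fd < 0)
    return -1;
  while (off < len) {
    ssize_t n = read(fd, buf + off, len - off);

    if (n > 0)
      off += (size_t)n;
    else if (n < 0 && errno == EINTR)
      continue;
    else {
      close(fd);
      return -1;
    }
  }
  close(fd);
  return 0;
}
```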

GnuPG-bug-id: 3894
Signed-off-by: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
10 months agorandom: Make sure to re-open /dev/random after a fork
Werner Koch [Fri, 26 Oct 2018 11:22:16 +0000 (13:22 +0200)]
random: Make sure to re-open /dev/random after a fork

* random/rndlinux.c (_gcry_rndlinux_gather_random): Detect fork and
re-open devices.
--

This mitigates ill-behaving software which has closed the
standard fds but later dups them to /dev/null.
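
One common way to detect a fork is to compare the current process ID
with a cached one. The sketch below shows only that check; the helper
name 'pid_changed' is hypothetical and the actual detection logic in
random/rndlinux.c may differ.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 on first use or when the process ID differs from the cached
   one (that is, we are running in a forked child), and updates the
   cache.  Returns 0 otherwise. */
static int pid_changed(pid_t *cached_pid)
{
  pid_t now = getpid();

  assert(cached_pid != NULL);
  if (*cached_pid == now)
    return 0;
  *cached_pid = now;
  return 1;
}
```

A caller would keep the cached PID next to its device file descriptors
and, whenever 'pid_changed' returns 1, close and re-open them before
reading.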

GnuPG-bug-id: 3491
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agoprimes: Avoid leaking bits of the prime test to pageable memory.
Werner Koch [Fri, 26 Oct 2018 10:57:30 +0000 (12:57 +0200)]
primes: Avoid leaking bits of the prime test to pageable memory.

* cipher/primegen.c (gen_prime): Allocate MODS in secure memory.
--

This increases the pressure on the secure memory by about 1400 bytes
but given that we can meanwhile increase the size of the secmem area,
this is acceptable.

GnuPG-bug-id: 3848
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agolibgcrypt.m4: Better compatibility support.
NIIBE Yutaka [Fri, 26 Oct 2018 01:35:51 +0000 (10:35 +0900)]
libgcrypt.m4: Better compatibility support.

* src/gpg-error.m4: Update.
* src/libgcrypt.m4: Don't assume libgcrypt-config is newer.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Fix libgcrypt.m4.
NIIBE Yutaka [Fri, 26 Oct 2018 00:38:47 +0000 (09:38 +0900)]
build: Fix libgcrypt.m4.

* src/libgcrypt.m4: Use AC_PATH_PROG to detect libgcrypt-config.

--

The last commit, using AC_PATH_TOOL, was wrong.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Relax build requirements.
NIIBE Yutaka [Fri, 26 Oct 2018 00:09:51 +0000 (09:09 +0900)]
build: Relax build requirements.

* m4/gpg-error.m4: Update from libgpg-error 1.33.
* src/libgcrypt.m4: Don't require AM_PATH_GPG_ERROR.  Use GPGRT_CONFIG
instead of libgcrypt-config when it is confirmed that it is available
and working well.
* configure.ac (AM_PATH_GPG_ERROR): No requirement for newer version
(It was because of new gpgrt-config which supports *.pc files).

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agocipher: Add comments about future OIDs.
Werner Koch [Thu, 25 Oct 2018 11:04:21 +0000 (13:04 +0200)]
cipher: Add comments about future OIDs.

--

10 months agobuild: Require libgpg-error >= 1.33.
NIIBE Yutaka [Thu, 25 Oct 2018 01:11:59 +0000 (10:11 +0900)]
build: Require libgpg-error >= 1.33.

* configure.ac (NEED_GPG_ERROR_VERSION): Require 1.33.
* m4/gpg-error.m4: Update from libgpg-error 1.33.
* src/libgcrypt.m4: Bump version date.
Use --variable option.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Add release make target
Werner Koch [Wed, 24 Oct 2018 10:24:44 +0000 (12:24 +0200)]
build: Add release make target

* Makefile.am (release, sign-release): New targets.

Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agobuild: Make distcheck work again.
Werner Koch [Wed, 24 Oct 2018 10:23:47 +0000 (12:23 +0200)]
build: Make distcheck work again.

* cipher/Makefile.am: Prettified source file lists.
(EXTRA_libcipher_la_SOURCES): Add missing asm-common-aarch64.h.

Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agodoc: Update yat2m.c from upstream (libgpg-error)
Werner Koch [Wed, 24 Oct 2018 10:06:07 +0000 (12:06 +0200)]
doc: Update yat2m.c from upstream (libgpg-error)

--
GnuPG-bug-id: 4102

Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agoFix memory leak in secmem in out of core conditions.
Werner Koch [Wed, 24 Oct 2018 09:55:34 +0000 (11:55 +0200)]
Fix memory leak in secmem in out of core conditions.

* src/secmem.c (_gcry_secmem_malloc_internal): Release pool descriptor
if the pool could not be allocated.
--
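
The general shape of such a fix is to free the descriptor on the error
path before returning.  The following is a generic sketch of that
pattern, not the code in src/secmem.c: struct pool_sketch,
new_pool_sketch and failing_alloc_sketch are hypothetical names, and
the allocator callback exists only so the failure path can be
exercised.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool descriptor.  */
struct pool_sketch
{
  void *mem;
  size_t size;
};

/* Allocator that always fails, used to simulate an out-of-core
   condition.  */
static void *
failing_alloc_sketch (size_t n)
{
  (void) n;
  return NULL;
}

/* Allocate a descriptor and then the pool itself.  If the pool cannot
   be allocated, release the descriptor so that it is not leaked.  */
static struct pool_sketch *
new_pool_sketch (size_t size, void *(*alloc_fn) (size_t))
{
  struct pool_sketch *pool = calloc (1, sizeof *pool);

  if (!pool)
    return NULL;

  pool->mem = alloc_fn (size);
  if (!pool->mem)
    {
      free (pool);              /* Without this the descriptor leaks.  */
      return NULL;
    }

  pool->size = size;
  return pool;
}
```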

GnuPG-bug-id: 4211
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agoecc: Fix memory leak in the error case of ecc_encrypt_raw
Werner Koch [Wed, 24 Oct 2018 09:50:46 +0000 (11:50 +0200)]
ecc: Fix memory leak in the error case of ecc_encrypt_raw

* cipher/ecc.c (ecc_encrypt_raw): Add proper error cleanup in the main
block.
--

GnuPG-bug-id: 4210
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agoecc: Fix possible memory leakage in parameter check of eddsa.
Werner Koch [Wed, 24 Oct 2018 07:50:17 +0000 (09:50 +0200)]
ecc: Fix possible memory leakage in parameter check of eddsa.

* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_verify): Fix mem leak.
--

GnuPG-bug-id: 4209
Signed-off-by: Werner Koch <wk@gnupg.org>
10 months agobuild: Fix libgcrypt.pc.
NIIBE Yutaka [Wed, 24 Oct 2018 06:34:57 +0000 (15:34 +0900)]
build: Fix libgcrypt.pc.

* src/libgcrypt.pc.in: Fix typo.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Compatibility to pkg-config.
NIIBE Yutaka [Wed, 24 Oct 2018 06:13:40 +0000 (15:13 +0900)]
build: Compatibility to pkg-config.

* src/libgcrypt-config.in: Support --variable and --modversion.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Make libgcrypt.m4 use gpg-error-config.
NIIBE Yutaka [Wed, 24 Oct 2018 06:07:18 +0000 (15:07 +0900)]
build: Make libgcrypt.m4 use gpg-error-config.

* src/libgcrypt.m4: Use gpg-error-config.

--

With the option --with-libgcrypt-prefix, it still keeps using the
libgcrypt-config script.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Provide libgcrypt.pc, generated by configure.
NIIBE Yutaka [Wed, 24 Oct 2018 05:57:53 +0000 (14:57 +0900)]
build: Provide libgcrypt.pc, generated by configure.

* configure.ac: Generate src/libgcrypt.pc.
* src/Makefile.am (pkgconfigdir, pkgconfig_DATA): New.
(EXTRA_DIST): Add libgcrypt.pc.in.
* src/libgcrypt-config.in: Use @PACKAGE_VERSION@.
* src/libgcrypt.pc.in: New.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Update gpg-error.m4 from libgpg-error.
NIIBE Yutaka [Wed, 24 Oct 2018 05:33:23 +0000 (14:33 +0900)]
build: Update gpg-error.m4 from libgpg-error.

* m4/gpg-error.m4: Update from libgpg-error 1.33.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
10 months agobuild: Don't default to underscore=yes for cross-build.
NIIBE Yutaka [Wed, 24 Oct 2018 05:29:45 +0000 (14:29 +0900)]
build: Don't default to underscore=yes for cross-build.

* acinclude.m4: Don't set ac_cv_sys_symbol_underscore
for cross build.

--

It made sense in the past when cross-compilation was basically for a.out
systems, but nowadays it is better not to assume that.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>