libgcrypt.git
4 years ago  Make make distcheck work again.
Werner Koch [Tue, 6 Jan 2015 19:30:37 +0000 (20:30 +0100)]
Make make distcheck work again.

* Makefile.am (DISTCHECK_CONFIGURE_FLAGS): Remove --enable-ciphers.
* cipher/Makefile.am (DISTCLEANFILES): Add gost-sb.h.

4 years ago  Remove the old Manifest files
Werner Koch [Tue, 6 Jan 2015 17:54:24 +0000 (18:54 +0100)]
Remove the old Manifest files

--

The Manifest files were part of an experiment a long time ago to
implement source level integrity.  They have not been maintained for
more than a decade and with the advent of git they are superfluous anyway.

4 years ago  stribog: Reduce table size to the needed one.
Dmitry Eremin-Solenikov [Sun, 28 Dec 2014 09:15:33 +0000 (12:15 +0300)]
stribog: Reduce table size to the needed one.

* cipher/stribog.c (C16): Avoid allocating superfluous space.

--

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years ago  gostr3411-94: Fix the iteration count for length filling loop.
Dmitry Eremin-Solenikov [Sun, 28 Dec 2014 09:05:43 +0000 (12:05 +0300)]
gostr3411-94: Fix the iteration count for length filling loop.

* cipher/gostr3411-94.c (gost3411_final): Fix loop iteration count.
--

The maximum iteration count for filling the l (bit length) array was
incorrectly set to 32 (missed in the u8->u32 refactoring). This did
not result in stack corruption, because the nblocks variable is
exhausted before the 8 32-bit values (the size of the array) are filled.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
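
The loop bound discussed above can be illustrated with a small sketch. It assumes a 32-byte (256-bit) block and a 64-bit block counter; the names fill_bit_length and L_WORDS are made up for this example and are not taken from cipher/gostr3411-94.c.

```c
#include <stdint.h>

#define L_WORDS 8  /* assumed: the length array holds 8 32-bit words */

/* Spread the message length in bits (nblocks * 256) over L_WORDS words,
   least significant word first.  The loop is bounded by the array size
   and also stops as soon as nblocks is exhausted. */
static void
fill_bit_length (uint32_t l[L_WORDS], uint64_t nblocks)
{
  int i;

  for (i = 0; i < L_WORDS; i++)
    l[i] = 0;

  l[0] = (uint32_t)(nblocks << 8);   /* low 32 bits of nblocks * 256 */
  nblocks >>= 24;                    /* what did not fit into l[0] */
  for (i = 1; i < L_WORDS && nblocks != 0; i++)
    {
      l[i] = (uint32_t)nblocks;
      nblocks >>= 32;
    }
}
```

With a 64-bit counter the loop ends after at most two further words, which is why a too-large bound goes unnoticed in practice.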
4 years ago  build: Add a commit-msg git-hook script.
Werner Koch [Tue, 6 Jan 2015 13:51:39 +0000 (14:51 +0100)]
build: Add a commit-msg git-hook script.

--

This is the same script as used by GnuPG.  It makes sure that lines
are not too long and checks some other basic things.  ./autogen.sh
installs it.

4 years ago  random: Silence warning under NetBSD using rndunix
Werner Koch [Mon, 5 Jan 2015 18:38:29 +0000 (19:38 +0100)]
random: Silence warning under NetBSD using rndunix

* random/rndunix.c (STDERR_FILENO): Define if needed.
(start_gatherer): Re-open standard descriptors.  Fix an
unsigned/signed pointer warning.
--

GnuPG-bug-id: 1702

4 years ago  primegen: Fix memory leak for invalid call sequences.
Werner Koch [Mon, 5 Jan 2015 17:58:39 +0000 (18:58 +0100)]
primegen: Fix memory leak for invalid call sequences.

* cipher/primegen.c (prime_generate_internal): Refactor generator code
to not leak memory for non-implemented feature.
(_gcry_prime_group_generator): Refactor to not leak memory for invalid
args.  Also make sure that R_G is set as soon as possible.
--

GnuPG-bug-id: 1705
Signed-off-by: Werner Koch <wk@gnupg.org>
4 years ago  doc: Update yat2m to current upstream version (GnuPG).
Werner Koch [Mon, 5 Jan 2015 16:47:26 +0000 (17:47 +0100)]
doc: Update yat2m to current upstream version (GnuPG).

4 years ago  build: Require automake 1.14.
Werner Koch [Mon, 5 Jan 2015 16:46:05 +0000 (17:46 +0100)]
build: Require automake 1.14.

* configure.ac (AM_INIT_AUTOMAKE): Add serial-tests.

Signed-off-by: Werner Koch <wk@gnupg.org>
4 years ago  cipher: Add the original PD notice to rijndael-ssse3-amd64.c
Werner Koch [Mon, 5 Jan 2015 16:16:04 +0000 (17:16 +0100)]
cipher: Add the original PD notice to rijndael-ssse3-amd64.c

--

4 years ago  Replace camel case of internal scrypt functions.
Werner Koch [Mon, 5 Jan 2015 16:04:10 +0000 (17:04 +0100)]
Replace camel case of internal scrypt functions.

* cipher/scrypt.c (_salsa20_core): Rename to salsa20_core.  Change
callers.
(_scryptBlockMix): Rename to scrypt_block_mix.  Change callers.
(_scryptROMix): Rename to scrypt_ro_mix. Change callers.
--

Signed-off-by: Werner Koch <wk@gnupg.org>
4 years ago  doc: State that gcry_md_write et al may be used after md_read.
Werner Koch [Sun, 28 Dec 2014 13:26:48 +0000 (14:26 +0100)]
doc: State that gcry_md_write et al may be used after md_read.

--

4 years ago  doc: typo fix
Werner Koch [Fri, 19 Dec 2014 08:11:08 +0000 (09:11 +0100)]
doc: typo fix

--
GnuPG-bug-id: 1589

4 years ago  rmd160: restore native-endian store in _gcry_rmd160_mixblock
Jussi Kivilinna [Fri, 2 Jan 2015 17:07:24 +0000 (19:07 +0200)]
rmd160: restore native-endian store in _gcry_rmd160_mixblock

* cipher/rmd160.c (_gcry_rmd160_mixblock): Store result to buffer in
native-endianness.
--

Commit 4515315f61fbf79413e150fbd1d5f5a2435f2bc5 unintentionally changed this
native-endian store to little-endian.

Reported-by: Yuriy Kaminskiy <yumkam@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
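
The difference between a native-endian and a fixed-endian store can be sketched as follows. The helper names store_native32 and host_is_little_endian are hypothetical and do not come from cipher/rmd160.c; the sketch only shows that a native store keeps the host byte order, so it matches a little-endian store only on little-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Store V in host byte order.  memcpy makes no alignment assumption
   about BUF. */
static void
store_native32 (void *buf, uint32_t v)
{
  memcpy (buf, &v, sizeof v);
}

/* Return 1 if the least significant byte comes first in memory. */
static int
host_is_little_endian (void)
{
  uint32_t one = 1;
  unsigned char first;

  memcpy (&first, &one, 1);
  return first == 1;
}
```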
4 years ago  Add Intel SSSE3 based vector permutation AES implementation
Jussi Kivilinna [Sat, 27 Dec 2014 10:37:16 +0000 (12:37 +0200)]
Add Intel SSSE3 based vector permutation AES implementation

* cipher/Makefile.am: Add 'rijndael-ssse3-amd64.c'.
* cipher/rijndael-internal.h (USE_SSSE3): New.
(RIJNDAEL_context_s) [USE_SSSE3]: Add 'use_ssse3'.
* cipher/rijndael-ssse3-amd64.c: New.
* cipher/rijndael.c [USE_SSSE3] (_gcry_aes_ssse3_do_setkey)
(_gcry_aes_ssse3_prepare_decryption, _gcry_aes_ssse3_encrypt)
(_gcry_aes_ssse3_decrypt, _gcry_aes_ssse3_cfb_enc)
(_gcry_aes_ssse3_cbc_enc, _gcry_aes_ssse3_ctr_enc)
(_gcry_aes_ssse3_cfb_dec, _gcry_aes_ssse3_cbc_dec): New.
(do_setkey): Add HWF check for SSSE3 and setup for SSSE3
implementation.
(prepare_decryption, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Add
selection for SSSE3 implementation.
* configure.ac [host=x86_64]: Add 'rijndael-ssse3-amd64.lo'.
--

This patch adds the "AES with vector permutations" implementation by
Mike Hamburg. The public-domain source code is available at:
  http://crypto.stanford.edu/vpaes/

Benchmark on Intel Core2 T8100 (2.1 GHz, no turbo):

Old (AMD64 asm):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      8.79 ns/B     108.5 MiB/s     18.46 c/B
        ECB dec |      9.07 ns/B     105.1 MiB/s     19.05 c/B
        CBC enc |      7.77 ns/B     122.7 MiB/s     16.33 c/B
        CBC dec |      7.74 ns/B     123.2 MiB/s     16.26 c/B
        CFB enc |      7.88 ns/B     121.0 MiB/s     16.54 c/B
        CFB dec |      7.56 ns/B     126.1 MiB/s     15.88 c/B
        OFB enc |      9.02 ns/B     105.8 MiB/s     18.94 c/B
        OFB dec |      9.07 ns/B     105.1 MiB/s     19.05 c/B
        CTR enc |      7.80 ns/B     122.2 MiB/s     16.38 c/B
        CTR dec |      7.81 ns/B     122.2 MiB/s     16.39 c/B

New (ssse3):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      5.77 ns/B     165.2 MiB/s     12.13 c/B
        ECB dec |      7.13 ns/B     133.7 MiB/s     14.98 c/B
        CBC enc |      5.27 ns/B     181.0 MiB/s     11.06 c/B
        CBC dec |      6.39 ns/B     149.3 MiB/s     13.42 c/B
        CFB enc |      5.27 ns/B     180.9 MiB/s     11.07 c/B
        CFB dec |      5.28 ns/B     180.7 MiB/s     11.08 c/B
        OFB enc |      6.11 ns/B     156.1 MiB/s     12.83 c/B
        OFB dec |      6.13 ns/B     155.5 MiB/s     12.88 c/B
        CTR enc |      5.26 ns/B     181.5 MiB/s     11.04 c/B
        CTR dec |      5.24 ns/B     182.0 MiB/s     11.00 c/B

Benchmark on Intel i5-2450M (2.5 GHz, no turbo, aes-ni disabled):

Old (AMD64 asm):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      8.06 ns/B     118.3 MiB/s     20.15 c/B
        ECB dec |      8.21 ns/B     116.1 MiB/s     20.53 c/B
        CBC enc |      7.88 ns/B     121.1 MiB/s     19.69 c/B
        CBC dec |      7.57 ns/B     126.0 MiB/s     18.92 c/B
        CFB enc |      7.87 ns/B     121.2 MiB/s     19.67 c/B
        CFB dec |      7.56 ns/B     126.2 MiB/s     18.89 c/B
        OFB enc |      8.27 ns/B     115.3 MiB/s     20.67 c/B
        OFB dec |      8.28 ns/B     115.1 MiB/s     20.71 c/B
        CTR enc |      8.02 ns/B     119.0 MiB/s     20.04 c/B
        CTR dec |      8.02 ns/B     118.9 MiB/s     20.05 c/B

New (ssse3):
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.03 ns/B     236.6 MiB/s     10.07 c/B
        ECB dec |      5.28 ns/B     180.8 MiB/s     13.19 c/B
        CBC enc |      3.77 ns/B     252.7 MiB/s      9.43 c/B
        CBC dec |      4.69 ns/B     203.3 MiB/s     11.73 c/B
        CFB enc |      3.75 ns/B     254.3 MiB/s      9.37 c/B
        CFB dec |      3.69 ns/B     258.6 MiB/s      9.22 c/B
        OFB enc |      4.17 ns/B     228.7 MiB/s     10.43 c/B
        OFB dec |      4.17 ns/B     228.7 MiB/s     10.42 c/B
        CTR enc |      3.72 ns/B     256.5 MiB/s      9.30 c/B
        CTR dec |      3.72 ns/B     256.1 MiB/s      9.31 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
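
The three columns in the tables above are related by simple arithmetic, sketched here. The function names are made up for this example, and the formulas are an assumption about how the columns relate, not code taken from the benchmark tool; table values are rounded, so results agree only approximately.

```c
/* cycles/byte = (nanoseconds/byte) * (clock frequency in GHz) */
static double
cycles_per_byte (double ns_per_byte, double cpu_mhz)
{
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* MiB/s = (bytes per second) / 2^20 */
static double
mebibytes_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}
```

For the first Core2 row (8.79 ns/B at 2100 MHz) this gives roughly 18.46 c/B and 108.5 MiB/s, in line with the table.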
4 years ago  random-csprng: fix compiler warnings on ARM
Jussi Kivilinna [Tue, 23 Dec 2014 11:33:12 +0000 (13:33 +0200)]
random-csprng: fix compiler warnings on ARM

* random/random-csprng.c (_gcry_rngcsprng_update_seed_file)
(read_pool): Cast keypool and rndpool to 'unsigned long *' through
'void *'.
--

Patch fixes 'cast increases required alignment' warnings seen on GCC:

random-csprng.c: In function '_gcry_rngcsprng_update_seed_file':
random-csprng.c:867:15: warning: cast increases required alignment of target type [-Wcast-align]
   for (i=0,dp=(unsigned long*)keypool, sp=(unsigned long*)rndpool;
               ^
random-csprng.c:867:43: warning: cast increases required alignment of target type [-Wcast-align]
   for (i=0,dp=(unsigned long*)keypool, sp=(unsigned long*)rndpool;
                                           ^
random-csprng.c: In function 'read_pool':
random-csprng.c:1023:14: warning: cast increases required alignment of target type [-Wcast-align]
   for(i=0,dp=(unsigned long*)keypool, sp=(unsigned long*)rndpool;
              ^
random-csprng.c:1023:42: warning: cast increases required alignment of target type [-Wcast-align]
   for(i=0,dp=(unsigned long*)keypool, sp=(unsigned long*)rndpool;
                                          ^

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
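
The cast pattern named in the ChangeLog entry can be sketched like this. The function xor_words is a hypothetical stand-in, not the loop from random-csprng.c. Going through 'void *' only silences -Wcast-align; the caller still has to guarantee that the buffers really are aligned for 'unsigned long'.

```c
#include <stddef.h>

/* XOR N_WORDS words of SRC into DST.  Both buffers must be backed by
   storage that is suitably aligned for (and accessible as)
   'unsigned long'; the intermediate 'void *' cast documents that this
   is known to hold and avoids the cast-align warning. */
static void
xor_words (unsigned char *dst, const unsigned char *src, size_t n_words)
{
  unsigned long *dp = (unsigned long *)(void *)dst;
  const unsigned long *sp = (const unsigned long *)(const void *)src;
  size_t i;

  for (i = 0; i < n_words; i++)
    dp[i] ^= sp[i];
}
```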
4 years ago  scrypt: fix compiler warnings on ARM
Jussi Kivilinna [Tue, 23 Dec 2014 11:31:58 +0000 (13:31 +0200)]
scrypt: fix compiler warnings on ARM

* cipher/scrypt.c (_scryptBlockMix): Cast X to 'u32 *' through 'void *'.
--

Patch fixes 'cast increases required alignment' warnings seen on GCC:

scrypt.c: In function '_scryptBlockMix':
scrypt.c:145:22: warning: cast increases required alignment of target type [-Wcast-align]
       _salsa20_core ((u32*)X, (u32*)X, 8);
                      ^
scrypt.c:145:31: warning: cast increases required alignment of target type [-Wcast-align]
       _salsa20_core ((u32*)X, (u32*)X, 8);
                               ^

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  secmem: fix compiler warnings on ARM
Jussi Kivilinna [Tue, 23 Dec 2014 11:31:09 +0000 (13:31 +0200)]
secmem: fix compiler warnings on ARM

* src/secmem.c (ADDR_TO_BLOCK, mb_get_next, mb_get_new): Cast pointer
from 'char *' to 'memblock_t *' through 'void *'.
(MB_WIPE_OUT): Remove unneeded cast to 'memblock_t *'.
--

Patch fixes 'cast increases required alignment' warnings seen on GCC:

secmem.c: In function 'mb_get_next':
secmem.c:140:13: warning: cast increases required alignment of target type [-Wcast-align]
   mb_next = (memblock_t *) ((char *) mb + BLOCK_HEAD_SIZE + mb->size);
             ^
secmem.c: In function 'mb_get_new':
secmem.c:208:17: warning: cast increases required alignment of target type [-Wcast-align]
      mb_split = (memblock_t *) (((char *) mb) + BLOCK_HEAD_SIZE + size);
                 ^
secmem.c: In function '_gcry_secmem_free_internal':
secmem.c:101:3: warning: cast increases required alignment of target type [-Wcast-align]
   (memblock_t *) ((char *) addr - BLOCK_HEAD_SIZE)
   ^
secmem.c:603:8: note: in expansion of macro 'ADDR_TO_BLOCK'
   mb = ADDR_TO_BLOCK (a);
        ^
In file included from secmem.c:40:0:
secmem.c:609:16: warning: cast increases required alignment of target type [-Wcast-align]
   wipememory2 ((memblock_t *) ((char *) mb + BLOCK_HEAD_SIZE), (byte), size);
                ^
g10lib.h:309:54: note: in definition of macro 'wipememory2'
               volatile char *_vptr=(volatile char *)(_ptr); \
                                                      ^
secmem.c:611:3: note: in expansion of macro 'MB_WIPE_OUT'
   MB_WIPE_OUT (0xff);
   ^
secmem.c:609:16: warning: cast increases required alignment of target type [-Wcast-align]
   wipememory2 ((memblock_t *) ((char *) mb + BLOCK_HEAD_SIZE), (byte), size);
                ^
g10lib.h:309:54: note: in definition of macro 'wipememory2'
               volatile char *_vptr=(volatile char *)(_ptr); \
                                                      ^
secmem.c:612:3: note: in expansion of macro 'MB_WIPE_OUT'
   MB_WIPE_OUT (0xaa);
   ^
secmem.c:609:16: warning: cast increases required alignment of target type [-Wcast-align]
   wipememory2 ((memblock_t *) ((char *) mb + BLOCK_HEAD_SIZE), (byte), size);
                ^
g10lib.h:309:54: note: in definition of macro 'wipememory2'
               volatile char *_vptr=(volatile char *)(_ptr); \
                                                      ^
secmem.c:613:3: note: in expansion of macro 'MB_WIPE_OUT'
   MB_WIPE_OUT (0x55);
   ^
secmem.c:609:16: warning: cast increases required alignment of target type [-Wcast-align]
   wipememory2 ((memblock_t *) ((char *) mb + BLOCK_HEAD_SIZE), (byte), size);
                ^
g10lib.h:309:54: note: in definition of macro 'wipememory2'
               volatile char *_vptr=(volatile char *)(_ptr); \
                                                      ^
secmem.c:614:3: note: in expansion of macro 'MB_WIPE_OUT'
   MB_WIPE_OUT (0x00);
   ^
secmem.c: In function '_gcry_secmem_realloc':
secmem.c:644:8: warning: cast increases required alignment of target type [-Wcast-align]
   mb = (memblock_t *) ((char *) p - ((size_t) &((memblock_t *) 0)->aligned.c));
        ^

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  hash: fix compiler warning on ARM
Jussi Kivilinna [Tue, 23 Dec 2014 11:01:33 +0000 (13:01 +0200)]
hash: fix compiler warning on ARM

* cipher/md.c (md_open, md_copy): Cast 'char *' to ctx through
'void *'.
* cipher/md4.c (md4_final): Use buf_put_* helper instead of
converting 'char *' to 'u32 *'.
* cipher/md5.c (md5_final): Ditto.
* cipher/rmd160.c (_gcry_rmd160_mixblock, rmd160_final): Ditto.
* cipher/sha1.c (sha1_final): Ditto.
* cipher/sha256.c (sha256_final): Ditto.
* cipher/sha512.c (sha512_final): Ditto.
* cipher/tiger.c (tiger_final): Ditto.
--

Patch fixes 'cast increases required alignment' warnings seen on GCC:

md.c: In function 'md_open':
md.c:318:23: warning: cast increases required alignment of target type [-Wcast-align]
       hd->ctx = ctx = (struct gcry_md_context *) ((char *) hd + n);
                       ^
md.c: In function 'md_copy':
md.c:491:22: warning: cast increases required alignment of target type [-Wcast-align]
       bhd->ctx = b = (struct gcry_md_context *) ((char *) bhd + n);
                      ^
md4.c: In function 'md4_final':
md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
 #define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
                    ^
md4.c:259:3: note: in expansion of macro 'X'
   X(A);
   ^
md4.c:258:20: warning: cast increases required alignment of target type [-Wcast-align]
 #define X(a) do { *(u32*)p = le_bswap32((*hd).a) ; p += 4; } while(0)
                    ^
md4.c:260:3: note: in expansion of macro 'X'
   X(B);
   ^
[removed the rest]

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
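
A byte-wise store helper of the kind mentioned above might look like the following sketch. The names put_le32 and write_digest_le are hypothetical and are not the library's buf_put_* helpers; the point is that writing single bytes never requires the output pointer to be 32-bit aligned.

```c
#include <stdint.h>

/* Store V at P in little-endian order, one byte at a time. */
static void
put_le32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char)(v);
  p[1] = (unsigned char)(v >> 8);
  p[2] = (unsigned char)(v >> 16);
  p[3] = (unsigned char)(v >> 24);
}

/* Write four 32-bit state words to a 16-byte digest buffer.  This
   replaces the '*(u32*)p = ...; p += 4' style of macro with stores
   that work for any alignment of OUT. */
static void
write_digest_le (unsigned char *out, const uint32_t state[4])
{
  int i;

  for (i = 0; i < 4; i++)
    put_le32 (out + 4 * i, state[i]);
}
```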
4 years ago  rijndael: fix compiler warnings on ARM
Jussi Kivilinna [Tue, 23 Dec 2014 10:13:50 +0000 (12:13 +0200)]
rijndael: fix compiler warnings on ARM

* cipher/rijndael-internal.h (RIJNDAEL_context_s): Add u32 variants of
keyschedule arrays to unions u1 and u2.
(keyschedenc32, keyscheddec32): New.
* cipher/rijndael.c (u32_a_t): Remove.
(do_setkey): Add and use tkk[].data32, k_u32, tk_u32 and W_u32; Remove
casting byte arrays to u32_a_t.
(prepare_decryption, do_encrypt_fn, do_decrypt_fn): Use keyschedenc32
and keyscheddec32; Remove casting byte arrays to u32_a_t.
--

Patch fixes 'cast increases required alignment' compiler warnings that GCC was showing:

rijndael.c: In function 'do_setkey':
rijndael.c:310:13: warning: cast increases required alignment of target type [-Wcast-align]
           *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
             ^
rijndael.c:310:34: warning: cast increases required alignment of target type [-Wcast-align]
           *((u32_a_t*)tk[j]) = *((u32_a_t*)k[j]);
[removed the rest]

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
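
The idea of adding u32 variants to a union, instead of casting byte arrays, can be sketched as below. The type key_block_t and the function copy_block_words are invented for this illustration and do not reproduce the layout of RIJNDAEL_context_s.

```c
#include <stdint.h>

/* The same 16 bytes, viewed either as a 4x4 byte matrix or as four
   32-bit words.  The word member gives the whole union the alignment
   of uint32_t, so no pointer cast is needed for word access. */
typedef union
{
  unsigned char bytes[4][4];
  uint32_t words[4];
} key_block_t;

/* Copy a block word by word through the union member. */
static void
copy_block_words (key_block_t *dst, const key_block_t *src)
{
  int j;

  for (j = 0; j < 4; j++)
    dst->words[j] = src->words[j];
}
```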
4 years ago  Poly1305-AEAD: updated implementation to match draft-irtf-cfrg-chacha20-poly1305-03
Jussi Kivilinna [Sun, 21 Dec 2014 15:36:59 +0000 (17:36 +0200)]
Poly1305-AEAD: updated implementation to match draft-irtf-cfrg-chacha20-poly1305-03

* cipher/cipher-internal.h (gcry_cipher_handle): Use separate byte
counters for AAD and data in Poly1305.
* cipher/cipher-poly1305.c (poly1305_fill_bytecount): Remove.
(poly1305_fill_bytecounts, poly1305_do_padding): New.
(poly1305_aad_finish): Fill padding to Poly1305 and do not fill AAD
length.
(_gcry_cipher_poly1305_authenticate, _gcry_cipher_poly1305_encrypt)
(_gcry_cipher_poly1305_decrypt): Update AAD and data length separately.
(_gcry_cipher_poly1305_tag): Fill padding and bytecounts to Poly1305.
(_gcry_cipher_poly1305_setkey, _gcry_cipher_poly1305_setiv): Reset
AAD and data byte counts; only allow 96-bit IV.
* cipher/cipher.c (_gcry_cipher_open_internal): Limit Poly1305-AEAD to
ChaCha20 cipher.
* tests/basic.c (_check_poly1305_cipher): Update test-vectors.
(check_ciphers): Limit Poly1305-AEAD checks to ChaCha20.
* tests/bench-slope.c (cipher_bench_one): Ditto.
--

The latest Internet-Draft version of "ChaCha20 and Poly1305 for IETF protocols"
has added additional padding to Poly1305-AEAD and limited the supported IV size
to 96 bits:
 https://www.ietf.org/rfcdiff?url1=draft-nir-cfrg-chacha20-poly1305-03&difftype=--html&submit=Go!&url2=draft-irtf-cfrg-chacha20-poly1305-03

Patch updates the Poly1305-AEAD implementation to match these changes and
limits Poly1305-AEAD to ChaCha20 only.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
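
The padding and the separate byte counts can be illustrated by building the byte string that is fed to Poly1305. This is a sketch of the layout as I understand it from the draft (AAD, zero padding to 16 bytes, ciphertext, zero padding to 16 bytes, then both lengths as 64-bit little-endian values); the function names are hypothetical and this is not the code from cipher-poly1305.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of zero bytes needed to reach a multiple of 16. */
static size_t
pad16_len (size_t n)
{
  return (16 - (n % 16)) % 16;
}

static void
put_le64 (unsigned char *p, uint64_t v)
{
  int i;

  for (i = 0; i < 8; i++)
    p[i] = (unsigned char)(v >> (8 * i));
}

/* Build AAD | pad | CT | pad | len(AAD) | len(CT) in OUT.
   Returns the number of bytes written, or 0 if OUT is too small. */
static size_t
build_mac_input (unsigned char *out, size_t outsize,
                 const unsigned char *aad, size_t aadlen,
                 const unsigned char *ct, size_t ctlen)
{
  size_t need = aadlen + pad16_len (aadlen)
                + ctlen + pad16_len (ctlen) + 16;
  size_t pos;

  if (outsize < need)
    return 0;

  memset (out, 0, need);              /* padding bytes stay zero */
  if (aadlen)
    memcpy (out, aad, aadlen);
  pos = aadlen + pad16_len (aadlen);
  if (ctlen)
    memcpy (out + pos, ct, ctlen);
  pos += ctlen + pad16_len (ctlen);
  put_le64 (out + pos, aadlen);       /* separate AAD byte count */
  put_le64 (out + pos + 8, ctlen);    /* separate data byte count */
  return pos + 16;
}
```

A streaming implementation would feed these pieces to the MAC one after another instead of assembling a buffer; the buffer is used here only to make the layout visible.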
4 years ago  chacha20: allow setting counter for stream random access
Jussi Kivilinna [Sun, 21 Dec 2014 15:36:59 +0000 (17:36 +0200)]
chacha20: allow setting counter for stream random access

* cipher/chacha20.c (CHACHA20_CTR_SIZE): New.
(chacha20_ivsetup): Add setup for full counter.
(chacha20_setiv): Allow ivlen == CHACHA20_CTR_SIZE.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
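
Random access into the key stream comes down to translating a byte offset into a block counter. The sketch below relies only on the fact that ChaCha20 produces the key stream in 64-byte blocks; the type stream_pos_t and the function seek_position are hypothetical, and how the counter value is encoded into the buffer passed to the setiv function is not shown in this log and is left out here.

```c
#include <stdint.h>

#define CHACHA20_BLOCK_BYTES 64  /* ChaCha20 key stream block size */

typedef struct
{
  uint64_t block;     /* value for the block counter */
  unsigned int skip;  /* key stream bytes to discard in that block */
} stream_pos_t;

/* Map a byte offset in the stream to (block counter, bytes to skip). */
static stream_pos_t
seek_position (uint64_t byte_offset)
{
  stream_pos_t pos;

  pos.block = byte_offset / CHACHA20_BLOCK_BYTES;
  pos.skip = (unsigned int)(byte_offset % CHACHA20_BLOCK_BYTES);
  return pos;
}
```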
4 years ago  gcm: do not pass extra key pointer for setupM/fillM
Jussi Kivilinna [Tue, 23 Dec 2014 10:35:37 +0000 (12:35 +0200)]
gcm: do not pass extra key pointer for setupM/fillM

* cipher/cipher-gcm-intel-pclmul.c
(_gcry_ghash_setup_intel_pclmul): Remove 'h' parameter.
* cipher/cipher-gcm.c (_gcry_ghash_setup_intel_pclmul): Ditto.
(fillM): Get 'h' pointer from 'c'.
(setupM): Remove 'h' parameter.
(_gcry_cipher_gcm_setkey): Only pass 'c' to setupM.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  rijndael: use more compact look-up tables and add table prefetching
Jussi Kivilinna [Tue, 23 Dec 2014 10:35:28 +0000 (12:35 +0200)]
rijndael: use more compact look-up tables and add table prefetching

* cipher/rijndael-internal.h (rijndael_prefetchfn_t): New.
(RIJNDAEL_context): Add 'prefetch_enc_fn' and 'prefetch_dec_fn'.
* cipher/rijndael-tables.h (S, T1, T2, T3, T4, T5, T6, T7, T8, S5, U1)
(U2, U3, U4): Remove.
(encT, dec_tables, decT, inv_sbox): Add.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Add parameter for passing table pointer
to assembly implementation.
(prefetch_table, prefetch_enc, prefetch_dec): New.
(do_setkey): Setup context prefetch functions depending on selected
rijndael implementation; Use new tables for key setup.
(prepare_decryption): Use new tables for decryption key setup.
(do_encrypt_aligned): Rename to...
(do_encrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input and unroll rounds loop by two.
(do_encrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_encrypt, _gcry_aes_cfb_enc, _gcry_aes_cbc_enc)
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec): Prefetch encryption tables
before encryption.
(do_decrypt_aligned): Rename to...
(do_decrypt_fn): ... to this, change to use new compact tables,
make it handle unaligned input and unroll rounds loop by two.
(do_decrypt): Remove handling of unaligned input/output; pass table
pointer to assembly implementations.
(rijndael_decrypt, _gcry_aes_cbc_dec): Prefetch decryption tables
before decryption.
* cipher/rijndael-amd64.S: Use 1+1.25 KiB tables for
encryption+decryption; remove tables from assembly file.
* cipher/rijndael-arm.S: Ditto.
--

Patch replaces 4+4.25 KiB look-up tables in generic implementation and
8+8 KiB look-up tables in AMD64 implementation and 2+2 KiB look-up tables in
ARM implementation with 1+1.25 KiB look-up tables, and adds prefetching of
look-up tables.
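
Table prefetching can be sketched as reading one byte from every cache line of the table before it is used. The 32-byte stride is an assumption for this example, and the returned count exists only to make the sketch easy to check; this is not the prefetch_table function from cipher/rijndael.c.

```c
#include <stddef.h>

#define PREFETCH_STRIDE 32  /* assumed lower bound on cache line size */

/* Read one byte per stride so that the whole table is pulled into the
   cache.  The volatile qualifier keeps the compiler from dropping the
   otherwise unused reads.  Returns the number of reads performed. */
static size_t
prefetch_table (const volatile unsigned char *tab, size_t len)
{
  size_t i, touched = 0;

  for (i = 0; i < len; i += PREFETCH_STRIDE)
    {
      (void)tab[i];
      touched++;
    }
  if (len)
    {
      (void)tab[len - 1];  /* covers a table not aligned to the stride */
      touched++;
    }
  return touched;
}
```

Smaller tables make this cheap: a 1.25 KiB table needs about 40 reads at this stride.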

AMD64 assembly is slower than before because of additional rotation
instructions. The generic C implementation is now better optimized and
actually faster than before.
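
The trade of table space for rotation instructions can be sketched as follows: instead of four 1 KiB tables that are byte rotations of each other, one table is kept and the other three values are derived on the fly. The function names are hypothetical, and the rotation direction depends on the byte order convention of the table, so this shows only the general technique.

```c
#include <stdint.h>

/* Rotate left by N bits; N must be in the range 1..31. */
static uint32_t
rol32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> (32 - n));
}

/* Combine four byte-indexed lookups using a single 256-entry table
   plus rotations, in place of four separate tables. */
static uint32_t
column_lookup (const uint32_t tab[256],
               uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  return tab[b0]
         ^ rol32 (tab[b1], 8)
         ^ rol32 (tab[b2], 16)
         ^ rol32 (tab[b3], 24);
}
```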

Benchmark results on Intel i5-4570 (turbo off) (64-bit, AMD64 assembly):

tests/bench-slope --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes

Old:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      3.10 ns/B     307.5 MiB/s      9.92 c/B
        ECB dec |      3.15 ns/B     302.5 MiB/s     10.09 c/B
        CBC enc |      3.46 ns/B     275.5 MiB/s     11.08 c/B
        CBC dec |      3.19 ns/B     299.2 MiB/s     10.20 c/B
        CFB enc |      3.48 ns/B     274.4 MiB/s     11.12 c/B
        CFB dec |      3.23 ns/B     294.8 MiB/s     10.35 c/B
        OFB enc |      3.29 ns/B     290.2 MiB/s     10.52 c/B
        OFB dec |      3.31 ns/B     288.3 MiB/s     10.58 c/B
        CTR enc |      3.64 ns/B     261.7 MiB/s     11.66 c/B
        CTR dec |      3.65 ns/B     261.6 MiB/s     11.67 c/B

New:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      4.21 ns/B     226.7 MiB/s     13.46 c/B
        ECB dec |      4.27 ns/B     223.2 MiB/s     13.67 c/B
        CBC enc |      4.15 ns/B     229.8 MiB/s     13.28 c/B
        CBC dec |      3.85 ns/B     247.8 MiB/s     12.31 c/B
        CFB enc |      4.16 ns/B     229.1 MiB/s     13.32 c/B
        CFB dec |      3.88 ns/B     245.9 MiB/s     12.41 c/B
        OFB enc |      4.38 ns/B     217.8 MiB/s     14.01 c/B
        OFB dec |      4.36 ns/B     218.6 MiB/s     13.96 c/B
        CTR enc |      4.30 ns/B     221.6 MiB/s     13.77 c/B
        CTR dec |      4.30 ns/B     221.7 MiB/s     13.76 c/B

Benchmark on Intel i5-4570 (turbo off) (32-bit mingw, generic C):

tests/bench-slope.exe --disable-hwf intel-aesni --cpu-mhz 3200 cipher aes

Old:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      6.03 ns/B     158.2 MiB/s     19.29 c/B
        ECB dec |      5.81 ns/B     164.1 MiB/s     18.60 c/B
        CBC enc |      6.22 ns/B     153.4 MiB/s     19.90 c/B
        CBC dec |      5.91 ns/B     161.3 MiB/s     18.92 c/B
        CFB enc |      6.25 ns/B     152.7 MiB/s     19.99 c/B
        CFB dec |      6.24 ns/B     152.8 MiB/s     19.97 c/B
        OFB enc |      6.33 ns/B     150.6 MiB/s     20.27 c/B
        OFB dec |      6.33 ns/B     150.7 MiB/s     20.25 c/B
        CTR enc |      6.28 ns/B     152.0 MiB/s     20.08 c/B
        CTR dec |      6.28 ns/B     151.7 MiB/s     20.11 c/B

New:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |      5.02 ns/B     190.0 MiB/s     16.06 c/B
        ECB dec |      5.33 ns/B     178.8 MiB/s     17.07 c/B
        CBC enc |      4.64 ns/B     205.4 MiB/s     14.86 c/B
        CBC dec |      4.95 ns/B     192.7 MiB/s     15.84 c/B
        CFB enc |      4.75 ns/B     200.7 MiB/s     15.20 c/B
        CFB dec |      4.74 ns/B     201.1 MiB/s     15.18 c/B
        OFB enc |      5.29 ns/B     180.3 MiB/s     16.93 c/B
        OFB dec |      5.29 ns/B     180.3 MiB/s     16.93 c/B
        CTR enc |      4.77 ns/B     200.0 MiB/s     15.26 c/B
        CTR dec |      4.77 ns/B     199.8 MiB/s     15.27 c/B

Benchmark on Cortex-A8 (ARM assembly):

tests/bench-slope --cpu-mhz 1008 cipher aes

Old:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     21.84 ns/B     43.66 MiB/s     22.02 c/B
        ECB dec |     22.35 ns/B     42.67 MiB/s     22.53 c/B
        CBC enc |     22.97 ns/B     41.53 MiB/s     23.15 c/B
        CBC dec |     23.48 ns/B     40.61 MiB/s     23.67 c/B
        CFB enc |     22.72 ns/B     41.97 MiB/s     22.90 c/B
        CFB dec |     23.41 ns/B     40.74 MiB/s     23.59 c/B
        OFB enc |     23.65 ns/B     40.32 MiB/s     23.84 c/B
        OFB dec |     23.67 ns/B     40.29 MiB/s     23.86 c/B
        CTR enc |     23.24 ns/B     41.03 MiB/s     23.43 c/B
        CTR dec |     23.23 ns/B     41.05 MiB/s     23.42 c/B

New:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     26.03 ns/B     36.64 MiB/s     26.24 c/B
        ECB dec |     26.97 ns/B     35.36 MiB/s     27.18 c/B
        CBC enc |     23.21 ns/B     41.09 MiB/s     23.39 c/B
        CBC dec |     23.36 ns/B     40.83 MiB/s     23.54 c/B
        CFB enc |     23.02 ns/B     41.42 MiB/s     23.21 c/B
        CFB dec |     23.67 ns/B     40.28 MiB/s     23.86 c/B
        OFB enc |     27.86 ns/B     34.24 MiB/s     28.08 c/B
        OFB dec |     27.87 ns/B     34.21 MiB/s     28.10 c/B
        CTR enc |     23.47 ns/B     40.63 MiB/s     23.66 c/B
        CTR dec |     23.49 ns/B     40.61 MiB/s     23.67 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  build: Add configure option --disable-doc.
Werner Koch [Mon, 15 Dec 2014 11:05:32 +0000 (12:05 +0100)]
build: Add configure option --disable-doc.

* Makefile.am (AUTOMAKE_OPTIONS): Remove.
(doc) [!BUILD_DOC]: Do not recurse into the dir.
* configure.ac (AM_INIT_AUTOMAKE): Add option formerly in Makefile.am.
(BUILD_DOC): Add new am_conditional.

4 years ago  rijndael: further optimizations for AES-NI accelerated CBC and CFB bulk modes
Jussi Kivilinna [Sat, 6 Dec 2014 13:09:13 +0000 (15:09 +0200)]
rijndael: further optimizations for AES-NI accelerated CBC and CFB bulk modes

* cipher/rijndael-aesni.c (do_aesni_enc, do_aesni_dec): Pass
input/output through SSE register XMM0.
(do_aesni_cfb): Remove.
(_gcry_aes_aesni_encrypt, _gcry_aes_aesni_decrypt): Add loading/storing
input/output to/from XMM0.
(_gcry_aes_aesni_cfb_enc, _gcry_aes_aesni_cbc_enc)
(_gcry_aes_aesni_cfb_dec): Update to use renewed 'do_aesni_enc' and
move IV loading/storing outside loop.
(_gcry_aes_aesni_cbc_dec): Update to use renewed 'do_aesni_dec'.
--

CBC encryption speed is improved ~16% on Intel Haswell and CFB encryption ~8%.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
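
Moving the IV load and store out of the loop can be shown with a plain C sketch of CBC encryption. The block function below is a toy stand-in (it is not AES and not secure), and all names are hypothetical; the real code keeps the chaining value in an SSE register rather than in a local buffer.

```c
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy stand-in for a block cipher, for illustration only. */
static void
toy_encrypt_block (unsigned char b[BLK], unsigned char key)
{
  int i;

  for (i = 0; i < BLK; i++)
    b[i] = (unsigned char)((b[i] ^ key) + 1);
}

/* CBC encryption of NBLOCKS blocks.  The IV is read once before the
   loop and written back once after it; inside the loop only the local
   chaining value is touched. */
static void
cbc_encrypt (unsigned char *out, const unsigned char *in, size_t nblocks,
             unsigned char iv[BLK], unsigned char key)
{
  unsigned char chain[BLK];
  size_t n;
  int i;

  memcpy (chain, iv, BLK);                 /* load IV once */
  for (n = 0; n < nblocks; n++)
    {
      for (i = 0; i < BLK; i++)
        chain[i] ^= in[n * BLK + i];
      toy_encrypt_block (chain, key);
      memcpy (out + n * BLK, chain, BLK);
    }
  memcpy (iv, chain, BLK);                 /* store IV once */
}
```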
4 years ago  GCM: move Intel PCLMUL accelerated implementation to separate file
Jussi Kivilinna [Sat, 6 Dec 2014 08:38:36 +0000 (10:38 +0200)]
GCM: move Intel PCLMUL accelerated implementation to separate file

* cipher/Makefile.am: Add 'cipher-gcm-intel-pclmul.c'.
* cipher/cipher-gcm-intel-pclmul.c: New.
* cipher/cipher-gcm.c [GCM_USE_INTEL_PCLMUL]
(_gcry_ghash_setup_intel_pclmul, _gcry_ghash_intel_pclmul): New
prototypes.
[GCM_USE_INTEL_PCLMUL] (gfmul_pclmul, gfmul_pclmul_aggr4): Move
to 'cipher-gcm-intel-pclmul.c'.
(ghash): Rename to...
(ghash_internal): ...this and move GCM_USE_INTEL_PCLMUL part to new
function in 'cipher-gcm-intel-pclmul.c'.
(setupM): Move GCM_USE_INTEL_PCLMUL part to new function in
'cipher-gcm-intel-pclmul.c'; Add selection of ghash function based
on available HW acceleration.
(do_ghash_buf): Change use of 'ghash' to 'c->u_mode.gcm.ghash_fn'.
* cipher/internal.h (ghash_fn_t): New.
(gcry_cipher_handle): Remove 'use_intel_pclmul'; Add 'ghash_fn'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  rijndael: split Padlock part to separate file
Jussi Kivilinna [Mon, 1 Dec 2014 19:10:19 +0000 (21:10 +0200)]
rijndael: split Padlock part to separate file

* cipher/Makefile.am: Add 'rijndael-padlock.c'.
* cipher/rijndael-padlock.c: New.
* cipher/rijndael.c (do_padlock, do_padlock_encrypt)
(do_padlock_decrypt): Move to 'rijndael-padlock.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-padlock.lo'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  rijndael: refactor to reduce number of #ifdefs and branches
Jussi Kivilinna [Mon, 1 Dec 2014 19:10:19 +0000 (21:10 +0200)]
rijndael: refactor to reduce number of #ifdefs and branches

* cipher/rijndael-aesni.c (_gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt): Make these return the stack burn depth.
* cipher/rijndael-amd64.S (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): Ditto.
* cipher/rijndael-arm.S (_gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Ditto.
* cipher/rijndael-internal.h (RIJNDAEL_context_s)
(rijndael_cryptfn_t): New.
(RIJNDAEL_context): New members 'encrypt_fn' and 'decrypt_fn'.
* cipher/rijndael.c (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block, _gcry_aes_aesni_encrypt)
(_gcry_aes_aesni_decrypt, _gcry_aes_arm_encrypt_block)
(_gcry_aes_arm_decrypt_block): Change prototypes.
(do_padlock_encrypt, do_padlock_decrypt): New.
(do_setkey): Separate key-length to rounds conversion from
HW features check; Add selection for ctx->encrypt_fn and
ctx->decrypt_fn.
(do_encrypt_aligned, do_decrypt_aligned): Move inside
'[!USE_AMD64_ASM && !USE_ARM_ASM]'; Move USE_AMD64_ASM and
USE_ARM_ASM to...
(do_encrypt, do_decrypt): ...here; Return stack depth; Remove second
temporary buffer from non-aligned input/output case.
(do_padlock): Move decrypt_flag to last argument; Return stack depth.
(rijndael_encrypt): Remove #ifdefs, just call ctx->encrypt_fn.
(_gcry_aes_cfb_enc, _gcry_aes_cbc_enc): Remove USE_PADLOCK; Call
ctx->encrypt_fn in place of do_encrypt/do_encrypt_aligned.
(_gcry_aes_ctr_enc): Call ctx->encrypt_fn in place of
do_encrypt_aligned; Make tmp buffer 16-byte aligned and wipe buffer
after use.
(rijndael_decrypt): Remove #ifdefs, just call ctx->decrypt_fn.
(_gcry_aes_cfb_dec): Remove USE_PADLOCK; Call ctx->decrypt_fn in place
of do_decrypt/do_decrypt_aligned.
(_gcry_aes_cbc_dec): Ditto; Make savebuf buffer 16-byte aligned.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
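
Selecting the implementation once at key setup and then calling through a function pointer can be sketched as below. Everything here is hypothetical: the type names, the toy "encryption", and the burn depth values are invented, and the actual signature of rijndael_cryptfn_t is not shown in this log.

```c
typedef struct cipher_ctx cipher_ctx_t;

/* Block function; the return value is the stack depth the caller
   should wipe afterwards. */
typedef unsigned int (*crypt_fn_t) (const cipher_ctx_t *ctx,
                                    unsigned char *out,
                                    const unsigned char *in);

struct cipher_ctx
{
  crypt_fn_t encrypt_fn;
  unsigned char key;
};

static unsigned int
encrypt_generic (const cipher_ctx_t *ctx, unsigned char *out,
                 const unsigned char *in)
{
  out[0] = in[0] ^ ctx->key;   /* toy transform */
  return 64;                   /* pretend it used 64 bytes of stack */
}

static unsigned int
encrypt_accel (const cipher_ctx_t *ctx, unsigned char *out,
               const unsigned char *in)
{
  out[0] = in[0] ^ ctx->key;   /* same result, "accelerated" path */
  return 0;                    /* nothing sensitive left on the stack */
}

/* The hardware check happens once, here, instead of in every caller. */
static void
ctx_setkey (cipher_ctx_t *ctx, unsigned char key, int have_accel)
{
  ctx->key = key;
  ctx->encrypt_fn = have_accel ? encrypt_accel : encrypt_generic;
}

/* Callers need no #ifdefs or feature branches. */
static unsigned int
ctx_encrypt (const cipher_ctx_t *ctx, unsigned char *out,
             const unsigned char *in)
{
  return ctx->encrypt_fn (ctx, out, in);
}
```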
4 years ago  rijndael: move AES-NI blocks before Padlock
Jussi Kivilinna [Mon, 1 Dec 2014 19:10:19 +0000 (21:10 +0200)]
rijndael: move AES-NI blocks before Padlock

* cipher/rijndael.c (do_setkey, rijndael_encrypt, _gcry_aes_cfb_enc)
(rijndael_decrypt, _gcry_aes_cfb_dec): Move USE_AESNI before
USE_PADLOCK.
(check_decryption_preparation) [USE_PADLOCK]: Move to...
(prepare_decryption) [USE_PADLOCK]: ...here.
--

Make order of AES-NI and Padlock #ifdefs consistent.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years ago  rijndael: split AES-NI functions to separate file
Jussi Kivilinna [Mon, 1 Dec 2014 19:10:19 +0000 (21:10 +0200)]
rijndael: split AES-NI functions to separate file

* cipher/Makefile.in: Add 'rijndael-aesni.c'.
* cipher/rijndael-aesni.c: New.
* cipher/rijndael-internal.h: New.
* cipher/rijndael.c (MAXKC, MAXROUNDS, BLOCKSIZE, ATTR_ALIGNED_16)
(USE_AMD64_ASM, USE_ARM_ASM, USE_PADLOCK, USE_AESNI, RIJNDAEL_context)
(keyschenc, keyschdec, padlockkey): Move to 'rijndael-internal.h'.
(u128_s, aesni_prepare, aesni_cleanup, aesni_cleanup_2_6)
(aesni_do_setkey, do_aesni_enc, do_aesni_dec, do_aesni_enc_vec4)
(do_aesni_dec_vec4, do_aesni_cfb, do_aesni_ctr, do_aesni_ctr_4): Move
to 'rijndael-aesni.c'.
(prepare_decryption, rijndael_encrypt, _gcry_aes_cfb_enc)
(_gcry_aes_cbc_enc, _gcry_aes_ctr_enc, rijndael_decrypt)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_dec) [USE_AESNI]: Move to functions
in 'rijndael-aesni.c'.
* configure.ac [mpi_cpu_arch=x86]: Add 'rijndael-aesni.lo'.
--

Clean up rijndael.c before new hardware acceleration support gets added.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoRemove duplicated prototypes.
Werner Koch [Mon, 24 Nov 2014 11:28:33 +0000 (12:28 +0100)]
Remove duplicated prototypes.

* src/gcrypt-int.h (_gcry_mpi_ec_new, _gcry_mpi_ec_set_mpi)
(gcry_mpi_ec_set_point): Remove.
--

Those used gpg_error_t instead of gpg_err_code_t and the picky AIX
compiler treats this as a severe error.

Signed-off-by: Werner Koch <wk@gnupg.org>
4 years agotests: Add a prime mode to benchmark.
Werner Koch [Tue, 14 Oct 2014 19:29:33 +0000 (21:29 +0200)]
tests: Add a prime mode to benchmark.

* tests/benchmark.c (progress_cb): Add a single char mode.
(prime_bench): New.
(main): Add a "prime" mode.  Factor with_progress out to file scope.

Signed-off-by: Werner Koch <wk@gnupg.org>
4 years agoecc: Improve Montgomery curve implementation.
NIIBE Yutaka [Wed, 19 Nov 2014 06:48:12 +0000 (15:48 +0900)]
ecc: Improve Montgomery curve implementation.

* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Support
MPI_EC_MONTGOMERY.
* cipher/ecc.c (test_ecdh_only_keys): New.
(nist_generate_key): Call test_ecdh_only_keys for MPI_EC_MONTGOMERY.
(check_secret_key): Handle Montgomery curve of x-coordinate only.
* mpi/ec.c (_gcry_mpi_ec_mul_point): Resize points before the loop.
Simplify, using pointers of Q1, Q2, PRD, and SUM.
--

4 years agoDisable NEON for CPUs that are known to have broken NEON implementation
Jussi Kivilinna [Sun, 2 Nov 2014 15:45:35 +0000 (17:45 +0200)]
Disable NEON for CPUs that are known to have broken NEON implementation

* src/hwf-arm.c (detect_arm_proc_cpuinfo): Add parsing for CPU version
information and check if CPU is known to have broken NEON
implementation.
(_gcry_hwf_detect_arm): Filter out broken HW features.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd ARM/NEON implementation of Poly1305
Jussi Kivilinna [Sun, 2 Nov 2014 14:01:11 +0000 (16:01 +0200)]
Add ARM/NEON implementation of Poly1305

* cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_NEON)
(POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT): New.
* cipher/poly1305.c [POLY1305_USE_NEON]
(_gcry_poly1305_armv7_neon_init_ext)
(_gcry_poly1305_armv7_neon_finish_ext)
(_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
if HWF_ARM_NEON set.
* configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
--

Add Andrew Moon's public domain NEON implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt

Benchmark on Cortex-A8 (--cpu-mhz 1008):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     12.34 ns/B     77.27 MiB/s     12.44 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |      2.12 ns/B     450.7 MiB/s      2.13 c/B
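
The three columns of these tables are related by simple arithmetic.
The helpers below are a sketch of that relation (the names are
hypothetical and not taken from tests/bench-slope.c), assuming the
--cpu-mhz value is the clock rate the cycles column is based on.
Because the tables are printed from unrounded timings, recomputing from
the rounded ns/B figure can differ in the last digit.

```c
#include <assert.h>

/* cycles/byte = (nanoseconds per byte) * (clock rate in GHz).  */
double
cycles_per_byte (double ns_per_byte, double cpu_mhz)
{
  assert (ns_per_byte > 0.0 && cpu_mhz > 0.0);
  return ns_per_byte * cpu_mhz / 1000.0;
}

/* MiB/s = bytes per second divided by 2^20.  */
double
mebibytes_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Old-versus-new ratio as used in benchmark summaries.  */
double
speedup (double old_ns_per_byte, double new_ns_per_byte)
{
  assert (new_ns_per_byte > 0.0);
  return old_ns_per_byte / new_ns_per_byte;
}
```

For the "Old" row, 12.34 ns/B at 1008 MHz gives about 12.44 c/B and
about 77.3 MiB/s; old versus new is roughly a 5.8x speed-up.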

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20: add ARMv7/NEON implementation
Jussi Kivilinna [Wed, 6 Aug 2014 17:05:16 +0000 (20:05 +0300)]
chacha20: add ARMv7/NEON implementation

* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
* cipher/chacha20-armv7-neon.S: New.
* cipher/chacha20.c (USE_NEON): New.
[USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
(chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
HWF_ARM_NEON flag set.
(selftest): Self-test encrypting buffer byte by byte.
* configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
--

Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt

Benchmark on Cortex-A8 (--cpu-mhz 1008):

Old:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     13.45 ns/B     70.92 MiB/s     13.56 c/B
     STREAM dec |     13.45 ns/B     70.90 MiB/s     13.56 c/B

New:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      6.20 ns/B     153.9 MiB/s      6.25 c/B
     STREAM dec |      6.20 ns/B     153.9 MiB/s      6.25 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoRegister DCO for Markus Teich
Werner Koch [Thu, 9 Oct 2014 06:31:35 +0000 (08:31 +0200)]
Register DCO for Markus Teich

--

4 years agompi: Add gcry_mpi_ec_sub.
Markus Teich [Tue, 7 Oct 2014 16:24:27 +0000 (18:24 +0200)]
mpi: Add gcry_mpi_ec_sub.

* NEWS (gcry_mpi_ec_sub): New.
* doc/gcrypt.texi (gcry_mpi_ec_sub): New.
* mpi/ec.c (_gcry_mpi_ec_sub, sub_points_edwards): New.
(sub_points_montgomery, sub_points_weierstrass): New stubs.
* src/gcrypt-int.h (_gcry_mpi_ec_sub): New.
* src/gcrypt.h.in (gcry_mpi_ec_sub): New.
* src/libgcrypt.def (gcry_mpi_ec_sub): New.
* src/libgcrypt.vers (gcry_mpi_ec_sub): New.
* src/mpi.h (_gcry_mpi_ec_sub_points): New.
* src/visibility.c (gcry_mpi_ec_sub): New.
* src/visibility.h (gcry_mpi_ec_sub): New.
--

This function subtracts two points on the curve. Only Twisted Edwards
curves are supported with this change.

Signed-off-by: Markus Teich <markus dot teich at stusta dot mhn dot de>
4 years agodoc: Fix a configure option name.
Werner Koch [Wed, 8 Oct 2014 12:42:36 +0000 (14:42 +0200)]
doc: Fix a configure option name.

--

4 years agoFix prime test for 2 and lower and add check command to mpicalc.
Werner Koch [Wed, 8 Oct 2014 12:41:21 +0000 (14:41 +0200)]
Fix prime test for 2 and lower and add check command to mpicalc.

* cipher/primegen.c (check_prime): Return true for the small primes.
(_gcry_prime_check): Return correct values for 2 and lower numbers.

* src/mpicalc.c (do_primecheck): New.
(main): Add command 'P'.
(main): Allow for larger input data.
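
The boundary behaviour this change is about (0 and 1 are not prime, 2
and 3 are) can be illustrated with a small stand-alone check.  This is
only a sketch: it uses plain trial division on 64-bit integers, whereas
cipher/primegen.c works on MPIs with probabilistic tests, and
demo_is_prime is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trial-division primality check with explicit small-value handling.  */
bool
demo_is_prime (uint64_t n)
{
  if (n < 2)
    return false;               /* 0 and 1 are not prime.  */
  if (n < 4)
    return true;                /* 2 and 3 are prime.  */
  if ((n & 1) == 0)
    return false;               /* Other even numbers are composite.  */
  for (uint64_t d = 3; d <= n / d; d += 2)
    if (n % d == 0)
      return false;
  return true;
}
```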

4 years agoAdd Whirlpool AMD64/SSE2 assembly implementation
Jussi Kivilinna [Sun, 31 Aug 2014 10:17:24 +0000 (13:17 +0300)]
Add Whirlpool AMD64/SSE2 assembly implementation

* cipher/Makefile.am: Add 'whirlpool-sse2-amd64.S'.
* cipher/whirlpool-sse2-amd64.S: New.
* cipher/whirlpool.c (USE_AMD64_ASM): New.
(whirlpool_tables_s): New.
(rc, C0, C1, C2, C3, C4, C5, C6, C7): Combine these tables into single
structure and replace old tables with macros of same name.
(tab): New structure containing above tables.
[USE_AMD64_ASM] (_gcry_whirlpool_transform_amd64)
(whirlpool_transform): New.
* configure.ac [host=x86_64]: Add 'whirlpool-sse2-amd64.lo'.
--

Benchmark results:

On Intel Core i5-4570 (3.2 GHz):
After:
 WHIRLPOOL      |      4.82 ns/B     197.8 MiB/s     15.43 c/B
Before:
 WHIRLPOOL      |      9.10 ns/B     104.8 MiB/s     29.13 c/B

On Intel Core i5-2450M (2.5 GHz):
After:
 WHIRLPOOL      |      8.43 ns/B     113.1 MiB/s     21.09 c/B
Before:
 WHIRLPOOL      |     13.45 ns/B     70.92 MiB/s     33.62 c/B

On Intel Core2 T8100 (2.1 GHz):
After:
 WHIRLPOOL      |     10.22 ns/B     93.30 MiB/s     21.47 c/B
Before:
 WHIRLPOOL      |     19.87 ns/B     48.00 MiB/s     41.72 c/B

Summary, old vs new ratio:

 Intel Core i5-4570: 1.88x
 Intel Core i5-2450M: 1.59x
 Intel Core2 T8100: 1.94x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoImproved ripemd160 performance
Andrei Scherer [Thu, 28 Aug 2014 17:45:35 +0000 (09:45 -0800)]
Improved ripemd160 performance

* cipher/rmd160.c (transform): Interleave the left and right lane
rounds to introduce more instruction level parallelism.
--

The benchmarks on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |     13.07 ns/B     72.97 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |     11.37 ns/B     83.84 MiB/s         - c/B

Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      3.31 ns/B     288.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      2.08 ns/B     458.5 MiB/s         - c/B

Signed-off-by: Andrei Scherer <andsch@inbox.com>
4 years agobuild: Document SYSROOT.
Werner Koch [Thu, 2 Oct 2014 12:49:31 +0000 (14:49 +0200)]
build: Document SYSROOT.

* configure.ac: Mark SYSROOT as arg var.

4 years agobuild: Support SYSROOT based config script finding.
Werner Koch [Thu, 2 Oct 2014 10:51:49 +0000 (12:51 +0200)]
build: Support SYSROOT based config script finding.

* src/libgcrypt.m4: Add support for SYSROOT and set
gpg_config_script_warn.  Use AC_PATH_PROG instead of AC_PATH_TOOL
because the config script is not expected to be installed with a
prefix for its name.
* configure.ac: Print a library mismatch warning.
* m4/gpg-error.m4: Update from git master.
--

Also fixed the false copyright notice in libgcrypt.m4.

4 years agomac: Fix gcry_mac_close to allow for a NULL handle.
Werner Koch [Mon, 29 Sep 2014 15:34:28 +0000 (17:34 +0200)]
mac: Fix gcry_mac_close to allow for a NULL handle.

* cipher/mac.c (_gcry_mac_close): Check for NULL.
--

We always allow this for easier cleanup.  Actually, the docs already
state that this is allowed.
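
Why a NULL-tolerant close helps cleanup can be seen in a short sketch.
The handle type and functions below are hypothetical stand-ins, not the
cipher/mac.c code; the only point is the early return for a NULL handle.

```c
#include <stdlib.h>

/* Hypothetical handle type standing in for a MAC handle.  */
typedef struct demo_mac_handle
{
  unsigned char *buf;
} *demo_mac_hd_t;

/* Allocate a handle; returns NULL on allocation failure.  */
demo_mac_hd_t
demo_mac_open (size_t n)
{
  demo_mac_hd_t hd = calloc (1, sizeof *hd);

  if (!hd)
    return NULL;
  hd->buf = calloc (n ? n : 1, 1);
  if (!hd->buf)
    {
      free (hd);
      return NULL;
    }
  return hd;
}

/* Release a handle; a NULL handle is silently ignored.  */
void
demo_mac_close (demo_mac_hd_t hd)
{
  if (!hd)
    return;
  free (hd->buf);
  free (hd);
}
```

With this, an error path can call the close function unconditionally on
every handle it might have opened, without tracking which opens
succeeded.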

4 years agoAdd a constant for a forthcoming new RNG.
Werner Koch [Wed, 3 Sep 2014 06:53:43 +0000 (08:53 +0200)]
Add a constant for a forthcoming new RNG.

* src/gcrypt.h.in (GCRYCTL_DRBG_REINIT): New constant.

4 years agoAdd new Poly1305 MAC test vectors
Jussi Kivilinna [Tue, 2 Sep 2014 17:40:07 +0000 (20:40 +0300)]
Add new Poly1305 MAC test vectors

* tests/basic.c (check_mac): Add new test vectors for Poly1305 MAC.
--

Patch adds new test vectors for Poly1305 MAC from Internet Draft
draft-irtf-cfrg-chacha20-poly1305-01.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoasm: Allow building x86 and amd64 using old compilers.
Werner Koch [Tue, 2 Sep 2014 07:25:20 +0000 (09:25 +0200)]
asm: Allow building x86 and amd64 using old compilers.

* src/hwf-x86.c (get_xgetbv): Build only if AVX support is enabled.
--

Old as(1) versions do not support the xgetbv instruction.  Thus build
this function only if asm support has been requested.

GnuPG-bug-id: 1708

4 years agoAdd DCO entries for Andrei Scherer and Stefan Mueller.
Werner Koch [Mon, 1 Sep 2014 09:40:31 +0000 (11:40 +0200)]
Add DCO entries for Andrei Scherer and Stefan Mueller.

--

4 years agompi: Re-indent longlong.h.
Werner Koch [Fri, 29 Aug 2014 12:54:11 +0000 (14:54 +0200)]
mpi: Re-indent longlong.h.

--
Indenting the cpp statements should make longlong.h more readable.

4 years agosexp: Check args of gcry_sexp_build.
Werner Koch [Thu, 21 Aug 2014 12:12:55 +0000 (14:12 +0200)]
sexp: Check args of gcry_sexp_build.

* src/sexp.c (do_vsexp_sscan): Return error for invalid args.
--

This helps to avoid usage errors by passing NULL for the return
variable and the format string.

4 years agocipher: Fix a segv in case of calling with wrong parameters.
Werner Koch [Thu, 21 Aug 2014 09:47:16 +0000 (11:47 +0200)]
cipher: Fix a segv in case of calling with wrong parameters.

* cipher/md.c (_gcry_md_info): Fix arg testing.
--

GnuPG-bug-id: 1697

4 years agocipher: Fix possible NULL deref in call to prime generator.
Werner Koch [Thu, 21 Aug 2014 09:39:17 +0000 (11:39 +0200)]
cipher: Fix possible NULL deref in call to prime generator.

* cipher/primegen.c (_gcry_generate_elg_prime): Change to return an
error code.
* cipher/dsa.c (generate): Take care of new return code.
* cipher/elgamal.c (generate): Change to return an error code.  Take
care of _gcry_generate_elg_prime return code.
(generate_using_x): Take care of _gcry_generate_elg_prime return code.
(elg_generate): Propagate return code from generate.
--

GnuPG-bug-id: 1699, 1700
Reported-by: S.K. Gupta
Note that the NULL deref may have only happened on malloc failure.

4 years agoecc: Support Montgomery curve for gcry_mpi_ec_mul_point.
NIIBE Yutaka [Tue, 12 Aug 2014 01:03:39 +0000 (10:03 +0900)]
ecc: Support Montgomery curve for gcry_mpi_ec_mul_point.

* mpi/ec.c (_gcry_mpi_ec_get_affine): Support Montgomery curve.
(montgomery_ladder): New.
(_gcry_mpi_ec_mul_point): Implementation using montgomery_ladder.
(_gcry_mpi_ec_curve_point): Check x-coordinate is valid.
--

Given Montgomery curve: b * y^2 == x^3 + a * x^2 + x
CTX->A has (a-2)/4 and CTX->B has b^-1

Note that _gcry_mpi_ec_add_points is not supported for this curve.
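
An x-only Montgomery ladder with the constant (a-2)/4 described above
can be sketched on a toy field.  Everything below is an assumption made
for illustration: the prime 2^31 - 1, the coefficient 486662, and the
demo_* names.  The step formulas follow the commonly published ladder
(as in RFC 7748); this is not the mpi/ec.c code.  A result of 0 stands
for either x = 0 or the point at infinity.

```c
#include <assert.h>
#include <stdint.h>

/* Toy parameters, chosen only for illustration.  */
#define DEMO_P 2147483647ULL    /* 2^31 - 1, a prime.  */
#define DEMO_A 486662ULL        /* Curve coefficient a.  */

static uint64_t addm (uint64_t x, uint64_t y) { return (x + y) % DEMO_P; }
static uint64_t subm (uint64_t x, uint64_t y) { return (x + DEMO_P - y) % DEMO_P; }
static uint64_t mulm (uint64_t x, uint64_t y) { return (x * y) % DEMO_P; }

/* Modular exponentiation by squaring.  */
static uint64_t
powm (uint64_t b, uint64_t e)
{
  uint64_t r = 1;

  for (b %= DEMO_P; e; e >>= 1, b = mulm (b, b))
    if (e & 1)
      r = mulm (r, b);
  return r;
}

/* Inverse via Fermat's little theorem; maps 0 to 0.  */
static uint64_t invm (uint64_t x) { return powm (x, DEMO_P - 2); }

/* Swap *A and *B if SWAP is 1, using a mask instead of a branch.  */
static void
cswap (uint64_t swap, uint64_t *a, uint64_t *b)
{
  uint64_t t = (0 - swap) & (*a ^ *b);

  *a ^= t;
  *b ^= t;
}

/* Return the x-coordinate of K*P given the x-coordinate X1 of P.  */
uint64_t
demo_ladder (uint64_t k, uint64_t x1)
{
  assert (x1 < DEMO_P);
  const uint64_t a24 = mulm (subm (DEMO_A, 2), invm (4)); /* (a-2)/4 */
  uint64_t x2 = 1, z2 = 0, x3 = x1, z3 = 1, swap = 0;

  for (int i = 63; i >= 0; i--)
    {
      uint64_t bit = (k >> i) & 1;

      swap ^= bit;
      cswap (swap, &x2, &x3);
      cswap (swap, &z2, &z3);
      swap = bit;

      uint64_t a = addm (x2, z2), aa = mulm (a, a);
      uint64_t b = subm (x2, z2), bb = mulm (b, b);
      uint64_t e = subm (aa, bb);
      uint64_t c = addm (x3, z3), d = subm (x3, z3);
      uint64_t da = mulm (d, a), cb = mulm (c, b);
      uint64_t s = addm (da, cb), t = subm (da, cb);

      x3 = mulm (s, s);                         /* Differential add.  */
      z3 = mulm (x1, mulm (t, t));
      x2 = mulm (aa, bb);                       /* Doubling.  */
      z2 = mulm (e, addm (aa, mulm (a24, e)));
    }
  cswap (swap, &x2, &x3);
  cswap (swap, &z2, &z3);
  return mulm (x2, invm (z2));
}
```

The ladder only ever needs the difference of the two running points,
which is why plain point addition is not required for this curve type.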

4 years agotests: Add a benchmark for Elgamal.
Werner Koch [Sat, 9 Aug 2014 12:36:59 +0000 (14:36 +0200)]
tests: Add a benchmark for Elgamal.

* tests/benchmark.c (sample_public_elg_key_1024): New.
(sample_private_elg_key_1024): New.
(sample_public_elg_key_2048, sample_private_elg_key_2048): New.
(sample_public_elg_key_3072, sample_private_elg_key_3072): New.
(elg_bench): New.
(main): Add elg_bench.  Add commands "elg" and "public".

4 years agoecc: Add cofactor to domain parameters.
NIIBE Yutaka [Fri, 8 Aug 2014 00:35:31 +0000 (09:35 +0900)]
ecc: Add cofactor to domain parameters.

* src/ec-context.h (mpi_ec_ctx_s): Add cofactor 'h'.
* cipher/ecc-common.h (elliptic_curve_t): Add cofactor 'h'.
(_gcry_ecc_update_curve_param): New API adding cofactor.

* cipher/ecc-curves.c (ecc_domain_parms_t): Add cofactor 'h'.
(ecc_domain_parms_t domain_parms): Add cofactors.
(_gcry_ecc_fill_in_curve, _gcry_ecc_update_curve_param)
(_gcry_ecc_get_curve, _gcry_mpi_ec_new, _gcry_ecc_get_param_sexp)
(_gcry_ecc_get_mpi): Handle cofactor.
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Likewise.
* cipher/ecc-misc.c (_gcry_ecc_curve_free)
(_gcry_ecc_curve_copy): Likewise.
* cipher/ecc.c (nist_generate_key, ecc_generate)
(ecc_check_secret_key, ecc_sign, ecc_verify, ecc_encrypt_raw)
(ecc_decrypt_raw, _gcry_pk_ecc_get_sexp, _gcry_pubkey_spec_ecc):
Likewise.
(compute_keygrip): Handle cofactor, but skip it for its computation.
* mpi/ec.c (ec_deinit): Likewise.
* tests/t-mpi-point.c (context_param): Likewise.
(test_curve): Add cofactors.
* tests/curves.c (sample_key_1, sample_key_2): Add cofactors.
* tests/keygrip.c (key_grips): Add cofactors.
--

We keep compatibility of compute_keygrip in cipher/ecc.c.

4 years agompi: Fix regression for powerpc-apple-darwin detection.
Werner Koch [Tue, 5 Aug 2014 10:26:36 +0000 (12:26 +0200)]
mpi: Fix regression for powerpc-apple-darwin detection.

* mpi/config.links: Add separate entry for powerpc-apple-darwin.
--

GnuPG-bug-id: 1616

4 years agoFix bug inhibiting the use of the sentinel attribute.
Werner Koch [Tue, 5 Aug 2014 10:15:26 +0000 (12:15 +0200)]
Fix bug inhibiting the use of the sentinel attribute.

* src/gcrypt.h.in: Fix typo in macro.
--

Reported-by: Rafaël Carré <funman@videolan.org>
4 years agompi: Use BSD syntax for x86_64-apple-darwin
Werner Koch [Tue, 5 Aug 2014 10:12:52 +0000 (12:12 +0200)]
mpi: Use BSD syntax for x86_64-apple-darwin

* mpi/config.links: Add case for x86_64-apple-darwin.
--

Suggested by gniibe on 2014-04-24.

4 years agoFix building for the x32 target without asm modules.
Kristian Fiskerstrand [Tue, 29 Jul 2014 17:34:31 +0000 (19:34 +0200)]
Fix building for the x32 target without asm modules.

* mpi/generic/mpi-asm-defs.h: Use a fixed value for the x32 ABI.
--

See commit fd6721c235a5bdcb332c8eb708fbd4f96e52e824 for details.

4 years agoecc: Support the non-standard 0x40 compression flag for EdDSA.
Werner Koch [Thu, 24 Jul 2014 10:30:32 +0000 (12:30 +0200)]
ecc: Support the non-standard 0x40 compression flag for EdDSA.

* cipher/ecc.c (ecc_generate): Check the "comp" flag for EdDSA.
* cipher/ecc-eddsa.c (eddsa_encode_x_y): Add arg WITH_PREFIX.
(_gcry_ecc_eddsa_encodepoint): Ditto.
(_gcry_ecc_eddsa_ensure_compact): Handle the 0x40 compression prefix.
(_gcry_ecc_eddsa_decodepoint): Ditto.
* tests/keygrip.c: Check a compressed Ed25519 key with prefix.
* tests/t-ed25519.inp: Ditto.
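
How a decoder might recognize the 0x40 prefix can be sketched as
follows.  This is a simplified assumption, not the ecc-eddsa.c logic:
it only considers a 32-byte Ed25519 point with or without a leading
0x40 byte, and demo_strip_eddsa_prefix is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the 32-byte compressed point inside BUF, or NULL
   if the layout is not one of the two forms handled by this sketch.  */
const unsigned char *
demo_strip_eddsa_prefix (const unsigned char *buf, size_t len)
{
  assert (buf != NULL);
  if (len == 32)
    return buf;                 /* Bare compressed point.  */
  if (len == 33 && buf[0] == 0x40)
    return buf + 1;             /* Compressed point with 0x40 prefix.  */
  return NULL;
}
```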

4 years agompi: Extend the internal mpi_get_buffer.
Werner Koch [Thu, 24 Jul 2014 14:16:53 +0000 (16:16 +0200)]
mpi: Extend the internal mpi_get_buffer.

* mpi/mpicoder.c (do_get_buffer): Add arg EXTRAALLOC.
(_gcry_mpi_get_buffer_extra): New.

4 years agocipher: Fix compiler warning for chacha20.
Werner Koch [Thu, 24 Jul 2014 09:12:37 +0000 (11:12 +0200)]
cipher: Fix compiler warning for chacha20.

* cipher/chacha20.c (chacha20_blocks) [!USE_SSE2]: Do not build.

4 years agompi: Add mpi_swap_cond.
NIIBE Yutaka [Wed, 16 Jul 2014 08:05:55 +0000 (17:05 +0900)]
mpi: Add mpi_swap_cond.

* mpi/mpiutil.c (_gcry_mpi_swap_cond): New.
* src/mpi.h (mpi_swap_cond): New.
--

This is an internal function for now.
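
A conditional swap of this kind is usually written with a mask so that
no branch depends on the condition.  The sketch below works on plain
arrays of 64-bit limbs; the name demo_swap_cond and the limb layout are
assumptions and do not reflect the mpi/mpiutil.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the NLIMBS-limb numbers A and B if SWAP is 1; leave them
   unchanged if SWAP is 0.  The same instructions run in both cases.  */
void
demo_swap_cond (uint64_t *a, uint64_t *b, size_t nlimbs,
                unsigned long swap)
{
  assert (swap == 0 || swap == 1);
  uint64_t mask = (uint64_t)0 - (uint64_t)swap; /* All ones or zero.  */

  for (size_t i = 0; i < nlimbs; i++)
    {
      uint64_t x = mask & (a[i] ^ b[i]);

      a[i] ^= x;
      b[i] ^= x;
    }
}
```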

4 years agoSpeed-up SHA-1 NEON assembly implementation
Jussi Kivilinna [Sun, 29 Jun 2014 14:36:29 +0000 (17:36 +0300)]
Speed-up SHA-1 NEON assembly implementation

* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
--

Benchmark on Cortex-A8 (1008 MHz):

New:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |      7.04 ns/B     135.4 MiB/s      7.10 c/B

Old:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |      7.79 ns/B     122.4 MiB/s      7.85 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agogostr3411_94: rewrite to use u32 mathematic
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:36 +0000 (22:48 +0400)]
gostr3411_94: rewrite to use u32 mathematic

* cipher/gost28147.c (_gcry_gost_enc_data): New.
* cipher/gostr3411-94.c: Rewrite implementation to use u32 mathematic
  internally.
* cipher/gost28147.c (_gcry_gost_enc_one): Remove.

--
On my box (Core2 Duo, i386) this highly improves GOST R 34.11-94 speed.

Before:
 GOSTR3411_94   |     55.04 ns/B     17.33 MiB/s         - c/B

After:
 GOSTR3411_94   |     36.70 ns/B     25.99 MiB/s         - c/B

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agogost28147: use bufhelp helpers
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:35 +0000 (22:48 +0400)]
gost28147: use bufhelp helpers

* cipher/gost28147.c (gost_setkey, gost_encrypt_block, gost_decrypt_block):
  use buf_get_le32/buf_put_le32 helpers.

--
On my box this boosts GOST 28147-89 speed from 36 MiB/s up to 44.5 MiB/s.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agoFixup curve name in the GOST2012 test case
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:34 +0000 (22:48 +0400)]
Fixup curve name in the GOST2012 test case

* tests/basic.c (check_pubkey): fixup curve name in public key.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agoUpdate PBKDF2 tests with GOST R 34.11-94 test cases
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:33 +0000 (22:48 +0400)]
Update PBKDF2 tests with GOST R 34.11-94 test cases

* tests/t-kdf.c (check_pbkdf2): Add MD_GOSTR3411_CP test cases.

--
TC26 (Technical Committee for standardization "Cryptography and security
mechanisms") published a document with test vectors for PBKDF2 used
with GOST R 34.11-94 message digest function.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agoAdd GOST R 34.11-94 variant using id-GostR3411-94-CryptoProParamSet
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:32 +0000 (22:48 +0400)]
Add GOST R 34.11-94 variant using id-GostR3411-94-CryptoProParamSet

* src/gcrypt.h.in (GCRY_MD_GOSTR3411_CP): New.
* src/cipher.h (_gcry_digest_spec_gost3411_cp): New.
* cipher/gost28147.c (_gcry_gost_enc_one): Differentiate between
  CryptoPro and Test S-Boxes.
* cipher/gostr3411-94.c (_gcry_digest_spec_gost3411_cp,
  gost3411_cp_init): New.
* cipher/md.c (md_open): GCRY_MD_GOSTR3411_CP also uses B=32.

--
RFC4357 defines only two S-Boxes that should be used together with
GOST R 34.11-94 - a testing one (from standard itself, for testing only)
and CryptoPro one. Instead of adding a separate gcry_md_ctrl() function
just to switch s-boxes, add a separate MD algorithm using CryptoPro
S-box.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agogost28147: support GCRYCTL_SET_SBOX
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:31 +0000 (22:48 +0400)]
gost28147: support GCRYCTL_SET_SBOX

* cipher/gost28147.c (gost_set_extra_info, gost_set_sbox): New.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agoSupport setting s-box for the ciphers that require it
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:30 +0000 (22:48 +0400)]
Support setting s-box for the ciphers that require it

* src/gcrypt.h.in (GCRYCTL_SET_SBOX, gcry_cipher_set_sbox): New.
* cipher/cipher.c (_gcry_cipher_ctl): pass GCRYCTL_SET_SBOX to
  set_extra_info callback.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agocipher/gost28147: generate optimized s-boxes from compact ones
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:29 +0000 (22:48 +0400)]
cipher/gost28147: generate optimized s-boxes from compact ones

* cipher/gost-s-box.c: New. Outputs optimized expanded representation of
  s-boxes (4x256) from compact 16x8 representation.
* cipher/Makefile.am: Add gost-sb.h dependency to gost28147.lo
* cipher/gost.h: Add sbox to the GOST28147_context structure.
* cipher/gost28147.c (gost_setkey): Set default s-box to test s-box from
  GOST R 34.11 (this was the only S-box before).
* cipher/gost28147.c (gost_val): Use sbox from the context.
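
The idea of expanding eight 4-bit s-boxes into four 256-entry tables
can be sketched as follows.  The nibble ordering, the folding of the
11-bit rotation into the tables, and all demo_* names are assumptions
made for illustration; the actual generator in cipher/gost-s-box.c may
lay out its tables differently.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
rol11 (uint32_t v)
{
  return (v << 11) | (v >> 21);
}

/* Build four byte-indexed tables from eight nibble s-boxes.  Each
   entry is already shifted to its byte position and rotated.  */
void
demo_expand_sbox (unsigned char sbox[8][16], uint32_t out[4][256])
{
  for (int i = 0; i < 4; i++)
    for (int b = 0; b < 256; b++)
      {
        assert (sbox[2 * i][b & 15] < 16 && sbox[2 * i + 1][b >> 4] < 16);
        uint32_t v = ((uint32_t)sbox[2 * i][b & 15]
                      | ((uint32_t)sbox[2 * i + 1][b >> 4] << 4));

        out[i][b] = rol11 (v << (8 * i));
      }
}

/* Reference path: substitute nibble by nibble, then rotate.  */
uint32_t
demo_subst_slow (unsigned char sbox[8][16], uint32_t x)
{
  uint32_t r = 0;

  for (int n = 0; n < 8; n++)
    r |= (uint32_t)sbox[n][(x >> (4 * n)) & 15] << (4 * n);
  return rol11 (r);
}

/* Table path: four lookups combined with OR.  */
uint32_t
demo_subst_fast (uint32_t t[4][256], uint32_t x)
{
  return (t[0][x & 0xff] | t[1][(x >> 8) & 0xff]
          | t[2][(x >> 16) & 0xff] | t[3][x >> 24]);
}
```

Because a rotation distributes over OR, rotating each table entry in
advance gives the same result as rotating the combined word, so the
per-block work drops to four lookups.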

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agogost28147: add OIDs used to define cipher mode
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:28 +0000 (22:48 +0400)]
gost28147: add OIDs used to define cipher mode

* cipher/gost28147.c (oids_gost28147): Add OID from RFC4357.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agoGOST R 34.11-94 add OIDs
Dmitry Eremin-Solenikov [Fri, 6 Jun 2014 18:48:26 +0000 (22:48 +0400)]
GOST R 34.11-94 add OIDs

* cipher/gostr3411-94.c: Add OIDs for GOST R 34.11-94 from RFC 4357.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
4 years agotests: add larger test-vectors for hash algorithms
Jussi Kivilinna [Wed, 21 May 2014 05:30:30 +0000 (08:30 +0300)]
tests: add larger test-vectors for hash algorithms

* tests/basic.c (check_digests): Add large test-vectors for MD5, SHA1,
SHA224, SHA256, SHA384, RMD160, CRC32, TIGER1, WHIRLPOOL and
GOSTR3411_94.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agosha512: fix ARM/NEON implementation
Jussi Kivilinna [Wed, 21 May 2014 05:30:30 +0000 (08:30 +0300)]
sha512: fix ARM/NEON implementation

* cipher/sha512-armv7-neon.S
(_gcry_sha512_transform_armv7_neon): Byte-swap RW67q and RW1011q
correctly in multi-block loop.
* tests/basic.c (check_digests): Add large test vector for SHA512.
--

Patch fixes bug introduced to multi-block processing by commit df629ba53a6,
"Improve performance of SHA-512/ARM/NEON implementation". Patch also adds
multi-block test vector for SHA-512.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoFix ARM assembly when building __PIC__
Jussi Kivilinna [Tue, 20 May 2014 17:35:51 +0000 (20:35 +0300)]
Fix ARM assembly when building __PIC__

* cipher/camellia-arm.S (GET_DATA_POINTER): New.
(_gcry_camellia_arm_encrypt_block): Use GET_DATA_POINTER.
(_gcry_camellia_arm_decrypt_block): Ditto.
* cipher/cast5-arm.S (GET_DATA_POINTER): New.
(_gcry_cast5_arm_encrypt_block, _gcry_cast5_arm_decrypt_block)
(_gcry_cast5_arm_enc_blk2, _gcry_cast5_arm_dec_blk2): Use
GET_DATA_POINTER.
* cipher/rijndael-arm.S (GET_DATA_POINTER): New.
(_gcry_aes_arm_encrypt_block, _gcry_aes_arm_decrypt_block): Use
GET_DATA_POINTER.
* cipher/sha1-armv7-neon.S (GET_DATA_POINTER): New.
(.LK_VEC): Move from .text to .data section.
(_gcry_sha1_transform_armv7_neon): Use GET_DATA_POINTER.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd Poly1305 to documentation
Jussi Kivilinna [Sat, 17 May 2014 15:30:39 +0000 (18:30 +0300)]
Add Poly1305 to documentation

* doc/gcrypt.texi: Add documentation for Poly1305 MACs and AEAD mode.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20: add SSE2/AMD64 optimized implementation
Jussi Kivilinna [Fri, 16 May 2014 18:28:26 +0000 (21:28 +0300)]
chacha20: add SSE2/AMD64 optimized implementation

* cipher/Makefile.am: Add 'chacha20-sse2-amd64.S'.
* cipher/chacha20-sse2-amd64.S: New.
* cipher/chacha20.c (USE_SSE2): New.
[USE_SSE2] (_gcry_chacha20_amd64_sse2_blocks): New.
(chacha20_do_setkey) [USE_SSE2]: Use SSE2 implementation for blocks
function.
* configure.ac [host=x86-64]: Add 'chacha20-sse2-amd64.lo'.
--

Add Andrew Moon's public domain SSE2 implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt

Benchmark on Intel i5-4570 (haswell),
with "--disable-hwf intel-avx2 --disable-hwf intel-ssse3":

Old:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      1.97 ns/B     483.8 MiB/s      6.31 c/B
     STREAM dec |      1.97 ns/B     483.6 MiB/s      6.31 c/B

New:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.931 ns/B    1024.7 MiB/s      2.98 c/B
     STREAM dec |     0.930 ns/B    1025.0 MiB/s      2.98 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agopoly1305: add AMD64/AVX2 optimized implementation
Jussi Kivilinna [Sun, 11 May 2014 17:52:27 +0000 (20:52 +0300)]
poly1305: add AMD64/AVX2 optimized implementation

* cipher/Makefile.am: Add 'poly1305-avx2-amd64.S'.
* cipher/poly1305-avx2-amd64.S: New.
* cipher/poly1305-internal.h (POLY1305_USE_AVX2)
(POLY1305_AVX2_BLOCKSIZE, POLY1305_AVX2_STATESIZE)
(POLY1305_AVX2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use AVX2 versions when needed.
* cipher/poly1305.c [POLY1305_USE_AVX2]
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks, poly1305_amd64_avx2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_AVX2]: Use AVX2 implementation if
AVX2 supported by CPU.
* configure.ac [host=x86_64]: Add 'poly1305-avx2-amd64.lo'.
--

Add Andrew Moon's public domain AVX2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt

Benchmarks on Intel i5-4570 (haswell):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.448 ns/B    2129.5 MiB/s      1.43 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.205 ns/B    4643.5 MiB/s     0.657 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agopoly1305: add AMD64/SSE2 optimized implementation
Jussi Kivilinna [Sun, 11 May 2014 17:18:49 +0000 (20:18 +0300)]
poly1305: add AMD64/SSE2 optimized implementation

* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.
--

Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt

Benchmarks on Intel i5-4570 (haswell):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.844 ns/B    1130.2 MiB/s      2.70 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.448 ns/B    2129.5 MiB/s      1.43 c/B

Benchmarks on Intel i5-2450M (sandy-bridge):

Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |      1.25 ns/B     763.0 MiB/s      3.12 c/B

New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305           |     0.605 ns/B    1575.9 MiB/s      1.51 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd Poly1305 based cipher AEAD mode
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
Add Poly1305 based cipher AEAD mode

* cipher/Makefile.am: Add 'cipher-poly1305.c'.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.poly1305'.
(_gcry_cipher_poly1305_encrypt, _gcry_cipher_poly1305_decrypt)
(_gcry_cipher_poly1305_setiv, _gcry_cipher_poly1305_authenticate)
(_gcry_cipher_poly1305_get_tag, _gcry_cipher_poly1305_check_tag): New.
* cipher/cipher-poly1305.c: New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_reset, cipher_encrypt, cipher_decrypt, _gcry_cipher_setiv)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Handle 'GCRY_CIPHER_MODE_POLY1305'.
(cipher_setiv): Move handling of 'GCRY_CIPHER_MODE_GCM' to ...
(_gcry_cipher_setiv): ... here, as with other modes.
* src/gcrypt.h.in: Add 'GCRY_CIPHER_MODE_POLY1305'.
* tests/basic.c (_check_poly1305_cipher, check_poly1305_cipher): New.
(check_ciphers): Add Poly1305 check.
(check_cipher_modes): Call 'check_poly1305_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench): Rename to
bench_aead_... and take nonce as argument.
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench): Ditto.
(bench_gcm_encrypt_do_bench, bench_gcm_decrypt_do_bench)
(bench_gcm_authenticate_do_bench, bench_poly1305_encrypt_do_bench)
(bench_poly1305_decrypt_do_bench)
(bench_poly1305_authenticate_do_bench, poly1305_encrypt_ops)
(poly1305_decrypt_ops, poly1305_authenticate_ops): New.
(cipher_modes): Add Poly1305.
(cipher_bench_one): Add special handling for Poly1305.
--

Patch adds Poly1305 based AEAD cipher mode to libgcrypt. ChaCha20 variant
of this mode is proposed for use in TLS and IPsec:
 https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
 http://tools.ietf.org/html/draft-nir-ipsecme-chacha20-poly1305-02

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd Poly1305-AES (-Camellia, etc) MACs
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
Add Poly1305-AES (-Camellia, etc) MACs

* cipher/mac-internal.h (_gcry_mac_type_spec_poly1305_aes)
(_gcry_mac_type_spec_poly1305_camellia)
(_gcry_mac_type_spec_poly1305_twofish)
(_gcry_mac_type_spec_poly1305_serpent)
(_gcry_mac_type_spec_poly1305_seed): New.
* cipher/mac-poly1305.c (poly1305mac_context_s): Add 'hd' and
'nonce_set'.
(poly1305mac_open, poly1305mac_close, poly1305mac_setkey): Add handling
for Poly1305-*** MACs.
(poly1305mac_prepare_key, poly1305mac_setiv): New.
(poly1305mac_reset, poly1305mac_write, poly1305mac_read): Add handling
for 'nonce_set'.
(poly1305mac_ops): Add 'poly1305mac_setiv'.
(_gcry_mac_type_spec_poly1305_aes)
(_gcry_mac_type_spec_poly1305_camellia)
(_gcry_mac_type_spec_poly1305_twofish)
(_gcry_mac_type_spec_poly1305_serpent)
(_gcry_mac_type_spec_poly1305_seed): New.
* cipher/mac.c (mac_list): Add Poly1305-AES, Poly1305-Twofish,
Poly1305-Serpent, Poly1305-SEED and Poly1305-Camellia.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305_AES',
'GCRY_MAC_POLY1305_CAMELLIA', 'GCRY_MAC_POLY1305_TWOFISH',
'GCRY_MAC_POLY1305_SERPENT' and 'GCRY_MAC_POLY1305_SEED'.
* tests/basic.c (check_mac): Add Poly1305-AES test vectors.
* tests/bench-slope.c (bench_mac_init): Set IV for Poly1305-*** MACs.
* tests/bench-slope.c (mac_bench): Set IV for Poly1305-*** MACs.
--

Patch adds Bernstein's Poly1305-AES message authentication code to libgcrypt
and other variants of Poly1305-<128-bit block cipher>.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd Poly1305 MAC
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
Add Poly1305 MAC

* cipher/Makefile.am: Add 'mac-poly1305.c', 'poly1305.c' and
'poly1305-internal.h'.
* cipher/mac-internal.h (poly1305mac_context_s): New.
(gcry_mac_handle): Add 'u.poly1305mac'.
(_gcry_mac_type_spec_poly1305mac): New.
* cipher/mac-poly1305.c: New.
* cipher/mac.c (mac_list): Add Poly1305.
* cipher/poly1305-internal.h: New.
* cipher/poly1305.c: New.
* src/gcrypt.h.in: Add 'GCRY_MAC_POLY1305'.
* tests/basic.c (check_mac): Add Poly1305 test vectors; Allow
overriding lengths of data and key buffers.
* tests/bench-slope.c (mac_bench): Increase max algo number from 500 to
600.
* tests/benchmark.c (mac_bench): Ditto.
--

Patch adds Bernstein's Poly1305 message authentication code to libgcrypt.
Implementation is based on Andrew Moon's public domain implementation
from: https://github.com/floodyberry/poly1305-opt

The algorithm added by this patch is the plain Poly1305 without AES and
takes a 32-byte key that must not be reused.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20/AVX2: clear upper halves of YMM registers on entry
Jussi Kivilinna [Mon, 12 May 2014 17:14:32 +0000 (20:14 +0300)]
chacha20/AVX2: clear upper halves of YMM registers on entry

* cipher/chacha20-avx2-amd64.S (_gcry_chacha20_amd64_avx2_blocks): Add
'vzeroupper' at beginning.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20/AVX2: check for ENABLE_AVX2_SUPPORT instead of HAVE_GCC_INLINE_ASM_AVX2
Jussi Kivilinna [Mon, 12 May 2014 17:11:33 +0000 (20:11 +0300)]
chacha20/AVX2: check for ENABLE_AVX2_SUPPORT instead of HAVE_GCC_INLINE_ASM_AVX2

* cipher/chacha20.c (USE_AVX2): Enable depending on
ENABLE_AVX2_SUPPORT, not HAVE_GCC_INLINE_ASM_AVX2.
* cipher/chacha20-avx2-amd64.S: Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20/SSSE3: clear XMM registers after use
Jussi Kivilinna [Mon, 12 May 2014 16:55:35 +0000 (19:55 +0300)]
chacha20/SSSE3: clear XMM registers after use

* cipher/chacha20-ssse3-amd64.S (_gcry_chacha20_amd64_ssse3_blocks): On
return, clear XMM registers.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20: add AVX2/AMD64 assembly implementation
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
chacha20: add AVX2/AMD64 assembly implementation

* cipher/Makefile.am: Add 'chacha20-avx2-amd64.S'.
* cipher/chacha20-avx2-amd64.S: New.
* cipher/chacha20.c (USE_AVX2): New macro.
[USE_AVX2] (_gcry_chacha20_amd64_avx2_blocks): New.
(chacha20_do_setkey): Select AVX2 implementation if there is HW
support.
(selftest): Increase size of buf by 256.
* configure.ac [host=x86-64]: Add 'chacha20-avx2-amd64.lo'.
--

Add AVX2 optimized implementation for ChaCha20. Based on implementation by
Andrew Moon.

SSSE3 (Intel Haswell):

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.742 ns/B    1284.8 MiB/s      2.38 c/B
     STREAM dec |     0.741 ns/B    1286.5 MiB/s      2.37 c/B

AVX2:

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.393 ns/B    2428.0 MiB/s      1.26 c/B
     STREAM dec |     0.392 ns/B    2433.6 MiB/s      1.25 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agochacha20: add SSSE3 assembly implementation
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
chacha20: add SSSE3 assembly implementation

* cipher/Makefile.am: Add 'chacha20-ssse3-amd64.S'.
* cipher/chacha20-ssse3-amd64.S: New.
* cipher/chacha20.c (USE_SSSE3): New macro.
[USE_SSSE3] (_gcry_chacha20_amd64_ssse3_blocks): New.
(chacha20_do_setkey): Select SSSE3 implementation if there is HW
support.
* configure.ac [host=x86-64]: Add 'chacha20-ssse3-amd64.lo'.
--

Add SSSE3 optimized implementation for ChaCha20. Based on implementation
by Andrew Moon.

Before (Intel Haswell):

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      1.97 ns/B     483.6 MiB/s      6.31 c/B
     STREAM dec |      1.97 ns/B     484.0 MiB/s      6.31 c/B

After:

 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.742 ns/B    1284.8 MiB/s      2.38 c/B
     STREAM dec |     0.741 ns/B    1286.5 MiB/s      2.37 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agoAdd ChaCha20 stream cipher
Jussi Kivilinna [Sun, 11 May 2014 09:00:19 +0000 (12:00 +0300)]
Add ChaCha20 stream cipher

* cipher/Makefile.am: Add 'chacha20.c'.
* cipher/chacha20.c: New.
* cipher/cipher.c (cipher_list): Add ChaCha20.
* configure.ac: Add ChaCha20.
* doc/gcrypt.texi: Add ChaCha20.
* src/cipher.h (_gcry_cipher_spec_chacha20): New.
* src/gcrypt.h.in (GCRY_CIPHER_CHACHA20): Add new algo.
* tests/basic.c (MAX_DATA_LEN): Increase to 128 from 100.
(check_stream_cipher): Add ChaCha20 test-vectors.
(check_ciphers): Add ChaCha20.
--

Patch adds Bernstein's ChaCha20 cipher to libgcrypt. Implementation is based
on public domain implementations.
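
For orientation, the core operation of ChaCha20 is the quarter round,
which mixes four 32-bit state words using only additions, XORs and
rotations.  The following is a minimal standalone sketch of that step;
the names rotl32 and quarter_round are illustrative and are not taken
from cipher/chacha20.c.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by N bits (0 < N < 32).  */
uint32_t
rotl32 (uint32_t x, int n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round: mixes the four state words in place
   using add, xor and rotate only.  Illustrative sketch, not the
   libgcrypt implementation.  */
void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

With the inputs a=0x11111111, b=0x01020304, c=0x9b8d6f43, d=0x01234567
(the quarter-round test vector from RFC 7539, section 2.1.1) the
result is a=0xea2a92f4, b=0xcb1cf8ce, c=0x4581472e, d=0x5881c4bb.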

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
4 years agompi: Fix a subtle bug setting spurious bits in mpi_set_bit.
Werner Koch [Fri, 9 May 2014 10:35:15 +0000 (12:35 +0200)]
mpi: Fix a subtle bug setting spurious bits in mpi_set_bit.

* mpi/mpi-bit.c (_gcry_mpi_set_bit, _gcry_mpi_set_highbit): Clear
allocated but not used bits before resizing.
* tests/t-mpi-bits.c (set_bit_with_resize): New.
--

Reported-by: Martin Sewelies.
This bug has probably been with us for many years.  Probably due to
different memory allocation patterns, it first revealed itself
with 1.6.  It could be the reason for other heisenbugs.
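
The failure mode can be sketched with a toy multi-precision integer.
Everything below (struct toy_mpi, toy_resize, toy_set_bit) is a
hypothetical stand-in, not the code in mpi/mpi-bit.c.  The assumption
is that limbs between nlimbs and alloced may still hold data from an
earlier, larger value, and that a resize keeps existing storage
untouched.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long limb_t;
#define LIMB_BITS (sizeof (limb_t) * CHAR_BIT)

struct toy_mpi
{
  limb_t *d;        /* Limb array, least significant limb first.  */
  size_t alloced;   /* Number of limbs allocated in D.  */
  size_t nlimbs;    /* Number of limbs currently in use.  */
};

/* Grow the limb array to at least NLIMBS entries.  Newly allocated
   limbs are zeroed; already allocated limbs are left untouched,
   stale contents included.  Returns 0 on success.  */
static int
toy_resize (struct toy_mpi *a, size_t nlimbs)
{
  limb_t *p;

  if (nlimbs <= a->alloced)
    return 0;
  p = realloc (a->d, nlimbs * sizeof *p);
  if (!p)
    return -1;
  memset (p + a->alloced, 0, (nlimbs - a->alloced) * sizeof *p);
  a->d = p;
  a->alloced = nlimbs;
  return 0;
}

/* Set bit N of A, extending the used range if needed.  */
int
toy_set_bit (struct toy_mpi *a, unsigned int n)
{
  size_t limbno = n / LIMB_BITS;
  unsigned int bitno = n % LIMB_BITS;
  size_t i;

  if (limbno >= a->nlimbs)
    {
      /* The fix: clear allocated but unused limbs before they
         become part of the value.  Without this loop the stale
         limbs show up as spurious bits.  */
      for (i = a->nlimbs; i < a->alloced; i++)
        a->d[i] = 0;
      if (toy_resize (a, limbno + 1))
        return -1;
      a->nlimbs = limbno + 1;
    }
  a->d[limbno] |= (limb_t)1 << bitno;
  return 0;
}
```

Whether the stale limbs contain non-zero data depends on what the
allocator handed out earlier, which would fit the observation that
the bug only shows up with some allocation patterns.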

Signed-off-by: Werner Koch <wk@gnupg.org>
4 years agoComment typo fix
Werner Koch [Fri, 9 May 2014 10:11:30 +0000 (12:11 +0200)]
Comment typo fix

--

4 years agoBump LT version.
Werner Koch [Wed, 7 May 2014 09:05:36 +0000 (11:05 +0200)]
Bump LT version.

* configure.ac: Bump LT version to C21/A1/R0.
--

This is to avoid conflicts with the 1.6 series.  Note that if we add a
new interface to 1.6 we would need to bump age again.

4 years agorandom: Small patch for consistency and really burn the stack.
Werner Koch [Tue, 15 Apr 2014 14:40:48 +0000 (16:40 +0200)]
random: Small patch for consistency and really burn the stack.

* random/rndlinux.c (_gcry_rndlinux_gather_random): s/int/size_t/.
(_gcry_rndlinux_gather_random): Replace memset by wipememory.
--

size_t was suggested by Marcus Meissner <meissner@suse.de>.  While
looking at the code I identified the useless (i.e. likely optimized
away) memset.
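
Why the memset is useless can be sketched as follows: a memset on a
buffer that is never read again is a dead store, and an optimizing
compiler is allowed to drop it.  Writing through a volatile pointer
is a common way to force the stores.  The function names here
(toy_wipememory, toy_gather) are hypothetical and only illustrate
the idea; this is not the code in random/rndlinux.c.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite LEN bytes at PTR with zeroes.  The volatile qualifier
   tells the compiler that every store is observable, so the loop
   is not removed as a dead store.  */
void
toy_wipememory (void *ptr, size_t len)
{
  volatile unsigned char *p = ptr;

  while (len--)
    *p++ = 0;
}

/* Hypothetical caller: passes up to 64 bytes of sensitive data
   through a stack buffer and wipes the buffer before returning.
   Returns the number of bytes copied to OUT.  */
size_t
toy_gather (unsigned char *out, const unsigned char *src, size_t len)
{
  unsigned char buffer[64];

  if (len > sizeof buffer)
    len = sizeof buffer;
  memcpy (buffer, src, len);
  memcpy (out, buffer, len);
  /* A plain memset (buffer, 0, sizeof buffer) at this point may be
     optimized away because BUFFER is not used afterwards.  */
  toy_wipememory (buffer, sizeof buffer);
  return len;
}
```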

5 years agopubkey: Re-map all deprecated RSA algo numbers.
Werner Koch [Tue, 15 Apr 2014 14:40:48 +0000 (16:40 +0200)]
pubkey: Re-map all deprecated RSA algo numbers.

* cipher/pubkey.c (map_algo): Map RSA_E and RSA_S.
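
--

The re-mapping amounts to folding the deprecated identifiers onto
the generic one before any lookup is done.  A minimal sketch with
hypothetical names and placeholder values (not the identifiers from
gcrypt.h and not the actual map_algo):

```c
/* Toy stand-ins for the algorithm identifiers.  */
enum toy_pk_algo
  {
    TOY_PK_RSA   = 1,   /* Generic RSA.  */
    TOY_PK_RSA_E = 2,   /* Deprecated: RSA, encrypt only.  */
    TOY_PK_RSA_S = 3    /* Deprecated: RSA, sign only.  */
  };

/* Return the canonical identifier for ALGO.  Unknown or already
   canonical values are passed through unchanged.  */
int
toy_map_algo (int algo)
{
  switch (algo)
    {
    case TOY_PK_RSA_E:
    case TOY_PK_RSA_S:
      return TOY_PK_RSA;
    default:
      return algo;
    }
}
```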

5 years agocipher: Fix possible NULL dereference.
Werner Koch [Tue, 15 Apr 2014 14:40:48 +0000 (16:40 +0200)]
cipher: Fix possible NULL dereference.

* cipher/md.c (_gcry_md_selftest): Check for spec being NULL.
--

Also removed left-over code in unused file cipher/test-getrusage.c.

Found by Hans-Christoph Steiner with cppcheck.

5 years ago3des: add amd64 assembly implementation for 3DES
Jussi Kivilinna [Sun, 30 Mar 2014 15:11:09 +0000 (18:11 +0300)]
3des: add amd64 assembly implementation for 3DES

* cipher/Makefile.am: Add 'des-amd64.S'.
* cipher/cipher-selftests.c (_gcry_selftest_helper_cbc)
(_gcry_selftest_helper_cfb, _gcry_selftest_helper_ctr): Handle failures
from 'setkey' function.
* cipher/cipher.c (_gcry_cipher_open_internal) [USE_DES]: Setup bulk
functions for 3DES.
* cipher/des-amd64.S: New file.
* cipher/des.c (USE_AMD64_ASM, ATTR_ALIGNED_16): New macros.
[USE_AMD64_ASM] (_gcry_3des_amd64_crypt_block)
(_gcry_3des_amd64_ctr_enc, _gcry_3des_amd64_cbc_dec)
(_gcry_3des_amd64_cfb_dec): New prototypes.
[USE_AMD64_ASM] (tripledes_ecb_crypt): New function.
(TRIPLEDES_ECB_BURN_STACK): New macro.
(_gcry_3des_ctr_enc, _gcry_3des_cbc_dec, _gcry_3des_cfb_dec)
(bulk_selftest_setkey, selftest_ctr, selftest_cbc, selftest_cfb): New
functions.
(selftest): Add call to CTR, CBC and CFB selftest functions.
(do_tripledes_encrypt, do_tripledes_decrypt): Use
TRIPLEDES_ECB_BURN_STACK.
* configure.ac [host=x86-64]: Add 'des-amd64.lo'.
* src/cipher.h (_gcry_3des_ctr_enc, _gcry_3des_cbc_dec)
(_gcry_3des_cfb_dec): New prototypes.
--

Add non-parallel functions for small speed-up and 3-way parallel functions for
modes of operation that support parallel processing.
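
Why only some directions benefit can be sketched with CBC.  The
code below uses a toy invertible 64-bit function in place of 3DES,
and all names are hypothetical; it only illustrates the data
dependencies and is not the code from cipher/des.c or des-amd64.S.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t
rotl64 (uint64_t x, int n)
{
  return (x << n) | (x >> (64 - n));
}

static uint64_t
rotr64 (uint64_t x, int n)
{
  return (x >> n) | (x << (64 - n));
}

/* Toy invertible 64-bit "block cipher".  A stand-in, NOT DES.  */
static uint64_t
toy_encrypt (uint64_t k, uint64_t x)
{
  return rotl64 (x ^ k, 13) + k;
}

static uint64_t
toy_decrypt (uint64_t k, uint64_t y)
{
  return rotr64 (y - k, 13) ^ k;
}

/* CBC encryption: block I needs ciphertext block I-1 as input, so
   the block cipher calls form a serial chain.  */
void
toy_cbc_enc (uint64_t k, uint64_t iv,
             const uint64_t *in, uint64_t *out, size_t n)
{
  uint64_t prev = iv;
  size_t i;

  for (i = 0; i < n; i++)
    prev = out[i] = toy_encrypt (k, in[i] ^ prev);
}

/* CBC decryption: every block cipher input is already known
   ciphertext, so three decryptions can be issued without any
   dependency between them; only the final XOR uses the previous
   ciphertext block.  */
void
toy_cbc_dec_3way (uint64_t k, uint64_t iv,
                  const uint64_t *in, uint64_t *out, size_t n)
{
  uint64_t prev = iv;
  size_t i = 0;

  for (; i + 3 <= n; i += 3)
    {
      uint64_t c0 = in[i], c1 = in[i + 1], c2 = in[i + 2];
      uint64_t d0 = toy_decrypt (k, c0);   /* Independent.  */
      uint64_t d1 = toy_decrypt (k, c1);   /* Independent.  */
      uint64_t d2 = toy_decrypt (k, c2);   /* Independent.  */

      out[i] = d0 ^ prev;
      out[i + 1] = d1 ^ c0;
      out[i + 2] = d2 ^ c1;
      prev = c2;
    }
  for (; i < n; i++)   /* Remaining one or two blocks.  */
    {
      uint64_t c = in[i];

      out[i] = toy_decrypt (k, c) ^ prev;
      prev = c;
    }
}
```

The same reasoning applies to CFB decryption and to CTR in both
directions, which matches the rows in the tables below that show
roughly 2.3x to 2.6x instead of about 1.2x.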

Old vs new (Intel Core i5-4570):
================================
        enc    dec
 ECB    1.17x  1.17x
 CBC    1.17x  2.51x
 CFB    1.16x  2.49x
 OFB    1.17x  1.17x
 CTR    2.56x  2.56x

Old vs new (Intel Core i5-2450M):
=================================
        enc    dec
 ECB    1.28x  1.28x
 CBC    1.27x  2.33x
 CFB    1.27x  2.34x
 OFB    1.27x  1.27x
 CTR    2.36x  2.35x

New (Intel Core i5-4570):
=========================
 3DES           |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     28.39 ns/B     33.60 MiB/s     90.84 c/B
        ECB dec |     28.27 ns/B     33.74 MiB/s     90.45 c/B
        CBC enc |     29.50 ns/B     32.33 MiB/s     94.40 c/B
        CBC dec |     13.35 ns/B     71.45 MiB/s     42.71 c/B
        CFB enc |     29.59 ns/B     32.23 MiB/s     94.68 c/B
        CFB dec |     13.41 ns/B     71.12 MiB/s     42.91 c/B
        OFB enc |     28.90 ns/B     33.00 MiB/s     92.47 c/B
        OFB dec |     28.90 ns/B     33.00 MiB/s     92.48 c/B
        CTR enc |     13.39 ns/B     71.20 MiB/s     42.86 c/B
        CTR dec |     13.39 ns/B     71.21 MiB/s     42.86 c/B

Old (Intel Core i5-4570):
=========================
 3DES           |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     33.24 ns/B     28.69 MiB/s     106.4 c/B
        ECB dec |     33.26 ns/B     28.67 MiB/s     106.4 c/B
        CBC enc |     34.45 ns/B     27.69 MiB/s     110.2 c/B
        CBC dec |     33.45 ns/B     28.51 MiB/s     107.1 c/B
        CFB enc |     34.43 ns/B     27.70 MiB/s     110.2 c/B
        CFB dec |     33.41 ns/B     28.55 MiB/s     106.9 c/B
        OFB enc |     33.79 ns/B     28.22 MiB/s     108.1 c/B
        OFB dec |     33.79 ns/B     28.22 MiB/s     108.1 c/B
        CTR enc |     34.27 ns/B     27.83 MiB/s     109.7 c/B
        CTR dec |     34.27 ns/B     27.83 MiB/s     109.7 c/B

New (Intel Core i5-2450M):
==========================
 3DES           |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     42.21 ns/B     22.59 MiB/s     105.5 c/B
        ECB dec |     42.23 ns/B     22.58 MiB/s     105.6 c/B
        CBC enc |     43.70 ns/B     21.82 MiB/s     109.2 c/B
        CBC dec |     23.25 ns/B     41.02 MiB/s     58.12 c/B
        CFB enc |     43.71 ns/B     21.82 MiB/s     109.3 c/B
        CFB dec |     23.23 ns/B     41.05 MiB/s     58.08 c/B
        OFB enc |     42.73 ns/B     22.32 MiB/s     106.8 c/B
        OFB dec |     42.73 ns/B     22.32 MiB/s     106.8 c/B
        CTR enc |     23.31 ns/B     40.92 MiB/s     58.27 c/B
        CTR dec |     23.35 ns/B     40.84 MiB/s     58.38 c/B

Old (Intel Core i5-2450M):
==========================
 3DES           |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     53.98 ns/B     17.67 MiB/s     134.9 c/B
        ECB dec |     54.00 ns/B     17.66 MiB/s     135.0 c/B
        CBC enc |     55.43 ns/B     17.20 MiB/s     138.6 c/B
        CBC dec |     54.27 ns/B     17.57 MiB/s     135.7 c/B
        CFB enc |     55.42 ns/B     17.21 MiB/s     138.6 c/B
        CFB dec |     54.35 ns/B     17.55 MiB/s     135.9 c/B
        OFB enc |     54.49 ns/B     17.50 MiB/s     136.2 c/B
        OFB dec |     54.49 ns/B     17.50 MiB/s     136.2 c/B
        CTR enc |     55.02 ns/B     17.33 MiB/s     137.5 c/B
        CTR dec |     55.01 ns/B     17.34 MiB/s     137.5 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agotests: Print diagnostics for skipped tests.
Werner Koch [Thu, 13 Mar 2014 11:06:55 +0000 (12:06 +0100)]
tests: Print diagnostics for skipped tests.

* tests/basic.c (show_note): New.
(show_md_not_available, show_old_hmac_not_available)
(show_mac_not_available): New.
(check_digests): Remove USE_foo cpp tests from the test table.  Call
show_md_not_available if algo is not available.
(check_hmac): Likewise.
(check_mac): Likewise.

Signed-off-by: Werner Koch <wk@gnupg.org>