libgcrypt.git
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
mpi: Add debug function to print a point.

* mpi/ec.c (_gcry_mpi_point_log): New.
* src/mpi.h (log_printpnt): new macro.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
tests: Factor time measurement code out.

* tests/benchmark.c (started_at, stopped_at, start_timer, stop_timer)
(elapsed_time): Factor out to ...
* tests/stopwatch.h: new file.

Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
Fix _gcry_log_printmpi to print 00 instead of a sole sign.

* src/misc.c: Special case an mpi length of 0.

Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
Streamline the use of the internal mpi and hex debug functions.

* mpi/mpicoder.c (gcry_mpi_dump): Remove.
(_gcry_log_mpidump): Remove.
* src/misc.c (_gcry_log_printhex): Factor all code out to ...
(do_printhex): New.  Add line wrapping and compact printing.
(_gcry_log_printmpi): New.
* src/mpi.h (log_mpidump): Remove macro.
* src/g10lib.h (log_mpidump): Add compatibility macro.
(log_printmpi): New macro
* src/visibility.c (gcry_mpi_dump): Call _gcry_log_printmpi.
* cipher/primegen.c (prime_generate_internal): Replace gcry_mpi_dump
by log_printmpi.
(gcry_prime_group_generator): Ditto.
* cipher/pubkey.c: Remove extra colons from log_mpidump call.
* cipher/rsa.c (stronger_key_check): Use log_printmpi.
--

The values to debug get longer and longer, and the different debug
functions made it hard to inspect them.  Now MPIs and hex buffers are
printed in a very similar way.  Lines may now wrap, with a backslash as
the indicator.  MPIs are distinguished from plain buffers in the output
by always printing a sign.
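
The wrapping and sign conventions described above can be illustrated
with a minimal sketch.  This is not the library's do_printhex: the
helper names, the uppercase hex digits and the exact " \" break marker
are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append the string S to OUT at *POS; return -1 if it does not fit.  */
static int
put (char *out, size_t outlen, size_t *pos, const char *s)
{
  size_t n = strlen (s);

  if (*pos + n + 1 > outlen)
    return -1;
  memcpy (out + *pos, s, n + 1);
  *pos += n;
  return 0;
}

/* Hypothetical helper: write BUF as hex into OUT, breaking the line
   after PER_LINE bytes and marking each break with a trailing
   backslash.  A nonzero SIGN ('+' or '-') is emitted first, which is
   how an MPI can be told apart from a plain buffer.  An empty value is
   shown as "00".  Returns the number of characters written, or -1 if
   OUT is too small.  */
static int
format_hex (char *out, size_t outlen, char sign,
            const unsigned char *buf, size_t len, size_t per_line)
{
  size_t pos = 0, i;
  char tmp[4];

  if (!outlen || !per_line)
    return -1;
  out[0] = 0;
  if (sign)
    {
      tmp[0] = sign;
      tmp[1] = 0;
      if (put (out, outlen, &pos, tmp))
        return -1;
    }
  if (!len && put (out, outlen, &pos, "00"))
    return -1;
  for (i = 0; i < len; i++)
    {
      if (i && !(i % per_line))
        {
          if (put (out, outlen, &pos, " \\\n"))
            return -1;
        }
      snprintf (tmp, sizeof tmp, "%02X", (unsigned int) buf[i]);
      if (put (out, outlen, &pos, tmp))
        return -1;
    }
  return (int) pos;
}
```

Calling format_hex with sign '+', the bytes 01 AB FF and per_line 2
yields "+01AB \" on the first line and "FF" on the second.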

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
md: Add function gcry_md_hash_buffers.

* src/gcrypt.h.in (gcry_buffer_t): new.
(gcry_md_hash_buffers): New.
* src/visibility.c, src/visibility.h: Add wrapper for new function.
* src/libgcrypt.def, src/libgcrypt.vers: Export new function.
* cipher/md.c (gcry_md_hash_buffers): New.
* cipher/sha1.c (_gcry_sha1_hash_buffers): New.
* tests/basic.c (check_one_md_multi): New.
(check_digests): Run that test.
* tests/hmac.c (check_hmac_multi): New.
(main): Run that test.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
md: Fix Whirlpool flaw.

* cipher/whirlpool.c (whirlpool_add): Remove shortcut return so that
byte counter is always properly updated.
--

Using the forthcoming gcry_md_hash_buffers() and its test suite, I
found that a message of size 62 won't yield the correct hash if it is
fed into Whirlpool in chunks.  The fix is obvious.  The wrong code was
likely due to using a structure similar to SHA-1 but neglecting that
bytes and not blocks are counted.
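
The class of bug described above can be sketched as follows.  This is
not Whirlpool's code: the context structure and function name are
hypothetical, and the transform is replaced by a block counter.  The
point is only that the byte counter is updated before any buffering
logic, so no early return can skip it.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64

/* Hypothetical context for a block-buffered hash.  */
struct hash_ctx
{
  unsigned char buffer[BLOCK_SIZE];
  size_t count;               /* Bytes currently held in BUFFER.  */
  unsigned long long nbytes;  /* Total bytes fed in so far.  */
  unsigned long nblocks;      /* Full blocks seen (transform stand-in).  */
};

static void
hash_add (struct hash_ctx *ctx, const void *data, size_t len)
{
  const unsigned char *p = data;

  /* Count the bytes first.  A hash that counts bytes instead of blocks
     must do this on every call, even when the data only partially
     fills the buffer.  */
  ctx->nbytes += len;

  while (len)
    {
      size_t n = BLOCK_SIZE - ctx->count;

      if (n > len)
        n = len;
      memcpy (ctx->buffer + ctx->count, p, n);
      ctx->count += n;
      p += n;
      len -= n;
      if (ctx->count == BLOCK_SIZE)
        {
          ctx->nblocks++;  /* A real hash would run its transform here.  */
          ctx->count = 0;
        }
    }
}
```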

Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
md: Update URL of the Whirlpool specs.

--

Jussi Kivilinna [Sat, 7 Sep 2013 08:55:19 +0000 (11:55 +0300)]
Fix static build on AMD64

* cipher/rijndael-amd64.S: Correct 'RIP' macro for non-PIC build.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 7 Sep 2013 08:52:05 +0000 (11:52 +0300)]
scrypt: fix for big-endian systems

* cipher/scrypt.c (_salsa20_core): Fix endianness issues.
--

On big-endian systems 'tests/t-kdf' was failing scrypt tests. Patch fixes the
issue.
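
For background, Salsa20 operates on little-endian 32-bit words, so
code that reads words directly from a byte buffer only works on
little-endian hosts.  A minimal portable sketch is shown below; the
helper names are hypothetical and this is not the code from
cipher/scrypt.c, which may well use byte-swap macros instead.

```c
#include <stdint.h>

/* Read a 32-bit little-endian word from P on any host byte order.  */
static uint32_t
load_le32 (const unsigned char *p)
{
  return ((uint32_t) p[0]
          | ((uint32_t) p[1] << 8)
          | ((uint32_t) p[2] << 16)
          | ((uint32_t) p[3] << 24));
}

/* Store V at P as a 32-bit little-endian word on any host byte order.  */
static void
store_le32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xff);
  p[1] = (unsigned char) ((v >> 8) & 0xff);
  p[2] = (unsigned char) ((v >> 16) & 0xff);
  p[3] = (unsigned char) ((v >> 24) & 0xff);
}
```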

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
Use gcc "unused" attribute only with gcc >= 3.5.

* src/g10lib.h (GCC_ATTR_UNUSED): Fix gcc version detection.
--

Reported-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Werner Koch <wk@gnupg.org>
Dmitry Eremin-Solenikov [Thu, 5 Sep 2013 09:42:11 +0000 (13:42 +0400)]
Add support for Salsa20/12 - 12 round version of Salsa20

* src/gcrypt.h.in (GCRY_CIPHER_SALSA20R12): New.
* src/salsa20.c (salsa20_core, salsa20_do_encrypt_stream): Add support
for reduced round versions.
(salsa20r12_encrypt_stream, _gcry_cipher_spec_salsa20r12): Implement
Salsa20/12 - a 12 round version of Salsa20 selected by eSTREAM.
* src/cipher.h: Declare Salsa20/12 definition.
* cipher/cipher.c: Register Salsa20/12.
* tests/basic.c (check_stream_cipher, check_stream_cipher_large_block):
Populate Salsa20/12 tests with test vectors from ECRYPT.
(check_ciphers): Add simple test for Salsa20/12.

--
Salsa20/12 is a reduced round version of Salsa20 that is among the
ciphers selected by eSTREAM for Phase 3 of the Profile 1 algorithms.
Moreover, it is one of the proposed ciphers for TLS
(draft-josefsson-salsa20-tls-02).

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Werner Koch [Sat, 7 Sep 2013 07:50:44 +0000 (09:50 +0200)]
Add configure option --disable-amd64-as-feature-detection.

* configure.ac: Implement new disable flag.
--

Doing a static build of Libgcrypt currently throws an 'as' (assembler)
error on my box.  This configure option is added as a workaround.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Sat, 7 Sep 2013 08:06:46 +0000 (10:06 +0200)]
mpi: Improve support for non-Weierstrass EC equations.

* mpi/ec.c (ec_p_init): Add args MODEL and P.  Change all callers.
(_gcry_mpi_ec_p_internal_new): Ditto.
(_gcry_mpi_ec_p_new): Ditto.
* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Return
GPG_ERR_UNKNOWN_CURVE instead of invalid value.  Init curve model.
* cipher/ecc.c (ecc_verify, ecc_encrypt_raw): Ditto.
* cipher/pubkey.c (sexp_data_to_mpi): Fix EDDSA flag error checking.
--

(fixes commit c26be7a337d0bf98193bc58e043209e46d0769bb)

Werner Koch [Fri, 6 Sep 2013 18:07:07 +0000 (20:07 +0200)]
mpi: Add gcry_mpi_ec_curve_point.

* mpi/ec.c (_gcry_mpi_ec_curve_point): New.
(ec_powm): Return the absolute value.
* src/visibility.c, src/visibility.h: Add wrappers.
* src/libgcrypt.def, src/libgcrypt.vers: Export them.

Werner Koch [Fri, 6 Sep 2013 17:58:50 +0000 (19:58 +0200)]
mpi: Add functions to manipulate the sign.

* src/gcrypt.h.in (gcry_mpi_is_neg): New.
(gcry_mpi_neg, gcry_mpi_abs): New.
* mpi/mpiutil.c (_gcry_mpi_is_neg): New.
(_gcry_mpi_neg, _gcry_mpi_abs): New.
* src/visibility.c, src/visibility.h: Add wrappers.
* src/libgcrypt.def, src/libgcrypt.vers: Export them.
* src/mpi.h (mpi_is_neg): New.  Rename old macro to mpi_has_sign.
* mpi/mpi-mod.c (_gcry_mpi_mod_barrett): Use mpi_has_sign.
* mpi/mpi-mpow.c (calc_barrett): Ditto.
* cipher/primegen.c (_gcry_derive_x931_prime): Ditto
* cipher/rsa.c (secret): Ditto.

Jussi Kivilinna [Fri, 6 Sep 2013 08:11:37 +0000 (11:11 +0300)]
Tune armv6 mpi assembly

* mpi/armv6/mpih-mul1.S: Tune assembly for Cortex-A8.
* mpi/armv6/mpih-mul2.S: Ditto.
* mpi/armv6/mpih-mul3.S: Ditto.
--

A little bit of tuning of the assembly functions with the help of a
Cortex-A8 profiler.

Old (armhf/Cortex-A8 1 GHz):
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         350ms    2230ms        50ms
RSA 2048 bit        3500ms   11890ms       150ms
RSA 3072 bit       23900ms   32540ms       280ms
RSA 4096 bit       15750ms   69420ms       450ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -     990ms       930ms
DSA 2048/224             -    3840ms      3400ms
DSA 3072/256             -    8280ms      7620ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         60ms    1760ms      3300ms
ECDSA 224 bit         80ms    2240ms      4300ms
ECDSA 256 bit        110ms    2740ms      5420ms
ECDSA 384 bit        230ms    5680ms     11300ms
ECDSA 521 bit        540ms   13590ms     26890ms

New:
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         350ms    2190ms        60ms
RSA 2048 bit        8910ms   11800ms       150ms
RSA 3072 bit       11000ms   31810ms       270ms
RSA 4096 bit       50290ms   68690ms       450ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -     980ms       920ms
DSA 2048/224             -    3780ms      3370ms
DSA 3072/256             -    8100ms      7060ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         70ms    1730ms      3200ms
ECDSA 224 bit         90ms    2180ms      4220ms
ECDSA 256 bit        110ms    2660ms      5200ms
ECDSA 384 bit        220ms    5660ms     10910ms
ECDSA 521 bit        530ms   13420ms     26000ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 5 Sep 2013 06:34:25 +0000 (09:34 +0300)]
Change _gcry_burn_stack to take burn depth as unsigned integer

* src/misc.c (_gcry_burn_stack): Change to handle 'unsigned int' bytes.
--

An unsigned integer is better here for code generation because we can now
avoid the possible branching caused by the (bytes <= 0) check.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 5 Sep 2013 06:46:29 +0000 (09:46 +0300)]
mpicalc: fix building on linux and win32

* src/Makefile.am (mpicalc): Adjust CFLAGS and LDADD.
--

Building libgcrypt is now failing on an Ubuntu 13.04 machine. This patch
changes src/Makefile.am for 'mpicalc' to correct the issue.

$ make distclean; ./configure --enable-maintainer-mode; make
...
libtool: link: gcc -g -O2 -fvisibility=hidden -Wall -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast -Wwrite-strings -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -o .libs/mpicalc mpicalc-mpicalc.o  ../src/.libs/libgcrypt.so
/usr/bin/ld: mpicalc-mpicalc.o: undefined reference to symbol 'gpg_strerror'
/usr/bin/ld: note: 'gpg_strerror' is defined in DSO /lib/x86_64-linux-gnu/libgpg-error.so.0 so try adding it to the linker command line
/lib/x86_64-linux-gnu/libgpg-error.so.0: could not read symbols: Invalid operation

With the win32 target, gpg-error.h is not found.

$ make distclean; ./autogen.sh --build-w32; make
...
i686-w64-mingw32-gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -Wall -Wcast-align -Wshadow -Wstrict-prototypes -Wformat -Wno-format-y2k -Wformat-security -W -Wextra -Wbad-function-cast -Wwrite-strings -Wdeclaration-after-statement -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -MT mpicalc-mpicalc.o -MD -MP -MF .deps/mpicalc-mpicalc.Tpo -c -o mpicalc-mpicalc.o `test -f 'mpicalc.c' || echo './'`mpicalc.c
In file included from mpicalc.c:36:0:
gcrypt.h:32:23: fatal error: gpg-error.h: No such file or directory

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Wed, 4 Sep 2013 15:51:30 +0000 (17:51 +0200)]
Change mpicalc to use Libgcrypt and install it.

* src/mpicalc.c: Make use of gcry_ functions.
(MPICALC_VERSION): New.  Set to 2.0.
(strusage): Remove.
(scan_mpi): New.  Replaces mpi_fromstr.
(print_mpi): New.  Replaces mpi_print.
(my_getc): New.
(print_help): New.
(main): Use simple option parser and print version info.
* src/Makefile.am (bin_PROGRAMS): Add mpicalc.
(mpicalc_SOURCES, mpicalc_CFLAGS, mpicalc_LDADD): New.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Wed, 4 Sep 2013 14:17:11 +0000 (16:17 +0200)]
Re-indent mpicalc.c and change license.

--

Changed license to LGPLv2.1+.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Wed, 4 Sep 2013 13:37:01 +0000 (15:37 +0200)]
Add mpicalc.c to help with testing.

* src/mpicalc.c: Taken from GnuPG 1.4.
--

Taken from GnuPG commit 45efde9557661ea071a01bcb938f1591ed4ec1a3

Werner Koch [Wed, 4 Sep 2013 09:20:57 +0000 (11:20 +0200)]
Prepare support for EdDSA.

* src/cipher.h (PUBKEY_FLAG_EDDSA): New.
* cipher/pubkey.c (pubkey_verify): Replace args CMP and OPAQUEV by
CTX.  Pass flags and hash algo to the verify function.  Change all
verify functions to accept these args.
(sexp_data_to_mpi): Implement new flag "eddsa".
(gcry_pk_verify): Pass CTX instead of the compare function to
pubkey_verify.
* cipher/ecc.c (sign): Rename to sign_ecdsa.  Change all callers.
(verify): Rename to verify_ecdsa.  Change all callers.
(sign_eddsa, verify_eddsa): New stub functions.
(ecc_sign): Divert to sign_ecdsa or sign_eddsa.
(ecc_verify): Divert to verify_ecdsa or verify_eddsa.

Werner Koch [Tue, 3 Sep 2013 10:01:15 +0000 (12:01 +0200)]
Prepare support for non-Weierstrass EC equations.

* src/mpi.h (gcry_mpi_ec_models): New.
* src/ec-context.h (mpi_ec_ctx_s): Add MODEL.
* cipher/ecc-common.h (elliptic_curve_t): Ditto.
* cipher/ecc-curves.c (ecc_domain_parms_t): Ditto.
(domain_parms): Mark all as Weierstrass.
(_gcry_ecc_fill_in_curve): Check model.
(_gcry_ecc_get_curve): Set model to Weierstrass.
* cipher/ecc-misc.c (_gcry_ecc_model2str): New.
* cipher/ecc.c (generate_key, ecc_generate_ext): Print model in the
debug output.

* mpi/ec.c (_gcry_mpi_ec_dup_point): Switch depending on model.
Factor code out to ...
(dup_point_weierstrass): new.
(dup_point_montgomery, dup_point_twistededwards): New stub functions.
(_gcry_mpi_ec_add_points): Switch depending on model.  Factor code out
to ...
(add_points_weierstrass): new.
(add_points_montgomery, add_points_twistededwards): New stub
functions.

* tests/Makefile.am (TESTS): Reorder tests.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Fri, 30 Aug 2013 15:56:35 +0000 (17:56 +0200)]
mpi: Suppress newer gcc warnings.

* src/g10lib.h (GCC_ATTR_UNUSED): Define for gcc >= 3.5.
* mpi/mpih-div.c (_gcry_mpih_mod_1, _gcry_mpih_divmod_1): Mark dummy
as unused.
* mpi/mpi-internal.h (UDIV_QRNND_PREINV): Mark _ql as unused.
--

Due to the use of macros and longlong.h, we use variables which are
only used by some architectures.  At least gcc 4.7.2 prints new
warnings about set-but-not-used variables.  This patch silences them.
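
A version-guarded attribute macro of the kind described here might look
like the following sketch.  The macro name SKETCH_ATTR_UNUSED and the
example function are hypothetical; the version threshold follows the
commit text (gcc >= 3.5), and the real definition in src/g10lib.h may
differ.

```c
/* Expand to the "unused" attribute only on sufficiently new gcc
   versions; expand to nothing elsewhere so other compilers still
   build the code.  */
#if defined (__GNUC__) \
    && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 5))
# define SKETCH_ATTR_UNUSED __attribute__ ((unused))
#else
# define SKETCH_ATTR_UNUSED
#endif

/* Return N / D.  The remainder is computed as well, as a macro shared
   between architectures might do, but it is never read; the attribute
   keeps gcc from warning about a variable that is set but not used.  */
static unsigned int
div_quotient (unsigned int n, unsigned int d)
{
  unsigned int q;
  unsigned int r SKETCH_ATTR_UNUSED;

  q = n / d;
  r = n % d;
  return q;
}
```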

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Fri, 30 Aug 2013 15:52:17 +0000 (17:52 +0200)]
Do not check with cpp for typedefed constants.

* src/gcrypt-int.h: Include error code replacements depending on the
version of libgpg-error.

Signed-off-by: Werner Koch <wk@gnupg.org>
Jussi Kivilinna [Wed, 4 Sep 2013 07:00:45 +0000 (10:00 +0300)]
Make _gcry_burn_stack use variable length array

* configure.ac (HAVE_VLA): Add check.
* src/misc.c (_gcry_burn_stack) [HAVE_VLA]: Add VLA code.
--

Some gcc versions convert _gcry_burn_stack into a loop that overwrites the
same 64-byte stack buffer instead of burning the stack deeper. It is argued
in the GCC Bugzilla that _gcry_burn_stack is doing the wrong thing here [1]
and that this kind of optimization is allowed.

So let's fix _gcry_burn_stack by using a variable length array when VLAs are
supported by the compiler. This should ensure proper stack burning to the
requested depth and avoid the GCC loop optimization.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285
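
The VLA approach can be sketched roughly as below.  This is an
illustration, not the code added to src/misc.c: the function name, the
rounding to 64 bytes, the 64 KiB cap and the returned length are all
assumptions made so that the sketch is safe and easy to check.  It
requires a compiler with C99 VLA support.

```c
#include <stddef.h>

/* Overwrite roughly BYTES bytes of stack below the caller with zeros
   by allocating one variable length array of that size and clearing
   it.  Returns the number of bytes actually cleared.  */
static size_t
burn_stack_sketch (unsigned int bytes)
{
  /* Cap the depth so the sketch cannot overflow the real stack.  */
  unsigned int capped = bytes > 65536u ? 65536u : bytes;
  /* Round up to a multiple of 64.  A request of zero is treated as one
     byte so that the VLA never has length zero.  */
  size_t buflen = (((size_t) capped + !capped) + 63) & ~(size_t) 63;
  /* The volatile qualifier keeps the compiler from dropping the
     stores as dead code.  */
  volatile unsigned char buf[buflen];
  size_t i;

  for (i = 0; i < buflen; i++)
    buf[i] = 0;
  return buflen;
}
```

Because the whole depth is one array, there is no recursion or fixed
64-byte buffer for the compiler to collapse into a loop over the same
memory.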

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 4 Sep 2013 07:00:45 +0000 (10:00 +0300)]
Move stack burning from block ciphers to cipher modes

* src/gcrypt-module.h (gcry_cipher_encrypt_t)
(gcry_cipher_decrypt_t): Return 'unsigned int'.
* cipher/cipher.c (dummy_encrypt_block, dummy_decrypt_block): Return
zero.
(do_ecb_encrypt, do_ecb_decrypt): Get largest stack burn depth from
block cipher crypt function and burn stack at end.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt)
(_gcry_cipher_ofb_decrypt): Ditto.
* cipher/blowfish.c (encrypt_block, decrypt_block): Return burn stack
depth.
* cipher/camellia-glue.c (camellia_encrypt, camellia_decrypt): Ditto.
* cipher/cast5.c (encrypt_block, decrypt_block): Ditto.
* cipher/des.c (do_tripledes_encrypt, do_tripledes_decrypt)
(do_des_encrypt, do_des_decrypt): Ditto.
* cipher/idea.c (idea_encrypt, idea_decrypt): Ditto.
* cipher/rijndael.c (rijndael_encrypt, rijndael_decrypt): Ditto.
* cipher/seed.c (seed_encrypt, seed_decrypt): Ditto.
* cipher/serpent.c (serpent_encrypt, serpent_decrypt): Ditto.
* cipher/twofish.c (twofish_encrypt, twofish_decrypt): Ditto.
* cipher/rfc2268.c (encrypt_block, decrypt_block): New.
(_gcry_cipher_spec_rfc2268_40): Use encrypt_block and decrypt_block.
--

This patch moves stack burning from the block ciphers and the cipher mode
loops to the end of the cipher mode functions. This greatly reduces the
overall CPU usage of the problematic _gcry_burn_stack. The internal cipher
module API is changed so that the encrypt/decrypt functions now return the
stack burn depth as an unsigned int to the cipher mode function.

(Note: the patch also adds the missing burn_stack for the RFC2268_40 cipher.)
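
The changed calling convention can be sketched as follows.  All names
here are hypothetical stand-ins: the toy block function, the 8-byte
block size and the burn stub are for illustration and are not the
library's internal API.

```c
#include <stddef.h>

#define TOY_BLOCK_SIZE 8

/* A block function returns the stack depth it would like burned.  */
typedef unsigned int (*block_fn_t) (void *ctx, unsigned char *out,
                                    const unsigned char *in);

static unsigned int burn_calls;       /* How often the stub was called.  */
static unsigned int last_burn_depth;  /* Depth passed on the last call.  */

/* Stand-in for the real stack burning function.  */
static void
burn_stack_stub (unsigned int bytes)
{
  burn_calls++;
  last_burn_depth = bytes;
}

/* Toy "cipher": XOR each byte with 0xAA and report 32 bytes of stack.  */
static unsigned int
toy_encrypt_block (void *ctx, unsigned char *out, const unsigned char *in)
{
  size_t i;

  (void) ctx;
  for (i = 0; i < TOY_BLOCK_SIZE; i++)
    out[i] = in[i] ^ 0xAA;
  return 32;
}

/* ECB-style mode loop: remember the largest burn depth reported by the
   block function and burn the stack once, after the last block.  */
static void
ecb_encrypt_sketch (block_fn_t fn, void *ctx, unsigned char *out,
                    const unsigned char *in, size_t nblocks)
{
  unsigned int burn = 0, n;
  size_t i;

  for (i = 0; i < nblocks; i++)
    {
      n = fn (ctx, out + i * TOY_BLOCK_SIZE, in + i * TOY_BLOCK_SIZE);
      if (n > burn)
        burn = n;
    }
  if (burn > 0)
    burn_stack_stub (burn);
}
```

With four blocks, the old scheme would burn the stack four times; in
this sketch burn_stack_stub runs once with depth 32.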

_gcry_burn_stack CPU time (looping tests/benchmark cipher blowfish):

arch     CPU             Old    New
i386     Intel-Haswell   4.1%   0.16%
x86_64   Intel-Haswell   3.4%   0.07%
armhf    Cortex-A8       8.7%   0.14%

New vs. old (armhf/Cortex-A8):
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA          1.05x   1.05x   1.04x   1.04x   1.04x   1.04x   1.07x   1.05x   1.04x   1.04x
3DES          1.04x   1.03x   1.04x   1.03x   1.04x   1.04x   1.04x   1.04x   1.04x   1.04x
CAST5         1.19x   1.20x   1.15x   1.00x   1.17x   1.00x   1.15x   1.05x   1.00x   1.00x
BLOWFISH      1.21x   1.22x   1.16x   1.00x   1.18x   1.00x   1.16x   1.16x   1.00x   1.00x
AES           1.09x   1.09x   1.00x   1.00x   1.00x   1.00x   1.07x   1.07x   1.00x   1.00x
AES192        1.11x   1.11x   1.00x   1.00x   1.00x   1.00x   1.08x   1.09x   1.01x   1.00x
AES256        1.07x   1.08x   1.01x   .99x    1.00x   1.00x   1.07x   1.06x   1.00x   1.00x
TWOFISH       1.10x   1.09x   1.09x   1.00x   1.09x   1.00x   1.08x   1.09x   1.00x   1.00x
ARCFOUR       1.00x   1.00x
DES           1.07x   1.11x   1.06x   1.08x   1.07x   1.07x   1.06x   1.06x   1.06x   1.06x
TWOFISH128    1.10x   1.10x   1.09x   1.00x   1.09x   1.00x   1.08x   1.08x   1.00x   1.00x
SERPENT128    1.06x   1.07x   1.02x   1.00x   1.06x   1.00x   1.06x   1.05x   1.00x   1.00x
SERPENT192    1.07x   1.06x   1.03x   1.00x   1.06x   1.00x   1.06x   1.05x   1.00x   1.00x
SERPENT256    1.06x   1.07x   1.02x   1.00x   1.06x   1.00x   1.05x   1.06x   1.00x   1.00x
RFC2268_40    0.97x   1.01x   0.99x   0.98x   1.00x   0.97x   0.96x   0.96x   0.97x   0.97x
SEED          1.45x   1.54x   1.53x   1.56x   1.50x   1.51x   1.50x   1.50x   1.42x   1.42x
CAMELLIA128   1.08x   1.07x   1.06x   1.00x   1.07x   1.00x   1.06x   1.06x   1.00x   1.00x
CAMELLIA192   1.08x   1.08x   1.08x   1.00x   1.07x   1.00x   1.07x   1.07x   1.00x   1.00x
CAMELLIA256   1.08x   1.09x   1.07x   1.01x   1.08x   1.00x   1.07x   1.07x   1.00x   1.00x
SALSA20       0.99x   1.00x

Raw data:

New (armhf/Cortex-A8):
Running each test 100 times.
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA          8620ms  8680ms  9640ms 10010ms  9140ms  8960ms  9630ms  9660ms  9180ms  9180ms
3DES         13990ms 14000ms 14780ms 15300ms 14320ms 14370ms 14780ms 14780ms 14480ms 14480ms
CAST5         2980ms  2980ms  3780ms  2300ms  3290ms  2320ms  3770ms  4100ms  2320ms  2320ms
BLOWFISH      2740ms  2660ms  3530ms  2060ms  3050ms  2080ms  3530ms  3530ms  2070ms  2070ms
AES           2200ms  2330ms  2330ms  2450ms  2270ms  2270ms  2700ms  2690ms  2330ms  2320ms
AES192        2550ms  2670ms  2700ms  2910ms  2630ms  2640ms  3060ms  3060ms  2680ms  2690ms
AES256        2920ms  3010ms  3040ms  3190ms  3010ms  3000ms  3380ms  3420ms  3050ms  3050ms
TWOFISH       2790ms  2840ms  3300ms  2950ms  3010ms  2870ms  3310ms  3280ms  2940ms  2940ms
ARCFOUR       2050ms  2050ms
DES           5640ms  5630ms  6440ms  6970ms  5960ms  6000ms  6440ms  6440ms  6120ms  6120ms
TWOFISH128    2790ms  2840ms  3300ms  2950ms  3010ms  2890ms  3310ms  3290ms  2930ms  2930ms
SERPENT128    4530ms  4340ms  5210ms  4470ms  4740ms  4620ms  5020ms  5030ms  4680ms  4680ms
SERPENT192    4510ms  4340ms  5190ms  4460ms  4750ms  4620ms  5020ms  5030ms  4680ms  4680ms
SERPENT256    4540ms  4330ms  5220ms  4460ms  4730ms  4600ms  5030ms  5020ms  4680ms  4680ms
RFC2268_40   10530ms  7790ms 11140ms  9490ms 10650ms 10710ms 11710ms 11690ms 11000ms 11000ms
SEED          4530ms  4540ms  5050ms  5380ms  4760ms  4810ms  5060ms  5060ms  4850ms  4860ms
CAMELLIA128   2660ms  2630ms  3170ms  2750ms  2880ms  2740ms  3170ms  3170ms  2780ms  2780ms
CAMELLIA192   3430ms  3400ms  3930ms  3530ms  3650ms  3500ms  3940ms  3940ms  3570ms  3560ms
CAMELLIA256   3430ms  3390ms  3940ms  3500ms  3650ms  3510ms  3930ms  3940ms  3550ms  3550ms
SALSA20       1910ms  1900ms

Old (armhf/Cortex-A8):
Running each test 100 times.
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
IDEA          9030ms  9100ms 10050ms 10410ms  9540ms  9360ms 10350ms 10190ms  9560ms  9570ms
3DES         14580ms 14460ms 15300ms 15720ms 14880ms 14900ms 15350ms 15330ms 15030ms 15020ms
CAST5         3560ms  3570ms  4350ms  2300ms  3860ms  2330ms  4340ms  4320ms  2330ms  2320ms
BLOWFISH      3320ms  3250ms  4110ms  2060ms  3610ms  2080ms  4100ms  4090ms  2070ms  2070ms
AES           2390ms  2530ms  2320ms  2460ms  2280ms  2270ms  2890ms  2880ms  2330ms  2330ms
AES192        2830ms  2970ms  2690ms  2900ms  2630ms  2650ms  3320ms  3330ms  2700ms  2690ms
AES256        3110ms  3250ms  3060ms  3170ms  3000ms  3000ms  3610ms  3610ms  3050ms  3060ms
TWOFISH       3080ms  3100ms  3600ms  2940ms  3290ms  2880ms  3560ms  3570ms  2940ms  2930ms
ARCFOUR       2060ms  2050ms
DES           6060ms  6230ms  6850ms  7540ms  6380ms  6400ms  6830ms  6840ms  6500ms  6510ms
TWOFISH128    3060ms  3110ms  3600ms  2940ms  3290ms  2890ms  3560ms  3560ms  2940ms  2930ms
SERPENT128    4820ms  4630ms  5330ms  4460ms  5030ms  4620ms  5300ms  5300ms  4680ms  4680ms
SERPENT192    4830ms  4620ms  5320ms  4460ms  5040ms  4620ms  5300ms  5300ms  4680ms  4680ms
SERPENT256    4820ms  4640ms  5330ms  4460ms  5030ms  4620ms  5300ms  5300ms  4680ms  4660ms
RFC2268_40   10260ms  7850ms 11080ms  9270ms 10620ms 10380ms 11250ms 11230ms 10690ms 10710ms
SEED          6580ms  6990ms  7710ms  8370ms  7140ms  7240ms  7600ms  7610ms  6870ms  6900ms
CAMELLIA128   2860ms  2820ms  3360ms  2750ms  3080ms  2740ms  3350ms  3360ms  2790ms  2790ms
CAMELLIA192   3710ms  3680ms  4240ms  3520ms  3910ms  3510ms  4200ms  4210ms  3560ms  3560ms
CAMELLIA256   3700ms  3680ms  4230ms  3520ms  3930ms  3510ms  4200ms  4210ms  3550ms  3560ms
SALSA20       1900ms  1900ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 1 Sep 2013 13:50:55 +0000 (16:50 +0300)]
camellia-aesni-avx2-amd64: Move register clearing to assembly functions

* cipher/camellia-aesni-avx2-amd64.S
(_gcry_camellia_aesni_avx2_ctr_enc): Add 'vzeroall'.
(_gcry_camellia_aesni_avx2_cbc_dec)
(_gcry_camellia_aesni_avx2_cfb_dec): Add 'vzeroupper' at head and
'vzeroall' at tail.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec) [USE_AESNI_AVX2]: Remove register
clearing.
--

Patch moves register clearing with 'vzeroall' to assembly functions and
adds missing 'vzeroupper' instructions at head of assembly functions.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 1 Sep 2013 13:50:55 +0000 (16:50 +0300)]
camellia-aesni-avx-amd64: Move register clearing to assembly functions

* cipher/camellia-aesni-avx-amd64.S (_gcry_camellia_aesni_avx_ctr_enc)
(_gcry_camellia_aesni_avx_cbc_dec)
(_gcry_camellia_aesni_avx_cfb_dec): Add 'vzeroupper' at head and
'vzeroall' at tail.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc, _gcry_camellia_cbc_dec)
(_gcry_camellia_cfb_dec) [USE_AESNI_AVX]: Remove register clearing.
--

Patch moves register clearing with 'vzeroall' to assembly functions and
adds missing 'vzeroupper' instructions at head of assembly functions.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 1 Sep 2013 13:50:55 +0000 (16:50 +0300)]
serpent-avx2-amd64: Move register clearing to assembly

* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_ctr_enc)
(_gcry_serpent_avx2_cbc_dec, _gcry_serpent_avx2_cfb_dec): Change last
'vzeroupper' to 'vzeroall'.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_cfb_dec) [USE_AVX2]: Remove register clearing with
'vzeroall'.
--

The AVX2 implementation was already clearing the upper halves of the YMM
registers at the end of the assembly functions to prevent the long SSE<->AVX
transition stalls present on Intel CPUs. This patch changes these 'vzeroupper'
instructions to 'vzeroall' to fully clear the YMM registers. After this
change, register clearing in serpent.c is not needed.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 1 Sep 2013 13:46:32 +0000 (16:46 +0300)]
Fix building for x32 target

* mpi/amd64/mpi-asm-defs.h: New file.
* random/rndhw.c (poll_padlock) [__x86_64__]: Also check if __LP64__ is
defined.
[USE_DRNG, __x86_64__]: Also check if __LP64__ is defined.
--

In short, x32 is a new x86-64 ABI with 32-bit pointers. Adding support is
straightforward: a small fix for mpi and fixes for random/rndhw.c. The AMD64
assembly functions appear to work fine with x32 and 'make check' passes.
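
The kind of check involved can be sketched like this.  The macro and
function names are hypothetical; the sketch only shows why testing
__x86_64__ alone is not enough once x32 exists, since x32 defines
__x86_64__ but not __LP64__.

```c
/* On x32, __x86_64__ is defined but pointers are 32 bits wide, so code
   that assumes 64-bit pointers must also test __LP64__.  Targets with
   another data model (for example 64-bit Windows) also end up in the
   second branch.  */
#if defined (__x86_64__) && defined (__LP64__)
# define SKETCH_AMD64_LP64 1  /* Classic AMD64 ABI with 64-bit pointers.  */
#else
# define SKETCH_AMD64_LP64 0  /* x32, i386, or anything else.  */
#endif

/* Runtime view of the same property, for comparison.  */
static int
pointers_are_64bit (void)
{
  return sizeof (void *) == 8;
}
```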

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 31 Aug 2013 09:48:31 +0000 (12:48 +0300)]
sha512: add ARM/NEON assembly version of transform function

* cipher/Makefile.am: Add 'sha512-armv7-neon.S'.
* cipher/sha512-armv7-neon.S: New file.
* cipher/sha512.c (USE_ARM_NEON_ASM): New macro.
(SHA512_CONTEXT) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(sha512_init, sha384_init) [USE_ARM_NEON_ASM]: Enable 'use_neon' if
CPU supports NEON instructions.
(k): Round constant array moved outside of 'transform' function.
(__transform): Renamed from 'transform' function.
[USE_ARM_NEON_ASM] (_gcry_sha512_transform_armv7_neon): New prototype.
(transform): New wrapper function for different transform versions.
(sha512_write, sha512_final): Burn stack by the amount returned by
transform function.
* configure.ac (sha512) [neonsupport]: Add 'sha512-armv7-neon.lo'.
--

Add NEON assembly for the transform function for faster SHA512 on ARM. Major
speedup thanks to 64-bit integer registers and a large register file that can
hold the full input buffer.

Benchmark results on Cortex-A8, 1 GHz:

Old:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512       17050ms 18780ms 29120ms 18040ms 17190ms
SHA384       17130ms 18720ms 29160ms 18090ms 17280ms

New:
$ tests/benchmark --hash-repetitions 100 md sha512 sha384
SHA512        3600ms  5070ms 15330ms  4510ms  3480ms
SHA384        3590ms  5060ms 15350ms  4510ms  3520ms

New vs old:
SHA512        4.74x   3.70x   1.90x   4.00x   4.94x
SHA384        4.77x   3.70x   1.90x   4.01x   4.91x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 31 Aug 2013 09:48:30 +0000 (12:48 +0300)]
sha512: reduce stack use in transform function by 512 bytes

* cipher/sha512.c (transform): Change 'u64 w[80]' to 'u64 w[16]' and
inline input expansion to first 64 rounds.
(sha512_write, sha512_final): Reduce burn_stack depth by 512 bytes.
--

The input expansion to the w[] array can be inlined with the rounds and the
size of the array reduced from u64[80] to u64[16]. On Cortex-A8, this change
gives a small boost, possibly thanks to the reduced burn_stack depth.

New vs old (tests/benchmark md sha512 sha384):
SHA512 1.09x 1.11x 1.06x 1.09x 1.08x
SHA384 1.09x 1.11x 1.06x 1.09x 1.09x
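
The idea of shrinking the schedule array can be sketched with a rolling
16-word window.  This is a generic illustration of the technique using
the standard SHA-512 message schedule recurrence, not the code from
cipher/sha512.c; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (64 - (n))))
#define S0(x) (ROTR ((x), 1) ^ ROTR ((x), 8) ^ ((x) >> 7))
#define S1(x) (ROTR ((x), 19) ^ ROTR ((x), 61) ^ ((x) >> 6))

/* Reference version: expand the 16 input words into all 80 words.  */
static void
schedule_full (const uint64_t in[16], uint64_t w[80])
{
  int t;

  for (t = 0; t < 16; t++)
    w[t] = in[t];
  for (t = 16; t < 80; t++)
    w[t] = S1 (w[t - 2]) + w[t - 7] + S0 (w[t - 15]) + w[t - 16];
}

/* Rolling version: W holds only the last 16 words.  Slot (T & 15)
   still contains W[T-16], so adding the other three terms to it
   produces W[T] in place.  Must be called with T = 16, 17, ..., 79 in
   order; returns W[T].  */
static uint64_t
schedule_next (uint64_t w[16], int t)
{
  w[t & 15] += S1 (w[(t - 2) & 15]) + w[(t - 7) & 15]
               + S0 (w[(t - 15) & 15]);
  return w[t & 15];
}
```

The rolling array needs 128 bytes instead of 640, which matches the
512-byte reduction named in the commit title.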

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 31 Aug 2013 09:48:30 +0000 (12:48 +0300)]
Add ARM HW feature detection module and add NEON detection

* configure.ac: Add option --disable-neon-support.
(HAVE_GCC_INLINE_ASM_NEON): New.
(ENABLE_NEON_SUPPORT): New.
[arm]: Add 'hwf-arm.lo' as HW feature module.
* src/Makefile.am: Add 'hwf-arm.c'.
* src/g10lib.h (HWF_ARM_NEON): New macro.
* src/global.c (hwflist): Add HWF_ARM_NEON entry.
* src/hwf-arm.c: New file.
* src/hwf-common.h (_gcry_hwf_detect_arm): New prototype.
* src/hwfeatures.c (_gcry_detect_hw_features) [HAVE_CPU_ARCH_ARM]: Add
call to _gcry_hwf_detect_arm.
--

Add a HW detection module for detecting the ARM NEON instruction set. ARM
does not have a cpuid instruction, so we have to rely on the OS to pass
feature set information to user-space. For Linux, NEON support can be
detected by parsing '/proc/self/auxv' for hardware capabilities information.
For other OSes, NEON can be detected by checking whether the
platform/compiler only supports NEON capable CPUs (by checking if the
__ARM_NEON__ macro is defined).
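
The detection steps above can be sketched as follows.  This is not the
code in src/hwf-arm.c: the function name and the SKETCH_ constants are
introduced here, and the values assumed for the auxiliary vector type
(AT_HWCAP = 16) and the NEON capability bit (1 << 12) are those used by
32-bit ARM Linux.

```c
#include <stdio.h>

#define SKETCH_AT_NULL     0
#define SKETCH_AT_HWCAP    16
#define SKETCH_HWCAP_NEON  (1UL << 12)

/* Return 1 if NEON appears to be available, else 0.  */
static int
detect_neon_sketch (void)
{
#if defined (__arm__) && defined (__linux__)
  /* Each auxv entry is a pair of native words: type and value.  */
  struct { unsigned long type; unsigned long value; } entry;
  FILE *f = fopen ("/proc/self/auxv", "rb");
  int found = 0;

  if (!f)
    return 0;
  while (fread (&entry, sizeof entry, 1, f) == 1)
    {
      if (entry.type == SKETCH_AT_NULL)
        break;  /* End of the vector.  */
      if (entry.type == SKETCH_AT_HWCAP)
        {
          found = !!(entry.value & SKETCH_HWCAP_NEON);
          break;
        }
    }
  fclose (f);
  return found;
#elif defined (__ARM_NEON__)
  /* The compiler only targets NEON capable CPUs.  */
  return 1;
#else
  return 0;
#endif
}
```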

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sat, 31 Aug 2013 09:48:30 +0000 (12:48 +0300)]
Correct mpi_cpu_arch for ARMv6

* mpi/config.links [armv6]: Set mpi_cpu_arch to "arm", instead of
"armv6".
--

Without this change, HAVE_CPU_ARCH_ARM stays undefined.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Fri, 30 Aug 2013 15:04:21 +0000 (17:04 +0200)]
mpi: Make gcry_mpi_print work with negative zeroes.

* mpi/mpicoder.c (gcry_mpi_print): Take care of negative zero.
(gcry_mpi_aprint): Allocate at least 1 byte.
* tests/t-convert.c: New.
* tests/Makefile.am (TESTS): Add t-convert.
--

Reported-by: Christian Fuchs
Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Thu, 29 Aug 2013 19:37:30 +0000 (21:37 +0200)]
Refactor the ECC code into 3 files.

* cipher/ecc-common.h, cipher/ecc-curves.c, cipher/ecc-misc.c: New.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add new files.
* configure.ac (GCRYPT_PUBKEY_CIPHERS): Add new .c files.
* cipher/ecc.c (curve_aliases, ecc_domain_parms_t, domain_parms)
(scanval): Move to ecc-curves.c.
(fill_in_curve): Move to ecc-curves.c as _gcry_ecc_fill_in_curve.
(ecc_get_curve): Move to ecc-curves.c as _gcry_ecc_get_curve.
(_gcry_mpi_ec_ec2os): Move to ecc-misc.c.
(ec2os): Move to ecc-misc.c as _gcry_ecc_ec2os.
(os2ec): Move to ecc-misc.c as _gcry_ecc_os2ec.
(point_set): Move as inline function to ecc-common.h.
(_gcry_ecc_curve_free): Move to ecc-misc.c as _gcry_ecc_curve_free.
(_gcry_ecc_curve_copy): Move to ecc-misc.c as _gcry_ecc_curve_copy.
(mpi_from_keyparam, point_from_keyparam): Move to ecc-curves.c.
(_gcry_mpi_ec_new): Move to ecc-curves.c.
(ecc_get_param): Move to ecc-curves.c as _gcry_ecc_get_param.
(ecc_get_param_sexp): Move to ecc-curves.c as _gcry_ecc_get_param_sexp.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoserpent-sse2-amd64: Move register clearing to assembly functions
Jussi Kivilinna [Thu, 22 Aug 2013 12:26:52 +0000 (15:26 +0300)]
serpent-sse2-amd64: Move register clearing to assembly functions

* cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_ctr_enc)
(_gcry_serpent_sse2_cbc_dec, _gcry_serpent_sse2_cfb_dec): Clear used
XMM registers.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_cfb_dec) [USE_SSE2]: Remove XMM register clearing from
bulk functions.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agotwofish-amd64: do not make __twofish_dec_blk3 global
Jussi Kivilinna [Thu, 22 Aug 2013 12:26:52 +0000 (15:26 +0300)]
twofish-amd64: do not make __twofish_dec_blk3 global

* cipher/twofish-amd64.S (__twofish_dec_blk3): Do not export symbol as
global.
(__twofish_dec_blk3): Mark symbol as function.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agompi: add ARMv6 assembly
Jussi Kivilinna [Sat, 17 Aug 2013 10:41:03 +0000 (13:41 +0300)]
mpi: add ARMv6 assembly

* mpi/armv6/mpi-asm-defs.h: New.
* mpi/armv6/mpih-add1.S: New.
* mpi/armv6/mpih-mul1.S: New.
* mpi/armv6/mpih-mul2.S: New.
* mpi/armv6/mpih-mul3.S: New.
* mpi/armv6/mpih-sub1.S: New.
* mpi/config.links [arm]: Enable ARMv6 assembly.
--

Add mpi assembly for ARMv6 (or later). These are partly based on ARM assembly
found in GMP 4.2.1.

Old vs new (Cortex-A8, 1 GHz):

Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit        1.14x     1.10x       1.13x
ECDSA 224 bit        1.11x     1.12x       1.12x
ECDSA 256 bit        1.20x     1.13x       1.14x
ECDSA 384 bit        1.13x     1.21x       1.21x
ECDSA 521 bit        1.17x     1.20x       1.22x
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit             -     1.31x       1.60x
RSA 2048 bit             -     1.41x       1.47x
RSA 3072 bit             -     1.50x       1.63x
RSA 4096 bit             -     1.50x       1.57x
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -     1.39x       1.38x
DSA 2048/224             -     1.50x       1.51x
DSA 3072/256             -     1.59x       1.64x

NEW:

Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         70ms    1750ms      3170ms
ECDSA 224 bit         90ms    2210ms      4250ms
ECDSA 256 bit        100ms    2710ms      5170ms
ECDSA 384 bit        230ms    5670ms     11040ms
ECDSA 521 bit        540ms   13370ms     25870ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         360ms    2200ms        50ms
RSA 2048 bit        2770ms   11900ms       150ms
RSA 3072 bit        6680ms   32530ms       270ms
RSA 4096 bit       10320ms   69440ms       460ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -     990ms       910ms
DSA 2048/224             -    3830ms      3410ms
DSA 3072/256             -    8270ms      7030ms

OLD:

Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         80ms    1920ms      3580ms
ECDSA 224 bit        100ms    2470ms      4760ms
ECDSA 256 bit        120ms    3050ms      5870ms
ECDSA 384 bit        260ms    6840ms     13330ms
ECDSA 521 bit        630ms   16080ms     31500ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         450ms    2890ms        80ms
RSA 2048 bit        2320ms   16760ms       220ms
RSA 3072 bit       26300ms   48650ms       440ms
RSA 4096 bit       15700ms   103910ms      720ms
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -    1380ms      1260ms
DSA 2048/224             -    5740ms      5140ms
DSA 3072/256             -   13130ms     11510ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoMove ARMv6 detection to configure.ac
Jussi Kivilinna [Sat, 17 Aug 2013 10:41:46 +0000 (13:41 +0300)]
Move ARMv6 detection to configure.ac

* cipher/blowfish-armv6.S: Replace __ARM_ARCH >= 6 checks with
HAVE_ARM_ARCH_V6.
* cipher/blowfish.c: Ditto.
* cipher/camellia-armv6.S: Ditto.
* cipher/camellia.h: Ditto.
* cipher/cast5-armv6.S: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael-armv6.S: Ditto.
* cipher/rijndael.c: Ditto.
* configure.ac: Add HAVE_ARM_ARCH_V6 check.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAdd optimized wipememory for ARM
Jussi Kivilinna [Sat, 17 Aug 2013 07:09:33 +0000 (10:09 +0300)]
Add optimized wipememory for ARM

* src/g10lib.h [__arm__] (fast_wipememory2_unaligned_head)
(fast_wipememory2): New macros.
--

The previous patch, which removed the _gcry_burn_stack optimization, causes
burn_stack to take over 30% of CPU time when looping 'benchmark cipher
blowfish' on ARM/Cortex-A8. Optimizing wipememory2 for ARM helps the
situation a lot.

Old vs new (Cortex-A8):
                  ECB/Stream         CBC             CFB             OFB             CTR
               --------------- --------------- --------------- --------------- ---------------
IDEA            1.20x   1.18x   1.16x   1.15x   1.16x   1.18x   1.18x   1.16x   1.16x   1.17x
3DES            1.14x   1.14x   1.12x   1.13x   1.12x   1.13x   1.12x   1.13x   1.13x   1.15x
CAST5           1.66x   1.67x   1.43x   1.00x   1.48x   1.00x   1.44x   1.44x   1.04x   0.96x
BLOWFISH        1.56x   1.66x   1.47x   1.00x   1.54x   1.05x   1.44x   1.47x   1.00x   1.00x
AES             1.52x   1.42x   1.04x   1.00x   1.00x   1.00x   1.38x   1.37x   1.00x   1.00x
AES192          1.36x   1.36x   1.00x   1.00x   1.00x   1.04x   1.26x   1.22x   1.00x   1.04x
AES256          1.32x   1.31x   1.03x   1.00x   1.00x   1.00x   1.24x   1.30x   1.03x   0.97x
TWOFISH         1.31x   1.26x   1.23x   1.00x   1.25x   1.00x   1.24x   1.23x   1.00x   1.03x
ARCFOUR         1.05x   0.96x
DES             1.31x   1.33x   1.26x   1.29x   1.28x   1.29x   1.26x   1.29x   1.27x   1.29x
TWOFISH128      1.27x   1.24x   1.23x   1.00x   1.28x   1.00x   1.21x   1.26x   0.97x   1.06x
SERPENT128      1.19x   1.19x   1.15x   1.00x   1.14x   1.00x   1.17x   1.17x   0.98x   1.00x
SERPENT192      1.19x   1.24x   1.17x   1.00x   1.14x   1.00x   1.15x   1.17x   1.00x   1.00x
SERPENT256      1.16x   1.19x   1.17x   1.00x   1.14x   1.00x   1.15x   1.15x   1.00x   1.00x
RFC2268_40      1.00x   0.99x   1.00x   1.01x   1.00x   1.00x   1.03x   1.00x   1.01x   1.00x
SEED            1.20x   1.20x   1.18x   1.17x   1.17x   1.19x   1.18x   1.16x   1.19x   1.19x
CAMELLIA128     1.38x   1.34x   1.31x   1.00x   1.31x   1.00x   1.29x   1.32x   1.00x   1.00x
CAMELLIA192     1.27x   1.27x   1.23x   1.00x   1.25x   1.03x   1.20x   1.23x   1.00x   1.00x
CAMELLIA256     1.27x   1.27x   1.26x   1.00x   1.25x   1.03x   1.20x   1.23x   1.00x   1.00x
SALSA20         1.04x   1.00x

(Note: bulk encryption/decryption do burn_stack after full buffer processing,
instead of after each block.)
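
For readers unfamiliar with this kind of helper, the basic idea of a wipe
routine can be sketched as below. This is a hypothetical, byte-wise version
(ex_wipememory2 is not a libgcrypt symbol); the stores go through a volatile
pointer so that the compiler is not free to drop them as dead writes. It
deliberately omits the word-sized fast path that this commit is about.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite LEN bytes at PTR with the byte SET.  The volatile
   qualifier forces every store to be emitted.  */
static void
ex_wipememory2 (void *ptr, unsigned char set, size_t len)
{
  volatile unsigned char *vptr = ptr;

  assert (ptr || !len);
  while (len--)
    *vptr++ = set;
}
```

A loop like this issues one store per byte, which is why a variant that
handles an unaligned head and then stores whole words can be noticeably
cheaper when it runs after every cipher block.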

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocipher: bufhelp: allow unaligned memory accesses on ARM
Jussi Kivilinna [Fri, 16 Aug 2013 16:44:55 +0000 (19:44 +0300)]
cipher: bufhelp: allow unaligned memory accesses on ARM

* cipher/bufhelp.h [__arm__ && __ARM_FEATURE_UNALIGNED]: Enable
BUFHELP_FAST_UNALIGNED_ACCESS.
--

Newer ARM systems support unaligned memory accesses and on gcc-4.7 and onwards
this is identified by __ARM_FEATURE_UNALIGNED macro.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoRemove burn_stack optimization
Jussi Kivilinna [Sat, 17 Aug 2013 07:48:36 +0000 (10:48 +0300)]
Remove burn_stack optimization

* src/misc.c (_gcry_burn_stack): Remove SIZEOF_UNSIGNED_LONG == 4 or 8
optimization.
--

At least GCC 4.6 on Debian Wheezy (armhf) generates wrong code for burn_stack,
causing the recursive structure to be transformed into an iterative one
without updating the stack pointer between iterations. Therefore only the
first 64 bytes of the stack get zeroed. This appears to be fixed in GCC 4.7,
but let's play it safe and remove this optimization.

A better approach would probably be to add architecture-specific assembly
routine(s) that replace this generic function.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocamellia: add ARMv6 assembly implementation
Jussi Kivilinna [Fri, 16 Aug 2013 11:40:34 +0000 (14:40 +0300)]
camellia: add ARMv6 assembly implementation

* cipher/Makefile.am: Add 'camellia-armv6.S'.
* cipher/camellia-armv6.S: New file.
* cipher/camellia-glue.c [USE_ARMV6_ASM]
(_gcry_camellia_armv6_encrypt_block)
(_gcry_camellia_armv6_decrypt_block): New prototypes.
[USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock)
(camellia_encrypt, camellia_decrypt): New functions.
* cipher/camellia.c [!USE_ARMV6_ASM]: Compile encryption and decryption
routines if USE_ARMV6_ASM macro is _not_ defined.
* cipher/camellia.h (USE_ARMV6_ASM): New macro.
[!USE_ARMV6_ASM] (Camellia_EncryptBlock, Camellia_DecryptBlock): If
USE_ARMV6_ASM is defined, disable these function prototypes.
* configure.ac (camellia) [arm]: Add 'camellia-armv6.lo'.
--

Add optimized ARMv6 assembly implementation for Camellia. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.

For now, only enable this on little-endian systems as big-endian correctness
has not been tested yet.

Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
CAMELLIA128   1.44x   1.47x   1.35x   1.34x   1.43x   1.39x   1.38x   1.36x   1.38x   1.39x
CAMELLIA192   1.60x   1.62x   1.52x   1.47x   1.56x   1.54x   1.52x   1.53x   1.52x   1.53x
CAMELLIA256   1.59x   1.60x   1.49x   1.47x   1.53x   1.54x   1.51x   1.50x   1.52x   1.53x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoblowfish: add ARMv6 assembly implementation
Jussi Kivilinna [Fri, 16 Aug 2013 09:51:52 +0000 (12:51 +0300)]
blowfish: add ARMv6 assembly implementation

* cipher/Makefile.am: Add 'blowfish-armv6.S'.
* cipher/blowfish-armv6.S: New file.
* cipher/blowfish.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_blowfish_armv6_do_encrypt)
(_gcry_blowfish_armv6_encrypt_block)
(_gcry_blowfish_armv6_decrypt_block, _gcry_blowfish_armv6_ctr_enc)
(_gcry_blowfish_armv6_cbc_dec, _gcry_blowfish_armv6_cfb_dec): New
prototypes.
[USE_ARMV6_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block)
(encrypt_block, decrypt_block): New functions.
(_gcry_blowfish_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_blowfish_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
* configure.ac (blowfish) [arm]: Add 'blowfish-armv6.lo'.
--

Patch provides non-parallel implementations for a small speed-up and 2-way
parallel implementations that get accelerated on multi-issue CPUs (hand-tuned
for in-order dual-issue Cortex-A8). Unaligned access handling is done in
assembly.

For now, only enable this on little-endian systems as big-endian correctness
has not been tested yet.

Old vs new (Cortex-A8, Debian Wheezy/armhf):

             ECB/Stream         CBC             CFB             OFB             CTR
          --------------- --------------- --------------- --------------- ---------------
BLOWFISH   1.28x   1.16x   1.21x   2.16x   1.26x   1.86x   1.21x   1.25x   1.89x   1.96x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocast5: add ARMv6 assembly implementation
Jussi Kivilinna [Wed, 14 Aug 2013 18:06:15 +0000 (21:06 +0300)]
cast5: add ARMv6 assembly implementation

* cipher/Makefile.am: Add 'cast5-armv6.S'.
* cipher/cast5-armv6.S: New file.
* cipher/cast5.c (USE_ARMV6_ASM): New macro.
(CAST5_context) [USE_ARMV6_ASM]: New members 'Kr_arm_enc' and
'Kr_arm_dec'.
[USE_ARMV6_ASM] (_gcry_cast5_armv6_encrypt_block)
(_gcry_cast5_armv6_decrypt_block, _gcry_cast5_armv6_ctr_enc)
(_gcry_cast5_armv6_cbc_dec, _gcry_cast5_armv6_cfb_dec): New prototypes.
[USE_ARMV6_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block)
(decrypt_block): New functions.
(_gcry_cast5_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(_gcry_cast5_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_cast_setkey) [USE_ARMV6_ASM]: Initialize 'Kr_arm_enc' and
'Kr_arm_dec'.
* configure.ac (cast5) [arm]: Add 'cast5-armv6.lo'.
--

Provides non-parallel implementations for a small speed-up and 2-way parallel
implementations that get accelerated on multi-issue CPUs (hand-tuned for
in-order dual-issue Cortex-A8). Unaligned access handling is done in assembly.

For now, only enable this on little-endian systems as big-endian correctness
has not been tested yet.

Old vs new (Cortex-A8, Debian Wheezy/armhf):

          ECB/Stream         CBC             CFB             OFB             CTR
       --------------- --------------- --------------- --------------- ---------------
CAST5   1.15x   1.12x   1.12x   2.07x   1.14x   1.60x   1.12x   1.13x   1.62x   1.63x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agorijndael: add ARMv6 assembly implementation
Jussi Kivilinna [Wed, 14 Aug 2013 14:10:00 +0000 (17:10 +0300)]
rijndael: add ARMv6 assembly implementation

* cipher/Makefile.am: Add 'rijndael-armv6.S'.
* cipher/rijndael-armv6.S: New file.
* cipher/rijndael.c (USE_ARMV6_ASM): New macro.
[USE_ARMV6_ASM] (_gcry_aes_armv6_encrypt_block)
(_gcry_aes_armv6_decrypt_block): New prototypes.
(do_encrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_encrypt): Disable input/output alignment when USE_ARMV6_ASM.
(do_decrypt_aligned) [USE_ARMV6_ASM]: Use ARMv6 assembly function.
(do_decrypt): Disable input/output alignment when USE_ARMV6_ASM.
* configure.ac (HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS): New check for
gcc/as compatibility with ARM assembly implementations.
(aes) [arm]: Add 'rijndael-armv6.lo'.
--

Add optimized ARMv6 assembly implementation for AES. Implementation is tuned
for Cortex-A8. Unaligned access handling is done in assembly part.

For now, only enable this on little-endian systems as big-endian correctness
has not been tested yet.

Old vs new. Cortex-A8 (on Debian Wheezy/armhf):
          ECB/Stream         CBC             CFB             OFB             CTR
       --------------- --------------- --------------- --------------- ---------------
AES     2.61x   3.12x   2.16x   2.59x   2.26x   2.25x   2.08x   2.08x   2.23x   2.23x
AES192  2.60x   3.06x   2.18x   2.65x   2.29x   2.29x   2.12x   2.12x   2.25x   2.27x
AES256  2.62x   3.09x   2.24x   2.72x   2.30x   2.34x   2.17x   2.19x   2.32x   2.32x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocipher: fix memory leak.
NIIBE Yutaka [Thu, 8 Aug 2013 23:26:27 +0000 (08:26 +0900)]
cipher: fix memory leak.

* cipher/pubkey.c (gcry_pk_sign): Handle the specific case of ECC,
where there is a NULL which is not the sentinel.

--

This is a kind of makeshift fix, but since the MPI array API is internal
only and will be removed, it is better not to change the API now.

6 years agompi: Clear immutable flag on the result of gcry_mpi_set.
Werner Koch [Thu, 8 Aug 2013 13:16:48 +0000 (15:16 +0200)]
mpi: Clear immutable flag on the result of gcry_mpi_set.

* mpi/mpiutil.c (gcry_mpi_set): Reset immutable and const flags.
* tests/mpitests.c (test_const_and_immutable): Add a test for this.
--

gcry_mpi_set shall behave like gcry_mpi_copy and thus reset those
special flags.  Problem reported by Christian Grothoff.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agotests: fix memory leaks.
NIIBE Yutaka [Tue, 6 Aug 2013 23:56:18 +0000 (08:56 +0900)]
tests: fix memory leaks.

* tests/benchmark.c (dsa_bench): Release SIG.

* tests/mpitests.c (test_powm): Release BASE, EXP, MOD, and RES.

* tests/prime.c (check_primes): Release PRIME.

* tests/tsexp.c (basic): Use intermediate variable M for constant.
Release S1, S2 and A.

6 years agoFix building on W32 (cannot export symbol 'gcry_sexp_get_buffer')
Jussi Kivilinna [Wed, 7 Aug 2013 07:36:41 +0000 (10:36 +0300)]
Fix building on W32 (cannot export symbol 'gcry_sexp_get_buffer')

* src/libgcrypt.def: Change 'gcry_sexp_get_buffer' to
'gcry_sexp_nth_buffer'.
--

Commit 2d3e8d4d9 "sexp: Add function gcry_sexp_nth_buffer." added
'gcry_sexp_get_buffer' to libgcrypt.def, when it should have been
'gcry_sexp_nth_buffer'.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocipher: fix another memory leak.
NIIBE Yutaka [Tue, 6 Aug 2013 05:38:51 +0000 (14:38 +0900)]
cipher: fix another memory leak.

* cipher/ecc.c (ecc_get_curve): Free TMP.

6 years agotests: fix memory leaks.
NIIBE Yutaka [Tue, 6 Aug 2013 03:59:35 +0000 (12:59 +0900)]
tests: fix memory leaks.

* tests/pubkey.c (check_keys_crypt): Release L, X0, and X1.
(check_keys): Release X.

6 years agocipher: fix memory leaks.
NIIBE Yutaka [Tue, 6 Aug 2013 03:57:10 +0000 (12:57 +0900)]
cipher: fix memory leaks.

* cipher/elgamal.c (elg_generate_ext): Free XVALUE.

* cipher/pubkey.c (sexp_elements_extract): Don't use IDX for loop.
Call mpi_free.
(sexp_elements_extract_ecc): Call mpi_free.

6 years agompi: Improve gcry_mpi_invm to detect bad input.
Werner Koch [Mon, 5 Aug 2013 16:58:41 +0000 (18:58 +0200)]
mpi: Improve gcry_mpi_invm to detect bad input.

* mpi/mpi-inv.c (gcry_mpi_invm): Return 0 for bad input.
--

Without this patch the function may enter an endless loop.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoCorrect checks for ecc secret key
Dmitry Eremin-Solenikov [Wed, 31 Jul 2013 13:20:58 +0000 (17:20 +0400)]
Correct checks for ecc secret key

* cipher/ecc.c (check_secret_key): replace wrong comparison of Q and
sk->Q points with correct one.

--
Currently check_secret_key compares pointers to the coordinates of the Q
(calculated) and sk->Q (provided) points. Instead it should convert them
to affine representations and use mpi_cmp to compare the coordinates.

This has an implication that keys that were (erroneously) verified as
valid could now become invalid.
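
Why a coordinate-wise comparison of the stored values is not enough can be
seen in a toy sketch. Everything here is an assumption made for
illustration: a tiny prime field (p = 97) stands in for MPIs, Jacobian
coordinates (x = X/Z^2, y = Y/Z^3) are assumed, curve membership is not
checked, and the ex_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_P 97u   /* Toy prime modulus.  */

/* A point in Jacobian projective coordinates.  */
typedef struct { uint32_t x, y, z; } ex_point_t;

/* Return A * B mod p.  */
static uint32_t
ex_mulm (uint32_t a, uint32_t b)
{
  return (uint32_t)(((uint64_t)a * b) % EX_P);
}

/* Return the inverse of A mod p, computed as A^(p-2) (Fermat).  */
static uint32_t
ex_invm (uint32_t a)
{
  uint32_t result = 1, base = a % EX_P, e = EX_P - 2;

  while (e)
    {
      if (e & 1)
        result = ex_mulm (result, base);
      base = ex_mulm (base, base);
      e >>= 1;
    }
  return result;
}

/* Store the affine coordinates of PT at R_X and R_Y.  Returns -1
   for the point at infinity (Z == 0), 0 otherwise.  */
static int
ex_get_affine (const ex_point_t *pt, uint32_t *r_x, uint32_t *r_y)
{
  uint32_t zi, zi2, zi3;

  assert (pt && r_x && r_y);
  if (pt->z % EX_P == 0)
    return -1;
  zi = ex_invm (pt->z);
  zi2 = ex_mulm (zi, zi);
  zi3 = ex_mulm (zi2, zi);
  *r_x = ex_mulm (pt->x, zi2);
  *r_y = ex_mulm (pt->y, zi3);
  return 0;
}

/* Return 1 if A and B denote the same point, 0 otherwise.  */
static int
ex_point_equal (const ex_point_t *a, const ex_point_t *b)
{
  uint32_t ax = 0, ay = 0, bx = 0, by = 0;
  int a_inf = ex_get_affine (a, &ax, &ay);
  int b_inf = ex_get_affine (b, &bx, &by);

  if (a_inf || b_inf)
    return a_inf && b_inf;
  return ax == bx && ay == by;
}
```

The triples (3, 6, 1) and (12, 48, 2) differ in every stored coordinate but
both reduce to the affine point (3, 6), so only the affine comparison
reports them as equal.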

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
6 years agosexp: Allow white space anywhere in a hex format.
Werner Koch [Mon, 29 Jul 2013 13:16:02 +0000 (15:16 +0200)]
sexp: Allow white space anywhere in a hex format.

* src/sexp.c (hextobyte): Remove.
(hextonibble): New.
(vsexp_sscan): Skip whitespace between hex nibbles.
--

Before this patch a string
  "(a #123"
  "    456#)"
was not parsed correctly because white space was only allowed between
complete bytes (pairs of hex digits) but not between the two nibbles
of a byte.
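
The nibble-wise approach can be sketched like this. It is an illustration
only and not the code in src/sexp.c: the ex_ functions are hypothetical, and
rejecting an odd number of digits is a choice made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the value of the hex digit C, or -1 if C is not one.  */
static int
ex_hextonibble (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Decode the hex digits in S up to the closing '#' into OUT.  White
   space is skipped anywhere, including between the two nibbles of a
   byte.  Returns the number of bytes stored, or -1 on an invalid
   character, an odd number of digits, a missing '#', or if OUT is
   too small.  */
static int
ex_scan_hex (const char *s, unsigned char *out, size_t outsize)
{
  size_t n = 0;
  int hi = -1;   /* Pending high nibble, or -1 if none.  */

  assert (s && out);
  for (; *s && *s != '#'; s++)
    {
      int nib;

      if (isspace ((unsigned char)*s))
        continue;
      nib = ex_hextonibble (*s);
      if (nib < 0)
        return -1;
      if (hi < 0)
        hi = nib;
      else
        {
          if (n >= outsize)
            return -1;
          out[n++] = (unsigned char)((hi << 4) | nib);
          hi = -1;
        }
    }
  if (*s != '#' || hi >= 0)
    return -1;
  return (int)n;
}
```

Because the state carried between characters is a single pending nibble,
the line break after "123" in the example above no longer matters.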

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoImplement deterministic ECDSA as specified by rfc-6979.
Werner Koch [Mon, 29 Jul 2013 13:09:33 +0000 (15:09 +0200)]
Implement deterministic ECDSA as specified by rfc-6979.

* cipher/ecc.c (sign): Add args FLAGS and HASHALGO.  Convert an opaque
MPI as INPUT.  Implement rfc-6979.
(ecc_sign): Remove the opaque MPI code and pass FLAGS to sign.
(verify): Do not allocate and compute Y; it is not used.
(ecc_verify): Truncate the hash value if needed.
* tests/dsa-rfc6979.c (check_dsa_rfc6979): Add ECDSA test cases.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoImplement deterministic DSA as specified by rfc-6979.
Werner Koch [Fri, 26 Jul 2013 18:15:53 +0000 (20:15 +0200)]
Implement deterministic DSA as specified by rfc-6979.

* cipher/dsa.c (dsa_sign): Move opaque mpi extraction to sign.
(sign): Add args FLAGS and HASHALGO.  Implement deterministic DSA.
Add code path for R==0 to comply with the standard.
(dsa_verify): Left fill opaque mpi based hash values.
* cipher/dsa-common.c (int2octets, bits2octets): New.
(_gcry_dsa_gen_rfc6979_k): New.
* tests/dsa-rfc6979.c: New.
* tests/Makefile.am (TESTS): Add dsa-rfc6979.
--

This patch also fixes a recent patch (37d0a1e) which allows passing
the hash in a (hash) element.

Support for deterministic ECDSA will come soon.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoAllow the use of a private-key s-expression with gcry_pk_verify.
Werner Koch [Fri, 26 Jul 2013 17:22:36 +0000 (19:22 +0200)]
Allow the use of a private-key s-expression with gcry_pk_verify.

* cipher/pubkey.c (sexp_to_key): Fallback to private key.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoMitigate a flush+reload cache attack on RSA secret exponents.
Werner Koch [Thu, 25 Jul 2013 09:17:52 +0000 (11:17 +0200)]
Mitigate a flush+reload cache attack on RSA secret exponents.

* mpi/mpi-pow.c (gcry_mpi_powm): Always perform the mpi_mul for
exponents in secure memory.
--

The attack is published as http://eprint.iacr.org/2013/448 :

Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel
Attack by Yuval Yarom and Katrina Falkner. 18 July 2013.

  Flush+Reload is a cache side-channel attack that monitors access to
  data in shared pages. In this paper we demonstrate how to use the
  attack to extract private encryption keys from GnuPG.  The high
  resolution and low noise of the Flush+Reload attack enables a spy
  program to recover over 98% of the bits of the private key in a
  single decryption or signing round. Unlike previous attacks, the
  attack targets the last level L3 cache. Consequently, the spy
  program and the victim do not need to share the execution core of
  the CPU. The attack is not limited to a traditional OS and can be
  used in a virtualised environment, where it can attack programs
  executing in a different VM.

(cherry picked from commit 55237c8f6920c6629debd23db65e90b42a3767de)
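
The idea behind the mitigation (do the multiplication for every exponent
bit and only decide afterwards whether to keep the result) can be sketched
on machine words. This is a toy under stated assumptions, not the code in
mpi/mpi-pow.c: ex_powm_always_mul is a hypothetical name, it works on
32-bit values instead of MPIs, and it makes no claim to be constant-time
(the '%' operator alone may not be).

```c
#include <assert.h>
#include <stdint.h>

/* Return BASE^EXP mod MOD using left-to-right square-and-multiply in
   which the multiplication is performed for every exponent bit.  */
static uint32_t
ex_powm_always_mul (uint32_t base, uint32_t exp, uint32_t mod)
{
  uint64_t r, b;
  int i;

  assert (mod != 0);
  r = 1 % mod;
  b = base % mod;
  for (i = 31; i >= 0; i--)
    {
      uint64_t bit = (exp >> i) & 1;
      uint64_t mask = (uint64_t)0 - bit;  /* All ones if the bit is set.  */
      uint64_t t;

      r = (r * r) % mod;                  /* Square: always.  */
      t = (r * b) % mod;                  /* Multiply: always.  */
      r = (t & mask) | (r & ~mask);       /* Keep T only if the bit is set.  */
    }
  return (uint32_t)r;
}
```

With this structure the sequence of square and multiply operations no
longer depends on the exponent bits, which is the access pattern the
attack observes.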

6 years agopk: Allow the use of a hash element for DSA sign and verify.
Werner Koch [Fri, 19 Jul 2013 16:14:38 +0000 (18:14 +0200)]
pk: Allow the use of a hash element for DSA sign and verify.

* cipher/pubkey.c (pubkey_sign): Add arg ctx and pass it to the sign
module.
(gcry_pk_sign): Pass CTX to pubkey_sign.
(sexp_data_to_mpi): Add flag rfc6979 and code to allow hash with *DSA.
* cipher/rsa.c (rsa_sign, rsa_verify): Return an error if an opaque
MPI is given for DATA/HASH.
* cipher/elgamal.c (elg_sign, elg_verify): Ditto.
* cipher/dsa.c (dsa_sign, dsa_verify): Convert a given opaque MPI.
* cipher/ecc.c (ecc_sign, ecc_verify): Ditto.
* tests/basic.c (check_pubkey_sign_ecdsa): Add a test for using a hash
element with DSA.
--

This patch allows the use of

  (data (flags raw)
    (hash sha256 #80112233445566778899AABBCCDDEEFF
                  000102030405060708090A0B0C0D0E0F#))

in addition to the old but more efficient

  (data (flags raw)
    (value #80112233445566778899AABBCCDDEEFF
            000102030405060708090A0B0C0D0E0F#))

for DSA and ECDSA.  With the hash element the flag "raw" must be
explicitly given because existing regression test code expects that
a conflict error is returned if a hash element but no flags are given.

Note that the hash algorithm name is currently not checked.  It may
eventually be used to cross-check the length of the provided hash
value.  It is suggested that the correct hash name is given - even if
a truncated hash value is used.

Finally this patch adds a way to pass the hash algorithm and flag
values to the signing module.  "rfc6979" has been implemented as a new
but not yet used flag.
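
As a small illustration of the textual form shown above, the following
sketch formats such a data S-expression into a buffer. It uses only the C
library; ex_format_hash_data is a hypothetical helper and not part of
libgcrypt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "(data (flags raw) (hash ALGO #HEX#))" into OUT, where HEX is
   the upper-case hex encoding of DIGEST.  Returns 0 on success or -1
   if OUT is too small.  */
static int
ex_format_hash_data (char *out, size_t outsize, const char *algo,
                     const unsigned char *digest, size_t digestlen)
{
  size_t used, i;
  int n;

  assert (out && algo && digest);
  n = snprintf (out, outsize, "(data (flags raw) (hash %s #", algo);
  if (n < 0 || (size_t)n >= outsize)
    return -1;
  used = (size_t)n;
  for (i = 0; i < digestlen; i++)
    {
      if (outsize - used < 3)   /* Two hex digits plus the NUL.  */
        return -1;
      snprintf (out + used, 3, "%02X", (unsigned int)digest[i]);
      used += 2;
    }
  if (outsize - used < 4)       /* "#))" plus the NUL.  */
    return -1;
  memcpy (out + used, "#))", 4);
  return 0;
}
```

The resulting string could then be parsed into an S-expression object, for
example with gcry_sexp_sscan, before being passed to the sign or verify
function.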

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agosexp: Add function gcry_sexp_nth_buffer.
Werner Koch [Fri, 19 Jul 2013 13:54:03 +0000 (15:54 +0200)]
sexp: Add function gcry_sexp_nth_buffer.

* src/sexp.c (gcry_sexp_nth_buffer): New.
* src/visibility.c, src/visibility.h: Add function wrapper.
* src/libgcrypt.vers, src/libgcrypt.def: Add to API.
* src/gcrypt.h.in: Add prototype.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoUpdate AUTHORS with info on Salsa20.
Werner Koch [Thu, 18 Jul 2013 19:37:35 +0000 (21:37 +0200)]
Update AUTHORS with info on Salsa20.

--

6 years agoAdd support for Salsa20.
Werner Koch [Thu, 18 Jul 2013 19:32:05 +0000 (21:32 +0200)]
Add support for Salsa20.

* src/gcrypt.h.in (GCRY_CIPHER_SALSA20): New.
* cipher/salsa20.c: New.
* configure.ac (available_ciphers): Add Salsa20.
* cipher/cipher.c: Register Salsa20.
(cipher_setiv): Allow to divert an IV to a cipher module.
* src/cipher-proto.h (cipher_setiv_func_t): New.
(cipher_extra_spec): Add field setiv.
* src/cipher.h: Declare Salsa20 definitions.
* tests/basic.c (check_stream_cipher): New.
(check_stream_cipher_large_block): New.
(check_cipher_modes): Run new test functions.
(check_ciphers): Add simple test for Salsa20.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoTypo fix in comment.
Werner Koch [Wed, 17 Jul 2013 14:55:37 +0000 (16:55 +0200)]
Typo fix in comment.

--

6 years agoAllow gcry_mpi_dump to print opaque MPIs.
Werner Koch [Wed, 17 Jul 2013 14:55:02 +0000 (16:55 +0200)]
Allow gcry_mpi_dump to print opaque MPIs.

* mpi/mpicoder.c (gcry_mpi_dump): Detect and print opaque MPIs.
* tests/mpitests.c (test_opaque): New.
(main): Call new test.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agocipher: Prepare to pass extra info to the sign functions.
Werner Koch [Wed, 17 Jul 2013 13:54:32 +0000 (15:54 +0200)]
cipher: Prepare to pass extra info to the sign functions.

* src/gcrypt-module.h (gcry_pk_sign_t): Add parms flags and hashalgo.
* cipher/rsa.c (rsa_sign): Add parms and mark them as unused.
* cipher/dsa.c (dsa_sign): Ditto.
* cipher/elgamal.c (elg_sign): Ditto.
* cipher/pubkey.c (dummy_sign): Ditto.
(pubkey_sign): Pass 0 for the new args.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoFix a special case bug in mpi_powm for e==0.
Werner Koch [Wed, 17 Jul 2013 08:18:39 +0000 (10:18 +0200)]
Fix a special case bug in mpi_powm for e==0.

* mpi/mpi-pow.c (gcry_mpi_powm): For a zero exponent, make sure that
the result has been allocated.
--

This code triggered the problem:

    modulus = gcry_mpi_set_ui(NULL, 100);
    generator = gcry_mpi_set_ui(NULL, 3);
    exponent = gcry_mpi_set_ui(NULL, 0);
    result = gcry_mpi_new(0);
    gcry_mpi_powm(result, generator, exponent, modulus);

gcry_mpi_new(0) does not allocate the limb space thus it is not
possible to write even into the first limb.  Workaround was to use
gcry_mpi_new (1) but a real fix is better.

Reported-by: Ian Goldberg
Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoRegister DCO for Dmitry Kasatkin.
Werner Koch [Mon, 15 Jul 2013 07:46:38 +0000 (09:46 +0200)]
Register DCO for Dmitry Kasatkin.

--

6 years agoFix memory leak in t-mpi-point test
Dmitry Eremin-Solenikov [Sat, 13 Jul 2013 14:50:05 +0000 (18:50 +0400)]
Fix memory leak in t-mpi-point test

* tests/t-mpi-point.c (basic_ec_math, basic_ec_math_simplified): add
calls to gcry_ctx_release() to free contexts after they become unused.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
6 years agoFix 'Please include winsock2.h before windows.h' warnings with mingw32
Jussi Kivilinna [Wed, 26 Jun 2013 12:28:49 +0000 (15:28 +0300)]
Fix 'Please include winsock2.h before windows.h' warnings with mingw32

* random/rndw32.c: include winsock2.h before windows.h.
* src/ath.h [_WIN32]: Ditto.
* tests/benchmark.c [_WIN32]: Ditto.
--

Patch silences warnings of following type:
/usr/lib/gcc/i686-w64-mingw32/4.6/../../../../i686-w64-mingw32/include/winsock2.h:15:2: warning: #warning Please include winsock2.h before windows.h [-Wcpp]

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoRemove duplicate header from mpi/amd64/mpih-mul2.S
Jussi Kivilinna [Wed, 26 Jun 2013 13:57:00 +0000 (16:57 +0300)]
Remove duplicate header from mpi/amd64/mpih-mul2.S

* mpi/amd64/mpih-mul2.S: remove duplicated header.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoFix i386/amd64 inline assembly "cc" clobbers
Jussi Kivilinna [Thu, 27 Jun 2013 11:40:12 +0000 (14:40 +0300)]
Fix i386/amd64 inline assembly "cc" clobbers

* cipher/bithelp.h [__GNUC__, __i386__] (rol, ror): add "cc" clobber
for inline assembly.
* cipher/cast5.c [__GNUC__, __i386__] (rol): Ditto.
* random/rndhw.c [USE_DRNG] (rdrand_long): Ditto.
* src/hmac256.c [__GNUC__, __i386__] (ror): Ditto.
* mpi/longlong.h [__i386__] (add_ssaaaa, sub_ddmmss, umul_ppmm)
(udiv_qrnnd, count_leading_zeros, count_trailing_zeros): Ditto.
--

These assembly snippets modify the CPU condition flags but do not declare a
"cc" clobber.
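
For reference, a rotate helper with the clobber in place might look like
the sketch below. ex_rol is a hypothetical name used for illustration; the
portable fallback is there so the example also builds on other
architectures and compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 32).  */
static inline uint32_t
ex_rol (uint32_t x, int n)
{
  assert (n >= 0 && n < 32);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* The rotate instruction updates the flags register, so "cc" is
     listed as clobbered.  */
  __asm__ ("roll %%cl, %0"
           : "=r" (x)
           : "0" (x), "c" (n)
           : "cc");
  return x;
#else
  return n ? (x << n) | (x >> (32 - n)) : x;
#endif
}
```

Without the clobber the compiler may assume that condition flags set
before the asm statement are still valid after it.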

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agobufhelp: Suppress 'cast increases required alignment' warning
Jussi Kivilinna [Wed, 3 Jul 2013 09:14:56 +0000 (12:14 +0300)]
bufhelp: Suppress 'cast increases required alignment' warning

* cipher/bufhelp.h (buf_xor, buf_xor_2dst, buf_xor_n_copy): Cast
to larger element pointer through (void *) to suppress -Wcast-align.
--

Patch disables bogus warnings caused by -Wcast-align. We know that the byte
pointers are properly aligned at these phases, or that the hardware can handle
unaligned accesses.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agompi: Add __ARM_ARCH for older GCC
Jussi Kivilinna [Wed, 3 Jul 2013 08:32:25 +0000 (11:32 +0300)]
mpi: Add __ARM_ARCH for older GCC

* mpi/longlong.h [__arm__]: Construct __ARM_ARCH if not provided by
compiler.
--

GCC 4.8 defines __ARM_ARCH, which provides a forward-compatible way to detect
the ARM architecture. Use this when available and construct it otherwise.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agompi: add missing "cc" clobber for ARM assembly
Jussi Kivilinna [Wed, 3 Jul 2013 12:10:11 +0000 (15:10 +0300)]
mpi: add missing "cc" clobber for ARM assembly

* mpi/longlong.h [__arm__] (add_ssaaaa, sub_ddmmss): Add __CLOBBER_CC.
[__arm__][__ARM_ARCH <= 3] (umul_ppmm): Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoTweak ARM inline assembly for mpi
Jussi Kivilinna [Wed, 3 Jul 2013 08:14:56 +0000 (11:14 +0300)]
Tweak ARM inline assembly for mpi

* mpi/longlong.h [__arm__]: Enable inline assembly if __thumb2__ is
defined.
[__arm__]: Use __ARM_ARCH when defined.
[__arm__] [__ARM_ARCH >= 5] (count_leading_zeros): New.
--

Current ARM Linux distributions use an EABI that enables thumb2, and therefore
inline assembly is disabled (because of the !defined(__thumb__) selector).
However, thumb2 allows the use of the assembly instructions that longlong.h
contains for ARM. So this patch enables inline assembly for ARM when
__thumb2__ is defined in addition to __thumb__.

Patch also adds optimization for count_leading_zeros() macro for ARM.
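
A sketch of what such a helper can look like is given below. It is not the
macro from mpi/longlong.h: ex_count_leading_zeros is a hypothetical
function, the guard assumes the compiler provides __ARM_ARCH, and the
generic loop is only there so the example builds on any target.

```c
#include <assert.h>
#include <stdint.h>

/* Return the number of leading zero bits in X (32 for X == 0).  */
static inline unsigned int
ex_count_leading_zeros (uint32_t x)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) \
    && __ARM_ARCH >= 5 && (!defined(__thumb__) || defined(__thumb2__))
  unsigned int count;

  /* ARMv5 and later have a dedicated instruction for this.  */
  __asm__ ("clz %0, %1" : "=r" (count) : "r" (x));
  return count;
#else
  unsigned int count = 0;

  if (!x)
    return 32;
  while (!(x & 0x80000000u))
    {
      x <<= 1;
      count++;
    }
  assert (count < 32);
  return count;
#endif
}
```

The selector mirrors the change described above: the assembly path is
taken in ARM mode or when thumb2 is available, but not for plain thumb.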

Results on Cortex-A8, 1 GHz:
===

Before:

Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         750ms    2780ms       110ms
RSA 2048 bit       14280ms   17250ms       300ms
RSA 3072 bit       38630ms   51300ms       650ms
RSA 4096 bit       60940ms   111430ms      1000ms
jussi@cubie:~/libgcrypt$ tests/benchmark dsa
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -    1410ms      1680ms
DSA 2048/224             -    6100ms      7390ms
DSA 3072/256             -   14350ms     17120ms
jussi@cubie:~/libgcrypt$ tests/benchmark ecc
Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         90ms    2160ms      3940ms
ECDSA 224 bit        110ms    2810ms      5400ms
ECDSA 256 bit        150ms    3570ms      6970ms
ECDSA 384 bit        340ms    8320ms     16420ms
ECDSA 521 bit        850ms   19760ms     38480ms

After:

jussi@cubie:~/libgcrypt$ tests/benchmark rsa
Algorithm         generate  100*sign  100*verify
------------------------------------------------
RSA 1024 bit         590ms    2230ms        80ms
RSA 2048 bit        2320ms   13090ms       240ms
RSA 3072 bit       60580ms   38420ms       460ms
RSA 4096 bit       115130ms   82250ms       750ms
jussi@cubie:~/libgcrypt$ tests/benchmark dsa
Algorithm         generate  100*sign  100*verify
------------------------------------------------
DSA 1024/160             -    1070ms      1290ms
DSA 2048/224             -    4500ms      5550ms
DSA 3072/256             -   10280ms     12200ms
jussi@cubie:~/libgcrypt$ tests/benchmark ecc
Algorithm         generate  100*sign  100*verify
------------------------------------------------
ECDSA 192 bit         70ms    1900ms      3560ms
ECDSA 224 bit        100ms    2490ms      4750ms
ECDSA 256 bit        120ms    3140ms      5920ms
ECDSA 384 bit        270ms    6990ms     13790ms
ECDSA 521 bit        680ms   17080ms     33490ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Make gpg-error replacement defines more robust.
Werner Koch [Wed, 26 Jun 2013 09:09:42 +0000 (11:09 +0200)]
Make gpg-error replacement defines more robust.

* configure.ac (AH_BOTTOM): Move GPG_ERR_ replacement defines to ...
* src/gcrypt-int.h: new file.
* src/visibility.h, src/cipher.h: Replace gcrypt.h by gcrypt-int.h.
* tests/: Ditto for all test files.
--

Defining newer gpg-error codes in config.h was not a good idea,
because config.h is usually included before gpg-error.h and thus the
codes would be defined twice in gpg-error.h, leading to faulty code there like

  typedef enum
    {
      [...]
      191 = 191,
      [...]
    };
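
The failure mode and the fix can be sketched with stand-in names (DEMO_ERR_A,
DEMO_ERR_B and DEMO_HEADER_VERSION are hypothetical, not gpg-error
identifiers). If the replacement define were seen before the enum, the
preprocessor would rewrite the enumerator into `191 = 191`. Evaluating it only
after the header, and only when the header is too old to know the code, avoids
that:

```c
#include <assert.h>

/* Stand-in for an installed header that already knows the newer code. */
typedef enum
  {
    DEMO_ERR_A = 190,
    DEMO_ERR_B = 191
  } demo_err_t;
#define DEMO_HEADER_VERSION 2   /* hypothetical version macro of that header */

/* Replacement define: placed AFTER the header and guarded by its version,
   so it can never be expanded inside the enum above. */
#if DEMO_HEADER_VERSION < 2
# define DEMO_ERR_B 191
#endif

/* Code can use the symbol regardless of which side provided it. */
static int demo_code(void)
{
  int code = (int)DEMO_ERR_B;
  assert(code > 0);
  return code;
}
```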

6 years ago Check if assembler is compatible with AMD64 assembly implementations cipher-amd64-optimizations
Jussi Kivilinna [Thu, 20 Jun 2013 11:20:36 +0000 (14:20 +0300)]
Check if assembler is compatible with AMD64 assembly implementations

* cipher/blowfish-amd64.S: Enable only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined.
* cipher/camellia-aesni-avx-amd64.S: Ditto.
* cipher/camellia-aesni-avx2-amd64.S: Ditto.
* cipher/cast5-amd64.S: Ditto.
* cipher/rijndael-amd64.S: Ditto.
* cipher/serpent-avx2-amd64.S: Ditto.
* cipher/serpent-sse2-amd64.S: Ditto.
* cipher/twofish-amd64.S: Ditto.
* cipher/blowfish.c: Use AMD64 assembly implementation only if
HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS is defined
* cipher/camellia-glue.c: Ditto.
* cipher/cast5.c: Ditto.
* cipher/rijndael.c: Ditto.
* cipher/serpent.c: Ditto.
* cipher/twofish.c: Ditto.
* configure.ac: Check gcc/as compatibility with AMD64 assembly
implementations.
--

Later these checks can be split and the assembly implementations adapted to
handle different platforms, but for now disable the AMD64 assembly
implementations if the assembler does not appear to be able to handle them.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Optimize _gcry_burn_stack for 32-bit and 64-bit architectures
Jussi Kivilinna [Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)]
Optimize _gcry_burn_stack for 32-bit and 64-bit architectures

* src/misc.c (_gcry_burn_stack): Add optimization for 32-bit and 64-bit
architectures.
--

Busy looping 'tests/benchmark --cipher-repetitions 10 cipher blowfish' on ARM
Cortex-A8 shows that _gcry_burn_stack takes 21% of CPU time. With this patch,
that number drops to 3.4%.

On AMD64 (Intel i5-4570) CPU usage for _gcry_burn_stack in the same test drops
from 3.5% to 1.1%.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Add Camellia AES-NI/AVX2 implementation
Jussi Kivilinna [Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)]
Add Camellia AES-NI/AVX2 implementation

* cipher/Makefile.am: Add 'camellia-aesni-avx2-amd64.S'.
* cipher/camellia-aesni-avx2-amd64.S: New file.
* cipher/camellia-glue.c (USE_AESNI_AVX2): New macro.
(CAMELLIA_context) [USE_AESNI_AVX2]: Add 'use_aesni_avx2'.
[USE_AESNI_AVX2] (_gcry_camellia_aesni_avx2_ctr_enc)
(_gcry_camellia_aesni_avx2_cbc_dec)
(_gcry_camellia_aesni_avx2_cfb_dec): New prototypes.
(camellia_setkey) [USE_AESNI_AVX2]: Check AVX2+AES-NI capable hardware
and set 'ctx->use_aesni_avx2'.
(_gcry_camellia_ctr_enc) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cbc_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(_gcry_camellia_cfb_dec) [USE_AESNI_AVX2]: Add AVX2 accelerated code.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths get tested.
* configure.ac (camellia) [avx2support, aesnisupport]: Add
'camellia-aesni-avx2-amd64.lo'.
--

Add new AVX2/AES-NI implementation of Camellia that processes 32 blocks in
parallel.
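
The glue pattern behind such a wide implementation can be sketched as follows.
This is an illustration under stated assumptions, not the code in
cipher/camellia-glue.c: toy_block is a placeholder transform rather than
Camellia, toy_batch stands in for the assembly routine, and all names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH 32   /* blocks per wide call, as stated in the commit text */

/* Toy stand-in for a single block operation; NOT Camellia. */
static uint32_t toy_block(uint32_t x)
{
  return (x ^ 0x9e3779b9u) * 2654435761u;
}

/* Stand-in for the wide routine: handles exactly BATCH blocks per call. */
static void toy_batch(uint32_t *dst, const uint32_t *src)
{
  for (size_t i = 0; i < BATCH; i++)
    dst[i] = toy_block(src[i]);
}

/* Glue: full batches go to the wide routine when it is usable, the
   remaining tail is handled block by block.  Returns the number of wide
   calls that were made. */
static size_t process_blocks(uint32_t *dst, const uint32_t *src,
                             size_t nblocks, int have_wide)
{
  size_t wide_calls = 0;

  assert(dst != NULL && src != NULL);
  if (have_wide)
    while (nblocks >= BATCH)
      {
        toy_batch(dst, src);
        dst += BATCH;
        src += BATCH;
        nblocks -= BATCH;
        wide_calls++;
      }
  while (nblocks--)
    *dst++ = toy_block(*src++);
  return wide_calls;
}
```

With this structure the wide path is only exercised when at least BATCH
blocks are supplied, which is consistent with the selftests growing 'nblocks'.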

Speed old (AVX/AES-NI) vs. new (AVX2/AES-NI) on Intel Core i5-4570:
                 ECB/Stream         CBC             CFB             OFB             CTR
              --------------- --------------- --------------- --------------- ---------------
CAMELLIA128    1.00x   0.99x   1.00x   1.53x   1.00x   1.49x   1.00x   1.00x   1.54x   1.54x
CAMELLIA256    0.99x   1.00x   1.00x   1.50x   1.00x   1.50x   1.00x   1.00x   1.54x   1.52x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Add Serpent AVX2 implementation
Jussi Kivilinna [Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)]
Add Serpent AVX2 implementation

* cipher/Makefile.am: Add 'serpent-avx2-amd64.S'.
* cipher/serpent-avx2-amd64.S: New file.
* cipher/serpent.c (USE_AVX2): New macro.
(serpent_context_t) [USE_AVX2]: Add 'use_avx2'.
[USE_AVX2] (_gcry_serpent_avx2_ctr_enc, _gcry_serpent_avx2_cbc_dec)
(_gcry_serpent_avx2_cfb_dec): New prototypes.
(serpent_setkey_internal) [USE_AVX2]: Check for AVX2 capable hardware
and set 'use_avx2'.
(_gcry_serpent_ctr_enc) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cbc_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(_gcry_serpent_cfb_dec) [USE_AVX2]: Use AVX2 accelerated functions.
(selftest_ctr_128, selftest_cbc_128, selftest_cfb_128): Grow 'nblocks'
so that AVX2 codepaths are tested.
* configure.ac (serpent) [avx2support]: Add 'serpent-avx2-amd64.lo'.
--

Add new AVX2 implementation of Serpent that processes 16 blocks in parallel.

Speed old (SSE2) vs. new (AVX2) on Intel Core i5-4570:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   1.00x   1.00x   2.10x   1.00x   2.16x   1.01x   1.00x   2.16x   2.18x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Add detection for Intel AVX2 instruction set
Jussi Kivilinna [Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)]
Add detection for Intel AVX2 instruction set

* configure.ac: Add option --disable-avx2-support.
(HAVE_GCC_INLINE_ASM_AVX2): New.
(ENABLE_AVX2_SUPPORT): New.
* src/g10lib.h (HWF_INTEL_AVX2): New.
* src/global.c (hwflist): Add HWF_INTEL_AVX2.
* src/hwf-x86.c [__i386__] (get_cpuid): Initialize registers to zero
before cpuid.
[__x86_64__] (get_cpuid): Initialize registers to zero before cpuid.
(detect_x86_gnuc): Store maximum cpuid level.
(detect_x86_gnuc) [ENABLE_AVX2_SUPPORT]: Add detection for AVX2.
--
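
A minimal sketch of such a check, assuming a GCC-compatible compiler that
ships <cpuid.h>. It is not the code in src/hwf-x86.c; leaf7_ebx_bit and
cpu_reports_avx2 are hypothetical names. It only reads the CPUID feature bit;
a complete check would also confirm that the operating system saves the YMM
register state.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
# include <cpuid.h>
# define DEMO_HAVE_CPUID 1
#endif

/* Returns the given bit of EBX from CPUID leaf 7, sub-leaf 0, or 0 if
   that leaf (or CPUID itself) is not available. */
static int leaf7_ebx_bit(unsigned int bit)
{
  assert(bit < 32);
#ifdef DEMO_HAVE_CPUID
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* The maximum basic leaf must be known before leaf 7 is queried. */
  if (__get_cpuid_max(0, 0) < 7)
    return 0;

  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  (void)eax; (void)ecx; (void)edx;
  return (int)((ebx >> bit) & 1u);
#else
  return 0;
#endif
}

/* CPUID.(EAX=7,ECX=0):EBX bit 5 is the AVX2 feature flag. */
static int cpu_reports_avx2(void)
{
  return leaf7_ebx_bit(5);
}
```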

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago twofish: add amd64 assembly implementation
Jussi Kivilinna [Sun, 9 Jun 2013 13:37:38 +0000 (16:37 +0300)]
twofish: add amd64 assembly implementation

* cipher/Makefile.am: Add 'twofish-amd64.S'.
* cipher/twofish-amd64.S: New file.
* cipher/twofish.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_twofish_amd64_encrypt_block)
(_gcry_twofish_amd64_decrypt_block, _gcry_twofish_amd64_ctr_enc)
(_gcry_twofish_amd64_cbc_dec, _gcry_twofish_amd64_cfb_dec): New
prototypes.
[USE_AMD64_ASM] (do_twofish_encrypt, do_twofish_decrypt)
(twofish_encrypt, twofish_decrypt): New functions.
(_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec, _gcry_twofish_cfb_dec)
(selftest_ctr, selftest_cbc, selftest_cfb): New functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_TWOFISH]: Register Twofish
bulk functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (twofish) [x86_64]: Add 'twofish-amd64.lo'.
* src/cipher.h (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec): New prototypes.
--

Provides non-parallel implementations for a small speed-up and 3-way parallel
implementations that get accelerated on `out-of-order' CPUs.

Speed old vs. new on Intel Core i5-4570:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
TWOFISH128     1.08x  1.07x    1.10x  1.80x    1.09x  1.70x    1.08x  1.08x    1.70x  1.69x

Speed old vs. new on Intel Core2 T8100:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
TWOFISH128     1.11x  1.10x    1.13x  1.65x    1.13x  1.62x    1.12x  1.11x    1.63x  1.59x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago rijndael: add amd64 assembly implementation
Jussi Kivilinna [Wed, 29 May 2013 13:40:27 +0000 (16:40 +0300)]
rijndael: add amd64 assembly implementation

* cipher/Makefile.am: Add 'rijndael-amd64.S'.
* cipher/rijndael-amd64.S: New file.
* cipher/rijndael.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_aes_amd64_encrypt_block)
(_gcry_aes_amd64_decrypt_block): New prototypes.
(do_encrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function.
(do_encrypt): Disable input/output alignment when USE_AMD64_ASM is set.
(do_decrypt_aligned) [USE_AMD64_ASM]: Use amd64 assembly function.
(do_decrypt): Disable input/output alignment when USE_AMD64_ASM is set.
* configure.ac (aes) [x86-64]: Add 'rijndael-amd64.lo'.
--

Add optimized amd64 assembly implementation for AES.

Old vs new, on AMD Phenom II:
          ECB/Stream         CBC             CFB             OFB             CTR
       --------------- --------------- --------------- --------------- ---------------
AES     1.74x   1.72x   1.81x   1.85x   1.82x   1.76x   1.67x   1.64x   1.79x   1.81x
AES192  1.77x   1.77x   1.79x   1.88x   1.90x   1.80x   1.69x   1.69x   1.85x   1.81x
AES256  1.79x   1.81x   1.83x   1.89x   1.88x   1.82x   1.72x   1.70x   1.87x   1.89x

Old vs new, on Intel Core2:
          ECB/Stream         CBC             CFB             OFB             CTR
       --------------- --------------- --------------- --------------- ---------------
AES     1.77x   1.75x   1.78x   1.76x   1.76x   1.77x   1.75x   1.76x   1.76x   1.82x
AES192  1.80x   1.73x   1.81x   1.76x   1.79x   1.85x   1.77x   1.76x   1.80x   1.85x
AES256  1.81x   1.77x   1.81x   1.77x   1.80x   1.79x   1.78x   1.77x   1.81x   1.85x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago blowfish: add amd64 assembly implementation
Jussi Kivilinna [Wed, 29 May 2013 13:40:27 +0000 (16:40 +0300)]
blowfish: add amd64 assembly implementation

* cipher/Makefile.am: Add 'blowfish-amd64.S'.
* cipher/blowfish-amd64.S: New file.
* cipher/blowfish.c (USE_AMD64_ASM): New macro.
[USE_AMD64_ASM] (_gcry_blowfish_amd64_do_encrypt)
(_gcry_blowfish_amd64_encrypt_block)
(_gcry_blowfish_amd64_decrypt_block, _gcry_blowfish_amd64_ctr_enc)
(_gcry_blowfish_amd64_cbc_dec, _gcry_blowfish_amd64_cfb_dec): New
prototypes.
[USE_AMD64_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block)
(encrypt_block, decrypt_block): New functions.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec, selftest_ctr, selftest_cbc, selftest_cfb): New
functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_BLOWFISH]: Register Blowfish
bulk functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (blowfish) [x86_64]: Add 'blowfish-amd64.lo'.
* src/cipher.h (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): New prototypes.
--

Add non-parallel functions for a small speed-up and 4-way parallel functions
for modes of operation that support parallel processing.

Speed old vs. new on AMD Phenom II X6 1055T:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
BLOWFISH      1.21x   1.12x   1.17x   3.52x   1.18x   3.34x   1.16x   1.15x   3.38x   3.47x

Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
BLOWFISH      1.16x   1.10x   1.17x   2.98x   1.18x   2.88x   1.16x   1.15x   3.00x   3.02x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago ecc: Simplify the compliant point generation.
Werner Koch [Fri, 24 May 2013 14:54:52 +0000 (16:54 +0200)]
ecc: Simplify the compliant point generation.

* cipher/ecc.c (generate_key): Use point_snatch_set, replaces unneeded
variable copies, etc.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years ago ecc: Fix a minor flaw in the generation of K.
Werner Koch [Fri, 24 May 2013 13:52:37 +0000 (15:52 +0200)]
ecc: Fix a minor flaw in the generation of K.

* cipher/dsa.c (gen_k): Factor code out to ..
* cipher/dsa-common.c (_gcry_dsa_gen_k): new file and function.  Add
arg security_level and re-indent a bit.
* cipher/ecc.c (gen_k): Remove and change callers to _gcry_dsa_gen_k.
* cipher/dsa.c: Include pubkey-internal.
* cipher/Makefile.am (libcipher_la_SOURCES): Add dsa-common.c
--

The ECDSA code used the simple $k = k \bmod p$ method which introduces
a small bias.  We now use the bias free method we have always used
with DSA.
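
The difference between the two approaches can be sketched with small
integers. Rejection sampling is used below as one common bias-free method;
whether it matches _gcry_dsa_gen_k in detail is an assumption. The generator
is a tiny deterministic stand-in, not a CSPRNG, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tiny deterministic generator so the sketch is reproducible; NOT a CSPRNG. */
static uint32_t demo_state = 12345u;

static uint32_t demo_rand_bits(unsigned int nbits)
{
  assert(nbits >= 1 && nbits <= 16);
  demo_state = demo_state * 1103515245u + 12345u;
  return (demo_state >> 16) & ((1u << nbits) - 1u);
}

/* Biased: results below (2^nbits mod m) are produced more often than the
   rest, and 0 is a possible result. */
static uint32_t k_by_reduction(uint32_t m, unsigned int nbits)
{
  assert(m > 1);
  return demo_rand_bits(nbits) % m;
}

/* Bias-free: draw again until the candidate already lies in [1, m-1], so
   every accepted value is equally likely. */
static uint32_t k_by_rejection(uint32_t m, unsigned int nbits)
{
  uint32_t k;

  assert(m > 1 && m <= (1u << nbits));
  do
    k = demo_rand_bits(nbits);
  while (k == 0 || k >= m);
  return k;
}
```

For m = 11 and 4 random bits, reduction maps sixteen inputs onto eleven
results, so the results 0 to 4 occur twice as often as 5 to 10; rejection
discards the out-of-range draws instead.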

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years ago cast5: add amd64 assembly implementation
Jussi Kivilinna [Fri, 24 May 2013 09:43:29 +0000 (12:43 +0300)]
cast5: add amd64 assembly implementation

* cipher/Makefile.am: Add 'cast5-amd64.S'.
* cipher/cast5-amd64.S: New file.
* cipher/cast5.c (USE_AMD64_ASM): New macro.
(_gcry_cast5_s1tos4): Merge arrays s1, s2, s3, s4 to single array to
simplify access from assembly implementation.
(s1, s2, s3, s4): New macros pointing to subarrays in
_gcry_cast5_s1tos4.
[USE_AMD64_ASM] (_gcry_cast5_amd64_encrypt_block)
(_gcry_cast5_amd64_decrypt_block, _gcry_cast5_amd64_ctr_enc)
(_gcry_cast5_amd64_cbc_dec, _gcry_cast5_amd64_cfb_dec): New prototypes.
[USE_AMD64_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block)
(decrypt_block): New functions.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec)
(selftest_ctr, selftest_cbc, selftest_cfb): New functions.
(selftest): Call new bulk selftests.
* cipher/cipher.c (gcry_cipher_open) [USE_CAST5]: Register CAST5 bulk
functions for ctr-enc, cbc-dec and cfb-dec.
* configure.ac (cast5) [x86_64]: Add 'cast5-amd64.lo'.
* src/cipher.h (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec)
(_gcry_cast5_cfb_dec): New prototypes.
--

Provides non-parallel implementations for a small speed-up and 4-way parallel
implementations that get accelerated on `out-of-order' CPUs.
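
The table merge listed in the ChangeLog entry for _gcry_cast5_s1tos4 can be
sketched as follows, assuming a two-dimensional layout. The table contents
are placeholders rather than the CAST5 S-boxes, and demo_s1tos4, demo_fill
and demo_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous table instead of four separate arrays, so that external
   code needs only a single base address. */
static uint32_t demo_s1tos4[4][256];

/* The old names stay usable in C code as views into the merged table. */
#define s1 demo_s1tos4[0]
#define s2 demo_s1tos4[1]
#define s3 demo_s1tos4[2]
#define s4 demo_s1tos4[3]

/* Fill the table with recognizable placeholder values. */
static void demo_fill(void)
{
  for (unsigned int b = 0; b < 4; b++)
    for (unsigned int i = 0; i < 256; i++)
      demo_s1tos4[b][i] = b * 256u + i;
}

/* Indexed access, the way code holding only the base address would do it. */
static uint32_t demo_lookup(unsigned int box, unsigned int idx)
{
  assert(box < 4 && idx < 256);
  return demo_s1tos4[box][idx];
}
```

Existing expressions such as s3[5] keep compiling unchanged, while the four
boxes now sit at fixed offsets from one symbol.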

Speed old vs. new on AMD Phenom II X6 1055T:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
CAST5         1.23x   1.22x   1.21x   2.86x   1.21x   2.83x   1.22x   1.17x   2.73x   2.73x

Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
CAST5         1.00x   1.04x   1.06x   2.56x   1.06x   2.37x   1.03x   1.01x   2.43x   2.41x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago cipher-selftest: make selftest work with any block-size
Jussi Kivilinna [Fri, 24 May 2013 09:43:24 +0000 (12:43 +0300)]
cipher-selftest: make selftest work with any block-size

* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc_128)
(_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed
functions from '<name>_128' to '<name>'.
(_gcry_selftest_helper_cbc, _gcry_selftest_helper_cfb)
(_gcry_selftest_helper_ctr): Make work with different block sizes.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cbc_128)
(_gcry_selftest_helper_cfb_128, _gcry_selftest_helper_ctr_128): Renamed
prototypes from '<name>_128' to '<name>'.
* cipher/camellia-glue.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Change to use new function names.
* cipher/rijndael.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Change to use new function names.
* cipher/serpent.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Change to use new function names.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago serpent: add parallel processing for CFB decryption
Jussi Kivilinna [Thu, 23 May 2013 11:15:51 +0000 (14:15 +0300)]
serpent: add parallel processing for CFB decryption

* cipher/cipher.c (gcry_cipher_open): Add bulk CFB decryption function
for Serpent.
* cipher/serpent-sse2-amd64.S (_gcry_serpent_sse2_cfb_dec): New
function.
* cipher/serpent.c (_gcry_serpent_sse2_cfb_dec): New prototype.
(_gcry_serpent_cfb_dec): New function.
(selftest_cfb_128): New function.
(selftest): Call selftest_cfb_128.
* src/cipher.h (_gcry_serpent_cfb_dec): New prototype.
--

Patch makes Serpent-CFB decryption 4.0 times faster on Intel Sandy-Bridge and
2.7 times faster on AMD K10.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago camellia: add parallel processing for CFB decryption
Jussi Kivilinna [Thu, 23 May 2013 11:15:46 +0000 (14:15 +0300)]
camellia: add parallel processing for CFB decryption

* cipher/camellia-aesni-avx-amd64.S
(_gcry_camellia_aesni_avx_cfb_dec): New function.
* cipher/camellia-glue.c (_gcry_camellia_aesni_avx_cfb_dec): New
prototype.
(_gcry_camellia_cfb_dec): New function.
(selftest_cfb_128): New function.
(selftest): Call selftest_cfb_128.
* cipher/cipher.c (gcry_cipher_open): Add bulk CFB decryption function
for Camellia.
* src/cipher.h (_gcry_camellia_cfb_dec): New prototype.
--

Patch makes Camellia-CFB decryption 4.7 times faster on Intel Sandy-Bridge.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago rijndael: add parallel processing for CFB decryption with AES-NI
Jussi Kivilinna [Thu, 23 May 2013 11:15:41 +0000 (14:15 +0300)]
rijndael: add parallel processing for CFB decryption with AES-NI

* cipher/cipher-selftest.c (_gcry_selftest_helper_cfb_128): New
function for CFB selftests.
* cipher/cipher-selftest.h (_gcry_selftest_helper_cfb_128): New
prototype.
* cipher/rijndael.c [USE_AESNI] (do_aesni_enc_vec4): New function.
(_gcry_aes_cfb_dec) [USE_AESNI]: Add parallelized CFB decryption.
(selftest_cfb_128): New function.
(selftest): Call selftest_cfb_128.
--

CFB decryption can be parallelized for additional performance. On Intel
Sandy-Bridge processor, this change makes CFB decryption 4.6 times faster.
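
Why CFB decryption parallelizes can be sketched with a toy cipher. In CFB,
each plaintext block is the ciphertext block XORed with the encryption of the
previous ciphertext block, and all ciphertext blocks are known up front. The
code below is an illustration only: toy_enc is not AES, blocks are 32-bit
words, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy block "cipher" on 32-bit blocks; NOT AES. */
static uint32_t toy_enc(uint32_t x, uint32_t k)
{
  x ^= k;
  x = (x << 7) | (x >> 25);
  return x + k;
}

/* Encryption is serial: block i needs ciphertext block i-1 first. */
static void cfb_encrypt(uint32_t *c, const uint32_t *p, size_t n,
                        uint32_t iv, uint32_t k)
{
  uint32_t prev = iv;

  for (size_t i = 0; i < n; i++)
    {
      c[i] = p[i] ^ toy_enc(prev, k);
      prev = c[i];
    }
}

/* Decryption: every iteration reads only ciphertext (or the IV), so the
   iterations are independent and could run side by side. */
static void cfb_decrypt(uint32_t *p, const uint32_t *c, size_t n,
                        uint32_t iv, uint32_t k)
{
  assert(p != c);   /* in-place use would overwrite inputs still needed */
  for (size_t i = 0; i < n; i++)
    p[i] = c[i] ^ toy_enc(i ? c[i - 1] : iv, k);
}
```

Note that decryption also uses the cipher's encryption direction only, which
is why a vectorized encryption helper is enough to speed it up.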

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Avoid compiler warning due to the global symbol setkey.
Werner Koch [Thu, 18 Apr 2013 12:40:43 +0000 (14:40 +0200)]
Avoid compiler warning due to the global symbol setkey.

* cipher/cipher-selftest.c (_gcry_selftest_helper_cbc_128)
(_gcry_selftest_helper_ctr_128): Rename setkey to setkey_func.
--

setkey is a POSIX.1 function declared in stdlib.h.

6 years ago serpent: add SSE2 accelerated amd64 implementation
Jussi Kivilinna [Thu, 23 May 2013 08:04:18 +0000 (11:04 +0300)]
serpent: add SSE2 accelerated amd64 implementation

* configure.ac (serpent): Add 'serpent-sse2-amd64.lo'.
* cipher/Makefile.am (EXTRA_libcipher_la_SOURCES): Add
'serpent-sse2-amd64.S'.
* cipher/cipher.c (gcry_cipher_open) [USE_SERPENT]: Register bulk
functions for CBC-decryption and CTR-mode.
* cipher/serpent.c (USE_SSE2): New macro.
[USE_SSE2] (_gcry_serpent_sse2_ctr_enc, _gcry_serpent_sse2_cbc_dec):
New prototypes to assembler functions.
(serpent_setkey): Set 'serpent_init_done' before calling serpent_test.
(_gcry_serpent_ctr_enc): New function.
(_gcry_serpent_cbc_dec): New function.
(selftest_ctr_128): New function.
(selftest_cbc_128): New function.
(selftest): Call selftest_ctr_128 and selftest_cbc_128.
* cipher/serpent-sse2-amd64.S: New file.
* src/cipher.h (_gcry_serpent_ctr_enc): New prototype.
(_gcry_serpent_cbc_dec): New prototype.
--

[v2]: Converted to SSE2, to support all amd64 processors (SSE2 is a required
      feature of the AMD64 SysV ABI).

Patch adds word-sliced SSE2 implementation of Serpent for amd64 for speeding
up parallelizable workloads (CTR mode, CBC mode decryption). Implementation
processes eight blocks in parallel, with two four-block sets interleaved for
out-of-order scheduling.
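
The reason CBC decryption counts as a parallelizable workload while CBC
encryption does not can be sketched with a toy cipher. This is an
illustration only: toy_enc and toy_dec are not Serpent, blocks are 32-bit
words, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy invertible block "cipher" on 32-bit blocks; NOT Serpent. */
static uint32_t toy_enc(uint32_t x, uint32_t k)
{
  x ^= k;
  x = (x << 7) | (x >> 25);
  return x + k;
}

static uint32_t toy_dec(uint32_t y, uint32_t k)
{
  y -= k;
  y = (y >> 7) | (y << 25);
  return y ^ k;
}

/* Encryption is a chain: block i cannot start before block i-1 is done. */
static void cbc_encrypt(uint32_t *c, const uint32_t *p, size_t n,
                        uint32_t iv, uint32_t k)
{
  uint32_t prev = iv;

  for (size_t i = 0; i < n; i++)
    {
      c[i] = toy_enc(p[i] ^ prev, k);
      prev = c[i];
    }
}

/* Decryption: each iteration depends only on ciphertext blocks (or the
   IV), so several blocks can be decrypted at the same time. */
static void cbc_decrypt(uint32_t *p, const uint32_t *c, size_t n,
                        uint32_t iv, uint32_t k)
{
  assert(p != c);   /* in-place use would overwrite inputs still needed */
  for (size_t i = 0; i < n; i++)
    p[i] = toy_dec(c[i], k) ^ (i ? c[i - 1] : iv);
}
```

This matches the pattern in the tables below, where CBC gains appear only in
the second (decryption) column.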

Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   0.99x   1.00x   3.98x   1.00x   1.01x   1.00x   1.01x   4.04x   4.04x

Speed old vs. new on AMD Phenom II X6 1055T:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.02x   1.01x   1.00x   2.83x   1.00x   1.00x   1.00x   1.00x   2.72x   2.72x

Speed old vs. new on Intel Core2 Duo T8100:
                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
SERPENT128    1.00x   1.02x   0.97x   4.02x   0.98x   1.01x   0.98x   1.00x   3.82x   3.91x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago Serpent: faster S-box implementation
Jussi Kivilinna [Thu, 23 May 2013 08:04:13 +0000 (11:04 +0300)]
Serpent: faster S-box implementation

* cipher/serpent.c (SBOX0, SBOX1, SBOX2, SBOX3, SBOX4, SBOX5, SBOX6)
(SBOX7, SBOX0_INVERSE, SBOX1_INVERSE, SBOX2_INVERSE, SBOX3_INVERSE)
(SBOX4_INVERSE, SBOX5_INVERSE, SBOX6_INVERSE, SBOX7_INVERSE): Replace
with new definitions.
--

These new S-box definitions are from the paper:
 D. A. Osvik, “Speeding up Serpent,” in Third AES Candidate Conference,
 (New York, New York, USA), p. 317–329, National Institute of Standards and
 Technology, 2000. Available at http://www.ii.uib.no/~osvik/pub/aes3.ps.gz

Although these were optimized for two-operand instructions on i386 and for
old Pentium-1 processors, they are slightly faster on current processors
on i386 and x86-64. On ARM, the performance of these S-boxes is about the
same as with the old S-boxes.

new vs old speed ratios (AMD K10, x86-64):
                 ECB/Stream         CBC             CFB             OFB             CTR
              --------------- --------------- --------------- --------------- ---------------
 SERPENT128     1.06x   1.02x   1.06x   1.02x   1.06x   1.06x   1.06x   1.05x   1.07x   1.07x

new vs old speed ratios (Intel Atom, i486):
                 ECB/Stream         CBC             CFB             OFB             CTR
              --------------- --------------- --------------- --------------- ---------------
 SERPENT128     1.12x   1.15x   1.12x   1.15x   1.13x   1.11x   1.12x   1.12x   1.12x   1.13x

new vs old speed ratios (ARM Cortex A8):
                 ECB/Stream         CBC             CFB             OFB             CTR
              --------------- --------------- --------------- --------------- ---------------
 SERPENT128     1.04x   1.02x   1.02x   0.99x   1.02x   1.02x   1.03x   1.03x   1.01x   1.01x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years ago w32: Fix installing of .def file.
Werner Koch [Thu, 18 Apr 2013 12:40:43 +0000 (14:40 +0200)]
w32: Fix installing of .def file.

* src/Makefile.am (install-def-file): Create libdir first.
--

Reported-by: LRN <lrn1986@gmail.com>
6 years ago Register a DCO.
Werner Koch [Thu, 25 Apr 2013 11:00:16 +0000 (12:00 +0100)]
Register a DCO.

--