libgcrypt.git
Jussi Kivilinna [Fri, 13 Dec 2013 10:47:56 +0000 (12:47 +0200)]
SHA-1: Add SSSE3 implementation

* cipher/Makefile.am: Add 'sha1-ssse3-amd64.c'.
* cipher/sha1-ssse3-amd64.c: New.
* cipher/sha1.c (USE_SSSE3): New.
(SHA1_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha1_init) [USE_SSSE3]: Initialize 'use_ssse3'.
(transform): Rename to...
(_transform): this.
(transform): New.
* configure.ac [host=x86_64]: Add 'sha1-ssse3-amd64.lo'.
--

Patch adds SSSE3 implementation based on white paper "Improving the Performance
of the Secure Hash Algorithm (SHA-1)" at
 http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1

Benchmarks:

cpu                Old        New        Diff
Intel i5-4570      9.02 c/B   5.22 c/B   1.72x
Intel i5-2450M     12.27 c/B  7.24 c/B   1.69x
Intel Core2 T8100  7.94 c/B   6.76 c/B   1.17x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
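The rename of 'transform' to '_transform' plus a new 'transform' suggests a small run-time dispatcher. A minimal sketch of that pattern follows; the struct, the function names and the call counters are hypothetical stand-ins for illustration, not the code in cipher/sha1.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SHA1_CONTEXT; only what the sketch needs. */
typedef struct
{
  int use_ssse3;        /* decided once at init time from HW features */
  int calls_generic;    /* bookkeeping for the sketch only */
  int calls_ssse3;      /* bookkeeping for the sketch only */
} sketch_sha1_ctx;

/* Placeholder for the portable C block function ("_transform"). */
static void
sketch_transform_generic (sketch_sha1_ctx *ctx, const unsigned char *block)
{
  (void)block;
  ctx->calls_generic++;
}

/* Placeholder for the SSSE3 block function. */
static void
sketch_transform_ssse3 (sketch_sha1_ctx *ctx, const unsigned char *block)
{
  (void)block;
  ctx->calls_ssse3++;
}

/* The feature flag is stored in the context when it is initialized. */
static void
sketch_sha1_init (sketch_sha1_ctx *ctx, int have_ssse3)
{
  assert (ctx != NULL);
  memset (ctx, 0, sizeof *ctx);
  ctx->use_ssse3 = (have_ssse3 != 0);
}

/* New "transform": pick the implementation per call from the flag. */
static void
sketch_transform (sketch_sha1_ctx *ctx, const unsigned char *block)
{
  if (ctx->use_ssse3)
    sketch_transform_ssse3 (ctx, block);
  else
    sketch_transform_generic (ctx, block);
}
```

Keeping the flag in the context means the feature lookup happens once per context rather than once per block.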
Jussi Kivilinna [Fri, 13 Dec 2013 14:14:05 +0000 (16:14 +0200)]
Add missing register clearing to SHA-256 and SHA-512 assembly

* cipher/sha256-ssse3-amd64.S: Clear used XMM/YMM registers at return.
* cipher/sha512-avx-amd64.S: Ditto.
* cipher/sha512-avx2-bmi2-amd64.S: Ditto.
* cipher/sha512-ssse3-amd64.S: Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Fri, 13 Dec 2013 13:52:21 +0000 (14:52 +0100)]
Update license information

* LICENSES: New.
* Makefile.am (EXTRA_DIST): Add LICENSES.
* AUTHORS: Add list of copyright holders.
* README: Reference AUTHORS.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Fri, 13 Dec 2013 09:53:26 +0000 (10:53 +0100)]
doc: Minor manual fix.

--

Jussi Kivilinna [Thu, 12 Dec 2013 22:00:08 +0000 (00:00 +0200)]
Fix empty clobber in AVX2 assembly check

* configure.ac (gcry_cv_gcc_inline_asm_avx2): Add "cc" as assembly
clobber.
--

Apparently empty clobbers only work in some cases on Linux, and fail on
mingw32.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
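For illustration, this is what a non-empty clobber list looks like in GCC-style extended asm. The function below is a hypothetical example, not the configure.ac test itself; it uses a plain 'addl' (which modifies the flags register, hence "cc") so that it can run on any x86 CPU, and falls back to C elsewhere.

```c
#include <assert.h>

/* Add B to A.  On x86 with a GCC-compatible compiler the addition is
   done in inline assembly and "cc" is named in the clobber list.  */
static unsigned int
sketch_add_cc_clobber (unsigned int a, unsigned int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __asm__ ("addl %1, %0"
           : "+r" (a)      /* output: a is read and written */
           : "r" (b)       /* input */
           : "cc");        /* clobber: condition codes */
  return a;
#else
  return a + b;            /* portable fallback for other targets */
#endif
}
```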
Jussi Kivilinna [Thu, 12 Dec 2013 21:53:28 +0000 (23:53 +0200)]
Fix W32 build

* random/rndw32.c (registry_poll, slow_gatherer): Change gcry_xmalloc to
xmalloc, and gcry_xrealloc to xrealloc.
--

Patch fixes following errors:

../random/.libs/librandom.a(rndw32.o): In function `registry_poll':
.../libgcrypt/random/rndw32.c:434: undefined reference to `__gcry_USE_THE_UNDERSCORED_FUNCTION'
.../libgcrypt/random/rndw32.c:454: undefined reference to `__gcry_USE_THE_UNDERSCORED_FUNCTION'
../random/.libs/librandom.a(rndw32.o): In function `slow_gatherer':
.../random/rndw32.c:658: undefined reference to `__gcry_USE_THE_UNDERSCORED_FUNCTION'

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 12 Dec 2013 11:56:13 +0000 (13:56 +0200)]
SHA-512: Add AVX and AVX2 implementations for x86-64

* cipher/Makefile.am: Add 'sha512-avx-amd64.S' and
'sha512-avx2-bmi2-amd64.S'.
* cipher/sha512-avx-amd64.S: New.
* cipher/sha512-avx2-bmi2-amd64.S: New.
* cipher/sha512.c (USE_AVX, USE_AVX2): New.
(SHA512_CONTEXT) [USE_AVX]: Add 'use_avx'.
(SHA512_CONTEXT) [USE_AVX2]: Add 'use_avx2'.
(sha512_init, sha384_init) [USE_AVX]: Initialize 'use_avx'.
(sha512_init, sha384_init) [USE_AVX2]: Initialize 'use_avx2'.
[USE_AVX] (_gcry_sha512_transform_amd64_avx): New.
[USE_AVX2] (_gcry_sha512_transform_amd64_avx2): New.
(transform) [USE_AVX2]: Add call for AVX2 implementation.
(transform) [USE_AVX]: Add call for AVX implementation.
* configure.ac (HAVE_GCC_INLINE_ASM_BMI2): New check.
(sha512): Add 'sha512-avx-amd64.lo' and 'sha512-avx2-bmi2-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-cpu' and 'intel-bmi2'.
* src/g10lib.h (HWF_INTEL_CPU, HWF_INTEL_BMI2): New.
* src/hwfeatures.c (hwflist): Add "intel-cpu" and "intel-bmi2".
* src/hwf-x86.c (detect_x86_gnuc): Check for HWF_INTEL_CPU and
HWF_INTEL_BMI2.
--

Patch adds fast AVX and AVX2 implementation of SHA-512 by Intel Corporation.
The assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
 http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs

Implementation is described in white paper
 "Fast SHA512 Implementations on Intel® Architecture Processors"
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementat$

Note: AVX implementation uses SHLD instruction to emulate RORQ, since it's
      faster on Intel Sandy-Bridge. However, on non-Intel CPUs SHLD is much
      slower than RORQ, so the AVX implementation is (for now) limited
      to Intel CPUs.
Note: AVX2 implementation also uses BMI2 instruction rorx, thus additional
      HWF flag.

Benchmarks:

cpu                 Old         SSSE3       AVX/AVX2   Old vs AVX/AVX2   SSSE3 vs AVX/AVX2
Intel i5-4570       10.11 c/B    7.56 c/B   6.72 c/B   1.50x             1.12x
Intel i5-2450M      14.11 c/B   10.53 c/B   8.88 c/B   1.58x             1.18x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
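The SHLD note above relies on a simple identity: SHLD with the same register as both operands is a left rotate, and a left rotate by 64-n equals a right rotate by n. A sketch in plain C (helper names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by N bits, 0 < N < 64. */
static uint64_t
sketch_rol64 (uint64_t x, unsigned int n)
{
  assert (n > 0 && n < 64);
  return (x << n) | (x >> (64 - n));
}

/* Rotate right by N bits, 0 < N < 64 (what RORQ computes). */
static uint64_t
sketch_ror64 (uint64_t x, unsigned int n)
{
  assert (n > 0 && n < 64);
  return (x >> n) | (x << (64 - n));
}

/* Model of "shld $n, r, r": the destination is shifted left and the
   vacated low bits are filled from the top of the source.  With the
   same register as source and destination this is a left rotate.  */
static uint64_t
sketch_shld_same_reg (uint64_t r, unsigned int n)
{
  return sketch_rol64 (r, n);
}

/* RORQ by N emulated through SHLD by 64-N. */
static uint64_t
sketch_ror64_via_shld (uint64_t x, unsigned int n)
{
  return sketch_shld_same_reg (x, 64 - n);
}
```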
Jussi Kivilinna [Thu, 12 Dec 2013 10:43:08 +0000 (12:43 +0200)]
SHA-512: Add SSSE3 implementation for x86-64

* cipher/Makefile.am: Add 'sha512-ssse3-amd64.S'.
* cipher/sha512-ssse3-amd64.S: New.
* cipher/sha512.c (USE_SSSE3): New.
(SHA512_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha512_init, sha384_init) [USE_SSSE3]: Initialize 'use_ssse3'.
[USE_SSSE3] (_gcry_sha512_transform_amd64_ssse3): New.
(transform) [USE_SSSE3]: Call SSSE3 implementation.
* configure.ac (sha512): Add 'sha512-ssse3-amd64.lo'.
--

Patch adds fast SSSE3 implementation of SHA-512 by Intel Corporation. The
assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
 http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs

Implementation is described in white paper
 "Fast SHA512 Implementations on Intel® Architecture Processors"
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/fast-sha512-implementations-ia-processors-paper.html

Benchmarks:

cpu                 Old         New         Diff
Intel i5-4570       10.11 c/B    7.56 c/B   1.33x
Intel i5-2450M      14.11 c/B   10.53 c/B   1.33x
Intel Core2 T8100   11.92 c/B   10.22 c/B   1.16x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 11 Dec 2013 17:32:08 +0000 (19:32 +0200)]
SHA-256: Add SSSE3 implementation for x86-64

* cipher/Makefile.am: Add 'sha256-ssse3-amd64.S'.
* cipher/sha256-ssse3-amd64.S: New.
* cipher/sha256.c (USE_SSSE3): New.
(SHA256_CONTEXT) [USE_SSSE3]: Add 'use_ssse3'.
(sha256_init, sha224_init) [USE_SSSE3]: Initialize 'use_ssse3'.
(transform): Rename to...
(_transform): This.
[USE_SSSE3] (_gcry_sha256_transform_amd64_ssse3): New.
(transform): New.
* configure.ac (HAVE_INTEL_SYNTAX_PLATFORM_AS): New check.
(sha256): Add 'sha256-ssse3-amd64.lo'.
* doc/gcrypt.texi: Document 'intel-ssse3'.
* src/g10lib.h (HWF_INTEL_SSSE3): New.
* src/hwfeatures.c (hwflist): Add "intel-ssse3".
* src/hwf-x86.c (detect_x86_gnuc): Test for SSSE3.
--

Patch adds fast SSSE3 implementation of SHA-256 by Intel Corporation. The
assembly source is licensed under 3-clause BSD license, thus compatible
with LGPL2.1+. Original source can be accessed at:
 http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing#docs

Implementation is described in white paper
 "Fast SHA-256 Implementations on Intel® Architecture Processors"
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html

Benchmarks:

cpu                 Old         New         Diff
Intel i5-4570       13.99 c/B   10.66 c/B   1.31x
Intel i5-2450M      21.53 c/B   15.79 c/B   1.36x
Intel Core2 T8100   20.84 c/B   15.07 c/B   1.38x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Thu, 12 Dec 2013 19:26:56 +0000 (20:26 +0100)]
Add a configuration file to disable hardware features.

* src/hwfeatures.c: Include syslog.h and ctype.h.
(HWF_DENY_FILE): New.
(my_isascii): New.
(parse_hwf_deny_file): New.
(_gcry_detect_hw_features): Call it.

* src/mpicalc.c (main): Correctly initialize Libgcrypt.  Add options
"--print-config" and "--disable-hwf".

Signed-off-by: Werner Koch <wk@gnupg.org>
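A rough sketch of the per-line handling such a deny file needs. The file format is an assumption here (one feature name per line, blank lines and lines starting with '#' ignored); the commit message does not spell it out, and the helper name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim LINE in place and return the feature name it carries, or NULL
   for blank lines and '#' comments.  Assumed format, see above.  */
static char *
sketch_deny_line_feature (char *line)
{
  char *p, *end;

  assert (line != NULL);

  /* Skip leading white space. */
  for (p = line; *p && isspace ((unsigned char)*p); p++)
    ;

  /* Cut trailing white space, including the newline. */
  end = p + strlen (p);
  while (end > p && isspace ((unsigned char)end[-1]))
    *--end = '\0';

  if (*p == '\0' || *p == '#')
    return NULL;
  return p;
}
```

Each non-NULL result would then be handed to whatever disables a feature by name; unknown names are best reported rather than silently ignored.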
Werner Koch [Thu, 12 Dec 2013 17:53:39 +0000 (18:53 +0100)]
Move list of hardware features to hwfeatures.c.

* src/global.c (hwflist, disabled_hw_features): Move to ...
* src/hwfeatures.c: here.
(_gcry_disable_hw_feature): New.
(_gcry_enum_hw_features): New.
(_gcry_detect_hw_features): Remove arg DISABLED_FEATURES.
* src/global.c (print_config, _gcry_vcontrol, global_init): Adjust
accordingly.
--

It is better to keep the hardware feature info in one place.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Thu, 12 Dec 2013 14:13:09 +0000 (15:13 +0100)]
Remove macro hacks for internal vs. external functions.  Part 2 and last.

* src/visibility.h: Remove remaining define/undef hacks for symbol
visibility.  Add macros to detect the use of the public functions.
Change all affected functions by replacing them by the x-macros.
* src/g10lib.h: Add internal prototypes.
(xtrymalloc, xtrycalloc, xtrymalloc_secure, xtrycalloc_secure)
(xtryrealloc, xtrystrdup, xmalloc, xcalloc, xmalloc_secure)
(xcalloc_secure, xrealloc, xstrdup, xfree): New macros.

--

The use of xmalloc/xtrymalloc/xfree is a more common pattern than the
gcry_free etc. functions.  Those functions behave like those defined
by C and thus for better readability we use these macros and not
the underscore prefixed functions.

Signed-off-by: Werner Koch <wk@gnupg.org>
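The macro layer described above can be pictured as follows. The 'sketch_internal_*' functions are hypothetical stand-ins for the underscore-prefixed internal allocators; only the macro names xmalloc and xfree come from the changelog, and the exact definitions in src/g10lib.h are assumed, not quoted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for an internal allocator that never returns NULL. */
static void *
sketch_internal_xmalloc (size_t n)
{
  void *p = malloc (n ? n : 1);
  if (!p)
    {
      fputs ("out of core\n", stderr);
      abort ();
    }
  return p;
}

/* Stand-in for the internal free function. */
static void
sketch_internal_free (void *p)
{
  free (p);
}

/* Short names for internal callers, resolved by the preprocessor. */
#define xmalloc(n)  sketch_internal_xmalloc ((n))
#define xfree(p)    sketch_internal_free ((p))
```

Internal code then reads like ordinary C allocation code while still bypassing the public entry points.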
Werner Koch [Wed, 11 Dec 2013 15:59:41 +0000 (16:59 +0100)]
random: Add a feature to close device file descriptors.

* src/gcrypt.h.in (GCRYCTL_CLOSE_RANDOM_DEVICE): New.
* src/global.c (_gcry_vcontrol): Call _gcry_random_close_fds.
* random/random.c (_gcry_random_close_fds): New.
* random/random-csprng.c (_gcry_rngcsprng_close_fds): New.
* random/random-fips.c (_gcry_rngfips_close_fds): New.
* random/random-system.c (_gcry_rngsystem_close_fds): New.
* random/rndlinux.c (open_device): Add arg retry.
(_gcry_rndlinux_gather_random): Add mode to close open fds.

* tests/random.c (check_close_random_device): New.
(main): Call new test.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
Fix last commit (9a37470c)

* src/secmem.c (lock_pool): Remove remaining line.  Reported by Ian
Goldberg.

Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
Fix one-off memory leak when built with Linux capability support.

* src/secmem.c (lock_pool, secmem_init): Use cap_free.  Reported by
Mike Crowe <mac@mcrowe.com>.

Signed-off-by: Werner Koch <wk@gnupg.org>
David 'Digit' Turner [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
Update libtool to support Android.

* m4/libtool.m4: Add "linux*android*" case.  Taken from the libtool
repository.
--

The patch, which cleanly applies, is

  commit 8eeeb00daef8c4f720c9b79a0cdb89225d9909b6
  Author: David 'Digit' Turner <digit@google.com>
  Date:   Tue Oct 8 14:37:32 2013 -0700

  This patch adds proper Android support to libtool. The main
  issues are the following:

      - Versioned libraries are not supported by the platform and
        its build/packaging tools.

      - The dynamic linker is not GNU ld, there is no support for
        DT_RUNPATH.

      - Similarly, there is no ldconfig.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
tests: Speed up benchmarks in regression test mode.

* tests/tsexp.c (check_extract_param): Fix compiler warning.
* tests/Makefile.am (TESTS_ENVIRONMENT): Set GCRYPT_IN_REGRESSION_TEST.
* tests/bench-slope.c (main): Speed up if in regression test mode.
* tests/benchmark.c (main): Ditto.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
tests: Add --csv option to bench-slope.

* tests/bench-slope.c (STR, STR2): New.
(csv_mode): New.
(num_measurement_repetitions): New.  Replace use of
NUM_MEASUREMENT_REPETITIONS by this.
(current_section_name, current_algo_name, current_mode_name): New.
(bench_print_result_csv): New.
(bench_print_result_std): Rename from bench_print_result.
(bench_print_result): New. Divert depending on CSV_MODE.
(bench_print_header, bench_print_footer): Take care of CSV_MODE.
(bench_print_algo, bench_print_mode): New.  Use them instead of
explicit printfs.
(main): Add options --csv and --repetitions.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
sexp: Allow long names and white space in gcry_sexp_extract_param.

* src/sexp.c (_gcry_sexp_vextract_param): Skip white space.  Support
long parameter names.
* tests/tsexp.c (check_extract_param): Add test cases for long parameter
names and white space.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
ecc: Merge partly duplicated code.

* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_sign): Factor A hashing out to ...
(_gcry_ecc_eddsa_compute_h_d): new function.
* cipher/ecc-misc.c (_gcry_ecc_compute_public): Use new function.
(reverse_buffer): Remove.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
ecc: Remove unused internal function.

* src/cipher-proto.h (gcry_pk_spec): Remove get_param.
* cipher/ecc-curves.c (_gcry_ecc_get_param_sexp): Merge in code from
_gcry_ecc_get_param.
(_gcry_ecc_get_param): Remove.
* cipher/ecc.c (_gcry_pubkey_spec_ecc): Remove _gcry_ecc_get_param.

Signed-off-by: Werner Koch <wk@gnupg.org>
Jussi Kivilinna [Fri, 6 Dec 2013 00:02:06 +0000 (02:02 +0200)]
Fix building on mingw32

* src/gcrypt-int.h: Include <types.h>.
--

'ulong' is not defined on W32, so we need to include "types.h" in
'gcrypt-int.h'.

 In file included from ../src/visibility.h:53:0,
                  from ../src/g10lib.h:39,
                  from compat.c:22:
 ../src/gcrypt-int.h:365:49: error: unknown type name 'ulong'

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
ecc: Change OID for Ed25519.

* cipher/ecc-curves.c (curve_aliased): Add more suitable OID for
Ed25519.
--

The formerly used OID has been assigned by Peter Gutmann for
Curve25519.  We better keep them distinct and assign a separate one
for Ed25519.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
Remove macro hacks for internal vs. external functions.  Part 1.

* src/visibility.h: Remove almost all define/undef hacks for symbol
visibility.  Add macros to detect the use of the public functions.
Change all affected functions by prefixing them explicitly with an
underscore and change all internal callers to call the underscore
prefixed versions.  Provide convenience macros from sexp and mpi
functions.
* src/visibility.c: Change all functions to use only gpg_err_code_t
and translate to gpg_error_t only in visibility.c.
--

The use of the macro magic made it hard to follow the function calls
in the source.  It was not easy to see if an internal or external
function (as defined by visibility.c) was called.  The change is quite
large but hopefully makes Libgcrypt easier to maintain.  Some
functions have not yet been fixed; this will be done soon.

Because Libgcrypt does not make use of any other libraries that use
libgpg-error, it is useless to always translate between gpg_error_t and
gpg_err_code_t (i.e. with and w/o error source identifier).  This
translation has now mostly been moved to the function wrappers in
visibility.c.  An additional advantage of using gpg_err_code_t is that
comparison can be done without using gpg_err_code().

I am sorry for that large patch, but a series of patches would
actually be more work to audit.

Signed-off-by: Werner Koch <wk@gnupg.org>
Jussi Kivilinna [Wed, 4 Dec 2013 16:17:22 +0000 (18:17 +0200)]
mpi: add inline assembly for x86-64

* mpi/longlong.h [__x86_64] (add_ssaaaa, sub_ddmmss, umul_ppmm)
(udiv_qrnnd, count_leading_zeros, count_trailing_zeros): New.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
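For readers unfamiliar with longlong.h, the following sketch shows in portable C what these primitives compute for 64-bit limbs. It is not the inline assembly from the patch; the function names are made up, and 'unsigned __int128' and '__builtin_clzll' are GCC/Clang extensions assumed to be available on a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_limb_t;

/* add_ssaaaa: (sh:sl) = (ah:al) + (bh:bl), carry out of sh dropped. */
static void
sketch_add_ssaaaa (sketch_limb_t *sh, sketch_limb_t *sl,
                   sketch_limb_t ah, sketch_limb_t al,
                   sketch_limb_t bh, sketch_limb_t bl)
{
  sketch_limb_t lo = al + bl;
  *sh = ah + bh + (lo < al);   /* (lo < al) is the carry from the low limb */
  *sl = lo;
}

/* sub_ddmmss: (dh:dl) = (mh:ml) - (sh:sl), borrow out of dh dropped. */
static void
sketch_sub_ddmmss (sketch_limb_t *dh, sketch_limb_t *dl,
                   sketch_limb_t mh, sketch_limb_t ml,
                   sketch_limb_t sh, sketch_limb_t sl)
{
  *dh = mh - sh - (ml < sl);   /* (ml < sl) is the borrow from the low limb */
  *dl = ml - sl;
}

/* umul_ppmm: (ph:pl) = a * b as a full 128-bit product. */
static void
sketch_umul_ppmm (sketch_limb_t *ph, sketch_limb_t *pl,
                  sketch_limb_t a, sketch_limb_t b)
{
  unsigned __int128 p = (unsigned __int128)a * b;
  *ph = (sketch_limb_t)(p >> 64);
  *pl = (sketch_limb_t)p;
}

/* count_leading_zeros: undefined for x == 0, as with the builtin. */
static int
sketch_count_leading_zeros (sketch_limb_t x)
{
  assert (x != 0);
  return __builtin_clzll ((unsigned long long)x);
}
```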
NIIBE Yutaka [Wed, 4 Dec 2013 01:03:57 +0000 (10:03 +0900)]
mpi: fix gcry_mpi_powm for negative base.

* mpi/mpi-pow.c (gcry_mpi_powm) [USE_ALGORITHM_SIMPLE_EXPONENTIATION]:
Fix for the case where BASE is negative.
* tests/mpitests.c (test_powm): Add a test case of (-17)^6 mod 19.

Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
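The expected value of the new test case can be checked by hand: -17 is congruent to 2 modulo 19, 2^6 = 64, and 64 mod 19 = 7. A small machine-word sketch of modular exponentiation with a possibly negative base (a hypothetical helper, unrelated to the MPI code in mpi/mpi-pow.c):

```c
#include <assert.h>
#include <stdint.h>

/* Compute BASE^EXP mod MOD with the result in [0, MOD).  MOD is limited
   to 32 bits so that the intermediate products fit in 64 bits.  */
static uint64_t
sketch_powm_small (int64_t base, uint64_t exp, uint32_t mod)
{
  int64_t r;
  uint64_t b, result;

  assert (mod != 0);

  /* C's % truncates toward zero, so a negative base leaves a negative
     remainder; move it into the range [0, MOD) first.  */
  r = base % (int64_t)mod;
  if (r < 0)
    r += mod;

  b = (uint64_t)r;
  result = 1 % mod;

  /* Plain square-and-multiply, least significant exponent bit first. */
  while (exp)
    {
      if (exp & 1)
        result = (result * b) % mod;
      b = (b * b) % mod;
      exp >>= 1;
    }
  return result;
}
```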
Werner Koch [Tue, 22 Oct 2013 12:26:53 +0000 (14:26 +0200)]
Add build support for ppc64le.

* config.guess, config.sub: Update to latest version (2013-11-29).
* m4/libtool.m4: Add patches for ppc64le.
--

We don't want to update libtool, thus we use patches supplied by IBM.

Signed-off-by: Werner Koch <wk@gnupg.org>
Jussi Kivilinna [Tue, 3 Dec 2013 12:03:09 +0000 (14:03 +0200)]
rijndael: fix compiler warning on aarch64

* cipher/rijndael.c (do_setkey): Use braces for empty if statement
instead of semicolon.
--

Patch fixes following warning:

 rijndael.c: In function 'do_setkey':
 rijndael.c:507:9: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
          ;
          ^

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 3 Dec 2013 11:57:02 +0000 (13:57 +0200)]
Add aarch64 (arm64) mpi assembly

* mpi/aarch64/mpi-asm-defs.h: New.
* mpi/aarch64/mpih-add1.S: New.
* mpi/aarch64/mpih-mul1.S: New.
* mpi/aarch64/mpih-mul2.S: New.
* mpi/aarch64/mpih-mul3.S: New.
* mpi/aarch64/mpih-sub1.S: New.
* mpi/config.links [host=aarch64-*-*]: Add configuration for aarch64
assembly.
* mpi/longlong.h [__aarch64__] (add_ssaaaa, sub_ddmmss, umul_ppmm)
(count_leading_zeros): New.
--

Add preliminary aarch64 assembly implementations for mpi.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Werner Koch [Mon, 2 Dec 2013 16:09:04 +0000 (17:09 +0100)]
ecc: Use constant time point operation for Twisted Edwards.

* mpi/ec.c (_gcry_mpi_ec_mul_point): Try to do a constant time
operation if needed.
* tests/benchmark.c (main): Add option --use-secmem.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Mon, 2 Dec 2013 15:18:25 +0000 (16:18 +0100)]
ecc: Make gcry_pk_testkey work for Ed25519.

* cipher/ecc-misc.c (_gcry_ecc_compute_public): Add optional args G
and d.  Change all callers.
* cipher/ecc.c (gen_y_2): Remove.
(check_secret_key): Use generic public key compute function.  Adjust
for use with Ed25519 and EdDSA.
(nist_generate_key): Do not use the compliant key thingy for Ed25519.
(ecc_check_secret_key): Make parameter parsing similar to the other
functions.
* cipher/ecc-curves.c (domain_parms): Zero prefix some parameters so
that _gcry_ecc_update_curve_param works correctly.
* tests/keygen.c (check_ecc_keys): Add "param" flag.  Check all
Ed25519 keys.

Werner Koch [Mon, 2 Dec 2013 15:06:40 +0000 (16:06 +0100)]
ecc: Fix eddsa point decompression.

* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_recover_x): Fix the negative
case.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Fri, 29 Nov 2013 16:14:33 +0000 (17:14 +0100)]
ecc: Fix gcry_mpi_ec_curve_point for Weierstrass.

* mpi/ec.c (_gcry_mpi_ec_curve_point): Use correct equation.
(ec_pow3): New.
(ec_p_init): Always copy B.
--

The code path was obviously never tested.

Signed-off-by: Werner Koch <wk@gnupg.org>
Werner Koch [Thu, 28 Nov 2013 08:07:15 +0000 (09:07 +0100)]
mpi: Introduce 4 user flags for gcry_mpi_t.

* src/gcrypt.h.in (GCRYMPI_FLAG_USER1, GCRYMPI_FLAG_USER2)
(GCRYMPI_FLAG_USER3, GCRYMPI_FLAG_USER4): New.
* mpi/mpiutil.c (gcry_mpi_set_flag, gcry_mpi_clear_flag)
(gcry_mpi_get_flag, _gcry_mpi_free): Implement them.
(gcry_mpi_set_opaque): Keep user flags.
--

The space for the flags in the MPI struct is free and thus we can help
applications to make use of some flags.  This is for example useful to
indicate that an MPI needs special processing before use.

Signed-off-by: Werner Koch <wk@gnupg.org>
Vladimir 'φ-coder/phcoder' Serbinenko [Fri, 29 Nov 2013 07:56:43 +0000 (08:56 +0100)]
Fix armv3 compile error

* mpi/longlong.h [__arm__ && __ARM_ARCH < 4] (umul_ppmm): Use
__AND_CLOBBER_CC instead of __CLOBBER_CC.
--

ARMv3 code uses __CLOBBER_CC at the end of clobber list while it should have
been __AND_CLOBBER_CC.

[jk: add changelog, rebase on libgcrypt repository]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Vladimir 'φ-coder/phcoder' Serbinenko [Fri, 22 Nov 2013 04:24:44 +0000 (05:24 +0100)]
longlong.h on mips with clang

* mpi/longlong.h [__mips__]: Use C-language version with clang.
--
clang doesn't recognise =l / =h assembly operand specifiers but apparently
handles C version well.

[jk: add changelog, rebase on libgcrypt repository, reformat changed line so it
 does not go over 80 characters]
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Sun, 24 Nov 2013 15:54:15 +0000 (17:54 +0200)]
Camellia: Tweaks for AES-NI implementations

* cipher/camellia-aesni-avx-amd64.S: Align stack to 16 bytes; tweak
key-setup for small speed up.
* cipher/camellia-aesni-avx2-amd64.S: Use vmovdqu even with aligned
stack; reorder vinsert128 instructions; use rbp for stack frame.
--

Use of 'vmovdqa' with ymm registers produces quite interesting scattering in
measurement timings. By using 'vmovdqu' instead, repeated measurements produce
more stable results.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Thu, 21 Nov 2013 19:34:21 +0000 (21:34 +0200)]
Add GMAC to MAC API

* cipher/Makefile.am: Add 'mac-gmac.c'.
* cipher/mac-gmac.c: New.
* cipher/mac-internal.h (gcry_mac_handle): Add 'u.gcm'.
(_gcry_mac_type_spec_gmac_aes, _gcry_mac_type_spec_gmac_twofish)
(_gcry_mac_type_spec_gmac_serpent, _gcry_mac_type_spec_gmac_seed)
(_gcry_mac_type_spec_gmac_camellia): New externs.
* cipher/mac.c (mac_list): Add GMAC specifications.
* doc/gcrypt.texi: Add mention of GMAC.
* src/gcrypt.h.in (gcry_mac_algos): Add GCM algorithms.
* tests/basic.c (check_one_mac): Add support for MAC IVs.
(check_mac): Add support for MAC IVs and add GMAC test vectors.
* tests/bench-slope.c (mac_bench): Iterate algorithm numbers to 499.
* tests/benchmark.c (mac_bench): Iterate algorithm numbers to 499.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 20 Nov 2013 13:44:27 +0000 (15:44 +0200)]
GCM: Move gcm_table initialization to setkey

* cipher/cipher-gcm.c: Change all 'c->u_iv.iv' to
'c->u_mode.gcm.u_ghash_key.key'.
(_gcry_cipher_gcm_setkey): New.
(_gcry_cipher_gcm_initiv): Move ghash initialization to function above.
* cipher/cipher-internal.h (gcry_cipher_handle): Add
'u_mode.gcm.u_ghash_key'; Reorder 'u_mode.gcm' members for partial
clearing in gcry_cipher_reset.
(_gcry_cipher_gcm_setkey): New prototype.
* cipher/cipher.c (cipher_setkey): Add GCM setkey.
(cipher_reset): Clear 'u_mode' only partially for GCM.
--

GHASH tables can be generated at setkey time. No need to regenerate
for every new IV.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 20 Nov 2013 13:06:03 +0000 (15:06 +0200)]
GCM: Add support for split data buffers and online operation

* cipher/cipher-gcm.c (do_ghash_buf): Add buffering for less than
blocksize length input and padding handling.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt): Add handling
for AAD padding and check if data has already been padded.
(_gcry_cipher_gcm_authenticate): Check that AAD or data has not been
padded yet.
(_gcry_cipher_gcm_initiv): Clear padding marks.
(_gcry_cipher_gcm_tag): Add finalization and padding; Clear sensitive
data from cipher handle, since they are not used after generating tag.
* cipher/cipher-internal.h (gcry_cipher_handle): Add 'u_mode.gcm.macbuf',
'u_mode.gcm.mac_unused', 'u_mode.gcm.ghash_data_finalized' and
'u_mode.gcm.ghash_aad_finalized'.
* tests/basic.c (check_gcm_cipher): Rename to...
(_check_gcm_cipher): ...this and add handling for different buffer step
lengths; Enable per byte buffer testing.
(check_gcm_cipher): Call _check_gcm_cipher with different buffer step
sizes.
--

Until now, GCM was expecting full data to be input in one go. This patch adds
support for feeding data continuously (for encryption/decryption/aad).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
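The buffering needed for such online operation can be sketched as below. This is an illustration only: the per-block function is a toy accumulator, NOT GHASH, and the struct merely borrows the names 'macbuf' and 'mac_unused' from the changelog without claiming to match their real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCKSIZE 16

typedef struct
{
  uint32_t state;                          /* toy accumulator, not GHASH */
  unsigned char macbuf[SKETCH_BLOCKSIZE];  /* holds a partial block */
  size_t mac_unused;                       /* bytes currently in macbuf */
} sketch_stream_ctx;

/* Toy stand-in for processing one full block. */
static void
sketch_absorb_block (sketch_stream_ctx *c, const unsigned char *blk)
{
  size_t i;
  for (i = 0; i < SKETCH_BLOCKSIZE; i++)
    c->state = c->state * 31u + blk[i];
}

static void
sketch_stream_init (sketch_stream_ctx *c)
{
  memset (c, 0, sizeof *c);
}

/* Accept input of any length; only full blocks reach the block function. */
static void
sketch_stream_update (sketch_stream_ctx *c, const unsigned char *buf,
                      size_t len)
{
  assert (c->mac_unused < SKETCH_BLOCKSIZE);

  if (c->mac_unused)
    {
      /* Top up the partial block left over from an earlier call. */
      size_t n = SKETCH_BLOCKSIZE - c->mac_unused;
      if (n > len)
        n = len;
      memcpy (c->macbuf + c->mac_unused, buf, n);
      c->mac_unused += n;
      buf += n;
      len -= n;
      if (c->mac_unused < SKETCH_BLOCKSIZE)
        return;                            /* still not a full block */
      sketch_absorb_block (c, c->macbuf);
      c->mac_unused = 0;
    }

  while (len >= SKETCH_BLOCKSIZE)
    {
      sketch_absorb_block (c, buf);
      buf += SKETCH_BLOCKSIZE;
      len -= SKETCH_BLOCKSIZE;
    }

  if (len)
    {
      /* Keep the tail for the next call. */
      memcpy (c->macbuf, buf, len);
      c->mac_unused = len;
    }
}

/* Zero-pad and process any remaining partial block, then return the state. */
static uint32_t
sketch_stream_final (sketch_stream_ctx *c)
{
  if (c->mac_unused)
    {
      memset (c->macbuf + c->mac_unused, 0,
              SKETCH_BLOCKSIZE - c->mac_unused);
      sketch_absorb_block (c, c->macbuf);
      c->mac_unused = 0;
    }
  return c->state;
}
```

Feeding the same data in one call, in 7-byte steps or byte by byte gives the same final value, which is the property the per-byte buffer tests in tests/basic.c are after.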
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:27 +0000 (23:26 +0200)]
GCM: Use size_t for buffer sizes

* cipher/cipher-gcm.c (ghash, gcm_bytecounter_add, do_ghash_buf)
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_authenticate, _gcry_cipher_gcm_geniv)
(_gcry_cipher_gcm_tag): Use size_t for buffer lengths.
* cipher/cipher-internal.h (_gcry_cipher_gcm_encrypt)
(_gcry_cipher_gcm_decrypt, _gcry_cipher_gcm_authenticate): Use size_t
for buffer lengths.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:27 +0000 (23:26 +0200)]
GCM: add FIPS mode restrictions

* cipher/cipher-gcm.c (_gcry_cipher_gcm_encrypt)
(_gcry_cipher_gcm_get_tag): Do not allow use in FIPS mode if setiv
was invoked directly.
(_gcry_cipher_gcm_setiv): Rename to...
(_gcry_cipher_gcm_initiv): ...this.
(_gcry_cipher_gcm_setiv): New setiv function with check for FIPS mode.
[TODO] (_gcry_cipher_gcm_getiv): New.
* cipher/cipher-internal.h (gcry_cipher_handle): Add
'u_mode.gcm.disallow_encryption_because_of_setiv_in_fips_mode'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:27 +0000 (23:26 +0200)]
GCM: Add clearing and checking of marks.tag

* cipher/cipher-gcm.c (_gcry_cipher_gcm_encrypt)
(_gcry_cipher_gcm_decrypt, _gcry_cipher_gcm_authenticate): Make sure
that tag has not been finalized yet.
(_gcry_cipher_gcm_setiv): Clear 'marks.tag'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
GCM: Add stack burning

* cipher/cipher-gcm.c (do_ghash, ghash): Return stack burn depth.
(setupM): Wipe 'tmp' buffer.
(do_ghash_buf): Wipe 'tmp' buffer and add stack burning.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
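As a generic illustration of wiping a temporary buffer before returning (not the library's own wipe or stack-burn helpers, whose implementation is not shown in this log), a volatile pointer keeps the compiler from dropping the stores as dead code. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite LEN bytes at BUF with zeros through a volatile pointer. */
static void
sketch_wipe (void *buf, size_t len)
{
  volatile unsigned char *p = (volatile unsigned char *)buf;

  assert (buf != NULL || len == 0);
  while (len--)
    *p++ = 0;
}

/* Toy routine with a data-dependent temporary: XOR-fold the input into
   16 bytes, reduce that to one byte, and wipe the temporary.  */
static unsigned char
sketch_xor_fold (const unsigned char *in, size_t len)
{
  unsigned char tmp[16];
  unsigned char r = 0;
  size_t i;

  memset (tmp, 0, sizeof tmp);
  for (i = 0; i < len; i++)
    tmp[i % sizeof tmp] ^= in[i];
  for (i = 0; i < sizeof tmp; i++)
    r ^= tmp[i];

  sketch_wipe (tmp, sizeof tmp);   /* do not leave the copy on the stack */
  return r;
}
```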
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
Add aggregated bulk processing for GCM on x86-64

* cipher/cipher-gcm.c [__x86_64__] (gfmul_pclmul_aggr4): New.
(ghash) [GCM_USE_INTEL_PCLMUL]: Add aggregated bulk processing
for __x86_64__.
(setupM) [__x86_64__]: Add initialization for aggregated bulk
processing.
--

Intel Haswell (x86-64):
Old:
AES     GCM enc |     0.990 ns/B     963.3 MiB/s      3.17 c/B
        GCM dec |     0.982 ns/B     970.9 MiB/s      3.14 c/B
       GCM auth |     0.711 ns/B    1340.8 MiB/s      2.28 c/B
New:
AES     GCM enc |     0.535 ns/B    1783.8 MiB/s      1.71 c/B
        GCM dec |     0.531 ns/B    1796.2 MiB/s      1.70 c/B
       GCM auth |     0.255 ns/B    3736.4 MiB/s     0.817 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
GCM: Tweak Intel PCLMUL ghash loop for small speed-up

* cipher/cipher-gcm.c (do_ghash): Mark 'inline'.
[GCM_USE_INTEL_PCLMUL] (do_ghash_pclmul): Rename to...
[GCM_USE_INTEL_PCLMUL] (gfmul_pclmul): ...this and make inline function.
(ghash) [GCM_USE_INTEL_PCLMUL]: Preload data before ghash-pclmul loop.
--

Intel Haswell:
Old:
AES     GCM enc |      1.12 ns/B     853.5 MiB/s      3.58 c/B
        GCM dec |      1.12 ns/B     853.4 MiB/s      3.58 c/B
       GCM auth |     0.843 ns/B    1131.5 MiB/s      2.70 c/B
New:
AES     GCM enc |     0.990 ns/B     963.3 MiB/s      3.17 c/B
        GCM dec |     0.982 ns/B     970.9 MiB/s      3.14 c/B
       GCM auth |     0.711 ns/B    1340.8 MiB/s      2.28 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Wed, 20 Nov 2013 13:01:51 +0000 (15:01 +0200)]
GCM: Use counter mode code for speed-up

* cipher/cipher-gcm.c (ghash): Add processing for multiple blocks.
(gcm_bytecounter_add, gcm_add32_be128, gcm_check_datalen)
(gcm_check_aadlen_or_ivlen, do_ghash_buf): New functions.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_authenticate, _gcry_cipher_gcm_set_iv)
(_gcry_cipher_gcm_tag): Adjust to use above new functions and
counter mode functions for encryption/decryption.
* cipher/cipher-internal.h (gcry_cipher_handle): Remove 'length'; Add
'u_mode.gcm.(aadlen|datalen|tagiv|datalen_over_limits)'.
(_gcry_cipher_gcm_setiv): Return gcry_err_code_t.
* cipher/cipher.c (cipher_setiv): Return error code.
(_gcry_cipher_setiv): Handle error code from 'cipher_setiv'.
--

Patch changes GCM to use counter mode code for bulk speed up and also adds data
length checks as given in NIST SP-800-38D section 5.2.1.1.

Bit length requirements from section 5.2.1.1:

 len(plaintext) <= 2^39-256 bits == 2^36-32 bytes == 2^32-2 blocks
 len(aad) <= 2^64-1 bits ~= 2^61-1 bytes
 len(iv) <= 2^64-1 bits ~= 2^61-1 bytes

Intel Haswell:
Old:
AES     GCM enc |      3.00 ns/B     317.4 MiB/s      9.61 c/B
        GCM dec |      1.96 ns/B     486.9 MiB/s      6.27 c/B
       GCM auth |     0.848 ns/B    1124.7 MiB/s      2.71 c/B
New:
AES     GCM enc |      1.12 ns/B     851.8 MiB/s      3.58 c/B
        GCM dec |      1.12 ns/B     853.7 MiB/s      3.57 c/B
       GCM auth |     0.843 ns/B    1131.4 MiB/s      2.70 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
Add Intel PCLMUL acceleration for GCM

* cipher/cipher-gcm.c (fillM): Rename...
(do_fillM): ...to this.
(ghash): Remove.
(fillM): New macro.
(GHASH): Use 'do_ghash' instead of 'ghash'.
[GCM_USE_INTEL_PCLMUL] (do_ghash_pclmul): New.
(ghash): New.
(setupM): New.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_authenticate, _gcry_cipher_gcm_setiv)
(_gcry_cipher_gcm_tag): Use 'ghash' instead of 'GHASH' and
'c->u_mode.gcm.u_tag.tag' instead of 'c->u_tag.tag'.
* cipher/cipher-internal.h (GCM_USE_INTEL_PCLMUL): New.
(gcry_cipher_handle): Move 'u_tag' and 'gcm_table' under
'u_mode.gcm'.
* configure.ac (pclmulsupport, gcry_cv_gcc_inline_asm_pclmul): New.
* src/g10lib.h (HWF_INTEL_PCLMUL): New.
* src/global.c: Add "intel-pclmul".
* src/hwf-x86.c (detect_x86_gnuc): Add check for Intel PCLMUL.
--

Speed-up GCM for Intel CPUs.

Intel Haswell (x86-64):
Old:
AES     GCM enc |      5.17 ns/B     184.4 MiB/s     16.55 c/B
        GCM dec |      4.38 ns/B     218.0 MiB/s     14.00 c/B
       GCM auth |      3.17 ns/B     300.4 MiB/s     10.16 c/B
New:
AES     GCM enc |      3.01 ns/B     317.2 MiB/s      9.62 c/B
        GCM dec |      1.96 ns/B     486.9 MiB/s      6.27 c/B
       GCM auth |     0.848 ns/B    1124.8 MiB/s      2.71 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agoGCM: GHASH optimizations
Jussi Kivilinna [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
GCM: GHASH optimizations

* cipher/cipher-gcm.c [GCM_USE_TABLES] (gcmR, ghash): Replace with new.
[GCM_USE_TABLES] [GCM_TABLES_USE_U64] (bshift, fillM, do_ghash): New.
[GCM_USE_TABLES] [!GCM_TABLES_USE_U64] (bshift, fillM): Replace with
new.
[GCM_USE_TABLES] [!GCM_TABLES_USE_U64] (do_ghash): New.
(_gcry_cipher_gcm_tag): Remove extra memcpy to outbuf and use
buf_eq_const for comparing authentication tag.
* cipher/cipher-internal.h (gcry_cipher_handle): Different 'gcm_table'
for 32-bit and 64-bit platforms.
--

Patch improves GHASH speed.

Intel Haswell (x86-64):
Old:
       GCM auth |     26.22 ns/B     36.38 MiB/s     83.89 c/B
New:
       GCM auth |      3.18 ns/B     300.0 MiB/s     10.17 c/B

Intel Haswell (mingw32):
Old:
       GCM auth |     27.27 ns/B     34.97 MiB/s     87.27 c/B
New:
       GCM auth |      7.58 ns/B     125.7 MiB/s     24.27 c/B

Cortex-A8:
Old:
       GCM auth |     231.4 ns/B      4.12 MiB/s     233.3 c/B
New:
       GCM auth |     30.82 ns/B     30.94 MiB/s     31.07 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agoAdd some documentation for GCM mode
Jussi Kivilinna [Wed, 20 Nov 2013 14:21:19 +0000 (16:21 +0200)]
Add some documentation for GCM mode

* doc/gcrypt.texi: Add mention of GCM mode.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agoInitial implementation of GCM
Dmitry Eremin-Solenikov [Tue, 19 Nov 2013 21:26:26 +0000 (23:26 +0200)]
Initial implementation of GCM

* cipher/Makefile.am: Add 'cipher-gcm.c'.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_set_lengths)
(_gcry_cipher_ccm_authenticate, _gcry_cipher_ccm_tag)
(_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Change
'c->u_mode.ccm.tag' to 'c->marks.tag'.
* cipher/cipher-gcm.c: New.
* cipher/cipher-internal.h (GCM_USE_TABLES): New.
(gcry_cipher_handle): Add 'marks.tag', 'u_tag', 'length' and
'gcm_table'; Remove 'u_mode.ccm.tag'.
(_gcry_cipher_gcm_encrypt, _gcry_cipher_gcm_decrypt)
(_gcry_cipher_gcm_setiv, _gcry_cipher_gcm_authenticate)
(_gcry_cipher_gcm_get_tag, _gcry_cipher_gcm_check_tag): New.
* cipher/cipher.c (_gcry_cipher_open_internal, cipher_setkey)
(cipher_encrypt, cipher_decrypt, _gcry_cipher_authenticate)
(_gcry_cipher_gettag, _gcry_cipher_checktag): Add GCM mode handling.
* src/gcrypt.h.in (gcry_cipher_modes): Add GCRY_CIPHER_MODE_GCM.
(GCRY_GCM_BLOCK_LEN): New.
* tests/basic.c (check_gcm_cipher): New.
(check_ciphers): Add GCM check.
(check_cipher_modes): Call 'check_gcm_cipher'.
* tests/bench-slope.c (bench_gcm_encrypt_do_bench)
(bench_gcm_decrypt_do_bench, bench_gcm_authenticate_do_bench)
(gcm_encrypt_ops, gcm_decrypt_ops, gcm_authenticate_ops): New.
(cipher_modes): Add GCM enc/dec/auth.
(cipher_bench_one): Limit GCM to block ciphers with 16 byte block-size.
* tests/benchmark.c (cipher_bench): Add GCM.
--

Currently it is still quite slow.

Still no support for generate_iv(). Is it really necessary?

TODO: Merge/reuse cipher-internal state used by CCM.

Changelog entry will be present in final patch submission.

Changes since v1:
- 6x-7x speedup.
- added bench-slope support

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
[jk: mangle new file through 'indent -nut']
[jk: few fixes]
[jk: changelog]

5 years agoCamellia: fix compiler warning
Jussi Kivilinna [Mon, 18 Nov 2013 18:27:35 +0000 (20:27 +0200)]
Camellia: fix compiler warning

* cipher/camellia-glue.c (camellia_setkey): Use braces around empty if
statement.
--

Patch silences following warning:

 camellia-glue.c: In function 'camellia_setkey':
 camellia-glue.c:183:5: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agoTweak Camellia-AVX key-setup for small speed-up
Jussi Kivilinna [Tue, 19 Nov 2013 13:48:32 +0000 (15:48 +0200)]
Tweak Camellia-AVX key-setup for small speed-up

* cipher/camellia-aesni-avx-amd64.S (camellia_f): Merge S-function output
rotation with P-function.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
5 years agoAdd CMAC (Cipher-based MAC) to MAC API
Jussi Kivilinna [Thu, 14 Nov 2013 12:10:27 +0000 (14:10 +0200)]
Add CMAC (Cipher-based MAC) to MAC API

* cipher/Makefile.am: Add 'cipher-cmac.c' and 'mac-cmac.c'.
* cipher/cipher-cmac.c: New.
* cipher/cipher-internal.h (gcry_cipher_handle.u_mode): Add 'cmac'.
* cipher/cipher.c (gcry_cipher_open): Rename to...
(_gcry_cipher_open_internal): ...this and add CMAC.
(gcry_cipher_open): New wrapper that disallows use of internal
modes (CMAC) from outside.
(cipher_setkey, cipher_encrypt, cipher_decrypt)
(_gcry_cipher_authenticate, _gcry_cipher_gettag)
(_gcry_cipher_checktag): Add handling for CMAC mode.
(cipher_reset): Do not reset 'marks.key' and do not clear subkeys in
'u_mode' in CMAC mode.
* cipher/mac-cmac.c: New.
* cipher/mac-internal.h: Add CMAC support and algorithms.
* cipher/mac.c: Add CMAC algorithms.
* doc/gcrypt.texi: Add documentation for CMAC.
* src/cipher.h (gcry_cipher_internal_modes): New.
(_gcry_cipher_open_internal, _gcry_cipher_cmac_authenticate)
(_gcry_cipher_cmac_get_tag, _gcry_cipher_cmac_check_tag)
(_gcry_cipher_cmac_set_subkeys): New prototypes.
* src/gcrypt.h.in (gcry_mac_algos): Add CMAC algorithms.
* tests/basic.c (check_mac): Add CMAC test vectors.
--

Patch adds CMAC (Cipher-based MAC) as defined in RFC 4493 and NIST
Special Publication 800-38B.

Internally CMAC is added to cipher module, but is available to outside
only through MAC API.
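
For reference, the subkey derivation described in RFC 4493 section 2.3 can
be sketched as below.  The function names are hypothetical and this is not
the code from 'cipher-cmac.c'; the block cipher call that produces L is
left out.

```c
#include <stdint.h>

/* Multiply a 128-bit big-endian value by x in GF(2^128): shift left by
   one bit and, if a bit was shifted out, XOR the constant 0x87 into the
   last byte.  */
static void
cmac_double (uint8_t out[16], const uint8_t in[16])
{
  uint8_t carry = in[0] >> 7;
  int i;

  for (i = 0; i < 15; i++)
    out[i] = (uint8_t)((in[i] << 1) | (in[i + 1] >> 7));
  out[15] = (uint8_t)((in[15] << 1) ^ (carry ? 0x87 : 0x00));
}

/* Hypothetical helper: derive K1 and K2 from L, the encryption of an
   all-zero block under the CMAC key.  */
static void
cmac_subkeys (uint8_t k1[16], uint8_t k2[16], const uint8_t l[16])
{
  cmac_double (k1, l);
  cmac_double (k2, k1);
}
```

With L = 7df76b0c 1ab899b3 3e42f047 b91b546f from the AES-128 example in
RFC 4493, this gives K1 = fbeed618 35713366 7c85e08f 7236a8de and
K2 = f7ddac30 6ae266cc f90bc11e e46d513b.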

[v2]:
 - Add documentation.
[v3]:
 - CMAC algorithm ids start from 201.
 - Coding style fixes.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAdd new MAC API, initially with HMAC
Jussi Kivilinna [Fri, 15 Nov 2013 10:28:07 +0000 (12:28 +0200)]
Add new MAC API, initially with HMAC

* cipher/Makefile.am: Add 'mac.c', 'mac-internal.h' and 'mac-hmac.c'.
* cipher/bufhelp.h (buf_eq_const): New.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_tag): Use 'buf_eq_const' for
constant-time compare.
* cipher/mac-hmac.c: New.
* cipher/mac-internal.h: New.
* cipher/mac.c: New.
* doc/gcrypt.texi: Add documentation for MAC API.
* src/gcrypt-int.h [GPG_ERROR_VERSION_NUMBER < 1.13]
(GPG_ERR_MAC_ALGO): New.
* src/gcrypt.h.in (gcry_mac_handle, gcry_mac_hd_t, gcry_mac_algos)
(gcry_mac_flags, gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name)
(gcry_mac_reset, gcry_mac_test_algo): New.
* src/libgcrypt.def (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/libgcrypt.vers (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.c (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* src/visibility.h (gcry_mac_open, gcry_mac_close, gcry_mac_ctl)
(gcry_mac_algo_info, gcry_mac_setkey, gcry_mac_setiv, gcry_mac_write)
(gcry_mac_read, gcry_mac_verify, gcry_mac_get_algo_maclen)
(gcry_mac_get_algo_keylen, gcry_mac_algo_name, gcry_mac_map_name): New.
* tests/basic.c (check_one_mac, check_mac): New.
(main): Call 'check_mac'.
* tests/bench-slope.c (bench_print_header, bench_print_footer): Allow
variable algorithm name width.
(_cipher_bench, hash_bench): Update to above change.
(bench_hash_do_bench): Add 'gcry_md_reset'.
(bench_mac_mode, bench_mac_init, bench_mac_free, bench_mac_do_bench)
(mac_ops, mac_modes, mac_bench_one, _mac_bench, mac_bench): New.
(main): Add 'mac' benchmark options.
* tests/benchmark.c (mac_repetitions, mac_bench): New.
(main): Add 'mac' benchmark options.
--

Add MAC API, with HMAC algorithms. Internally uses HMAC functionality of the
MD module.
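
The constant-time compare mentioned for 'buf_eq_const' in the changelog
above can be sketched roughly as follows.  This is an illustration of the
general technique under a hypothetical name, not the code from
'bufhelp.h'.

```c
#include <stddef.h>

/* Return 1 if the two buffers are equal and 0 otherwise.  The loop always
   runs over all 'len' bytes, so the running time does not depend on the
   position of the first mismatch.  */
static int
ct_buf_eq (const void *a, const void *b, size_t len)
{
  const unsigned char *pa = a;
  const unsigned char *pb = b;
  unsigned char diff = 0;
  size_t i;

  for (i = 0; i < len; i++)
    diff |= pa[i] ^ pb[i];

  return diff == 0;
}
```

A plain memcmp may return at the first differing byte, which can leak how
many leading bytes of a tag were guessed correctly.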

[v2]:
 - Add documentation for MAC API.
 - Change length argument for gcry_mac_read from size_t to size_t* for
   returning number of written bytes.
[v3]:
 - HMAC algorithm ids start from 101.
 - Fix coding style for new files.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoUse correct blocksize of 32 bytes for GOSTR3411-94 HMAC
Jussi Kivilinna [Sat, 16 Nov 2013 09:07:09 +0000 (11:07 +0200)]
Use correct blocksize of 32 bytes for GOSTR3411-94 HMAC

* cipher/md.c (md_open): Set macpads_Bsize to 32 for
GCRY_MD_GOSTR3411_94.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocipher: use size_t for internal buffer lengths
Jussi Kivilinna [Fri, 15 Nov 2013 14:23:00 +0000 (16:23 +0200)]
cipher: use size_t for internal buffer lengths

* cipher/arcfour.c (do_encrypt_stream, encrypt_stream): Use 'size_t'
for buffer lengths.
* cipher/blowfish.c (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Ditto.
* cipher/camellia-glue.c (_gcry_camellia_ctr_enc)
(_gcry_camellia_cbc_dec, _gcry_camellia_cfb_dec): Ditto.
* cipher/cast5.c (_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec)
(_gcry_cast5_cfb_dec): Ditto.
* cipher/cipher-aeswrap.c (_gcry_cipher_aeswrap_encrypt)
(_gcry_cipher_aeswrap_decrypt): Ditto.
* cipher/cipher-cbc.c (_gcry_cipher_cbc_encrypt)
(_gcry_cipher_cbc_decrypt): Ditto.
* cipher/cipher-ccm.c (_gcry_cipher_ccm_encrypt)
(_gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-cfb.c (_gcry_cipher_cfb_encrypt)
(_gcry_cipher_cfb_decrypt): Ditto.
* cipher/cipher-ctr.c (_gcry_cipher_ctr_encrypt): Ditto.
* cipher/cipher-internal.h (gcry_cipher_handle->bulk)
(_gcry_cipher_cbc_encrypt, _gcry_cipher_cbc_decrypt)
(_gcry_cipher_cfb_encrypt, _gcry_cipher_cfb_decrypt)
(_gcry_cipher_ofb_encrypt, _gcry_cipher_ctr_encrypt)
(_gcry_cipher_aeswrap_encrypt, _gcry_cipher_aeswrap_decrypt)
(_gcry_cipher_ccm_encrypt, _gcry_cipher_ccm_decrypt): Ditto.
* cipher/cipher-ofb.c (_gcry_cipher_ofb_encrypt): Ditto.
* cipher/cipher-selftest.h (gcry_cipher_bulk_cbc_dec_t)
(gcry_cipher_bulk_cfb_dec_t, gcry_cipher_bulk_ctr_enc_t): Ditto.
* cipher/cipher.c (cipher_setkey, cipher_setiv, do_ecb_crypt)
(do_ecb_encrypt, do_ecb_decrypt, cipher_encrypt)
(cipher_decrypt): Ditto.
* cipher/rijndael.c (_gcry_aes_ctr_enc, _gcry_aes_cbc_dec)
(_gcry_aes_cfb_dec, _gcry_aes_cbc_enc, _gcry_aes_cfb_enc): Ditto.
* cipher/salsa20.c (salsa20_setiv, salsa20_do_encrypt_stream)
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): Ditto.
* cipher/serpent.c (_gcry_serpent_ctr_enc, _gcry_serpent_cbc_dec)
(_gcry_serpent_cfb_dec): Ditto.
* cipher/twofish.c (_gcry_twofish_ctr_enc, _gcry_twofish_cbc_dec)
(_gcry_twofish_cfb_dec): Ditto.
* src/cipher-proto.h (gcry_cipher_stencrypt_t)
(gcry_cipher_stdecrypt_t, cipher_setiv_func_t): Ditto.
* src/cipher.h (_gcry_aes_cfb_enc, _gcry_aes_cfb_dec)
(_gcry_aes_cbc_enc, _gcry_aes_cbc_dec, _gcry_aes_ctr_enc)
(_gcry_blowfish_cfb_dec, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_ctr_enc, _gcry_cast5_cfb_dec, _gcry_cast5_cbc_dec)
(_gcry_cast5_ctr_enc, _gcry_camellia_cfb_dec, _gcry_camellia_cbc_dec)
(_gcry_camellia_ctr_enc, _gcry_serpent_cfb_dec, _gcry_serpent_cbc_dec)
(_gcry_serpent_ctr_enc, _gcry_twofish_cfb_dec, _gcry_twofish_cbc_dec)
(_gcry_twofish_ctr_enc): Ditto.
--

On 64-bit platforms, cipher module internally converts 64-bit size_t values
to 32-bit unsigned integers.
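
A small sketch of the truncation being avoided.  'uint64_t' stands in for
a 64-bit size_t and 'uint32_t' for a 32-bit unsigned int; the function
names are hypothetical.

```c
#include <stdint.h>

/* Models a callee whose length parameter is only 32 bits wide.  */
static uint64_t
callee_u32 (uint32_t nbytes)
{
  return nbytes;
}

/* Models a callee that takes the full-width length.  */
static uint64_t
callee_u64 (uint64_t nbytes)
{
  return nbytes;
}
```

Passing a length of 2^32 + 16 through the narrow parameter leaves only 16,
while the wide parameter preserves the value.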

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoCamellia: Add AVX/AES-NI key setup
Jussi Kivilinna [Fri, 15 Nov 2013 14:23:00 +0000 (16:23 +0200)]
Camellia: Add AVX/AES-NI key setup

* cipher/camellia-aesni-avx-amd64.S (key_bitlength, key_table): New
order of fields in ctx.
(camellia_f, vec_rol128, vec_ror128): New macros.
(__camellia_avx_setup128, __camellia_avx_setup256)
(_gcry_camellia_aesni_avx_keygen): New functions.
* cipher/camellia-aesni-avx2-amd64.S (key_bitlength, key_table): New
order of fields in ctx.
* cipher/camellia-arm.S (CAMELLIA_TABLE_BYTE_LEN, key_length): Remove
unused macros.
* cipher/camellia-glue.c (CAMELLIA_context): Move keytable to head for
better alignment; Make 'use_aesni_avx' and 'use_aesni_avx2' bitfield
members.
[USE_AESNI_AVX] (_gcry_camellia_aesni_avx_keygen): New prototype.
(camellia_setkey) [USE_AESNI_AVX || USE_AESNI_AVX2]: Read hw features
to variable 'hwf' and match features from it.
(camellia_setkey) [USE_AESNI_AVX]: Use AES-NI/AVX key setup if
available.
--

Use AVX/AES-NI for key-setup for small speed-up.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAvoid unneeded stack burning with AES-NI and reduce number of 'decryption_prepared...
Jussi Kivilinna [Fri, 15 Nov 2013 14:23:00 +0000 (16:23 +0200)]
Avoid unneeded stack burning with AES-NI and reduce number of 'decryption_prepared' checks

* cipher/rijndael.c (RIJNDAEL_context): Make 'decryption_prepared',
'use_padlock' and 'use_aesni' 1-bit members in bitfield.
(do_setkey): Move 'hwfeatures' inside [USE_AESNI || USE_PADLOCK].
(do_aesni_enc_aligned): Rename to...
(do_aesni_enc): ...this, as function does not require aligned input.
(do_aesni_dec_aligned): Rename to...
(do_aesni_dec): ...this, as function does not require aligned input.
(do_aesni): Remove.
(rijndael_encrypt): Call 'do_aesni_enc' instead of 'do_aesni'.
(rijndael_decrypt): Call 'do_aesni_dec' instead of 'do_aesni'.
(check_decryption_preparation): New.
(do_decrypt): Remove 'decryption_prepared' check.
(rijndael_decrypt): Ditto and call 'check_decryption_preparation'.
(_gcry_aes_cbc_dec): Ditto.
(_gcry_aes_cfb_enc): Add 'burn_depth' and burn stack only when needed.
(_gcry_aes_cbc_enc): Ditto.
(_gcry_aes_ctr_enc): Ditto.
(_gcry_aes_cfb_dec): Ditto.
(_gcry_aes_cbc_dec): Ditto and correct clearing of 'savebuf'.
--

Patch is mostly about reducing overhead for short buffers.

Results on Intel i5-4570:

After:
 $ tests/benchmark --cipher-repetitions 1000 --cipher-with-keysetup cipher aes
 Running each test 1000 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 AES            480ms   540ms  1750ms   300ms  1630ms   300ms  1640ms  1640ms   350ms   350ms  2130ms  2140ms

Before:
 $ tests/benchmark --cipher-repetitions 1000 --cipher-with-keysetup cipher aes
 Running each test 1000 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 AES            520ms   590ms  1760ms   310ms  1640ms   310ms  1610ms  1600ms   360ms   360ms  2150ms  2160ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agomd: Fix hashing for data >= 256 GB
Werner Koch [Thu, 14 Nov 2013 22:40:41 +0000 (23:40 +0100)]
md: Fix hashing for data >= 256 GB

* cipher/hash-common.h (gcry_md_block_ctx): Add "nblocks_high".
* cipher/hash-common.c (_gcry_md_block_write): Bump NBLOCKS_HIGH.
* cipher/md4.c (md4_init, md4_final): Take care of NBLOCKS_HIGH.
* cipher/md5.c (md5_init, md5_final): Ditto.
* cipher/rmd160.c (_gcry_rmd160_init, rmd160_final): Ditto.
* cipher/sha1.c (sha1_init, sha1_final): Ditto.
* cipher/sha256.c (sha256_init, sha224_init, sha256_final): Ditto.
* cipher/sha512.c (sha512_init, sha384_init, sha512_final): Ditto.
* cipher/tiger.c (do_init, tiger_final): Ditto.
* cipher/whirlpool.c (whirlpool_final): Ditto.

* cipher/md.c (gcry_md_algo_info): Add GCRYCTL_SELFTEST.
(_gcry_md_selftest): Return "not implemented" as required.
* tests/hashtest.c: New.
* tests/genhashdata.c: New.
* tests/Makefile.am (TESTS): Add hashtest.
(noinst_PROGRAMS): Add genhashdata.
--

Problem found by Denis Corbin and analyzed by Yuriy Kaminskiy.

sha512 and whirlpool should not have this problem because they use 64
bit types for counting the blocks. However, a similar fix has been
employed to allow for really huge sizes - despite that it will be very
hard to test them.
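
The 256 GB boundary is what a 32-bit block counter gives with 64-byte
blocks: 2^32 * 64 bytes.  A minimal sketch of carrying into a high word
follows; only the field name 'nblocks_high' is taken from the changelog
above, the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the block counter of a 64-byte-block hash.  */
struct block_counter
{
  uint32_t nblocks;       /* Low 32 bits of the block count.  */
  uint32_t nblocks_high;  /* Carries out of 'nblocks'.  */
};

/* Count one more processed block, carrying into the high word.  */
static void
counter_bump (struct block_counter *c)
{
  if (!++c->nblocks)
    c->nblocks_high++;
}

/* Total length in bits for a 64-byte block size (2^9 bits per block),
   wrapping modulo 2^64.  */
static uint64_t
counter_total_bits (const struct block_counter *c)
{
  uint64_t blocks = ((uint64_t)c->nblocks_high << 32) | c->nblocks;

  return blocks << 9;
}
```

Starting from nblocks = 0xffffffff, one more block wraps 'nblocks' to 0
and sets 'nblocks_high' to 1; the total is then 2^41 bits.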

The test vectors have been produced by sha{1,224,256}sum and the
genhashdata tool.  A sequence of 'a' is used for them because a test
using one million 'a' is commonly used for test vectors.  More test
vectors are required.  Running the large tests needs to be done
manually for now:

  ./hashtest --gigs 256

tests all algorithms,

  ./hashtest --gigs 256 sha1 sha224 sha256

only the given ones.  A configure option to include these tests in the
standard regression suite will be useful.  The tests will take looong.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Fix key generation for a plain Ed25519 key.
Christian Grothoff [Mon, 11 Nov 2013 15:04:30 +0000 (16:04 +0100)]
ecc: Fix key generation for a plain Ed25519 key.

* cipher/ecc.c (nist_generate_key): Use custom code for ED25519.
--

I wish there were an RFC for Curve25519 - the description in the
paper is easy to misunderstand for a non-mathematician.  Source code
and a paper are nice but a proper description (like those in the HAC)
would be better.  Problem spotted by Florian Dold.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Fix some memory leaks
Christian Grothoff [Mon, 11 Nov 2013 15:04:30 +0000 (16:04 +0100)]
ecc: Fix some memory leaks

* cipher/ecc-curves.c (_gcry_mpi_ec_new): Free ec->b before assigning.
* cipher/ecc.c (nist_generate_key): Release Q.
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_genkey): Ditto.
--

_gcry_mpi_ec_new: Fixing memory leak detected with valgrind; if 'b' is
non-NULL, the code in ec_p_init (ec.c:379) already makes a copy of
'b', so before we clobber ctx->b here, we need to at least release the
old value (however, it would of course be nicer to not first make a
copy of b in the first place, but this is the most localized change to
get rid of the memory leak).

nist_generate_key: Fixing rather obvious local leak; Q is first
initialized, then used, copied into the result but never released.

6 years agoecc: Change keygrip computation for Ed25519+EdDSA.
Werner Koch [Mon, 11 Nov 2013 18:14:40 +0000 (19:14 +0100)]
ecc: Change keygrip computation for Ed25519+EdDSA.

* cipher/ecc.c (compute_keygrip): Rework.
* cipher/ecc-eddsa.c (_gcry_ecc_eddsa_ensure_compact): New.
* cipher/ecc-curves.c (_gcry_ecc_update_curve_param): New.
* tests/keygrip.c (key_grips): Add flag param and test cases for
Ed25519.
--

The keygrip for Ed25519+EdDSA has not yet been used - thus it is
possible to change it.  Using the compact representation saves us the
recovering of x from the standard representation.  Compacting is
basically free.

6 years agompi: Add special format GCRYMPI_FMT_OPAQUE.
Werner Koch [Mon, 11 Nov 2013 10:07:56 +0000 (11:07 +0100)]
mpi: Add special format GCRYMPI_FMT_OPAQUE.

* src/gcrypt.h.in (GCRYMPI_FMT_OPAQUE): New.
(_gcry_sexp_nth_opaque_mpi): Remove.
* src/sexp.c (gcry_sexp_nth_mpi): Add support for GCRYMPI_FMT_OPAQUE.
(_gcry_sexp_vextract_param): Replace removed function by
GCRYMPI_FMT_OPAQUE.
--

Using a new formatting mode is easier than to add a dedicated
extraction function for opaque MPIs.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoFix error output in CTR selftest
Jussi Kivilinna [Sun, 10 Nov 2013 19:32:29 +0000 (21:32 +0200)]
Fix error output in CTR selftest

* cipher/cipher-selftest.c (_gcry_selftest_helper_ctr): Change
fprintf(stderr,...) to syslog(); Correct error output for bulk
IV check, plaintext mismatch => ciphertext mismatch.
--

The 'fprintf's were debugging leftovers that leaked into the commit.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoFix Serpent-AVX2 and Camellia-AVX2 counter modes
Jussi Kivilinna [Sat, 9 Nov 2013 20:39:19 +0000 (22:39 +0200)]
Fix Serpent-AVX2 and Camellia-AVX2 counter modes

* cipher/camellia-aesni-avx2-amd64.S
(_gcry_camellia_aesni_avx2_ctr_enc): Byte-swap before checking for
overflow handling.
* cipher/camellia-glue.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Add 16 to nblocks.
* cipher/cipher-selftest.c (_gcry_selftest_helper_ctr): Add test with
non-overflowing IV and modify overflow IV to detect broken endianness
handling.
* cipher/serpent-avx2-amd64.S (_gcry_serpent_avx2_ctr_enc): Byte-swap
before checking for overflow handling; Fix crazy-mixed-endian IV
construction to big-endian.
* cipher/serpent.c (selftest_ctr_128, selftest_cfb_128)
(selftest_cbc_128): Add 8 to nblocks.
--

The selftest for CTR was setting counter-IV to all '0xff' except last byte.
This had the effect that even with broken endianness handling Serpent-AVX2 and
Camellia-AVX2 passed the tests.

Patch corrects the CTR selftest and fixes the broken implementations.
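
For illustration, a big-endian 128-bit counter addition with carry
propagation can be sketched as below.  This is a plain C model under a
hypothetical name, not the AVX2 assembly touched by the patch.

```c
#include <stdint.h>

/* Add 'n' to a 128-bit counter stored big-endian in ctr[0..15], wrapping
   modulo 2^128.  ctr[15] is the least significant byte.  */
static void
ctr128_add_be (uint8_t ctr[16], uint32_t n)
{
  uint64_t carry = n;
  int i;

  for (i = 15; i >= 0 && carry; i--)
    {
      carry += ctr[i];
      ctr[i] = (uint8_t)carry;
      carry >>= 8;
    }
}
```

A counter ending in ff ff becomes ... 01 00 00 after adding 1, so a test
IV whose carry crosses several bytes exercises the byte order.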

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agocipher/gost28147: optimization: use precomputed S-box tables
Sergey V [Sat, 9 Nov 2013 16:10:10 +0000 (20:10 +0400)]
cipher/gost28147: optimization: use precomputed S-box tables

* cipher/gost.h (GOST28147_context): Remove unneeded subst and
subst_set members.
* cipher/gost28147.c (max): Remove unneeded macro.
(test_sbox): Replace with new precomputed tables.
(gost_set_subst): Remove function.
(gost_val): Use new S-box tables.
(gost_encrypt_block, gost_decrypt_block): Tweak to use new ctx and
S-box tables.
--

Use generated 8->8 S-boxes with precomputed bitwise shifts and
bitwise rotations, so that the round function gost_val() no longer needs
to perform these operations.
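
The idea can be sketched as follows: pairs of 4-bit S-boxes are merged
into byte-indexed tables whose entries are already shifted into place and
rotated left by 11.  The function names and the flat table layout are
assumptions made for this sketch; the layout used in 'gost28147.c' may
differ.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, 0 < n < 32.  */
static uint32_t
rol32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> (32 - n));
}

/* Reference substitution: eight 4-bit lookups, then rotate by 11.
   sbox[16 * i + v] is the output of S-box i for nibble value v.  */
static uint32_t
subst_naive (const uint8_t *sbox, uint32_t x)
{
  uint32_t y = 0;
  int i;

  for (i = 0; i < 8; i++)
    y |= (uint32_t)sbox[16 * i + ((x >> (4 * i)) & 0xf)] << (4 * i);
  return rol32 (y, 11);
}

/* Precompute four byte-indexed tables of 256 entries each.
   tab[256 * k + b] holds the substituted value of byte k, already
   shifted to its position and rotated by 11.  */
static void
build_tables (uint32_t *tab, const uint8_t *sbox)
{
  int k, b;

  for (k = 0; k < 4; k++)
    for (b = 0; b < 256; b++)
      {
        uint32_t v = (uint32_t)sbox[16 * (2 * k) + (b & 0xf)]
                     | ((uint32_t)sbox[16 * (2 * k + 1) + (b >> 4)] << 4);
        tab[256 * k + b] = rol32 (v << (8 * k), 11);
      }
}

/* Table-driven substitution: four lookups and three ORs, with no shifts
   of S-box outputs and no rotation at run time.  */
static uint32_t
subst_table (const uint32_t *tab, uint32_t x)
{
  return tab[x & 0xff]
         | tab[256 + ((x >> 8) & 0xff)]
         | tab[512 + ((x >> 16) & 0xff)]
         | tab[768 + (x >> 24)];
}
```

This works because rotation distributes over OR, so rotating each byte's
contribution in advance gives the same word as rotating the combined
result.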

Before this patch:

 GOST28147      |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     24.00 ns/B     39.74 MiB/s         - c/B
        ECB dec |     26.41 ns/B     36.11 MiB/s         - c/B
        CBC enc |     24.57 ns/B     38.81 MiB/s         - c/B
        CBC dec |     26.58 ns/B     35.88 MiB/s         - c/B
        CFB enc |     24.79 ns/B     38.46 MiB/s         - c/B
        CFB dec |     24.72 ns/B     38.57 MiB/s         - c/B
        OFB enc |     24.38 ns/B     39.12 MiB/s         - c/B
        OFB dec |     24.35 ns/B     39.16 MiB/s         - c/B
        CTR enc |     24.83 ns/B     38.41 MiB/s         - c/B
        CTR dec |     25.27 ns/B     37.73 MiB/s         - c/B

After:

 GOST28147      |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     16.29 ns/B     58.55 MiB/s         - c/B
        ECB dec |     16.30 ns/B     58.50 MiB/s         - c/B
        CBC enc |     16.94 ns/B     56.29 MiB/s         - c/B
        CBC dec |     16.81 ns/B     56.72 MiB/s         - c/B
        CFB enc |     17.13 ns/B     55.66 MiB/s         - c/B
        CFB dec |     16.84 ns/B     56.63 MiB/s         - c/B
        OFB enc |     16.69 ns/B     57.13 MiB/s         - c/B
        OFB dec |     16.71 ns/B     57.08 MiB/s         - c/B
        CTR enc |     17.01 ns/B     56.06 MiB/s         - c/B
        CTR dec |     17.05 ns/B     55.93 MiB/s         - c/B

Signed-off-by: Sergey V <sftp.mtuci@gmail.com>
6 years agoFix tail handling for AES-NI counter mode
Jussi Kivilinna [Sat, 9 Nov 2013 19:04:14 +0000 (21:04 +0200)]
Fix tail handling for AES-NI counter mode

* cipher/rijndael.c (do_aesni_ctr): Fix outputting of updated
counter-IV.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoecc: Improve gcry_pk_get_curve.
Werner Koch [Fri, 8 Nov 2013 16:41:42 +0000 (17:41 +0100)]
ecc: Improve gcry_pk_get_curve.

* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Factor some code out
to ..
(find_domain_parms_idx): new.
(_gcry_ecc_get_curve): Find by curve name on error.
--

This change allows the use of an input with just the curve name which
can be used to test whether a given curve has been implemented.  It is
required because due to the "param" flag change the caller usually
does not have the key parameters available.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agocipher: Avoid signed divisions in idea.c
Werner Koch [Fri, 8 Nov 2013 16:21:02 +0000 (17:21 +0100)]
cipher: Avoid signed divisions in idea.c

* cipher/idea.c (mul_inv): Use unsigned division.
--

Reported-by: Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@gmail.com>
  Hello, all. While compiling in an environment with only libgcc
  subset for ARM, I found out that idea.c uses signed divisions:
  Reading the code this seems to be unintended. Inlined patch replaces
  them with more appropriate unsigned division.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Implement the "nocomp" flag for key generation.
Werner Koch [Fri, 8 Nov 2013 09:07:40 +0000 (10:07 +0100)]
ecc: Implement the "nocomp" flag for key generation.

* cipher/ecc.c (ecc_generate): Support the "nocomp" flag.
* tests/keygen.c (check_ecc_keys): Add a test for it.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Make "noparam" the default and replace by "param".
Werner Koch [Fri, 8 Nov 2013 08:53:32 +0000 (09:53 +0100)]
ecc: Make "noparam" the default and replace by "param".

* src/cipher.h (PUBKEY_FLAG_NOCOMP): New.
(PUBKEY_FLAG_NOPARAM): Remove.
(PUBKEY_FLAG_PARAM): New.
* cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Support the new
flags and ignore the obsolete "noparam" flag.
* cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Return the curve name
also for curves selected by NBITS.
(_gcry_mpi_ec_new): Support the "param" flag.
* cipher/ecc.c (ecc_generate, ecc_sign, ecc_verify): Ditto.
* tests/keygen.c (check_ecc_keys): Remove the "noparam" flag.
--

This is an API change, but there are not many ECC users yet, and the
"param" flag can easily be added by those who really need the
parameters (e.g. if private keys have been stored without the curve
name).

Note that no version of Libgcrypt with support for "noparam" has been
released but for the sake of projects already working with the master
version we don't bail out on "noparam".

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoFix decryption function size in AES AMD64 assembly
Jussi Kivilinna [Thu, 7 Nov 2013 10:33:59 +0000 (12:33 +0200)]
Fix decryption function size in AES AMD64 assembly

* cipher/rijndael-amd64.S (_gcry_aes_amd64_decrypt_block): Set '.size'
for '_gcry_aes_amd64_decrypt_block', not '..._encrypt_block'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoChange 64-bit shift to 32-bit in AES AMD64 assembly
Jussi Kivilinna [Thu, 7 Nov 2013 10:24:04 +0000 (12:24 +0200)]
Change 64-bit shift to 32-bit in AES AMD64 assembly

* cipher/rijndael-amd64.S (do16bit_shr): Change 'shrq' to 'shrl'.
--

64-bit shift is not needed here as registers are used for 32-bit values.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoSpeed-up AES-NI key setup
Jussi Kivilinna [Wed, 6 Nov 2013 13:52:37 +0000 (15:52 +0200)]
Speed-up AES-NI key setup

* cipher/rijndael.c [USE_AESNI] (m128i_t): Remove.
[USE_AESNI] (u128_t): New.
[USE_AESNI] (aesni_do_setkey): New.
(do_setkey) [USE_AESNI]: Move AES-NI accelerated key setup to
'aesni_do_setkey'.
(do_setkey): Call _gcry_get_hw_features only once. Clear stack after
use in generic key setup part.
(rijndael_setkey): Remove stack burning.
(prepare_decryption) [USE_AESNI]: Use 'u128_t' instead of 'm128i_t' to
avoid compiler generated SSE2 instructions and XMM register usage,
unroll 'aesimc' setup loop.
(prepare_decryption): Clear stack after use.
[USE_AESNI] (do_aesni_enc_aligned): Update comment about alignment.
(do_decrypt): Do not burn stack after prepare_decryption.
--

Patch improves the speed of AES key setup with AES-NI instructions. Patch also
removes the problematic use of a vector typedef, which might cause interference
with XMM register usage in AES-NI accelerated code.

New:
 $ tests/benchmark --cipher-with-keysetup --cipher-repetitions 1000 cipher aes aes192 aes256
 Running each test 1000 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 AES            520ms   590ms  1760ms   310ms  1640ms   300ms  1620ms  1610ms   350ms   360ms  2160ms  2140ms
 AES192         640ms   680ms  2030ms   370ms  1920ms   350ms  1890ms  1880ms   400ms   410ms  2490ms  2490ms
 AES256         730ms   780ms  2330ms   430ms  2210ms   420ms  2170ms  2180ms   470ms   480ms  2830ms  2840ms

Old:
 $ tests/benchmark --cipher-with-keysetup --cipher-repetitions 1000 cipher aes aes192 aes256
 Running each test 1000 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 AES            670ms   740ms  1910ms   470ms  1790ms   470ms  1770ms  1760ms   520ms   510ms  2310ms  2310ms
 AES192         820ms   860ms  2220ms   550ms  2110ms   540ms  2070ms  2070ms   600ms   590ms  2670ms  2680ms
 AES256         920ms   970ms  2510ms   620ms  2390ms   600ms  2360ms  2370ms   650ms   660ms  3020ms  3020ms

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAvoid burn stack in Arcfour setkey
Jussi Kivilinna [Mon, 4 Nov 2013 19:54:33 +0000 (21:54 +0200)]
Avoid burn stack in Arcfour setkey

* cipher/arcfour.c (arcfour_setkey): Remove stack burning.
--

Stack is already cleared in do_arcfour_setkey and GCC is inlining
do_arcfour_setkey to arcfour_setkey which renders this _gcry_burn_stack
broken anyways.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAvoid burn_stack in CAST5 setkey
Jussi Kivilinna [Tue, 5 Nov 2013 10:30:23 +0000 (12:30 +0200)]
Avoid burn_stack in CAST5 setkey

* cipher/cast5.c (do_cast_setkey): Use wipememory instead of memset.
(cast_setkey): Remove stack burning.
--

Burning stack does not work properly when compiler inlines static functions,
therefore use wipememory to clear stack after use instead of relying on
_gcry_burn_stack.
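
A minimal sketch of clearing a local buffer through a volatile pointer,
which is the general technique behind this kind of wipe.  The function
name is hypothetical and the real 'wipememory' macro may be implemented
differently.

```c
#include <stddef.h>

/* Overwrite 'len' bytes at 'ptr' with zeros through a volatile pointer,
   so the compiler may not drop the stores even if the buffer is never
   read again.  */
static void
wipe_buffer (void *ptr, size_t len)
{
  volatile unsigned char *p = ptr;

  while (len--)
    *p++ = 0;
}
```

Unlike a stack burn done by the caller, this clears the named buffer
itself, so it does not depend on whether the function was inlined.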

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoImprove Serpent key setup speed
Jussi Kivilinna [Mon, 4 Nov 2013 19:28:22 +0000 (21:28 +0200)]
Improve Serpent key setup speed

* cipher/serpent.c (SBOX, SBOX_INVERSE): Remove index argument.
(serpent_subkeys_generate): Use smaller temporary arrays for subkey
generation and perform stack clearing locally.
(serpent_setkey_internal): Use wipememory to clear stack and remove
_gcry_burn_stack.
(serpent_setkey): Remove unneeded _gcry_burn_stack.
--

Avoid using large arrays and large stack burning to gain extra speed for
key setup.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoModify encrypt/decrypt arguments for in-place
Jussi Kivilinna [Sun, 3 Nov 2013 20:07:19 +0000 (22:07 +0200)]
Modify encrypt/decrypt arguments for in-place

* cipher/cipher.c (gcry_cipher_encrypt, gcry_cipher_decrypt): Modify
local arguments if in-place operation.
--

Modify the encrypt/decrypt argument variables instead of calling the
subfunction with different arguments. This allows the compiler to inline the
subfunction for a small speedup.
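
A rough sketch of the pattern, under the assumption that a NULL input pointer
requests in-place operation. toy_encrypt and do_xor are hypothetical stand-ins
(a byte-wise XOR, not a real cipher) and are not the code in cipher/cipher.c.

```c
#include <stddef.h>

/* Hypothetical worker standing in for the cipher subfunction:
   XORs each input byte with 0x5a. Safe when OUT and IN alias. */
static int
do_xor (unsigned char *out, size_t outsize,
        const unsigned char *in, size_t inlen)
{
  size_t i;

  if (outsize < inlen)
    return -1;                  /* output buffer too short */
  for (i = 0; i < inlen; i++)
    out[i] = in[i] ^ 0x5a;
  return 0;
}

/* Hypothetical entry point: a NULL IN requests in-place operation. */
int
toy_encrypt (unsigned char *out, size_t outsize,
             const unsigned char *in, size_t inlen)
{
  if (!in)
    {
      /* Rewrite the local arguments instead of issuing a second call
         with different arguments; only one call site remains. */
      in = out;
      inlen = outsize;
    }
  return do_xor (out, outsize, in, inlen);
}
```

With a single call site, the compiler is free to inline the worker into the
entry point.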

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoSpeed up Stribog
Jussi Kivilinna [Tue, 1 Oct 2013 18:47:53 +0000 (21:47 +0300)]
Speed up Stribog

* cipher/stribog.c (STRIBOG_TABLES): Remove.
(Pi): Remove.
[!STRIBOG_TABLES] (A, strido): Remove.
(stribog_table): New table pre-reordered with Pi values.
(strido): Rewrite for new table.
(LPSX): Rewrite for new table.
(xor): Remove.
(g): Small tweaks.
--

Patch optimizes the table-lookup implementation a bit. Patch also removes
the unused non-table implementation from source.

On Intel Core i5-4570 (amd64, 3.2 GHz):

After:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 STRIBOG256     |      9.22 ns/B     103.4 MiB/s     29.53 c/B
 STRIBOG512     |      9.23 ns/B     103.4 MiB/s     29.53 c/B

Before:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 STRIBOG256     |     30.17 ns/B     31.61 MiB/s     96.56 c/B
 STRIBOG512     |     30.20 ns/B     31.57 MiB/s     96.68 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoTweak AES-NI bulk CTR mode slightly
Jussi Kivilinna [Fri, 1 Nov 2013 22:37:43 +0000 (00:37 +0200)]
Tweak AES-NI bulk CTR mode slightly

* cipher/rijndael.c [USE_AESNI] (aesni_cleanup_2_5): Rename to...
(aesni_cleanup_2_6): ...this and clear also 'xmm6'.
[USE_AESNI && __i386__] (do_aesni_ctr, do_aesni_ctr_4): Prevent
inlining only on i386, allow on AMD64.
[USE_AESNI] (do_aesni_ctr, do_aesni_ctr_4): Use counter block from
'xmm5' and byte-swap mask from 'xmm6'.
(_gcry_aes_ctr_enc) [USE_AESNI]: Preload counter block to 'xmm5' and
byte-swap mask to 'xmm6'.
(_gcry_aes_ctr_enc, _gcry_aes_cfb_dec, _gcry_aes_cbc_dec): Use
'aesni_cleanup_2_6'.
--

Small tweak that yields ~5% more speed on Intel Core i5-4570.

After:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.274 ns/B    3482.5 MiB/s     0.877 c/B
        CTR dec |     0.274 ns/B    3486.8 MiB/s     0.876 c/B

Before:
 AES            |  nanosecs/byte   mebibytes/sec   cycles/byte
        CTR enc |     0.288 ns/B    3312.5 MiB/s     0.922 c/B
        CTR dec |     0.288 ns/B    3312.6 MiB/s     0.922 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoTweak bench-slope parameters
Jussi Kivilinna [Sat, 2 Nov 2013 12:00:27 +0000 (14:00 +0200)]
Tweak bench-slope parameters

* tests/bench-slope.c (BUF_STEP_SIZE): Halve step size to 64.
(NUM_MEASUREMENT_REPETITIONS): Double repetitions to 64.
--

Tweak parameters for better repeatability of results with fast ciphers
(AES-NI).

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoOptimize Blowfish weak key check
Jussi Kivilinna [Mon, 4 Nov 2013 11:32:49 +0000 (13:32 +0200)]
Optimize Blowfish weak key check

* cipher/blowfish.c (hashset_elem, val_to_hidx, add_val): New.
(do_bf_setkey): Use faster algorithm for detecting weak keys.
(bf_setkey): Move stack burning to do_bf_setkey.
--

Patch optimizes the weak key check for Blowfish. Instead of iterating through
the sbox tables looking for duplicates, insert the values into a hash set and
detect collisions.

The old check code took slightly longer than the actual key setup of
Blowfish, which by itself is already quite slow.
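
The idea can be sketched as follows, assuming a single table of at most 256
32-bit entries and an open-addressing set with linear probing. The names
u32_set, set_add and has_duplicate are illustrative; they do not reproduce
the commit's hashset_elem, val_to_hidx or add_val, whose details are not
shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SET_SLOTS 512           /* power of two, twice the 256 inputs */

/* Hypothetical open-addressing set of 32-bit values. */
struct u32_set
{
  uint32_t val[SET_SLOTS];
  unsigned char used[SET_SLOTS];
};

/* Insert V; return 1 if V was already present, else 0. */
static int
set_add (struct u32_set *s, uint32_t v)
{
  uint32_t h = v * UINT32_C (2654435761);   /* multiplicative hash */
  size_t i = h >> 23;                       /* top 9 bits: 0..511 */

  while (s->used[i])
    {
      if (s->val[i] == v)
        return 1;
      i = (i + 1) & (SET_SLOTS - 1);        /* linear probing */
    }
  s->used[i] = 1;
  s->val[i] = v;
  return 0;
}

/* Return 1 if TABLE (N <= 256 entries) contains any repeated value. */
int
has_duplicate (const uint32_t *table, size_t n)
{
  struct u32_set s;
  size_t i;

  memset (&s, 0, sizeof s);
  for (i = 0; i < n; i++)
    if (set_add (&s, table[i]))
      return 1;
  return 0;
}
```

Each value is inserted once, so the check is roughly linear in the table size
rather than quadratic as with a pairwise comparison.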

After:

 $ tests/benchmark --cipher-with-keysetup --cipher-repetitions 10 cipher blowfish
 Running each test 10 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 BLOWFISH       410ms   440ms   430ms   370ms   440ms   370ms   430ms   440ms   370ms   370ms       -       -

Before:

 $ tests/benchmark --cipher-with-keysetup --cipher-repetitions 10 cipher blowfish
 Running each test 10 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 BLOWFISH       780ms   770ms   780ms   730ms   780ms   730ms   780ms   790ms   720ms   730ms       -       -

Without key-setup:

 $ tests/benchmark --cipher-repetitions 10 cipher blowfish
 Running each test 10 times.
                 ECB/Stream         CBC             CFB             OFB             CTR             CCM
              --------------- --------------- --------------- --------------- --------------- ---------------
 BLOWFISH        70ms    70ms    80ms    30ms    80ms    30ms    80ms    90ms    20ms    30ms       -       -

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoFix __builtin_bswap32/64 checks
Jussi Kivilinna [Wed, 6 Nov 2013 17:05:09 +0000 (19:05 +0200)]
Fix __builtin_bswap32/64 checks

* configure.ac (gcry_cv_have_builtin_bswap32)
(gcry_cv_have_builtin_bswap64): Change compile checks to link checks.
--

Patch changes compile checks to link checks for __builtin_bswap(32|64).
Compiling obviously works with missing functions, linking not so much.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoFix 'u32' build error with Camellia
Jussi Kivilinna [Wed, 6 Nov 2013 14:51:55 +0000 (16:51 +0200)]
Fix 'u32' build error with Camellia

* cipher/camellia.c: Add include for <config.h> and "types.h".
(u32): Remove.
(u8): Typedef as 'byte'.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agopubkey: Add forward compatibility feature.
Werner Koch [Wed, 6 Nov 2013 07:56:02 +0000 (08:56 +0100)]
pubkey: Add forward compatibility feature.

* cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Add
"igninvflag".
--

If future versions of Libgcrypt want to add optional flags to a pubkey
s-expression, they may use the "igninvflag" flag to make the flag
parser ignore flags it does not know about.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Require "eddsa" flag for curve Ed25519.
Werner Koch [Tue, 5 Nov 2013 18:00:09 +0000 (19:00 +0100)]
ecc: Require "eddsa" flag for curve Ed25519.

* src/cipher.h (PUBKEY_FLAG_ECDSA): Remove.
* cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Remove "ecdsa".
* cipher/ecc.c (ecc_generate, ecc_sign, ecc_verify): Require "eddsa" flag.
* cipher/ecc-misc.c (_gcry_ecc_compute_public): Depend on "eddsa" flag.
* tests/benchmark.c, tests/keygen.c, tests/pubkey.c
* tests/t-ed25519.c, tests/t-mpi-point.c: Adjust for changed flags.
--

This change makes ECDSA signatures the default for all curves.
If another signing algorithm is to be used, the corresponding flag
needs to be given.  In particular the flag "eddsa" is now always
required with curve Ed25519 to comply with the specs.  This change
makes the code more readable by not assuming a certain signature
algorithm depending on the curve.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Fully implement Ed25519 compression in ECDSA mode.
Werner Koch [Tue, 5 Nov 2013 16:25:02 +0000 (17:25 +0100)]
ecc: Fully implement Ed25519 compression in ECDSA mode.

* src/ec-context.h (mpi_ec_ctx_s): Add field FLAGS.
* mpi/ec.c (ec_p_init): Add arg FLAGS.  Change all callers to pass it.
* cipher/ecc-curves.c (point_from_keyparam): Add arg EC, parse as
 opaque mpi and use eddsa decoding depending on the flag.
(_gcry_mpi_ec_new): Rearrange to parse Q and D after knowing the
curve.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agompi: Add function gcry_mpi_set_opaque_copy.
Werner Koch [Mon, 4 Nov 2013 15:47:13 +0000 (16:47 +0100)]
mpi: Add function gcry_mpi_set_opaque_copy.

* src/gcrypt.h.in (gcry_mpi_set_opaque_copy): New.
* src/visibility.c (gcry_mpi_set_opaque_copy): New.
* src/visibility.h (gcry_mpi_set_opaque_copy): Mark visible.
* src/libgcrypt.def, src/libgcrypt.vers: Add new API.
* tests/mpitests.c (test_opaque): Add test.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoMake test vectors 'static const'
Jussi Kivilinna [Sun, 13 Oct 2013 09:42:32 +0000 (12:42 +0300)]
Make test vectors 'static const'

* cipher/arcfour.c (selftest): Change test vectors to 'static const'.
* cipher/blowfish.c (selftest): Ditto.
* cipher/camellia-glue.c (selftest): Ditto.
* cipher/cast5.c (selftest): Ditto.
* cipher/des.c (selftest): Ditto.
* cipher/rijndael.c (selftest): Ditto.
* tests/basic.c (cipher_cbc_mac_cipher, check_aes128_cbc_cts_cipher)
(check_ctr_cipher, check_cfb_cipher, check_ofb_cipher)
(check_ccm_cipher, check_stream_cipher)
(check_stream_cipher_large_block, check_bulk_cipher_modes)
(check_ciphers, check_digests, check_hmac, check_pubkey_sign)
(check_pubkey_sign_ecdsa, check_pubkey_crypt, check_pubkey): Ditto.
--

Some test vectors have been defined without 'static' and thus end up being
initialized at runtime. Change these to 'static'. Also make test vectors
'const' where possible.
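
A small illustration of the difference, using a made-up vector inside a
hypothetical toy_selftest function (not one of the real selftests):

```c
#include <string.h>

/* Hypothetical self-test with a made-up vector (not a real cipher test). */
const char *
toy_selftest (void)
{
  /* With 'static const' the vector has static storage and is typically
     placed in read-only data; without 'static' the array would be
     filled in on the stack each time the function is called. */
  static const unsigned char expected[4] = { 0xde, 0xad, 0xbe, 0xef };
  unsigned char got[4] = { 0xde, 0xad, 0xbe, 0xef };

  if (memcmp (got, expected, sizeof expected))
    return "toy selftest failed";
  return NULL;
}
```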

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoMake jump labels local in Salsa20 assembly
Jussi Kivilinna [Sun, 3 Nov 2013 20:11:30 +0000 (22:11 +0200)]
Make jump labels local in Salsa20 assembly

* cipher/salsa20-amd64.S: Rename '._labels' to '.L_labels'.
* cipher/salsa20-armv7-neon.S: Ditto.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agobithelp: fix undefined behaviour with rol and ror
Jussi Kivilinna [Wed, 30 Oct 2013 06:57:15 +0000 (08:57 +0200)]
bithelp: fix undefined behaviour with rol and ror

* cipher/bithelp.h (rol, ror): Mask shift with 31.
--
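
A sketch of the masked rotate, assuming 32-bit operands. The names rol32 and
ror32 are illustrative; the actual definitions in cipher/bithelp.h are not
reproduced here.

```c
#include <stdint.h>

/* Rotate X left by N bits. Masking with 31 keeps both shift counts in
   0..31, so N == 0 or N == 32 never produces a shift by 32, which is
   undefined for a 32-bit operand. */
static inline uint32_t
rol32 (uint32_t x, int n)
{
  return (x << (n & 31)) | (x >> ((32 - n) & 31));
}

/* Rotate X right by N bits, with the same masking. */
static inline uint32_t
ror32 (uint32_t x, int n)
{
  return (x >> (n & 31)) | (x << ((32 - n) & 31));
}
```

Without the mask, a rotate count of 0 would evaluate x >> 32 (or x << 32),
which the C standard leaves undefined.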

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agotests: Add feature to skip benchmarks.
Werner Koch [Tue, 29 Oct 2013 14:01:01 +0000 (15:01 +0100)]
tests: Add feature to skip benchmarks.

* tests/benchmark.c (main): Add feature to skip the test.
* tests/bench-slope.c (main): Ditto.
(get_slope): Replace C++ style comment.
(double_cmp, cipher_bench, _hash_bench): Replace system reserved
symbols.
--

During development a quick run of the regression tests is often useful;
however, the benchmarks take a lot of time, so this feature
allows skipping these tests.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoecc: Finish Ed25519/ECDSA hack.
Werner Koch [Tue, 29 Oct 2013 13:36:58 +0000 (14:36 +0100)]
ecc: Finish Ed25519/ECDSA hack.

* cipher/ecc.c (ecc_generate): Fix Ed25519/ECDSA case.
(ecc_verify): Implement ED25519/ECDSA uncompression.
--

With this change Ed25519 may be used with ECDSA while using the
Ed25519 standard compression technique.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoTypo fix.
Werner Koch [Tue, 29 Oct 2013 10:27:58 +0000 (11:27 +0100)]
Typo fix.

--

6 years agoecc: Add flags "noparam" and "comp".
Werner Koch [Fri, 25 Oct 2013 13:44:03 +0000 (15:44 +0200)]
ecc: Add flags "noparam" and "comp".

* src/cipher.h (PUBKEY_FLAG_NOPARAM, PUBKEY_FLAG_COMP): New.
* cipher/pubkey-util.c (_gcry_pk_util_parse_flaglist): Parse new flags
and change code for possible faster parsing.
* cipher/ecc.c (ecc_generate): Implement the "noparam" flag.
(ecc_sign): Ditto.
(ecc_verify): Ditto.
* tests/keygen.c (check_ecc_keys): Use the "noparam" flag.

* cipher/ecc.c (ecc_generate): Fix parsing of the deprecated
transient-flag parameter.
(ecc_verify): Do not make Q optional in the extract-param call.
--

Note that the "comp" flag does not yet have any effect.

Signed-off-by: Werner Koch <wk@gnupg.org>
6 years agoFix typos in documentation
Jussi Kivilinna [Mon, 28 Oct 2013 15:11:21 +0000 (17:11 +0200)]
Fix typos in documentation

* doc/gcrypt.texi: Fix some typos.
--

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAdd ARM NEON assembly implementation of Serpent
Jussi Kivilinna [Sun, 27 Oct 2013 12:07:59 +0000 (14:07 +0200)]
Add ARM NEON assembly implementation of Serpent

* cipher/Makefile.am: Add 'serpent-armv7-neon.S'.
* cipher/serpent-armv7-neon.S: New.
* cipher/serpent.c (USE_NEON): New macro.
(serpent_context_t) [USE_NEON]: Add 'use_neon'.
[USE_NEON] (_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec): New prototypes.
(serpent_setkey_internal) [USE_NEON]: Detect NEON support.
(_gcry_serpent_neon_ctr_enc, _gcry_serpent_neon_cfb_dec)
(_gcry_serpent_neon_cbc_dec) [USE_NEON]: Use NEON implementations
to process eight blocks in parallel.
* configure.ac [neonsupport]: Add 'serpent-armv7-neon.lo'.
--

Patch adds ARM NEON optimized implementation of Serpent cipher
to speed up parallelizable bulk operations.

Benchmarks on ARM Cortex-A8 (armhf, 1008 MHz):

Old:
 SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     43.53 ns/B     21.91 MiB/s     43.88 c/B
        CFB dec |     44.77 ns/B     21.30 MiB/s     45.13 c/B
        CTR enc |     45.21 ns/B     21.10 MiB/s     45.57 c/B
        CTR dec |     45.21 ns/B     21.09 MiB/s     45.57 c/B
New:
 SERPENT128     |  nanosecs/byte   mebibytes/sec   cycles/byte
        CBC dec |     26.26 ns/B     36.32 MiB/s     26.47 c/B
        CFB dec |     26.21 ns/B     36.38 MiB/s     26.42 c/B
        CTR enc |     26.20 ns/B     36.40 MiB/s     26.41 c/B
        CTR dec |     26.20 ns/B     36.40 MiB/s     26.41 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAdd ARM NEON assembly implementation of Salsa20
Jussi Kivilinna [Sat, 26 Oct 2013 12:00:48 +0000 (15:00 +0300)]
Add ARM NEON assembly implementation of Salsa20

* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Setup keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support is detected,
set keysetup, ivsetup and core with ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: Add 'salsa20-armv7-neon.lo'.
--

Patch adds a fast ARM NEON assembly implementation of Salsa20. The
implementation gains extra speed by processing three blocks in parallel with
the help of the ARM NEON vector processing unit.

This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein and it is available in SUPERCOP benchmarking framework. For more
details on this work, check paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:
    http://cryptojedi.org/papers/#neoncrypto
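
The ChangeLog above routes key setup, IV setup and the core through function
pointers stored in the context, chosen at setkey time. A minimal sketch of
that dispatch pattern follows; toy_ctx, toy_core_generic, toy_core_fast and
the have_fast argument are hypothetical, and both cores here are trivial
placeholders rather than Salsa20.

```c
#include <stddef.h>

struct toy_ctx;
typedef void (*toy_core_t) (struct toy_ctx *ctx, unsigned char *dst);

/* Hypothetical context: the selected core lives next to the state. */
struct toy_ctx
{
  unsigned int counter;
  toy_core_t core;
};

/* Placeholder generic core: emit the counter byte and advance. */
static void
toy_core_generic (struct toy_ctx *ctx, unsigned char *dst)
{
  dst[0] = (unsigned char) ctx->counter++;
}

/* Placeholder "accelerated" core: same output, separate entry point. */
static void
toy_core_fast (struct toy_ctx *ctx, unsigned char *dst)
{
  dst[0] = (unsigned char) ctx->counter++;
}

/* Install the default core first, then override it when the
   (hypothetical) hardware feature flag HAVE_FAST is set. */
void
toy_setkey (struct toy_ctx *ctx, int have_fast)
{
  ctx->counter = 0;
  ctx->core = toy_core_generic;
  if (have_fast)
    ctx->core = toy_core_fast;
}

/* Callers go through the pointer and never name a specific core. */
void
toy_next (struct toy_ctx *ctx, unsigned char *dst)
{
  ctx->core (ctx, dst);
}
```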

Benchmark results on Cortex-A8 (1008 MHz):

Before:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     18.88 ns/B     50.51 MiB/s     19.03 c/B
     STREAM dec |     18.89 ns/B     50.49 MiB/s     19.04 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     13.60 ns/B     70.14 MiB/s     13.71 c/B
     STREAM dec |     13.60 ns/B     70.13 MiB/s     13.71 c/B

After:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      5.48 ns/B     174.1 MiB/s      5.52 c/B
     STREAM dec |      5.47 ns/B     174.2 MiB/s      5.52 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      3.65 ns/B     260.9 MiB/s      3.68 c/B
     STREAM dec |      3.65 ns/B     261.6 MiB/s      3.67 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
6 years agoAdd AMD64 assembly implementation of Salsa20
Jussi Kivilinna [Sat, 26 Oct 2013 12:00:48 +0000 (15:00 +0300)]
Add AMD64 assembly implementation of Salsa20

* cipher/Makefile.am: Add 'salsa20-amd64.S'.
* cipher/salsa20-amd64.S: New.
* cipher/salsa20.c (USE_AMD64): New macro.
[USE_AMD64] (_gcry_salsa20_amd64_keysetup, _gcry_salsa20_amd64_ivsetup)
(_gcry_salsa20_amd64_encrypt_blocks): New prototypes.
[USE_AMD64] (salsa20_keysetup, salsa20_ivsetup, salsa20_core): New.
[!USE_AMD64] (salsa20_core): Change 'src' to non-constant, update block
counter in 'salsa20_core' and return burn stack depth.
[!USE_AMD64] (salsa20_keysetup, salsa20_ivsetup): New.
(salsa20_do_setkey): Move generic key setup to 'salsa20_keysetup'.
(salsa20_setkey): Fix burn stack depth.
(salsa20_setiv): Move generic IV setup to 'salsa20_ivsetup'.
(salsa20_do_encrypt_stream) [USE_AMD64]: Process large buffers in AMD64
implementation.
(salsa20_do_encrypt_stream): Move stack burning to this function...
(salsa20_encrypt_stream, salsa20r12_encrypt_stream): ...from these
functions.
* configure.ac [x86-64]: Add 'salsa20-amd64.lo'.
--

Patch adds a fast AMD64 assembly implementation of Salsa20. This implementation
is based on public domain code by D. J. Bernstein and it is available at
http://cr.yp.to/snuffle.html (amd64-xmm6). The implementation gains extra speed
by processing four blocks in parallel with the help of SSE2 instructions.

Benchmark results on Intel Core i5-4570 (3.2 GHz):

Before:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      3.88 ns/B     246.0 MiB/s     12.41 c/B
     STREAM dec |      3.88 ns/B     246.0 MiB/s     12.41 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      2.46 ns/B     387.9 MiB/s      7.87 c/B
     STREAM dec |      2.46 ns/B     387.7 MiB/s      7.87 c/B

After:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.985 ns/B     967.8 MiB/s      3.15 c/B
     STREAM dec |     0.987 ns/B     966.5 MiB/s      3.16 c/B
                =
 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     0.636 ns/B    1500.5 MiB/s      2.03 c/B
     STREAM dec |     0.636 ns/B    1499.2 MiB/s      2.04 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>