Improve performance of generic SHA256 implementation
author Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 29 Jan 2016 15:42:41 +0000 (17:42 +0200)
committer Jussi Kivilinna <jussi.kivilinna@iki.fi>
Fri, 29 Jan 2016 15:42:41 +0000 (17:42 +0200)
commit f3e51161036382429c3491c7c881f36c0a653c7b
tree 564ea33872329c0461d535b648332a91517a85c8
parent 5d41e1a1216c4b341bc737d7fe91438676a5c361

* cipher/sha256.c (R): Let caller do variable shuffling.
(Chro, Maj, Sum0, Sum1): Convert from inline functions to macros.
(W, I): New.
(transform_blk): Unroll round loop; inline message expansion to rounds
to make message expansion buffer smaller.
--
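
The changes listed above can be sketched as follows. This is a minimal illustration, not the code from cipher/sha256.c: the macro bodies, the ROUNDS8 helper, and the function name transform_blk_sketch are assumptions made for the example, and the rounds are unrolled eight at a time to keep the listing short, whereas the commit unrolls the whole round loop. It shows the three ideas: R writes only two of its arguments and the caller rotates the argument order instead of shuffling variables, the helper functions are macros, and message expansion happens inside the rounds on a 16-word ring rather than a 64-word buffer.

```c
#include <assert.h>
#include <stdint.h>

#define ROR(x, n)    (((x) >> (n)) | ((x) << (32 - (n))))
#define Cho(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & (y)) | ((z) & ((x) | (y))))
#define Sum0(x)      (ROR(x, 2) ^ ROR(x, 13) ^ ROR(x, 22))
#define Sum1(x)      (ROR(x, 6) ^ ROR(x, 11) ^ ROR(x, 25))
#define S0(x)        (ROR(x, 7) ^ ROR(x, 18) ^ ((x) >> 3))
#define S1(x)        (ROR(x, 17) ^ ROR(x, 19) ^ ((x) >> 10))

/* One round.  Only d and h are written; the caller rotates the argument
   order for the next round instead of copying a..h around. */
#define R(a, b, c, d, e, f, g, h, k, x) do {                  \
    uint32_t t1 = (h) + Sum1(e) + Cho(e, f, g) + (k) + (x);   \
    uint32_t t2 = Sum0(a) + Maj(a, b, c);                     \
    (d) += t1;                                                \
    (h) = t1 + t2;                                            \
  } while (0)

/* I(n): load big-endian message word n (rounds 0..15).
   W(n): expand word n in place in the 16-word ring (rounds 16..63). */
#define I(n) (w[n] = ((uint32_t)data[4 * (n)] << 24)          \
                   | ((uint32_t)data[4 * (n) + 1] << 16)      \
                   | ((uint32_t)data[4 * (n) + 2] << 8)       \
                   | (uint32_t)data[4 * (n) + 3])
#define W(n) (w[(n) & 15] += S1(w[((n) - 2) & 15])            \
                           + w[((n) - 7) & 15]                \
                           + S0(w[((n) - 15) & 15]))

/* Eight rounds; X is I or W.  After eight rotations the argument order
   is back where it started. */
#define ROUNDS8(X, i) do {                                    \
    R(a, b, c, d, e, f, g, h, K[(i) + 0], X((i) + 0));        \
    R(h, a, b, c, d, e, f, g, K[(i) + 1], X((i) + 1));        \
    R(g, h, a, b, c, d, e, f, K[(i) + 2], X((i) + 2));        \
    R(f, g, h, a, b, c, d, e, K[(i) + 3], X((i) + 3));        \
    R(e, f, g, h, a, b, c, d, K[(i) + 4], X((i) + 4));        \
    R(d, e, f, g, h, a, b, c, K[(i) + 5], X((i) + 5));        \
    R(c, d, e, f, g, h, a, b, K[(i) + 6], X((i) + 6));        \
    R(b, c, d, e, f, g, h, a, K[(i) + 7], X((i) + 7));        \
  } while (0)

static const uint32_t K[64] = {
  0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
  0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
  0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
  0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
  0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
  0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
  0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
  0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* Hypothetical name: processes one 64-byte block and updates state[8]. */
void transform_blk_sketch(uint32_t state[8], const unsigned char data[64])
{
  uint32_t a, b, c, d, e, f, g, h;
  uint32_t w[16];               /* 16-word ring instead of 64 words */
  int i;

  assert(state && data);
  a = state[0]; b = state[1]; c = state[2]; d = state[3];
  e = state[4]; f = state[5]; g = state[6]; h = state[7];

  for (i = 0; i < 16; i += 8)
    ROUNDS8(I, i);              /* rounds 0..15 read the message */
  for (i = 16; i < 64; i += 8)
    ROUNDS8(W, i);              /* rounds 16..63 expand in place */

  state[0] += a; state[1] += b; state[2] += c; state[3] += d;
  state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

Feeding the sketch the standard SHA-256 initial state and the single padded block for the message "abc" should give the well-known digest beginning ba7816bf.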

Benchmark on Cortex-A8 (armv6, 1008 MHz):

 Before:
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |     27.63 ns/B     34.52 MiB/s     27.85 c/B

 After (1.31x faster):
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |     20.97 ns/B     45.48 MiB/s     21.13 c/B

Benchmark on Cortex-A8 (armv7, 1008 MHz):

 Before:
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |     24.18 ns/B     39.43 MiB/s     24.38 c/B

 After (1.13x faster):
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |     21.28 ns/B     44.82 MiB/s     21.45 c/B

Benchmark on Intel Core i5-4570 (i386, 3.2 GHz):

 Before:
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |      5.78 ns/B     164.9 MiB/s     18.51 c/B

 After (1.06x faster):
                 |  nanosecs/byte   mebibytes/sec   cycles/byte
  SHA256         |      5.41 ns/B     176.1 MiB/s     17.33 c/B
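
The three columns in the tables above are related by simple arithmetic, which the following sketch spells out. The function names are hypothetical and this is not the benchmark tool's code. It assumes MiB/s is 10^9 / (ns/B) / 2^20 and that cycles/byte is roughly ns/B times the clock in GHz. The "Nx faster" figures match the before/after ns/B ratio with the second decimal truncated (for example 27.63 / 20.97 is about 1.317), although the message does not state how they were derived.

```c
#include <assert.h>

/* Throughput in MiB/s from a cost in nanoseconds per byte. */
double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Approximate cycles per byte from ns/B and the clock in MHz. */
double cycles_per_byte(double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* Speedup factor: how many times faster "after" is than "before". */
double speedup(double ns_before, double ns_after)
{
  assert(ns_after > 0.0);
  return ns_before / ns_after;
}
```

For the first Cortex-A8 table, mib_per_sec(27.63) is about 34.5 and cycles_per_byte(27.63, 1008) is about 27.85, in line with the reported row.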

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
cipher/sha256.c