As most of you familiar with DEF CON know, we run all our sites over Tor as .onions (Tor Hidden Services).

In the quest to get the most security and performance out of Tor, I've done a bunch of custom compiling. In this post I'll talk about the performance gains I've seen.

The Setup:
I run our Tor server on a low-power, 4-core Intel Xeon D-2123IT CPU running at 2.2 GHz.
As far as the operating system is concerned, it is a "Skylake-avx512" CPU.

The Compile:
Originally, Tor was configured with "./configure --with-openssl-dir=/openssl-path" and compiled with "-march=skylake" set in the Makefile.
Later I realized that build did not include AVX-512 support, so I changed the flag to "-march=skylake-avx512".
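If you are not sure which -march value fits your CPU, one option is to ask the compiler. The sketch below assumes GCC is installed, since "-Q --help=target" is a GCC feature and may not work with other compilers such as clang. The helper names parse_march and native_march are made up for this example.

```python
import re
import shutil
import subprocess


def parse_march(help_target_output):
    """Return the value on the '-march=' line of `gcc -Q --help=target` output, or None."""
    for line in help_target_output.splitlines():
        match = re.match(r"\s*-march=\s+(\S+)", line)
        if match:
            return match.group(1)
    return None


def native_march(compiler="gcc"):
    """Ask the compiler what -march=native resolves to; None if it cannot be determined."""
    if shutil.which(compiler) is None:
        return None
    try:
        result = subprocess.run(
            [compiler, "-march=native", "-Q", "--help=target"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return parse_march(result.stdout)


if __name__ == "__main__":
    # Prints the detected value, or None when GCC is missing or the query fails.
    print(native_march())
```

If this reports something other than what you expected, compare it against the flag you actually put in the Makefile before rebuilding.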

You will also need to make sure OpenSSL is compiled with at least "./Configure -march=skylake-avx512 enable-ec_nistp_64_gcc_128"

The Question:
Did it really make much of a difference? How much does it help or hurt? Does the kind of math that Tor performs take advantage of the AVX features? Let's see!

The Performance Test:
Tor's bench performance testing tool (src/test/bench in the Tor source tree) was used for these tests.
Note: I ran each test only once instead of averaging multiple runs, because I was only looking for large differences.
I have highlighted anything that was more than a 5% increase or decrease in performance.
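For anyone who wants to repeat the comparison, here is a minimal sketch of the percentage arithmetic, assuming the change is measured relative to the baseline (skylake) timing. The function names and the threshold parameter are illustrative and not part of Tor's bench tool.

```python
def percent_change(baseline, candidate):
    """Signed change in run time relative to the baseline, in percent (negative means faster)."""
    return (candidate - baseline) / baseline * 100.0


def describe(baseline, candidate, threshold=5.0):
    """Return a label such as '44.98% Faster' and whether it exceeds the threshold."""
    change = percent_change(baseline, candidate)
    label = "Faster" if change < 0 else "Slower"
    highlighted = abs(change) > threshold
    return f"{abs(change):.2f}% {label}", highlighted


# dmap: digestset_probably_contains, skylake vs. skylake-avx512 (ns)
print(describe(127.91, 70.37))         # ('44.98% Faster', True)
# onion_TAP: Client-side part 1, skylake vs. skylake-avx512
print(describe(1006.4394, 1026.0896))  # ('1.95% Slower', False)
```

Because every test was run only once, small percentages like the second one are probably within run-to-run noise.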

The Most Important Tests:
Please note that the performance numbers are measured in us (microseconds), ns (nanoseconds), and ms (milliseconds).
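Since the units differ from test to test, it can help to convert everything to one unit before comparing rows. This is a small sketch of that conversion. The table name NS_PER_UNIT and the function to_nanoseconds are hypothetical helpers, not something the bench tool provides.

```python
# Nanoseconds per unit, for the three units used in the results.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000}


def to_nanoseconds(value, unit):
    """Convert a timing expressed in ns, us, or ms to nanoseconds."""
    return value * NS_PER_UNIT[unit]


if __name__ == "__main__":
    # A DH handshake (ms) next to a digestmap lookup (ns), both shown in ns.
    print(f"{to_nanoseconds(3.7169, 'ms'):,.0f} ns")
    print(f"{to_nanoseconds(87.79, 'ns'):,.2f} ns")
```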
The most important performance numbers are around

The Conclusion:
If your CPU supports the AVX512 extensions it is a

What is AVX-512?:
(From Wikipedia) AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing)[1] and Skylake-X CPUs; this includes the Core-X series (excluding the Core i5-7640X and Core i7-7740X), as well as the new Xeon Scalable Processor Family and Xeon D-2100 Embedded Series.[2]

EDIT! NEW DATA from 2021 May 24
I've updated the Tor server OS to FreeBSD 13 to support KTLS and OSSL, recompiled Tor, re-run the benchmark, and added a new column comparing AVX-512 vs. KTLS.

What is KTLS and OSSL and why is it making such a huge improvement?

(From the FreeBSD 13.0 release notes) The kernel now supports in-kernel framing and encryption of Transport Layer Security (TLS) data on TCP sockets for TLS versions 1.0 through 1.3. Transmit offload via in-kernel crypto drivers is supported for MtE cipher suites using AES-CBC as well as AEAD cipher suites using AES-GCM. Receive offload via in-kernel crypto drivers is supported for AES-GCM cipher suites for TLS 1.2. Using KTLS requires the use of a KTLS-aware userland SSL library. The OpenSSL library included in the base system does not enable KTLS support by default, but support can be enabled by building with the WITH_OPENSSL_KTLS option.
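To confirm that kernel TLS is actually switched on, you can read the kern.ipc.tls.enable sysctl documented in FreeBSD's ktls(4) manual page. The sketch below assumes the sysctl command is on the PATH and returns None on systems where the setting does not exist. The function name ktls_enabled is made up for this example.

```python
import shutil
import subprocess


def ktls_enabled():
    """Return True/False from kern.ipc.tls.enable, or None if it cannot be read."""
    if shutil.which("sysctl") is None:
        return None
    try:
        result = subprocess.run(
            ["sysctl", "-n", "kern.ipc.tls.enable"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        # The sysctl is missing, e.g. on a non-FreeBSD system or a kernel without KTLS.
        return None
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print(ktls_enabled())
```

This only reports the kernel setting. The userland OpenSSL still has to be built with KTLS support, as the release notes above explain.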

| Test Name | skylake | skylake-avx512 | Difference (vs. skylake) | Percentage | KTLS | Difference (vs. skylake-avx512) | Percentage |
|---|---|---|---|---|---|---|---|
| dmap: digestset_probably_contains | 127.91 | 70.37 | -57.54 ns | 44.98% Faster | 109.07 | +38.7 ns | 55% Slower |
| dmap: digestmap_set | 101.17 | 71.30 | -29.87 ns | 29.52% Faster | 86.47 | +15.17 ns | 21% Slower |
| dmap: digestmap_get | 87.79 | 64.83 | -22.96 ns | 26.15% Faster | 72.63 | +7.8 ns | 12% Slower |
| dmap: digestset_add | 136.86 | 72.70 | -64.16 ns | 46.88% Faster | 116.94 | +44.24 ns | 60% Slower |
| onion_TAP: Client-side part 1 | 1006.4394 | 1026.0896 | +19.6502 ns | 1.95% Slower | 146.24 | -879.8496 ns | 702% Faster |
| onion_TAP: Server-side key right | 2956.9824 | 3027.5585 | +70.5761 ns | 2.38% Slower | 364.61 | -2662.9485 ns | 831% Faster |
| onion_TAP: Server-side key wrong | 4068.7480 | 4177.2812 | +108.5332 ns | 2.66% Slower | 487.41 | -3689.8712 ns | 857% Faster |
| onion_TAP: Client-side part 2 | 917.4394 | 933.0117 | +15.5723 ns | 1.69% Slower | 119.68 | -813.3317 ns | 784% Faster |
| onion_ntor: 25519 boff: Client part 1 | 103.1357 | 101.2304 | -1.9053 us | 1.84% Faster | 74.57 | -26.6604 us | 35% Faster |
| onion_ntor: 25519 boff: Server side | 316.1425 | 310.2880 | -5.8545 us | 1.85% Faster | 227.77 | -82.518 us | 36% Faster |
| onion_ntor: 25519 boff: Client part 2 | 212.8984 | 209.0576 | -3.8408 us | 1.80% Faster | 155.32 | -53.7376 us | 34% Faster |
| onion_ntor: 25519 bon: Client part 1 | 33.6640 | 32.6943 | -0.9697 us | 2.88% Faster | 23.08 | -9.6143 us | 41% Faster |
| onion_ntor: 25519 bon: Server side | 246.8164 | 241.8593 | -4.9571 us | 2.00% Faster | 176.26 | -65.5993 us | 37% Faster |
| onion_ntor: 25519 bon: Client part 2 | 212.8955 | 209.0898 | -3.8057 us | 1.78% Faster | 153 | -56.0898 us | 36% Faster |
| ed25519-donna off: verify signature | 209.05 | 190.92 | -18.13 us | 8.67% Faster | 176.88 | -14.04 us | 8% Faster |
| ed25519-donna on: verify signature | 87.31 | 81.85 | -5.46 us | 6.25% Faster | 67.38 | -14.47 us | 24% Faster |
| ed25519-donna off: gen pub key | 55.89 | 60.22 | +4.33 us | 7.74% Slower | 46.01 | -14.21 us | 30% Faster |
| ed25519-donna on: gen pub key | 25.43 | 24.15 | -1.28 us | 5.03% Faster | 19.62 | -4.53 us | 23% Faster |
| dh: Complete DH handshakes | 3.7169 | 3.7855 | +0.0686 ms | 1.84% Slower | 0.49 | -3.2955 ms | 772% Faster |
| ecdh_p256: Complete ECDH P-256 | 0.5149 | 0.5203 | +0.0054 ms | 1.04% Slower | 0.19 | -0.3303 ms | 273% Faster |
| ecdh_p224: Complete ECDH P-224 | 0.3239 | 0.3317 | +0.0078 ms | 2.40% Slower | 2.13 | +1.79 ms | 642% Slower |
| md_parse: Microdescriptor parse | 17452.62 | 13226.09 | -4226.53 ns | 24.21% Faster | 11877.52 | -1348.57 ns | 11% Faster |
| crypto_strongest_rand(16) | 12051.07 | 3192.13 | -8858.94 ns | 377% Faster | | | |
| sha512 (2048) | 7344.62 | 4228.52 | -3116.10 ns | 173% Faster | | | |