Fast hashes are useful for signing, MACs (symmetric "signatures" so to speak), k...

lame-robot-hoax · on Jan 9, 2020

So I’m not super familiar with things like this, but for example, WireGuard uses BLAKE2 for hashing. What level of undertaking would it be to move from BLAKE2 to BLAKE3 in regards to WireGuard? Can you just pop out BLAKE2 and pop in BLAKE3?

beefhash · on Jan 9, 2020

The two hashes aren't compatible, so a hash of the same message will yield two different hashes under BLAKE2 and BLAKE3.

As far as I can tell, BLAKE2 has effectively all the properties BLAKE3 has (arbitrary output length, keyed hash mode), so the upgrade for communications boils down to negotiating/determining which of the two hash functions to use over the wire (with all the downsides that come with agility of cryptographic primitives); for stored hashes, they have to be recomputed and replaced (or you could store a flag is BLAKE2/is BLAKE3 and update them as you touch hashes, kind of similar to how password hashes are swapped at login time).

Note that BLAKE3 existing doesn't break BLAKE2. It's perfectly fine to just keep trucking BLAKE2, it's just that BLAKE3 has better performance characteristics that make it very attractive.

cesarb · on Jan 9, 2020

Wireguard uses pre-shared keys, which are manually (or through some separate program) configured on each end, so one could in theory make "oh, and by the way we will use BLAKE3 instead of BLAKE2" part of that pre-shared key.

loeg · on Jan 9, 2020

Wireguard would bump protocol revision, and the new protocol would just pop out Blake2 and use Blake3 hashes, yes.

aidenn0 · on Jan 9, 2020

Assuming wireguard hashes data shorter than 4k (i.e. most network packets), there is no reason to switch; BLAKE3 is only faster than BLAKE2 on data longer than 4k.

loeg · on Jan 9, 2020

That isn't literally true; the reduced rounds make it faster on small inputs, too. And jumbo packets can be 4kB or 9000B or whatever, if wireguard is used on such an interface.

aidenn0 · on Jan 10, 2020

Does BLAKE3 reduce rounds vs BLAKE2s?

loup-vaillant · on Jan 10, 2020

7 rounds instead of 10.

Though for Wireguard, you'd compete with Blake2b as well, which has the advantage of using 64-bit words. And if you want a fair comparison, you should reduce the rounds of Blake2b down to 8 (instead of 12), as recommended in Aumasson's "Too Much Crypto".

On a 64-bit machine, such a reduced Blake2b would be much faster than Blake3 on inputs greater than 128 bytes and smaller than 4Kib.

loeg · on Jan 10, 2020

They address this in the paper, to some extent. With SIMD, you get 128, 256, or 512 bits of vector. You can either store 32x4, 32x8, 32x16, or 64x2, 64x4, 64x8 words. But either way you're processing N bits in parallel.

The concern about 64-bit machines and using 64-bit word sizes vs 32-bit word sizes really only matters if your 64-bit machine doesn't have SIMD vector extensions. (All amd64 hardware, for example, has at least SSE2.) And as they point out, being 32-bit native really helps on low-end 32-bit machines without SIMD intrinsics.

(Re: the hypothetical, if wireguard were to do a protocol revision and replace Blake2B with this, it would make sense to also replace Chacha20 with Chacha8 or 12 at the same time. I doubt the WG authors will do any such thing any time soon.)

loup-vaillant · on Jan 10, 2020

I was talking about small-ish inputs, for which vectorisation wouldn't help.

loeg · on Jan 10, 2020

Yes. It is addressed in TFA, as well as the current top-voted comment on the article.

bscphil · on Jan 9, 2020

> as part of cryptographically secure PRNGs (though most of the world has moved on to stream ciphers for that instead)

My understanding is that plenty of stream ciphers are based on hashes. For example each block of the stream can be hash(key + nonce + block counter + constants) that you xor with your plaintext (or don't, if you just want a CSPRNG).

loeg · on Jan 9, 2020

BLAKE and Chacha are pretty intimately related; if this is faster, I don't see any reason to not use it as a CSPRNG over, say, Chacha20. You may have to be careful about not rolling the counter, unlike Chacha20 (which can easily be extended to have a 128-bit counter).

bscphil · on Jan 10, 2020

> unlike Chacha20 (which can easily be extended to have a 128-bit counter)

Is this frequently done in practice? The CSPRNG code for ChaCha20 I've looked at rotates the key itself using 32 out of every 768 bytes. In that case rolling the counter isn't a concern.

loup-vaillant · on Jan 10, 2020

The 128-bit counter would work, and would remove the 32 bytes of overhead. The speed difference however is fairly negligible, and you would lose forward secrecy in the process (if your unrotated seed gets stolen, all past random numbers are revealed).

Now I wonder where this 768 bytes could possibly come from. It's only a multiple of 256, which can only take advantage of 128-bit vectors (4 blocks at a time). Ideally you want an 8 way parallelism (AVX2) or even 16 way parallelism (AVX-512). That is, either 512 byte blocks, or 1024 byte blocks.

bscphil · on Jan 10, 2020

> Now I wonder where this 768 bytes could possibly come from.

This is totally implementation defined, it's not required by the spec. As loeg says (below) I was looking at a reference implementation by djb. I did a quick skim of OpenBSD's arc4random (which also uses ChaCha20) and if I'm reading it correctly, it rekeys every 1024 bytes.

> Ideally you want an 8 way parallelism (AVX2) or even 16 way parallelism (AVX-512)

My guess is that 768 was thought to be a decent enough trade-off between maximum and average latency for calls to the CSPRNG. I wouldn't be surprised to see that most implementations that are optimized for specific CPU architectures use different values.

loeg · on Jan 10, 2020

128-bit counter and key rotation are not mutually exclusive.

I believe the 768 byte figure comes from DJB's blog on fast key erasure[1]. Why he picked 768, I do not know.

[1]: https://blog.cr.yp.to/20170723-random.html

loup-vaillant · on Jan 10, 2020

Oh, I see. Then people blindly copied it, without taking into account that Chacha20 has bigger blocks than AES, and could benefit from vector implementations (while AES has bit slicing or AES-NI).

loeg · on Jan 10, 2020

I don't know about frequently; it is done at least once. (Of course, nothing about a wide counter prevents you from rotating the key, too, for forward secrecy properties.)