BLAKE3 is the most recent evolution of the BLAKE cryptographic hash function. Created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn, BLAKE3 combines the general-purpose cryptographic tree hash Bao with BLAKE2 to deliver a large performance improvement over SHA-1, SHA-2, SHA-3, and BLAKE2.
BLAKE3 differs from BLAKE2 in two significant regards. First, it reduces the number of rounds from 10 to 7. A "round" is a fixed sequence of building blocks composed into a single function; many symmetric primitives, block ciphers among them, are defined by specifying such a round and running it multiple times. While fewer rounds surely yield a significant performance improvement, this is unlikely to be the most important factor behind BLAKE3's speed. Second, BLAKE3 moves to a binary tree structure, which enables an unbounded degree of parallelism:
BLAKE3 splits its input into 1 KiB chunks and arranges them as the leaves of a binary tree. Each chunk is compressed independently, so the degree of parallelism is equal to the number of chunks.
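To make the chunk-and-tree idea concrete, here is a minimal sketch of the structure described above. It is not BLAKE3: it uses Python's standard-library BLAKE2s as a stand-in for the compression function, and the names `split_chunks`, `hash_leaf`, `hash_parent`, `merge`, and `tree_hash` are hypothetical. The one-byte domain-separation prefixes and the left-heavy split rule are assumptions made for illustration (the split is modeled on the layout described in the BLAKE3 paper), so the digests it produces will not match real BLAKE3 output.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_LEN = 1024  # 1 KiB chunks, as described in the text


def split_chunks(data):
    """Split the input into 1 KiB chunks; empty input yields one empty chunk."""
    if not data:
        return [b""]
    return [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)]


def hash_leaf(chunk):
    # Stand-in leaf hash (assumption): BLAKE2s with a 0x00 prefix, NOT BLAKE3.
    return hashlib.blake2s(b"\x00" + chunk).digest()


def hash_parent(left, right):
    # Stand-in parent hash (assumption): BLAKE2s with a 0x01 prefix.
    return hashlib.blake2s(b"\x01" + left + right).digest()


def merge(nodes):
    """Reduce a list of node hashes to a single root hash.

    The left subtree takes the largest power-of-two number of leaves
    that still leaves at least one leaf for the right subtree.
    """
    if len(nodes) == 1:
        return nodes[0]
    split = 1 << ((len(nodes) - 1).bit_length() - 1)
    return hash_parent(merge(nodes[:split]), merge(nodes[split:]))


def tree_hash(data, workers=4):
    """Hash every chunk independently, then combine the results pairwise."""
    chunks = split_chunks(data)
    # No leaf depends on another, so they can be handed to a pool in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(hash_leaf, chunks))
    return merge(leaves).hex()


if __name__ == "__main__":
    message = bytes(range(256)) * 20  # 5120 bytes -> 5 chunks
    print(len(split_chunks(message)), "chunks, root =", tree_hash(message))
```

The thread pool here only demonstrates that the leaves are independent; with chunks this small, CPython is unlikely to show a real speedup. Actual implementations get their parallelism from SIMD lanes and native threads.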
This implies BLAKE3 can leverage the intrinsic parallelism provided by the SIMD instructions available on modern CPUs, which helps explain its performance.
BLAKE3's authors published a benchmark on an Intel Cascade Lake-SP 8275CL processor showing it to be 5x faster than BLAKE2 and 15x faster than SHA3-256.
Another, independent benchmark run on a 1 GB file showed BLAKE3 to be almost 10x faster than SHA-2.
Additionally, BLAKE3 can effectively exploit multi-core architectures and multi-threading, which provides it with great scalability, as shown in the following graph made available by BLAKE3's authors.
It must be noted that while BLAKE3 greatly outperforms other hashes such as BLAKE2, SHA-2, and SHA-3, it is not the only cryptographic hash function providing this level of performance. Specifically, KangarooTwelve reaches approximately the same throughput as BLAKE3 on the Intel Cascade Lake-SP 8275CL mentioned above, according to the BLAKE3 authors' own benchmark. This result is also consistent with the KangarooTwelve authors' own benchmark. Conversely, according to the figures provided by BLAKE3's authors, BLAKE3 appears to significantly beat KangarooTwelve on a Raspberry Pi Zero, which uses a 32-bit ARM1176 processor.
When it comes to BLAKE3's security, its authors claim it to be 128-bit secure for all security goals, including resistance to preimage, collision, and differentiability attacks. This puts BLAKE3 on par with other hashes that target 128-bit security, such as SHA3-256 with respect to collision resistance. In this respect, the biggest concern for many is that BLAKE3 uses only seven rounds, down from 10 in BLAKE2s, on which it is based. According to some Reddit commenters, for example, this could mean BLAKE3 is less secure against future, currently unknown attacks that are not yet covered by current cryptanalysis. This is a highly debated topic and, in fact, one of BLAKE3's authors, Aumasson, is also the author of "Too Much Crypto", a paper in which he argues that many symmetric cryptography primitives use too many rounds and could be made much faster with fewer rounds without impacting their security.
BLAKE3's authors provided a reference implementation of their new hash in Rust, which is available on GitHub. They also ported it to C, aiming for a simpler implementation that does not support multi-threading and is therefore slower. Additionally, Luke Champine ported it to Go, attaining what he himself describes as "not great, not terrible" performance.