Is a significant part of git's typical profile spent computing hashes? I'm genuinely asking because I don't know the answer. I'd expect all the diffing and (potentially fuzzy) merging to be significantly more expensive operations, at least as far as big-O is concerned.
> Is a significant part of git's typical profile spent computing hashes?
No.
Hashes are really cheap.
This annoys me a bit, because every discussion about hashing turns into endless bikeshedding over which hash function to use. The simple truth is: SHA2, SHA3, and Blake2/3 are all good enough, from both a security and a performance perspective, for almost any use case, and their advantages and disadvantages are so minor that it really doesn't matter.
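As a rough illustration that hashing is cheap, here is a sketch using Python's hashlib to measure single-core throughput for several of the hashes mentioned. The numbers depend heavily on the machine and on how the interpreter's OpenSSL was built, so treat this as a way to check your own hardware, not as a benchmark:

```python
import hashlib
import time

data = b"\x00" * (16 * 1024 * 1024)  # 16 MiB of input

for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    h = hashlib.new(name)
    start = time.perf_counter()
    h.update(data)
    h.digest()
    elapsed = time.perf_counter() - start
    # Throughput in MB/s; on typical modern x86 this is hundreds of
    # MB/s or more for every one of these functions.
    print(f"{name}: {len(data) / elapsed / 1e6:.0f} MB/s")
```

On hardware with SHA-NI, SHA-256 in particular will come out far faster than these portable-software numbers suggest.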
Length extension is an unnecessary problem in Merkle–Damgård constructions, and it makes sense to get rid of it. So if you are building a new thing today, there's some sense in not picking SHA-256, so that you won't later hit your head on a length extension attack. SHA-512/256 (that's a single hash in the SHA2 family, not a choice between two) is a reasonable choice, though. And of course, if Git were somehow vulnerable to length extension, it would have been in trouble years ago, so for Git, why not SHA-256.
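The classic place length extension bites is a naive MAC built as `sha256(key || message)`: an attacker who sees that digest can compute a valid tag for `message || padding || extra` without knowing the key. HMAC avoids this by wrapping the hash twice; SHA-512/256 avoids it by truncating the internal state. A minimal sketch of the broken pattern next to the correct one, using the stdlib (the key and message are made up for illustration):

```python
import hashlib
import hmac

key = b"secret-key"
msg = b"amount=100&to=alice"

# Broken MAC: vulnerable to length extension, because sha256(key || msg)
# exposes the full internal state of the hash after processing the key
# and message, letting an attacker keep appending data.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# HMAC wraps the hash in an inner and an outer invocation, so the
# attacker never sees an extendable intermediate state.
safe_tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
```

Git only hashes object contents for identity, not keyed authentication, which is why length extension is a non-issue for its use case.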
The length extension attack is a non-issue for Git’s use case, and SHA-256 (unlike SHA-512) benefits from having hardware acceleration in the new Ice Lake Intel chips (as well as on the AMD side of things), and has been around 11 years longer than SHA-512/256. And, yes, there are places which say “If you will use a hash, you will use SHA-256”.
Personally, the last time I was in a place where I had to choose which cryptography to use, I used SHA3’s direct predecessor, RadioGatún, because I needed a combined hash + stream cipher and, at the time (late 2007), RadioGatún was the only option.
RadioGatún also benefits from being about as fast as BLAKE2 (it would be faster in hardware, FWIW, having SHA3’s hardware advantages), and is approaching 14 years old without being broken by cryptanalysis. Also, unlike BLAKE2/3, and like SHA3 and all sponge functions, it’s computationally expensive to “fast forward” in RadioGatún’s XOF (stream cipher, if you will) mode, which is beneficial for things like password hashing. Another nice thing about RadioGatún: it doesn’t have any magic constants in its specification, allowing a useful implementation to fit on my coffee mug.
If someone asked me which hash algorithm to use, I would suggest SHA-256, unless I thought they needed protection from length extension attacks (then SHA-512/256), or needed an XOF (stream cipher-like) construction (then SHAKE256).
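For anyone unfamiliar with what an XOF gives you over a fixed-width hash: you choose the output length when you squeeze the digest, and shorter outputs are prefixes of longer ones from the same input. A small sketch with the stdlib's SHAKE256:

```python
import hashlib

# SHAKE256 is an extendable-output function (XOF): the output length is
# a parameter of the digest call, not fixed by the algorithm.
xof = hashlib.shake_256(b"some input")
out_short = xof.hexdigest(16)  # squeeze 16 bytes
out_long = xof.hexdigest(64)   # squeeze 64 bytes from the same input

# A defining XOF property: the shorter output is a prefix of the longer.
assert out_long.startswith(out_short)
```

This prefix property is also why an XOF doubles as a stream cipher-like keystream generator: you can keep squeezing as many bytes as you need.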
If performance mattered more than a conservative security margin, BLAKE3 (software performance) or KangarooTwelve (a SHA3 variant with excellent hardware performance) would be good choices. If I were to choose a hash + XOF for use today, I would use KangarooTwelve’s variant with a slightly larger security margin: MarsupilamiFourteen.
Cryptographically strong random numbers in MaraDNS 2.0. The hash nature of RadioGatún allows me to combine multiple entropy sources with varying amounts of randomness together to seed it, then use it as a stream cipher to generate good random numbers. This way, the DNS query ID and source port are hard to guess, making blind DNS spoofing harder.
The nice thing about RadioGatún is that it only takes about 2k of compiled code (and can fit in under 600 bytes of source code, as seen in the parent) to pull all this off.
This was the best way to pull it off back in 2007, when RadioGatún was the only secure Extendable-Output Function (XOF) that existed.
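The entropy-mixing scheme described above can be sketched with a stdlib XOF standing in for RadioGatún (which is not in the standard library); the specific entropy sources and field widths here are illustrative, not MaraDNS's actual implementation:

```python
import hashlib
import os
import struct
import time

# Absorb several entropy sources of varying quality into one XOF state.
# SHAKE256 stands in for RadioGatún's hash + stream cipher role here.
pool = hashlib.shake_256()
pool.update(os.urandom(32))                  # strong: OS entropy
pool.update(struct.pack(">d", time.time()))  # weak: current time
pool.update(struct.pack(">I", os.getpid()))  # weak: process id

# Squeeze keystream bytes from the XOF, stream cipher-style.
stream = pool.digest(4)

# Hypothetical use: a hard-to-guess DNS query ID and source port.
query_id = int.from_bytes(stream[0:2], "big")
src_port = 1024 + int.from_bytes(stream[2:4], "big") % (65536 - 1024)
```

The point of hashing the sources together is that the output is unpredictable as long as at least one input was, even if the others are guessable.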
Linux also has an ethos of choosing boring technology. SHA2 has been around for a long time and is battle tested. For the majority of us, it is the natural choice. I'm not implying anything negative about SHA3/Blake/Keccak.
The decision was made before the release of Blake3. The article did mention that the algorithm is no longer hardcoded (hence the ability to support both SHA1 & SHA256). This means it's possible to transition to Blake3 (or any other algorithm) in the future, though it won't be trivial.
Of course, processors that use one of the Atom/Celeron/Pentium microarchitectures are not the best choice if you desire maximum speed, but otherwise they are surprisingly interesting processors (IMHO much more interesting than what Intel delivers with the Core series).
At this time, Intel often experiments with or introduces features that are particularly interesting for embedded usages first on the Atom. For example, the already mentioned SHA-NI. Another example is the MOVBE instruction (insanely useful if you handle big-endian data, for example in network packets; I am aware that older x86 processors have the BSWAP instruction), which was first introduced with Atom.
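For context on why byte-order instructions matter: network protocols put multi-byte fields in big-endian order, while x86 is little-endian, so every field load implies a byte swap, which is exactly the work MOVBE (load-plus-swap in one instruction) or BSWAP does in hardware. The same operation in high-level code, using a made-up two-field header for illustration:

```python
import struct

# A hypothetical packet header: a 16-bit message type followed by a
# 32-bit payload word, both in big-endian (network) byte order.
packet = bytes([0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF])

# ">" selects big-endian; H = unsigned 16-bit, I = unsigned 32-bit.
# On a little-endian CPU each unpack implies a byte swap under the hood.
msg_type, payload = struct.unpack(">HI", packet)

assert msg_type == 0x1234
assert payload == 0xDEADBEEF
```

A compiler targeting a MOVBE-capable CPU can fold the equivalent C load-and-swap into that single instruction.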
There are organisations that can only use approved crypto for various certifications and government contracts. It would be bad to drive such users away from git.