Creating hashes is a common way to check whether content X has changed without looking at the whole content of X. Git, for example, uses SHA1 hashes to identify each commit. SHA1 is a pretty old cryptographic hash function, and in the case of Git there might have been better alternatives available, because the “to-be-hashed” content is not crypto relevant - it’s just a content marker. In Git the current standard is SHA1, which works, but a ‘better’ way would be to use non-cryptographic functions for non-crypto purposes.
Why you should not use crypto hashes for non-crypto purposes
I discovered this topic via a Twitter conversation that started with this tweet:
Bend Message Deduplication on #azure #servicebus to Your Will https://t.co/zjIQFjt2c9
— Sean Feldman (@sfeldman) December 3, 2016
Clemens Vasters then joined and pointed out why it would be better to use non-crypto hash functions:
Rationale: Any use of broken crypto hashes may trip up security review processes.
— Clemens Vasters 🇪🇺 (@clemensv) December 4, 2016
The reason makes perfect sense to me - next step: What other choices are available?
Non-cryptographic hash functions in .NET
If you google around you will find many different hashing algorithms, like Jenkins or MurmurHash.
Sean Feldman, who more or less started the Twitter discussion, mentioned a very good library for .NET developers:
for the hashing function, I think I'll lead readers to a better option here https://t.co/LDiJuLD5A5
— Sean Feldman (@sfeldman) December 4, 2016
The author of this awesome package is Brandon Dahler, who created .NET versions of the most well-known algorithms and published them as NuGet packages.
The source and everything can be found on GitHub.
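To give an idea of what such a non-cryptographic hash looks like, here is a minimal sketch of 32-bit FNV-1a, one of the simpler algorithms of this kind. It is written in Python to keep it short; it is not the Data.HashFunction API and not Brandon's implementation. The names `fnv1a_32` and `has_changed` are made up for this example.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data (not suitable for security purposes)."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                            # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # multiply and keep the low 32 bits
    return h


def has_changed(old_hash: int, content: bytes) -> bool:
    """Hypothetical helper: compare a stored hash against the current content."""
    return fnv1a_32(content) != old_hash


if __name__ == "__main__":
    stored = fnv1a_32(b"hello world")
    print(has_changed(stored, b"hello world"))
    print(has_changed(stored, b"hello world!"))
```

Keep in mind that a 32-bit hash only tells you that content is probably unchanged: two different inputs can produce the same value, and with many items such collisions become likely.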
Lessons learned
If you want to hash something and it is not crypto relevant, then it would be better to look at one of the Data.HashFunction packages - some are pretty crazy fast.
I’m not sure which one is ‘better’ - if you have some opinions please let me know. Brandon created a small description of each algorithm on the Data.HashFunction documentation page.
(my blogging backlog is quite long, so I needed 6 months to write this down ;) )