RainbowConsult.DK © 2020






Hashes, checksums and MACs explained 





Let's start with the basics.


In cryptography, a hashing algorithm condenses an input of arbitrary length into a short, fixed-length output (the digest). Hashes are used to confirm the integrity of messages and files.
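As a small illustration of "many bits in, fewer bits out", the sketch below feeds inputs of very different sizes to a few algorithms from Python's standard library and reports the output size in bits. The helper name `summarize` is made up for this example, and `hashlib.md5` may be unavailable on systems configured for FIPS mode.

```python
import hashlib
import zlib


def summarize(data: bytes) -> dict[str, str]:
    """Map algorithm name -> hex digest of `data` (hypothetical helper)."""
    return {
        "crc32": f"{zlib.crc32(data):08x}",  # zlib returns an unsigned 32-bit int
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


for size in (1, 1_000, 1_000_000):
    digests = summarize(b"a" * size)
    # Each hex character encodes 4 bits of output.
    print(size, {name: len(h) * 4 for name, h in digests.items()})
```

Whatever the input size, the output width stays at 32, 128, 160 and 256 bits respectively.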


All hashing algorithms generate collisions.

A collision occurs when two or more different inputs produce the same output. The cryptographic strength of a hashing algorithm is defined by how hard it is for someone to find an input that produces a given output, because if they could, they could construct a file whose hash matches that of a legitimate file and compromise the assumed integrity of the system. One difference between CRC32 and MD5 is that MD5 generates a larger hash that is harder to predict.


When you want message integrity, meaning assurance that the message hasn't been tampered with in transit, the inability to predict collisions is an important property. A 32-bit hash has 2^32 (about 4.3 billion) possible values, so it can give unique hashes to at most about 4.3 billion different messages or files. With one file more than that, you are guaranteed at least one collision. The space of all possible 1 TB files is vastly larger than 2^32, so billions of collisions exist in it. If I'm an attacker and I can predict what that 32-bit hash is going to be, I can construct an infected file that collides with the target file, i.e. one that has the same hash.
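To get a feel for how small a 32-bit hash space is, here is a minimal sketch that searches for two different random 8-byte messages with the same CRC-32. It only demonstrates accidental collisions, not a targeted attack. The function name, the seed and the `max_tries` cap are assumptions of this example. By the birthday bound, a collision is expected after roughly the square root of 2^32 attempts, i.e. on the order of 100,000 tries rather than billions.

```python
import random
import zlib


def find_crc32_collision(seed: int = 0, max_tries: int = 2_000_000):
    """Return (a, b, tries): two different 8-byte messages with equal CRC-32."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    seen = {}  # CRC-32 value -> first message that produced it
    for tries in range(1, max_tries + 1):
        msg = rng.getrandbits(64).to_bytes(8, "big")
        crc = zlib.crc32(msg)
        other = seen.get(crc)
        if other is not None and other != msg:
            return other, msg, tries
        seen[crc] = msg
    raise RuntimeError("no collision found; raise max_tries")


a, b, tries = find_crc32_collision()
print(a.hex(), b.hex(), f"crc32={zlib.crc32(a):08x}", f"tries={tries}")
```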


Additionally, if I'm transmitting at 10 Mbps, the chance of a packet getting corrupted in just the right way to slip past CRC32, continue along to the destination and execute is very low. Let's say at 10 Mbps I get 10 errors per second. If I ramp that up to 1 Gbps, I'm now getting 1,000 errors per second. If I ramp up to 1 exabit per second, I have an error rate of 1,000,000,000,000 errors per second. Say we have a collision rate of 1 in 1,000,000 transmission errors, meaning 1 in a million transmission errors results in corrupt data getting through undetected. At 10 Mbps, corrupt data would get through every 100,000 seconds, or about once a day. At 1 Gbps it would happen about once every 17 minutes. At 1 exabit per second, we're talking about a million times a second.
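The arithmetic can be checked with a few lines of Python. This is only a sketch of the thought experiment: it assumes 10 errors per second at 10 Mbps, an error rate that scales linearly with link speed, and 1 undetected error per million. The constant and function names are made up for this example.

```python
ERRORS_PER_SEC_AT_10_MBPS = 10        # assumption from the thought experiment
UNDETECTED_FRACTION = 1 / 1_000_000   # assumed share of errors the check misses
BASE_RATE_BPS = 10e6                  # 10 Mbps


def seconds_between_undetected(bits_per_second: float) -> float:
    """Average seconds between corrupt packets that slip through undetected."""
    errors_per_sec = ERRORS_PER_SEC_AT_10_MBPS * bits_per_second / BASE_RATE_BPS
    return 1 / (errors_per_sec * UNDETECTED_FRACTION)


for label, rate in (("10 Mbps", 10e6), ("1 Gbps", 1e9), ("1 Ebps", 1e18)):
    print(f"{label}: one undetected error every {seconds_between_undetected(rate):g} s")
```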


If you pop open Wireshark, you'll see that a typical Ethernet frame carries a CRC32 frame check sequence, the IPv4 header has a 16-bit checksum, and the TCP header has a 16-bit checksum, and that's in addition to what the higher-layer protocols may do; e.g. IPsec might use MD5 or SHA for integrity checking on top of the above. There are several layers of error checking in typical network communications, and they STILL goof now and again at sub-10 Mbps speeds.


Cyclic Redundancy Check (CRC) comes in several common versions and several uncommon ones, but in general it is designed only to tell when a message or file has been damaged in transit (for example, multiple bits flipping). CRC32 by itself is not a very good error-checking code by today's standards in large-scale enterprise environments because of the collision rate: the average user's hard drive can hold upwards of 100k files, and a company's file shares can hold tens of millions. The ratio of hash space to the number of files is just too low. CRC32 is computationally cheap to implement, whereas MD5 costs more.
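A rough sketch of the "damaged in transit" case, using `zlib.crc32` from Python's standard library: the sender computes a CRC-32, a bit gets flipped on the way, and the receiver's recomputed CRC no longer matches. The helper `flip_bit` and the sample message are assumptions of this example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted (simulated line noise)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


message = b"123456789"
sent_crc = zlib.crc32(message)        # value the sender attaches to the message

received = flip_bit(message, 10)      # one bit damaged in transit
print(f"sent     crc32={sent_crc:08x}")
print(f"received crc32={zlib.crc32(received):08x}")
print("damage detected:", zlib.crc32(received) != sent_crc)
```

A single flipped bit is always caught; the misses discussed above come from larger error patterns that happen to land on the same 32-bit value.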


MD5 was designed to stop the intentional use of collisions to make a malicious file look benign. It's now considered insecure because researchers have found practical ways to construct colliding inputs, so some collisions are predictable. SHA-1 and SHA-2 are the newer kids on the block.


For file verification, MD5 is starting to be used by a lot of vendors because you can hash multi-gigabyte or multi-terabyte files quickly with it and stack that on top of the OS's general use and support of CRC32. Do not be surprised if within the next decade filesystems start using MD5 for error checking.
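File verification of this kind might look like the sketch below, which hashes a stream in fixed-size chunks so that very large files never have to fit in memory. The function names, the 1 MiB chunk size and the `algorithm` parameter are assumptions of this example; swapping `"md5"` for `"sha256"` needs no other change.

```python
import hashlib
import io


def hash_stream(stream, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Hex digest of a binary stream, read chunk by chunk."""
    h = hashlib.new(algorithm)
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()


def matches_published(stream, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare a stream's digest with the value a vendor published."""
    return hash_stream(stream, algorithm) == published_hex.strip().lower()


# In-memory stand-in for a download; a real file would be open(path, "rb").
download = io.BytesIO(b"abc")
print(matches_published(download, "900150983cd24fb0d6963f7d28e17f72"))
```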




CRCs versus MD5 and SHA-1


While properly designed CRCs are good at detecting random errors in the data (due to e.g. line noise), a CRC is useless as a secure indicator of intentional manipulation of the data. This is because it's not hard at all to modify the data to produce any CRC you desire (e.g. the same CRC as the original data, to try to disguise your data manipulation).
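One way to see why a CRC is so easy to manipulate is its algebraic structure. The sketch below shows that for three equal-length messages, the CRC-32 of their XOR equals the XOR of their CRC-32s, so the effect of any change on the checksum can be calculated rather than guessed. This is not a complete forgery tool, and the sample messages and the helper `xor_bytes` are made up for this example.

```python
import zlib


def xor_bytes(*parts: bytes) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    assert len({len(p) for p in parts}) == 1, "inputs must have equal length"
    out = bytearray(len(parts[0]))
    for part in parts:
        for i, byte in enumerate(part):
            out[i] ^= byte
    return bytes(out)


a = b"pay alice  $100 "
b = b"pay mallory $999"
c = b"0123456789abcdef"

# CRC-32 is affine over XOR: the checksum of the combined message can be
# predicted from the individual checksums alone.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
print(f"predicted={predicted:08x} actual={actual:08x} equal={predicted == actual}")
```

A cryptographic hash such as SHA-256 has no comparable shortcut.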


Therefore, even a 2048-bit CRC would be cryptographically much less secure than a 128-bit MD5.


There is a reason cryptographically strong hashes such as MD5 or SHA require much more computation than a simple CRC.


SHA-1: A 160-bit hash function which resembles the earlier MD5 algorithm. This was designed by the National Security Agency (NSA) to be part of the Digital Signature Algorithm. Cryptographic weaknesses were discovered in SHA-1, and the standard was no longer approved for most cryptographic uses after 2010.


SHA-2: A family of two similar hash functions, with different block sizes, known as SHA-256 and SHA-512. They differ in the word size; SHA-256 uses 32-bit words where SHA-512 uses 64-bit words. There are also truncated versions of each standard, known as SHA-224, SHA-384, SHA-512/224 and SHA-512/256. These were also designed by the NSA.
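The sizes mentioned above can be read straight from Python's `hashlib`, as in this short sketch. The function name `sha2_parameters` is made up for this example, and the SHA-512/224 and SHA-512/256 variants are left out because their availability depends on the underlying OpenSSL build.

```python
import hashlib


def sha2_parameters() -> dict[str, tuple[int, int]]:
    """Map SHA-2 variant -> (digest size in bits, block size in bits)."""
    params = {}
    for name in ("sha224", "sha256", "sha384", "sha512"):
        h = hashlib.new(name)
        params[name] = (h.digest_size * 8, h.block_size * 8)
    return params


for name, (digest_bits, block_bits) in sha2_parameters().items():
    print(f"{name}: digest={digest_bits} bits, block={block_bits} bits")
```

SHA-224 shares the 512-bit block of SHA-256, and SHA-384 shares the 1024-bit block of SHA-512, which reflects that they are truncated versions of those two functions.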