Hashing Explained: What MD5, SHA-1 and SHA-256 Are Actually For

Look at the download page for almost any piece of software and, somewhere near the link, you will often find a short label like "SHA-256" followed by a long string of hexadecimal characters. To most people this is invisible — easy to skip past entirely — but it serves a specific and useful purpose: it is a fingerprint of the file, and comparing it is the most reliable way to confirm that what you downloaded is exactly what was published, byte for byte.

What a hash function actually does

A hash function takes any input — a short word, a paragraph, an entire multi-gigabyte file — and produces a fixed-length string of characters derived from it, called a hash (or digest). The same input always produces exactly the same hash, every time, on any computer. Change even a single character anywhere in the input — flip one letter, add one space — and the resulting hash comes out completely different, with no visible relationship to the original hash. This property, where small changes cause large, unpredictable differences in the output, is sometimes called the avalanche effect.
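Both properties, identical output for identical input and a completely different output after a tiny change, are easy to see for yourself. The following is a minimal sketch using Python's standard hashlib module; the helper name sha256_hex and the sample strings are illustrative choices, not anything prescribed by the hash function itself.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string (encoded as UTF-8) in hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("hello world")
again = sha256_hex("hello world")    # same input as before
changed = sha256_hex("hello world!")  # one extra character

print(first)
print(changed)
print(first == again)    # True: the same input always gives the same hash
print(first == changed)  # False: a one-character change gives a different hash

# Count how many of the hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(first, changed))
print(f"{differing} of {len(first)} hex characters differ")
```

Both digests have the same length regardless of input size, and the two printed strings should look unrelated even though the inputs differ by a single character.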

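Crucially, the process only goes one way: given only a hash, there is no practical way to reconstruct the original input, short of guessing candidate inputs and hashing each one until something matches. A hash tells you nothing directly about what the data was; it only tells you, by comparison, whether two pieces of data are identical.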

The everyday use: checking that a download is intact and unmodified

The most common practical use of hashing is verifying file integrity. A software publisher computes the hash of a file before releasing it and publishes that hash alongside the download. After downloading, you compute the hash of the file you received and compare it to the published value. If they match, the file you have is, bit for bit, identical to the one the hash was computed from: it was not corrupted in transit and, provided the published hash itself came from a source you trust, has not been modified or tampered with along the way. If they differ, even by one character, the file is not the one that was published and should not be trusted.
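The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive tool: the function names file_sha256 and matches_published are hypothetical, the published value is assumed to be a SHA-256 digest written in hexadecimal, and the file is read in chunks so that large downloads do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a file, reading it one chunk at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a file's digest with a published value.

    Surrounding whitespace and letter case are ignored, because published
    hashes are sometimes shown in upper case or copied with a trailing newline.
    """
    return file_sha256(path) == published_hex.strip().lower()
```

To use it, pass the path of the downloaded file and the string copied from the download page; a result of False means the two digests differ and the file should be downloaded again or discarded.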

This is especially useful for large files downloaded over unreliable connections, where partial corruption is a real possibility, and for security-sensitive downloads — operating system images, software installers — where confirming the file has not been tampered with matters.

Why MD5 and SHA-1 are now considered "broken"

MD5 and SHA-1 were, for many years, the standard hash functions for exactly this kind of use. Both have since been shown to be vulnerable to "collisions": cases where two different inputs can be deliberately crafted to produce the same hash. For security purposes, that is a serious problem. Someone who can prepare both files in advance could, in principle, create a harmless-looking file and a malicious one that share a hash, then swap one for the other without the check noticing, which defeats the whole point of the check.

For non-adversarial uses — quickly checking whether a file changed during a routine copy, or generating a short identifier for some data in a program — MD5 and SHA-1 are still perfectly fine and are still widely used for exactly that, because nobody is trying to deliberately fool the check. The "broken" label specifically means they should not be relied on where someone might have a motive to forge a match — verifying the authenticity of a security-sensitive download being the clearest example.
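As an illustration of the non-adversarial case, the sketch below uses MD5 purely as a convenient fingerprint to spot identical contents among a handful of items. The names md5_fingerprint and group_duplicates and the sample data are hypothetical, and the approach assumes nobody is deliberately constructing colliding inputs.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(data: bytes) -> str:
    """Return an MD5 digest used only as a quick identifier, not as a security check."""
    return hashlib.md5(data).hexdigest()


def group_duplicates(items: dict[str, bytes]) -> list[list[str]]:
    """Group item names whose contents have the same fingerprint."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in items.items():
        by_hash[md5_fingerprint(data)].append(name)
    # Keep only the fingerprints shared by more than one item.
    return [names for names in by_hash.values() if len(names) > 1]


samples = {
    "a.txt": b"same contents",
    "b.txt": b"same contents",
    "c.txt": b"other contents",
}
print(group_duplicates(samples))  # [['a.txt', 'b.txt']]
```

Swapping hashlib.md5 for hashlib.sha256 would make the same code suitable for settings where someone might try to forge a match.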

SHA-256 and what it is used for today

SHA-256 (part of the SHA-2 family) is the modern default where collision resistance actually matters: software releases, package managers, digital signatures, and blockchains such as Bitcoin that depend on hashes being effectively impossible to forge. (Git, often mentioned alongside these, was built on SHA-1 and has been adding SHA-256 as an alternative object format.) It produces a longer hash than MD5 or SHA-1 (256 bits, against 128 and 160), and, as far as anyone has publicly demonstrated, has no known practical collision attacks, which is why it is the one most commonly published next to "verify your download" instructions today.
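The difference in hash length is easy to check directly. This short sketch asks Python's hashlib for the digest size of each of the three functions; the variable names are illustrative.

```python
import hashlib

# Digest size in bits for each algorithm (digest_size is reported in bytes).
sizes = {name: hashlib.new(name).digest_size * 8 for name in ("md5", "sha1", "sha256")}

for name, bits in sizes.items():
    # Each hexadecimal character encodes 4 bits.
    print(f"{name}: {bits} bits, {bits // 4} hex characters")

# md5: 128 bits, 32 hex characters
# sha1: 160 bits, 40 hex characters
# sha256: 256 bits, 64 hex characters
```

This is also a quick way to guess which function produced an unlabeled hash: a 64-character hexadecimal string is most likely SHA-256, though length alone does not prove it.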

In practice, the function to use comes down to context: SHA-256 for anything where the hash is a security or authenticity check, and MD5 or SHA-1 where you just need a quick, convenient fingerprint to compare two pieces of data and nobody has a reason to fake the result.
