How Hashing Works: MD5, SHA-256, and When to Use Which

A hash function takes data of any size and produces a fixed-length fingerprint called a digest. Hash functions are used everywhere — file integrity checks, password storage, digital signatures, version control — yet the choice of algorithm matters enormously. Using the wrong one can leave your system critically exposed.

What makes a good hash function

A cryptographic hash function must satisfy four properties:

  1. Deterministic — the same input always produces the same output.
  2. Fast to compute — hashing should be efficient for any input size.
  3. Pre-image resistant — given a hash, it should be computationally infeasible to find the original input.
  4. Collision resistant — it should be infeasible to find two different inputs that produce the same hash.

The avalanche effect is a key observable property: changing even one character in the input flips, on average, about half of the output bits. The strings "hello" and "Hello" produce SHA-256 digests that share no obvious similarity, which is what makes hashes reliable for detecting any modification to data.
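The avalanche effect is easy to observe directly. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex and bit_difference are made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = sha256_hex("hello")
upper = sha256_hex("Hello")

print(lower)
print(upper)
print(f"{bit_difference(lower, upper)} of 256 bits differ")

# Deterministic: hashing the same input again gives the same digest.
assert sha256_hex("hello") == lower
```

For a well-behaved hash function the reported count should land somewhere near 128, that is, roughly half of the 256 output bits.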

MD5: fast, widely used, and broken

MD5 was designed by Ron Rivest in 1991 and produces a 128-bit (32 hex character) digest. It became the standard for file checksums and is still widely encountered in legacy systems.

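The critical problem: MD5 is cryptographically broken. Researchers demonstrated practical collision attacks (two different files producing the same MD5 hash) as early as 2004. In 2008, a research team used an MD5 chosen-prefix collision to create a rogue certificate authority certificate that browsers would trust, and in 2012 the Flame malware exploited an MD5 collision to forge a Microsoft code-signing certificate, demonstrating real-world impact.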

MD5 is still acceptable for:

  • Non-security checksums (detecting accidental corruption of a downloaded file, when the reference checksum comes from a trusted source and deliberate tampering is not part of the threat model)
  • Generating cache keys or hash maps where collision resistance is not a security requirement
  • Legacy system compatibility where you cannot choose the algorithm
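As a sketch of the cache-key case, the snippet below derives a key with MD5 from Python's standard hashlib module. The function name cache_key and the example URL are hypothetical, and the usedforsecurity flag assumes Python 3.9 or later.

```python
import hashlib


def cache_key(resource: str) -> str:
    """Derive a short, stable cache key from a resource identifier.

    usedforsecurity=False (Python 3.9+) signals that this MD5 call is
    not protecting anything, and lets it run on FIPS-restricted builds.
    """
    digest = hashlib.md5(resource.encode("utf-8"), usedforsecurity=False)
    return digest.hexdigest()


# Prints a 32-character hex string that is stable across runs.
print(cache_key("https://example.com/assets/logo.png"))
```

This is only reasonable when nobody gains anything by making two inputs share a key. If the inputs are attacker-controlled and a collision would matter, a stronger hash is the safer choice.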

MD5 must never be used for:

  • Password hashing
  • Digital signatures
  • Any context where an attacker could benefit from a collision

SHA-1: deprecated but lingering

SHA-1, standardised by NIST in 1995, produces a 160-bit (40 hex character) digest. It displaced MD5 in most security protocols and dominated web security for over a decade.

SHA-1 was officially deprecated by NIST in 2011, following theoretical attacks published from 2005 onward. In 2017, researchers at CWI Amsterdam and Google published SHAttered, the first practical SHA-1 collision: two different PDF files with identical SHA-1 hashes. Major browsers stopped accepting SHA-1 TLS certificates in 2017.

SHA-1 still appears in Git (which uses it by default with a collision-detecting implementation, and has offered SHA-256 repositories as an option since version 2.29), legacy certificate chains, and some embedded systems. Like MD5, it should not be used for any new security application.

SHA-256: the current standard

SHA-256 is part of the SHA-2 family, standardised by NIST in 2001. It produces a 256-bit (64 hex character) digest and remains unbroken as of 2026.

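The security margin is substantial. A brute-force pre-image attack against SHA-256 would require on the order of 2^256 (about 10^77) hash evaluations, and even a birthday-style collision search needs about 2^128 (about 3.4 × 10^38). Both figures are far beyond any foreseeable computing capacity. No practical collision has ever been demonstrated.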
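To put the brute-force cost in perspective: SHA-256 has a 256-bit output, so a pre-image search needs on the order of 2^256 evaluations and a birthday collision search about 2^128. The back-of-the-envelope calculation below converts those counts into years. The attacker speed of 10^18 hashes per second is an assumption chosen purely for illustration, not a measured figure.

```python
# Hypothetical attacker capacity: 10**18 SHA-256 evaluations per second.
# This figure is an assumption for illustration only.
HASHES_PER_SECOND = 10**18
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

preimage_work = 2**256   # order of work to brute-force a pre-image
collision_work = 2**128  # birthday bound for finding any collision


def years_needed(operations: int) -> float:
    """Convert a number of hash evaluations into years at the assumed rate."""
    return operations / HASHES_PER_SECOND / SECONDS_PER_YEAR


print(f"pre-image: {float(preimage_work):.2e} hashes, "
      f"{years_needed(preimage_work):.2e} years")
print(f"collision: {float(collision_work):.2e} hashes, "
      f"{years_needed(collision_work):.2e} years")
```

Even the cheaper collision search comes out on the order of 10^13 years under this assumption, which is why attacks on SHA-256 would have to come from a structural weakness rather than raw computing power.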

SHA-256 is used in:

  • TLS certificates and HTTPS connections
  • Bitcoin proof-of-work mining
  • Code signing and software verification
  • JWT (JSON Web Tokens) signature verification
  • Git object storage (as an optional alternative to the default SHA-1)
  • Most modern file integrity systems
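File integrity checking, the last item in the list above, can be sketched with the standard library alone. The function names sha256_file and verify_file are hypothetical; the expected digest is assumed to come from a source you already trust, such as the publisher's signed release notes.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case."""
    actual = sha256_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A call such as verify_file(Path("installer.bin"), published_digest) returns True only when every byte of the file matches what the publisher hashed; both names in that call are placeholders.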

SHA-512 offers a larger digest (512 bits) and can be faster than SHA-256 on 64-bit platforms, but SHA-256 is sufficient for virtually all applications.
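The digest sizes quoted for each algorithm can be checked with Python's hashlib. This is a small sketch; it assumes the local Python build still exposes MD5 and SHA-1, which some FIPS-restricted builds do not.

```python
import hashlib

# Digest size in bits for each algorithm discussed in this article.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512")
}

for name, bits in sizes.items():
    # Each hex character encodes 4 bits.
    print(f"{name:>6}: {bits:>3} bits, {bits // 4:>3} hex characters")
```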

Hashing passwords: why MD5 and SHA-256 are both wrong

This is the most critical practical distinction. General-purpose hash functions — including SHA-256 — should never be used for password hashing.

The problem is speed. SHA-256 is designed to be fast, which is fine for file verification but catastrophic for passwords. A modern GPU can compute billions of SHA-256 hashes per second, making brute-force attacks against a leaked database practical.
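The speed problem shows up even without a GPU. The sketch below counts how many SHA-256 guesses a single CPU core can make in a short time window. The function name sha256_rate and the 0.2 second window are illustrative assumptions, and the result will vary by machine.

```python
import hashlib
import time


def sha256_rate(duration: float = 0.2) -> float:
    """Estimate single-threaded SHA-256 hashes per second on this CPU."""
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        # Each iteration stands in for one password guess.
        hashlib.sha256(b"candidate-%d" % count).digest()
        count += 1
    return count / duration


print(f"about {sha256_rate():,.0f} guesses per second on one CPU core")
```

Interpreted Python on one core is a very pessimistic lower bound for an attacker; dedicated cracking hardware is orders of magnitude faster.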

Password hashing requires algorithms specifically designed to be slow, with a tunable cost, and ideally memory-intensive as well:

Algorithm      Recommended for            Tunable cost
bcrypt         Passwords                  Yes (cost factor)
Argon2id       Passwords, secrets         Yes (time, memory, parallelism)
scrypt         Passwords, key derivation  Yes (N, r, p)
PBKDF2-SHA256  Keys, passwords            Yes (iteration count)

Argon2id is the current recommendation from OWASP and most security bodies for new systems. Argon2 won the Password Hashing Competition in 2015, and its memory-hard design makes large-scale GPU and ASIC cracking far more expensive.

The rule is simple: if you are storing user passwords, use a dedicated password hashing library — never a general hash function.
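As a minimal sketch of what a dedicated password hash looks like in practice, the example below uses scrypt from Python's standard hashlib module so that it runs without extra packages. Argon2id would need a third-party binding, so it is not shown here. The function names and the cost parameters (N, R, P) are illustrative assumptions, not tuned recommendations, and hashlib.scrypt requires a Python build linked against OpenSSL 1.1 or later.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (assumptions; tune for your hardware).
# Memory use is roughly 128 * N * R bytes, about 16 MiB here.
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # upper bound on memory scrypt may use


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the salt is random per user."""
    salt = secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(key, expected_key)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The per-user random salt means two users with the same password get different stored values, and the cost parameters can be raised over time as hardware improves. A maintained password hashing library wraps these details, including parameter storage and upgrades, which is why it remains the better choice for production code.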