Hash Tables — O(1) Access That Powers Everything
A scroll-driven visual deep dive into hash tables. From hash functions to collision resolution (chaining vs open addressing) to load factors — understand the data structure behind caches, databases, and deduplication.
🗝️
Any key. Any value.
O(1) average time.
A hash table maps keys to values using a hash function that converts any key into an array index. Insert, lookup, and delete all run in O(1) average time. It’s the most-used data structure in software engineering — powering Python dicts, JavaScript objects, database indexes, caches, and more. The catch? Collisions, and understanding when O(1) becomes O(n).
↓ Scroll to understand the data structure you use 100 times a day
From Key to Index in One Step
What makes a good hash function
A hash function takes any key (string, number, object) and returns a fixed-size integer. The modulo operator then maps it to a valid array index from 0 to table_size - 1.
index = hash(key) % table_size

Deterministic: the same key must ALWAYS produce the same hash value. If hash('alice') returned different values on different calls, lookups would fail silently.
Uniform: hashes should spread evenly across all buckets. If many keys cluster into the same bucket, performance degrades from O(1) to O(n) — defeating the purpose of hashing.
Avalanche: a tiny change in the key should produce a completely different hash. 'alice' and 'alicf' must map to unrelated indices — this prevents clustering and ensures uniform distribution.
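To make these properties concrete, here's a toy djb2-style string hash in Python (a sketch for illustration; production tables use stronger, word-sized hashes):

```python
def djb2(key: str) -> int:
    """Toy djb2-style string hash: deterministic, and each character
    feeds a multiply-and-add step, so small key changes cascade."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # keep it a 32-bit value
    return h

TABLE_SIZE = 8
for key in ("alice", "alicf"):
    print(key, "->", djb2(key) % TABLE_SIZE)  # same key, same index, every run
```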
You have a hash table of size 8. Keys 'cat', 'car', 'cap' all hash to index 3. What is this called, and what does it do to performance?
💡 What happens when multiple keys land in the same array slot?
A collision happens when hash(key1) % size == hash(key2) % size. With 3 keys in one bucket, a lookup must check all 3 entries (compare keys for equality) instead of jumping directly to the value. If ALL n keys collide into one bucket, the hash table becomes a linked list with O(n) operations. This is why hash function quality matters: a bad hash function that clusters keys destroys performance.
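As a sketch of the chaining strategy covered next, here's a minimal table where each bucket is a Python list of (key, value) pairs (the class and method names are illustrative, not how Python's real dict is built):

```python
class ChainedHashTable:
    """Minimal separate-chaining table: colliding keys share a bucket list."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key exists: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # a collision just lengthens this chain

    def get(self, key):
        # Cost is the chain length: O(1) on average, O(n) if everything collides.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```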
Collision Resolution: Chaining vs Open Addressing
Python's dict uses open addressing with a probe sequence. If you insert 1 million keys into a dict with 1.3 million slots, what is the expected number of probes per lookup?
💡 Think about the load factor (number of keys / number of slots)...
The load factor α = n/m determines performance. At α = 0.77, an unsuccessful lookup needs about 1/(1-α) ≈ 4.3 probes even under idealized uniform probing, and linear probing fares worse because of primary clustering. Python's perturbation-based probe sequence scatters probes across the table, achieving ~1.5-2 probes per lookup. This is why Python resizes dicts when α exceeds ~0.67 — keeping probe counts low. The key insight: load factor, not raw table size, determines performance.
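For the curious, here's a simplified sketch of that perturbation scheme, adapted from CPython's dictobject.c (details trimmed): each step folds in more of the hash's high bits, so probes jump around rather than marching linearly.

```python
def probe_sequence(hash_value: int, mask: int):
    """Yield probe indices the way CPython's dict does (simplified).
    mask is table_size - 1 for a power-of-two table."""
    perturb = hash_value
    i = hash_value & mask
    while True:
        yield i
        perturb >>= 5                      # PERTURB_SHIFT in CPython
        i = (i * 5 + perturb + 1) & mask   # high hash bits steer the jumps
```

The first probe is just hash & mask; later probes fire only on collision, which is why a low load factor keeps typical lookups near 1 probe.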
Load Factor: The Performance Knob
Load factor and resizing
The load factor measures how full the table is — the ratio of occupied slots to total slots. It's the single most important metric for hash table performance.
α = n / m (keys / table size)

Low α: most lookups find an empty slot immediately with just 1 probe. Fast access, but over half the table sits empty — wasting memory.
Moderate α: the ideal balance of speed and memory — 2-3 probes on average. Java's HashMap resizes at α = 0.75, Python's dict at α ≈ 0.67.
High α: probe counts skyrocket as almost every slot is occupied. At α = 0.95, even idealized uniform probing needs ~1/(1-α) = 20 probes per unsuccessful lookup, and linear probing needs far more — performance collapses.
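A back-of-envelope check of those numbers, using the classic uniform-probing estimate for an unsuccessful lookup (linear probing grows faster than this):

```python
# E[probes] ≈ 1 / (1 - α) under idealized uniform probing (Knuth's estimate)
for alpha in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"α = {alpha:.2f} -> ~{1 / (1 - alpha):4.1f} probes")
# α = 0.50 yields ~2 probes; α = 0.95 yields ~20: full tables collapse
```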
When α exceeds the threshold, allocate a table twice the size and rehash ALL keys. Each resize is O(n), but amortized over n inserts it's O(1) per insert.
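Extending the chaining sketch from earlier, a grow-and-rehash step might look like this (the threshold and helper name are illustrative, not any library's API):

```python
LOAD_FACTOR_LIMIT = 0.75  # illustrative; Java uses 0.75, CPython ~0.67

def put_and_maybe_resize(table, key, value):
    """Insert into a ChainedHashTable, doubling the table when α gets high."""
    table.put(key, value)
    n = sum(len(bucket) for bucket in table.buckets)
    if n / len(table.buckets) > LOAD_FACTOR_LIMIT:
        old = table.buckets
        table.buckets = [[] for _ in range(2 * len(old))]  # double the slots
        for bucket in old:
            for k, v in bucket:  # rehash EVERY key: O(n) now, amortized O(1)
                table.buckets[hash(k) % len(table.buckets)].append((k, v))
```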
Hash Tables Are Everywhere
Git identifies every commit, tree, and blob using a SHA-1 hash. Why does Git use hashing instead of sequential IDs (like commit #1, #2, #3)?
💡 Think about what makes Git work without a central server...
Git is a content-addressable storage system: hash(content) → address. This gives three superpowers: (1) Deduplication — identical files share a hash and are stored once, (2) Integrity — if a single bit flips, the hash changes, detecting corruption, (3) Distributed — any developer can generate unique identifiers without a central server. Sequential IDs would require coordination between clones, which contradicts Git's distributed model.
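You can reproduce a blob's address with nothing but Python's hashlib: Git hashes a short header plus the file's raw bytes. This sketch covers only the blob case (trees and commits add more structure):

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """SHA-1 address Git assigns to file contents: sha1(b'blob <len>\\0' + bytes)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Identical content always hashes to the identical address, in every clone:
print(git_blob_hash(b"hello world\n"))  # matches `git hash-object` on the same bytes
```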
🎓 What You Now Know
✓ Hash tables map keys to values in O(1) average time — using hash(key) % table_size to compute an array index.
✓ Collisions are inevitable — chaining uses linked lists per bucket; open addressing probes for empty slots. Both degrade to O(n) under pathological hashing.
✓ Load factor α determines performance — resize at α ≈ 0.75 to keep probe counts low. Resizing is O(n) but amortized O(1).
✓ Hash tables power databases, caches, dedup, and security — from Redis to Git to SSL certificates, hashing is the universal data access pattern.
When you need ordered keys, range queries, or worst-case O(log n), reach for a balanced BST or B-tree instead. ⚡
↗ Keep Learning
String Manipulation — Patterns, Hashing, and Sliding Windows
A scroll-driven visual deep dive into string algorithms. From brute-force pattern matching to KMP, Rabin-Karp hashing, and sliding window techniques — with real-world applications in search engines, DNA analysis, and parsers.
Tries — Prefix Trees for Autocomplete, Spell Check, and IP Routing
A scroll-driven visual deep dive into trie data structures. From basic prefix trees to compressed tries and radix trees — with real-world applications in autocomplete, spell checking, IP routing tables, and T9 keyboard input.
Caching — The Art of Remembering What's Expensive to Compute
A visual deep dive into caching. From CPU caches to CDNs — understand cache strategies, eviction policies, and the hardest problem in computer science: cache invalidation.
Linked Lists — Pointers, Flexibility, and the Tradeoffs vs Arrays
A scroll-driven visual deep dive into linked lists. Singly, doubly, and circular variants. Understand pointer manipulation, O(1) insertion/deletion, and real-world uses in LRU caches, OS schedulers, and undo systems.