
Database Sharding — Scaling Beyond One Machine

A visual deep dive into database sharding. From single-server bottlenecks to consistent hashing — understand how companies scale their databases to billions of rows.

Introduction

🪓

One database can’t hold the entire internet.

Instagram has 2+ billion users. Twitter handles 500 million tweets per day.
No single database server can hold all that data or handle all those queries.
Sharding splits your data across multiple machines — and it’s how every large-scale system works.


The Problem

The Single-Server Bottleneck

Every database starts on one server. As your app grows, you hit limits:

10,000 req/sec against a single DB 🔥 — CPU: 98%, disk: full, latency: 2000ms 📈

Bottlenecks:
• Storage: 16TB disk limit reached
• Throughput: one CPU can’t serve 10K QPS
• Availability: one server = single point of failure
One server, too many users — something has to give
Scale Up vs Out

Two Ways to Scale

Vertical Scaling: “Buy a bigger machine” (8GB → 256GB)
✓ Simple — no code changes
✗ Expensive — costs grow exponentially
✗ Ceiling — hardware has limits
✗ Still a single point of failure

Horizontal Scaling: “Add more machines” (Shard 1, Shard 2, Shard 3, Shard 4)
✓ Scales linearly — add machines
✓ No single point of failure
✓ Cost-effective commodity hardware
✗ Complex — cross-shard queries
✗ Shard key choice is critical
Vertical scaling has a ceiling. Horizontal scaling (sharding) is theoretically unlimited.
🟢 Quick Check

Your startup has 100K users and your PostgreSQL server is at 40% CPU. Should you shard your database?

Shard Keys

Choosing the Right Shard Key

The shard key determines which machine stores each row. It’s the most important decision in sharding — and it’s very hard to change later.

❌ Bad: Shard by Country
• US: 60% of data 🔥
• Japan: 5%
• Brazil: 3%
☠️ The US shard is overwhelmed while the others idle.

✅ Good: Hash(user_id)
• Shard 1: 25%
• Shard 2: 25%
• Shard 3: 25%
✓ Even distribution across all shards.

Good Shard Key Properties:
1. High cardinality — many unique values (user_id ✓, country ✗)
2. Even distribution — no “celebrity” keys that get most traffic
3. Query-aligned — most queries include the shard key (avoids scatter-gather)
4. Immutable — the shard key should never change after insertion
A good shard key distributes data evenly. A bad one creates 'hot spots'.
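To make the routing concrete, here’s a minimal sketch in Python. The shard count and the md5 choice are illustrative assumptions, not prescriptions; the point is that any query carrying user_id maps to exactly one shard:

```python
import hashlib

NUM_SHARDS = 3  # illustrative cluster size

def shard_for(user_id: str) -> int:
    """Route a row to a shard by hashing its shard key."""
    # Use a stable hash (md5) rather than Python's built-in hash(),
    # which is randomized per process and would break routing.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every query that includes the shard key hits exactly one machine:
print(shard_for("user_42"))    # always the same shard for this user
print(shard_for("user_1337"))  # may be a different shard
```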
🟡 Checkpoint

You're building a social media app. Most queries are 'get posts by user X'. Which shard key is best?

Strategies

Sharding Strategies: Range vs Hash

Range-Based Sharding:
• Shard 1: A–F (Adams .. Fox)
• Shard 2: G–N (Garcia .. Ng)
• Shard 3: O–Z (Obama .. Zhu)
✓ Range queries (“get all users A–C”) hit one shard
✗ Hot spots: names starting with S are 4x more common

Hash-Based Sharding:
• Shard 1: hash % 3 = 0 (Fox, Adams)
• Shard 2: hash % 3 = 1 (Zhu, Smith)
• Shard 3: hash % 3 = 2 (Garcia, Ng)
✓ Even distribution — the hash function randomizes placement
✗ No range queries — “users A–C” hits ALL shards

Hash Sharding Formula: shard = hash(key) % num_shards

⚠️ The problem with hash % N: when you add a shard (N changes), MOST keys move to different shards! hash(key) % 3 ≠ hash(key) % 4 → massive data migration. Solution: consistent hashing.
Two ways to assign keys to shards — each with different trade-offs
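You can check the hash % N warning yourself. This quick sketch (key names and count are arbitrary) compares placements with 3 shards versus 4:

```python
import hashlib

def shard(key: str, n: int) -> int:
    # Stable hash so placement is reproducible across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user_{i}" for i in range(100_000)]

# Count keys whose shard changes when the cluster grows from 3 to 4.
moved = sum(shard(k, 3) != shard(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # ≈75%
```

A key stays put only when its hash has the same residue mod 3 and mod 4 (hash % 12 in {0, 1, 2}), so roughly 3 in 4 keys migrate.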
🟡 Checkpoint

You shard using hash(user_id) % 4 across 4 shards. Now you want to add a 5th shard. What happens?

Rebalancing

Consistent Hashing: The Solution to Rebalancing

Regular hash % N is catastrophic when N changes. Consistent hashing solves this by placing keys and servers on a virtual ring.

Servers S1, S2, S3 sit on the ring, and keys map clockwise: k1 → S2, k2 → S3, k3 → S1, k4 → S3.
Rule: each key goes to the NEXT server clockwise on the ring.
Adding S4 only moves the keys between old S3 and new S4 — everyone else is unaffected!
Consistent hashing: only K/N keys move when adding a server (K=total keys, N=servers)
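Here’s one way to build that ring: a minimal sketch, not a production implementation. The virtual-node count (vnodes, assumed at 100 per server) smooths out the distribution; server names are hypothetical:

```python
import bisect
import hashlib

def ring_pos(value: str) -> int:
    """Stable position on a ring of size 2^32."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted (position, server) pairs
        for server in servers:
            self.add(server, vnodes)

    def add(self, server, vnodes=100):
        # Each server owns many virtual points, smoothing the distribution.
        for i in range(vnodes):
            bisect.insort(self._ring, (ring_pos(f"{server}#{i}"), server))

    def lookup(self, key):
        # A key belongs to the next server point clockwise on the ring.
        positions = [p for p, _ in self._ring]
        i = bisect.bisect_right(positions, ring_pos(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["S1", "S2", "S3"])
keys = [f"user_{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("S4")
moved = sum(before[k] != ring.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # ≈25%, i.e. K/N with N=4
```

Going from 3 to 4 servers moves roughly a quarter of the keys, matching the K/N claim above.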
Challenges

The Hard Parts of Sharding

Sharding solves the scale problem but creates new ones:

1. Cross-Shard Joins: SELECT * FROM users JOIN orders. If users & orders are on different shards, the DB can’t do a local join. Solution: co-locate related data on the same shard (sketched below).
2. Hot Partitions: celebrity posts. One user generates 10% of all traffic, so their shard becomes the bottleneck. Solution: shard-level caching + sub-sharding.
3. Distributed Transactions: transfer $100 by debiting shard A and crediting shard B. What if B fails after A? Money vanishes into the void. Solution: 2-Phase Commit or the Saga pattern.
4. Resharding: going from 4 → 8 shards means moving billions of rows while the system stays online. Solution: consistent hashing + dual writes.
Four major challenges that every sharded system must solve
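For challenge #1, the usual co-location trick is to shard the child table by its parent’s key. A small sketch with hypothetical table and helper names; note that an order’s placement deliberately ignores its own order_id:

```python
import hashlib

NUM_SHARDS = 4  # illustrative

def shard_of(user_id: str) -> int:
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

def shard_for_order(order_id: str, user_id: str) -> int:
    # Place the order on its OWNER's shard, keyed by user_id,
    # so "user + their orders" joins never leave one shard.
    return shard_of(user_id)

# The users row and all orders for user_42 land on the same shard:
assert shard_of("user_42") == shard_for_order("order_9001", "user_42")
```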
🟡 Checkpoint

Why are JOINs problematic in a sharded database?

🔴 Challenge

With consistent hashing and 100 servers, you add 1 new server. Approximately how much data needs to move?

🎓 What You Now Know

Scale vertically first, shard only when necessary — Sharding adds massive complexity. Exhaust simpler options first.

Shard key choice is the most critical decision — High cardinality, even distribution, query-aligned, immutable.

Hash sharding beats range sharding for distribution — But you lose range queries. Pick based on your access patterns.

Consistent hashing enables painless scaling — Adding a server moves only ~1/N of keys, not everything.

Cross-shard operations are the enemy — Design your shard key to keep related data together.

Sharding is inevitable at scale. Every system that handles billions of records — Instagram, Discord, Uber — uses it. The concepts here are the foundation of every system design interview and every real-world scalable architecture. 🚀

📄 Consistent Hashing and Random Trees (Karger et al., 1997)


📄 Dynamo: Amazon’s Highly Available Key-value Store (DeCandia et al., 2007)
