Database Sharding — Scaling Beyond One Machine
A visual deep dive into database sharding. From single-server bottlenecks to consistent hashing — understand how companies scale their databases to billions of rows.
🪓
One database can’t hold
the entire internet.
Instagram has 2+ billion users. Twitter handles 500 million tweets per day.
No single database server can hold all that data or handle all those queries.
Sharding splits your data across multiple machines — and it’s how every large-scale system works.
The Single-Server Bottleneck
Every database starts on one server. As your app grows, you hit hard limits: CPU saturates under query load, RAM stops fitting the working set, disk I/O maxes out, and connection counts outgrow what one box can handle.
Two Ways to Scale
Scale up (vertical) means a bigger machine: simple, but it hits a hardware ceiling. Scale out (horizontal) means more machines: no ceiling, but your data must be split across them. Sharding is horizontal scaling for your database.
Your startup has 100K users and your PostgreSQL server is at 40% CPU. Should you shard your database?
💡 Consider the trade-off: sharding adds complexity. Is the benefit worth it at this scale?
At 40% CPU with 100K users, you have plenty of headroom. Add indexes, optimize queries, add read replicas, upgrade the server. Sharding introduces enormous complexity (cross-shard joins, data migration, operational overhead) and should only be done when you've exhausted simpler options — typically at millions of users or terabytes of data.
Choosing the Right Shard Key
The shard key determines which machine stores each row. It’s the most important decision in sharding — and it’s very hard to change later.
You're building a social media app. Most queries are 'get posts by user X'. Which shard key is best?
💡 Choose the key that matches your most common query pattern...
If most queries fetch posts by a specific user, sharding by user_id ensures all of that user's posts are on the same shard — no cross-shard queries needed. Timestamp would cause 'hot shard' problems (recent timestamps all go to one shard). Random UUIDs distribute evenly but require scatter-gather for user queries.
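The routing logic described above can be sketched in a few lines of Python. The shard count, function name, and choice of md5 are all illustrative, not prescriptive:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(user_id: str) -> int:
    """Map a user_id to a shard. Hashing the user_id means all of
    that user's posts land on the same shard."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

# Every read and write for this user hits exactly one shard:
print(f"user_42 -> shard {shard_for('user_42')}")
```

Because the function is deterministic, `'get posts by user X'` never needs to fan out across shards.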
Sharding Strategies: Range vs Hash
Range sharding gives each shard a contiguous slice of the key space (users A-M on shard 1, N-Z on shard 2): range scans stay cheap, but popular keys pile onto one shard. Hash sharding runs each key through a hash function first: load spreads evenly, but range queries now have to hit every shard.
You shard using hash(user_id) % 4 across 4 shards. Now you want to add a 5th shard. What happens?
💡 Try it: hash=7. 7%4=3, 7%5=2. Different shard! Now imagine this for billions of keys...
When N changes from 4 to 5, hash(key) % 4 vs hash(key) % 5 produces different results for ~80% of keys. This means massive data migration across all shards. This is why simple modulo sharding is dangerous — use consistent hashing instead, which minimizes key movement when adding/removing shards.
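You can verify that ~80% figure directly. A minimal sketch, using each key's integer value as its own hash for simplicity:

```python
def moved_fraction(old_n: int, new_n: int, keys: int = 100_000) -> float:
    """Fraction of keys whose shard assignment changes when the
    shard count goes from old_n to new_n under modulo sharding."""
    moved = sum(1 for k in range(keys) if k % old_n != k % new_n)
    return moved / keys

print(f"{moved_fraction(4, 5):.0%} of keys change shard")  # 80% here
```

A key stays put only when its value gives the same remainder mod 4 and mod 5, which happens for just 4 of every 20 keys, so 80% must migrate.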
Consistent Hashing: The Solution to Rebalancing
Regular hash % N is catastrophic when N changes. Consistent hashing solves this by placing both keys and servers on a virtual ring: each key lives on the first server found moving clockwise from its position, so adding or removing a server only remaps the keys in one arc.
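Here is a minimal consistent-hash ring in Python, without the virtual nodes a production system would add. The server names, key format, and md5 choice are illustrative:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring (no virtual nodes, for clarity)."""

    def __init__(self, servers):
        # Each server occupies a point on the ring at its hash value.
        self._points = sorted((_hash(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        # A key belongs to the first server clockwise from its position;
        # the modulo wraps around past the top of the ring.
        idx = bisect.bisect(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]

servers = [f"server-{i}" for i in range(100)]
keys = [f"user-{i}" for i in range(10_000)]

old_ring = Ring(servers)
new_ring = Ring(servers + ["server-new"])
moved = sum(old_ring.lookup(k) != new_ring.lookup(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # roughly 1/101, not ~99%
```

Only the keys in the arc claimed by `server-new` change owners; every other key's clockwise-first server is unchanged.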
The Hard Parts of Sharding
Sharding solves the scale problem but creates new ones:
Why are JOINs problematic in a sharded database?
💡 Think about what happens when the two tables being JOINed live on different physical machines...
In a single database, a JOIN is a local operation — both tables are on the same machine. But in a sharded DB, user data might be on shard 1 while their orders are on shard 3. The JOIN now requires network calls between shards, which is orders of magnitude slower. This is why co-locating related data on the same shard (using the same shard key) is critical.
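The co-location idea can be sketched with in-memory dictionaries standing in for shards. Keying the orders table by `user_id` (the owner) rather than `order_id` is the crucial move; all names and the md5 router are illustrative:

```python
import hashlib

NUM_SHARDS = 4
shards = [{"users": {}, "orders": {}} for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

# Co-locate: key BOTH tables by user_id, not by order_id.
uid = "user_42"
s = shard_for(uid)
shards[s]["users"][uid] = {"name": "Ada"}
shards[s]["orders"].setdefault(uid, []).append({"order_id": "o-1", "total": 30})

# The "JOIN" now touches exactly one shard, with no network fan-out:
local = shards[shard_for(uid)]
result = [(local["users"][uid]["name"], o["total"]) for o in local["orders"][uid]]
print(result)  # [('Ada', 30)]
```

Had orders been sharded by `order_id`, a user's orders would scatter across all four shards and the same join would require a scatter-gather over the network.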
With consistent hashing and 100 servers, you add 1 new server. Approximately how much data needs to move?
💡 On a ring, adding a server only 'steals' keys from its clockwise neighbor...
With consistent hashing, adding 1 server to 100 only affects keys in the arc between the new server and its clockwise neighbor. That's roughly 1/N = 1/101 ≈ 1% of all data. Compare this to hash % N, where ~99% of data would need to move! This is why consistent hashing is essential for production systems.
🎓 What You Now Know
✓ Scale vertically first, shard only when necessary — Sharding adds massive complexity. Exhaust simpler options first.
✓ Shard key choice is the most critical decision — High cardinality, even distribution, query-aligned, immutable.
✓ Hash sharding beats range sharding for distribution — But you lose range queries. Pick based on your access patterns.
✓ Consistent hashing enables painless scaling — Adding a server moves only ~1/N of keys, not everything.
✓ Cross-shard operations are the enemy — Design your shard key to keep related data together.
Sharding is inevitable at scale. Every system that handles billions of records — Instagram, Discord, Uber — uses it. The concepts here are the foundation of every system design interview and every real-world scalable architecture. 🚀
📄 Consistent Hashing and Random Trees (Karger et al., 1997)
📄 Dynamo: Amazon’s Highly Available Key-value Store (DeCandia et al., 2007)
↗ Keep Learning
Caching — The Art of Remembering What's Expensive to Compute
A visual deep dive into caching. From CPU caches to CDNs — understand cache strategies, eviction policies, and the hardest problem in computer science: cache invalidation.
Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.