Sorting Algorithms — From Bubble Sort to Radix Sort
A scroll-driven visual deep dive into sorting algorithms. Compare bubble, insertion, merge, quick, heap, counting, and radix sort — with complexity analysis, stability, and real-world applications in databases, graphics, and event systems.
📊
Unsorted to sorted.
The most studied problem in CS.
Sorting is everywhere: database queries (ORDER BY), search result ranking, rendering layers in games, scheduling events, and binary search preprocessing. On 1 billion elements at a billion operations per second, the difference between O(n log n) and O(n²) is the difference between roughly 30 seconds and 30 years. Understanding WHY quick sort is fast, WHEN merge sort is better, and HOW radix sort breaks the comparison barrier will make you a better engineer.
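To make that concrete, here is the back-of-envelope arithmetic as a quick sketch (the ~10⁹ operations per second is an assumed machine speed, not a benchmark):

```python
import math

n = 1_000_000_000   # one billion elements
ops_per_sec = 1e9   # assumed: ~10^9 simple operations per second

print(f"O(n log n): {n * math.log2(n) / ops_per_sec:,.0f} seconds")    # ~30 seconds
print(f"O(n^2):     {n**2 / ops_per_sec / (86400 * 365):,.0f} years")  # ~32 years
```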
↓ Scroll through every major sorting algorithm, from simple to mind-bending
O(n²) Sorts: Simple but Slow
You have an array of 1000 elements that's already 95% sorted (only ~50 elements are out of place). Which O(n²) sort would be fastest?
💡 Which sort adapts to the input's existing order?
Insertion sort's best case is O(n) — when the array is already sorted, each element is compared once with its predecessor and never shifted. For 95% sorted data, only ~50 elements need to be shifted, and each shifts only a small distance. Total work ≈ n + 50 × avg_shift ≈ O(n). Bubble sort would also benefit but less efficiently (it still does multiple passes). Selection sort ALWAYS does n²/2 comparisons regardless of input — it can't exploit existing order.
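To see that adaptivity in code, here is a minimal insertion sort sketch (plain Python, purely illustrative):

```python
def insertion_sort(a):
    """Sorts a in place. Best case O(n): on already-sorted input the
    while-loop condition fails immediately for every element."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right. On nearly-sorted data
        # this loop body rarely runs, so total work stays near O(n).
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
```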
Divide and Conquer: Merge Sort and Quick Sort
Why divide-and-conquer gives O(n log n)
T(n) = 2T(n/2) + O(n) — the recurrence for merge sort
log₂(n) levels of recursion — each level does O(n) total work
Total: O(n) work × log₂(n) levels = O(n log n)
Quick sort: O(n log n) average, but O(n²) worst case
Quick sort is O(n²) in the worst case but usually faster than merge sort in practice. Why?
💡 Think about memory access patterns and auxiliary space...
In practice, quick sort wins because: (1) In-place: no O(n) auxiliary array → half the memory. (2) Cache-friendly: partitioning scans the array sequentially → CPU prefetch works. (3) Smaller constant: a simpler inner loop (one comparison + conditional swap vs merge's comparison + copy). Merge sort's advantages: guaranteed O(n log n) and stability. This is why production sorts are often hybrids: Python's Timsort is merge sort + insertion sort, and Java uses dual-pivot quicksort for primitives but merge sort (Timsort) for objects, where stability matters.
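A side-by-side sketch makes the memory difference visible: this merge sort allocates new lists on every call, while the quick sort partitions in place with a single sequential scan. (Lomuto partitioning is used here for brevity; production implementations pick pivots more carefully to dodge the O(n²) case.)

```python
def merge_sort(a):
    # Guaranteed O(n log n) and stable, but allocates O(n) auxiliary lists.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in order (stability)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

def quick_sort(a, lo=0, hi=None):
    # In place: the partition scan is sequential, which the CPU prefetcher loves.
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot, i = a[hi], lo          # Lomuto partition around the last element
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]     # pivot lands at its final position i
    quick_sort(a, lo, i - 1)
    quick_sort(a, i + 1, hi)
    return a
```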
Heap Sort: The Best of Both Worlds?
Heap sort combines in-place with guaranteed O(n log n)
Step 1: Build a max-heap from the array → O(n)
Step 2: Repeatedly extract the max — swap the root with the last element, then heapify the root → O(n log n)
Time: O(n log n) worst case, guaranteed. Space: O(1) — truly in-place.
But: poor cache locality → slower in practice than quick sort. (A code sketch of both steps follows the quiz answer below.)
Java's Arrays.sort uses dual-pivot quicksort for primitives but merge sort (a Timsort variant) for objects. Why different algorithms for different data types?
💡 Think about what 'stability' means for objects with multiple fields vs raw numbers...
Stability: if you sort a list of students by grade, a stable sort preserves the alphabetical order among students with equal grades; an unstable sort might scramble them. For objects, users often sort by one field while expecting the previous order of equal elements to survive — stability matters. For primitives like int[] or double[], two elements with the same value are indistinguishable, so stability is meaningless, and Java uses the faster (but unstable) quicksort. This is a thoughtful engineering tradeoff in the Java standard library.
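Stability is easy to see in code: Python's built-in sorted is stable (it is Timsort), so sorting by grade leaves students with equal grades in their prior order.

```python
students = [("Alice", 90), ("Bob", 85), ("Carol", 90)]
# Stable sort by grade: Alice stays ahead of Carol (both scored 90).
print(sorted(students, key=lambda s: s[1]))
# [('Bob', 85), ('Alice', 90), ('Carol', 90)]
```

And here is the promised sketch of heap sort's two steps. Note how sift-down jumps from index i to 2i + 1: that stride is the cache-unfriendly access pattern that costs it in practice.

```python
def heap_sort(a):
    n = len(a)

    def sift_down(i, end):
        # Restore the max-heap property below i (children live at 2i+1, 2i+2).
        while (child := 2 * i + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                      # pick the larger child
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child

    # Step 1: build the max-heap bottom-up, O(n).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Step 2: move the max to the end, shrink the heap, re-heapify, O(n log n).
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```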
Breaking the O(n log n) Barrier
Sorting in Production Systems
You have 100GB of log entries that don't fit in RAM (8GB available). Each entry has a timestamp, and you need them sorted by time. Which approach works?
💡 When data doesn't fit in memory, which resource becomes the bottleneck?
When data exceeds RAM, disk I/O dominates. External merge sort minimizes disk accesses: Phase 1 creates ⌈100/8⌉ = 13 sorted runs. Phase 2 does a 13-way merge. Total disk I/O: two passes, each reading and writing the full 100GB sequentially. Quicksort requires random access (terrible for disk). Heap sort's random access pattern is even worse. Radix sort needs multiple passes over the full data. External merge sort is why MapReduce's shuffle phase uses merge sort, and why databases use it for ORDER BY on large tables.
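A minimal external-sort sketch under stated assumptions: log lines are newline-terminated and begin with a sortable timestamp, and the chunk size and file paths are illustrative. Phase 1 writes sorted runs to temp files; Phase 2 streams a k-way merge with heapq.merge, so memory stays bounded by one chunk.

```python
import heapq, os, tempfile

def external_sort(input_path, output_path, max_lines_in_ram=1_000_000):
    run_paths = []
    # Phase 1: read RAM-sized chunks, sort each in memory, write sorted runs.
    with open(input_path) as f:
        while True:
            chunk = [line for _, line in zip(range(max_lines_in_ram), f)]
            if not chunk:
                break
            chunk.sort()  # assumes each line starts with a sortable timestamp
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)
    # Phase 2: k-way merge of all runs; heapq.merge streams lazily from disk.
    runs = [open(p) for p in run_paths]
    with open(output_path, "w") as out:
        out.writelines(heapq.merge(*runs))
    for r in runs:
        r.close()
    for p in run_paths:
        os.remove(p)
```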
🎓 What You Now Know
✓ O(n²) sorts (bubble, selection, insertion) are only useful for tiny arrays — but insertion sort is great for nearly-sorted data and is used inside hybrid sorts.
✓ Merge sort guarantees O(n log n) and is stable — but needs O(n) extra space. Quick sort is faster in practice but has O(n²) worst case.
✓ Heap sort is optimal in theory (O(n log n), O(1) space) — but poor cache locality makes it slower than quicksort in practice.
✓ Non-comparison sorts (counting, radix, bucket) break the O(n log n) barrier — when data has bounded range, they achieve O(n) time (see the sketch after this list).
✓ Production sorts are hybrids — Timsort (merge + insertion), introsort (quick + heap), dual-pivot quicksort. They combine the best properties of multiple algorithms.
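As promised in the list above, a sketch of the two key non-comparison sorts. Counting sort runs in O(n + k) for integer keys in a known range 0..k-1, and its stability is exactly what LSD radix sort relies on as it sorts one 8-bit digit per pass:

```python
def counting_sort(a, k):
    """Sort non-negative ints < k in O(n + k). Stable: equal keys keep
    their input order, which radix sort's digit passes depend on."""
    count = [0] * k
    for x in a:
        count[x] += 1
    pos = [0] * k                 # prefix sums: starting index of each key
    for i in range(1, k):
        pos[i] = pos[i - 1] + count[i - 1]
    out = [0] * len(a)
    for x in a:                   # scan in input order to preserve stability
        out[pos[x]] = x
        pos[x] += 1
    return out

def radix_sort(a):
    # LSD radix sort on non-negative ints: one stable bucket pass per
    # 8-bit digit, so d-digit keys cost O(d * (n + 256)) total.
    if not a:
        return a
    shift, max_val = 0, max(a)
    while shift == 0 or (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)  # append keeps equal digits in order
        a = [x for bucket in buckets for x in bucket]
        shift += 8
    return a
```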
Sorting is a building block: binary search needs sorted data, merge sort powers external sorting, heaps power priority queues, and radix sort powers GPU rendering. ⚡
↗ Keep Learning
Hash Tables — O(1) Access That Powers Everything
A scroll-driven visual deep dive into hash tables. From hash functions to collision resolution (chaining vs open addressing) to load factors — understand the data structure behind caches, databases, and deduplication.
Trees — Hierarchies, Traversals, and Balanced Search
A scroll-driven visual deep dive into tree data structures. Binary trees, BSTs, AVL trees, heaps, and B-trees — with real-world applications in databases, file systems, priority queues, and decision engines.
Linked Lists — Pointers, Flexibility, and the Tradeoffs vs Arrays
A scroll-driven visual deep dive into linked lists. Singly, doubly, and circular variants. Understand pointer manipulation, O(1) insertion/deletion, and real-world uses in LRU caches, OS schedulers, and undo systems.