String Manipulation — Patterns, Hashing, and Sliding Windows
A scroll-driven visual deep dive into string algorithms. From brute-force pattern matching to KMP, Rabin-Karp hashing, and sliding window techniques — with real-world applications in search engines, DNA analysis, and parsers.
🔤
Text is data.
Algorithms make it searchable.
Every Ctrl+F, every regex match, every DNA sequence alignment, every compiler lexer — they all rely on string algorithms. The difference between a naive O(n×m) search and KMP’s O(n+m) is the difference between “instant” and “hours” on genome-scale data. String manipulation is among the most interview-tested and practically important categories of algorithms.
↓ Scroll to master the algorithms that power text processing
Pattern Matching: Finding a Needle in a Haystack
You're searching for a 100-character pattern in a 1-million-character text. How many character comparisons does brute force make in the WORST case?
💡 Think about the worst case: every position requires checking all pattern characters...
Worst case for brute force: at each of the (n - m + 1) ≈ 1M positions, you compare up to m = 100 characters, discovering the mismatch only on the last one. Total: ~100M comparisons. This happens with pathological inputs like text = 'AAAA...A' and pattern = 'AAAA...AB'. KMP reduces this to O(n + m) ≈ 1,000,100 comparisons by never re-comparing characters that already matched.
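To see where those comparisons come from, here is a minimal brute-force matcher in Python, run on a scaled-down pathological input (the function name and the comparison counter are illustrative additions, not part of any standard library):

```python
def brute_force_search(text: str, pattern: str):
    """Try every alignment of pattern against text.

    Worst case: (n - m + 1) * m character comparisons.
    Returns (match index or -1, total comparisons made)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break  # mismatch: slide to the next alignment, redoing work
            j += 1
        if j == m:
            return i, comparisons  # pattern found at index i
    return -1, comparisons

# Scaled-down pathological case: 99 characters match before each mismatch.
idx, cmps = brute_force_search("A" * 1000, "A" * 99 + "B")
print(idx, cmps)  # → -1 90100  (901 alignments × 100 comparisons each)
```

Scale the same shape up to n = 1M and m = 100 and the count lands near the ~100M figure above.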
KMP: Never Go Backwards
KMP (Knuth-Morris-Pratt) — the key insight
Key idea: when a mismatch occurs, DON'T restart from scratch.
Build a failure function π[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix.
The text pointer NEVER goes backward; it only moves forward.
Preprocessing: O(m) to build the failure function.
Search: O(n), since each text character is examined at most twice.
Sliding Window: The Universal String Technique
Common sliding window problems
Longest substring without repeating characters: expand the window right, tracking characters in a hash set. When a duplicate enters, shrink from the left until it's removed. Each character enters and leaves at most once, giving O(n) total work. → O(n) with hash set
Minimum window substring: find the smallest window containing all target characters. Expand right to include every required character, then shrink left to minimize the window while still covering all targets. Uses a frequency map to track coverage. → O(n) with frequency map
Fixed-size window sum: maintain a running sum of exactly k elements. As the window slides, add the new right element and subtract the departing left element; no need to recompute the full sum each time. → O(n) with running sum
Longest repeating character replacement: find the longest substring where you can replace at most k characters to make all characters the same. The window is valid when (window_size - max_freq_char) ≤ k. → O(n) with frequency count
You need to find the longest substring of 'abcabcbb' without repeating characters. What's the answer and the time complexity of the optimal solution?
💡 How many times can each character enter and leave the window?
The sliding window approach: start with left=0, right=0. Expand right: add 'a', 'b', 'c' → window='abc' (len=3). Add 'a' → duplicate! Shrink left: remove 'a' → window='bca' (len=3). Continue: the maximum length window with unique characters is 3 ('abc', 'bca', or 'cab'). The key insight: each character enters the hash set once (when right moves) and leaves once (when left moves) → total operations = 2n = O(n).
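That walk-through translates almost line-for-line into code. A minimal Python sketch of the hash-set window (the function name is mine):

```python
def longest_unique_substring(s: str) -> int:
    """Length of the longest substring with no repeated characters, O(n)."""
    seen = set()  # characters currently inside the window
    left = 0
    best = 0
    for right, ch in enumerate(s):
        # Duplicate entered the window: shrink from the left until it's gone.
        while ch in seen:
            seen.remove(s[left])
            left += 1
        seen.add(ch)
        best = max(best, right - left + 1)
    return best

print(longest_unique_substring("abcabcbb"))  # → 3
```

The inner while loop looks like it could make this O(n²), but each character is removed from the set at most once, so the total work stays linear.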
String Algorithms in the Real World
KMP's failure function for pattern 'ABABAC' is [0,0,1,2,3,0]. What does the value 3 at index 4 mean, and how does it prevent wasted comparisons?
💡 Look at 'ABABA' — what string appears at both the beginning and end?
This is KMP's core insight. When we've matched 'ABABA' (pattern[0..4]) and mismatch at pattern[5], the text contains '...ABABA'. The failure function says: prefix 'ABA' (length 3) equals suffix 'ABA' of what we've matched. So we skip re-examining those 3 characters — align pattern position 3 with the current text position and continue. This is why KMP achieves O(n+m): each text character is examined at most twice.
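The failure function itself is short. Here is a Python sketch of the standard construction, plus the search loop that uses it (function names are mine):

```python
def build_failure(pattern: str) -> list:
    """pi[i] = length of the longest proper prefix of pattern[0..i]
    that is also a suffix of pattern[0..i]."""
    pi = [0] * len(pattern)
    k = 0  # length of the current prefix match
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]  # fall back to the next shorter prefix-suffix
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text: str, pattern: str) -> int:
    """Index of the first occurrence of pattern in text, or -1. O(n + m)."""
    pi = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # reuse prior matches instead of backing up in text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

print(build_failure("ABABAC"))  # → [0, 0, 1, 2, 3, 0]
print(kmp_search("ZZABABACZZ", "ABABAC"))  # → 2
```

Note that the text index i only ever advances; mismatches are handled entirely by moving k back through the failure function.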
🎓 What You Now Know
✓ Brute-force pattern matching is O(n×m) — it re-compares characters needlessly on every mismatch.
✓ KMP achieves O(n+m) by never going backwards — the failure function encodes the pattern’s internal prefix-suffix structure.
✓ Sliding window solves substring problems in O(n) — expand right, shrink left, each element enters/leaves at most once.
✓ String algorithms power search, biology, security, and compilers — from grep to genome alignment to SQL injection detection.
For prefix-based search and autocomplete, check out Tries — the data structure built specifically for string operations. ⚡
↗ Keep Learning
Hash Tables — O(1) Access That Powers Everything
A scroll-driven visual deep dive into hash tables. From hash functions to collision resolution (chaining vs open addressing) to load factors — understand the data structure behind caches, databases, and deduplication.
Tries — Prefix Trees for Autocomplete, Spell Check, and IP Routing
A scroll-driven visual deep dive into trie data structures. From basic prefix trees to compressed tries and radix trees — with real-world applications in autocomplete, spell checking, IP routing tables, and T9 keyboard input.
Sorting Algorithms — From Bubble Sort to Radix Sort
A scroll-driven visual deep dive into sorting algorithms. Compare bubble, insertion, merge, quick, heap, counting, and radix sort — with complexity analysis, stability, and real-world applications in databases, graphics, and event systems.