All articles
· 7 min deep-divedata-structuresalgorithms
Article 1 in your session

String Manipulation — Patterns, Hashing, and Sliding Windows

A scroll-driven visual deep dive into string algorithms. From brute-force pattern matching to KMP, Rabin-Karp hashing, and sliding window techniques — with real-world applications in search engines, DNA analysis, and parsers.

Introduction 0%
Introduction
🎯 0/3 0%

🔤

Text is data.
Algorithms make it searchable.

Every Ctrl+F, every regex match, every DNA sequence alignment, every compiler lexer — they all rely on string algorithms. The difference between a naive O(n×m) search and KMP’s O(n+m) is the difference between “instant” and “hours” on genome-scale data. String manipulation is the most interview-tested and practically important category of algorithms.

↓ Scroll to master the algorithms that power text processing

Brute Force

Pattern Matching: Finding a Needle in a Haystack

Text (n characters):ABABACABABABCPattern (m=5):ABABCBrute Force: O(n × m)n-m+1 positions × m comparisonsKMP / Rabin-Karp: O(n + m)Never re-compare matched chars
Brute-force matching: slide the pattern across the text, comparing character by character
↑ Answer the question above to continue ↑
🟢 Quick Check Knowledge Check

You're searching for a 100-character pattern in a 1-million-character text. How many character comparisons does brute force make in the WORST case?

KMP Insight

KMP: Never Go Backwards

KMP (Knuth-Morris-Pratt) — the key insight

1
Key idea: when a mismatch occurs, DON'T restart from scratch
If you've matched 'ABAB' and then mismatch, you already know the text contains 'AB' at the end — which is a prefix of the pattern!
2
Build a failure function π[i] = length of longest proper prefix of pattern[0..i] that's also a suffix
For pattern 'ABABC': π = [0, 0, 1, 2, 0]. After matching 'ABAB' (4 chars), a mismatch means jump to position π[3]=2 — the 'AB' prefix.
3
Text pointer NEVER goes backward — it only moves forward
Brute force resets the text pointer on mismatch. KMP only resets the pattern pointer using the failure function.
4
Preprocessing: O(m) to build the failure function
One-time cost to analyze the pattern's internal structure
5
Search: O(n) — each text character is examined at most twice
Total: O(n + m). For n=1M, m=100: this is 1,000,100 comparisons vs brute force's 100,000,000
Sliding Window

Sliding Window: The Universal String Technique

Find longest substring without repeating charactersabcabcbbleftrightwindow = “cabc” (len=4)Sliding Window Pattern1. Expand RIGHT → add character to window (use a hash set for O(1) lookup)2. If window invalid (duplicate found) → shrink LEFT until valid again3. Track maximum window size → answer is the longest valid window
Sliding window maintains a valid range [left, right] and expands/contracts to solve substring problems

Common sliding window problems

🔤 Longest Substring Without Repeats

Expand the window right, tracking characters in a hash set. When a duplicate enters, shrink from the left until it's removed. Each character enters and leaves at most once, giving O(n) total work.

O(n) with hash set
🎯 Minimum Window Substring

Find the smallest window containing all target characters. Expand right to include every required character, then shrink left to minimize the window while still covering all targets. Uses a frequency map to track coverage.

O(n) with frequency map
📊 Maximum Sum Subarray of Size k

Maintain a running sum of exactly k elements. As the window slides, add the new right element and subtract the departing left element — no need to recompute the full sum each time.

O(n) with running sum
🔄 Longest Repeating Character Replacement

Find the longest substring where you can replace at most k characters to make all characters the same. The window is valid when (window_size - max_freq_char) ≤ k replacements.

O(n) with frequency count
↑ Answer the question above to continue ↑
🟡 Checkpoint Knowledge Check

You need to find the longest substring of 'abcabcbb' without repeating characters. What's the answer and the time complexity of the optimal solution?

Real-World Uses

String Algorithms in the Real World

🔍 Search Engines• Substring matching in documents• Regular expression compilation• Autocomplete (prefix matching)Google, VS Code, grep🧬 Bioinformatics• DNA sequence alignment (BWT)• Protein motif finding• BLAST (seeded alignment)Human genome: 3.2B characters🛡️ Security• Malware signature matching• SQL/XSS injection detection• Network packet inspectionSnort, WAFs, antivirus engines📝 Compilers & Editors• Lexical analysis (tokenization)• Syntax highlighting• Diff algorithms (edit distance)GCC, VS Code, Git diff
String algorithms power critical systems across computing and biology
↑ Answer the question above to continue ↑
🔴 Challenge Knowledge Check

KMP's failure function for pattern 'ABABAC' is [0,0,1,2,3,0]. What does the value 3 at index 4 mean, and how does it prevent wasted comparisons?

🎓 What You Now Know

Brute-force pattern matching is O(n×m) — it re-compares characters needlessly on every mismatch.

KMP achieves O(n+m) by never going backwards — the failure function encodes the pattern’s internal prefix-suffix structure.

Sliding window solves substring problems in O(n) — expand right, shrink left, each element enters/leaves at most once.

String algorithms power search, biology, security, and compilers — from grep to genome alignment to SQL injection detection.

For prefix-based search and autocomplete, check out Tries — the data structure built specifically for string operations. ⚡

Keep Learning