All Deep Dives

53 interactive visual explainers. Filter by category or search for a topic.

Interactive
11 min

Confusion Matrix Deep Dive — What Your Model Gets Wrong and Why

A scroll-driven deep dive into the confusion matrix. Master TP, TN, FP, FN, and learn to derive every classification metric from a single 2×2 table.

machine-learning, metrics
Interactive
17 min

Language Models & N-grams — Predicting the Next Word Since 1948

A scroll-driven visual deep dive into statistical language models. From Shannon's information theory to N-grams to the bridge to transformers — learn how autocomplete, spell check, and speech recognition all predict what comes next.

NLP, language-models, foundations
Interactive
12 min

ROC Curves & AUC — Measuring Classifier Performance Visually

A scroll-driven visual deep dive into ROC curves and AUC. Learn TPR vs FPR, why AUC is threshold-independent, and when to use ROC vs PR curves.

machine-learning, metrics
Interactive
16 min

Topic Modeling — Discovering Hidden Themes in Millions of Documents

A scroll-driven visual deep dive into topic modeling. From LDA to NMF to neural topic models — learn how search engines and email services automatically categorize, cluster, and label unstructured text at scale.

NLP, unsupervised, topic-modeling
Interactive
13 min

Accuracy, Precision, Recall & F1 — Choosing the Right Metric

A scroll-driven visual deep dive into classification metrics. Learn why accuracy misleads, what precision and recall actually measure, and when to use F1, F2, or something else entirely.

machine-learning, metrics
Interactive
16 min

Named Entity Recognition — Teaching Machines to Find Names in Text

A scroll-driven visual deep dive into Named Entity Recognition (NER). From rule-based to CRF to transformer-based approaches — learn how search engines and email services extract people, places, companies, and dates from unstructured text.

NLP, sequence-labeling, NER
Interactive
14 min

Feature Engineering — The Art That Makes or Breaks Your Model

A scroll-driven visual deep dive into feature engineering. Learn transformations, encoding, interaction features, handling missing data, and why feature engineering matters more than model choice.

machine-learning, fundamentals
Interactive
13 min

Text Similarity — From Jaccard to Neural Matching

A scroll-driven visual deep dive into text similarity. Learn how search engines detect duplicates, match queries to documents, and measure how 'close' two texts really are — from set overlap to cosine similarity to learned embeddings.

NLP, information-retrieval, similarity
Interactive
14 min

Cross-Validation & Hyperparameter Tuning — How to Actually Evaluate Models

A scroll-driven visual deep dive into cross-validation and hyperparameter tuning. Learn K-fold CV, stratified splitting, grid search, random search, and Bayesian optimization.

machine-learning, fundamentals
Interactive
11 min

Query Understanding — What Did the User Actually Mean?

A scroll-driven visual deep dive into query understanding. From spell correction to query expansion to intent classification — learn how search engines interpret ambiguous, misspelled, and complex queries.

NLP, search, query-understanding
Interactive
13 min

Bias-Variance Tradeoff — The Most Important Concept in ML

A scroll-driven visual deep dive into the bias-variance tradeoff. Learn why every model makes errors, how underfitting and overfitting emerge, and how to balance them.

machine-learning, fundamentals
Interactive
18 min

Information Retrieval — How Search Engines Find Your Needle in a Billion Haystacks

A scroll-driven visual deep dive into information retrieval. From inverted indices to BM25 to learning-to-rank — learn how Google, Bing, and enterprise search find the most relevant documents in milliseconds.

NLP, information-retrieval, search
Interactive
8 min

Search Reranking — The Two-Stage Pipeline That Powers Production Search

A visual deep dive into the retrieve + rerank pipeline. How BM25, dense retrieval, and learned sparse retrieval feed into Reciprocal Rank Fusion, then cross-encoder reranking — with a full latency budget breakdown.

search, information-retrieval, systems
Interactive
12 min

Bagging vs Boosting — The Two Philosophies of Ensemble Learning

A scroll-driven visual deep dive comparing bagging and boosting. Learn when to average independent models vs sequentially correct errors, and why ensembles dominate ML.

machine-learning, ensemble
Interactive
15 min

Sentiment Analysis — Reading Between the Lines at Scale

A scroll-driven visual deep dive into sentiment analysis. Learn how machines detect opinion, sarcasm, and emotion in text — from star ratings to brand monitoring to Gmail's tone detection.

NLP, classification, sentiment
Interactive
8 min

Cross-Encoders vs Bi-Encoders — Why Accuracy Costs 1000× More Compute

A visual deep dive into the architectural difference between bi-encoders and cross-encoders. Why cross-attention produces higher-quality relevance scores, and why cross-encoders are practical only for reranking a short candidate list, not for first-stage retrieval.

search, NLP, transformers
Interactive
8 min

DBSCAN — Finding Clusters of Any Shape

A scroll-driven visual deep dive into DBSCAN. Learn density-based clustering, core/border/noise points, choosing epsilon and minPts, and when DBSCAN beats K-Means.

machine-learning, unsupervised
Interactive
12 min

Spam Detection — The Original ML Success Story

A scroll-driven visual deep dive into spam detection. From Bayesian filters to modern adversarial ML — learn how email services block 15 billion spam messages daily and why spammers keep finding ways around it.

NLP, classification, security
Interactive
7 min

BM25 — The 30-Year-Old Algorithm That Still Wins at Search

A visual deep dive into BM25 scoring. Understand every term in the formula — IDF, TF saturation via k₁, length normalization via b — and why BM25 can still outperform neural retrievers on specialized vocabularies.

search, information-retrieval, NLP
Interactive
9 min

PCA — Compressing Reality Without Losing the Plot

A scroll-driven visual deep dive into Principal Component Analysis. Learn eigenvectors, variance maximization, dimensionality reduction, and when PCA transforms your data — and when it doesn't.

machine-learning, unsupervised, linear-algebra
Interactive
16 min

Text Classification — Teaching Machines to Sort Your Inbox

A scroll-driven visual deep dive into text classification. From spam filters to Gmail's categories — learn how ML models read text, extract features, and assign labels at scale.

NLP, classification
Interactive
9 min

HNSW — Hierarchical Navigable Small World Graphs for Vector Search

A visual deep dive into HNSW, the dominant graph-based index for vector search. Understand the multi-layer graph structure, the M and ef parameters, complexity analysis, and how it compares to IVF, LSH, and brute force.

search, information-retrieval, systems
Interactive
12 min

K-Means Clustering — Grouping Data Without Labels

A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, choosing K with the elbow method, limitations, and when to use alternatives.

machine-learning, unsupervised
Interactive
16 min

Word Embeddings — When Words Learned to Be Vectors

A scroll-driven visual deep dive into word embeddings. Learn how Word2Vec, GloVe, and FastText turn words into dense vectors where meaning becomes geometry — and why 'king - man + woman = queen' actually works.

NLP, representation-learning
Interactive
7 min

IVF Index — Partitioning Vector Space with K-Means for Fast Search

A visual deep dive into the Inverted File Index (IVF). How k-means clustering partitions billion-scale vector collections into searchable regions, the nprobe recall-speed knob, and IVF complexity analysis.

search, information-retrieval, systems
Interactive
10 min

Gradient Boosting & XGBoost — The Kaggle King

A scroll-driven visual deep dive into gradient boosting. Learn how weak learners combine sequentially, how XGBoost optimizes the process, and why it dominates tabular ML competitions.

machine-learning, ensemble
Interactive
15 min

Bag of Words & TF-IDF — How Search Engines Ranked Before AI

A scroll-driven visual deep dive into Bag of Words and TF-IDF. Learn how documents become vectors, why term frequency alone fails, and how IDF rescues relevance — the backbone of search before neural models.

NLP, information-retrieval
Interactive
6 min

Approximate Nearest Neighbor Search — Trading 1% Accuracy for 1000× Speed

A visual deep dive into ANN search. Why brute-force nearest neighbor fails at scale, how approximate methods achieve 99% recall with logarithmic query time, and the fundamental accuracy-speed tradeoff behind every vector search system.

search, information-retrieval, systems
Interactive
10 min

Hash Tables — O(1) Access That Powers Everything

A scroll-driven visual deep dive into hash tables. From hash functions to collision resolution (chaining vs open addressing) to load factors — understand the data structure behind caches, databases, and deduplication.

data-structures, algorithms
Interactive
9 min

Linked Lists — Pointers, Flexibility, and the Tradeoffs vs Arrays

A scroll-driven visual deep dive into linked lists. Singly, doubly, and circular variants. Understand pointer manipulation, O(1) insertion/deletion at a known node, and real-world uses in LRU caches, OS schedulers, and undo systems.

data-structures, algorithms
Interactive
12 min

Sorting Algorithms — From Bubble Sort to Radix Sort

A scroll-driven visual deep dive into sorting algorithms. Compare bubble, insertion, merge, quick, heap, counting, and radix sort — with complexity analysis, stability, and real-world applications in databases, graphics, and event systems.

data-structures, algorithms
Interactive
7 min

String Manipulation — Patterns, Hashing, and Sliding Windows

A scroll-driven visual deep dive into string algorithms. From brute-force pattern matching to KMP, Rabin-Karp hashing, and sliding window techniques — with real-world applications in search engines, DNA analysis, and parsers.

data-structures, algorithms
Interactive
12 min

Trees — Hierarchies, Traversals, and Balanced Search

A scroll-driven visual deep dive into tree data structures. Binary trees, BSTs, AVL trees, heaps, and B-trees — with real-world applications in databases, file systems, priority queues, and decision engines.

data-structures, algorithms
Interactive
9 min

Tries — Prefix Trees for Autocomplete, Spell Check, and IP Routing

A scroll-driven visual deep dive into trie data structures. From basic prefix trees to compressed tries and radix trees — with real-world applications in autocomplete, spell checking, IP routing tables, and T9 keyboard input.

data-structures, algorithms
Interactive
14 min

Random Forests — Why 1000 Bad Models Beat 1 Good One

A scroll-driven visual deep dive into Random Forests. Learn bagging, feature randomness, out-of-bag error, and why ensembles are the most reliable ML technique.

machine-learning, ensemble
Interactive
14 min

Text Preprocessing — Turning Messy Words into Clean Features

A scroll-driven visual deep dive into text preprocessing. Learn tokenization, stemming, lemmatization, stopword removal, and normalization — the essential first step of every NLP pipeline.

NLP, text-processing
Interactive
13 min

Decision Trees — How Machines Learn to Ask Questions

A scroll-driven visual deep dive into decision trees. Learn how trees split data, what Gini impurity and information gain mean, and why trees overfit like crazy.

machine-learning, classification
Interactive
13 min

MSE, MAE, R-Squared and Beyond — Regression Metrics That Actually Matter

A scroll-driven deep dive into regression metrics. Understand MSE, RMSE, MAE, MAPE, R-squared, and Adjusted R-squared — when to use each, their gotchas, and how to report results properly.

machine-learning, metrics
Interactive
15 min

Support Vector Machines — Finding the Perfect Boundary

A scroll-driven visual deep dive into SVMs. Learn about maximum margin, the kernel trick, support vectors, and why SVMs dominated ML before deep learning.

machine-learning, classification
Interactive
13 min

Naive Bayes — Why 'Stupid' Assumptions Work Brilliantly

A scroll-driven visual deep dive into Naive Bayes. Learn Bayes' theorem, why the 'naive' independence assumption is wrong but works anyway, and why it dominates spam filtering.

machine-learning, classification
Interactive
12 min

K-Nearest Neighbors — The Algorithm with No Training Step

A scroll-driven visual deep dive into KNN. Learn how the laziest algorithm in ML works, why distance metrics matter, and how the curse of dimensionality kills it.

machine-learning, classification
Interactive
14 min

Logistic Regression — The Classifier That's Not Really Regression

A scroll-driven visual deep dive into logistic regression. Learn how a regression model becomes a classifier, why the sigmoid is the key, and how log-loss trains it.

machine-learning, classification
Interactive
14 min

Ridge & Lasso — Taming Overfitting with Regularization

A scroll-driven visual deep dive into Ridge and Lasso regression. Learn why models overfit, how penalizing large weights fixes it, and why Lasso kills features.

machine-learning, regularization
Interactive
12 min

Polynomial Regression — When Lines Aren't Enough

A scroll-driven visual deep dive into polynomial regression. See why straight lines fail, how curves capture nonlinear patterns, and when you're overfitting vs underfitting.

machine-learning, regression
Interactive
15 min

Flash Attention — Making Transformers Actually Fast

A scroll-driven visual deep dive into Flash Attention. Learn why standard attention is broken, how GPU memory works, and how tiling fixes everything — with quizzes to test your understanding.

transformers, gpu, flash-attention
Interactive
12 min

Speculative Decoding — Making LLMs Think Ahead

A scroll-driven visual deep dive into speculative decoding. Learn why LLM inference is slow, how a small 'draft' model can speed up a big model by 2-3x, and why the output distribution is mathematically identical to the big model's own.

transformers, inference
Interactive

Hello World — Why I'm Building This Site

A quick intro to who I am, why I decided to build this portfolio, and what you can expect to find here.

personal, career
Interactive
18 min

Transformers — The Architecture That Changed AI

A scroll-driven visual deep dive into the Transformer architecture. From RNNs to self-attention to GPT — understand the engine behind every modern AI model.

transformers, attention
Interactive
12 min

Linear Regression — The Foundation of Machine Learning

A scroll-driven visual deep dive into linear regression. From data points to loss functions to gradient descent — understand the building block behind all of ML.

machine-learning, math
Interactive
14 min

RLHF — How AI Learns to Follow Human Instructions

A visual deep dive into Reinforcement Learning from Human Feedback. From pretraining to reward models to PPO — understand how ChatGPT went from autocomplete to assistant.

AI, alignment
Interactive
15 min

Caching — The Art of Remembering What's Expensive to Compute

A visual deep dive into caching. From CPU caches to CDNs — understand cache strategies, eviction policies, and the hardest problem in computer science: cache invalidation.

systems, performance
Interactive
16 min

Database Sharding — Scaling Beyond One Machine

A visual deep dive into database sharding. From single-server bottlenecks to consistent hashing — understand how companies scale their databases to billions of rows.

systems, databases
Interactive
17 min

Vector Databases — Search by Meaning, Not Keywords

A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.

systems, AI