Word Embeddings — When Words Learned to Be Vectors
A scroll-driven visual deep dive into word embeddings. Learn how Word2Vec, GloVe, and FastText turn words into dense vectors where meaning becomes geometry — and why 'king - man + woman = queen' actually works.
👑
king − man + woman
= queen
In 2013, Tomas Mikolov at Google showed that words could be represented as vectors where meaning becomes math. This single idea — word embeddings — sparked the entire modern NLP revolution, from BERT to GPT.
↓ Scroll to understand how words learned geometry
The Problem: Words Without Meaning
In one-hot encoding, what is the cosine similarity between 'happy' and 'joyful'?
💡 In one-hot, each word occupies its own unique dimension...
In one-hot encoding, each word gets its own dimension. 'Happy' is [0,0,...,1,...,0] and 'joyful' is [0,...,1,...,0,0] with the 1 in completely different positions. Their dot product = 0, so cosine similarity = 0. The model literally cannot tell that these words are synonyms. This is the fundamental failure that word embeddings solve: similar words get similar vectors, so cosine('happy', 'joyful') ≈ 0.9.
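A minimal sketch of this gap, assuming only NumPy. The vocabulary indices and the 4-dimensional dense vectors below are made up for illustration; they stand in for learned embeddings.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: a toy 10-word vocabulary where 'happy' is index 3 and 'joyful' is index 7.
vocab_size = 10
happy_onehot = np.zeros(vocab_size)
joyful_onehot = np.zeros(vocab_size)
happy_onehot[3] = 1.0
joyful_onehot[7] = 1.0
print(cosine(happy_onehot, joyful_onehot))   # 0.0 — the model sees no similarity at all

# Dense embeddings (made-up values standing in for learned vectors):
happy_dense = np.array([0.8, 0.1, 0.6, -0.2])
joyful_dense = np.array([0.7, 0.2, 0.5, -0.1])
print(round(cosine(happy_dense, joyful_dense), 2))   # ~0.99: synonyms land close together
```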
Word2Vec: Learning Meaning from Context
Word2Vec Skip-Gram objective
maximize Σ log P(w_context | w_center)
P(w_context | w_center) = softmax(v_context · v_center)
Problem: softmax over 50,000 words is slow!
Fix: negative sampling — only update ~5 random 'wrong' words
Word2Vec learns that 'dog' and 'cat' have similar vectors. How does it learn this without any explicit labels?
💡 What's the training task? Does it require any human annotations?
Word2Vec never sees any explicit similarity labels. It learns from raw text using the distributional hypothesis: words with similar contexts get similar vectors. Because 'dog' and 'cat' both appear near words like 'pet', 'food', 'walks', 'cute', their embedding vectors gradually converge during training. It's purely self-supervised — the training signal comes from predicting context words from billions of sentences.
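Here is a short sketch of that self-supervised setup, assuming the gensim library (its Word2Vec class implements skip-gram with negative sampling). The five-sentence corpus is invented, so the printed similarities are only illustrative; real embeddings need millions to billions of tokens.

```python
from gensim.models import Word2Vec

# A toy corpus: no labels anywhere, just raw tokenized sentences.
# 'dog' and 'cat' appear in similar contexts (pet, cute, food, walks/naps).
sentences = [
    ["the", "dog", "is", "a", "cute", "pet"],
    ["the", "cat", "is", "a", "cute", "pet"],
    ["my", "dog", "loves", "food", "and", "walks"],
    ["my", "cat", "loves", "food", "and", "naps"],
    ["the", "stock", "market", "fell", "sharply", "today"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the dense vectors
    window=3,         # context window around each center word
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # skip-gram: predict context words from the center word
    negative=5,       # negative sampling: update only ~5 random 'wrong' words
    epochs=200,       # many passes, since the corpus is tiny
    seed=42,
)

# The only training signal was "predict nearby words", yet related words drift together.
print(model.wv.similarity("dog", "cat"))     # tends to be high on this toy data
print(model.wv.similarity("dog", "stock"))   # tends to be lower
```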
The Aha Moment: Word Arithmetic
Word2Vec learns vec('Tokyo') - vec('Japan') + vec('France') ≈ vec('Paris'). What geometric relationship does this reveal?
💡 What happens when you subtract a country from its capital? Do you always get the same direction?
The embedding space organizes concepts such that the vector offset country→capital is consistent across different countries. Tokyo-Japan ≈ Paris-France ≈ Berlin-Germany. This means the model has learned an abstract 'capital-of' relationship as a DIRECTION in high-dimensional space. This works for many relationships: male→female, present→past tense, singular→plural. The model was never told these relationships exist — it discovered them from word co-occurrence patterns.
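If you want to try this yourself, here is a sketch assuming gensim plus its gensim.downloader module, which fetches pretrained vectors over the network ('glove-wiki-gigaword-100' is one of its published datasets, with a lowercased vocabulary). The exact neighbors depend on the pretrained vectors, so the expected outputs are typical rather than guaranteed.

```python
import gensim.downloader as api

# Download pretrained 100-dimensional GloVe vectors (lowercased vocabulary).
wv = api.load("glove-wiki-gigaword-100")

# vec('king') − vec('man') + vec('woman') ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' is typically the top hit.

# The same offset trick for the capital-of direction:
# vec('tokyo') − vec('japan') + vec('france') ≈ ?
print(wv.most_similar(positive=["tokyo", "france"], negative=["japan"], topn=3))
# 'paris' typically ranks first.
```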
Beyond Word2Vec: GloVe and FastText
GloVe's elegant objective
X_ij = co-occurrence count of word i with word j
minimize Σ f(X_ij) × (v_i · v_j + b_i + b_j − log X_ij)²
f(X_ij) = weighting function (down-weights rare co-occurrences, caps the influence of very frequent ones)
A user types 'reccommendation systms' (two misspellings) into a search engine. Which embedding approach handles this best?
💡 What happens when you break 'reccommendation' into character n-grams?
FastText represents each word as a bag of character n-grams. 'reccommendation' shares n-grams like 'rec', 'com', 'men', 'dat', 'tion' with 'recommendation'. The misspelled version's embedding is an average of its n-gram embeddings — and most n-grams overlap with the correct spelling! Word2Vec and GloVe have no mechanism for this: a misspelled word is simply OOV (out of vocabulary). Google Search uses subword approaches for exactly this reason.
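A sketch of that subword robustness, assuming gensim's FastText class. The three-sentence corpus is invented and min_n/max_n are set to the usual 3-to-6 character range, so the printed similarity only illustrates the mechanism.

```python
from gensim.models import FastText

# Train subword-aware embeddings on a tiny invented corpus.
sentences = [
    ["recommendation", "systems", "suggest", "relevant", "items"],
    ["search", "engines", "rank", "recommendation", "results"],
    ["users", "expect", "good", "recommendation", "quality"],
]

model = FastText(
    sentences,
    vector_size=50,
    min_count=1,
    min_n=3,    # smallest character n-gram
    max_n=6,    # largest character n-gram
    epochs=50,
    seed=42,
)

# 'reccommendation' never appears in the corpus, so it is out of vocabulary...
print("reccommendation" in model.wv.key_to_index)   # False

# ...but FastText still builds a vector for it from its character n-grams,
# most of which overlap with the correctly spelled word.
print(model.wv.similarity("reccommendation", "recommendation"))   # high n-gram overlap, high similarity
```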
Real-World Applications
🎓 What You Now Know
✓ One-hot encoding is meaningless — every word is equidistant from every other word. No notion of similarity.
✓ Word2Vec learns from context — “you shall know a word by the company it keeps.” Words in similar contexts get similar vectors.
✓ Embeddings encode relationships as directions — king-queen, France-Paris, big-bigger all live in consistent geometric directions.
✓ GloVe uses global statistics; FastText handles unknown words — different strengths for different use cases.
✓ Embeddings have bias — they learn human stereotypes from training text. Debiasing is critical for fair systems.
✓ Static embeddings led to contextual embeddings — BERT and GPT produce context-dependent vectors, solving the polysemy problem.
Word embeddings were the bridge from classical NLP to the deep learning era. They proved that meaning could be captured as geometry — an insight that powers every modern language model. 👑
↗ Keep Learning
Bag of Words & TF-IDF — How Search Engines Ranked Before AI
A scroll-driven visual deep dive into Bag of Words and TF-IDF. Learn how documents become vectors, why term frequency alone fails, and how IDF rescues relevance — the backbone of search before neural models.
Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.
Transformers — The Architecture That Changed AI
A scroll-driven visual deep dive into the Transformer architecture. From RNNs to self-attention to GPT — understand the engine behind every modern AI model.
Text Similarity — From Jaccard to Neural Matching
A scroll-driven visual deep dive into text similarity. Learn how search engines detect duplicates, match queries to documents, and measure how 'close' two texts really are — from set overlap to cosine similarity to learned embeddings.