Cosine similarity: judging things by the direction they point
You've turned two things into vectors — two arrows pointing out from the origin into space. Now the obvious question: how close are they? You might reach for a ruler and measure the straight-line gap between their tips. Cosine similarity makes a different, often smarter choice: ignore how long the arrows are, and only ask whether they point the same way.
Why direction, not distance?
Picture two documents about dogs. One is a tweet; the other is a 50-page essay. The essay says "dog" 200 times and the tweet says it once, so as raw vectors the essay is a huge arrow and the tweet a tiny one. By straight-line distance they look far apart — yet they're about exactly the same thing. Their arrows point in nearly the same direction; one is simply longer because the document is bigger.
Cosine similarity measures the angle between two vectors and discards their length. Same direction, tiny angle, high similarity — whatever the sizes.
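To make the tweet-versus-essay picture concrete, here is a minimal Python sketch. The three-word vocabulary and the word counts are invented for illustration, and `cosine` is a hypothetical helper rather than a library function.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

# Made-up word counts for the vocabulary ("dog", "walk", "bone").
tweet = [1, 1, 0]
essay = [200, 180, 30]

# Straight-line gap between the arrow tips (math.dist needs Python 3.8+).
euclid = math.dist(tweet, essay)

# Angle-based score: ignores how long each arrow is.
cos = cosine(tweet, essay)

print(f"euclidean distance: {euclid:.1f}")
print(f"cosine similarity:  {cos:.3f}")
```

With these invented counts the straight-line distance comes out in the hundreds, while the cosine sits close to 1: far apart by the ruler, nearly identical by direction.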
The number
It's reported as the cosine of that angle, which lands neatly between −1 and 1:
- 1 — pointing the exact same way (as similar as it gets)
- 0 — at right angles, unrelated
- −1 — pointing in opposite directions
So "how similar are these?" collapses to a single, bounded score you can sort by. Most similar just means largest cosine.
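A small sketch of the three landmark scores, and of sorting by them. The two-dimensional vectors and their labels are assumptions chosen so the angles are obvious; `cosine` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.0]

# Illustrative candidates: lengths differ on purpose, only direction should matter.
candidates = {
    "same_direction": [3.0, 0.0],
    "right_angle": [0.0, 2.0],
    "opposite": [-5.0, 0.0],
}

scores = {name: cosine(query, vec) for name, vec in candidates.items()}

# "Most similar" is just the largest cosine, so sort in descending order.
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:+.1f}")
```

The candidates land at 1, 0 and −1 respectively, and the ranking runs from same direction down to opposite.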
How it's computed
You don't need the geometry to use it, but the recipe is short: multiply the two vectors component by component and add it all up (the dot product), then divide by the product of the two vectors' lengths to cancel out size.
cosine(A, B) = (A · B) / (|A| × |B|)
That division is the whole trick: it strips size out of the dot product and leaves only direction.
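The recipe translates almost line for line into code. This is a sketch in plain Python, not any particular library's implementation; the function name `cosine_similarity` and the choice to raise an error on a zero-length vector (which has no direction, so the score is undefined) are assumptions made here.

```python
import math

def cosine_similarity(a, b):
    """cosine(A, B) = (A . B) / (|A| * |B|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")

    # Step 1: dot product (multiply component by component, add it all up).
    dot = sum(x * y for x, y in zip(a, b))

    # Step 2: length of each vector (square root of the sum of squares).
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))

    # A zero-length vector points nowhere, so there is no angle to measure.
    if len_a == 0 or len_b == 0:
        raise ValueError("cosine similarity is undefined for a zero-length vector")

    # Step 3: divide by the product of the lengths to cancel out size.
    return dot / (len_a * len_b)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice as long
same_direction = cosine_similarity(a, b)
print(round(same_direction, 6))
```

Here `b` is just `a` doubled, so the score is 1 up to floating-point rounding.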
Why it matters
This is the workhorse behind embeddings. When a search engine finds the passage "nearest" your question, "nearest" very often means highest cosine similarity (some systems use a raw dot product or Euclidean distance instead). It's the scoring function under semantic search, recommendations ("more like this"), clustering, and the retrieval step in RAG. Embeddings give you the map; cosine similarity is how you judge closeness on it.
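As a rough sketch of what that retrieval step can look like: score every stored passage against the query and keep the top few. The three-dimensional "embeddings", the passage titles, and the helpers `normalize` and `top_k` are all invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and the sketch assumes no vector is all zeros.

```python
import math

def normalize(v):
    """Scale a non-zero vector to length 1, keeping its direction."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def top_k(query, passages, k=2):
    """Return the k passage titles with the highest cosine similarity to query."""
    q = normalize(query)
    scored = []
    for title, vec in passages.items():
        p = normalize(vec)
        # For unit-length vectors, the dot product equals the cosine.
        score = sum(x * y for x, y in zip(q, p))
        scored.append((score, title))
    scored.sort(reverse=True)  # largest cosine first
    return [title for _, title in scored[:k]]

# Toy, hand-written "embeddings" (hypothetical values).
passages = {
    "How to house-train a puppy": [0.9, 0.1, 0.0],
    "Best dog food brands": [0.7, 0.3, 0.1],
    "Quarterly tax filing deadlines": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]

results = top_k(query_vec, passages, k=2)
print(results)
```

Normalizing first is a common design choice: once every vector has length 1, ranking by plain dot product and ranking by cosine give the same order.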
The catch
Cosine deliberately forgets magnitude — usually a feature, occasionally a bug. If how much of something matters (intensity, a count, a confidence), throwing away length throws away real signal. And like any similarity measure, it inherits whatever the embedding believes: if two ideas are wrongly placed near each other, cosine will faithfully report them as similar. It measures the map, not the territory.
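A tiny sketch of the magnitude blind spot. The "casual" and "superfan" vectors are made-up intensity scores, and `dot` and `cosine` are hypothetical helpers; the point is only that cosine cannot tell the two apart, while the raw dot product can.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

reference = [1.0, 2.0]
casual = [1.0, 2.0]       # mild interest
superfan = [10.0, 20.0]   # same tastes, ten times the intensity

# Cosine sees identical directions and scores both the same.
cos_casual = cosine(reference, casual)
cos_superfan = cosine(reference, superfan)

# The unnormalized dot product keeps the size information.
dot_casual = dot(reference, casual)
dot_superfan = dot(reference, superfan)

print(round(cos_casual, 6), round(cos_superfan, 6))
print(dot_casual, dot_superfan)
```

If intensity is part of what you want to rank by, a score that keeps length, such as the unnormalized dot product, may be the better fit.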
The one line to keep: cosine similarity scores how much two things point the same way, not how big they are.