
Working With Dot Product And Cosine Similarity For Unit Vectors

I was reviewing the OpenAI docs for text embeddings today and came across this section:

Which distance function should I use?

We recommend cosine similarity. The choice of distance function typically doesn’t matter much.

OpenAI embeddings are normalized to length 1, which means that:

  • Cosine similarity can be computed slightly faster using just a dot product
  • Cosine similarity and Euclidean distance will result in the identical rankings

Hmm, it's been a while since I studied linear algebra, so I wanted to prove this out a bit.

Why Dot Product Equals Cosine Similarity for Unit Vectors

When two vectors are unit vectors, their dot product is the same as their cosine similarity. Here are some notes on why this is the case:

Magnitude of Vector

The magnitude (or norm) of a vector measures the "length" of the vector in Euclidean space. For a vector $\mathbf{v} = (v_1, v_2, \dots, v_n)$, the Euclidean norm is calculated as:

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$$

Example Calculation

# vector v = [3, 4]
v = [3, 4]

squares = [x ** 2 for x in v]
# [9, 16]

sum_of_squares = sum(squares)
# 25

magnitude = sum_of_squares ** 0.5
# 5

Unit Vectors

To transform text embeddings into unit vectors, OpenAI would've needed to:

  1. Adjust the magnitude of a vector to be exactly 1,
  2. Preserve the direction of the vector in the vector space.

To achieve this, they would've:

  1. Calculated the magnitude of the vector as described above.
  2. Divided each element of the vector by the magnitude:

$$\mathbf{u} = \left( \frac{v_1}{\|\mathbf{v}\|}, \frac{v_2}{\|\mathbf{v}\|}, \dots, \frac{v_n}{\|\mathbf{v}\|} \right)$$

  3. The resulting vector $\mathbf{u}$ would then be a unit vector.

Example Calculation

# magnitude of v is 5, as calculated above
u = [x / magnitude for x in v]
# [0.6, 0.8]

# verify: the magnitude of u is now exactly 1
sum([x ** 2 for x in u]) ** 0.5
# 1.0
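
As a quick cross-check, NumPy's np.linalg.norm computes the same Euclidean norm (a minimal sketch, assuming NumPy is installed):

import numpy as np

v = np.array([3.0, 4.0])
u = v / np.linalg.norm(v)
# array([0.6, 0.8])

np.linalg.norm(u)
# 1.0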

We can also verify visually that this approach produces a vector of length 1 that points in the same direction as the original.

[Figure: unit vector visualization of v = [3, 4] and its normalized counterpart u = [0.6, 0.8]]

Dot Product

The dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)$$

where $\theta$ is the angle between the vectors, and $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are their magnitudes.
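
In coordinate form, the same dot product works out to the sum of the element-wise products:

$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$

Here's a small sketch of that, using the unit vector from above plus a second made-up unit vector:

# a is the unit vector u from above; b is a hypothetical second unit vector
a = [0.6, 0.8]
b = [0.8, 0.6]

dot_product = sum(x * y for x, y in zip(a, b))
# 0.96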

Cosine Similarity

Cosine similarity is specifically the cosine of the angle $\theta$ between two vectors, which is calculated as:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$$
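
To make that concrete, here's a sketch applying the full formula to two raw (non-unit) vectors; the second vector [4, 3] is made up for illustration, and normalizing it gives the [0.8, 0.6] used above:

a = [3, 4]
b = [4, 3]

dot_product = sum(x * y for x, y in zip(a, b))
# 24

mag_a = sum(x ** 2 for x in a) ** 0.5  # 5.0
mag_b = sum(x ** 2 for x in b) ** 0.5  # 5.0

cosine_similarity = dot_product / (mag_a * mag_b)
# 0.96

Note that this matches the 0.96 we got from the plain dot product of the unit versions.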

Simplifying for Unit Vectors

For unit vectors, the magnitude of each vector is 1. Therefore, the formulas simplify:

$$\|\mathbf{a}\| = 1, \quad \|\mathbf{b}\| = 1$$

This makes the dot product:

$$\mathbf{a} \cdot \mathbf{b} = 1 \cdot 1 \cdot \cos(\theta) = \cos(\theta)$$

And the cosine similarity formula simplifies to:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{1 \cdot 1} = \mathbf{a} \cdot \mathbf{b}$$

Thus, when the vectors are unit vectors, the dot product is exactly the cosine of the angle between them, which is the same as their cosine similarity.
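
As an end-to-end sanity check (a sketch reusing the example vectors from above), we can normalize two raw vectors and confirm that the plain dot product of the unit versions equals the cosine similarity of the originals:

def normalize(v):
    magnitude = sum(x ** 2 for x in v) ** 0.5
    return [x / magnitude for x in v]

a, b = [3, 4], [4, 3]
u_a, u_b = normalize(a), normalize(b)

dot_of_units = sum(x * y for x, y in zip(u_a, u_b))
# 0.96 -- same as the cosine similarity of the raw vectors

This also explains the Euclidean distance bullet from the OpenAI docs: for unit vectors, $\|\mathbf{a} - \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a} \cdot \mathbf{b} = 2 - 2\cos(\theta)$, so Euclidean distance is a monotonically decreasing function of cosine similarity, and ranking by either gives identical orderings.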

Helpful Resources