- Working With Dot Product And Cosine Similarity For Unit Vectors
- Why Dot Product Equals Cosine Similarity for Unit Vectors
- Magnitude of Vector
- Unit Vectors
- Dot Product
- Cosine Similarity
- Unit Vectors
- Helpful Resources
Working With Dot Product And Cosine Similarity For Unit Vectors
I was reviewing the OpenAI docs for text embeddings today and came across this section:
Which distance function should I use?
We recommend cosine similarity. The choice of distance function typically doesn't matter much.
OpenAI embeddings are normalized to length 1, which means that:
- Cosine similarity can be computed slightly faster using just a dot product
- Cosine similarity and Euclidean distance will result in the identical rankings
Hmm, it's been a while since I studied linear algebra, so I wanted to prove this out a bit.
Why Dot Product Equals Cosine Similarity for Unit Vectors
When two vectors are unit vectors, their dot product is the same as their cosine similarity. Here are some notes on why this is the case:
Magnitude of Vector
The magnitude (or norm) of a vector measures the "length" of the vector in Euclidean space. For a vector $\mathbf{v} = [v_1, v_2, \dots, v_n]$, the Euclidean norm is calculated as:
$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$$
Example Calculation
# vector v = [3, 4]
v = [3, 4]
# square each component
squares = [x ** 2 for x in v]
# [9, 16]
sum_of_squares = sum(squares)
# 25
# the square root of the sum is the magnitude
magnitude = sum_of_squares ** 0.5
# 5.0
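As an aside, the standard library can do the same calculation in one call. This is just a small sketch, assuming Python 3.8 or later, where `math.hypot` accepts any number of coordinates:

```python
import math

v = [3, 4]

# math.hypot returns the Euclidean norm of its arguments,
# here sqrt(3**2 + 4**2).
magnitude = math.hypot(*v)
print(magnitude)  # 5.0
```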
Unit Vectors
To transform text embeddings into unit vectors, OpenAI would've needed to:
- Adjust the magnitude of a vector to be exactly 1,
- Preserve the direction of the vector in the vector space.
To achieve this, they would've:
- Calculated the magnitude of the vector as described above.
- Divided each element of the vector by the magnitude: $\mathbf{u} = \mathbf{v} / \|\mathbf{v}\|$
- The resulting vector would then be a unit vector.
Example Calculation
# reuses v = [3, 4] and magnitude = ||v|| = 5.0 from the example above
u = [x / magnitude for x in v]
# [0.6, 0.8]
# check: the magnitude of u should be 1
sum(x ** 2 for x in u) ** 0.5
# 1.0
We can also verify this visually: with this approach, we get a shorter vector that points in the same direction.
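The two steps above can be wrapped in a small helper. This is a minimal sketch of my own, not anything from the OpenAI docs: the name `normalize` and the zero-vector check are assumptions I added for illustration.

```python
import math

def normalize(v):
    """Return a unit vector pointing in the same direction as v."""
    magnitude = math.sqrt(sum(x ** 2 for x in v))
    if magnitude == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [x / magnitude for x in v]

u = normalize([3, 4])
print(u)  # [0.6, 0.8]
```

The zero-vector check matters because dividing by a magnitude of 0 would otherwise raise a less helpful `ZeroDivisionError`.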
Dot Product
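The dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:
$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta$$
where $\theta$ is the angle between the vectors, and $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are the magnitudes of the vectors.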
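In code, the dot product is usually computed component by component, as the sum of the products of matching elements, which is equivalent to the geometric definition above. Here is a rough sketch that checks the two agree for a pair of vectors I picked because the angle between them is known to be 45 degrees (the helper names `dot` and `magnitude` are my own):

```python
import math

def dot(a, b):
    """Component-wise dot product: the sum of a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    """Euclidean norm, using the fact that v . v = ||v||^2."""
    return math.sqrt(dot(v, v))

a = [1, 0]
b = [1, 1]  # 45 degrees away from a

algebraic = dot(a, b)  # 1*1 + 0*1
geometric = magnitude(a) * magnitude(b) * math.cos(math.radians(45))

print(algebraic)  # 1
# Compare with a tolerance, since the geometric form uses floating point.
print(math.isclose(algebraic, geometric))  # True
```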
Cosine Similarity
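Cosine similarity is specifically the cosine of the angle $\theta$ between two vectors $\mathbf{a}$ and $\mathbf{b}$, which is calculated as:
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$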
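To make the formula concrete, here is a minimal pure-Python sketch. The function name `cosine_similarity` is my own, and it assumes neither input is the zero vector:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([3, 4], [6, 8]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (perpendicular)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite direction)
```

Note that `[3, 4]` and `[6, 8]` score 1.0 even though their lengths differ: cosine similarity only looks at direction.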
Unit Vectors
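For unit vectors $\mathbf{a}$ and $\mathbf{b}$, the magnitude of each vector is 1. Therefore, the formulas simplify:
$$\|\mathbf{a}\| = \|\mathbf{b}\| = 1$$
This makes the dot product:
$$\mathbf{a} \cdot \mathbf{b} = 1 \cdot 1 \cdot \cos\theta = \cos\theta$$
And the cosine similarity formula simplifies to:
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{1 \cdot 1} = \mathbf{a} \cdot \mathbf{b}$$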
Thus, when the vectors are unit vectors, the dot product is exactly the cosine of the angle between them, which is the same as their cosine similarity.
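To convince myself numerically, here is a quick check. It's only a sketch: the two vectors are made up, the helper names are my own, and results are compared with a tolerance because of floating point. The last part also pokes at the second bullet from the quoted docs, using the identity that the squared Euclidean distance between unit vectors equals 2 minus twice their dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    m = magnitude(v)
    return [x / m for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (magnitude(a) * magnitude(b))

# Two arbitrary non-unit vectors (made up for this check).
a = [3, 4]
b = [5, 12]
ua, ub = normalize(a), normalize(b)

# For the unit vectors, the plain dot product matches cosine similarity.
print(math.isclose(dot(ua, ub), cosine_similarity(a, b)))  # True

# Squared Euclidean distance between unit vectors is 2 - 2 * (dot product),
# so sorting by ascending distance or by descending similarity gives the
# same order.
squared_distance = sum((x - y) ** 2 for x, y in zip(ua, ub))
print(math.isclose(squared_distance, 2 - 2 * dot(ua, ub)))  # True
```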