Example Calculation
# vector v = [3, 4]
v = [3, 4]
squares = [x ** 2 for x in v]
# [9, 16]
sum_of_squares = sum(squares)
# 25
magnitude = sum_of_squares ** 0.5
# 5
To transform text embeddings into unit vectors, OpenAI would have needed to divide each embedding by its magnitude, as in the following example.
Example Calculation
# magnitude ||v|| = 5, as calculated above
u = [x / magnitude for x in v]
# [0.6, 0.8]
sum(x ** 2 for x in u) ** 0.5  # magnitude of u
# 1.0
We can also verify this visually: normalization produces a shorter vector that points in the same direction as the original.
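The "same direction" claim can also be checked numerically (a small sketch reusing the `v` and `u` from the examples above): the cosine of the angle between the original vector and its normalized version is 1, i.e. an angle of 0.

```python
# Original vector and its normalized (unit) version, as above.
v = [3, 4]
magnitude = sum(x ** 2 for x in v) ** 0.5  # 5.0
u = [x / magnitude for x in v]             # [0.6, 0.8]

# Cosine of the angle between v and u.
# ||u|| is 1 by construction, so the denominator is just ||v||.
dot = sum(a * b for a, b in zip(v, u))
cos_theta = dot / magnitude
print(cos_theta)  # ~1.0 (up to floating-point rounding): angle of 0, same direction
```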
The dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta$$

where $\theta$ is the angle between the vectors, and $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are the magnitudes of the vectors.

Cosine similarity is specifically the cosine of the angle between two vectors, which is calculated as:

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

For unit vectors, the magnitude of each vector is 1, so $\|\mathbf{a}\| = \|\mathbf{b}\| = 1$, and the formulas simplify.

This makes the dot product:

$$\mathbf{a} \cdot \mathbf{b} = \cos\theta$$

And the cosine similarity formula simplifies to:

$$\cos\theta = \mathbf{a} \cdot \mathbf{b}$$

Thus, when the vectors are unit vectors, the dot product is exactly the cosine of the angle between them, which is the same as their cosine similarity.
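The derivation above can be checked in a few lines of plain Python (a sketch with two made-up 2-D vectors, not actual embeddings): after normalizing, the bare dot product matches the full cosine-similarity formula.

```python
def magnitude(v):
    # Euclidean norm ||v||
    return sum(x ** 2 for x in v) ** 0.5

def normalize(v):
    # Scale v to a unit vector
    m = magnitude(v)
    return [x / m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (magnitude(a) * magnitude(b))

# Two arbitrary example vectors (assumed for illustration only).
a = [3, 4]
b = [1, 2]
ua, ub = normalize(a), normalize(b)

# For unit vectors, the plain dot product equals the cosine similarity.
print(dot(ua, ub))             # same value as the line below
print(cosine_similarity(a, b))
```

This is why normalized embeddings let you replace the cosine-similarity computation with a cheaper dot product.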