- Working With Dot Product And Cosine Similarity For Unit Vectors
- Why Dot Product Equals Cosine Similarity for Unit Vectors
- Magnitude of Vector
- Unit Vectors
- Dot Product
- Cosine Similarity
- Unit Vectors
- Helpful Resources

I was reviewing the OpenAI docs for text embeddings today and came across this section:

> **Which distance function should I use?**
>
> We recommend cosine similarity. The choice of distance function typically doesn’t matter much.
>
> OpenAI embeddings are normalized to length 1, which means that:
>
> - Cosine similarity can be computed slightly faster using just a dot product
> - Cosine similarity and Euclidean distance will result in the identical rankings
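The rankings claim follows from an identity: for unit vectors, the squared Euclidean distance is $\|\mathbf{a} - \mathbf{b}\|^2 = 2 - 2(\mathbf{a} \cdot \mathbf{b})$, a monotonically decreasing function of the dot product, so sorting by either metric gives the same order. A quick sketch to check this (the example vectors are made up):

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    m = norm(v)
    return [x / m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([1.0, 2.0, 3.0])
docs = [normalize(v) for v in ([3.0, 1.0, 2.0], [1.0, 2.0, 2.9], [-1.0, 0.0, 1.0])]

# Rank by cosine similarity (higher = closer) and by Euclidean distance
# (lower = closer). For unit vectors, ||a - b||^2 = 2 - 2 * dot(a, b),
# so the two orderings are identical.
by_cosine = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
by_distance = sorted(range(len(docs)), key=lambda i: euclidean(query, docs[i]))
assert by_cosine == by_distance
```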

Hmm, it's been a while since I studied linear algebra, so I wanted to prove this out a bit.

When two vectors are unit vectors, their dot product is the same as their cosine similarity. Here are some notes on why this is the case:

The magnitude (or norm) of a vector measures the "length" of the vector in Euclidean space. For vector $\mathbf{v} = (v_1, v_2, \dots, v_n)$, the Euclidean norm is calculated as:

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$

**Example Calculation**

```
# vector v = [3, 4]
v = [3, 4]
squares = [x ** 2 for x in v]
# [9, 16]
sum_of_squares = sum(squares)
# 25
magnitude = sum_of_squares ** 0.5
# 5
```

To transform text embeddings into unit vectors, OpenAI would've needed to:

- Adjust the magnitude of a vector to be exactly 1,
- Preserve the direction of the vector in the vector space.

To achieve this, they would've:

- Calculated the magnitude of the vector as described above.
- Divided each element of the vector by the magnitude:

$\mathbf{u} = \left( \frac{v_1}{\|\mathbf{v}\|}, \frac{v_2}{\|\mathbf{v}\|}, \dots, \frac{v_n}{\|\mathbf{v}\|} \right)$

- The resulting vector $\mathbf{u}$ would then be a unit vector.

**Example Calculation**

```
# reuses v = [3, 4] and magnitude = 5 from the previous snippet
u = [x / magnitude for x in v]
# [0.6, 0.8]
sum([x ** 2 for x in u]) ** 0.5
# 1.0
```

We can also verify visually that this approach rescales the vector to length 1 while preserving its direction.

The dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:

$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)$

where $\theta$ is the angle between the vectors and $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are their magnitudes.
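To sanity-check that the geometric form agrees with the familiar componentwise sum, here's a small sketch using vectors whose angle is known to be 45°:

```python
import math

a = [1.0, 0.0]
b = [1.0, 1.0]

# Componentwise definition: sum of elementwise products
dot = sum(x * y for x, y in zip(a, b))

# Geometric definition: ||a|| * ||b|| * cos(theta); the angle
# between [1, 0] and [1, 1] is 45 degrees
norm_a = math.sqrt(sum(x * x for x in a))  # 1.0
norm_b = math.sqrt(sum(x * x for x in b))  # sqrt(2)
geometric = norm_a * norm_b * math.cos(math.radians(45))

assert math.isclose(dot, geometric)
```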

Cosine similarity is specifically the cosine of the angle $\theta$ between two vectors, which is calculated as:

$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$
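This formula works for vectors of any length, not just unit vectors. A minimal sketch (the helper name is my own):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([3, 4], [4, 3])
# 0.96
```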

For unit vectors, the magnitude of each vector is 1. Therefore, the formulas simplify:

$\|\mathbf{a}\| = 1, \quad \|\mathbf{b}\| = 1$

This makes the dot product:

$\mathbf{a} \cdot \mathbf{b} = 1 \cdot 1 \cdot \cos(\theta) = \cos(\theta)$

And the cosine similarity formula simplifies to:

$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{1 \cdot 1} = \mathbf{a} \cdot \mathbf{b}$

Thus, when the vectors are unit vectors, the dot product is exactly the cosine of the angle between them, which is the same as their cosine similarity.
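Putting it all together, here's a quick check that normalizing first and then taking a plain dot product gives the same number as the full cosine similarity formula (the example vectors are arbitrary):

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    m = norm(v)
    return [x / m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [1.0, 2.0]

# Full formula on the raw vectors
cos_sim = dot(a, b) / (norm(a) * norm(b))

# Plain dot product on the normalized (unit) vectors
unit_dot = dot(normalize(a), normalize(b))

assert math.isclose(cos_sim, unit_dot)
```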