- Understanding PyTorch Transforms - ToTensor() and Normalize()
- Setup
- transforms.ToTensor()
- Illustrative Calculation
- transforms.Normalize()
- Illustrative Calculation
- Why Normalize?
- Resources
Understanding PyTorch Transforms - ToTensor() and Normalize()
Today I was reviewing code for a convolutional neural network and wanted to better understand what transforms.ToTensor() and transforms.Normalize() were actually doing to the image data. These are typically used as follows:
import torchvision
from torchvision import transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     # MNIST images are grayscale (one channel), so one mean and one std
     transforms.Normalize((0.5,), (0.5,))])
The transform can then be applied to a dataset:
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=transform,  # here
                                           download=True)
These transforms are part of the torchvision.transforms module, which provides a variety of common image transformations for preprocessing data.
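To make the chaining idea concrete, here is a minimal sketch of what a Compose-style pipeline does: it calls each step in order and passes the result along. `SimpleCompose`, `scale`, and `center` are hypothetical names invented for this illustration, not torchvision's implementation, and the toy steps act on a single pixel value rather than a whole image.

```python
class SimpleCompose:
    """Hypothetical stand-in for a Compose-style pipeline (not torchvision code)."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, x):
        # Feed the output of each step into the next one.
        for step in self.steps:
            x = step(x)
        return x


def scale(value):
    # Toy version of the [0, 255] -> [0, 1] scaling.
    return value / 255.0


def center(value):
    # Toy version of normalizing with mean 0.5 and std 0.5.
    return (value - 0.5) / 0.5


pipeline = SimpleCompose([scale, center])
white = pipeline(255)  # (255 / 255 - 0.5) / 0.5
black = pipeline(0)    # (0 / 255 - 0.5) / 0.5
print(white, black)
```

The order of the steps matters: centering before scaling would produce very different values.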
Setup
I've saved a test image of the digit 9 from the MNIST dataset.

The code to download it and convert it to a NumPy array with values ranging from 0 to 255:
from io import BytesIO

import numpy as np
import requests
from PIL import Image

# URL of the image
url = 'https://static.andypai.me/mnist9.jpg'
# Download the image and fail early on an HTTP error
response = requests.get(url)
response.raise_for_status()
img = Image.open(BytesIO(response.content))
# Convert the image to grayscale
img = img.convert('L')
# Convert the image to a numpy array
img_array = np.array(img) # 28x28 matrix
Plot to verify:
import matplotlib.pyplot as plt
# Plotting the numpy array
plt.imshow(img_array, cmap='gray')
plt.axis('off')
plt.show()

transforms.ToTensor()
The ToTensor() transform converts a PIL Image or NumPy ndarray (representing an image) into a PyTorch tensor.
Here's what it does under the hood:
- Changes Data Type: Converts the image data from the original data type (e.g., uint8 for PIL Images) to torch.float32.
- Scales Values: Divides uint8 pixel values by 255, mapping the range [0, 255] to [0.0, 1.0].
- Reshapes Dimensions: Rearranges the dimensions from (Height, Width, Channels) to (Channels, Height, Width), which is the standard format for PyTorch tensors.
Illustrative Calculation
Let's consider a 2x2 RGB image represented as a NumPy ndarray:
import numpy as np
image_np = np.array([
[[255, 0, 0], [0, 255, 0]], # Red, Green
[[0, 0, 255], [255, 255, 255]] # Blue, White
], dtype=np.uint8) # Ensure the dtype is uint8
Applying ToTensor() converts the pixel values to the range [0, 1] and changes the dimension order to (C, H, W):
from torchvision import transforms
to_tensor = transforms.ToTensor()
image_tensor = to_tensor(image_np)
print(image_tensor)
Output:
tensor([[[1., 0.],
         [0., 1.]],

        [[0., 1.],
         [0., 1.]],

        [[0., 0.],
         [1., 1.]]])
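The same numbers can be reproduced by hand, which makes the two steps (reorder, then scale) easier to see. This is a sketch in plain NumPy rather than torchvision's own code; the variable names `chw` and `manual` are made up for the illustration.

```python
import numpy as np

# Same 2x2 RGB image as above, stored as (Height, Width, Channels) uint8.
image_np = np.array([
    [[255, 0, 0], [0, 255, 0]],      # Red, Green
    [[0, 0, 255], [255, 255, 255]],  # Blue, White
], dtype=np.uint8)

# Step 1: move channels first, (H, W, C) -> (C, H, W).
chw = image_np.transpose(2, 0, 1)

# Step 2: cast to float32 and scale 0..255 down to 0..1.
manual = chw.astype(np.float32) / 255.0

print(manual.shape)  # (3, 2, 2)
print(manual[0])     # red channel
```

Each slice `manual[c]` is now a full 2x2 plane for one color channel, matching the layout of the tensor printed above.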
transforms.Normalize()
The Normalize() transform normalizes a tensor by subtracting the mean and dividing by the standard deviation for each channel, i.e. output[c] = (input[c] - mean[c]) / std[c]:
# Normalize((mean_R, mean_G, mean_B), (std_R, std_G, std_B))
normalize = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
normalized_image = normalize(image_tensor)
print(normalized_image)
Output:
tensor([[[ 1., -1.],
         [-1.,  1.]],

        [[-1.,  1.],
         [-1.,  1.]],

        [[-1., -1.],
         [ 1.,  1.]]])
Illustrative Calculation
Using the image_tensor from the previous example:
- Subtract the Mean: Subtract the specified mean (0.5 in this case) from each channel:
  - Red Channel: 1.0 - 0.5 = 0.5, 0.0 - 0.5 = -0.5, etc.
  - Green Channel: 0.0 - 0.5 = -0.5, 1.0 - 0.5 = 0.5, etc.
  - Blue Channel: 0.0 - 0.5 = -0.5, 1.0 - 0.5 = 0.5, etc.
- Divide by Standard Deviation: Divide the results by the specified standard deviation (0.5 in this case):
  - Red Channel: 0.5 / 0.5 = 1.0, -0.5 / 0.5 = -1.0, etc.
  - Green Channel: -0.5 / 0.5 = -1.0, 0.5 / 0.5 = 1.0, etc.
  - Blue Channel: -0.5 / 0.5 = -1.0, 0.5 / 0.5 = 1.0, etc.
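The resulting tensor normalized_image contains values centered around 0. Because the inputs lie in [0, 1] and both the mean and the standard deviation are 0.5, the outputs fall exactly in the range [-1, 1].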
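The per-channel arithmetic can also be written as one broadcasted expression. The sketch below uses plain NumPy instead of PyTorch and hard-codes the tensor values from the example; `restored` is an extra step added here only to show that the operation is reversible, which can be handy before plotting.

```python
import numpy as np

# The (C, H, W) values produced by ToTensor() in the example above.
image = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # red
    [[0.0, 1.0], [0.0, 1.0]],  # green
    [[0.0, 0.0], [1.0, 1.0]],  # blue
], dtype=np.float32)

mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Reshape to (C, 1, 1) so each channel's mean/std broadcasts over H and W.
normalized = (image - mean[:, None, None]) / std[:, None, None]

# Undo the normalization: multiply by std, then add the mean back.
restored = normalized * std[:, None, None] + mean[:, None, None]

print(normalized.min(), normalized.max())
```

With a different mean or standard deviation per channel, the same expression applies; only the values in `mean` and `std` change.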
Why Normalize?
Normalization is a crucial preprocessing step for several reasons:
- Improved Numerical Stability: By scaling the features to a similar range, normalization helps prevent numerical stability issues during training, especially for models with many layers.
- Faster Convergence: Normalization can help the optimization algorithm converge faster to a good solution, because the gradients are better behaved.
- Better Generalization: Normalization can improve the model's ability to generalize to new data by reducing the influence of features with large scales.
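To illustrate the "similar range" point, the sketch below builds a synthetic batch whose channels sit on very different scales and standardizes each channel with its own statistics. The data is random and purely hypothetical (it is not MNIST), and the computation is done in NumPy as an assumption-laden illustration of the idea rather than a recipe from torchvision.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical batch of 64 three-channel images, laid out as (N, C, H, W).
batch = rng.random((64, 3, 8, 8))
batch[:, 0] *= 100.0  # channel 0 spans roughly 0..100
batch[:, 2] *= 0.01   # channel 2 spans roughly 0..0.01 (channel 1 stays 0..1)

# Per-channel statistics taken over the batch and both spatial axes.
mean = batch.mean(axis=(0, 2, 3), keepdims=True)
std = batch.std(axis=(0, 2, 3), keepdims=True)

standardized = (batch - mean) / std

print("std before:", batch.std(axis=(0, 2, 3)))
print("std after: ", standardized.std(axis=(0, 2, 3)))
```

After standardizing, every channel has mean close to 0 and standard deviation close to 1, so no single channel dominates purely because of its units.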
Resources
- PyTorch documentation on ToTensor
- PyTorch documentation on Normalize
- Colab Notebook with all the calculations discussed