
What Is a Tensor? A Beginner's Guide with Real Examples
Tensors explained from scratch — no math degree required. Learn what tensors are, why PyTorch uses them, and how to work with them confidently.
If you've ever opened a PyTorch tutorial and immediately hit the word tensor, you're not alone. It sounds intimidating — like something from a physics textbook. But here's the truth: a tensor is just a container for numbers, organised in a grid. That's it. Once that clicks, everything else in deep learning becomes a lot less scary.
Who is this post for?
Anyone just starting out with PyTorch or deep learning. No linear algebra course required — if you can read a Python list, you can follow along.
Before we define a tensor, let's look at things you already know — because a tensor is just a generalisation of all of them.
A scalar is just one number. No grid, no list. Things like your age, the temperature outside, or the price of a coffee. Examples: 42, 3.14, -7.
A vector is a row (or column) of numbers. Think of a week's worth of temperatures: [22, 24, 19, 17, 25, 28, 23]. Or the three RGB colour values of a pixel: [255, 128, 0].
A matrix is a 2-D grid of numbers — rows and columns. A spreadsheet is a matrix. A grayscale photo is a matrix (each cell holds the brightness of one pixel).
A tensor is the general term for any of the above, plus the idea that you can keep stacking dimensions. A colour photo is three matrices stacked (one for Red, one for Green, one for Blue). A batch of 32 colour photos is 32 of those stacked. That's a tensor.
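That "stacking" idea is easy to see in code. Here's a sketch that builds a 3-D tensor from three 2-D matrices with torch.stack — the tiny 2×2 "channels" are made up purely for illustration:

```python
import torch

# Three 2x2 matrices standing in for the R, G, B channels of a tiny image
red = torch.tensor([[255, 0], [0, 255]])
green = torch.tensor([[0, 255], [255, 0]])
blue = torch.tensor([[128, 128], [128, 128]])

# Stacking three (2, 2) matrices along a new first dimension gives a (3, 2, 2) tensor
photo = torch.stack([red, green, blue])
print(photo.shape)  # torch.Size([3, 2, 2])
```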
The one-line definition
A tensor is an N-dimensional grid of numbers — a scalar is a 0-D tensor, a vector is 1-D, a matrix is 2-D, and the pattern continues upward.
Every tensor has a shape — a tuple that tells you how many elements exist along each dimension. Learning to read shapes fluently is the single most useful skill when debugging deep learning code.
import torch
# 0-D tensor (scalar) — shape is empty ()
scalar = torch.tensor(42)
print(scalar.shape) # torch.Size([])
print(scalar.ndim) # 0
# 1-D tensor (vector) — shape has one number
vector = torch.tensor([10, 20, 30, 40])
print(vector.shape) # torch.Size([4]) ← 4 elements
print(vector.ndim) # 1
# 2-D tensor (matrix) — shape has two numbers: rows x cols
matrix = torch.tensor([[1, 2, 3],
[4, 5, 6]])
print(matrix.shape) # torch.Size([2, 3]) ← 2 rows, 3 columns
print(matrix.ndim) # 2
# 3-D tensor — shape has three numbers
cube = torch.zeros(3, 4, 5) # 3 "layers", each 4x5
print(cube.shape) # torch.Size([3, 4, 5])
print(cube.ndim) # 3
How to read a shape out loud
Abstract shapes become much easier to understand when you map them to something concrete. Here's how tensors appear in real machine learning tasks:
| Shape | What it represents |
|---|---|
| torch.Size([]) | A single loss value: 0.342 |
| torch.Size([10]) | Probability scores for 10 classes (e.g. digits 0–9) |
| torch.Size([784]) | A flattened 28×28 MNIST image |
| torch.Size([28, 28]) | One grayscale image (28 pixels × 28 pixels) |
| torch.Size([3, 64, 64]) | One colour image (RGB × height × width) |
| torch.Size([32, 3, 64, 64]) | A batch of 32 colour images |
| torch.Size([100, 512]) | 100 sentences, each embedded as a 512-d vector |
| torch.Size([8, 12, 128, 64]) | Per-head activations in a transformer (batch × heads × seq_len × head_dim) |
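One habit that makes these shapes stick: unpack the shape into named variables and literally say it out loud. A small sketch using the batch-of-images shape from the table:

```python
import torch

batch = torch.zeros(32, 3, 64, 64)

# Read the shape left to right, one word per dimension
n, c, h, w = batch.shape
print(f"{n} images, {c} channels each, {h} pixels tall, {w} pixels wide")
# → 32 images, 3 channels each, 64 pixels tall, 64 pixels wide
```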
There are several ways to create tensors. The right choice depends on whether you already have data or just need a tensor of a certain size to start with.
import torch
# ── From existing Python data ──────────────────────────────────
t1 = torch.tensor([1.0, 2.0, 3.0]) # from a list
print(t1) # tensor([1., 2., 3.])
t2 = torch.tensor([[1, 2], [3, 4]]) # from a nested list → 2-D
print(t2.shape) # torch.Size([2, 2])
# ── Tensors filled with a constant ────────────────────────────
zeros = torch.zeros(3, 4) # all 0s, shape (3, 4)
ones = torch.ones(2, 5) # all 1s, shape (2, 5)
full = torch.full((3, 3), 7) # all 7s, shape (3, 3)
# ── Tensors filled with random numbers ────────────────────────
rand_uniform = torch.rand(2, 3) # uniform [0, 1)
rand_normal = torch.randn(2, 3) # standard normal (mean=0, std=1)
# ── Useful sequences ──────────────────────────────────────────
range_t = torch.arange(0, 10, 2) # [0, 2, 4, 6, 8]
linspace = torch.linspace(0, 1, 5) # [0.0, 0.25, 0.5, 0.75, 1.0]
# ── Same shape as another tensor ──────────────────────────────
existing = torch.randn(3, 4)
like_it = torch.zeros_like(existing) # shape (3,4), all zeros
print(like_it.shape) # torch.Size([3, 4])
All elements in a tensor must be the same data type (dtype). The most common ones you'll encounter are:
| dtype | PyTorch name | When to use |
|---|---|---|
| 32-bit float | torch.float32 (default) | Almost everything — weights, activations, loss |
| 64-bit float | torch.float64 | High-precision scientific computation |
| 16-bit float | torch.float16 | Memory-efficient training on GPUs (mixed precision) |
| 32-bit int | torch.int32 | Counts, indices when 64-bit is overkill |
| 64-bit int | torch.int64 | Class labels, token IDs — the default integer type |
| Boolean | torch.bool | Masks (e.g. which positions to ignore in attention) |
import torch
# Default dtype for floats is float32
f = torch.tensor([1.0, 2.0])
print(f.dtype) # torch.float32
# Default dtype for ints is int64
i = torch.tensor([1, 2, 3])
print(i.dtype) # torch.int64
# Specify explicitly
half = torch.tensor([1.0, 2.0], dtype=torch.float16)
print(half.dtype) # torch.float16
# Cast an existing tensor to a different dtype
float_version = i.float() # int64 → float32
long_version = f.long() # float32 → int64
print(float_version.dtype) # torch.float32
# Mixed dtypes are promoted automatically in most operations
a = torch.tensor([1.0]) # float32
b = torch.tensor([2], dtype=torch.int64)
print((a + b).dtype) # torch.float32 — the int64 value is promoted
# But promotion has limits: an in-place op that would downcast raises an error
# b += a ← RuntimeError (a float result can't be stored in an int64 tensor)
a + b.float() # explicit cast — the unambiguous, always-safe option
Tensors support all the arithmetic you'd expect. Most operations work element-wise — meaning they're applied to each number independently, in the same position.
import torch
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
# ── Element-wise arithmetic ────────────────────────────────────
print(a + b) # tensor([5., 7., 9.])
print(a - b) # tensor([-3., -3., -3.])
print(a * b) # tensor([ 4., 10., 18.])
print(a / b) # tensor([0.25, 0.40, 0.50])
print(a ** 2) # tensor([1., 4., 9.])
# ── Reduction operations ───────────────────────────────────────
print(a.sum()) # tensor(6.) — sum of all elements
print(a.mean()) # tensor(2.) — average
print(a.max()) # tensor(3.) — maximum value
print(a.min()) # tensor(1.) — minimum value
# ── Matrix multiplication (the workhorse of deep learning) ─────
A = torch.tensor([[1.0, 2.0], [3.0, 4.0]]) # shape (2, 2)
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]]) # shape (2, 2)
print(A @ B) # matrix multiply, shape (2, 2)
print(torch.matmul(A, B)) # same thing, different syntax
Why is matrix multiplication so important?
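The short answer: a fully connected layer is one matrix multiply between inputs and weights (plus a bias), so nearly everything a network computes flows through matmuls. A minimal sketch — the sizes and names here are illustrative, not from any real model:

```python
import torch

torch.manual_seed(0)

# A batch of 4 input vectors, each with 8 features
x = torch.randn(4, 8)

# A "linear layer" is just a weight matrix and a bias vector
W = torch.randn(8, 16)  # maps 8 features → 16 features
b = torch.randn(16)

# (4, 8) @ (8, 16) → (4, 16): one matmul transforms the whole batch at once
out = x @ W + b
print(out.shape)  # torch.Size([4, 16])
```

This is exactly what the worked example at the end of the post does with image data.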
What happens when you try to add a shape (3,) tensor to a shape (2, 3) tensor? PyTorch uses a rule called broadcasting to "stretch" the smaller tensor to match the larger one — without actually copying data. It sounds confusing but follows a simple rule: dimensions are aligned from the right, and any dimension of size 1 (or missing) gets repeated to match.
import torch
# Add a vector to every row of a matrix
matrix = torch.tensor([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]]) # shape (2, 3)
vector = torch.tensor([10.0, 20.0, 30.0]) # shape (3,)
# PyTorch mentally expands vector to shape (2, 3):
# [[10, 20, 30],
# [10, 20, 30]]
result = matrix + vector
print(result)
# tensor([[11., 22., 33.],
# [14., 25., 36.]])
# Another example: add a column vector to a matrix
col = torch.tensor([[100.0], [200.0]]) # shape (2, 1)
result2 = matrix + col
print(result2)
# tensor([[101., 102., 103.],
# [204., 205., 206.]])
Broadcasting can silently do the wrong thing
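Here's the classic trap: subtract a (3,) vector from a (3, 1) column and, instead of an error, broadcasting quietly expands both to (3, 3). The variable names are made up for the sketch:

```python
import torch

preds = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1) — a column
targets = torch.tensor([1.0, 2.0, 3.0])      # shape (3,)

# You probably wanted three per-element differences...
diff = preds - targets
# ...but broadcasting expands (3, 1) and (3,) to (3, 3) — no error, wrong shape
print(diff.shape)  # torch.Size([3, 3])

# The fix: make the shapes match explicitly before the operation
diff_ok = preds.squeeze(1) - targets
print(diff_ok.shape)  # torch.Size([3])
```

When a loss suddenly stops going down, checking shapes for an accidental broadcast like this is a good first debugging step.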
The data in a tensor is stored as a flat list of numbers in memory. The shape is just a description of how to interpret that flat list as an N-dimensional grid. Reshaping changes the interpretation without moving any data — it's essentially free.
import torch
t = torch.arange(12) # [0, 1, 2, ..., 11] shape: (12,)
print(t.shape) # torch.Size([12])
# Reshape to a 3×4 matrix
m = t.reshape(3, 4)
print(m.shape) # torch.Size([3, 4])
print(m)
# tensor([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
# Use -1 to let PyTorch calculate one dimension automatically
m2 = t.reshape(2, -1) # -1 means "figure it out" → (2, 6)
print(m2.shape) # torch.Size([2, 6])
m3 = t.reshape(-1, 4) # → (3, 4)
print(m3.shape) # torch.Size([3, 4])
# Flatten back to 1-D
flat = m.flatten()
print(flat.shape) # torch.Size([12])
# Adding/removing dimensions of size 1 (useful for batch dims)
v = torch.tensor([1.0, 2.0, 3.0]) # shape (3,)
print(v.unsqueeze(0).shape) # torch.Size([1, 3]) — add row dim
print(v.unsqueeze(1).shape) # torch.Size([3, 1]) — add col dim
batched = v.unsqueeze(0) # shape (1, 3)
print(batched.squeeze(0).shape) # torch.Size([3]) — remove it
Indexing a tensor works just like indexing a NumPy array or a nested Python list. You can grab a single element, a row, a column, or any sub-region you like.
import torch
t = torch.tensor([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
# Single element (returns a scalar tensor)
print(t[1, 2]) # tensor(60)
# Entire row
print(t[0]) # tensor([10, 20, 30])
# Entire column
print(t[:, 1]) # tensor([20, 50, 80])
# Sub-matrix (rows 0-1, all columns)
print(t[:2, :]) # tensor([[10, 20, 30], [40, 50, 60]])
# Boolean (mask) indexing — grab elements greater than 40
mask = t > 40
print(mask)
# tensor([[False, False, False],
# [False, True, True],
# [ True, True, True]])
print(t[mask]) # tensor([50, 60, 70, 80, 90])
# Get the integer value out of a single-element tensor
print(t[0, 0].item()) # 10 (plain Python int)
One of the biggest reasons to use tensors instead of plain NumPy arrays is that tensors can live on a GPU, where thousands of cores can process them in parallel. Moving a tensor to the GPU is a single line.
import torch
# Check if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# Create a tensor on the CPU (default)
cpu_tensor = torch.randn(3, 3)
print(cpu_tensor.device) # cpu
# Move it to the GPU
gpu_tensor = cpu_tensor.to(device)
print(gpu_tensor.device) # cuda:0 (if GPU available)
# Or create directly on the GPU
gpu_tensor2 = torch.randn(3, 3, device=device)
# ⚠️ You cannot mix CPU and GPU tensors in an operation!
# cpu_tensor + gpu_tensor ← RuntimeError!
# Move back to CPU (e.g. for numpy conversion or plotting)
back_on_cpu = gpu_tensor.cpu()
print(back_on_cpu.device) # cpu
Apple Silicon (M1/M2/M3)?
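Apple Silicon Macs have no CUDA, but PyTorch 1.12+ exposes the Mac's GPU through the "mps" (Metal Performance Shaders) backend, so the device check becomes three-way. A sketch, assuming a recent PyTorch build:

```python
import torch

# Prefer CUDA, then Apple's Metal backend (MPS), then fall back to CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

t = torch.randn(3, 3, device=device)
print(t.device)
```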
If you come from a data science background you've probably used NumPy arrays. PyTorch tensors and NumPy arrays are closely related — they can share the same underlying memory block, so converting between them is essentially free (as long as the tensor is on the CPU).
import torch
import numpy as np
# Tensor → NumPy
t = torch.tensor([1.0, 2.0, 3.0])
arr = t.numpy() # shares memory!
print(arr) # [1. 2. 3.]
print(type(arr)) # <class 'numpy.ndarray'>
# They share memory — changing one changes the other!
t[0] = 99
print(arr) # [99. 2. 3.] ← changed!
# NumPy → Tensor
arr2 = np.array([4.0, 5.0, 6.0])
t2 = torch.from_numpy(arr2) # also shares memory
print(t2) # tensor([4., 5., 6.], dtype=torch.float64)
# If you DON'T want shared memory, make a copy
t3 = torch.tensor(arr2) # copies the data — no memory sharing
| Operation | Code |
|---|---|
| Tensor → NumPy (shared memory) | arr = tensor.numpy() |
| Tensor → NumPy (works with grad/GPU) | arr = tensor.detach().cpu().numpy() |
| NumPy → Tensor (shared memory) | t = torch.from_numpy(arr) |
| NumPy → Tensor (safe copy) | t = torch.tensor(arr) |
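A quick check of the shared-vs-copied distinction from the table — mutate the NumPy array and see which tensor follows:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy

arr[0] = 99.0
print(shared[0])  # tensor(99., dtype=torch.float64) — follows the change
print(copied[0])  # tensor(1., dtype=torch.float64)  — unaffected
```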
Here's a worked example that ties everything together. We'll represent a tiny batch of two colour images, poke around its shape, do some operations, and see how it would flow into a neural network:
import torch
torch.manual_seed(0)
# Simulate a batch of 2 colour images, 4×4 pixels
# Shape: (batch, channels, height, width) — the standard NCHW format
images = torch.rand(2, 3, 4, 4)
print(f"Batch shape : {images.shape}") # torch.Size([2, 3, 4, 4])
# Grab the first image
first_image = images[0]
print(f"One image : {first_image.shape}") # torch.Size([3, 4, 4])
# Grab the red channel of the first image
red_channel = images[0, 0]
print(f"Red channel : {red_channel.shape}") # torch.Size([4, 4])
# Global average across height and width (simple pooling)
pooled = images.mean(dim=[2, 3])
print(f"After pool : {pooled.shape}") # torch.Size([2, 3])
# Now we have one 3-value summary per image per channel
# Flatten each image to a vector (ready for a Linear layer)
flat = images.reshape(2, -1)
print(f"Flattened : {flat.shape}") # torch.Size([2, 48])
# 3 channels * 4 * 4 = 48 values per image
# Simulated linear layer: (48 inputs → 10 outputs)
W = torch.randn(48, 10)
logits = flat @ W
print(f"Logits : {logits.shape}") # torch.Size([2, 10])
# 2 images, each with 10 raw class scores
Finally, here's a quick-reference cheat sheet of everything covered in this post:
import torch
# ─── Creating ──────────────────────────────────────────────────
torch.tensor([1, 2, 3]) # from data
torch.zeros(3, 4) # all zeros
torch.ones(3, 4) # all ones
torch.rand(3, 4) # uniform [0,1)
torch.randn(3, 4) # normal distribution
torch.arange(0, 10, 2) # [0, 2, 4, 6, 8]
torch.linspace(0, 1, 5) # evenly spaced
torch.zeros_like(other) # zeros, same shape as other
# ─── Inspecting ────────────────────────────────────────────────
t.shape # e.g. torch.Size([2, 3])
t.ndim # number of dimensions
t.dtype # e.g. torch.float32
t.device # cpu or cuda:0
t.numel() # total number of elements
# ─── Reshaping ─────────────────────────────────────────────────
t.reshape(2, -1) # reshape, -1 = inferred
t.flatten() # collapse to 1-D
t.unsqueeze(0) # add a dimension at position 0
t.squeeze(0) # remove dimension of size 1
t.permute(2, 0, 1) # reorder axes
t.transpose(0, 1) # swap two axes
# ─── Operations ────────────────────────────────────────────────
a + b ; a - b ; a * b ; a / b # element-wise
a @ b # matrix multiplication
a.sum() ; a.mean() ; a.max() # reductions
a.sum(dim=0) # reduce along a specific axis
# ─── Moving around ─────────────────────────────────────────────
t.to('cuda') # to GPU
t.to('cpu') # to CPU
t.numpy() # to NumPy (CPU only)
t.item() # to Python scalar (single-element tensor)
t.detach() # detach from autograd graph
A tensor is nothing more than an N-dimensional grid of numbers. Scalars, vectors, and matrices are all just tensors with fewer dimensions. Once you're comfortable reading shapes and thinking about dimensions, you'll find that most PyTorch code is just shuffling tensors into the right shape and multiplying them together. Every image, every word, every prediction, every loss value — it's all tensors all the way down.
Your next step
Now that shapes and tensor operations feel familiar, the natural continuation is autograd — how PyTorch computes gradients through these same tensor operations. The related articles below pick up from there.
Related Articles
PyTorch Autograd: Automatic Differentiation from the Ground Up
A complete, beginner-friendly guide to PyTorch's autograd engine — from what a gradient is to building a neural network by hand.
Logistic Regression from Scratch in PyTorch: Every Line Explained
Build a multi-class classifier in PyTorch without nn.Linear, without optim.SGD, without CrossEntropyLoss. Just [tensors](/blog/what-is-a-tensor), [autograd](/blog/pytorch-autograd-deep-dive), and arithmetic — so you finally see what those helpers actually do.
From Words to Intelligence: Building an MLP Classifier on Pretrained Sentence Embeddings
A deep dive into pretrained sentence embeddings, MLP architecture, BatchNorm, Dropout, Adam, and early stopping — with full PyTorch implementation.