What Is a Tensor? A Beginner's Guide with Real Examples

Tensors explained from scratch — no math degree required. Learn what tensors are, why PyTorch uses them, and how to work with them confidently.

If you've ever opened a PyTorch tutorial and immediately hit the word tensor, you're not alone. It sounds intimidating — like something from a physics textbook. But here's the truth: a tensor is just a container for numbers, organised in a grid. That's it. Once that clicks, everything else in deep learning becomes a lot less scary.

Who is this post for?

Absolute beginners. If you know what a list of numbers is, you know enough to follow along. We'll build the concept up step by step, then show how it maps to real PyTorch code.

Before we define a tensor, let's look at things you already know — because a tensor is just a generalisation of all of them.

A scalar is just one number. No grid, no list. Things like your age, the temperature outside, or the price of a coffee. Examples: 42, 3.14, -7.

A vector is a row (or column) of numbers. Think of a week's worth of temperatures: [22, 24, 19, 17, 25, 28, 23]. Or the three RGB colour values of a pixel: [255, 128, 0].

A matrix is a 2-D grid of numbers — rows and columns. A spreadsheet is a matrix. A grayscale photo is a matrix (each cell holds the brightness of one pixel).

A tensor is the general term for any of the above, plus the idea that you can keep stacking dimensions. A colour photo is three matrices stacked (one for Red, one for Green, one for Blue). A batch of 32 colour photos is 32 of those stacked. That's a tensor.
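The stacking idea can be seen directly in code. Here's a minimal sketch (channel values chosen arbitrarily for illustration) that builds a fake colour image by stacking three 4×4 matrices, then a batch by stacking images:

```python
import torch

# Three 4x4 "channel" matrices — pretend they hold R, G, B brightness values
red   = torch.zeros(4, 4)
green = torch.ones(4, 4)
blue  = torch.full((4, 4), 0.5)

# Stack along a new first dimension → one colour image, shape (3, 4, 4)
image = torch.stack([red, green, blue])
print(image.shape)   # torch.Size([3, 4, 4])

# Stack two images → a batch, shape (2, 3, 4, 4)
batch = torch.stack([image, image])
print(batch.shape)   # torch.Size([2, 3, 4, 4])
```

Each `torch.stack` call adds exactly one dimension, which is why the shapes grow one number at a time.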

The one-line definition

A tensor is an N-dimensional grid of numbers, all of the same data type. Scalars, vectors, and matrices are all just special cases of tensors with 0, 1, and 2 dimensions respectively.

Every tensor has a shape — a tuple that tells you how many elements exist along each dimension. Learning to read shapes fluently is the single most useful skill when debugging deep learning code.

01_shapes.py
python
import torch

# 0-D tensor (scalar) — shape is empty ()
scalar = torch.tensor(42)
print(scalar.shape)          # torch.Size([])
print(scalar.ndim)           # 0

# 1-D tensor (vector) — shape has one number
vector = torch.tensor([10, 20, 30, 40])
print(vector.shape)          # torch.Size([4])  ← 4 elements
print(vector.ndim)           # 1

# 2-D tensor (matrix) — shape has two numbers: rows x cols
matrix = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
print(matrix.shape)          # torch.Size([2, 3])  ← 2 rows, 3 columns
print(matrix.ndim)           # 2

# 3-D tensor — shape has three numbers
cube = torch.zeros(3, 4, 5)  # 3 "layers", each 4x5
print(cube.shape)             # torch.Size([3, 4, 5])
print(cube.ndim)              # 3

How to read a shape out loud

torch.Size([32, 3, 64, 64]) — read it as: '32 images, each with 3 colour channels, each channel being 64 pixels tall and 64 pixels wide.' Always read left-to-right, outermost dimension first.

Abstract shapes become much easier to understand when you map them to something concrete. Here's how tensors appear in real machine learning tasks:

| Shape | What it represents |
| --- | --- |
| torch.Size([]) | A single loss value: 0.342 |
| torch.Size([10]) | Probability scores for 10 classes (e.g. digits 0–9) |
| torch.Size([784]) | A flattened 28×28 MNIST image |
| torch.Size([28, 28]) | One grayscale image (28 pixels × 28 pixels) |
| torch.Size([3, 64, 64]) | One colour image (RGB × height × width) |
| torch.Size([32, 3, 64, 64]) | A batch of 32 colour images |
| torch.Size([100, 512]) | 100 sentences, each embedded as a 512-d vector |
| torch.Size([8, 12, 128, 64]) | Per-head queries in a transformer (batch × heads × sequence length × head dimension) |
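You can recreate a few rows of this table yourself. A quick sketch using random values in place of real image data:

```python
import torch

flat  = torch.rand(784)        # a flattened MNIST-style image
image = flat.reshape(28, 28)   # the same 784 values as a 2-D grid
print(image.shape)             # torch.Size([28, 28])
print(flat.numel() == image.numel())  # True — reshaping never changes the count

batch = torch.rand(32, 3, 64, 64)     # a batch of 32 colour images
print(batch.shape[0])          # 32 — the outermost dimension is the batch size
```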

There are several ways to create tensors. The right choice depends on whether you already have data or just need a tensor of a certain size to start with.

02_creating_tensors.py
python
import torch

# ── From existing Python data ──────────────────────────────────
t1 = torch.tensor([1.0, 2.0, 3.0])          # from a list
print(t1)       # tensor([1., 2., 3.])

t2 = torch.tensor([[1, 2], [3, 4]])          # from a nested list → 2-D
print(t2.shape) # torch.Size([2, 2])

# ── Tensors filled with a constant ────────────────────────────
zeros  = torch.zeros(3, 4)       # all 0s, shape (3, 4)
ones   = torch.ones(2, 5)        # all 1s, shape (2, 5)
full   = torch.full((3, 3), 7)   # all 7s, shape (3, 3)

# ── Tensors filled with random numbers ────────────────────────
rand_uniform = torch.rand(2, 3)       # uniform [0, 1)
rand_normal  = torch.randn(2, 3)      # standard normal (mean=0, std=1)

# ── Useful sequences ──────────────────────────────────────────
range_t  = torch.arange(0, 10, 2)    # [0, 2, 4, 6, 8]
linspace = torch.linspace(0, 1, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]

# ── Same shape as another tensor ──────────────────────────────
existing = torch.randn(3, 4)
like_it  = torch.zeros_like(existing) # shape (3,4), all zeros
print(like_it.shape)                  # torch.Size([3, 4])

All elements in a tensor must be the same data type (dtype). The most common ones you'll encounter are:

| dtype | PyTorch name | When to use |
| --- | --- | --- |
| 32-bit float | torch.float32 (default) | Almost everything — weights, activations, loss |
| 64-bit float | torch.float64 | High-precision scientific computation |
| 16-bit float | torch.float16 | Memory-efficient training on GPUs (mixed precision) |
| 32-bit int | torch.int32 | Counts, indices when 64-bit is overkill |
| 64-bit int | torch.int64 | Class labels, token IDs — the default integer type |
| Boolean | torch.bool | Masks (e.g. which positions to ignore in attention) |

03_dtypes.py
python
import torch

# Default dtype for floats is float32
f = torch.tensor([1.0, 2.0])
print(f.dtype)    # torch.float32

# Default dtype for ints is int64
i = torch.tensor([1, 2, 3])
print(i.dtype)    # torch.int64

# Specify explicitly
half = torch.tensor([1.0, 2.0], dtype=torch.float16)
print(half.dtype) # torch.float16

# Cast an existing tensor to a different dtype
float_version = i.float()           # int64 → float32
long_version  = f.long()            # float32 → int64
print(float_version.dtype)          # torch.float32

# Watch dtypes in mixed operations. Modern PyTorch promotes simple
# arithmetic automatically (float32 + int64 → float32), but many
# operations and layers expect a specific dtype, so explicit casting
# is good practice.
a = torch.tensor([1.0])             # float32
b = torch.tensor([2], dtype=torch.int64)
print((a + b).dtype)                # torch.float32 — promoted automatically
a + b.float()                       # explicit cast — clearer and always safe

Tensors support all the arithmetic you'd expect. Most operations work element-wise — meaning they're applied to each number independently, in the same position.

04_operations.py
python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# ── Element-wise arithmetic ────────────────────────────────────
print(a + b)     # tensor([5., 7., 9.])
print(a - b)     # tensor([-3., -3., -3.])
print(a * b)     # tensor([ 4., 10., 18.])
print(a / b)     # tensor([0.2500, 0.4000, 0.5000])
print(a ** 2)    # tensor([1., 4., 9.])

# ── Reduction operations ───────────────────────────────────────
print(a.sum())   # tensor(6.)   — sum of all elements
print(a.mean())  # tensor(2.)   — average
print(a.max())   # tensor(3.)   — maximum value
print(a.min())   # tensor(1.)   — minimum value

# ── Matrix multiplication (the workhorse of deep learning) ─────
A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])   # shape (2, 2)

print(A @ B)          # matrix multiply, shape (2, 2)
print(torch.matmul(A, B))  # same thing, different syntax

Why is matrix multiplication so important?

Every linear layer in a neural network is a matrix multiplication: output = input @ weight. It's how information flows through the network. Understanding that y = x @ W + b is just a matrix multiply plus a vector addition makes neural networks feel much less mysterious.
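Here's that idea as a sketch: a tiny "linear layer" written as nothing but a matrix multiply plus an addition, with the sizes (4 inputs, 2 outputs, batch of 5) chosen arbitrarily for illustration:

```python
import torch

torch.manual_seed(0)

x = torch.randn(5, 4)   # a batch of 5 inputs, 4 features each
W = torch.randn(4, 2)   # weight matrix: 4 inputs → 2 outputs
b = torch.randn(2)      # bias, broadcast across the batch

y = x @ W + b           # the entire "linear layer"
print(y.shape)          # torch.Size([5, 2])

# torch.nn.Linear does the same computation (it stores its weight
# transposed internally and initialises it for you)
layer = torch.nn.Linear(4, 2)
print(layer(x).shape)   # torch.Size([5, 2])
```

Note how the shapes chain: (5, 4) @ (4, 2) gives (5, 2), and the bias of shape (2,) is broadcast across all 5 rows.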

What happens when you try to add a shape (3,) tensor to a shape (2, 3) tensor? PyTorch uses a rule called broadcasting to "stretch" the smaller tensor to match the larger one — without actually copying data. It sounds confusing but follows a simple rule: dimensions are aligned from the right, and any dimension of size 1 (or missing) gets repeated to match.

05_broadcasting.py
python
import torch

# Add a vector to every row of a matrix
matrix = torch.tensor([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])     # shape (2, 3)
vector = torch.tensor([10.0, 20.0, 30.0])     # shape    (3,)

# PyTorch mentally expands vector to shape (2, 3):
# [[10, 20, 30],
#  [10, 20, 30]]
result = matrix + vector
print(result)
# tensor([[11., 22., 33.],
#         [14., 25., 36.]])

# Another example: add a column vector to a matrix
col = torch.tensor([[100.0], [200.0]])         # shape (2, 1)
result2 = matrix + col
print(result2)
# tensor([[101., 102., 103.],
#         [204., 205., 206.]])

Broadcasting can silently do the wrong thing

Broadcasting is powerful but can produce surprising results if your shapes are accidentally compatible in the wrong way. Always print .shape before and after operations you're unsure about — it's the fastest debugging trick in PyTorch.
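A concrete sketch of "accidentally compatible" shapes: suppose you meant to add two 3-element vectors, but one of them is secretly a column:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])           # shape (3,)
b = torch.tensor([[10.0], [20.0], [30.0]])  # shape (3, 1) — oops, a column!

result = a + b          # broadcasts (3,) against (3, 1) → (3, 3), no error
print(result.shape)     # torch.Size([3, 3]) — not the (3,) you wanted!
print(result)
# tensor([[11., 12., 13.],
#         [21., 22., 23.],
#         [31., 32., 33.]])

# The fix: squeeze away the extra dimension so both are 1-D
fixed = a + b.squeeze(1)
print(fixed)            # tensor([11., 22., 33.])
```

No exception is raised anywhere, which is exactly why printing shapes is the fastest way to catch this.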

The data in a tensor is stored as a flat list of numbers in memory. The shape is just a description of how to interpret that flat list as an N-dimensional grid. Reshaping changes the interpretation without moving any data (as long as the tensor is contiguous in memory), so it's essentially free.

06_reshaping.py
python
import torch

t = torch.arange(12)       # [0, 1, 2, ..., 11]  shape: (12,)
print(t.shape)             # torch.Size([12])

# Reshape to a 3×4 matrix
m = t.reshape(3, 4)
print(m.shape)             # torch.Size([3, 4])
print(m)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

# Use -1 to let PyTorch calculate one dimension automatically
m2 = t.reshape(2, -1)     # -1 means "figure it out" → (2, 6)
print(m2.shape)            # torch.Size([2, 6])

m3 = t.reshape(-1, 4)     # → (3, 4)
print(m3.shape)            # torch.Size([3, 4])

# Flatten back to 1-D
flat = m.flatten()
print(flat.shape)          # torch.Size([12])

# Adding/removing dimensions of size 1 (useful for batch dims)
v = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
print(v.unsqueeze(0).shape)          # torch.Size([1, 3])  — add row dim
print(v.unsqueeze(1).shape)          # torch.Size([3, 1])  — add col dim

batched = v.unsqueeze(0)             # shape (1, 3)
print(batched.squeeze(0).shape)      # torch.Size([3])     — remove it

Indexing a tensor works just like indexing a NumPy array or a nested Python list. You can grab a single element, a row, a column, or any sub-region you like.

07_indexing.py
python
import torch

t = torch.tensor([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

# Single element (returns a scalar tensor)
print(t[1, 2])          # tensor(60)

# Entire row
print(t[0])             # tensor([10, 20, 30])

# Entire column
print(t[:, 1])          # tensor([20, 50, 80])

# Sub-matrix (rows 0-1, all columns)
print(t[:2, :])         # tensor([[10, 20, 30], [40, 50, 60]])

# Boolean (mask) indexing — grab elements greater than 40
mask = t > 40
print(mask)
# tensor([[False, False, False],
#         [False,  True,  True],
#         [ True,  True,  True]])

print(t[mask])          # tensor([50, 60, 70, 80, 90])

# Get the integer value out of a single-element tensor
print(t[0, 0].item())   # 10  (plain Python int)

One of the biggest reasons to use tensors instead of plain NumPy arrays is that tensors can live on a GPU, where thousands of cores can process them in parallel. Moving a tensor to the GPU is a single line.

08_gpu.py
python
import torch

# Check if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Create a tensor on the CPU (default)
cpu_tensor = torch.randn(3, 3)
print(cpu_tensor.device)   # cpu

# Move it to the GPU
gpu_tensor = cpu_tensor.to(device)
print(gpu_tensor.device)   # cuda:0  (if GPU available)

# Or create directly on the GPU
gpu_tensor2 = torch.randn(3, 3, device=device)

# ⚠️  You cannot mix CPU and GPU tensors in an operation!
# cpu_tensor + gpu_tensor  ← RuntimeError!

# Move back to CPU (e.g. for numpy conversion or plotting)
back_on_cpu = gpu_tensor.cpu()
print(back_on_cpu.device)  # cpu

Apple Silicon (M1/M2/M3)?

Use device = 'mps' instead of 'cuda' on Apple Silicon Macs. The pattern is the same: tensor.to('mps') moves it to the GPU. Write device-agnostic code by always checking torch.cuda.is_available() or torch.backends.mps.is_available().
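One way to write that device-agnostic check is a small helper. This is a sketch, and `pick_device` is a hypothetical name, not a PyTorch function:

```python
import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA GPU, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
t = torch.randn(3, 3, device=device)   # created directly on the chosen device
print(t.device)
```

The same script then runs unchanged on an NVIDIA box, an Apple Silicon Mac, or a CPU-only laptop.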

If you come from a data science background you've probably used NumPy arrays. PyTorch tensors and NumPy arrays are closely related — they can share the same underlying memory block, so converting between them is essentially free (as long as the tensor is on the CPU).

09_numpy_bridge.py
python
import torch
import numpy as np

# Tensor → NumPy
t = torch.tensor([1.0, 2.0, 3.0])
arr = t.numpy()           # shares memory!
print(arr)                # [1. 2. 3.]
print(type(arr))          # <class 'numpy.ndarray'>

# They share memory — changing one changes the other!
t[0] = 99
print(arr)                # [99.  2.  3.]  ← changed!

# NumPy → Tensor
arr2 = np.array([4.0, 5.0, 6.0])
t2 = torch.from_numpy(arr2)   # also shares memory
print(t2)                 # tensor([4., 5., 6.], dtype=torch.float64)

# If you DON'T want shared memory, make a copy
t3 = torch.tensor(arr2)  # copies the data — no memory sharing

| Operation | Code |
| --- | --- |
| Tensor → NumPy (shared memory) | arr = tensor.numpy() |
| Tensor → NumPy (works for GPU/autograd tensors) | arr = tensor.detach().cpu().numpy() |
| NumPy → Tensor (shared memory) | t = torch.from_numpy(arr) |
| NumPy → Tensor (independent copy) | t = torch.tensor(arr) |

Here's a worked example that ties everything together. We'll represent a tiny batch of two colour images, poke around its shape, do some operations, and see how it would flow into a neural network:

10_putting_it_together.py
python
import torch

torch.manual_seed(0)

# Simulate a batch of 2 colour images, 4×4 pixels
# Shape: (batch, channels, height, width) — the standard NCHW format
images = torch.rand(2, 3, 4, 4)
print(f"Batch shape : {images.shape}")      # torch.Size([2, 3, 4, 4])

# Grab the first image
first_image = images[0]
print(f"One image   : {first_image.shape}") # torch.Size([3, 4, 4])

# Grab the red channel of the first image
red_channel = images[0, 0]
print(f"Red channel : {red_channel.shape}") # torch.Size([4, 4])

# Global average across height and width (simple pooling)
pooled = images.mean(dim=[2, 3])
print(f"After pool  : {pooled.shape}")      # torch.Size([2, 3])
# Now we have one 3-value summary per image per channel

# Flatten each image to a vector (ready for a Linear layer)
flat = images.reshape(2, -1)
print(f"Flattened   : {flat.shape}")        # torch.Size([2, 48])
# 3 channels * 4 * 4 = 48 values per image

# Simulated linear layer: (48 inputs → 10 outputs)
W = torch.randn(48, 10)
logits = flat @ W
print(f"Logits      : {logits.shape}")      # torch.Size([2, 10])
# 2 images, each with 10 raw class scores

Before you go, here's a quick-reference cheatsheet of everything covered in this post:

cheatsheet.py
python
import torch

# ─── Creating ──────────────────────────────────────────────────
torch.tensor([1, 2, 3])          # from data
torch.zeros(3, 4)                # all zeros
torch.ones(3, 4)                 # all ones
torch.rand(3, 4)                 # uniform [0,1)
torch.randn(3, 4)                # normal distribution
torch.arange(0, 10, 2)          # [0, 2, 4, 6, 8]
torch.linspace(0, 1, 5)         # evenly spaced
torch.zeros_like(other)         # zeros, same shape as other

# ─── Inspecting ────────────────────────────────────────────────
t.shape                          # e.g. torch.Size([2, 3])
t.ndim                           # number of dimensions
t.dtype                          # e.g. torch.float32
t.device                         # cpu or cuda:0
t.numel()                        # total number of elements

# ─── Reshaping ─────────────────────────────────────────────────
t.reshape(2, -1)                 # reshape, -1 = inferred
t.flatten()                      # collapse to 1-D
t.unsqueeze(0)                   # add a dimension at position 0
t.squeeze(0)                     # remove dimension of size 1
t.permute(2, 0, 1)               # reorder axes
t.transpose(0, 1)                # swap two axes

# ─── Operations ────────────────────────────────────────────────
a + b  ;  a - b  ;  a * b  ;  a / b   # element-wise
a @ b                            # matrix multiplication
a.sum()  ;  a.mean()  ;  a.max()       # reductions
a.sum(dim=0)                     # reduce along a specific axis

# ─── Moving around ─────────────────────────────────────────────
t.to('cuda')                     # to GPU
t.to('cpu')                      # to CPU
t.numpy()                        # to NumPy (CPU only)
t.item()                         # to Python scalar (single-element tensor)
t.detach()                       # detach from autograd graph

A tensor is nothing more than an N-dimensional grid of numbers. Scalars, vectors, and matrices are all just tensors with fewer dimensions. Once you're comfortable reading shapes and thinking about dimensions, you'll find that most PyTorch code is just shuffling tensors into the right shape and multiplying them together. Every image, every word, every prediction, every loss value — it's all tensors all the way down.

Your next step

Experiment! Open a notebook, create tensors of different shapes, and try to trigger a shape mismatch error on purpose — then fix it. Nothing builds intuition faster than breaking things and understanding why. Once tensors feel natural, the next topic to tackle is autograd — how PyTorch automatically computes gradients through these tensors.
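As a starting exercise, here's a sketch of that break-then-fix loop: one deliberately broken matrix multiply, the error it raises, and the fix.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# Deliberately trigger a shape mismatch: (2, 3) @ (2, 3) is invalid
# because the inner dimensions (3 and 2) don't line up.
try:
    a @ b
except RuntimeError as e:
    print(f"Caught: {e}")

# The fix: transpose b so the inner dimensions match: (2, 3) @ (3, 2) → (2, 2)
result = a @ b.T
print(result.shape)   # torch.Size([2, 2])
```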
