What Is a Tensor? A Beginner's Guide with Real Examples

Tensors explained from scratch — no math degree required. Learn what tensors are, why PyTorch uses them, and how to work with them confidently.

If you've ever opened a PyTorch tutorial and immediately hit the word tensor, you're not alone. It sounds intimidating — like something from a physics textbook. But here's the truth: a tensor is just a container for numbers, organised in a grid. That's it. Once that clicks, everything else in deep learning becomes a lot less scary.

Who is this post for?

Absolute beginners. If you know what a list of numbers is, you know enough to follow along. We'll build the concept up step by step, then show how it maps to real PyTorch code.

Before we define a tensor, let's look at things you already know — because a tensor is just a generalisation of all of them.

A scalar is just one number. No grid, no list. Things like your age, the temperature outside, or the price of a coffee. Examples: 42, 3.14, -7.

A vector is a row (or column) of numbers. Think of a week's worth of temperatures: [22, 24, 19, 17, 25, 28, 23]. Or the three RGB colour values of a pixel: [255, 128, 0].

A matrix is a 2-D grid of numbers — rows and columns. A spreadsheet is a matrix. A grayscale photo is a matrix (each cell holds the brightness of one pixel).

A tensor is the general term for any of the above, plus the idea that you can keep stacking dimensions. A colour photo is three matrices stacked (one for Red, one for Green, one for Blue). A batch of 32 colour photos is 32 of those stacked. That's a tensor.
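The stacking idea can be seen directly in code. Here's a minimal sketch (channel values chosen arbitrarily for illustration) that builds a fake colour image by stacking three 4×4 matrices, then a batch by stacking images:

```python
import torch

# Three 4x4 "channel" matrices — pretend they hold R, G, B brightness values
red   = torch.zeros(4, 4)
green = torch.ones(4, 4)
blue  = torch.full((4, 4), 0.5)

# Stack along a new first dimension → one colour image, shape (3, 4, 4)
image = torch.stack([red, green, blue])
print(image.shape)   # torch.Size([3, 4, 4])

# Stack two images → a batch, shape (2, 3, 4, 4)
batch = torch.stack([image, image])
print(batch.shape)   # torch.Size([2, 3, 4, 4])
```

Each `torch.stack` call adds exactly one dimension, which is why the shapes grow one number at a time.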

The one-line definition

A tensor is an N-dimensional grid of numbers, all of the same data type. Scalars, vectors, and matrices are all just special cases of tensors with 0, 1, and 2 dimensions respectively.

Every tensor has a shape — a tuple that tells you how many elements exist along each dimension. Learning to read shapes fluently is the single most useful skill when debugging deep learning code.

01_shapes.py
python
import torch

# 0-D tensor (scalar) — shape is empty ()
scalar = torch.tensor(42)
print(scalar.shape)          # torch.Size([])
print(scalar.ndim)           # 0

# 1-D tensor (vector) — shape has one number
vector = torch.tensor([10, 20, 30, 40])
print(vector.shape)          # torch.Size([4])  ← 4 elements
print(vector.ndim)           # 1

# 2-D tensor (matrix) — shape has two numbers: rows x cols
matrix = torch.tensor([[1, 2, 3],
                        [4, 5, 6]])
print(matrix.shape)          # torch.Size([2, 3])  ← 2 rows, 3 columns
print(matrix.ndim)           # 2

# 3-D tensor — shape has three numbers
cube = torch.zeros(3, 4, 5)  # 3 "layers", each 4x5
print(cube.shape)             # torch.Size([3, 4, 5])
print(cube.ndim)              # 3

How to read a shape out loud

torch.Size([32, 3, 64, 64]) — read it as: '32 images, each with 3 colour channels, each channel being 64 pixels tall and 64 pixels wide.' Always read left-to-right, outermost dimension first.

Abstract shapes become much easier to understand when you map them to something concrete. Here's how tensors appear in real machine learning tasks:

| Shape | What it represents |
| --- | --- |
| torch.Size([]) | A single loss value: 0.342 |
| torch.Size([10]) | Probability scores for 10 classes (e.g. digits 0–9) |
| torch.Size([784]) | A flattened 28×28 MNIST image |
| torch.Size([28, 28]) | One grayscale image (28 pixels × 28 pixels) |
| torch.Size([3, 64, 64]) | One colour image (RGB × height × width) |
| torch.Size([32, 3, 64, 64]) | A batch of 32 colour images |
| torch.Size([100, 512]) | 100 sentences, each embedded as a 512-d vector |
| torch.Size([8, 12, 128, 64]) | Per-head queries in a transformer (batch × heads × sequence length × head dimension) |
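You can recreate a few rows of this table yourself. A quick sketch using random values in place of real image data:

```python
import torch

flat  = torch.rand(784)        # a flattened MNIST-style image
image = flat.reshape(28, 28)   # the same 784 values as a 2-D grid
print(image.shape)             # torch.Size([28, 28])
print(flat.numel() == image.numel())  # True — reshaping never changes the count

batch = torch.rand(32, 3, 64, 64)     # a batch of 32 colour images
print(batch.shape[0])          # 32 — the outermost dimension is the batch size
```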

There are several ways to create tensors. The right choice depends on whether you already have data or just need a tensor of a certain size to start with.

02_creating_tensors.py
python
import torch

# ── From existing Python data ──────────────────────────────────
t1 = torch.tensor([1.0, 2.0, 3.0])          # from a list
print(t1)       # tensor([1., 2., 3.])

t2 = torch.tensor([[1, 2], [3, 4]])          # from a nested list → 2-D
print(t2.shape) # torch.Size([2, 2])

# ── Tensors filled with a constant ────────────────────────────
zeros  = torch.zeros(3, 4)       # all 0s, shape (3, 4)
ones   = torch.ones(2, 5)        # all 1s, shape (2, 5)
full   = torch.full((3, 3), 7)   # all 7s, shape (3, 3)

# ── Tensors filled with random numbers ────────────────────────
rand_uniform = torch.rand(2, 3)       # uniform [0, 1)
rand_normal  = torch.randn(2, 3)      # standard normal (mean=0, std=1)

# ── Useful sequences ──────────────────────────────────────────
range_t  = torch.arange(0, 10, 2)    # [0, 2, 4, 6, 8]
linspace = torch.linspace(0, 1, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]

# ── Same shape as another tensor ──────────────────────────────
existing = torch.randn(3, 4)
like_it  = torch.zeros_like(existing) # shape (3,4), all zeros
print(like_it.shape)                  # torch.Size([3, 4])

All elements in a tensor must be the same data type (dtype). The most common ones you'll encounter are:

| dtype | PyTorch name | When to use |
| --- | --- | --- |
| 32-bit float | torch.float32 (default) | Almost everything — weights, activations, loss |
| 64-bit float | torch.float64 | High-precision scientific computation |
| 16-bit float | torch.float16 | Memory-efficient training on GPUs (mixed precision) |
| 32-bit int | torch.int32 | Counts, indices when 64-bit is overkill |
| 64-bit int | torch.int64 | Class labels, token IDs — the default integer type |
| Boolean | torch.bool | Masks (e.g. which positions to ignore in attention) |

03_dtypes.py
python
import torch

# Default dtype for floats is float32
f = torch.tensor([1.0, 2.0])
print(f.dtype)    # torch.float32

# Default dtype for ints is int64
i = torch.tensor([1, 2, 3])
print(i.dtype)    # torch.int64

# Specify explicitly
half = torch.tensor([1.0, 2.0], dtype=torch.float16)
print(half.dtype) # torch.float16

# Cast an existing tensor to a different dtype
float_version = i.float()           # int64 → float32
long_version  = f.long()            # float32 → int64
print(float_version.dtype)          # torch.float32

# Watch dtypes in mixed operations. Modern PyTorch promotes simple
# arithmetic automatically (float32 + int64 → float32), but many
# operations and layers expect a specific dtype, so explicit casting
# is good practice.
a = torch.tensor([1.0])             # float32
b = torch.tensor([2], dtype=torch.int64)
print((a + b).dtype)                # torch.float32 — promoted automatically
a + b.float()                       # explicit cast — clearer and always safe

Tensors support all the arithmetic you'd expect. Most operations work element-wise — meaning they're applied to each number independently, in the same position.

04_operations.py
python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# ── Element-wise arithmetic ────────────────────────────────────
print(a + b)     # tensor([5., 7., 9.])
print(a - b)     # tensor([-3., -3., -3.])
print(a * b)     # tensor([ 4., 10., 18.])
print(a / b)     # tensor([0.2500, 0.4000, 0.5000])
print(a ** 2)    # tensor([1., 4., 9.])

# ── Reduction operations ───────────────────────────────────────
print(a.sum())   # tensor(6.)   — sum of all elements
print(a.mean())  # tensor(2.)   — average
print(a.max())   # tensor(3.)   — maximum value
print(a.min())   # tensor(1.)   — minimum value

# ── Matrix multiplication (the workhorse of deep learning) ─────
A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])   # shape (2, 2)

print(A @ B)          # matrix multiply, shape (2, 2)
print(torch.matmul(A, B))  # same thing, different syntax

Why is matrix multiplication so important?

Every linear layer in a neural network is a matrix multiplication: output = input @ weight. It's how information flows through the network. Understanding that y = x @ W + b is just a matrix multiply plus a vector addition makes neural networks feel much less mysterious.
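Here's that idea as a sketch: a tiny "linear layer" written as nothing but a matrix multiply plus an addition, with the sizes (4 inputs, 2 outputs, batch of 5) chosen arbitrarily for illustration:

```python
import torch

torch.manual_seed(0)

x = torch.randn(5, 4)   # a batch of 5 inputs, 4 features each
W = torch.randn(4, 2)   # weight matrix: 4 inputs → 2 outputs
b = torch.randn(2)      # bias, broadcast across the batch

y = x @ W + b           # the entire "linear layer"
print(y.shape)          # torch.Size([5, 2])

# torch.nn.Linear does the same computation (it stores its weight
# transposed internally and initialises it for you)
layer = torch.nn.Linear(4, 2)
print(layer(x).shape)   # torch.Size([5, 2])
```

Note how the shapes chain: (5, 4) @ (4, 2) gives (5, 2), and the bias of shape (2,) is broadcast across all 5 rows.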

What happens when you try to add a shape (3,) tensor to a shape (2, 3) tensor? PyTorch uses a rule called broadcasting to "stretch" the smaller tensor to match the larger one — without actually copying data. It sounds confusing but follows a simple rule: dimensions are aligned from the right, and any dimension of size 1 (or missing) gets repeated to match.

05_broadcasting.py
python
import torch

# Add a vector to every row of a matrix
matrix = torch.tensor([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])     # shape (2, 3)
vector = torch.tensor([10.0, 20.0, 30.0])     # shape    (3,)

# PyTorch mentally expands vector to shape (2, 3):
# [[10, 20, 30],
#  [10, 20, 30]]
result = matrix + vector
print(result)
# tensor([[11., 22., 33.],
#         [14., 25., 36.]])

# Another example: add a column vector to a matrix
col = torch.tensor([[100.0], [200.0]])         # shape (2, 1)
result2 = matrix + col
print(result2)
# tensor([[101., 102., 103.],
#         [204., 205., 206.]])

Broadcasting can silently do the wrong thing

Broadcasting is powerful but can produce surprising results if your shapes are accidentally compatible in the wrong way. Always print .shape before and after operations you're unsure about — it's the fastest debugging trick in PyTorch.
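A concrete sketch of "accidentally compatible" shapes: suppose you meant to add two 3-element vectors, but one of them is secretly a column:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])           # shape (3,)
b = torch.tensor([[10.0], [20.0], [30.0]])  # shape (3, 1) — oops, a column!

result = a + b          # broadcasts (3,) against (3, 1) → (3, 3), no error
print(result.shape)     # torch.Size([3, 3]) — not the (3,) you wanted!
print(result)
# tensor([[11., 12., 13.],
#         [21., 22., 23.],
#         [31., 32., 33.]])

# The fix: squeeze away the extra dimension so both are 1-D
fixed = a + b.squeeze(1)
print(fixed)            # tensor([11., 22., 33.])
```

No exception is raised anywhere, which is exactly why printing shapes is the fastest way to catch this.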

The data in a tensor is stored as a flat list of numbers in memory. The shape is just a description of how to interpret that flat list as an N-dimensional grid. Reshaping changes the interpretation without moving any data (as long as the tensor is contiguous in memory), so it's essentially free.

06_reshaping.py
python
import torch

t = torch.arange(12)       # [0, 1, 2, ..., 11]  shape: (12,)
print(t.shape)             # torch.Size([12])

# Reshape to a 3×4 matrix
m = t.reshape(3, 4)
print(m.shape)             # torch.Size([3, 4])
print(m)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

# Use -1 to let PyTorch calculate one dimension automatically
m2 = t.reshape(2, -1)     # -1 means "figure it out" → (2, 6)
print(m2.shape)            # torch.Size([2, 6])

m3 = t.reshape(-1, 4)     # → (3, 4)
print(m3.shape)            # torch.Size([3, 4])

# Flatten back to 1-D
flat = m.flatten()
print(flat.shape)          # torch.Size([12])

# Adding/removing dimensions of size 1 (useful for batch dims)
v = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
print(v.unsqueeze(0).shape)          # torch.Size([1, 3])  — add row dim
print(v.unsqueeze(1).shape)          # torch.Size([3, 1])  — add col dim

batched = v.unsqueeze(0)             # shape (1, 3)
print(batched.squeeze(0).shape)      # torch.Size([3])     — remove it

Indexing a tensor works just like indexing a NumPy array or a nested Python list. You can grab a single element, a row, a column, or any sub-region you like.

07_indexing.py
python
import torch

t = torch.tensor([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

# Single element (returns a scalar tensor)
print(t[1, 2])          # tensor(60)

# Entire row
print(t[0])             # tensor([10, 20, 30])

# Entire column
print(t[:, 1])          # tensor([20, 50, 80])

# Sub-matrix (rows 0-1, all columns)
print(t[:2, :])         # tensor([[10, 20, 30], [40, 50, 60]])

# Boolean (mask) indexing — grab elements greater than 40
mask = t > 40
print(mask)
# tensor([[False, False, False],
#         [False,  True,  True],
#         [ True,  True,  True]])

print(t[mask])          # tensor([50, 60, 70, 80, 90])

# Get the integer value out of a single-element tensor
print(t[0, 0].item())   # 10  (plain Python int)

One of the biggest reasons to use tensors instead of plain NumPy arrays is that tensors can live on a GPU, where thousands of cores can process them in parallel. Moving a tensor to the GPU is a single line.

08_gpu.py
python
import torch

# Check if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Create a tensor on the CPU (default)
cpu_tensor = torch.randn(3, 3)
print(cpu_tensor.device)   # cpu

# Move it to the GPU
gpu_tensor = cpu_tensor.to(device)
print(gpu_tensor.device)   # cuda:0  (if GPU available)

# Or create directly on the GPU
gpu_tensor2 = torch.randn(3, 3, device=device)

# ⚠️  You cannot mix CPU and GPU tensors in an operation!
# cpu_tensor + gpu_tensor  ← RuntimeError!

# Move back to CPU (e.g. for numpy conversion or plotting)
back_on_cpu = gpu_tensor.cpu()
print(back_on_cpu.device)  # cpu

Apple Silicon (M1/M2/M3)?

Use device = 'mps' instead of 'cuda' on Apple Silicon Macs. The pattern is the same: tensor.to('mps') moves it to the GPU. Write device-agnostic code by always checking torch.cuda.is_available() or torch.backends.mps.is_available().
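One way to write that device-agnostic check is a small helper. This is a sketch, and `pick_device` is a hypothetical name, not a PyTorch function:

```python
import torch

def pick_device() -> torch.device:
    """Return the best available device: CUDA GPU, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
t = torch.randn(3, 3, device=device)   # created directly on the chosen device
print(t.device)
```

The same script then runs unchanged on an NVIDIA box, an Apple Silicon Mac, or a CPU-only laptop.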

If you come from a data science background you've probably used NumPy arrays. PyTorch tensors and NumPy arrays are closely related — they can share the same underlying memory block, so converting between them is essentially free (as long as the tensor is on the CPU).

09_numpy_bridge.py
python
import torch
import numpy as np

# Tensor → NumPy
t = torch.tensor([1.0, 2.0, 3.0])
arr = t.numpy()           # shares memory!
print(arr)                # [1. 2. 3.]
print(type(arr))          # <class 'numpy.ndarray'>

# They share memory — changing one changes the other!
t[0] = 99
print(arr)                # [99.  2.  3.]  ← changed!

# NumPy → Tensor
arr2 = np.array([4.0, 5.0, 6.0])
t2 = torch.from_numpy(arr2)   # also shares memory
print(t2)                 # tensor([4., 5., 6.], dtype=torch.float64)

# If you DON'T want shared memory, make a copy
t3 = torch.tensor(arr2)  # copies the data — no memory sharing

| Operation | Code |
| --- | --- |
| Tensor → NumPy (shared memory) | arr = tensor.numpy() |
| Tensor → NumPy (works for GPU/autograd tensors) | arr = tensor.detach().cpu().numpy() |
| NumPy → Tensor (shared memory) | t = torch.from_numpy(arr) |
| NumPy → Tensor (independent copy) | t = torch.tensor(arr) |

Here's a worked example that ties everything together. We'll represent a tiny batch of two colour images, poke around its shape, do some operations, and see how it would flow into a neural network:

10_putting_it_together.py
python
import torch

torch.manual_seed(0)

# Simulate a batch of 2 colour images, 4×4 pixels
# Shape: (batch, channels, height, width) — the standard NCHW format
images = torch.rand(2, 3, 4, 4)
print(f"Batch shape : {images.shape}")      # torch.Size([2, 3, 4, 4])

# Grab the first image
first_image = images[0]
print(f"One image   : {first_image.shape}") # torch.Size([3, 4, 4])

# Grab the red channel of the first image
red_channel = images[0, 0]
print(f"Red channel : {red_channel.shape}") # torch.Size([4, 4])

# Global average across height and width (simple pooling)
pooled = images.mean(dim=[2, 3])
print(f"After pool  : {pooled.shape}")      # torch.Size([2, 3])
# Now we have one 3-value summary per image per channel

# Flatten each image to a vector (ready for a Linear layer)
flat = images.reshape(2, -1)
print(f"Flattened   : {flat.shape}")        # torch.Size([2, 48])
# 3 channels * 4 * 4 = 48 values per image

# Simulated linear layer: (48 inputs → 10 outputs)
W = torch.randn(48, 10)
logits = flat @ W
print(f"Logits      : {logits.shape}")      # torch.Size([2, 10])
# 2 images, each with 10 raw class scores

Before you go, here's a quick-reference cheatsheet of everything covered in this post:

cheatsheet.py
python
import torch

# ─── Creating ──────────────────────────────────────────────────
torch.tensor([1, 2, 3])          # from data
torch.zeros(3, 4)                # all zeros
torch.ones(3, 4)                 # all ones
torch.rand(3, 4)                 # uniform [0,1)
torch.randn(3, 4)                # normal distribution
torch.arange(0, 10, 2)          # [0, 2, 4, 6, 8]
torch.linspace(0, 1, 5)         # evenly spaced
torch.zeros_like(other)         # zeros, same shape as other

# ─── Inspecting ────────────────────────────────────────────────
t.shape                          # e.g. torch.Size([2, 3])
t.ndim                           # number of dimensions
t.dtype                          # e.g. torch.float32
t.device                         # cpu or cuda:0
t.numel()                        # total number of elements

# ─── Reshaping ─────────────────────────────────────────────────
t.reshape(2, -1)                 # reshape, -1 = inferred
t.flatten()                      # collapse to 1-D
t.unsqueeze(0)                   # add a dimension at position 0
t.squeeze(0)                     # remove dimension of size 1
t.permute(2, 0, 1)               # reorder axes
t.transpose(0, 1)                # swap two axes

# ─── Operations ────────────────────────────────────────────────
a + b  ;  a - b  ;  a * b  ;  a / b   # element-wise
a @ b                            # matrix multiplication
a.sum()  ;  a.mean()  ;  a.max()       # reductions
a.sum(dim=0)                     # reduce along a specific axis

# ─── Moving around ─────────────────────────────────────────────
t.to('cuda')                     # to GPU
t.to('cpu')                      # to CPU
t.numpy()                        # to NumPy (CPU only)
t.item()                         # to Python scalar (single-element tensor)
t.detach()                       # detach from autograd graph

A tensor is nothing more than an N-dimensional grid of numbers. Scalars, vectors, and matrices are all just tensors with fewer dimensions. Once you're comfortable reading shapes and thinking about dimensions, you'll find that most PyTorch code is just shuffling tensors into the right shape and multiplying them together. Every image, every word, every prediction, every loss value — it's all tensors all the way down.

Your next step

Experiment! Open a notebook, create tensors of different shapes, and try to trigger a shape mismatch error on purpose — then fix it. Nothing builds intuition faster than breaking things and understanding why. Once tensors feel natural, the next topic to tackle is autograd — how PyTorch automatically computes gradients through these tensors.
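As a starting exercise, here's a sketch of that break-then-fix loop: one deliberately broken matrix multiply, the error it raises, and the fix.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# Deliberately trigger a shape mismatch: (2, 3) @ (2, 3) is invalid
# because the inner dimensions (3 and 2) don't line up.
try:
    a @ b
except RuntimeError as e:
    print(f"Caught: {e}")

# The fix: transpose b so the inner dimensions match: (2, 3) @ (3, 2) → (2, 2)
result = a @ b.T
print(result.shape)   # torch.Size([2, 2])
```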
