ML Hyperparameters Explained for Beginners: Learning Rate, Epochs, Batch Size, L2, and Seed

A beginner-friendly explanation of core machine learning hyperparameters — learning rate, epochs, batch size, L2 regularization, and random seed — with simple examples and every important term explained clearly.

If you are just starting machine learning, words like learning rate, epochs, batch size, regularization, and seed can feel technical very quickly. But the ideas behind them are actually simple. They are just settings you choose before training starts, and they control how the model learns.

This post explains five common ML hyperparameters in the simplest possible way: lr, epochs, batch_size, l2_lambda, and seed. We will also explain every related term we use, so nothing feels like hidden jargon.

The one idea to hold onto

Hyperparameters are settings chosen by you before training starts. The model then uses those settings while learning from data.

A machine learning model is a system that learns patterns from data and then uses those patterns to make predictions. A prediction could be something like "spam or not spam," "house price," or "which category this text belongs to."

A dataset is a collection of examples. Each example usually has an input and a correct output. For example, if we are predicting exam results, an input might be hours studied = 5, and the output might be passed = yes.

A model learns by adjusting internal numbers called parameters. In many models, the most important parameters are called weights. A weight is just a number inside the model that controls how strongly the model reacts to some pattern in the input.

There is an important difference between parameters and hyperparameters. Parameters are learned by the model during training. Hyperparameters are chosen by you before training begins.

| Term | Meaning |
| --- | --- |
| Parameters | Numbers the model learns during training, such as weights |
| Hyperparameters | Settings you choose before training, such as learning rate or batch size |

Think of cooking. The food changing while it cooks is like the model's parameters changing during training. The oven temperature and cooking time are like hyperparameters: you choose them before the cooking starts.

The learning rate tells the model how big a step to take when it updates its weights.

To understand that sentence, we need two more words: error and update. Error means the difference between the model's prediction and the correct answer. An update means changing the model's weights to try to reduce that error.

Suppose the correct answer is 10, but the model predicts 7. The model is wrong. Training tries to reduce that wrongness by changing the weights a little. The learning rate decides whether that change should be big or small.
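Here is a minimal sketch of that idea for a one-weight model predicting `y = w * x`. The function name and numbers are illustrative, not from any particular library:

```python
# A minimal sketch of one weight update for a one-weight model
# predicting y = w * x. All names here are illustrative.

def one_update(w, x, y_true, lr):
    y_pred = w * x
    error = y_pred - y_true      # how wrong the prediction is
    gradient = 2 * error * x     # direction and strength of change (squared-error loss)
    return w - lr * gradient     # step against the gradient, scaled by lr

# Correct answer 10, prediction 7 (w = 0.7, x = 10): a small lr nudges w upward.
w = one_update(w=0.7, x=10.0, y_true=10.0, lr=0.001)
print(w)  # 0.76: nudged toward the true weight of 1.0
```

Notice that `lr` multiplies the gradient: a bigger learning rate would move `w` further in a single step.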

Learning rate = step size while learning

the beginner-friendly definition

A large learning rate means the model takes bigger jumps. A small learning rate means the model takes smaller, gentler steps.

A simple analogy: imagine you are trying to stand exactly on a line painted on the floor. If you take huge jumps, you may keep crossing past the line. If you take tiny steps, you move safely but slowly. The learning rate controls that step size.

What does "overshoot the minimum" mean?

A **minimum** is the lowest point of error. To **overshoot** means to go past it. If the learning rate is too big, the model can keep jumping across the best point instead of settling near it.

| Learning rate value | What it usually feels like |
| --- | --- |
| 0.1 | Aggressive, bigger jumps |
| 0.01 | Moderate, common starting point |
| 0.001 | Gentle, smaller steps |

If the learning rate is too high, training can become unstable. If it is too low, training can become painfully slow.

An epoch is one full pass through the entire training dataset.

Suppose your training dataset has 100 examples. If the model sees all 100 examples once, that is 1 epoch. If it sees all 100 examples again, that is 2 epochs. So epochs = 100 means the model goes through the full training data 100 times.

A good beginner analogy is flashcards. If you have 20 flashcards and review all 20 once, that is one pass. Review all 20 again, that is another pass. In ML, each full pass is called an epoch.

Epoch = one complete pass through all training examples

the simple definition

Why do we need multiple epochs? Because the model usually does not learn everything from one pass. It often needs to see the same data many times to slowly improve its weights.
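The structure of "one pass per epoch" can be sketched in a few lines. The toy dataset and the empty inner loop are purely illustrative:

```python
# Sketch: what "epochs" means in a training loop (structure only).
dataset = [(1, 2), (2, 4), (3, 6)]  # toy (input, target) pairs
epochs = 3

passes_seen = 0
for epoch in range(epochs):
    for x, y in dataset:   # one full pass over every example
        pass               # a real loop would update weights here
    passes_seen += 1

print(passes_seen)  # 3: the model saw the full dataset 3 times
```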

Too few vs too many epochs

Too few epochs can mean the model has not learned enough. Too many epochs can mean it starts memorizing the training data too closely instead of learning general patterns.

When the model has not learned enough, that is called underfitting. When it memorizes the training data too much and performs poorly on new data, that is called overfitting.

| Situation | What it means |
| --- | --- |
| Underfitting | The model has not learned the pattern well enough |
| Overfitting | The model learned the training data too specifically and may fail on new unseen data |

The batch size is how many training examples the model processes before it updates the weights.

Suppose you have 100 training examples and batch_size = 10. That means the model looks at 10 examples, computes how wrong it was on those 10, updates the weights, then moves to the next 10.

Each small group of examples is called a batch. So if you have 100 examples and a batch size of 10, you will have 10 batches in one epoch.

Why not use all examples at once every time? Sometimes you can, but smaller groups are often more practical. They use less memory and allow the model to update more often.
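The "100 examples, batches of 10" arithmetic above can be checked with a small list-slicing sketch (illustrative only):

```python
# Sketch: splitting 100 examples into batches of 10.
examples = list(range(100))  # stand-ins for 100 training examples
batch_size = 10

batches = [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]
print(len(batches))      # 10 batches per epoch
print(len(batches[0]))   # each batch holds 10 examples
```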

You will also hear the word gradient here. A gradient is information that tells the model which direction to change the weights, and roughly how strongly to change them.

What does "smoother gradients" mean?

With bigger batches, the gradient is based on more examples, so it is usually more stable and less noisy. With smaller batches, updates can bounce around more because they are based on fewer examples.

| Batch size | Common effect |
| --- | --- |
| Small batch | Faster updates, more noise, less memory needed |
| Large batch | Smoother updates, more stability, more memory needed |

A simple analogy: asking 2 people for feedback on a product gives a noisy opinion. Asking 200 people gives a more stable average. Small batches are like asking a few people. Large batches are like asking many people.
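The feedback analogy can be simulated: averaging many noisy measurements varies less from trial to trial than averaging a few. This is a pure-Python illustration, not a real training loop:

```python
# Sketch: why bigger batches feel "smoother". Averaging more noisy
# measurements gives a more stable result.
import random

random.seed(0)

def noisy_average(n):
    # average of n noisy measurements of a true value of 5.0
    return sum(5.0 + random.gauss(0, 1) for _ in range(n)) / n

small_batches = [noisy_average(2) for _ in range(1000)]    # like a batch of 2
large_batches = [noisy_average(200) for _ in range(1000)]  # like a batch of 200

def spread(values):
    # variance: how much the values bounce around their mean
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(spread(small_batches) > spread(large_batches))  # True: big batches vary less
```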

L2 regularization adds a penalty when the model's weights become too large.

To understand why that matters, remember overfitting: sometimes a model becomes too eager to match the training data exactly. One sign of this can be very large weights. Large weights can make the model too sensitive, so tiny input changes produce very large output changes.

Regularization means adding a rule that says: "fit the data, but also try to stay simple." In L2 regularization, staying simple usually means preferring smaller weights.

L2 regularization = penalty on large weights

the simple intuition

The value l2_lambda controls how strong that penalty is. A small l2_lambda means a weak penalty. A large l2_lambda means a strong penalty.
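In code, the penalty is usually the sum of squared weights scaled by `l2_lambda`, added to whatever loss the model already has. This sketch uses made-up numbers to show the effect:

```python
# Sketch: adding an L2 penalty to a loss. Names and numbers are illustrative.
def l2_penalized_loss(base_loss, weights, l2_lambda):
    penalty = l2_lambda * sum(w ** 2 for w in weights)  # large weights cost more
    return base_loss + penalty

weights = [3.0, -4.0]                          # sum of squares = 25
print(l2_penalized_loss(1.0, weights, 0.0))    # 1.0: no penalty at all
print(l2_penalized_loss(1.0, weights, 0.001))  # 1.025: a small extra cost
```

Because the penalty grows with the square of each weight, the model is pushed toward keeping weights small unless larger ones genuinely reduce the base loss.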

What if the penalty is too strong?

If `l2_lambda` is too high, the model may be forced to keep weights too small and become too simple to learn the real pattern. That can lead to underfitting.

A beginner analogy: imagine packing a bag for school. You want enough things to do the job, but not so many that the bag becomes heavy and messy. L2 regularization is like a rule that discourages carrying too much weight unless it is truly needed.

| L2 setting | Likely effect |
| --- | --- |
| Very low `l2_lambda` | Little protection against overfitting |
| Moderate `l2_lambda` | Helpful pressure toward simpler weights |
| Very high `l2_lambda` | Model may become too simple and underfit |

A seed is a starting number used to control randomness in a program.

Machine learning often involves randomness. For example, the model's starting weights may be random. The training examples may be shuffled randomly. Some algorithms may randomly sample data during training.

If you do not fix the seed, two runs of the same code can produce slightly different results. If you do fix the seed, the results become much more repeatable.

Same seed = same randomness pattern

why seeds matter

This matters a lot for debugging and reproducibility. Debugging means finding out why something is wrong. Reproducibility means being able to run the same experiment again and get the same result.

Why beginners should always care about seed

If your results change every run, it becomes much harder to know whether your code changed something important or randomness changed the result. Fixing the seed removes one source of confusion.

A simple analogy is shuffling a deck of cards. Without a seed, every shuffle is different. With a fixed seed, you can make the shuffle happen in the same way every time.
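The card-shuffling analogy maps directly to code. Here a seeded random generator shuffles a toy "deck" the same way every time:

```python
# Sketch: fixing a seed makes a shuffle repeatable.
import random

def shuffled_deck(seed):
    rng = random.Random(seed)  # a random generator fixed by the seed
    deck = list(range(10))     # a toy 10-card deck
    rng.shuffle(deck)
    return deck

print(shuffled_deck(42) == shuffled_deck(42))  # True: same seed, same order
```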

Imagine we are training a model to predict whether a student will pass an exam.

  • lr = 0.01 means the model changes its weights with moderately small steps
  • epochs = 100 means the model sees the full training dataset 100 times
  • batch_size = 32 means it processes 32 examples before each weight update
  • l2_lambda = 0.001 means it applies a small penalty to very large weights
  • seed = 42 means the random parts of training are made repeatable

None of these numbers are magic by themselves. They are settings you tune based on the problem, the dataset, and the model. But understanding what each one does is the first step toward making good choices.
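To see all five settings working together, here is a toy end-to-end loop that fits `y = 2x`. It is a sketch, not production code, and the numbers are scaled down (10 examples, batches of 4) so the toy dataset makes sense:

```python
# A toy training loop tying all five hyperparameters together (illustrative).
import random

lr, epochs, batch_size, l2_lambda, seed = 0.01, 100, 4, 0.001, 42

random.seed(seed)                                # seed: repeatable runs
data = [(float(x), 2.0 * x) for x in range(10)]  # toy data: y = 2x
w = random.random()                              # random starting weight

for _ in range(epochs):                          # epochs: full passes over the data
    random.shuffle(data)
    for i in range(0, len(data), batch_size):    # batch_size: examples per update
        batch = data[i:i + batch_size]
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        grad += 2 * l2_lambda * w                # l2_lambda: penalty on large weights
        w -= lr * grad                           # lr: step size

print(round(w, 2))  # close to 2.0, the true slope
```

Try changing `lr` to something much larger or `epochs` to something much smaller and watch the final weight drift away from 2.0; that hands-on poking is the fastest way to build intuition.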

| Hyperparameter | Simple meaning | Why it matters |
| --- | --- | --- |
| `lr` | How big each learning step is | Too big can overshoot, too small can be slow |
| `epochs` | How many full passes through the dataset | Controls how long the model keeps practicing on the data |
| `batch_size` | How many examples are used before each update | Affects stability, speed, and memory use |
| `l2_lambda` | How strong the penalty on large weights is | Helps reduce overfitting |
| `seed` | The number that controls randomness | Helps make experiments repeatable |

Think of training like practicing basketball shots.

  • Learning rate = how much you change your shooting style after each miss
  • Epochs = how many full practice rounds you do
  • Batch size = how many shots you watch before deciding what to adjust
  • L2 regularization = avoiding wild, extreme movements that only work for a few cases
  • Seed = making the practice setup repeatable so you can compare sessions fairly

These are some of the most common machine learning basics you will see in tutorials, research code, and production systems. Once these ideas click, many training loops stop looking mysterious.

  1. Hyperparameters are settings chosen before training. The model then learns within those rules.
  2. Learning rate controls step size. Big steps can be unstable; tiny steps can be slow.
  3. Epochs tell you how many times the model sees the full training data.
  4. Batch size controls how many examples are used before each update.
  5. L2 regularization helps prevent overfitting by discouraging very large weights.
  6. Seed helps make experiments repeatable, which is critical for debugging and fair comparison.