ML Hyperparameters Explained for Beginners: Learning Rate, Epochs, Batch Size, L2, and Seed

A beginner-friendly explanation of core machine learning hyperparameters — learning rate, epochs, batch size, L2 regularization, and random seed — with simple examples and every important term explained clearly.

If you are just starting machine learning, words like learning rate, epochs, batch size, regularization, and seed can feel technical very quickly. But the ideas behind them are actually simple. They are just settings you choose before training starts, and they control how the model learns.

This post explains five common ML hyperparameters in the simplest possible way: lr, epochs, batch_size, l2_lambda, and seed. We will also explain every related term we use, so nothing feels like hidden jargon.

The one idea to hold onto

Hyperparameters are settings chosen by you before training starts. The model then uses those settings while learning from data.

A machine learning model is a system that learns patterns from data and then uses those patterns to make predictions. A prediction could be something like "spam or not spam," "house price," or "which category this text belongs to."

A dataset is a collection of examples. Each example usually has an input and a correct output. For example, if we are predicting exam results, an input might be hours studied = 5, and the output might be passed = yes.

A model learns by adjusting internal numbers called parameters. In many models, the most important parameters are called weights. A weight is just a number inside the model that controls how strongly the model reacts to some pattern in the input.

There is an important difference between parameters and hyperparameters. Parameters are learned by the model during training. Hyperparameters are chosen by you before training begins.

| Term | Meaning |
| --- | --- |
| Parameters | Numbers the model learns during training, such as weights |
| Hyperparameters | Settings you choose before training, such as learning rate or batch size |

Think of cooking. The food changing while it cooks is like the model's parameters changing during training. The oven temperature and cooking time are like hyperparameters: you choose them before the cooking starts.

The learning rate tells the model how big a step to take when it updates its weights.

To understand that sentence, we need two more words: error and update. Error means the difference between the model's prediction and the correct answer. An update means changing the model's weights to try to reduce that error.

Suppose the correct answer is 10, but the model predicts 7. The model is wrong. Training tries to reduce that wrongness by changing the weights a little. The learning rate decides whether that change should be big or small.
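Here is a minimal sketch of that idea for a one-weight model predicting `y = w * x`. The function name and numbers are illustrative, not from any particular library:

```python
# A minimal sketch of one weight update for a one-weight model
# predicting y = w * x. All names here are illustrative.

def one_update(w, x, y_true, lr):
    y_pred = w * x
    error = y_pred - y_true      # how wrong the prediction is
    gradient = 2 * error * x     # direction and strength of change (squared-error loss)
    return w - lr * gradient     # step against the gradient, scaled by lr

# Correct answer 10, prediction 7 (w = 0.7, x = 10): a small lr nudges w upward.
w = one_update(w=0.7, x=10.0, y_true=10.0, lr=0.001)
print(w)  # 0.76: nudged toward the true weight of 1.0
```

Notice that `lr` multiplies the gradient: a bigger learning rate would move `w` further in a single step.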

Learning rate = step size while learning

the beginner-friendly definition

A large learning rate means the model takes bigger jumps. A small learning rate means the model takes smaller, gentler steps.

A simple analogy: imagine you are trying to stand exactly on a line painted on the floor. If you take huge jumps, you may keep crossing past the line. If you take tiny steps, you move safely but slowly. The learning rate controls that step size.

What does "overshoot the minimum" mean?

A **minimum** is the lowest point of error. To **overshoot** means to go past it. If the learning rate is too big, the model can keep jumping across the best point instead of settling near it.

| Learning rate value | What it usually feels like |
| --- | --- |
| 0.1 | Aggressive, bigger jumps |
| 0.01 | Moderate, common starting point |
| 0.001 | Gentle, smaller steps |

If the learning rate is too high, training can become unstable. If it is too low, training can become painfully slow.

An epoch is one full pass through the entire training dataset.

Suppose your training dataset has 100 examples. If the model sees all 100 examples once, that is 1 epoch. If it sees all 100 examples again, that is 2 epochs. So epochs = 100 means the model goes through the full training data 100 times.

A good beginner analogy is flashcards. If you have 20 flashcards and review all 20 once, that is one pass. Review all 20 again, that is another pass. In ML, each full pass is called an epoch.

Epoch = one complete pass through all training examples

the simple definition

Why do we need multiple epochs? Because the model usually does not learn everything from one pass. It often needs to see the same data many times to slowly improve its weights.
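The structure of "one pass per epoch" can be sketched in a few lines. The toy dataset and the empty inner loop are purely illustrative:

```python
# Sketch: what "epochs" means in a training loop (structure only).
dataset = [(1, 2), (2, 4), (3, 6)]  # toy (input, target) pairs
epochs = 3

passes_seen = 0
for epoch in range(epochs):
    for x, y in dataset:   # one full pass over every example
        pass               # a real loop would update weights here
    passes_seen += 1

print(passes_seen)  # 3: the model saw the full dataset 3 times
```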

Too few vs too many epochs

Too few epochs can mean the model has not learned enough. Too many epochs can mean it starts memorizing the training data too closely instead of learning general patterns.

When the model has not learned enough, that is called underfitting. When it memorizes the training data too much and performs poorly on new data, that is called overfitting.

| Situation | What it means |
| --- | --- |
| Underfitting | The model has not learned the pattern well enough |
| Overfitting | The model learned the training data too specifically and may fail on new unseen data |

The batch size is how many training examples the model processes before it updates the weights.

Suppose you have 100 training examples and batch_size = 10. That means the model looks at 10 examples, computes how wrong it was on those 10, updates the weights, then moves to the next 10.

Each small group of examples is called a batch. So if you have 100 examples and a batch size of 10, you will have 10 batches in one epoch.

Why not use all examples at once every time? Sometimes you can, but smaller groups are often more practical. They use less memory and allow the model to update more often.
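The "100 examples, batches of 10" arithmetic above can be checked with a small list-slicing sketch (illustrative only):

```python
# Sketch: splitting 100 examples into batches of 10.
examples = list(range(100))  # stand-ins for 100 training examples
batch_size = 10

batches = [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]
print(len(batches))      # 10 batches per epoch
print(len(batches[0]))   # each batch holds 10 examples
```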

You will also hear the word gradient here. A gradient is information that tells the model which direction to change the weights, and roughly how strongly to change them.

What does "smoother gradients" mean?

With bigger batches, the gradient is based on more examples, so it is usually more stable and less noisy. With smaller batches, updates can bounce around more because they are based on fewer examples.

| Batch size | Common effect |
| --- | --- |
| Small batch | Faster updates, more noise, less memory needed |
| Large batch | Smoother updates, more stability, more memory needed |

A simple analogy: asking 2 people for feedback on a product gives a noisy opinion. Asking 200 people gives a more stable average. Small batches are like asking a few people. Large batches are like asking many people.
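The feedback analogy can be simulated: averaging many noisy measurements varies less from trial to trial than averaging a few. This is a pure-Python illustration, not a real training loop:

```python
# Sketch: why bigger batches feel "smoother". Averaging more noisy
# measurements gives a more stable result.
import random

random.seed(0)

def noisy_average(n):
    # average of n noisy measurements of a true value of 5.0
    return sum(5.0 + random.gauss(0, 1) for _ in range(n)) / n

small_batches = [noisy_average(2) for _ in range(1000)]    # like a batch of 2
large_batches = [noisy_average(200) for _ in range(1000)]  # like a batch of 200

def spread(values):
    # variance: how much the values bounce around their mean
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(spread(small_batches) > spread(large_batches))  # True: big batches vary less
```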

L2 regularization adds a penalty when the model's weights become too large.

To understand why that matters, remember overfitting: sometimes a model becomes too eager to match the training data exactly. One sign of this can be very large weights. Large weights can make the model too sensitive, so tiny input changes produce very large output changes.

Regularization means adding a rule that says: "fit the data, but also try to stay simple." In L2 regularization, staying simple usually means preferring smaller weights.

L2 regularization = penalty on large weights

the simple intuition

The value l2_lambda controls how strong that penalty is. A small l2_lambda means a weak penalty. A large l2_lambda means a strong penalty.
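In code, the penalty is usually the sum of squared weights scaled by `l2_lambda`, added to whatever loss the model already has. This sketch uses made-up numbers to show the effect:

```python
# Sketch: adding an L2 penalty to a loss. Names and numbers are illustrative.
def l2_penalized_loss(base_loss, weights, l2_lambda):
    penalty = l2_lambda * sum(w ** 2 for w in weights)  # large weights cost more
    return base_loss + penalty

weights = [3.0, -4.0]                          # sum of squares = 25
print(l2_penalized_loss(1.0, weights, 0.0))    # 1.0: no penalty at all
print(l2_penalized_loss(1.0, weights, 0.001))  # 1.025: a small extra cost
```

Because the penalty grows with the square of each weight, the model is pushed toward keeping weights small unless larger ones genuinely reduce the base loss.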

What if the penalty is too strong?

If `l2_lambda` is too high, the model may be forced to keep weights too small and become too simple to learn the real pattern. That can lead to underfitting.

A beginner analogy: imagine packing a bag for school. You want enough things to do the job, but not so many that the bag becomes heavy and messy. L2 regularization is like a rule that discourages carrying too much weight unless it is truly needed.

| L2 setting | Likely effect |
| --- | --- |
| Very low `l2_lambda` | Little protection against overfitting |
| Moderate `l2_lambda` | Helpful pressure toward simpler weights |
| Very high `l2_lambda` | Model may become too simple and underfit |

A seed is a starting number used to control randomness in a program.

Machine learning often involves randomness. For example, the model's starting weights may be random. The training examples may be shuffled randomly. Some algorithms may randomly sample data during training.

If you do not fix the seed, two runs of the same code can produce slightly different results. If you do fix the seed, the results become much more repeatable.

Same seed = same randomness pattern

why seeds matter

This matters a lot for debugging and reproducibility. Debugging means finding out why something is wrong. Reproducibility means being able to run the same experiment again and get the same result.

Why beginners should always care about seed

If your results change every run, it becomes much harder to know whether your code changed something important or randomness changed the result. Fixing the seed removes one source of confusion.

A simple analogy is shuffling a deck of cards. Without a seed, every shuffle is different. With a fixed seed, you can make the shuffle happen in the same way every time.
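The card-shuffling analogy maps directly to code. Here a seeded random generator shuffles a toy "deck" the same way every time:

```python
# Sketch: fixing a seed makes a shuffle repeatable.
import random

def shuffled_deck(seed):
    rng = random.Random(seed)  # a random generator fixed by the seed
    deck = list(range(10))     # a toy 10-card deck
    rng.shuffle(deck)
    return deck

print(shuffled_deck(42) == shuffled_deck(42))  # True: same seed, same order
```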

Imagine we are training a model to predict whether a student will pass an exam.

  • lr = 0.01 means the model changes its weights with moderately small steps
  • epochs = 100 means the model sees the full training dataset 100 times
  • batch_size = 32 means it processes 32 examples before each weight update
  • l2_lambda = 0.001 means it applies a small penalty to very large weights
  • seed = 42 means the random parts of training are made repeatable

None of these numbers are magic by themselves. They are settings you tune based on the problem, the dataset, and the model. But understanding what each one does is the first step toward making good choices.
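To see all five settings working together, here is a toy end-to-end loop that fits `y = 2x`. It is a sketch, not production code, and the numbers are scaled down (10 examples, batches of 4) so the toy dataset makes sense:

```python
# A toy training loop tying all five hyperparameters together (illustrative).
import random

lr, epochs, batch_size, l2_lambda, seed = 0.01, 100, 4, 0.001, 42

random.seed(seed)                                # seed: repeatable runs
data = [(float(x), 2.0 * x) for x in range(10)]  # toy data: y = 2x
w = random.random()                              # random starting weight

for _ in range(epochs):                          # epochs: full passes over the data
    random.shuffle(data)
    for i in range(0, len(data), batch_size):    # batch_size: examples per update
        batch = data[i:i + batch_size]
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        grad += 2 * l2_lambda * w                # l2_lambda: penalty on large weights
        w -= lr * grad                           # lr: step size

print(round(w, 2))  # close to 2.0, the true slope
```

Try changing `lr` to something much larger or `epochs` to something much smaller and watch the final weight drift away from 2.0; that hands-on poking is the fastest way to build intuition.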

| Hyperparameter | Simple meaning | Why it matters |
| --- | --- | --- |
| `lr` | How big each learning step is | Too big can overshoot, too small can be slow |
| `epochs` | How many full passes through the dataset | Controls how long the model keeps practicing on the data |
| `batch_size` | How many examples are used before each update | Affects stability, speed, and memory use |
| `l2_lambda` | How strong the penalty on large weights is | Helps reduce overfitting |
| `seed` | The number that controls randomness | Helps make experiments repeatable |

Think of training like practicing basketball shots.

  • Learning rate = how much you change your shooting style after each miss
  • Epochs = how many full practice rounds you do
  • Batch size = how many shots you watch before deciding what to adjust
  • L2 regularization = avoiding wild, extreme movements that only work for a few cases
  • Seed = making the practice setup repeatable so you can compare sessions fairly

These are some of the most common machine learning basics you will see in tutorials, research code, and production systems. Once these ideas click, many training loops stop looking mysterious.

  1. Hyperparameters are settings chosen before training. The model then learns within those rules.
  2. Learning rate controls step size. Big steps can be unstable; tiny steps can be slow.
  3. Epochs tell you how many times the model sees the full training data.
  4. Batch size controls how many examples are used before each update.
  5. L2 regularization helps prevent overfitting by discouraging very large weights.
  6. Seed helps make experiments repeatable, which is critical for debugging and fair comparison.