Hyperparameter Tuning in Machine Learning

Hyperparameters are the knobs that programmers tweak in machine learning algorithms. Unlike model parameters such as weights, they are set before training rather than learned from the data. Most machine learning programmers spend a fair amount of time tuning them.

This tutorial covers the following hyperparameters: learning rate, number of training cycles (epochs), and training batch size.

Learning Rate (α)

Gradient descent takes small steps to reduce the loss of a model. Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (also sometimes called step size) to determine the next point. For example, if the gradient magnitude is 2.5 and the learning rate is 0.01, then the gradient descent algorithm will pick the next point 0.025 away from the previous point.
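The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the function name `gradient_step` and the starting point of 1.0 are made up for illustration, and the update assumes the usual rule of moving against the gradient.

```python
def gradient_step(point, gradient, learning_rate):
    """Return the next point after one gradient descent update."""
    # Move against the gradient, scaled by the learning rate.
    return point - learning_rate * gradient

learning_rate = 0.01
gradient = 2.5
current_point = 1.0  # hypothetical starting point, chosen for illustration

next_point = gradient_step(current_point, gradient, learning_rate)
step_size = abs(next_point - current_point)
print(f"step size: {step_size:.3f}")  # prints "step size: 0.025"
```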

If you pick a learning rate that is too small, learning will take too long:

[Figure: learning rate too small]

Conversely, if you specify a learning rate that is too large, the next point will perpetually bounce haphazardly across the bottom of the well like a quantum mechanics experiment gone horribly wrong:

[Figure: learning rate too large]

When the learning rate is just right, gradient descent reaches the bottom of the well in relatively few steps:

[Figure: learning rate just right]
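The three cases can be reproduced on a toy problem. The sketch below assumes a made-up loss f(x) = x**2 (minimum at x = 0), a starting point of 3.0, and 20 steps. The three learning rates are picked for this toy loss only and are not recommendations for real models.

```python
def gradient(x):
    # Derivative of the toy loss f(x) = x**2.
    return 2.0 * x

def run_gradient_descent(learning_rate, start=3.0, steps=20):
    """Run a fixed number of gradient descent steps and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x

# Too small, about right, and too large for this particular loss.
results = {lr: run_gradient_descent(lr) for lr in (0.01, 0.4, 1.0)}
for lr, final_x in results.items():
    print(f"learning rate {lr}: x after 20 steps = {final_x:.6f}")
```

On this toy loss, a learning rate of 0.01 is still far from the minimum after 20 steps, 0.4 gets very close to 0, and 1.0 bounces between 3 and -3 without ever settling.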

Number of Training Cycles (Epochs)

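The number of epochs is another hyperparameter that you can tweak while training a model. One epoch is one full pass over the training data. Too few epochs can leave the loss well above its minimum, so experiment with the number of training cycles until the loss from gradient descent stops improving.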
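To see how the number of epochs affects the loss, here is a minimal sketch that fits a single weight w so that w * x approximates y. The data, the `train` function, and the learning rate of 0.01 are all assumptions made for illustration. Each epoch here does one gradient descent update over the whole data set.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x with gradient descent and return w and the loss per epoch."""
    w = 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
        # Mean squared error after this epoch's update.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        losses.append(loss)
    return w, losses

# Toy data generated from y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for epochs in (1, 10, 100):
    w, losses = train(xs, ys, learning_rate=0.01, epochs=epochs)
    print(f"epochs={epochs:3d}  w={w:.4f}  final loss={losses[-1]:.6f}")
```

On this toy data the final loss shrinks as the number of epochs grows, and w moves toward 2. Real models usually need the loss to be monitored on held-out data as well, which this sketch leaves out.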

Training Batch Size

You can also adjust the number of training examples used for each gradient descent step. For example, if you have 1000 training examples, you can choose a batch size of 100, so that each step uses 100 randomly picked examples and one epoch (a full pass over the data) takes 10 steps.
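One common way to draw batches is to shuffle the data once per epoch and then slice it into chunks of the chosen size. The sketch below assumes that convention. The helper name `iterate_minibatches` is hypothetical, and the integers 0 to 999 stand in for real training examples.

```python
import random

def iterate_minibatches(examples, batch_size, rng):
    """Yield shuffled batches that together cover every example once."""
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [examples[i] for i in indices[start:start + batch_size]]

examples = list(range(1000))  # placeholder for 1000 training examples
rng = random.Random(0)        # fixed seed so the shuffle is repeatable

batches = list(iterate_minibatches(examples, batch_size=100, rng=rng))
print(f"{len(batches)} batches of {len(batches[0])} examples each")
```

With 1000 examples and a batch size of 100, this gives 10 batches. A training loop would run one gradient descent update per batch and call the helper again at the start of each new epoch.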