
Gradient Descent Algorithm Fully Explained | Machine Learning

GRADIENT DESCENT ALGORITHM

Beginners in machine learning often run into very complex explanations of the Gradient Descent algorithm. Since it is one of the most important algorithms in machine learning, understanding it well matters. In this article we will try to explain Gradient Descent in an easy, brief, and clear way.

Gradient Descent is the most common optimization algorithm in machine learning and deep learning. It is a first-order optimization algorithm. This means it only takes into account the first derivative when performing the updates on the parameters.
If we search for the Gradient Descent algorithm, we usually find a 3D surface plot of a cost function.
Don't panic! Let me explain.
Here z is a cost function of x and y. We have to find the values of x and y for which the value of the cost function z is minimum.

If we visualize it split into two parts, (z vs. x for a fixed value of y) and (z vs. y for a fixed value of x), it looks easier.
 

The Gradient Descent algorithm provides an efficient way to find the optimum values of x and y (or of all the features on which the cost function depends) that minimize the value of the cost function z.
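As a concrete illustration (this particular function is our own choice, not one from the plots above), take the simple bowl-shaped cost z(x, y) = x² + y², whose minimum sits at (0, 0). Its partial derivatives are exactly the slopes Gradient Descent will need:

```python
# Hypothetical example cost function: z(x, y) = x**2 + y**2, minimum at (0, 0).
def z(x, y):
    return x**2 + y**2

# Partial derivatives of z: the two components of its gradient.
def dz_dx(x, y):
    return 2 * x

def dz_dy(x, y):
    return 2 * y

print(z(3.0, 4.0))      # 25.0
print(dz_dx(3.0, 4.0))  # 6.0
print(dz_dy(3.0, 4.0))  # 8.0
```

At the point (3, 4) both partial derivatives are positive, so moving x and y in the negative direction decreases z, which is precisely the intuition developed below.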

Points to Know Before Proceeding:

  • This is an iterative method. If you are not familiar with iterative methods, no need to panic: an iterative method starts with a pre-assumed value of a variable and repeatedly updates it, computing new values that get closer and closer to the desired one.
  • From any point, if we move in the direction opposite to the slope, we move toward the minimum of that region. This is the key idea behind the Gradient Descent algorithm. If we plot the two graphs above again with arrows marking this direction, it becomes clear.


From the graphs above, it is quite clear that if we move opposite to the slope from any point, we approach the minimum.
But what matters now is how much distance we cover in a single step: if we take too big a step in the opposite direction, we may jump over the minimum.
Fig: Jump Over minima

  • The step size is quantified by a variable called the learning rate (η). We choose a value of η such that, while iterating, we do not jump over the minimum. It is commonly taken as 0.001, though it can be changed depending on the given data set.

Finally, the iterating formula for the Gradient Descent algorithm:

(y)new = (y)old − η*(∂z/∂y)

(x)new = (x)old − η*(∂z/∂x)

In general:

(θ)new = (θ)old − η*∇z

Where θ = [θ1, θ2, ... , θn]

∇z = gradient of z with respect to θ = [∂z/∂θ1, ∂z/∂θ2, ... , ∂z/∂θn]
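The general update rule above can be sketched in a few lines. The cost z(θ) = Σ θᵢ², its gradient ∇z = 2θ, the starting point, the learning rate, and the iteration count are all hypothetical choices for illustration:

```python
import numpy as np

# Gradient of the hypothetical cost z(theta) = sum(theta_i**2); its minimum is theta = 0.
def gradient(theta):
    return 2 * theta

theta = np.array([3.0, -4.0])  # pre-assumed starting values
eta = 0.1                      # learning rate

for _ in range(100):
    theta = theta - eta * gradient(theta)  # (θ)new = (θ)old - η*∇z

print(theta)  # both components are now vanishingly close to 0
```

Each iteration moves every component of θ opposite to its partial derivative, so the vector slides downhill until the gradient (and hence the update) becomes negligible.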




The above is called the Gradient Descent algorithm. In the next post, we will discuss its implementation: we will try to understand Linear Regression (fitting a line to a given data set) using Gradient Descent.


If you have any doubt, ask us in the comments. You can also mail us at 4wallspace@gmail.com




