STOCHASTIC GRADIENT DESCENT
STOCHASTIC GRADIENT DESCENT is a more efficient alternative to GRADIENT DESCENT when we have to deal with BIG DATA. Where the data set is huge, STOCHASTIC GRADIENT DESCENT is the usual choice.
In our previous post, we already discussed GRADIENT DESCENT (Click Here) in detail. In this post, we will try to understand STOCHASTIC GRADIENT DESCENT. Both are almost the same; the only difference comes while iterating.
In Gradient Descent, we had four things:
- Feature Vector (X)
- Label (Y)
- Cost Function (J)
- Predicted Value (Yp)
θ represented the coefficient/weight vector for the feature vector, and
θ0 was the offset parameter.
Yp = θ·X + θ0
θnew = θold - η * (∂J/∂θ)
where η is the learning rate.
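To make this setup concrete, here is a minimal sketch in Python/NumPy. The names (predict, update_theta) and the use of NumPy arrays are illustrative assumptions of this sketch, not part of the original post:

```python
import numpy as np

# X: (n, d) feature matrix, Y: (n,) labels, theta: (d,) weights, theta0: scalar offset

def predict(theta, theta0, X):
    # Yp = theta . X + theta0, computed for every data point at once
    return X @ theta + theta0

def update_theta(theta, grad, eta):
    # theta_new = theta_old - eta * dJ/dtheta  (eta is the learning rate)
    return theta - eta * grad
```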
The single difference between Gradient Descent and Stochastic Gradient Descent comes while iterating:
In Gradient Descent,
we sum up the losses over all the given data points and take the average in our cost function, something like this:
J = (1/n) Σ Loss(Yi, Ypi)
∂J/∂θ = (1/n) ∂/∂θ (Σ Loss(Yi, Ypi))
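As a quick sketch of what this averaged gradient looks like in code, assuming a squared-error loss and NumPy arrays (the post does not fix a particular Loss, so both are purely illustrative choices):

```python
import numpy as np

def batch_gradient(theta, theta0, X, Y):
    """dJ/dtheta for J = (1/n) * sum of 0.5 * (Yp_i - Y_i)^2 over ALL n points."""
    n = len(Y)
    Yp = X @ theta + theta0          # predictions for every point
    error = Yp - Y                   # dLoss/dYp for the squared-error loss
    grad_theta = X.T @ error / n     # averaged over all n points
    grad_theta0 = error.sum() / n
    return grad_theta, grad_theta0
```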
So in every iteration of Gradient Descent, calculating ∂J/∂θ means going over all n points. Stochastic Gradient Descent makes this cheaper: in each iteration we pick an index i ∈ {1, 2, 3, ..., n} at random and calculate the gradient from that single point only, with no sum and no average, and then update θ:
∂J/∂θ ≈ ∂/∂θ (Loss(Yi, Ypi))
θnew = θold - η * (∂J/∂θ)
So, in the Stochastic Gradient Descent method, every iteration updates θ according to a single, randomly chosen point only; that is what 'stochastic' (random) refers to.
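A minimal sketch of the full SGD loop, again assuming the same illustrative squared-error loss and NumPy setup (names such as sgd, eta, and n_iters are assumptions of this sketch):

```python
import numpy as np

def sgd(X, Y, eta=0.01, n_iters=1000, seed=0):
    """Stochastic Gradient Descent: each iteration updates theta from ONE random point."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta, theta0 = np.zeros(d), 0.0
    for _ in range(n_iters):
        i = rng.integers(n)                     # pick i in {0, ..., n-1} at random
        error = (X[i] @ theta + theta0) - Y[i]  # loss gradient from this single point
        theta -= eta * error * X[i]             # theta_new = theta_old - eta * dJ/dtheta
        theta0 -= eta * error
    return theta, theta0
```

Calling sgd(X, Y) on a data set with X of shape (n, d) and Y of shape (n,) would return the learned θ and θ0; because only one point is read per step, each iteration stays cheap even when n is very large.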