STOCHASTIC GRADIENT DESCENT
STOCHASTIC GRADIENT DESCENT is a more efficient alternative to GRADIENT DESCENT when we have to deal with BIG DATA. Where the data set is huge, STOCHASTIC GRADIENT DESCENT is the usual choice.
In our previous post, we already discussed GRADIENT DESCENT (Click Here) in detail. In this post, we will try to understand STOCHASTIC GRADIENT DESCENT. Both are almost the same; the only difference comes while iterating.
In Gradient Descent, we had four things:
- Feature Vector (X)
- Label (Y)
- Cost Function (J)
- Predicted Value (Yp)
θ represented the coefficient/weight vector for the feature vector, and
θ0 was the offset parameter.
Yp = θ·X + θ0
θnew = θold - η * (∂J/∂θ)
where η is the learning rate.
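To make this setup concrete, here is a minimal sketch in Python/NumPy. The names (predict, update_theta) and the use of NumPy arrays are illustrative assumptions of this sketch, not part of the original post:

```python
import numpy as np

# X: (n, d) feature matrix, Y: (n,) labels, theta: (d,) weights, theta0: scalar offset

def predict(theta, theta0, X):
    # Yp = theta . X + theta0, computed for every data point at once
    return X @ theta + theta0

def update_theta(theta, grad, eta):
    # theta_new = theta_old - eta * dJ/dtheta  (eta is the learning rate)
    return theta - eta * grad
```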
The single difference between Gradient Descent and Stochastic Gradient Descent comes while iterating:
In Gradient Descent,
we sum up the losses over all the given data points and take the average in our cost function, something like this:
J = (1/n) Σ Loss(Yi, Ypi)
∂J/∂θ = (1/n) ∂/∂θ (Σ Loss(Yi, Ypi))
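As a quick sketch of what this averaged gradient looks like in code, assuming a squared-error loss and NumPy arrays (the post does not fix a particular Loss, so both are purely illustrative choices):

```python
import numpy as np

def batch_gradient(theta, theta0, X, Y):
    """dJ/dtheta for J = (1/n) * sum of 0.5 * (Yp_i - Y_i)^2 over ALL n points."""
    n = len(Y)
    Yp = X @ theta + theta0          # predictions for every point
    error = Yp - Y                   # dLoss/dYp for the squared-error loss
    grad_theta = X.T @ error / n     # averaged over all n points
    grad_theta0 = error.sum() / n
    return grad_theta, grad_theta0
```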
So in every iteration of Gradient Descent, calculating ∂J/∂θ means going over all n points. Stochastic Gradient Descent makes this cheaper: in each iteration we pick an index i ∈ {1, 2, 3, ..., n} at random and calculate the gradient from that single point only, with no sum and no average, and then update θ:
∂J/∂θ ≈ ∂/∂θ (Loss(Yi, Ypi))
θnew = θold - η * (∂J/∂θ)
So, in the Stochastic Gradient Descent method, every iteration updates θ according to a single, randomly chosen point only; that is what 'stochastic' (random) refers to.
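A minimal sketch of the full SGD loop, again assuming the same illustrative squared-error loss and NumPy setup (names such as sgd, eta, and n_iters are assumptions of this sketch):

```python
import numpy as np

def sgd(X, Y, eta=0.01, n_iters=1000, seed=0):
    """Stochastic Gradient Descent: each iteration updates theta from ONE random point."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta, theta0 = np.zeros(d), 0.0
    for _ in range(n_iters):
        i = rng.integers(n)                     # pick i in {0, ..., n-1} at random
        error = (X[i] @ theta + theta0) - Y[i]  # loss gradient from this single point
        theta -= eta * error * X[i]             # theta_new = theta_old - eta * dJ/dtheta
        theta0 -= eta * error
    return theta, theta0
```

Calling sgd(X, Y) on a data set with X of shape (n, d) and Y of shape (n,) would return the learned θ and θ0; because only one point is read per step, each iteration stays cheap even when n is very large.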