
What is Kernel Function | Fully Explained

Kernel Function

In this post, we'll look at what exactly a kernel function is. A kernel function corresponds to mapping n-dimensional inputs into an m-dimensional space, where m is usually much higher than n, and it gives the dot product in that higher-dimensional space efficiently. Sometimes it even lets us do calculations in an infinite-dimensional space without ever constructing the infinite-dimensional vectors.
Mathematical definition:
K(x, y) = <f(x), f(y)>. Here K is the kernel function and x, y are n-dimensional inputs. f is a map from n-dimensional space to m-dimensional space, and <x, y> denotes the dot product. Usually m is much larger than n.
Intuition:
Normally, calculating <f(x), f(y)> requires us to compute f(x) and f(y) first and then take the dot product. Both steps can be quite expensive because they involve manipulations in m-dimensional space, where m can be a large number. But after all the trouble of going to the high-dimensional space, the result of the dot product is just a scalar: we come back to one-dimensional space again! So the question is: do we really need to go through all this trouble to get one number? Do we really have to go to the m-dimensional space? The answer is no, if you find a clever kernel.
Simple Example:
Let x = (x1, x2, x3) and y = (y1, y2, y3). Then for the function f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3), the kernel is K(x, y) = (<x, y>)^2.
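Why this particular f gives the squared dot product can be checked in one line. The derivation below is a short sketch using the same x, y and f as above; the index names i and j are introduced here only for illustration.

```latex
\langle f(x), f(y) \rangle
  = \sum_{i=1}^{3} \sum_{j=1}^{3} (x_i x_j)(y_i y_j)
  = \Big( \sum_{i=1}^{3} x_i y_i \Big) \Big( \sum_{j=1}^{3} x_j y_j \Big)
  = \langle x, y \rangle^{2}
```

The double sum over all nine products factors into two copies of the ordinary 3-dimensional dot product, which is why the 9-dimensional vectors never have to be built.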
Let's plug in some numbers to make this more intuitive: suppose x = (1, 2, 3); y = (4, 5, 6). Then:
f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
<f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
That is a lot of algebra, mainly because f is a mapping from 3-dimensional to 9-dimensional space.
Now let us use the kernel instead:
K(x, y) = (4 + 10 + 18 ) ^2 = 32^2 = 1024
Same result, but this calculation is so much easier.
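The two routes above can be compared in a few lines of Python. This is a minimal sketch, not a library API: the helper names feature_map, dot and kernel are made up for this illustration, and only the standard library is used.

```python
from itertools import product


def feature_map(v):
    """Explicit map f: every pairwise product v_i * v_j (n -> n^2 dimensions)."""
    return [a * b for a, b in product(v, repeat=2)]


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def kernel(x, y):
    """Kernel shortcut: square the dot product taken in the original space."""
    return dot(x, y) ** 2


x = (1, 2, 3)
y = (4, 5, 6)

explicit = dot(feature_map(x), feature_map(y))  # goes through 9 dimensions
shortcut = kernel(x, y)                         # stays in 3 dimensions

print(explicit, shortcut)  # 1024 1024
```

For n-dimensional inputs the explicit route builds two vectors of length n^2, while the kernel route needs only one n-term dot product and one multiplication.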
Use Of Kernel Function:
A kernel function is used to find a non-linear classifier or a non-linear regression curve. The idea behind using a kernel function is: "A linear classifier in a higher dimension works as a non-linear classifier in a lower dimension."
Example:
Suppose no straight line in 2D fits all the points [x, y]. If we transform each point to [x^2, x, y], a plane in 3D may fit all the transformed points. Seen back in 2D, that same plane is a parabola, which is non-linear.
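The [x^2, x, y] idea can be sketched in code. The example below is only an illustration under stated assumptions: the plane (weights w and offset b) is picked by hand rather than learned, and the names lift and classify are hypothetical. The plane -x^2 + y = 0 in 3D is the parabola y = x^2 when viewed in 2D.

```python
def lift(point):
    """Map a 2D point [x, y] to the 3D point [x^2, x, y]."""
    x, y = point
    return (x * x, x, y)


# Hand-picked plane w . z + b = 0 in the lifted 3D space (an assumption
# for illustration; a real classifier would learn w and b from data).
w = (-1.0, 0.0, 1.0)
b = 0.0


def classify(point):
    """Linear rule in 3D: +1 on one side of the plane, -1 on the other."""
    z = lift(point)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1


# Points above the parabola y = x^2 come first, points below it second.
points = [(-2, 5), (0, 1), (2, 5), (-2, 3), (0, -1), (2, 3)]
labels = [classify(p) for p in points]

print(labels)  # [1, 1, 1, -1, -1, -1]
```

The rule is linear in the lifted coordinates, yet the boundary it draws among the original 2D points is curved.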
