In my last blog, I explained Linear Regression as follows:-
Y = h(X)
And we mentioned that in the case of Linear Regression, both X and Y are known for the training set, and we train an algorithm to learn h so that we can predict Y accurately for a new sample of X from our test set.
The "Cost Function", also called the "Error Function", in Linear Regression is a measure of the error between the value our trained algorithm predicts for a sample and that sample's actual value. Hence the term "cost" comes into place. In simple words, it is based on the difference:
(Predicted Value of Test Sample - Actual Value of Test Sample)
To explain this concept further, we consider a set of training data represented by the following notation:-
Problem:- We want to predict the sales of a product based on the previous sales pattern. Some of the parameters we can consider for this problem would be:-
Past Sales.
Economic Trends.
Inflation.
Competitor rates.
Sales pattern based on ad campaigns and other modes of publicity.
All the above will form the different variables of our hypothesis.
Formally, the cost function is half the mean of the squared differences between the predicted values and the actual values over the entire training set:

J(theta-0, theta-1) = (1/2m) * sum over i = 1..m of (h(x_i) - y_i)^2

where m is the number of training samples. This equation is known as the "Cost Function" or the "Squared Error Cost Function".
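To make this concrete, here is a minimal sketch in Python of the squared error cost function. The sales figures here are made-up toy values purely for illustration:

```python
import numpy as np

def predict(x, theta0, theta1):
    # Hypothesis for univariate linear regression: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(x, y, theta0, theta1):
    # Squared error cost: J = (1 / 2m) * sum((h(x_i) - y_i)^2)
    m = len(x)
    errors = predict(x, theta0, theta1) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data (illustrative only): y happens to equal exactly 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

print(cost(x, y, 0.0, 2.0))  # perfect fit, so the cost is 0.0
print(cost(x, y, 0.0, 1.0))  # a worse line gives a larger cost: 3.75
```

Notice that a hypothesis matching the data exactly drives the cost to zero, while any other line is penalized by the squared differences.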
For simplicity, let's consider this hypothesis to be a single-variable one. This would be called "Univariate Linear Regression". Hence our hypothesis will look like:-

h(x) = theta-0 + theta-1 * x
A simplified version of the above arises if we set theta-0 = 0: the hypothesis becomes a straight line passing through the origin, as in the figure below. This forms the "simplified version of the cost function (Intuition-1)".
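As a sketch of Intuition-1 (toy data assumed), sweeping theta-1 with theta-0 fixed at 0 shows the cost tracing out a bowl shape, smallest where the line fits the data best:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])  # points that lie exactly on the line y = x

def cost(theta1):
    # J(theta1) with theta0 = 0: a parabola in theta1
    m = len(x)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"theta1 = {t}: J = {cost(t):.3f}")
# J is smallest at theta1 = 1.0, where the line passes through every point
```

Plotting J against theta-1 would give exactly the U-shaped curve usually drawn for this intuition.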
We have to choose the values of the theta parameters so that this squared difference between predicted and actual values is at its minimum.
The second case is when we keep both parameters, theta-0 and theta-1, non-zero. This forms the "simplified version of the cost function (Intuition-2)".
Below is a graphical representation of how the cost varies across different pairs of values for theta-0 and theta-1. Evaluating J for every pair produces a bowl-shaped surface (often drawn as a contour plot) similar to the one below. It is an approximation for representation.
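The surface can be sketched numerically as well. Here is a minimal example (toy data assumed) that evaluates J over a small grid of (theta-0, theta-1) pairs and locates the bottom of the bowl:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 4.0])  # generated by y = 1 + 1 * x

def cost(theta0, theta1):
    m = len(x)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

# Evaluate J over a grid of parameter pairs
theta0_vals = np.linspace(0.0, 2.0, 5)  # [0, 0.5, 1, 1.5, 2]
theta1_vals = np.linspace(0.0, 2.0, 5)
J = np.array([[cost(t0, t1) for t1 in theta1_vals] for t0 in theta0_vals])

# The surface bottoms out at the pair that generated the data
i, j = np.unravel_index(np.argmin(J), J.shape)
print(theta0_vals[i], theta1_vals[j])  # 1.0 1.0
```

Feeding the grid `J` to a 3D surface or contour plot would reproduce the kind of figure shown above.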
The Cost Function is at the heart of one of the most significant algorithms in Linear Regression, the Gradient Descent algorithm. The goal of Gradient Descent is to keep changing the values of theta-0 and theta-1 so as to minimize the cost function. Understanding this core concept of the cost function forms the basis of the very first Machine Learning algorithm that we will learn about in the next blog.
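As a small preview of the next blog (a minimal sketch, not the full treatment), the gradient descent update repeatedly nudges theta-0 and theta-1 in the direction that decreases J. The learning rate, iteration count, and data below are illustrative assumptions:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated by y = 1 + 2 * x

theta0, theta1 = 0.0, 0.0
alpha = 0.05  # learning rate (chosen for illustration)
m = len(x)

for _ in range(5000):
    errors = theta0 + theta1 * x - y
    # Partial derivatives of J with respect to each parameter
    grad0 = np.sum(errors) / m
    grad1 = np.sum(errors * x) / m
    # Simultaneous update of both parameters
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(round(theta0, 2), round(theta1, 2))  # approaches 1.0 and 2.0
```

The key point is that both parameters are updated together using gradients of the same cost function we defined above.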
Till then keep visiting, and let me know your thoughts on what I can improve and explain better so it helps you. You can connect with me on Priyadarshani Pandey | LinkedIn, or email me at priyadarshani.pandey@gmail.com