Linear Regression

Karthik
3 min read · Jul 5, 2020

Linear Regression is one of the simplest models in machine learning and applied statistics. It can be used to predict a variable (the target value) based on the input features.

Let’s say you would like to determine the price of a car using variables such as car width, engine type, etc. We can achieve this using linear regression. Let’s dive deep and understand this algorithm further.

THE EQUATION

ŷ = β₀ + β₁·x₁ + β₂·x₂ + … + βₙ·xₙ

Equation for Linear Regression

A linear regression model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias term (also called the intercept term), as shown in the equation above.

1. ŷ is the predicted value.

2. n is the number of features.

3. xₙ is the nth feature value.

4. βₙ is the nth model parameter (β₀ is the bias term).

To write this in vectorized form:

ŷ = βᵀ · x

where β is the model’s parameter vector (containing β₀ to βₙ) and x is the feature vector (containing x₀ to xₙ, with x₀ always equal to 1).
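To make this concrete, here is a minimal NumPy sketch of the weighted-sum prediction; the feature and beta values are made up purely for illustration:

```python
import numpy as np

# Made-up parameters and features, e.g. predicting a car price
# from two features such as car width and engine size.
beta = np.array([2.0, 0.5, 1.5])   # [beta_0 (bias), beta_1, beta_2]
x = np.array([1.0, 3.2, 1.8])      # [x_0 = 1, x_1, x_2]

# The prediction is the dot product of the parameter vector and the feature vector.
y_hat = beta @ x
print(y_hat)  # 2.0 + 0.5*3.2 + 1.5*1.8 = 6.3
```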

HOW DOES IT WORK?

Training a model means setting the parameters so that the model best fits the data we have. To do this, we must first have a measure that tells us how well or badly a model fits the data. The most common performance measure for regression is the Root Mean Squared Error (RMSE). To train a regression model, we have to find the coefficients (beta values) that minimize the Root Mean Squared Error, or equivalently the Mean Squared Error (MSE). This measure is also known as a cost function.

MSE(β) = (1/m) · Σ (βᵀ·x⁽ⁱ⁾ − y⁽ⁱ⁾)², summed over the m training examples, where x⁽ⁱ⁾ and y⁽ⁱ⁾ are the feature vector and target value of the i-th example.

MSE Cost Function
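As a rough sketch, the MSE cost can be computed like this in NumPy (assuming the design matrix X already contains a column of 1s for the bias term):

```python
import numpy as np

def mse_cost(X, y, beta):
    """MSE of the model with parameters beta on training data (X, y)."""
    predictions = X @ beta        # weighted sums for every training example
    errors = predictions - y      # residuals
    return np.mean(errors ** 2)   # average of the squared residuals
```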

HOW DO WE REDUCE THE COST FUNCTION ?

There are mainly two ways to minimize the cost function:

1. Closed form solution

2. Iterative form solution

CLOSED FORM SOLUTION

In the closed form solution, we find the minimum of the cost function by setting its first derivative to zero and solving for the parameters.

For linear regression, this gives a formula that computes the optimal parameters directly. This is known as the Normal Equation:

β̂ = (XᵀX)⁻¹ · Xᵀ · y

where X is the matrix of training features (with a column of 1s for the bias term) and y is the vector of target values.
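A minimal NumPy sketch of the Normal Equation on toy data (generated here as y = 4 + 3x + noise, so the estimated parameters should land close to 4 and 3):

```python
import numpy as np

rng = np.random.default_rng(42)
x = 2 * rng.random((100, 1))                    # 100 examples, 1 feature
y = 4 + 3 * x + rng.standard_normal((100, 1))   # targets with Gaussian noise

X = np.c_[np.ones((100, 1)), x]                 # prepend x_0 = 1 to every example
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y     # Normal Equation
print(beta_hat.ravel())                         # approximately [4., 3.]
```

In practice, a least-squares solver or np.linalg.pinv is usually preferred over explicitly inverting XᵀX, since the inverse can be numerically unstable.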

ITERATIVE FORM SOLUTION

In this case, we start from an initial guess for the parameters and iteratively move towards the minimum of the cost function.

Ex: Gradient descent.

What is Gradient Descent ?

  • Gradient descent is an iterative method of optimizing an objective function, in our case the cost function, by repeatedly taking a step in the direction of the negative of the gradient: β := β − η · ∇β MSE(β).
  • Here η is known as the learning rate, which controls how large a step we take towards the negative of the gradient. A small sketch of this update loop is shown after this list.
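A minimal sketch of batch gradient descent on the MSE cost (using the same X-with-a-column-of-1s convention as above; eta and n_iterations are illustrative defaults, not tuned values):

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, n_iterations=1000):
    """Batch gradient descent for linear regression on the MSE cost."""
    m, n = X.shape
    beta = np.zeros((n, 1))                         # start from all-zero parameters
    for _ in range(n_iterations):
        gradients = (2 / m) * X.T @ (X @ beta - y)  # gradient of the MSE w.r.t. beta
        beta = beta - eta * gradients               # step towards the negative gradient
    return beta
```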

What is the Gauss-Markov theorem?

The Gauss-Markov theorem tells us that if a certain set of assumptions is met, the ordinary least squares (OLS) estimator of the regression coefficients is the best linear unbiased estimator (BLUE). These assumptions can be considered the full ideal conditions for OLS.

Assumptions :

1. Random sampling: The data must have been randomly sampled from the population.

2. Non-collinearity: The regressors should not be perfectly correlated with each other.

3. Linearity: The model must be linear in the parameters being estimated.

4. Exogeneity: All independent variables are uncorrelated with the error term.

5. No autocorrelation: Observations of the error term are uncorrelated with each other.

6. Homoscedasticity: The error terms have constant variance.

This was a brief overview of linear regression. Cost functions, gradient descent, and solving regression problems using Python are explained in other posts.
