AI for Product Managers: Linear Regression Residuals
In this session of AI for Product Managers, let's explore residuals in linear regression.
What is a Residual?
A residual is the difference between the observed value of the dependent variable and the value the model predicts for it (residual = observed value - predicted value). There are a few assumptions regarding the residuals of a linear regression; let's discuss them.
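The post itself shows no code, so here is a minimal sketch of computing residuals, assuming NumPy and a small made-up dataset (the values of `x` and `y` are illustrative only). `np.polyfit` with `deg=1` fits a least-squares straight line; each residual is the observed `y` minus the predicted `y_hat`.

```python
import numpy as np

# Hypothetical example data: x is the independent variable, y the observed dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line y_hat = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual = observed value minus predicted value.
residuals = y - y_hat

for xi, yi, pi, ri in zip(x, y, y_hat, residuals):
    print(f"x={xi:.0f}  observed={yi:.2f}  predicted={pi:.2f}  residual={ri:+.2f}")
```

With an intercept in the model, ordinary least-squares residuals always sum to zero (up to floating-point error), so a positive residual means the model under-predicted that point and a negative one means it over-predicted.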
Assumptions of Residuals
There are four main assumptions regarding residuals:
Linear relationship between the independent variable(s) and Y (dependent variable)
Error terms are normally distributed, with mean zero
Error terms are independent of each other
Homoscedasticity
Linear Relationship
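The relationship between the independent variable(s) and the dependent variable Y must be linear. To check this, we draw a scatter plot of the residuals against the predicted (fitted) values. If the points scatter randomly around zero with no curve or other systematic pattern, the assumption is satisfied; a U-shape or similar curve suggests the true relationship is not linear.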
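The residuals-versus-predicted plot is normally inspected by eye. As a plot-free illustration, this sketch (assuming NumPy; the helper `binned_mean_residuals` and the simulated datasets are hypothetical) sorts observations by predicted value, splits them into bins, and averages the residuals in each bin. For linear data the bin means stay near zero; when a straight line is fitted to curved data they follow a U shape (positive, negative, positive).

```python
import numpy as np

rng = np.random.default_rng(0)

def binned_mean_residuals(x, y, n_bins=5):
    """Fit a least-squares line and return the mean residual within each bin of fitted values."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    residuals = y - fitted
    # Sort observations by fitted value and split them into roughly equal-sized bins.
    order = np.argsort(fitted)
    bins = np.array_split(order, n_bins)
    return np.array([residuals[idx].mean() for idx in bins])

x = np.linspace(0, 10, 200)
noise = rng.normal(0, 1, size=x.size)

y_linear = 3 * x + 2 + noise       # truly linear relationship
y_curved = 0.5 * x**2 + 2 + noise  # curved relationship, still fitted with a straight line

linear_bins = binned_mean_residuals(x, y_linear)
curved_bins = binned_mean_residuals(x, y_curved)
print("linear data:", np.round(linear_bins, 2))
print("curved data:", np.round(curved_bins, 2))
```

Binning is only a stand-in for the scatter plot; in practice you would still look at the plot itself.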
Normal Distribution
The residuals must be normally distributed with a mean of 0. We draw a distribution plot (histogram) of the residuals. If it is roughly bell-shaped, centered at zero, and not skewed, the assumption is satisfied.
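A quick numerical companion to the distribution plot, assuming NumPy and simulated data: it fits a line and reports the mean and skewness of the residuals. The helper `residual_summary` is hypothetical. With an intercept, least-squares residuals have mean zero by construction, so the skewness (near 0 for a symmetric, normal-like shape) is the more informative number here.

```python
import numpy as np

def residual_summary(residuals):
    """Return the mean and the moment-based skewness of the residuals."""
    r = np.asarray(residuals, dtype=float)
    centered = r - r.mean()
    # Skewness: average cubed deviation divided by the cubed standard deviation.
    skew = np.mean(centered ** 3) / np.std(r) ** 3
    return r.mean(), skew

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)

datasets = {
    "normal errors": 2 * x + 1 + rng.normal(0, 1, x.size),            # symmetric errors
    "skewed errors": 2 * x + 1 + rng.exponential(1.0, x.size) - 1.0,  # right-skewed errors
}

results = {}
for label, y in datasets.items():
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    results[label] = residual_summary(residuals)
    mean, skew = results[label]
    print(f"{label}: mean={mean:.3f}, skewness={skew:.2f}")
```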
Independence
The residuals should be independent of each other: knowing one residual should tell us nothing about the next. We plot the residuals in the order the observations were collected (for example, over time). If there is no trend and no long runs of similar values, the assumption is satisfied.
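One common numerical check for independence of residuals in observation order is the Durbin-Watson statistic, which the original post does not mention: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation; values well below 2 suggest positively correlated residuals. This sketch assumes NumPy and simulated data; the autocorrelated errors come from a hypothetical AR(1) process with coefficient 0.8.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(r) ** 2) / np.sum(r ** 2)

rng = np.random.default_rng(2)
n = 300
t = np.arange(n, dtype=float)  # observation order, e.g. time

independent = rng.normal(0, 1, n)  # independent errors

# Autocorrelated errors: each error keeps 80% of the previous one (an AR(1) process).
correlated = np.zeros(n)
for i in range(1, n):
    correlated[i] = 0.8 * correlated[i - 1] + rng.normal(0, 1)

results = {}
for label, errors in [("independent", independent), ("autocorrelated", correlated)]:
    y = 1.5 * t + 4 + errors
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)  # kept in observation order
    results[label] = durbin_watson(residuals)
    print(f"{label}: Durbin-Watson = {results[label]:.2f}")
```

The statistic only captures correlation between neighbouring residuals, so a plot of residuals in observation order is still worth a look.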
Homoscedasticity
Error terms must have constant variance. We draw a scatter plot with the predicted values on the horizontal (x) axis and the standardized residuals on the vertical (y) axis. If the residuals do not fan out (or narrow) as the predicted values increase, the equal-variance assumption is met.
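A rough numerical stand-in for eyeballing the fan shape, assuming NumPy and simulated data: sort the observations by predicted value, split them into a lower and an upper half, and compare the standard deviation of the residuals in each half. This is a crude heuristic in the spirit of the Goldfeld-Quandt test, not a formal test, and the helper `spread_ratio` is hypothetical. A ratio near 1 is consistent with constant variance; a ratio well above 1 indicates residuals fanning out.

```python
import numpy as np

def spread_ratio(fitted, residuals):
    """Std. dev. of residuals in the upper half of fitted values divided by that in the lower half."""
    order = np.argsort(fitted)
    lower, upper = np.array_split(residuals[order], 2)
    return upper.std() / lower.std()

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 400)

datasets = {
    "constant variance": 3 * x + 5 + rng.normal(0, 2, x.size),  # same spread everywhere
    "fanning out": 3 * x + 5 + rng.normal(0, 0.5 * x, x.size),  # spread grows with x
}

results = {}
for label, y in datasets.items():
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    residuals = y - fitted
    results[label] = spread_ratio(fitted, residuals)
    print(f"{label}: spread ratio = {results[label]:.2f}")
```

Standardizing the residuals would not change this ratio, since both halves are divided by the same constant.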
This is a brief overview of linear regression residuals. Cost functions, gradient descent, and solving regression problems in Python are explained in other posts.