In this video we'll be starting with the concept of linear regression. Now, what is linear regression? In linear regression you have a set of independent variables and one dependent variable. There are two types of linear regression: one is simple linear regression and the other is multiple linear regression. Now, you have to remember that in the case of linear regression, or the classical linear regression model, the dependent variable and the independent variables are linearly related to each other. In the case of simple linear regression I have one dependent variable and one independent variable, and in the case of multiple linear regression I have one dependent variable and multiple independent variables. Now, these independent variables are also called predictor variables or predictors. Using the independent variables we will be estimating the value of the dependent variable.
So, in simple linear regression there is one outcome variable with one independent variable, and in multiple linear regression there is one outcome variable with multiple independent variables. So, in linear regression analysis, we fit a predictive model to our data and use that model to predict values of the dependent variable from one or more independent variables. Here the dependent variable is linearly related to all the independent variables. As I told you, simple linear regression seeks to predict an outcome variable from a single predictor variable or independent variable, whereas multiple linear regression seeks to predict an outcome variable from several predictors or several independent variables. So, since the dependent and independent variables are linearly related to each other, the line of best fit for this model is a straight line. So, the model that we fit here is a linear model; a linear model just means a model based on a straight line, that is, the line of best fit. It is a straight line.
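To make this concrete, here is a small Python sketch, not part of the lecture itself, showing both cases with scikit-learn's LinearRegression; the toy data values are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y depends (roughly linearly) on two predictors x1 and x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))              # 50 observations, 2 independent variables
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Simple linear regression: one outcome variable, one predictor (x1 only)
simple = LinearRegression().fit(X[:, [0]], y)
print("simple: intercept =", simple.intercept_, "slope =", simple.coef_)

# Multiple linear regression: one outcome variable, several predictors (x1 and x2)
multiple = LinearRegression().fit(X, y)
print("multiple: intercept =", multiple.intercept_, "slopes =", multiple.coef_)
```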
So, I have represented the concept of the classical linear regression model graphically. See, this is the x axis, which consists of the independent variables, and the y axis consists of the dependent variable. This is the line of best fit. Now, what is the standard form of the linear regression equation? The standard form of the linear regression equation is y = a + b1 x1 + b2 x2 + b3 x3 + ... + bn xn + e, where y is the value of the dependent variable for the i-th observation, a is my intercept or constant, b1, b2, b3, ..., bn are the regression coefficients or slopes of my linear regression equation, x1, x2, x3, ..., xn are the values of my independent variables, and e is the error term. So, the error term means the part of the dependent variable that remains unexplained; so from a till bn xn is the part of my explained variation, and e is the part of my unexplained variation.
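As a quick illustration of that equation, here is how it can be evaluated numerically; this is just a sketch with made-up coefficient and data values, not numbers from the lecture.

```python
import numpy as np

# Illustrative values: intercept a, slopes b1..bn, and predictors x1..xn for one observation
a = 3.0
b = np.array([2.0, -1.5, 0.7])        # b1, b2, b3
x = np.array([1.2, 0.4, 2.1])         # x1, x2, x3
y_observed = 5.9                      # the actual value of the dependent variable

# Explained part: a + b1*x1 + b2*x2 + ... + bn*xn
y_explained = a + b @ x
# Unexplained part (error term): whatever the linear part fails to account for
e = y_observed - y_explained
print("explained =", y_explained, "error term =", e)
```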
Now, we need to know what the features of a straight line are. So, this straight line is the line of best fit for my classical linear regression model. There are two features: the first is the slope or gradient of the line. Next is the point where my straight line cuts the vertical axis, that is the y axis, and that we call the intercept of the line. What is the method of least squares? The method of least squares is a way of finding the line that best fits the data out of all the possible lines that could be drawn. The line of best fit is the one which results in the least amount of difference between the observed data points and the line.
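Here is a minimal Python sketch of that idea, assuming small made-up x and y values: it computes the least-squares slope and intercept with the usual closed-form formulas and compares the sum of squared vertical differences against an arbitrary other line.

```python
import numpy as np

# Made-up sample data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares slope and intercept (closed-form for a straight line)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

def sum_of_squares(a, b):
    """Sum of squared vertical differences between the data and the line y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# The least-squares line gives a smaller sum of squares than any other candidate line
print("least-squares line:", sum_of_squares(intercept, slope))
print("some other line:   ", sum_of_squares(0.5, 2.5))
```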
The figure shows that when any line is fitted to a set of data, there will be small differences between the line and the actual data. We are interested in the vertical differences between the line and the actual data because we are using the line to predict the values of y from the values of x. Some of those differences are positive, that is, the points are above the line, indicating that the model underestimates their value, and some are negative, that is, the points are below the line, indicating that the model overestimates their value. So, the ones which are above the line are the positive differences, and the ones which are below the line are the negative differences. Now, what is the goodness of fit for the linear regression model? You need to remember that the goodness of fit for the classical linear regression model is measured by r square, and r square is the ratio of explained variation to total variation. Technically, if my value of r square increases, I can say that the explained variation of my model has increased. But if the number of independent variables in my classical linear regression model increases, the value of my r square will automatically increase, and that will lead to inefficiency of the model if the independent variables are redundant in nature, because r square will not consider the redundancy, or it will not consider which of the independent variables are redundant. Therefore, r square is not taken as a good measure of goodness of fit for the model.
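As a rough sketch of how r square is computed, here is a small Python example on made-up numbers; the fitted values would normally come from the least-squares line above.

```python
import numpy as np

# Made-up observed values and fitted values from some linear model (illustrative only)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

ss_total = np.sum((y - y.mean()) ** 2)        # total variation
ss_residual = np.sum((y - y_hat) ** 2)        # unexplained variation
ss_explained = ss_total - ss_residual         # explained variation

r_squared = ss_explained / ss_total           # ratio of explained to total variation
print("R^2 =", r_squared)
```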
So, r square is the ratio of explained variation to the total variation. The problem with r square is that if the number of independent variables in the linear regression model is increased, the value of r square will increase gradually even if redundant variables are taken into account. These redundant variables do not increase the efficiency of the model. Therefore, r square is not a good measure of goodness of fit for the model. In this case, we move to the concept of adjusted R square. Adjusted R square is taken as a more accurate measure of the goodness of fit of the model, because adjusted R square is adjusted for the degrees of freedom, which takes into account only the important and significant variables for the model, and this helps to increase the model efficiency. What is the test of significance of the estimated parameters? There are two types of tests which are done to check the significance of the estimated parameters.
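Here is the usual adjustment written out as a short Python function; n is the number of observations and k the number of independent variables, and the values passed in are illustrative only.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 penalises extra predictors via the degrees of freedom.

    n: number of observations, k: number of independent variables (predictors).
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Illustrative values: the same R^2 looks worse once more predictors are added
print(adjusted_r_squared(0.90, n=50, k=2))
print(adjusted_r_squared(0.90, n=50, k=10))
```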
One is the global test and the other is the local test. The global test tests for overall significance; the local test tests for individual significance. In the global test my H0 is that all the parameters are equal to zero simultaneously, which means they are insignificant (equal to zero means they are insignificant), and H1 is that at least one is nonzero, so they are significant. This test is conducted by using an F statistic. Next comes the concept of the local test. This deals with the individual significance of the parameters: it is a t test for the individual significance of a parameter, where H0 is that the parameter value is zero, that is, it is insignificant, and H1 is that the value is nonzero, so it is significant. When the parameter is equal to zero, we call it insignificant; when it is nonzero, we call it significant, and this test is conducted by using a t statistic.
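To see both tests in practice, here is a short sketch with statsmodels, which is not covered in the lecture; the data are again made up, and the attribute names used are the standard statsmodels ones.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data with two predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

model = sm.OLS(y, sm.add_constant(X)).fit()

# Global test: F statistic for H0 "all slope parameters are zero simultaneously"
print("F statistic:", model.fvalue, "p-value:", model.f_pvalue)

# Local tests: t statistic for each parameter, H0 "this parameter is zero"
print("t statistics:", model.tvalues)
print("p-values:", model.pvalues)
```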
So, that is all we will be doing in this video. Let's end this video over here. So goodbye, have a nice day. Bye. See you all in the next video.