We had started the practical session on linear regression, where we were using a customer satisfaction data set named linear_reg_retail. The data set consists of 14 variables, where the first variable is the dependent variable, that is customer satisfaction, and all the remaining variables are independent variables. So, we are going to predict the value of customer satisfaction, the dependent variable, using the 13 independent variables. In our last video, we ran the regression procedure and checked for multicollinearity among the independent variables using the keyword VIF, which stands for variance inflation factor, and we found that delivery speed was the independent variable with the maximum VIF value, showing maximum multicollinearity. So, we removed the variable delivery speed from the set of independent variables, executed the regression procedure again, and used the keyword VIF to check for multicollinearity for the rest of the independent variables, from product quality to price flexibility. We found that the VIF values were quite low, showing that there is little multicollinearity, which satisfies the assumption of the classical linear regression model that multicollinearity in the data should be low.
So our objective of minimum multicollinearity in our data is fulfilled. Now, in this video, we will be checking for autocorrelation and heteroscedasticity. For that, we will write the code proc reg data = mylib1.linear_reg_retail, where mylib1 is the library name and linear_reg_retail is the name of my data set. MODEL is the keyword to build the classical linear regression model, where my dependent variable is customer satisfaction.
Then I am specifying the range of the independent variables, starting from product quality up to price flexibility. We have to specify the names of the independent variables, or of any variable, exactly the way they are given in the data set, otherwise SAS will not identify them. Product quality is my first independent variable and price flexibility is my last independent variable, because we have removed delivery speed. We are using the keyword DW to check for autocorrelation; DW stands for the Durbin-Watson test, which is used to check for autocorrelation. The null hypothesis H0 of the Durbin-Watson test is that there is no autocorrelation in the data, and the alternative hypothesis H1 is that there is autocorrelation in the data. So, let's run this code and check for autocorrelation using the Durbin-Watson test.
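For reference, here is a minimal sketch of this step. The library reference mylib1 and the underscore-style variable names are assumptions, since the exact column names were set up in the earlier videos:

    /* Durbin-Watson test for autocorrelation via the DW option */
    proc reg data = mylib1.linear_reg_retail;
       /* Prod_Quality--Price_Flexibility selects the positional range of
          independent variables (delivery speed has already been removed) */
       model Cust_Satisfaction = Prod_Quality--Price_Flexibility / dw;
    run;
    quit;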
So, these are my Durbin-Watson test results. The DW statistic value is displayed, that is 2.377. As you know, if our DW statistic value lies between 1.5 and 2.5, then it is said that there is no autocorrelation. Here the p value for the Durbin-Watson test is not displayed, so from the DW statistic alone we have to draw the interpretation, which shows that there is no autocorrelation, since the DW statistic value of 2.377 falls within the range 1.5 to 2.5. This is my parameter estimates table; my R-square and adjusted R-square values are the same as before, around 80 percent. This is my analysis of variance table, and the number of observations is 200, meaning 200 observations were used to run the regression procedure. Now, since in my DW test, the Durbin-Watson test, we did not get the p value,
we will be doing another test to check for autocorrelation and heteroscedasticity, and that test is called the SPEC test, the specification test. To do the specification test, we are going to use the same procedure, that is proc reg data = mylib1.linear_reg_retail, where mylib1 is my library name and linear_reg_retail is my data set name. MODEL is the keyword to build the classical linear regression model.
Customer satisfaction is the dependent variable. The range of independent variables starts from product quality and goes up to price flexibility. We have to write the variable names exactly the way they are given in the data set. I am not specifying the individual variable names; instead, we are just specifying the range of independent variables. SPEC is my keyword to run the test for autocorrelation and heteroscedasticity; SPEC stands for specification test.
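A minimal sketch of this step as well, under the same assumptions about the library reference and variable names; the SPEC option on the MODEL statement requests the specification test:

    /* SPEC (specification) test for heteroscedasticity and autocorrelation */
    proc reg data = mylib1.linear_reg_retail;
       model Cust_Satisfaction = Prod_Quality--Price_Flexibility / spec;
    run;
    quit;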
Okay. For the specification test, my null hypothesis is that there is no heteroscedasticity and autocorrelation in the data, and the alternative hypothesis is that there is heteroscedasticity and autocorrelation in the data. So, let's run this code and do the specification test. These are my specification test results. See, the p value is displayed, that is 0.1030. As you know, the default level of significance in SAS is 0.05, that is the 5 percent level of significance, and if our p value is greater than the level of significance, then we accept the null hypothesis. Here the p value is 0.1030, which is greater than 0.05.
So, I will be accepting the null hypothesis, that is, my data does not have any heteroscedasticity or autocorrelation. This is my parameter estimates table, listing the independent variables that were used in the regression procedure. I got the R-square and adjusted R-square values, which are the same as before, around 80 percent. We also got the analysis of variance table, and we also got the number of observations used to run the regression procedure, that is 200 observations. Okay, in this video we will go up to here. So, before I move to the next video, let me recap the concepts that we covered in the last videos. We worked with a customer satisfaction data set named linear_reg_retail. We first brought the data set into the SAS environment using the LIBNAME statement. Then we checked for multicollinearity using the regression procedure for all independent variables; to check for multicollinearity we used the keyword VIF, which stands for variance inflation factor, and we found that delivery speed was the independent variable with the maximum value of VIF, as in the sketch below.
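Since the LIBNAME step itself was only shown in an earlier video, here is a minimal sketch of that first pass. The path is a placeholder, the variable names are assumptions, and delivery speed is assumed to be the last variable in the positional range:

    /* Assign a library reference to the folder holding the data set;
       this path is a placeholder, not the one used in the course */
    libname mylib1 '/folders/myfolders/data';

    /* First pass: all 13 independent variables, VIF option for multicollinearity */
    proc reg data = mylib1.linear_reg_retail;
       model Cust_Satisfaction = Prod_Quality--Delivery_Speed / vif;
    run;
    quit;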
So we removed delivery speed from the set of independent variables and included all the independent variables from product quality to price flexibility in our classical linear regression model. Again, we checked for multicollinearity using the keyword VIF in the regression procedure, and we found that the VIF values were quite low. So, our objective of getting minimum multicollinearity in our data was fulfilled, because that is an assumption of the classical linear regression model, that is, the independent variables should have minimum multicollinearity. Then we checked for autocorrelation in our data using the DW test; DW stands for the Durbin-Watson test, where my null hypothesis is that there is no autocorrelation and my alternative hypothesis is that there is autocorrelation. In our DW test results, we got the value of the DW statistic, which was around 2.377, and as we know, if our DW statistic value lies between 1.5 and 2.5, then there is no autocorrelation in our data.
So, since our DW statistic value lies in the range 1.5 to 2.5, there was no autocorrelation, that is, there is no autocorrelation in our data. Next, because we did not get the p value for our Durbin-Watson test, we did another test to check for heteroscedasticity and autocorrelation, and that is called the SPEC test, which stands for specification test, where my null hypothesis is that there is no heteroscedasticity and autocorrelation, and the alternative hypothesis is that there is heteroscedasticity and autocorrelation. We got a p value which is greater than our level of significance, that is 0.05, the default level of significance; the p value we obtained was around 0.1030, which is greater than 0.05.
So, we accepted the null hypothesis for our specification test, and we concluded that there is no heteroscedasticity and autocorrelation in our data. In our upcoming videos, we will be dividing the data into two parts, that is training and validation. We will be building our model on the training data and applying the results to the validation data. We will be doing stepwise selection, we will predict the value of our dependent variable for both the training and validation data, and we will find the correlation between the observed value of our dependent variable and the predicted value of our dependent variable for both the training data and the validation data. So for now, let's end the video here. Goodbye, and thank you. See you in the next video.