Logistic Regression Practical part- 1

SAS Analytics Logistic Regression- Case Study & Practical
11 minutes

Transcript

We have discussed the case study and the data set that we will be using in SAS to do logistic regression. The data set relates to credit risk analytics. In logistic regression, as we know, the dependent variable is binary, or dichotomous, in nature, and the model calculates the probability for Y equals one, that is, Y equals event. The independent variables can be either categorical or continuous. In our case study, the dependent variable gives the probability for Y equals event, where Y equals event denotes that the loan will be repaid by the customer at the right time, and Y equals non-event, that is Y equals zero, denotes that the loan will not be repaid by the customer at the right time.
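For reference, the probability described above can be written in the usual textbook form. This is a sketch in standard notation rather than something shown in the video; the symbols k, x_1, ..., x_k and beta_0, ..., beta_k are introduced here only for illustration.

```latex
\[
P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}},
\qquad
\log \frac{P(Y = 1)}{1 - P(Y = 1)}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k .
\]
```

Here Y = 1 is the event (loan repaid at the right time) and Y = 0 is the non-event.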

So here the loan officer wants to decide which customer is going to repay the loan at the right time and which customer will not be able to repay it at the right time. To start the practical session in this video, we will first get the data sets into the SAS environment. To do that we have to execute the LIBNAME statement. So let's write the LIBNAME statement: it is libname, then I have given the library name, that is mylib1.

Then I have to give the path where I have my data sets, in double quotes. So this is the path of my data sets. I'm closing the double quotes, then a semicolon. This is the LIBNAME statement to get my data sets into my own library, which I'm creating, that is mylib1. The mylib1 library will be displayed in the Explorer window. So let's run this code.
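The statement being typed would look roughly like the sketch below. The library name mylib1 is taken from the narration; the folder path is not given in the transcript, so the quoted string is only a placeholder to replace with your own location.

```sas
/* Sketch only: "your-data-folder" is a placeholder, not the path used in the video. */
/* Replace it with the folder that holds your data sets.                            */
libname mylib1 "your-data-folder";
```

After this runs, the library should appear under Libraries in the Explorer window.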

This is the mylib1 library. Let's open the data set logistic underscore reg underscore German underscore bank. This is my data set. So let's look at the data set. See, the first column is the observation number column. Then you have check account, duration, history, new car, used car, furniture, radio TV, education, retraining, amount, save account, employment, install rate, male div, male single, male married or widowed, co-applicant, guarantor, present resident, real estate, property unknown, age, other installment, rent, own residence, number of credits, job, number of dependents, telephone, foreign and response. The variable response is my dependent variable, or we can say my outcome variable, and all the variables from check account till foreign are my independent variables, or predictor variables.
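Instead of opening the data set from the Explorer window, the same check can be done in code. This is a minimal sketch, assuming the library is named mylib1 and the data set logistic_reg_german_bank as narrated; it is not part of the video.

```sas
/* List the variables in their stored order (VARNUM), which matters later */
/* when a range of variables is given in the MODEL statement.             */
proc contents data=mylib1.logistic_reg_german_bank varnum;
run;

/* Print the first five rows to see what the data look like. */
proc print data=mylib1.logistic_reg_german_bank (obs=5);
run;
```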

So I will be using this set of independent variables, or predictor variables, to calculate the probability for Y equals event. We have a total of 32 variables in our data set. The first variable is the observation number, which we will not use for our analysis. The independent variables are from check account till foreign, so there are 30 independent variables, or predictor variables, and one dependent variable, that is response. So now let's start the practical session. We will be executing the logistic procedure. The procedure name is proc logistic, data equals, my library name is mylib1, dot, my data set name is logistic underscore reg underscore German underscore bank, space, desc. desc is a keyword that we are going to use.

Next is the model. I'm building the logistic regression model using the model keyword: response, that is my dependent variable, equal to, and now I am going to give the range of independent variables. As I told you, the independent variables start from check account and go till foreign. So let's specify the names of the independent variables. I have to write each variable name exactly the way it is given in the data set. I have given the range of the independent variables, starting from check account till foreign, then slash, selection equals stepwise, lackfit, and then run.

Let me explain all the code first. Here we are running the procedure proc logistic. data equals mylib1, which is my library name, dot, logistic underscore reg underscore German underscore bank, which is my data set name. desc is a keyword to build the model for Y equals one, because by default SAS models the lower value, which here is zero. As you know, we are predicting the probability for Y equals one, so our model is built for Y equals event, that is Y equals one. model is the keyword to build a logistic regression model. response is my dependent variable, and my independent variables are from check account till foreign; I have given the range of independent variables. We are doing stepwise selection, that is, using the stepwise selection technique we want to select the significant variables that should be used for further analysis.
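Put together, the code described above might look like the sketch below. The library and data set names follow the narration; the variable names chk_acct, foreign and response are assumptions, since the transcript only gives them in spoken form, so check them against your copy of the data set.

```sas
/* Sketch only: chk_acct, foreign and response are assumed variable names. */
proc logistic data=mylib1.logistic_reg_german_bank descending; /* DESC: model response = 1 */
    /* chk_acct--foreign selects every variable stored between the two, */
    /* in data set order, so the observation number is left out.        */
    model response = chk_acct--foreign
          / selection=stepwise   /* stepwise selection of predictors            */
            lackfit;             /* Hosmer and Lemeshow goodness-of-fit test    */
run;
```

The double-dash range depends on the stored order of the variables, which is why the names have to match the data set exactly.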

And those variables will be used to build the model for the further analysis. So selection equals stepwise is used to do the stepwise selection of the significant variables, and lackfit requests the Hosmer and Lemeshow goodness-of-fit test. Now let's run this code. First, see here, there are a total of 1,000 observations in our data set. My response variable has two levels, that is 0 and 1, and the number of observations used is 1,000. We are building the model for response equals one. Each step shown here is a step of the stepwise selection procedure. So which of the variables should be kept in the model and which of the variables should be removed?

That is shown in the stepwise selection. Which of the variables should be included in the model and which should be excluded is answered by the residual chi-square test, where H naught, the null hypothesis, is that the model does not require more variables, and H1 is that the model requires more variables. See, in each and every step the individual significance of the variables is shown using the residual chi-square test. So in each and every step these variables are getting entered, that is, they are included. I hope you remember that there were 30 independent variables.

Only the significant variables will be selected. So which significant variables are selected? We can get that from the Summary of Stepwise Selection: check account, duration, history, save account, new car, [unclear], guarantor, [unclear], other installment, install rate, amount, used car, foreign and rent. So out of 30 independent variables, we have got only 14 significant variables; the rest are all excluded from our model. These are the residual chi-square test p-values. This is the Analysis of Maximum Likelihood Estimates. You know that MLE, the maximum likelihood estimation technique, is used to estimate the parameters for logistic regression.

These are the odds ratio estimates, that is, the ratio of two odds. We have also got the percent concordant and percent discordant. Percent concordant is 81.9%, which is good, that is, there is less misclassification, and percent discordant is 17.9%. The higher the percentage of concordant pairs, the better the model, because there is less misclassification. Now, the most important thing is the Hosmer and Lemeshow goodness-of-fit test. We got the p-value. As you know, 0.05 is the default level of significance, so if the p-value is greater than 0.05 I will not reject the null hypothesis, or else I will reject it. For the goodness-of-fit test, H naught is that the model is a good fit and H1 is that the model is not a good fit. Here the p-value is 0.8446, which is greater than 0.05, my level of significance. Therefore I do not reject H naught, and I can conclude that my model is a good fit.
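The goodness-of-fit decision above can be written out as follows. The statistic is given in its standard textbook form, which the video does not show; the notation (G groups, O_g observed events, n_g observations and mean predicted probability in group g) is introduced here only for illustration.

```latex
\[
\hat{C} \;=\; \sum_{g=1}^{G}
  \frac{\left(O_g - n_g \bar{\pi}_g\right)^2}{n_g \bar{\pi}_g \left(1 - \bar{\pi}_g\right)},
\qquad
\hat{C} \;\text{is approximately}\; \chi^2_{G-2} \;\text{under}\; H_0 .
\]
\[
H_0:\ \text{the model is a good fit}, \qquad
H_1:\ \text{the model is not a good fit}, \qquad
p = 0.8446 > \alpha = 0.05 \;\Rightarrow\; \text{do not reject } H_0 .
\]
```

A large p-value here means there is no evidence of lack of fit. It does not by itself prove that the model is correct.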

So in this video we will stop here. Before I move to the next video, let me recap the concepts that we have covered. In this video we executed the logistic procedure, we did stepwise selection on our data set, and we selected the significant variables: out of the 30 independent variables, there are 14 significant independent variables that we will be including in our model. We did the Hosmer and Lemeshow goodness-of-fit test, and we built the model for Y equals one. In the coming videos we will do the next part of our practical session: we will generate the classification table, we will predict the probability for Y equals one, we will create a binary variable called status, taking values either zero or one, we will set the probability level, and we will form the confusion matrix. From the confusion matrix we will calculate the different measures of the accuracy of the model.

So, for now, let's end this video here. Goodbye. Thank you, and see you all in the next video.
