
Part A - Linear Regression

A multiple regression was run to predict credit card limit from the customer’s age, gender, level of education, marital status, attrition status (whether or not the customer has left the bank in the last 12 months), income category, type of credit card, number of months as a credit card customer, average utilization ratio, and whether or not the monthly balance on the credit card was paid off. Table 01 shows the output of the regression analysis. The model was significant, F(15, 1930) = 220, p < .01, R² = 0.631, Adj. R² = 0.628. Examination of the individual predictors indicated that marital status, income category, type of credit card, attrition status, average utilization ratio, and whether or not the monthly balance was paid off were significant predictors of credit card limit, whereas the customer’s age, gender, level of education, and number of months as a credit card customer were not.
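The reported adjusted R² and F statistic can be cross-checked from R² and the two degrees of freedom. The following is a minimal sketch in base R, assuming the sample size is n = 15 + 1930 + 1 = 1946 (inferred from the reported degrees of freedom, not stated in the text); the variable names are illustrative.

```r
# Values reported for the full model
r2 <- 0.631       # R-squared
k <- 15           # numerator df (number of estimated slopes)
df_resid <- 1930  # denominator (residual) df
n <- k + df_resid + 1  # implied sample size (assumes a model with an intercept)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic expressed through R-squared
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

print(round(adj_r2, 3))
print(round(f_stat))
```

Both results agree with the figures reported above (Adj. R² = 0.628, F ≈ 220).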

Table 01.

Regression results using Credit_Limit as the criterion

| Predictor | b | b 95% CI [LL, UL] | Fit |
|---|---|---|---|
| (Intercept) | 10175.45** | [8165.61, 12185.29] | |
| Customer_Age | 15.76 | [-34.89, 66.40] | |
| Gender1 | 164.77 | [-763.97, 1093.50] | |
| Education_Level2 | -602.75 | [-1413.20, 207.71] | |
| Education_Level3 | -477.35 | [-1179.42, 224.72] | |
| Marital_Status2 | -1407.91** | [-2343.81, -472.01] | |
| Marital_Status3 | -359.96 | [-1311.93, 592.01] | |
| Attrition_Flag1 | -1202.43** | [-1929.44, -475.43] | |
| Income_Category2 | 831.41* | [50.99, 1611.83] | |
| Income_Category3 | 4673.50** | [3525.24, 5821.77] | |
| Income_Category4 | 8998.13** | [7861.04, 10135.22] | |
| Income_Category5 | 11355.12** | [10041.97, 12668.26] | |
| Card_Category2 | 12639.14** | [11521.54, 13756.75] | |
| Months_on_book | 7.04 | [-44.59, 58.67] | |
| Avg_Utilization_Ratio | -14330.49** | [-15610.30, -13050.69] | |
| Pay_on_time1 | -5088.31** | [-5876.95, -4299.67] | |
| | | | R² = .631** |
| | | | 95% CI [.61, .65] |

Note. b represents unstandardized regression weights. LL and UL indicate the lower and upper limits of a confidence interval, respectively.
* indicates p < .05. ** indicates p < .01.

Model Selection:

Backward elimination retained marital status, income category, type of credit card, attrition status (whether or not the customer has left the bank in the last 12 months), average utilization ratio, and whether or not the monthly balance on the credit card was paid off. Together these predictors explain a significant amount of the variance in credit card limit, F(10, 1935) = 330, p < .01, R² = 0.63, Adj. R² = 0.628. Table 02 provides the output of the regression analysis.

Table 02.

Regression results using Credit_Limit as the criterion

| Predictor | b | b 95% CI [LL, UL] | Fit |
|---|---|---|---|
| (Intercept) | 10857.91** | [9761.27, 11954.56] | |
| Marital_Status2 | -1402.05** | [-2336.01, -468.09] | |
| Marital_Status3 | -357.64 | [-1308.14, 592.86] | |
| Income_Category2 | 754.19* | [49.85, 1458.54] | |
| Income_Category3 | 4546.58** | [3758.45, 5334.71] | |
| Income_Category4 | 8852.18** | [8074.93, 9629.43] | |
| Income_Category5 | 11228.47** | [10209.55, 12247.39] | |
| Card_Category2 | 12633.38** | [11516.47, 13750.30] | |
| Avg_Utilization_Ratio | -14298.49** | [-15569.56, -13027.41] | |
| Pay_on_time1 | -5068.65** | [-5855.55, -4281.76] | |
| Attrition_Flag1 | -1193.20** | [-1919.08, -467.32] | |
| | | | R² = .630** |
| | | | 95% CI [.61, .65] |

Note. b represents unstandardized regression weights. LL and UL indicate the lower and upper limits of a confidence interval, respectively.
* indicates p < .05. ** indicates p < .01.
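Backward elimination starts from the full model and removes one term at a time for as long as doing so improves the selection criterion. A minimal sketch follows. It uses R's built-in mtcars data rather than the credit card data, and it assumes the AIC-based `step()` function from the stats package, since the report does not state which criterion was used.

```r
# Full model: all available predictors of mpg in the built-in mtcars data
full_fit <- lm(mpg ~ ., data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced_fit <- step(full_fit, direction = "backward", trace = 0)

# Compare the retained terms and the AIC of both models
print(formula(reduced_fit))
print(c(full = AIC(full_fit), reduced = AIC(reduced_fit)))
```

The reduced model keeps only the terms whose removal would worsen the AIC, which mirrors how the model in Table 02 ends up with fewer predictors than the one in Table 01.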

Check for Multicollinearity

For our best model, the VIF values are all well below 10 and the tolerance statistics are all well above 0.2. The average VIF is also very close to 1. Based on these measures, there is no evidence of problematic multicollinearity among the predictors.
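For a numeric predictor, the VIF equals 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the remaining predictors, and the tolerance is its reciprocal. The sketch below computes both by hand in base R on the built-in mtcars data, used here as an illustrative stand-in for the credit card data; `manual_vif` is a hypothetical helper and not part of the report's code.

```r
# Hypothetical helper: VIF for each numeric predictor in a data frame
manual_vif <- function(predictors) {
  sapply(names(predictors), function(v) {
    # Regress predictor v on all remaining predictors
    aux <- lm(reformulate(setdiff(names(predictors), v), response = v),
              data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

X <- mtcars[, c("wt", "hp", "qsec")]
vif_values <- manual_vif(X)
tolerance <- 1 / vif_values

print(round(vif_values, 2))
print(round(tolerance, 2))
print(mean(vif_values))  # average VIF
```

For models that contain multi-level factors, such as the one in this report, `car::vif()` reports generalized VIFs rather than the plain values computed here.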

Checking the normality of the residuals

A Shapiro-Wilk test was run to check the normality of the residuals. The test showed a significant deviation from normality, W = 0.950, p < 0.01. As a second check, a Q-Q plot of the residuals was produced (Figure 01); it also shows a clear deviation from normality. Because the residuals depart from normality, transforming the raw data is advisable.

Figure 01.

Q-Q plot of the residuals
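The normality check can be illustrated on simulated data. The sketch below fits a regression whose errors are deliberately right-skewed and then applies the Shapiro-Wilk test to the residuals; the data and variable names are invented for illustration and are unrelated to the credit card data.

```r
set.seed(123)

# Simulated data with right-skewed (exponential) errors
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rexp(n)

fit <- lm(y ~ x)

# Shapiro-Wilk test: a small p-value indicates a departure from normality
sw <- shapiro.test(residuals(fit))
print(sw)
```

Calling `qqnorm(residuals(fit))` followed by `qqline(residuals(fit))` gives the visual counterpart of the same check.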

Cook’s distance

Figure 02.

Cook’s distance plot

Figure 02 shows the Cook’s distance plot, which identifies influential data points, that is, outliers with high leverage. Three data points (313, 1137 and 1227) have large Cook’s distance values. It is suggested to rerun the regression analysis with these data points excluded and to examine how the model performance and the regression coefficients change.
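Rerunning the model without influential points could look like the following sketch. It uses simulated data with one deliberately influential observation (row 100) instead of the credit card data, and all names are illustrative.

```r
set.seed(42)

# Simulated linear data with one deliberately influential point (row 100)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)
x[100] <- 6
y[100] <- -10

dat <- data.frame(x = x, y = y)
fit_all <- lm(y ~ x, data = dat)

# Find the observation with the largest Cook's distance
cd <- cooks.distance(fit_all)
flagged <- unname(which.max(cd))

# Refit without the flagged observation and compare coefficients
fit_drop <- lm(y ~ x, data = dat[-flagged, ])
print(flagged)
print(rbind(all_data = coef(fit_all), without_flagged = coef(fit_drop)))
```

A large shift in the coefficients after dropping the flagged row indicates that the point was driving the original fit.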

Conclusion

Based on the above analysis, we can conclude that marital status, income category, type of credit card, whether or not the customer has left the bank in the last 12 months, average utilization ratio, and whether or not the monthly balance on the credit card was paid off explain a significant amount of the variance in credit card limit.

Part B - Logistic Regression

A logistic regression was performed to test whether credit card limit, the customer’s age, gender, level of education, marital status, income category, type of credit card, number of months as a credit card customer, average utilization ratio, and whether or not the monthly balance on the credit card was paid off predict whether or not the customer has left the bank in the last 12 months. The logistic regression model was statistically significant, χ²(15) = 194.56, p < 0.01. Table 03 shows the output of the regression analysis.

Table 03.

Logistic Regression results using Attrition_Flag as the criterion

| Predictor | b (SE) | Odds ratio 95% CI [LL, OR, UL] |
|---|---|---|
| (Intercept) | -2.26 (0.53)** | [0.03, 0.10, 0.29] |
| Customer_Age | 0.017 (0.013) | [0.99, 1.01, 1.04] |
| Gender1 | 0.31 (0.24) | [0.85, 1.36, 2.23] |
| Education_Level2 | -0.07 (0.21) | [0.61, 0.93, 1.42] |
| Education_Level3 | 0.12 (0.18) | [0.79, 1.13, 1.62] |
| Marital_Status2 | -0.21 (0.23) | [0.51, 0.81, 1.29] |
| Marital_Status3 | -0.19 (0.23) | [0.52, 0.82, 1.32] |
| Income_Category2 | 0.02 (0.19) | [0.69, 1.02, 1.50] |
| Income_Category3 | 0.08 (0.31) | [0.59, 1.08, 2.00] |
| Income_Category4 | 0.39 (0.31) | [0.79, 1.47, 2.75] |
| Income_Category5 | 0.58 (0.36) | [0.88, 1.79, 3.65] |
| Card_Category2 | 0.30 (0.34) | [0.67, 1.35, 2.60] |
| Months_on_book | -0.01 (0.01) | [0.96, 0.98, 1.01] |
| Credit_Limit | -0.0000345 (0.00001)* | [0.99, 0.99, 0.99] |
| Avg_Utilization_Ratio | -0.63 (0.40) | [0.23, 0.52, 1.16] |
| Pay_on_time1 | 1.49 (0.20)** | [3.03, 4.48, 6.65] |

Note. R² = 0.112 (Hosmer–Lemeshow), 0.095 (Cox–Snell), 0.162 (Nagelkerke). Model χ²(15) = 194.56, p < 0.01.

* p < .05, ** p < .01
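The model chi-square is the drop in deviance from the null model to the fitted model, tested against a chi-square distribution whose degrees of freedom equal the number of added parameters. As a rough check of the reported figures, the sketch below computes the p-value for χ²(15) = 194.56 and the Cox–Snell R² that it implies. The sample size n = 1946 is an assumption carried over from the degrees of freedom of the linear model in Part A; the text does not report n for the logistic model.

```r
model_chi <- 194.56  # reported model chi-square
chi_df <- 15         # reported degrees of freedom
n <- 1946            # assumed sample size (not reported for this model)

# Upper-tail probability of the chi-square statistic
p_value <- pchisq(model_chi, df = chi_df, lower.tail = FALSE)

# Cox-Snell pseudo R-squared: 1 - exp(-chi-square / n)
r2_cs <- 1 - exp(-model_chi / n)

print(p_value)
print(round(r2_cs, 3))
```

Under that assumption the Cox–Snell value comes out at about 0.095, in line with the table note.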

Interpreting the Odds Ratio

The logistic regression found that whether or not the monthly balance on the credit card was paid off (Pay_on_time1) was a significant predictor of whether or not the customer has left the bank in the last 12 months (Attrition_Flag). The odds of attrition were about 4.5 times higher (OR = 4.48, 95% CI [3.03, 6.65]) when the monthly balance on the credit card was paid off.
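An odds ratio is the exponentiated logistic regression coefficient. The sketch below recomputes it for Pay_on_time1 from the rounded coefficient and standard error in Table 03, together with a Wald-type 95% interval. The Wald interval is an assumption made for illustration; it is not necessarily the method behind the reported interval.

```r
b <- 1.49   # reported coefficient for Pay_on_time1 (log-odds scale)
se <- 0.20  # reported standard error

# Odds ratio and approximate Wald 95% confidence interval
odds_ratio <- exp(b)
wald_ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

print(round(odds_ratio, 2))
print(round(wald_ci, 2))
```

The results are close to, but not identical to, the tabled values, because the inputs are rounded and `confint()` applied to a `glm` object returns profile-likelihood intervals rather than Wald intervals.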

Suggestion based on logistic regression model

Based on the logistic regression model, credit card limit and whether or not the monthly balance on the credit card was paid off emerged as significant predictors of attrition. Surprisingly, factors such as the customer’s age, gender, level of education, marital status, income category, type of credit card, number of months as a credit card customer, and average utilization ratio did not significantly predict attrition.

R Markdown

Read xlsx file selecting a random sample from CreditCard.xls to create a dataset of 2000 observations

library(readxl)
library(dplyr)

library(apaTables)

set.seed(1)
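df <- read_excel("CreditCard.xls") # load the Excel file; adjust the path to the saved location
my_data <- df %>% sample_n(2000) # select 2000 random rows to create the dataset

# Convert the categorical variables to factors (as.factor() is assumed here)
my_data$Attrition_Flag <- as.factor(my_data$Attrition_Flag)
my_data$Gender <- as.factor(my_data$Gender)
my_data$Education_Level <- as.factor(my_data$Education_Level)
my_data$Marital_Status <- as.factor(my_data$Marital_Status)
my_data$Income_Category <- as.factor(my_data$Income_Category)
my_data$Card_Category <- as.factor(my_data$Card_Category)
my_data$Pay_on_time <- as.factor(my_data$Pay_on_time)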

Regression model with all the variables

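# Predictors as listed in Table 01
full.model <- lm(Credit_Limit ~ Customer_Age + Gender + Education_Level + Marital_Status + Attrition_Flag + Income_Category + Card_Category + Months_on_book + Avg_Utilization_Ratio + Pay_on_time, data = my_data)
summary(full.model)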

apa.reg.table(full.model, filename = "full_model.doc")

Model Selection using Backward elimination procedure

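# Refit the full model (predictors as listed in Table 01)
full.model <- lm(Credit_Limit ~ Customer_Age + Gender + Education_Level + Marital_Status + Attrition_Flag + Income_Category + Card_Category + Months_on_book + Avg_Utilization_Ratio + Pay_on_time, data = my_data)
# Backward elimination from the full model (AIC-based step() is assumed here)
final.model <- step(full.model, direction = "backward")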

apa.reg.table(final.model, filename = "final_model.doc")

Best model chosen from Backward elimination

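# Predictors as listed in Table 02
best.model <- lm(Credit_Limit ~ Marital_Status + Income_Category + Card_Category + Avg_Utilization_Ratio + Pay_on_time + Attrition_Flag, data = my_data)
summary(best.model)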

Check for multicollinearity

library(car)

vif(best.model)

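tolerance <- 1 / vif(best.model) # tolerance is the reciprocal of the VIF
tolerance

avg.vif <- mean(vif(best.model)) # average VIF, as discussed in the text (right-hand side assumed)
avg.vif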

Check for normality of residuals

hist( x = residuals( best.model ), xlab = "Value of residual", main = "")

plot( x = best.model, which = 2 )

plot( x = best.model, which = 5 )

shapiro.test(residuals(best.model))

Cook’s Distance

plot(x = best.model, which = 4)

Logistic Regression model with all the variables

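options(scipen=999, digits = 2)
# Predictors as listed in Table 03
full.model2 <- glm(Attrition_Flag ~ Customer_Age + Gender + Education_Level + Marital_Status + Income_Category + Card_Category + Months_on_book + Credit_Limit + Avg_Utilization_Ratio + Pay_on_time, data = my_data, family = binomial())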

summary(full.model2)

Testing model significance

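modelChi <- full.model2$null.deviance - full.model2$deviance # drop in deviance from the null model
chidf <- full.model2$df.null - full.model2$df.residual # degrees of freedom of the test
chisq.prob <- 1 - pchisq(modelChi, chidf) # p-value of the model chi-square
modelChi; chidf; chisq.prob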

R2 value

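logisticPseudoR2s <- function(LogModel) {
  dev <- LogModel$deviance
  nullDev <- LogModel$null.deviance
  modelN <- length(LogModel$fitted.values)
  R.l <- 1 - dev / nullDev # Hosmer and Lemeshow
  R.cs <- 1 - exp(-(nullDev - dev) / modelN) # Cox and Snell
  R.n <- R.cs / (1 - exp(-(nullDev / modelN))) # Nagelkerke
  cat("Pseudo R^2 for logistic regression\n")
  cat("Hosmer and Lemeshow R^2 ", round(R.l, 3), "\n")
  cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
  cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}
logisticPseudoR2s(full.model2)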

Odds ratio

exp(full.model2$coefficients)

exp(confint(full.model2))
