Data mining and analysis

 

 

Background:

 

You work for the internal audit department at Universal Bank. One of the routine audits covers the approval of personal loans and mortgages. This year, you are assigned to build a model of personal loan approval based on various factors. The model will be applied to all branches to flag potentially fraudulent loan approvals. You collect the following data for your analysis.

 

Data Description (Statistic.com):
ID: Customer ID
Age: Customer’s age in completed years
Experience: Number of years of professional experience
Income: Annual income of the customer ($000)
ZIPCode: Home address ZIP code
Family: Family size of the customer
CCAvg: Average spending on credit cards per month ($000)
Education: Education level (1: Undergrad; 2: Graduate; 3: Advanced/Professional)
Mortgage: Value of house mortgage, if any ($000)
Personal Loan: Did this customer get approval for the personal loan?
Securities Account: Does the customer have a securities account with the bank?
CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
Online: Does the customer use internet banking facilities?
CreditCard: Does the customer use a credit card issued by UniversalBank?
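
The fields above map naturally onto a data frame. The following is a minimal sketch, assuming the data is available as a CSV file whose headers match the field names listed here; the file format, the use of pandas, and the helper names `load_bank_data`, `summarize` and `correlations` are illustrative assumptions, not part of the assignment.

```python
import pandas as pd

# Field names as listed in the data description.
COLUMNS = [
    "ID", "Age", "Experience", "Income", "ZIPCode", "Family", "CCAvg",
    "Education", "Mortgage", "Personal Loan", "Securities Account",
    "CD Account", "Online", "CreditCard",
]


def load_bank_data(path):
    """Read the data set from a CSV file (path and format are assumptions)."""
    df = pd.read_csv(path)
    missing = [c for c in COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def summarize(df):
    """Count, mean, std, min, quartiles and max for three of the fields."""
    return df[["Income", "Education", "Mortgage"]].describe()


def correlations(df):
    """Pearson correlation matrix for Experience, Income and Education."""
    return df[["Experience", "Income", "Education"]].corr()
```

Because Education is a coded level rather than a measured quantity, a frequency table such as `df["Education"].value_counts()` may be more informative than its mean.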

 

Instruction:

  1. Generate descriptive statistics for Income, Education, and Mortgage. Discuss the results.
  2. Generate correlations for Experience, Income and Education. What do you observe?
  3. Perform discriminant analysis with Personal Loan as the target and Age, Experience, Income, Family, CCAvg, Education, Mortgage and CreditCard as predictors. Use 80% of the data for training and 20% for validation.
  • What is the classification function?
  • What is the accuracy rate for the training dataset and the validation dataset? Is it reasonably good and consistent?
  4. Let’s start over. Perform principal component analysis on Age, Experience, Income, Family, CCAvg, Education, Mortgage and CreditCard. Why do we need to perform principal component analysis? Choose the smallest number of components that explains 95% of the variance. Should we use the covariance matrix or the correlation matrix? Why? Then output the principal component scores. How many components are in the output? What percentage of the variance do they explain? What do the principal components mean?
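
The discriminant analysis and principal component steps above can be sketched in plain NumPy. This is one possible approach under stated assumptions, not the tool or method the assignment prescribes: the classification functions use the standard linear discriminant form with a pooled covariance matrix and class priors estimated from the training data, and the PCA works on the correlation matrix (a common choice when variables are measured in different units). All function names are hypothetical.

```python
import numpy as np


def split_train_valid(X, y, train_frac=0.8, seed=0):
    """Randomly partition rows into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(round(train_frac * len(y)))
    tr, va = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[va], y[va]


def fit_classification_functions(X, y):
    """Return {class: (constant, coefficients)} for linear discriminant analysis.

    A record x scores constant + coefficients @ x for each class and is
    assigned to the class with the highest score.
    """
    classes = np.unique(y)
    n, p = X.shape
    pooled = np.zeros((p, p))
    means = {}
    for k in classes:
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        centered = Xk - means[k]
        pooled += centered.T @ centered
    pooled /= n - len(classes)           # pooled within-class covariance
    inv = np.linalg.pinv(pooled)
    funcs = {}
    for k in classes:
        coef = inv @ means[k]
        const = -0.5 * means[k] @ coef + np.log(np.mean(y == k))
        funcs[k] = (const, coef)
    return funcs


def predict(funcs, X):
    """Assign each row of X to the class with the largest score."""
    classes = list(funcs)
    scores = np.column_stack([funcs[k][0] + X @ funcs[k][1] for k in classes])
    return np.array(classes)[scores.argmax(axis=1)]


def accuracy(funcs, X, y):
    """Fraction of rows classified correctly."""
    return float(np.mean(predict(funcs, X) == y))


def pca_correlation(X, target=0.95):
    """PCA on the correlation matrix, keeping the fewest components whose
    cumulative explained variance reaches `target`.

    Returns (scores, loadings, cumulative explained-variance ratios).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                  # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, target)) + 1, len(eigvals))
    return Z @ eigvecs[:, :k], eigvecs[:, :k], cum[:k]
```

Here `X` would be a numeric array of the eight predictor columns and `y` the Personal Loan column. Comparing `accuracy` on the training and validation partitions gives the consistency check asked for in item 3, and the columns of the returned loadings show how each original variable contributes to each component.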

 

 

 

For this homework, the grade will be determined based on analysis (50%) and your discussion (50%, the logic, consistency, the depth of explanations, whether there is enough information for readers to understand, etc.).
