Disclaimer: The Process and the Charts are to be used carefully, as they change a lot based on the loan product type, customer cohort and other loan parameters. Here we have just presented a very generic overview of a vanilla product. All the data used for the credit risk modelling is with Roopya.
We explore the models created for the Roopya Score, which aims to classify a customer’s creditworthiness as ‘Good’ or ‘Bad’ based on various input factors. We review the statistical methods commonly used in credit scoring and provide an in-depth look at how the model was developed, including important considerations in the process. We also apply quantitative criteria to find the best model, highlighting the key qualities of an effective scorecard.
Using data from loan applications, we guide you through the essential steps: data preparation, selecting important features, converting variables using the Weight of Evidence (WOE) method, building the logistic regression model, conducting thorough evaluations, and ultimately creating a credit scoring system.
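As a taste of the WOE step mentioned above, here is a minimal plain-Python sketch. The function name and the 0 = good / 1 = bad encoding are our own illustrative assumptions, not taken from the Roopya pipeline:

```python
import math
from collections import defaultdict

def weight_of_evidence(categories, targets):
    """Compute WOE per bin: WOE = ln(%good / %bad).

    categories: bin label per applicant; targets: 0 = good, 1 = bad.
    Assumes every bin contains at least one good and one bad case.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for c, t in zip(categories, targets):
        if t == 1:
            bad[c] += 1
        else:
            good[c] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    return {
        c: math.log((good[c] / total_good) / (bad[c] / total_bad))
        for c in set(categories)
    }
```

A bin dominated by good customers gets a positive WOE, one dominated by bad customers a negative WOE, which is what makes the transformed variables monotonic inputs for logistic regression.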
A credit score is a numerical representation of a customer’s creditworthiness. There are two main categories of credit scoring models: application scoring and behavioural scoring. Application scoring is used to evaluate the risk of default when a customer applies for credit, using data like demographic information and credit bureau records. Behavioural scoring, on the other hand, assesses the risk associated with existing customers. This is done by examining their recent account transactions, current financial data, repayment history, any delinquencies, credit bureau information, and their overall relationship with the bank. Identifying high-risk clients enables the bank to take proactive measures to protect itself from potential future losses.
Logistic regression stands as a fundamental technique frequently employed in the development of scorecards. This method comes into play when the predicted variable assumes a categorical nature. In instances where the predicted variable takes on a continuous form, linear regression takes the reins. In this section, we will delve into the application of multiple logistic regression for predicting binary outcomes, typically categorized as ‘good’ or ‘bad’.
Logistic regression, much like various other predictive modelling methods, leverages a set of predictor characteristics to gauge the likelihood or probability of a specific outcome—our target. The equation representing the logit transformation of the event’s probability can be articulated as follows:
Logit(pᵢ) = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ
Here’s a breakdown of the terms:
- pᵢ is the probability of the outcome (e.g., a ‘bad’ customer) for applicant i
- β₀ is the intercept
- β₁, …, βₖ are the regression coefficients
- x₁, …, xₖ are the predictor characteristics
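Inverting the logit gives the predicted probability; a minimal sketch in plain Python (the function name is our own):

```python
import math

def logit_probability(intercept, coefficients, features):
    """Invert Logit(p) = b0 + b1*x1 + ... + bk*xk:
    p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, a linear score of z = ln(3) corresponds to odds of 3:1 and a predicted probability of 0.75.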
Several evaluation metrics are calculated to assess the model’s performance, including accuracy, F1 score, AUC-ROC, Gini coefficient, precision, recall, and specificity.
Sample Model Performance Metrics:
The image below shows the ROC curve and ROC-AUC score of the model:
Sample Correlation Heat Map Between Variables:
Sample p-values for each column, confirming that these columns are statistically significant for this model:
Here’s a table summarizing the performance metrics for the three models: Logistic Regression, Random Forest Classifier, and XGBoost Classifier.
| Metric | Logistic Regression | Random Forest Classifier | XGBoost Classifier |
|---|---|---|---|
| Accuracy | 0.7678 | 0.9665 | 0.9686 |
| F1 Score | 0.8608 | 0.9829 | 0.9840 |
| AUC-ROC Score | 0.8386 | 0.8169 | 0.8438 |
| Gini Coefficient | 0.6692 | 0.6337 | 0.6876 |
| Precision | 0.9896 | 0.9699 | 0.9699 |
| Recall | 0.7627 | 0.9964 | 0.9984 |
| Specificity | 0.7678 | 0.0476 | 0.0486 |
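The threshold-based metrics in the table above can all be derived from the confusion matrix; here is a minimal plain-Python sketch (the Gini coefficient is instead typically derived from the AUC-ROC as Gini = 2·AUC − 1):

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }
```

Note how recall and specificity pull in opposite directions in the table: the tree-based models catch nearly all positives (recall ≈ 1) while classifying almost no negatives correctly (specificity ≈ 0.05), which is why AUC-ROC and Gini, being threshold-independent, are more informative for comparing these models.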
Creating a scorecard involves multiple steps, from initial data analysis to scaling and categorizing the scores.
Step 1: Initial Characteristic Analysis and Logistic Regression
Before producing the final scorecard, conduct initial characteristic analysis and logistic regression on the dataset to identify relevant characteristics and their coefficients.
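A minimal sketch of this step using scikit-learn on synthetic data. The library choice, the synthetic characteristics, and the variable names are our own assumptions; the original modelling stack is not specified:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Synthetic WOE-transformed characteristics: 500 applicants, 2 features.
X = rng.normal(size=(500, 2))
# 'Bad' flag driven mainly by the first characteristic, plus noise.
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)
# model.intercept_ and model.coef_ are the b0 and b1..bk that feed
# the scorecard scaling in Step 2.
print(model.intercept_, model.coef_)
```

In practice each characteristic would first be binned and WOE-transformed, and insignificant characteristics dropped based on their p-values, before fitting.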
Step 2: Scaling
Scaling refers to converting the logistic regression scores into a format that is more usable and understandable. The choice of scaling can vary depending on operational needs, regulatory requirements, and ease of interpretation. Some common scaling methods include:
Logarithmic Scaling: Score = Offset + Factor * ln(odds)
Factor and Offset can be calculated using simultaneous equations:
Score = Offset + Factor * ln(odds)
Score + pdo = Offset + Factor * ln(2 * odds)
Where:
- Score is the scaled score assigned at a chosen reference odds
- odds is the good:bad odds at that score
- pdo is the number of points that doubles the odds (“points to double the odds”)

Calculate Factor and Offset as follows:
Factor = pdo / ln(2)
Offset = Score - (Factor * ln(odds))
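The scaling above can be sketched in a few lines of Python. The base values (600 points at 50:1 odds, pdo = 20) are illustrative assumptions, not figures from this scorecard:

```python
import math

def scale_score(odds, base_score=600, base_odds=50, pdo=20):
    """Map odds to a scaled score so that `base_odds` scores
    `base_score` points and doubling the odds adds `pdo` points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds)
```

With these assumptions, an applicant at 50:1 odds scores 600, at 100:1 odds scores 620, and at 25:1 odds scores 580, giving the familiar interpretable point scale.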