Home credit default risk assessment using embedded feature selection and stacking ensemble technique

Yosza Dasril
Yosy Arisandy
Shahrul Nizam Salahudin

Abstract

The objective of this study is to evaluate and compare the accuracy of typical credit assessment techniques, particularly Logistic Regression, Gradient Boosting, Random Forest (RF), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), and CatBoost (CB). Furthermore, the study applies stacking ensemble learning combined with embedded feature selection. This research utilized a dataset sourced from Kaggle, namely the Home Credit Default Risk dataset. The results of the study indicate that the accuracies achieved were as follows: Logistic Regression - 92.02%, XGB - 92.01%, LGBM - 92.09%, RF - 92.07%, and CB - 92.06%. Additionally, when stacking the XGB, RF, and LGBM models with logistic regression as the final estimator, the accuracy (92.01%) showed no improvement over the individual algorithms and was even lower than the LGBM result. Nevertheless, the findings of this study demonstrate higher accuracy than previous research that used the same dataset. However, the study by Mahmudi et al. (2022) achieved higher accuracy by using oversampling approaches. These findings provide evidence that model accuracy is affected by the number of features examined: the more optimally the features are selected, the better the accuracy.
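The pipeline described above (embedded feature selection followed by a stacking ensemble with a logistic regression final estimator) can be sketched in scikit-learn. This is a minimal illustration, not the authors' implementation: XGBoost, LightGBM, and CatBoost are replaced by scikit-learn's built-in GradientBoostingClassifier so the example is self-contained, and a synthetic imbalanced dataset stands in for the Kaggle Home Credit data. All parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the Home Credit Default Risk data
# (~92% majority class, mirroring the dataset's imbalance).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           weights=[0.92], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Embedded feature selection: keep only features whose tree-based
# importance exceeds the mean importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))

# Stacking ensemble: tree-based base learners, logistic regression
# as the final estimator (as in the study's setup).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

model = make_pipeline(selector, stack)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"stacked-model accuracy: {acc:.4f}")
```

In practice the base estimators would be the tuned XGB, RF, and LGBM models, and the selector's threshold would itself be validated rather than left at the default.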

Article Details

How to Cite
Dasril, Y., Arisandy, Y., & Salahudin, S. N. (2024). Home credit default risk assessment using embedded feature selection and stacking ensemble technique. Journal of Numerical Optimization and Technology Management, 1(2), 59-68. Retrieved from https://shmpublisher.com/index.php/jnotm/article/view/306
Section
Articles