Using genetic algorithm feature selection to optimize XGBoost performance in Australian credit
Abstract
Lending institutions need sound credit risk management practices to reduce credit risk and survive in the long term. Data mining is one of the techniques used for credit risk management: classification techniques can find information patterns in large datasets, and the quality of those patterns is measured by the resulting accuracy. This research aims to increase the accuracy of a classification algorithm in predicting credit risk by applying a genetic algorithm as the feature selection method, so that only the most important features are used to predict credit risk. The XGBoost classifier is applied to the Australian credit dataset and evaluated by measuring accuracy and AUC. The results show an increase in accuracy of 2.24%, to 89.93%, after optimization using the genetic algorithm. Genetic algorithm feature selection therefore improves the accuracy of the XGBoost algorithm on the Australian credit dataset.
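The wrapper-style feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bit-mask encoding, tournament selection, one-point crossover, elitism, and all parameter values are assumptions chosen for illustration. The fitness function is a toy stand-in so the example runs with the standard library only.

```python
import random
from typing import Callable, List, Sequence


def ga_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,        # assumed value, not taken from the article
    generations: int = 30,     # assumed value
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Return a 0/1 mask over features (1 = keep) found by a simple GA."""
    rng = random.Random(seed)
    # Each individual is a bit mask with one gene per feature.
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best = max(population, key=fitness)

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def tournament() -> List[int]:
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)

        population = children
        best = max(population, key=fitness)

    return best


def toy_fitness(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for classifier accuracy: features 0, 2 and 5
    are treated as informative, and every selected feature costs 0.1."""
    useful = {0, 2, 5}
    hits = sum(1 for i, g in enumerate(mask) if g and i in useful)
    return hits - 0.1 * sum(mask)


selected = ga_select(n_features=8, fitness=toy_fitness)
```

In a setting like the one the abstract describes, `toy_fitness` would plausibly be replaced by a function that trains an XGBoost classifier on the columns where the mask is 1 and returns its cross-validated accuracy. That substitution is an assumption about how such a study could be set up, not a description of the authors' code.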
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.