Car insurance segmentation prediction based on the most influential features using random forest and stacking ensemble learning
Abstract
In addition to financial transaction services, banks also provide insurance services and run regular campaigns to attract new customers, for example for car insurance. These campaigns rely on market segmentation, one of the main marketing tools in financial services, which is typically based on demographic data. One way to analyze the market is to predict which customers are likely campaign targets from their demographic data. This study therefore aims to find the best classification method for predicting campaign targets, using historical data from 4,000 customers of a bank in the United States. The market segmentation analysis combines feature selection with ensemble learning. The most influential features are selected using Random Forest feature importance. The ensemble is a stacking model whose base models are Logistic Regression, Support Vector Classifier, Gradient Boosting, Extra Trees, Bagging, AdaBoost, Gaussian Naive Bayes, MLP, XGBoost, LGBM, KNeighbors, Decision Tree, and Random Forest. The stacking model reaches an accuracy of 78.80%, exceeding the accuracy of the individual base models.
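The two-stage approach described in the abstract (Random Forest feature importance followed by a stacking ensemble) can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' implementation. It assumes synthetic data from `make_classification` in place of the bank dataset, a median importance threshold, only five of the thirteen base models, and a Logistic Regression meta-learner. The abstract specifies none of these choices, and the helper name `build_stacking_pipeline` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def build_stacking_pipeline(random_state=0):
    """Hypothetical helper: RF-based feature selection, then a stacking classifier."""
    # Stage 1: keep features whose Random Forest importance is at least the median.
    # The median threshold is an assumption made for this sketch.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=random_state),
        threshold="median",
    )
    # Stage 2: a reduced set of base models (the abstract lists thirteen).
    base_models = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ]
    # The meta-learner is assumed to be Logistic Regression; it is trained on
    # cross-validated predictions of the base models.
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return make_pipeline(selector, stack)


# Synthetic stand-in for the customer data (assumption: 12 numeric features).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_stacking_pipeline()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"stacking accuracy on held-out data: {accuracy:.3f}")
```

Wrapping both stages in one pipeline means the feature selector is fitted only on the training split, which avoids leaking test information into the selection step. The accuracy printed here comes from synthetic data and is not comparable to the 78.80% reported in the abstract.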
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.