Car insurance segmentation prediction based on the most influential features using random forest and stacking ensemble learning

Etna Vianita; Adi Wibowo; Bayu Surarso; Aris Puji Widodo

doi:10.52465/joscex.v2i2.39

Published: Sep 7, 2021

Etna Vianita

Department of Mathematics, Universitas Diponegoro, Indonesia

Adi Wibowo

Department of Informatics, Universitas Diponegoro, Indonesia

Bayu Surarso

Department of Informatics, Universitas Diponegoro, Indonesia

Aris Puji Widodo

Department of Informatics, Universitas Diponegoro, Indonesia

Abstract

In addition to financial transaction services, banks also provide insurance services, such as car insurance, and run regular campaigns to attract new customers. These campaigns rely on market segmentation, one of the main aspects of marketing in financial services, which is based on demographic data. One way to analyze the market is to predict the likely target customers from the demographic data of the campaign's targets. This study therefore aims to find the best classification method for predicting campaign targets, using historical data from 4000 customers of a bank in the United States. The market segmentation analysis combines feature selection with ensemble learning. Features are selected according to their Random Forest feature importance. The ensemble is a stacking model whose base models are Logistic Regression, Support Vector Classifier, Gradient Boosting, Extra Tree, Bagging, AdaBoost, Gaussian Naive Bayes, MLP, XGBoost, LGBM, KNeighbors, Decision Tree, and Random Forest. The stacking model reaches an accuracy of 78.80%, exceeding the accuracy of the individual base models.
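The two-step pipeline described in the abstract (Random Forest importance for feature selection, then a stacking ensemble) could be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, only four of the thirteen base models are included to keep the example short, and the median importance threshold and the Logistic Regression meta-learner are choices made here because the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank's customer data (assumption: the real
# dataset and its columns are not reproduced here).
X, y = make_classification(
    n_samples=1000, n_features=15, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features by Random Forest importance and keep those at or
# above the median importance (the threshold is an assumption).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 2: stack a few of the base models named in the abstract; the
# meta-learner (final_estimator) is an assumption.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("gnb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base predictions are used to train the meta-learner
)
stack.fit(X_train_sel, y_train)
stack_accuracy = stack.score(X_test_sel, y_test)

print(f"Selected {X_train_sel.shape[1]} of {X.shape[1]} features")
print(f"Stacking accuracy on held-out data: {stack_accuracy:.4f}")
```

The selector is fitted on the training split only, so the held-out accuracy is not influenced by feature selection on test data. Accuracy on this synthetic data says nothing about the 78.80% reported for the actual dataset.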

How to Cite

[1]

E. Vianita, A. Wibowo, B. Surarso, and A. P. Widodo, “Car insurance segmentation prediction based on the most influential features using random forest and stacking ensemble learning”, J. Soft Comput. Explor., vol. 2, no. 2, pp. 86-92, Sep. 2021.

Issue

Vol. 2 No. 2 (2021): September 2021

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
