Customer churn prediction in the case of telecommunication company using support vector machine (SVM) method and oversampling

Dhiya Urrahman
Raffi Winanto
Thierry Widyatama

Abstract

Churn is the act by which a customer withdraws from a service, whether initiated by the service provider or by the customer. Churn is a major challenge for companies, especially in churn-prone sectors such as telecommunications. It can affect both revenue and reputation if it occurs for negative reasons. This study aims to predict customer churn in a telecommunication company dataset, investigating the impact of various variables and classes on churn occurrences to inform strategic decision-making for businesses. The Support Vector Machine (SVM) model is employed, and dataset imbalance is addressed through oversampling techniques, specifically the Synthetic Minority Over-sampling Technique (SMOTE) and random oversampling (ROS). Three SVM models are created with different training datasets (normal, SMOTE, ROS), yielding varying results. The model trained on the normal dataset achieves the highest accuracy at 92%, outperforming SVM with ROS (89%) and SVM with SMOTE (87%). However, it exhibits lower sensitivity than both models trained on oversampled data. The study identifies the cause of the decreased accuracy under oversampling and of the low sensitivity on the normal dataset. The novelty of this research lies in testing the SVM model's ability to surpass the accuracy of previous models on the same dataset and in exploring the unique impact of oversampling in churn prediction.
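The two oversampling strategies named in the abstract, and the gap between accuracy and sensitivity, can be illustrated with a minimal sketch. It is not the study's code: the toy data, the function names (`random_oversample`, `smote_like`, `accuracy_and_sensitivity`) and the parameter `k` are assumptions made for illustration, the SMOTE variant is simplified, and SVM training is left out. In practice the resampled training set would be passed to a classifier such as scikit-learn's `SVC`.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """ROS: duplicate random minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [list(rng.choice(minority_rows)) for _ in range(max(n_needed, 0))]
    return X + extra, y + [minority] * len(extra)

def smote_like(X, y, minority=1, k=2, seed=0):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours. Assumes at least two minority rows."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(pts)) - len(pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(pts)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)

def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = correct / total; sensitivity = TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true), tp / (tp + fn)

# Hypothetical toy data: 7 non-churners (0) and 3 churners (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4],
     [0.4, 0.1], [0.2, 0.3], [1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
y = [0] * 7 + [1] * 3

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y)

# A model that always predicts "no churn" is accurate on the majority
# class but finds no churners at all.
acc, sens = accuracy_and_sensitivity(y, [0] * len(y))
```

On this toy set, the always-"no churn" predictor scores 70% accuracy with zero sensitivity. This is the kind of trade-off the abstract reports between the model trained on the normal dataset and the models trained on oversampled data. Resampling should be applied to the training split only, so that the test set keeps the original class distribution.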


