Improved playstore review sentiment classification accuracy with stacking ensemble

Dwi Budi Santoso; Aliyatul Munna; Dewi Handayani Untari Ningsih

doi:10.52465/joscex.v5i1.247


Published: Mar 18, 2024



Keywords:

Sentiment classification, Playstore review, Stacking ensemble, Logistic regression, Meta model

Dwi Budi Santoso

Department of Information System, Universitas Stikubank, Indonesia

Aliyatul Munna

Department of Master of Information Technology, Universitas Stikubank, Indonesia

Dewi Handayani Untari Ningsih

Department of Informatic Engineering, Universitas Stikubank, Indonesia

Abstract

User reviews on the Google Playstore platform are a valuable source of information for developers, offering insights that are critical for service improvement. Previous research has applied stacking ensemble methods to improve prediction accuracy, for example in predicting depression among university students. However, these studies often do not explicitly describe the data acquisition process, which makes it difficult to judge how well the methods transfer to other domains. This research addresses that gap by applying a stacking ensemble to improve the accuracy of sentiment classification for Playstore reviews, with a clear account of the data collection method. The methodology uses Logistic Regression as the meta classifier and proceeds in several stages. First, data were collected from user reviews of online loan applications on Google Playstore, with the acquisition process documented for transparency. The data were then classified using three base models: Random Forest, Naive Bayes, and Support Vector Machine (SVM). The outputs of these models serve as inputs to the Logistic Regression meta model. The output of each base model was then compared with that of the meta model. Tests on the Playstore review dataset showed an increase in accuracy, precision, recall, and F1 score compared to using a single model: the stacking ensemble achieved an accuracy of 87.05%, surpassing Random Forest (85.6%), Naive Bayes (85.55%), and SVM (86.5%). This indicates that the stacking ensemble method classifies user sentiment more accurately than the single models, and it addresses a limitation of previous research by explicitly documenting the data acquisition method.
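The stacking arrangement described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, a linear-kernel SVM, and a handful of made-up review sentences; none of these choices, nor the hyperparameters, are stated in the abstract, so the sketch should not be read as the authors' implementation and it will not reproduce the reported figures.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy reviews standing in for the scraped Playstore data.
positive = [
    "fast approval and very easy to use",
    "great app, the loan process is simple",
    "helpful service and clear instructions",
    "very good experience, funds arrived quickly",
    "easy registration and friendly support",
    "the app works well and is fast",
    "simple interface, good service",
    "quick process and helpful staff",
]
negative = [
    "terrible app, interest is too high",
    "the app keeps crashing and is slow",
    "bad service, my application was rejected",
    "very slow approval and rude support",
    "hidden fees and poor communication",
    "the app is confusing and full of errors",
    "worst experience, support never replies",
    "slow, buggy and interest is too high",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# The three base models named in the abstract (settings are assumptions).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", MultinomialNB()),
    ("svm", SVC(kernel="linear")),
]

# Cross-validated base-model outputs become the inputs of the
# Logistic Regression meta model.
stack = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        cv=3,
    ),
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Stacking accuracy on the toy split: {accuracy:.2f}")
```

Fitting each base model alone on the same split and scoring it the same way would give the kind of base-versus-meta comparison the abstract reports, although on a toy dataset this small the numbers carry no meaning.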


How to Cite

[1]

D. B. Santoso, A. Munna, and D. H. Untari Ningsih, “Improved playstore review sentiment classification accuracy with stacking ensemble”, J. Soft Comput. Explor., vol. 5, no. 1, pp. 38-45, Mar. 2024.

Issue

Vol. 5 No. 1 (2024): March 2024

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

