Improved playstore review sentiment classification accuracy with stacking ensemble

Dwi Budi Santoso
Aliyatul Munna
Dewi Handayani Untari Ningsih

Abstract

User reviews on the Google Playstore are a valuable source of information for developers, offering insights that can guide service improvement. Previous research has applied stacking ensemble methods to improve prediction accuracy, for example in predicting depression among university students. However, such studies often do not describe the data acquisition process in detail, which makes it difficult to judge how well these methods transfer to other domains. This research addresses that gap by applying a stacking ensemble to sentiment classification of Playstore reviews and by describing the data collection method explicitly. The methodology proceeds in several stages, with Logistic Regression as the meta-classifier. First, user reviews of online loan applications were collected from Google Playstore. The reviews were then classified using three base models: Random Forest, Naive Bayes, and SVM. The outputs of these models served as inputs to the Logistic Regression meta-model, and the output of each base model was then compared with that of the meta-model. On the Playstore review dataset, the stacking ensemble improved accuracy, precision, recall, and F1 score compared with the single models. It achieved an accuracy of 87.05%, compared with 85.6% for Random Forest, 85.55% for Naive Bayes, and 86.5% for SVM, a gain of 0.55 percentage points over the best single model. These results indicate that the stacking ensemble classifies user sentiment more accurately than any of its base models alone, while the explicit account of data acquisition addresses a limitation of previous research.
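The stacking arrangement described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration rather than the authors' implementation: the TF-IDF features, the use of `LinearSVC` as the SVM, every hyperparameter, and the toy reviews are assumptions, since the abstract does not specify them.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy reviews invented for illustration; NOT the study's dataset.
reviews = [
    "fast approval and very helpful service",
    "easy to use and the loan process is quick",
    "great app with a friendly interface",
    "helpful support and a smooth experience",
    "interest is too high and the app keeps crashing",
    "terrible service and rude debt collectors",
    "the app is slow and my data was misused",
    "very bad experience with many hidden fees",
]
labels = ["positive"] * 4 + ["negative"] * 4

# The three base models named in the abstract (hyperparameters are assumed).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", MultinomialNB()),
    ("svm", LinearSVC(random_state=0)),
]

# Logistic Regression is the meta-classifier: it is trained on the
# cross-validated outputs of the base models rather than on the raw features.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=2,  # small only because the toy dataset is tiny
)

# TF-IDF is an assumed feature representation for the review text.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(reviews, labels)

predictions = model.predict(["quick and helpful app", "bad app with high fees"])
print(predictions)
```

On a real dataset, the comparison reported in the abstract would correspond to fitting each base model on its own and evaluating accuracy, precision, recall, and F1 score for all four models on the same held-out test split.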

Article Details

How to Cite
D. B. Santoso, A. Munna, and D. H. Untari Ningsih, “Improved playstore review sentiment classification accuracy with stacking ensemble,” J. Soft Comput. Explor., vol. 5, no. 1, pp. 38–45, Mar. 2024.

References

L. Rahmawati and D. B. Santoso, “Implementasi Metode Naive Bayes Untuk Klasifikasi Ulasan Aplikasi E-Commerce Tokopedia,” INTECOMS: Journal of Information Technology and Computer Science, vol. 6, no. 1, pp. 116–124, Feb. 2023, doi: 10.31539/intecoms.v6i1.5515.

A. Chader, L. Hamdad, and A. Belkhiri, “Sentiment Analysis in Google Play Store: Algerian Reviews Case,” 2021, pp. 107–121. doi: 10.1007/978-3-030-58861-8_8.

P. K. Jain, R. Pamula, and G. Srivastava, “A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews,” Comput Sci Rev, vol. 41, p. 100413, 2021.

S. N. Singh and T. Sarraf, “Sentiment analysis of a product based on user reviews using random forests algorithm,” in 2020 10th International conference on cloud computing, data science & engineering (Confluence), 2020, pp. 112–116.

I. D. Mienye and Y. Sun, “A survey of ensemble learning: Concepts, algorithms, applications, and prospects,” IEEE Access, vol. 10, pp. 99129–99149, 2022.

S. Chen, G. I. Webb, L. Liu, and X. Ma, “A Novel Selective Naive Bayes Algorithm,” Knowl Based Syst, vol. 192, p. 105361, 2020.

J. Wu, P. Guo, Y. Cheng, H. Zhu, X.-B. Wang, and X. Shao, “Ensemble generalized multiclass support-vector-machine-based health evaluation of complex degradation systems,” IEEE/ASME Transactions on Mechatronics, vol. 25, no. 5, pp. 2230–2240, 2020.

A. Al Shorman, H. Faris, and I. Aljarah, “Unsupervised intelligent system based on one class support vector machine and Grey Wolf optimization for IoT botnet detection,” J Ambient Intell Humaniz Comput, vol. 11, pp. 2809–2825, 2020.

A. Daza Vergaray, J. C. H. Miranda, J. B. Cornelio, A. R. López Carranza, and C. F. Ponce Sánchez, “Predicting the depression in university students using stacking ensemble techniques over oversampling method,” Inform Med Unlocked, vol. 41, p. 101295, 2023, doi: 10.1016/j.imu.2023.101295.

T. Wolf et al., “Huggingface’s transformers: State-of-the-art natural language processing,” arXiv preprint arXiv:1910.03771, 2019.

Y. A. Alhaj, M. A. A. Al-qaness, A. Dahou, M. Abd Elaziz, D. Zhao, and J. Xiang, “Effects of light stemming on feature extraction and selection for arabic documents classification,” Recent Advances in NLP: The Case of Arabic Language, pp. 59–79, 2020.

C. N. Noviyanti and A. Alamsyah, “Early Detection of Diabetes Using Random Forest Algorithm,” Journal of Information System Exploration and Research, vol. 2, no. 1, 2024.

G.-W. Cha et al., “Development of a prediction model for demolition waste generation using a random forest algorithm based on small datasets,” Int J Environ Res Public Health, vol. 17, no. 19, p. 6997, 2020.

S. S. Bafjaish, “Comparative analysis of Naive Bayesian techniques in health-related for classification task,” Journal of Soft Computing and Data Mining, vol. 1, no. 2, pp. 1–10, 2020.

E. Y. Boateng, J. Otoo, and D. A. Abaye, “Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: a review,” Journal of Data Analysis and Information Processing, vol. 8, no. 4, pp. 341–357, 2020.

R. Yacouby and D. Axman, “Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models,” in Proceedings of the first workshop on evaluation and comparison of NLP systems, 2020, pp. 79–91.

M. A. Muslim et al., “New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning,” Intelligent Systems with Applications, vol. 18, no. December 2022, p. 200204, 2023, doi: 10.1016/j.iswa.2023.200204.

M. A. Muslim, Y. Dasril, H. Javed, W. F. Abror, D. A. A. Pertiwi, and T. Mustaqim, “An Ensemble Stacking Algorithm to Improve Model Accuracy in Bankruptcy Prediction,” Journal of Data Science and Intelligent Systems, vol. 1, no. 1, 2023.

R. Rofik, R. Aulia, K. Musaadah, S. S. F. Ardyani, and A. A. Hakim, “The Optimization of Credit Scoring Model Using Stacking Ensemble Learning and Oversampling Techniques,” Journal of Information System Exploration and Research, vol. 2, no. 1, 2024.

A. U. Dullah, F. N. Apsari, and J. Jumanto, “Ensemble learning technique to improve breast cancer classification model,” Journal of Soft Computing Exploration, vol. 4, no. 2, 2023.