Enhancing Abusive Language Detection on Twitter Using Stacking Ensemble Learning

Putri Utami; Yulizchia Malica Pinkan Tanga; Jumanto Unjung; Much Aziz Muslim

doi:10.52465/joiser.v3i2.594

Published: Aug 19, 2025

Keywords:

Abusive language, Twitter, Stacking, Ensemble learning

Putri Utami

Department of Computer Science, Universitas Negeri Semarang, Indonesia

Yulizchia Malica Pinkan Tanga

Department of Computer Science, Universitas Negeri Semarang, Indonesia

Jumanto Unjung

Department of Computer Science, Universitas Negeri Semarang, Indonesia

Much Aziz Muslim

Faculty of Technology Management and Business, Universiti Tun Hussein Onn Malaysia, Malaysia

Abstract

Detecting abusive language on Twitter is an important step in reducing the prevalence of negative content and harassment. This study aims to improve the accuracy and effectiveness of abusive language detection on Twitter by addressing the limitations of the single-model approaches commonly used in earlier work. A stacking method is employed that uses Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction and the Naive Bayes and XGBoost algorithms as classification models. Naive Bayes is known for its simplicity in text classification, while XGBoost excels at processing complex data and achieving high accuracy. Combining the two models is expected to improve performance in detecting abusive language. The results show that the proposed model outperforms the methods of previous studies, with an accuracy of 91.91% and an AUC of 96.76%. These findings demonstrate the effectiveness of the stacking approach in reducing classification errors in abusive language detection. Further research could explore larger datasets or more complex models to improve detection accuracy.
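
The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn is available, uses a handful of hypothetical toy sentences in place of the study's dataset, substitutes scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so the example needs no extra dependency, and assumes a logistic regression meta-learner, which the abstract does not specify.

```python
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (1 = abusive, 0 = not abusive); NOT the study's dataset.
texts = [
    "you are such an idiot",
    "shut up you stupid fool",
    "nobody likes you, loser",
    "what a dumb and ugly post",
    "you are a worthless idiot",
    "stupid loser, go away",
    "have a wonderful day everyone",
    "thanks for sharing this article",
    "the weather is lovely today",
    "congratulations on your new job",
    "I enjoyed the concert last night",
    "looking forward to the weekend",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Base learners: Naive Bayes plus a gradient-boosted tree model.
# GradientBoostingClassifier is a stand-in (assumption) for XGBoost.
stack = StackingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("gbt", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ],
    # Meta-learner trained on the base learners' out-of-fold probabilities.
    # Logistic regression is an assumption, not taken from the paper.
    final_estimator=LogisticRegression(),
    cv=3,
    stack_method="predict_proba",
)

# TF-IDF turns raw text into weighted term features before stacking.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

new_texts = ["you are a stupid fool", "have a lovely day"]
preds = model.predict(new_texts)        # predicted class labels
probs = model.predict_proba(new_texts)  # class probabilities per text
print(preds, probs[:, 1])
```

On a real corpus, accuracy and AUC such as those reported in the abstract would be computed on a held-out test split, for example with `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score`; the toy data here is far too small for meaningful scores.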

How to Cite

Utami, P., Tanga, Y. M. P., Unjung, J., & Muslim, M. A. (2025). Enhancing Abusive Language Detection on Twitter Using Stacking Ensemble Learning. Journal of Information System Exploration and Research, 3(2), 63-72. https://doi.org/10.52465/joiser.v3i2.594

Issue

Vol. 3 No. 2 (2025): July 2025

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
