The Classification of Hate Comments on Twitter Using a Combination of Logistic Regression and Support Vector Machine Algorithm
Main Article Content
Abstract
This research was conducted to increase accuracy in classifying sentences containing hate speech and non-hate speech on Twitter. This is important to do because, as technology develops, it also comes with negative impacts, one of which is hate speech. This classification is carried out using a combination of Logistic Regression (LR) and Support Vector Machine (SVM) methods. This combination is based on the ease of implementation and speed of LR as well as SVM's ability to handle more complex and non-linear data. In this context, LR is used to model the probability that a comment on Twitter contains hate elements or not. The model can then provide probability predictions for each class, and a threshold can be set to determine the final class. This research shows that combining these methods can build a good classification model with an accuracy of 96%.
Article Details
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
References
B. Mathew et al., “Thou shalt not hate: Countering online hate speech,” in Proceedings of the international AAAI conference on web and social media, 2019, vol. 13, pp. 369–380.
N. Alkiviadou, “Hate speech on social media networks: towards a regulatory framework?,” Inf. Commun. Technol. Law, vol. 28, no. 1, pp. 19–35, 2019.
S. Ullmann and M. Tomalin, “Quarantining online hate speech: technical and ethical perspectives,” Ethics Inf. Technol., vol. 22, pp. 69–80, 2020.
K. Florio, V. Basile, M. Polignano, P. Basile, and V. Patti, “Time of your hate: The challenge of time in hate speech detection on social media,” Appl. Sci., vol. 10, no. 12, p. 4180, 2020.
M. R. S. P. Pamungkas, M. N. Huda, D. A. Fauzan, A. H. Itsna, and F. M. Al-Hijri, “Sistem Klasifikasi Otomatis Dengan Konsep Machine Learning As A Service (MLaaS) Pada Kasus Pesan Berindikasi Cyberbullying,” Ilk. J. Comput. Sci. Appl. Informatics, vol. 4, no. 3, pp. 252–261, 2022.
S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, and O. Frieder, “Hate speech detection: Challenges and solutions,” PLoS One, vol. 14, no. 8, p. e0221152, 2019.
F. Alkomah and X. Ma, “A literature review of textual hate speech detection methods and datasets,” Information, vol. 13, no. 6, p. 273, 2022.
K. R. Sylwander, “Affective atmospheres of sexualized hate among youth online: A contribution to bullying and cyberbullying research on social atmosphere,” Int. J. bullying Prev., vol. 1, no. 4, pp. 269–284, 2019.
A. Tontodimamma, E. Nissi, A. Sarra, and L. Fontanella, “Thirty years of research into hate speech: topics of interest and their evolution,” Scientometrics, vol. 126, pp. 157–179, 2021.
J. C. Pereira-Kohatsu, L. Quijano-Sánchez, F. Liberatore, and M. Camacho-Collados, “Detecting and monitoring hate speech in Twitter,” Sensors, vol. 19, no. 21, p. 4654, 2019.
G. O. Ganfure, “Comparative analysis of deep learning based Afaan Oromo hate speech detection,” J. Big Data, vol. 9, no. 1, pp. 1–13, 2022.
E. Barendt, “What is the harm of hate speech?,” Ethical Theory Moral Pract., vol. 22, pp. 539–553, 2019.
G. Nguyen et al., “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey,” Artif. Intell. Rev., vol. 52, no. 1, pp. 77–124, 2019, doi: 10.1007/s10462-018-09679-z.
L. Vrysis et al., “A web interface for analyzing hate speech,” Futur. Internet, vol. 13, no. 3, p. 80, 2021.
M. S. Jahan and M. Oussalah, “A systematic review of Hate Speech automatic detection using Natural Language Processing.,” Neurocomputing, p. 126232, 2023.
R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, “Deep Learning Approach for Intelligent Intrusion Detection System,” IEEE Access, vol. 7, pp. 41525–41550, 2019, doi: 10.1109/ACCESS.2019.2895334.
I. Temitayo, S. Solomon, and S. Oyelere, A systematic review of teaching and learning machine learning in K ‑ 12 education, vol. 28, no. 5. Springer US, 2023. doi: 10.1007/s10639-022-11416-7.
D. Ni, “Machine learning in recycling business : an investigation of its practicality , benefits and future trends,” vol. 7, pp. 7907–7927, 2021, doi: 10.1007/s00500-021-05579-7.
H. Ghoddusi, G. G. Creamer, and N. Rafizadeh, “Machine learning in energy economics and finance : A review,” Energy Econ., vol. 81, pp. 709–727, 2019, doi: 10.1016/j.eneco.2019.05.006.
F. Emmanuel, O. Folorunso, F. Thomas, I. Ademola, and A. Abayomi-alli, “A probabilistic clustering model for hate speech classification in twitter,” Expert Syst. Appl., vol. 173, no. February 2020, p. 114762, 2021, doi: 10.1016/j.eswa.2021.114762.
A. Toosi, “Twitter Sentiment Analysis,” 2019. https://www.kaggle.com/
X. Zou, “Logistic Regression Model Optimization and Case Analysis,” pp. 135–139, 2019.
M. Arya and C. S. S. Bedi, “Survey on SVM and their application in image classification,” Int. J. Inf. Technol., vol. 13, no. 5, pp. 1867–1877, 2021, doi: 10.1007/s41870-017-0080-1.
M. Sheykhmousa, M. Mahdianpari, H. Ghanbari, F. Mohammadimanesh, P. Ghamisi, and S. Member, “Support Vector Machine Versus Random Forest for Remote Sensing Image Classification : A Meta-Analysis and Systematic Review,” vol. 13, pp. 6308–6325, 2020.
K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the text classification,” Augment. Hum. Res., vol. 5, pp. 1–16, 2020.
G. Ambrish, B. Ganesh, A. Ganesh, C. Srinivas, and K. Mensinkal, “Logistic regression technique for prediction of cardiovascular disease,” vol. 3, no. April, pp. 127–130, 2022, doi: 10.1016/j.gltp.2022.04.008.
P. S. B. Ginting, B. Irawan, and C. Setianingsih, “Hate speech detection on Twitter using multinomial logistic regression classification method,” in 2019 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), 2019, pp. 105–111.
F. E. Ayo, O. Folorunso, F. T. Ibharalu, and I. A. Osinuga, “Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions,” Comput. Sci. Rev., vol. 38, p. 100311, 2020.
H. Mehta and K. Passi, “Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI),” Algorithms, vol. 15, no. 8, p. 291, 2022.
B. Vidgen and T. Yasseri, “Detecting weak and strong Islamophobic hate speech on social media,” J. Inf. Technol. Polit., vol. 17, no. 1, pp. 66–78, 2020.
F. R. Irawan, A. Jazuli, and T. Khotimah, “Analisis Sentimen Terhadap Pengguna Gojek Menggunakan Metode K-Nearset Neighbors,” JIKO (Jurnal Inform. dan Komputer), vol. 5, no. 1, pp. 62–68, 2022.
D. Gupta, A. Choudhury, U. Gupta, and P. Singh, “Computational approach to clinical diagnosis of diabetes disease : a comparative study,” pp. 30091–30116, 2021.
R. H. Situngkir, “Analisis Regresi Logistik untuk Menentukan Faktor-Faktor yang Mempengaruhi Kesejahteraan Masyarakat Kabupaten/Kota di Pulau Nias.” Universitas Sumatera Utara, 2022.
H. Sulastomo, R. Ramadiansyah, K. Gibran, E. Maryansyah, and A. Tegar, “Analisis Sentimen Pada Twitter@ Ovo_Id dengan Metode Support Vectore Machine (SVM),” J-SAKTI (Jurnal Sains Komput. dan Inform., vol. 6, no. 2, pp. 1050–1056, 2022.
M. R. A. Nasution and M. Hayaty, “Comparison of Accuracy and Processing Time of K-NN and SVM Algorithms in Twitter Sentiment Analysis,” J. Inf., vol. 6, no. 2, 2019.
M. A. R. Reynaldhi and Y. Sibaroni, “Analisis Sentimen Review Film Pada Twitter Menggunakan Metode Klasifikasi Hybrid Svm, Naïve Bayes, Dan Decision Tree,” eProceedings Eng., vol. 8, no. 5, 2021.
Y. S. Triyantono, S. Al Faraby, and M. Dwifebri, “Analisis Sentimen Terhadap Ulasan Film Menggunakan Word2Vec dan SVM,” eProceedings Eng., vol. 8, no. 4, 2021.
M. Manzo and S. Pellino, “Voting in transfer learning system for ground-based cloud classification,” Mach. Learn. Knowl. Extr., vol. 3, no. 3, pp. 542–553, 2021.
K. Harimoorthy and M. Thangavelu, “Multi-disease prediction model using improved SVM-radial bias technique in healthcare monitoring system,” J. Ambient Intell. Humaniz. Comput., vol. 12, no. 3, pp. 3715–3723, 2021, doi: 10.1007/s12652-019-01652-0.
V. Chang, “Pima Indians diabetes mellitus classification based on machine learning ( ML ) algorithms,” Neural Comput. Appl., vol. 35, no. 22, pp. 16157–16173, 2023, doi: 10.1007/s00521-022-07049-z.
R. T. Mutanga, N. Naicker, and O. O. Olugbara, “Detecting Hate Speech on Twitter Network using Ensemble Machine Learning,” Int. J. Adv. Comput. Sci. Appl., vol. 13, no. 3, 2022.
D. C. Asogwa, C. I. Chukwuneke, C. C. Ngene, and G. N. Anigbogu, “Hate speech classification using SVM and naive BAYES,” arXiv Prepr. arXiv2204.07057, 2022.
E. Omran, E. Al Tararwah, and J. Al Qundus, “A comparative analysis of machine learning algorithms for hate speech detection in social media,” Online J. Commun. Media Technol., vol. 13, no. 4, p. e202348, 2023.