Comparison of LSTM, SVM, and naive bayes for classifying sexual harassment tweets

Tiara Lailatul Nikmah
Muhammad Zhafran Ammar
Yusuf Ridwan Allatif
Rizki Mahjati Prie Husna
Putu Ayu Kurniasari
Andi Syamsul Bahri

Abstract

Twitter is an open and widely used social media platform on which anyone can freely express an opinion on any topic. Its content is correspondingly diverse and largely unrestricted, and that openness is also misused, for example for verbal sexual harassment. This research aims to identify sexual harassment in Indonesian tweets by applying sentiment analysis with the LSTM, SVM, and naive bayes methods, combined with text normalization. In this study, 2990 Indonesian-language tweets dated 4 to 6 May 2022 were tested. In this data, tweets classed as sexual harassment outnumber those that are not, totaling 2026. In the evaluation of tweet classification with text normalization, LSTM reaches an accuracy of 84.62%, SVM 86.54%, and naive bayes 85.45%. SVM with text normalization therefore achieves the highest accuracy of the three methods in classifying Indonesian sexual harassment tweets.
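Because the two classes are unequal in size, the reported accuracies are easier to read against a majority-class baseline, that is, the accuracy of a classifier that always predicts the larger class. The sketch below computes that baseline from the figures quoted in the abstract. It rests on two assumptions that the abstract does not state: that 2026 is the number of tweets in the sexual harassment class, and that the evaluation data has the same class ratio as the full set of 2990 tweets. The function and variable names are illustrative only.

```python
# Figures quoted in the abstract.
TOTAL_TWEETS = 2990
# Assumption: 2026 is the size of the sexual harassment class.
HARASSMENT_TWEETS = 2026

# Reported accuracies in percent.
REPORTED_ACCURACY = {"LSTM": 84.62, "SVM": 86.54, "naive bayes": 85.45}


def majority_baseline(total: int, majority: int) -> float:
    """Accuracy in percent of always predicting the larger class."""
    return 100.0 * majority / total


def margin_over_baseline(accuracies: dict, baseline: float) -> dict:
    """Percentage points by which each model exceeds the baseline."""
    return {name: round(acc - baseline, 2) for name, acc in accuracies.items()}


# Assumption: the test data has the same class ratio as the full data set.
baseline = majority_baseline(TOTAL_TWEETS, HARASSMENT_TWEETS)
margins = margin_over_baseline(REPORTED_ACCURACY, baseline)
best_model = max(REPORTED_ACCURACY, key=REPORTED_ACCURACY.get)

if __name__ == "__main__":
    print(f"majority-class baseline: {baseline:.2f}%")
    for name, margin in margins.items():
        print(f"{name}: {REPORTED_ACCURACY[name]:.2f}% (+{margin:.2f} points)")
    print(f"highest reported accuracy: {best_model}")
```

Under these assumptions the baseline is about 67.76%, so all three models improve on it by roughly 17 to 19 percentage points, while the models differ from one another by less than 2 points.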

Article Details

How to Cite
T. Lailatul Nikmah, M. Z. Ammar, Y. R. Allatif, R. M. P. Husna, P. A. Kurniasari, and A. S. Bahri, “Comparison of LSTM, SVM, and naive bayes for classifying sexual harassment tweets,” J. Soft Comput. Explor., vol. 3, no. 2, pp. 131–137, Sep. 2022.
