Performance comparison of support vector machine and gaussian naive bayes classifier for youtube spam comment detection


Yahya Nur Ifriza
Muhammad Sam'an


YouTube is a video-sharing site that launched in 2005. More than 400 hours of content are uploaded to YouTube every minute, and users watch more than 1 billion hours of content every day. In this work, we present an approach to YouTube spam comment detection that compares the results of a Support Vector Machine (SVM) classifier and a Gaussian Naive Bayes classifier. We used the YouTube-Shakira dataset from the UCI repository for training and testing. The transformed dataset is split into training and testing subsets and fed into Gaussian Naive Bayes and the SVM. In all cases, the F1 score was used to evaluate the classifier's performance. In the experiment, Gaussian Naive Bayes achieved an F1 score of 84.38% and the SVM achieved an F1 score of 88.00%. Naive Bayes consistently performed worse than the SVM.
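As a rough illustration of the evaluation metric named above, the F1 score is the harmonic mean of precision and recall on the positive (spam) class. The sketch below is a minimal pure-Python version, not the authors' code: the function name `f1_score`, the label encoding (1 = spam, 0 = legitimate), and the toy label lists are all assumptions made for illustration and are not taken from the YouTube-Shakira data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the `positive` class.

    Assumption: labels are encoded as 1 = spam, 0 = legitimate comment.
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero
        # (or undefined), so report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, invented for this example only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2%}")
```

In a comparison like the one described, the same function would be applied to the test-set predictions of each classifier, so that both scores are computed on identical comments.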




How to Cite
Y. N. Ifriza and M. Sam’an, “Performance comparison of support vector machine and gaussian naive bayes classifier for youtube spam comment detection”, JOSCEX, vol. 2, no. 2, pp. 93-98, Sep. 2021.

