Optimize Naïve Bayes Classifier Using Chi Square and Term Frequency Inverse Document Frequency For Amazon Review Sentiment Analysis
Abstract
The rapid development of the internet has accelerated the flow of information, which has affected the world of commerce. People who have bought a product often write their opinions on social media or other online sites. Because buyer reviews can be long, a machine is needed to recognize the opinions they contain. Sentiment analysis applies text mining methods, and one of the methods it uses is classification. One such classification algorithm is the naïve Bayes classifier, which offers good efficiency and performance. However, it is very sensitive to an excessive number of features, which lowers its accuracy. The accuracy of the naïve Bayes classifier can be improved through feature selection, and one feature selection method is chi square. In this study, features are selected by their chi-square score, keeping the top K features, with K set to 450. Feature weighting can also improve the accuracy of the naïve Bayes classifier; one feature weighting technique is term frequency inverse document frequency (TF-IDF). This study uses the Sentiment Labelled dataset (amazon_labelled) obtained from the UCI Machine Learning Repository, which contains 500 positive reviews and 500 negative reviews. The accuracy of the naïve Bayes classifier alone on Amazon review sentiment analysis was 82%, while the accuracy with chi square and TF-IDF applied was 83%.
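The two preprocessing steps named in the abstract, chi-square top-K feature selection and TF-IDF weighting, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. The toy reviews are invented. The chi-square score uses the common 2x2 document-presence formula and the weight uses raw term count times log(N/df); both variants are assumptions, since the abstract does not state which formulas were used. K is 2 here only because the toy vocabulary is tiny, whereas the study uses K = 450.

```python
import math
from collections import Counter

# Hypothetical toy reviews (1 = positive, 0 = negative); not taken from the dataset.
REVIEWS = [
    ("great phone works great", 1),
    ("love this great case", 1),
    ("poor battery waste of money", 0),
    ("poor quality do not buy", 0),
]


def chi_square(term, docs):
    """Chi-square score of a term against the positive class (2x2 table)."""
    a = sum(1 for toks, y in docs if term in toks and y == 1)      # term, positive
    b = sum(1 for toks, y in docs if term in toks and y == 0)      # term, negative
    c = sum(1 for toks, y in docs if term not in toks and y == 1)  # no term, positive
    d = sum(1 for toks, y in docs if term not in toks and y == 0)  # no term, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_k(docs, k):
    """Return the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    ranked = sorted(vocab, key=lambda t: (-chi_square(t, docs), t))
    return ranked[:k]


def tf_idf(docs, features):
    """Weight each selected term in each document as count * log(N / df)."""
    n = len(docs)
    df = {t: sum(1 for toks, _ in docs if t in toks) for t in features}
    rows = []
    for toks, _ in docs:
        counts = Counter(toks)
        rows.append({t: counts[t] * math.log(n / df[t]) for t in features if counts[t]})
    return rows


tokenized = [(text.split(), label) for text, label in REVIEWS]
top_terms = select_top_k(tokenized, k=2)   # the study uses k=450
weights = tf_idf(tokenized, top_terms)     # one sparse weight dict per review
```

In this toy corpus, "great" appears only in positive reviews and "poor" only in negative ones, so they receive the highest chi-square scores and are the two features kept. The resulting weight dictionaries are what a naïve Bayes classifier would then be trained on.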
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.