News text classification using Long-Term Short Memory (LSTM) algorithm

Indra Triyadi
Budi Prasetiyo
Tiara Lailatul Nikmah

Abstract

Over the past few years, text classification has become increasingly important because knowledge now reaches users through many sources, including electronic, digital, and print media. One consequence is the large volume of news published every day. LSTM is a deep learning algorithm that can classify text. This research evaluates the LSTM algorithm on the classification of news text. The data used are news texts from the news aggregator dataset on Kaggle. After 10 epochs, the LSTM experiment obtained an accuracy of 93.15% in classifying texts into four categories: entertainment, business, science, and health.
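
As an illustration only, the following minimal sketch shows how an LSTM can map a sequence of token ids to probabilities over the four categories named in the abstract. It is not the authors' implementation: the weights are random and untrained, and the vocabulary size, embedding size, hidden size, and every function name are assumptions made for this example.

```python
import math
import random

CATEGORIES = ["entertainment", "business", "science", "health"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(vocab_size, embed_dim, hidden, rng):
    """Random, untrained parameters (hypothetical sizes, for illustration)."""
    params = {"embedding": init_matrix(vocab_size, embed_dim, rng)}
    for name in ("i", "f", "o", "g"):  # input, forget, output, candidate
        params["W" + name] = init_matrix(hidden, hidden + embed_dim, rng)
        params["b" + name] = [0.0] * hidden
    params["Wout"] = init_matrix(len(CATEGORIES), hidden, rng)
    params["bout"] = [0.0] * len(CATEGORIES)
    return params


def gate(params, name, z, activation):
    pre = matvec(params["W" + name], z)
    return [activation(p + b) for p, b in zip(pre, params["b" + name])]


def lstm_step(x, h, c, params):
    """One LSTM time step; every gate reads the concatenation [h, x]."""
    z = h + x
    i = gate(params, "i", z, sigmoid)    # how much new information to write
    f = gate(params, "f", z, sigmoid)    # how much old cell state to keep
    o = gate(params, "o", z, sigmoid)    # how much cell state to expose
    g = gate(params, "g", z, math.tanh)  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(token_ids, params):
    """Run the LSTM over the tokens, then score the final hidden state."""
    hidden = len(params["bi"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for t in token_ids:
        h, c = lstm_step(params["embedding"][t], h, c, params)
    logits = [s + b for s, b in zip(matvec(params["Wout"], h), params["bout"])]
    return softmax(logits)


rng = random.Random(0)
params = init_params(vocab_size=50, embed_dim=8, hidden=16, rng=rng)
probs = classify([3, 17, 42, 5], params)  # token ids of a made-up headline
predicted = CATEGORIES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label is arbitrary. In practice the parameters would be fitted with a cross-entropy loss over several epochs, typically using a deep learning framework rather than hand-written loops.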

How to Cite
[1]
I. Triyadi, B. Prasetiyo, and T. L. Nikmah, “News text classification using Long-Term Short Memory (LSTM) algorithm”, J. Soft Comput. Explor., vol. 4, no. 2, May 2023.
