Implementation of text summarization on Indonesian scientific articles using TextRank algorithm with TF-IDF web-based

Jeremia Jordan Sihombing
Arnita Arnita
Said Iskandar Al Idrus
Debi Yandra Niska

Abstract

Advances in information technology have changed how information is accessed, and readers now need to absorb content efficiently and make decisions quickly. To address this, this research developed a web-based text summarization system for Indonesian scientific articles using the TextRank algorithm with TF-IDF weighting. TextRank was selected because it identifies key sentences without requiring training data, while TF-IDF was used to weight words by their frequency within the document. The dataset comprised 100 Indonesian-language scientific articles from the Unimed Kode Journal published in 2022-2024. The summarization process consisted of four stages: text preprocessing, TF-IDF weighting, cosine similarity calculation, and sentence ranking. Language experts and website specialists evaluated the results on a Likert scale, assessing both the quality of the summaries and the usability of the system. The system generated summaries that retained the essential information of the original articles, with the highest accuracy at a 50% compression rate (88.533%), followed by 40% compression (85.133%) and 30% compression (81.26%). The web-based system lets users input article text and quickly obtain a summary, giving researchers and readers a practical tool for comprehending academic content efficiently.
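
The four stages named in the abstract (preprocessing, TF-IDF weighting, cosine similarity, sentence ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: each sentence is treated as a "document" for TF-IDF, the IDF uses a smoothed form `log(n / df) + 1`, preprocessing is reduced to case folding and optional stopword removal (no Indonesian stemmer), and the compression rate is read as the fraction of sentences kept. All function names and the `ratio` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence, stopwords=frozenset()):
    # Preprocessing: case folding, keep alphabetic tokens, drop stopwords.
    # Stemming is omitted in this sketch.
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in stopwords]


def tfidf_vectors(token_lists):
    # Each sentence is one "document"; IDF = log(n / df) + 1 (assumed variant).
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over the sentence similarity graph.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, ratio=0.5, stopwords=frozenset()):
    # Keep the top-scoring fraction of sentences, returned in original order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = tfidf_vectors([tokenize(s, stopwords) for s in sentences])
    n = len(sentences)
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(sim)
    k = max(1, math.ceil(n * ratio))
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]
```

Calling `summarize(article_text, ratio=0.5)` would return roughly half of the input sentences, which is one possible reading of the 50% compression setting reported above. A production system would likely substitute a proper Indonesian stopword list and stemmer in `tokenize`.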

How to Cite
[1] J. J. Sihombing, A. Arnita, S. I. Al Idrus, and D. Y. Niska, “Implementation of text summarization on Indonesian scientific articles using TextRank algorithm with TF-IDF web-based”, J. Soft Comput. Explor., vol. 5, no. 3, pp. 310-319, Dec. 2024.