Implementation of text summarization on Indonesian scientific articles using TextRank algorithm with TF-IDF web-based

Jeremia Jordan Sihombing; Arnita Arnita; Said Iskandar Al Idrus; Debi Yandra Niska

Published: Dec 3, 2024

DOI: https://doi.org/10.52465/joscex.v5i3.475

Jeremia Jordan Sihombing

Department of Mathematics, Universitas Negeri Medan, Indonesia

Arnita Arnita

Department of Mathematics, Universitas Negeri Medan, Indonesia

Said Iskandar Al Idrus

Department of Mathematics, Universitas Negeri Medan, Indonesia

Debi Yandra Niska

Department of Mathematics, Universitas Negeri Medan, Indonesia

Abstract

The development of information technology has significantly changed how information is accessed, requiring readers to absorb content efficiently and make decisions quickly. To address this challenge, this research developed a web-based text summarization system for Indonesian scientific articles that combines the TextRank algorithm with TF-IDF weighting. TextRank was selected because it identifies key sentences without requiring training data, while TF-IDF was used to weight words by term frequency and inverse document frequency. The dataset comprised 100 Indonesian-language scientific articles from the Unimed Kode Journal, published between 2022 and 2024. The summarization process consisted of four stages: text preprocessing, TF-IDF weighting, cosine similarity calculation, and sentence ranking. The resulting summaries were evaluated by language experts and website specialists, who used a Likert scale to assess both the quality of the summaries and the usability of the system. The findings showed that the system generated summaries that retained essential information from the original articles, with the highest accuracy observed at a 50% compression rate (88.533%), followed by 40% compression (85.133%) and 30% compression (81.26%). The web-based system lets users paste in article text and quickly obtain a summary, offering researchers and readers a practical tool for comprehending academic content efficiently.
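The four stages named in the abstract (preprocessing, TF-IDF weighting, cosine similarity, sentence ranking) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation. Several choices below are assumptions made only for illustration: each sentence is treated as a "document" for IDF, the IDF formula is `log(n / df) + 1`, preprocessing is reduced to lowercasing and optional stopword removal (no Indonesian stemming), and the compression rate is interpreted as the fraction of sentences kept. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence, stopwords=frozenset()):
    # Preprocessing (assumed): lowercase, keep alphabetic tokens, drop stopwords.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in stopwords]


def tfidf_vectors(token_lists):
    # Assumption: every sentence counts as one "document" for the IDF term.
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank-style power iteration over the similarity graph.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, compression=0.5, stopwords=frozenset()):
    # Assumption: `compression` is the fraction of sentences to keep.
    sentences = split_sentences(text)
    if not sentences:
        return []
    n = len(sentences)
    vectors = tfidf_vectors([tokenize(s, stopwords) for s in sentences])
    sim = [
        [0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(sim)
    keep = max(1, math.ceil(n * compression))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(ranked)]
```

Calling `summarize(article_text, compression=0.3)` would, under these assumptions, return roughly the top 30% of sentences by TextRank score. A production system for Indonesian text would likely add a proper stopword list and stemming, which this sketch leaves out.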

How to Cite

[1] J. J. Sihombing, A. Arnita, S. I. Al Idrus, and D. Y. Niska, “Implementation of text summarization on Indonesian scientific articles using TextRank algorithm with TF-IDF web-based”, J. Soft Comput. Explor., vol. 5, no. 3, pp. 310-319, Dec. 2024.

Issue

Vol. 5 No. 3 (2024): September 2024

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
