Comparative study of pre-trained RoBERTa sentiment models and zero-shot LLM on Indonesian and English texts


Akmal Faiz Agiputra
Jumanto Unjung
Budi Prasetiyo
Nurrizky Arum Jatmiko

Abstract

The growth of user-generated content on social media has increased the need for effective sentiment analysis methods. Although fine-tuned transformer-based models and zero-shot large language models (LLMs) have both been applied to sentiment classification, comparisons across languages under unified evaluation settings remain limited. This study examines the trade-offs between task-specific fine-tuning and instruction-based zero-shot inference for multilingual sentiment classification. Experiments were conducted using two publicly available Twitter sentiment datasets in Indonesian and English, each annotated into three sentiment classes. Fine-tuned RoBERTa-based models were evaluated on full test sets, while all models, including a zero-shot LLM, were compared on an identical controlled subset. Performance was assessed using accuracy and macro-averaged precision, recall, and F1-score, with macro F1-score as the primary metric. The results show that fine-tuned RoBERTa-based models achieve stable and balanced performance across sentiment classes, with monolingual models consistently outperforming multilingual variants. Under controlled evaluation, the zero-shot LLM demonstrates competitive performance in English but remains less effective in Indonesian, indicating that its effectiveness is influenced by language resource availability. Overall, this study provides a controlled comparison of the strengths and limitations of fine-tuned and zero-shot approaches for multilingual sentiment classification.
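The macro-averaged metrics used above weight each sentiment class equally, regardless of how many examples it has, which is why macro F1 is a sensible primary metric for a three-class task with possible class imbalance. A minimal illustrative sketch of how these scores are computed (toy labels for illustration only, not the paper's data or code):

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1: per-class scores, averaged equally."""
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example with the three sentiment classes used in the study
labels = ["negative", "neutral", "positive"]
y_true = ["negative", "neutral", "positive", "positive", "negative", "neutral"]
y_pred = ["negative", "positive", "positive", "negative", "negative", "neutral"]
p, r, f1 = macro_scores(y_true, y_pred, labels)
```

In practice a library implementation such as scikit-learn's `f1_score(..., average="macro")` computes the same quantity; the hand-rolled version here only makes the averaging explicit.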


Article Details

How to Cite
[1]
A. F. Agiputra, J. Unjung, B. Prasetiyo, and N. A. Jatmiko, “Comparative study of pre-trained RoBERTa sentiment models and zero-shot LLM on Indonesian and English texts”, J. Soft Comput. Explor., vol. 6, no. 4, pp. 303-310, Mar. 2026.
Section
Articles

References

K. L. Tan, C. P. Lee, K. M. Lim, and K. S. M. Anbananthen, “Sentiment Analysis With Ensemble Hybrid Deep Learning Model,” IEEE Access, vol. 10, pp. 103694–103704, 2022, doi: 10.1109/ACCESS.2022.3210182.

L. Yang, Y. Li, J. Wang, and R. S. Sherratt, “Sentiment Analysis for E-Commerce Product Reviews in Chinese Based on Sentiment Lexicon and Deep Learning,” IEEE Access, vol. 8, pp. 23522–23530, 2020, doi: 10.1109/ACCESS.2020.2969854.

K. L. Tan, C. P. Lee, and K. M. Lim, “A Survey of Sentiment Analysis: Approaches, Datasets, and Future Research,” Appl. Sci., vol. 13, no. 7, Apr. 2023, doi: 10.3390/app13074550.

H. M. U. Ali, Q. Farooq, A. Imran, and K. el Hindi, “A systematic literature review on sentiment analysis techniques, challenges, and future trends,” Knowl. Inf. Syst., May 2025, doi: 10.1007/s10115-025-02365-x.

M. Birjali, M. Kasri, and A. Beni-Hssane, “A comprehensive survey on sentiment analysis: Approaches, challenges and trends,” Knowl Based Syst, vol. 226, Aug. 2021, doi: 10.1016/j.knosys.2021.107134.

J. Cui, Z. Wang, S. B. Ho, and E. Cambria, “Survey on sentiment analysis: evolution of research methods and topics,” Artif Intell Rev, vol. 56, no. 8, pp. 8469–8510, Aug. 2023, doi: 10.1007/s10462-022-10386-z.

N. V. Babu and E. G. M. Kanaga, “Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review,” SN Comput. Sci., 2022, doi: 10.1007/s42979-021-00958-1.

D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language processing: state of the art, current trends and challenges,” Multimed Tools Appl, vol. 82, no. 3, pp. 3713–3744, Jan. 2023, doi: 10.1007/s11042-022-13428-4.

Y. Mao, Q. Liu, and Y. Zhang, “Sentiment analysis methods, applications, and challenges: A systematic literature review,” J. King Saud Univ. Comput. Inf. Sci., Apr. 2024, doi: 10.1016/j.jksuci.2024.102048.

T. Islam et al., “Lexicon and Deep Learning-Based Approaches in Sentiment Analysis on Short Texts,” Journal of Computer and Communications, vol. 12, no. 01, pp. 11–34, 2024, doi: 10.4236/jcc.2024.121002.

Q. Li et al., “A Survey on Text Classification: From Traditional to Deep Learning,” Apr. 01, 2022, Association for Computing Machinery. doi: 10.1145/3495162.

A. Vaswani et al., “Attention Is All You Need,” Aug. 2023, [Online]. Available: http://arxiv.org/abs/1706.03762

N. A. Sharma, A. B. M. S. Ali, and M. A. Kabir, “A review of sentiment analysis: tasks, applications, and deep learning techniques,” Int. J. Data Sci. Anal., Apr. 2025, doi: 10.1007/s41060-024-00594-x.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” May 2019, [Online]. Available: http://arxiv.org/abs/1810.04805

Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” Jul. 2019, [Online]. Available: http://arxiv.org/abs/1907.11692

F. Barbieri, J. Camacho-Collados, L. Neves, and L. Espinosa-Anke, “TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification,” Oct. 2020, [Online]. Available: http://arxiv.org/abs/2010.12421

C. Wu, B. Ma, Z. Zhang, N. Deng, Y. He, and Y. Xue, “Evaluating zero-shot multilingual aspect-based sentiment analysis with large language models,” International Journal of Machine Learning and Cybernetics, Oct. 2025, doi: 10.1007/s13042-025-02711-z.

M. E. Chatzimina, H. A. Papadaki, C. Pontikoglou, and M. Tsiknakis, “A Comparative Sentiment Analysis of Greek Clinical Conversations Using BERT, RoBERTa, GPT-2, and XLNet,” Bioengineering, vol. 11, no. 6, Jun. 2024, doi: 10.3390/bioengineering11060521.

H. U. Khan, A. Naz, F. K. Alarfaj, and N. Almusallam, “Analyzing student mental health with RoBERTa-Large: a sentiment analysis and data analytics approach,” Front Big Data, vol. 8, 2025, doi: 10.3389/fdata.2025.1615788.

B. Paneru, B. Thapa, and B. Paneru, “Sentiment analysis of movie reviews: A flask application using CNN with RoBERTa embeddings,” Systems and Soft Computing, vol. 7, Dec. 2025, doi: 10.1016/j.sasc.2025.200192.

B. Setiadi, E. Purwanto, and H. Permatasari, "Optimisasi klasifikasi sentimen pada review hotel bahasa Inggris dengan model RoBERTa Twitter," SINTECH Journal, vol. 7, no. 2, pp. 70–79, 2024, doi: 10.31598/sintechjournal.v7i2.1547.

A. Jaya, “Analisis sentimen pandangan publik terhadap profesi PNS (Pegawai Negeri Sipil) dari Twitter menerapkan Indonesian RoBERTa Base Sentiment Classifier,” Indonesian Journal of Data and Science (IJODAS), vol. 4, no. 1, pp. 38–44, 2023, doi: 10.56705/ijodas.v4i1.66.

Z. Maryam et al., "Sentiment analysis on social media posts using RoBERTa: a deep learning approach for text classification," JCBI, vol. 9, no. 1, 2025.

U. Sirisha and B. S. Chandana, "Aspect based sentiment and emotion analysis with RoBERTa and LSTM," International Journal of Advanced Computer Science and Applications, vol. 13, no. 11, pp. 766–774, 2022, doi: 10.14569/IJACSA.2022.0131189.

A. P. Maretta and A. Meiriza, "Aspect-based sentiment analysis of hospital service reviews using fine-tuned IndoBERT," Journal of Applied Informatics and Computing (JAIC), vol. 9, no. 5, pp. 2541–2551, 2025, doi: 10.30871/jaic.v9i5.10765.

N. Nurhasiyah, R. Dwiyansaputra, S. I. Murpratiwi, and A. Aranta, "Analisis sentimen pengguna platform media sosial X pada topik pemilihan presiden 2024 menggunakan perbandingan model monolingual dan multilingual BERT," JATI (Jurnal Mahasiswa Teknik Informatika), vol. 9, no. 1, pp. 626–634, 2025, doi: 10.36040/jati.v9i1.12430.

Ardiansyah, A. Sri Widagdo, K. N. Qodri, F. E. N. Saputro, and N. A. Rizky, "Analisis sentimen terhadap pelayanan kesehatan berdasarkan ulasan Google Maps menggunakan BERT," Jurnal FASILKOM (Teknologi Informasi dan Ilmu Komputer), vol. 13, no. 2, pp. 326–333, 2023, doi: 10.37859/jf.v13i02.5170.

M. M. A. Paramarta, R. Dwiyansaputra, and R. P. Rassy, "Performance analysis of multilingual and monolingual models in predicting Indonesian language emotion using Twitter dataset," Jurnal Teknologi Informasi, Komputer dan Aplikasinya (JTIKA), vol. 7, no. 2, pp. 237–246, 2025, doi: 10.29303/jtika.v7i2.482.

K. Dedes, Fatimatuzzahra, M. Hermansyah, A. B. Setiawan, R. P. Pradana, and A. F. M. Harvyanti, “BERT Sentimen: Fine-Tuning Multibahasa untuk Ulasan Bahasa Indonesia,” Jurnal Komputer Teknologi Informasi Sistem Informasi (JUKTISI), vol. 4, no. 2, pp. 1080–1084, Sep. 2025, doi: 10.62712/juktisi.v4i2.585.

Z. Wang, Q. Xie, Y. Feng, Z. Ding, Z. Yang, and R. Xia, “Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2304.04339

A. H. Nasution, A. Onan, Y. Murakami, W. Monika, and A. Hanafiah, “Benchmarking Open-Source Large Language Models for Sentiment and Emotion Classification in Indonesian Tweets,” IEEE Access, vol. 13, pp. 94009–94025, 2025, doi: 10.1109/ACCESS.2025.3574629.

I. Muhammad and M. Rospocher, “On Assessing the Performance of LLMs for Target-Level Sentiment Analysis in Financial News Headlines,” Algorithms, vol. 18, no. 1, Jan. 2025, doi: 10.3390/a18010046.

A. Widiarta, “Indonesian Twitter Sentiment Analysis Dataset-PPKM,” Kaggle. [Online]. Available: https://www.kaggle.com/datasets/anggapurnama/twitter-dataset-ppkm. Accessed: Dec. 2025.

CrowdFlower, “Twitter US Airline Sentiment,” Kaggle. [Online]. Available: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment. Accessed: Jan. 2025.

