Content-based filtering using cosine similarity algorithm for alternative selection on training programs


Muhammad Falah Abdurrafi
Dewi Handayani Untari Ningsih


The Ministry of Manpower of the Republic of Indonesia offers a large selection of training programs, which makes it difficult for prospective trainees to choose a program that suits their interests and needs. This research aims to support the selection process by recommending training programs that match those interests and needs. One suitable method is Content-Based Filtering with Cosine Similarity as the similarity measure. Content-based filtering recommends training programs according to how closely each program's description matches the interests of a prospective trainee, with the match scored by cosine similarity. In testing, the Content-Based Filtering method achieved an average precision of 88%, indicating that the system can provide training program recommendations that are highly relevant to the interests and needs of trainees.
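The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes TF-IDF weighting for the text vectors (the abstract names only cosine similarity), uses a simple lowercase tokenizer in place of whatever preprocessing the study applied, and the program names, descriptions, and function names (`recommend`, `precision`, `PROGRAMS`) are hypothetical examples invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for fuller preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the corpus are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_tfidf(docs):
    """Return one TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms found in every document keep a weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(interest, programs, top_n=3):
    """Rank programs by similarity between their description and the interest text."""
    names = list(programs)
    vectors, idf = build_tfidf([programs[name] for name in names])
    query = vectorize(tokenize(interest), idf)
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def precision(recommended, relevant):
    """Fraction of recommended items that are judged relevant."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in relevant) / len(recommended)


# Hypothetical catalogue, invented for this example only.
PROGRAMS = {
    "Web Programming": "build websites using html css javascript and web frameworks",
    "Graphic Design": "create posters logos and layouts using design software",
    "Welding": "metal welding techniques and workshop safety",
    "Mobile App Development": "build android applications using kotlin and mobile frameworks",
}

if __name__ == "__main__":
    for name, score in recommend("I want to build websites with javascript", PROGRAMS):
        print(f"{name}: {score:.3f}")
```

Averaging `precision` over a set of test users, each with a list of recommendations and a set of programs judged relevant, gives an average precision figure of the kind reported in the abstract.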




How to Cite
M. F. Abdurrafi and D. H. U. Ningsih, “Content-based filtering using cosine similarity algorithm for alternative selection on training programs”, J. Soft Comput. Explor., vol. 4, no. 4, pp. 204-212, Dec. 2023.


