Early Detection of Diabetes Using Random Forest Algorithm
Abstract
Diabetes is a chronic and potentially deadly disease. According to WHO data from 2021, approximately 422 million adults worldwide were living with diabetes, and this number is expected to keep rising. Many studies have addressed early detection of diabetes, with a focus on improving accuracy. A major difficulty in diabetes prediction, however, is selecting the right classification algorithm. This study aims to improve the accuracy of early diabetes detection by applying a Random Forest model. The research proceeded in five stages: data collection, data preprocessing, data splitting, modeling, and evaluation. It uses the Pima Indian Diabetes dataset. The resulting Random Forest model achieved an accuracy of 87%, which indicates that this algorithm can improve the performance of early diabetes detection. There is still room for optimization, and further research is recommended on feature selection, data balancing, more complex models, and larger datasets.
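The stages named in the abstract (preprocessing, data splitting, modeling, evaluation) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the preprocessing steps, split ratio, or hyperparameters, so the median imputation, the 80/20 stratified split, the 100 trees, and the random seeds are all assumptions. To stay self-contained, the sketch uses synthetic data with the same shape as the Pima Indian Diabetes dataset (768 rows, 8 features) instead of the real file, so its accuracy will not match the reported 87%.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def run_pipeline(X, y, test_size=0.2, seed=42):
    """Split the data, fit a Random Forest, and return (model, held-out accuracy).

    test_size, seed, the imputation strategy, and n_estimators are
    illustrative assumptions; the abstract does not state them.
    """
    # Data splitting: stratify so both subsets keep the class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Preprocessing + modeling: fill missing values with the training-set
    # median, then fit the forest on the imputed features.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    model.fit(X_train, y_train)
    # Evaluation: accuracy on data the model has not seen.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Data collection (stand-in): synthetic data shaped like the Pima dataset.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)
# Knock out about 5% of the values to give the imputer something to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

model, accuracy = run_pipeline(X, y)
print(f"Held-out accuracy: {accuracy:.3f}")
```

To try this on the real dataset, replace the synthetic `X` and `y` with the eight feature columns and the outcome column of the Pima file. The pipeline refits the imputer on the training split only, which keeps information from the test split out of preprocessing.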
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.