Comparison of GridSearchCV and Bayesian hyperparameter optimization in the Random Forest algorithm for diabetes prediction
Abstract
Diabetes Mellitus (DM) is a chronic disease whose complications have a significant impact on patients and the wider community. In its early stages, diabetes mellitus usually does not cause noticeable symptoms, but if it is detected too late and not handled properly, it can lead to serious health problems. Early detection of diabetes is therefore one way to address this problem. In this research, diabetes detection was performed using the Random Forest algorithm with GridSearchCV and Bayesian hyperparameter optimization. The research was carried out in the stages of literature study, model development using a Kaggle Notebook, model testing, and analysis of results. The study aims to compare GridSearchCV and Bayesian hyperparameter optimization and to analyze the advantages and disadvantages of each when applied to diabetes prediction with the Random Forest algorithm. The results show that each approach has its own strengths and weaknesses. GridSearchCV achieved the higher accuracy of 0.74, but required more time, 338.416 seconds. Bayesian optimization reached a slightly lower accuracy of 0.73, a difference of 0.01, while requiring less time, 177.085 seconds.
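The comparison described above can be illustrated with a minimal sketch (not the authors' exact setup): scikit-learn's GridSearchCV and scikit-optimize's BayesSearchCV (assumed here as the Bayesian optimizer) tuning a RandomForestClassifier, timing each search and reporting test accuracy. The file name "diabetes.csv", the train/test split, and the parameter grid are illustrative assumptions, not details given in the abstract.

# Minimal sketch: GridSearchCV vs. Bayesian search for a Random Forest (assumptions noted above).
import time
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score
from skopt import BayesSearchCV  # scikit-optimize, assumed Bayesian optimizer

# Assumed Kaggle diabetes dataset; "Outcome" is the binary class label.
df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative search space; the paper's actual grid is not stated in the abstract.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, 20],
    "min_samples_split": [2, 5, 10],
}

searches = [
    ("GridSearchCV", GridSearchCV(
        RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)),
    ("BayesSearchCV", BayesSearchCV(
        RandomForestClassifier(random_state=42), param_grid,
        n_iter=25, cv=5, random_state=42, n_jobs=-1)),
]

for name, search in searches:
    start = time.time()
    search.fit(X_train, y_train)              # hyperparameter search on the training split
    elapsed = time.time() - start
    acc = accuracy_score(y_test, search.predict(X_test))
    print(f"{name}: accuracy={acc:.2f}, time={elapsed:.3f}s, best={search.best_params_}")

Grid search evaluates every combination in the grid exhaustively, while the Bayesian search samples a fixed budget of candidates guided by a surrogate model, which is why it typically finishes in less time at a possible small cost in accuracy, consistent with the comparison reported in the abstract.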
Article Details
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.