Comparison of GridSearchCV and Bayesian hyperparameter optimization in Random Forest algorithm for diabetes prediction

Rini Muzayanah
Dwika Ananda Agustina Pertiwi
Muazam Ali
Much Aziz Muslim

Abstract

Diabetes Mellitus (DM) is a chronic disease whose complications have a significant impact on patients and the wider community. In its early stages, diabetes mellitus usually causes no noticeable symptoms, but if it is detected too late and not handled properly, it can lead to serious health problems. Early detection of diabetes is one way to address this problem. In this research, diabetes detection was carried out using the Random Forest algorithm tuned with GridSearchCV and Bayesian hyperparameter optimization. The research proceeded through the stages of literature study, model development using a Kaggle Notebook, model testing, and analysis of the results. This study aims to compare GridSearchCV and Bayesian hyperparameter optimization and to analyze the advantages and disadvantages of each when applied to diabetes prediction with the Random Forest algorithm. The experiments show that each method has its own strengths and weaknesses. GridSearchCV achieved the higher accuracy of 0.74, although it required more time at 338.416 seconds. Bayesian hyperparameter optimization achieved a slightly lower accuracy of 0.73, a difference of 0.01, while requiring less time at 177.085 seconds.
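For illustration, the comparison described above can be reproduced in outline with scikit-learn's GridSearchCV and a Bayesian search such as scikit-optimize's BayesSearchCV. This is a minimal sketch, not the authors' exact setup: the file path (diabetes.csv from the Kaggle "Diabetes Dataset"), the hyperparameter space, the 80/20 split, the 5-fold cross-validation, and the choice of BayesSearchCV are all assumptions, since the paper does not list them here.

import time
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from skopt import BayesSearchCV  # scikit-optimize; assumed Bayesian search backend

# Assumed local copy of the Kaggle "Diabetes Dataset" (diabetes.csv)
df = pd.read_csv("diabetes.csv")
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative hyperparameter space; the paper does not state its exact grid
param_space = {
    "n_estimators": [100, 200, 300],
    "max_depth": [4, 8, 16, 32],
    "min_samples_split": [2, 5, 10],
}

searches = [
    ("GridSearchCV", GridSearchCV(
        RandomForestClassifier(random_state=42), param_space, cv=5, n_jobs=-1)),
    ("BayesSearchCV", BayesSearchCV(
        RandomForestClassifier(random_state=42), param_space,
        n_iter=30, cv=5, n_jobs=-1, random_state=42)),
]

for name, search in searches:
    start = time.time()
    search.fit(X_train, y_train)          # exhaustive vs. Bayesian search over the space
    elapsed = time.time() - start
    print(f"{name}: accuracy={search.score(X_test, y_test):.2f}, "
          f"time={elapsed:.3f}s, best_params={search.best_params_}")

Run under these assumptions, the script prints test accuracy, tuning time, and the best hyperparameters for each search strategy, mirroring the accuracy/time trade-off reported in the abstract.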

Article Details

How to Cite
[1] R. Muzayanah, D. A. A. Pertiwi, M. Ali, and M. A. Muslim, “Comparison of gridsearchcv and bayesian hyperparameter optimization in random forest algorithm for diabetes prediction”, J. Soft Comput. Explor., vol. 5, no. 1, pp. 86–91, Apr. 2024.
Section
Articles

References

Z. Punthakee, R. Goldenberg, and P. Katz, “Definition, Classification and Diagnosis of Diabetes, Prediabetes and Metabolic Syndrome,” Can. J. Diabetes, vol. 42, pp. S10–S15, Apr. 2018, doi: 10.1016/J.JCJD.2017.10.003.

R. A. Pamungkas, A. M. Usman, K. Chamroonsawasdi, and Abdurrasyid, “A smartphone application of diabetes coaching intervention to prevent the onset of complications and to improve diabetes self-management: A randomized control trial,” Diabetes Metab. Syndr. Clin. Res. Rev., vol. 16, no. 7, p. 102537, Jul. 2022, doi: 10.1016/J.DSX.2022.102537.

A. Viloria, Y. Herazo-Beltran, D. Cabrera, and O. B. Pineda, “Diabetes Diagnostic Prediction Using Vector Support Machines,” Procedia Comput. Sci., vol. 170, pp. 376–381, 2020, doi: 10.1016/j.procs.2020.03.065.

S. C. Gupta and N. Goel, “Predictive Modeling and Analytics for Diabetes using Hyperparameter tuned Machine Learning Techniques,” Procedia Comput. Sci., vol. 218, pp. 1257–1269, 2023, doi: 10.1016/j.procs.2023.01.104.

M. Ramadhan, I. Sitanggang, F. Nasution, and A. Ghifari, “Parameter Tuning in Random Forest Based on Grid Search Method for Gender Classification Based on Voice Frequency,” DEStech Trans. Comput. Sci. Eng., vol. 11, no. 9, Oct. 2017, doi: 10.12783/dtcse/cece2017/14611.

S. G. C. G and B. Sumathi, “Grid Search Tuning of Hyperparameters in Random Forest Classifier for Customer Feedback Sentiment Prediction,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 9, 2020, doi: 10.14569/IJACSA.2020.0110920.

V. Shalamov, V. Efimova, and A. Filchenkov, “Faster Hyperparameter Optimization via Finding Minimal Regions in Random Forest Regressor,” Procedia Comput. Sci., vol. 212, pp. 378–386, 2022, doi: 10.1016/j.procs.2022.11.022.

J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for Hyper-Parameter Optimization,” in Advances in Neural Information Processing Systems, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2011. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf

M. Akturk, “Diabetes Dataset,” Kaggle, 2020. [Online]. Available: https://www.kaggle.com/datasets/mathchi/diabetes-data-set (accessed May 19, 2023).

G. S. K. Ranjan, A. Kumar Verma, and S. Radhika, “K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries,” in 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), IEEE, Mar. 2019, pp. 1–5. doi: 10.1109/I2CT45611.2019.9033691.

T. T. Joy, S. Rana, S. Gupta, and S. Venkatesh, “Hyperparameter tuning for big data using Bayesian optimisation,” in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, Dec. 2016, pp. 2574–2579. doi: 10.1109/ICPR.2016.7900023.

J. You, S. A. S. van der Klein, E. Lou, and M. J. Zuidhof, “Application of random forest classification to predict daily oviposition events in broiler breeders fed by precision feeding system,” Comput. Electron. Agric., vol. 175, p. 105526, Aug. 2020, doi: 10.1016/j.compag.2020.105526.

L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001, doi: 10.1023/A:1010933404324.

J. Jumanto, M. A. Muslim, Y. Dasril, and T. Mustaqim, “Accuracy of Malaysia Public Response to Economic Factors During the Covid-19 Pandemic Using Vader and Random Forest,” J. Inf. Syst. Explor. Res., vol. 1, no. 1, pp. 49–70, Dec. 2022, doi: 10.52465/joiser.v1i1.104.

Q. Li and G. Clifford, “Signal Processing: False Alarm Reduction,” in Secondary Analysis of Electronic Health Records, 2016, pp. 391–403. doi: 10.1007/978-3-319-43742-2_27.

J. Wu, “Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization,” J. Electron. Sci. Technol., vol. 17, no. 1, pp. 26–40, 2019, doi: 10.11989/JEST.1674-862X.80904120.

Y. Chen et al., “Bayesian optimization based random forest and extreme gradient boosting for the pavement density prediction in GPR detection,” Constr. Build. Mater., vol. 387, p. 131564, Jul. 2023, doi: 10.1016/j.conbuildmat.2023.131564.
