Authors:
Mohammad Raquibul Hossain, Md. Jamal Hossain, Md. Mijanoor Rahman, Mohammad Manjur Alam
DOI NO:
https://doi.org/10.26782/jmcms.2025.01.00007
Keywords:
Machine Learning Methods, Diabetes Prediction, Logistic Regression, Classification, Random Forest, XGBoost
Abstract:
This paper focuses on predicting diabetes using machine learning models, a very active and highly important area of research. Six machine learning methods were evaluated on three diabetes datasets to compare model performance. The methods are Logistic Regression, k-Nearest Neighbour, Gaussian Naïve Bayes, Decision Tree, Random Forest, and XGBoost. The datasets are the Pima Indian dataset, the Frankfurt Hospital dataset, and a combined dataset; each has eight feature variables and one target variable. Because the train-test split ratio can make a significant difference in model performance, two split ratios, 50-50 and 90-10, were tested. Model performance was evaluated using four metrics: precision, recall, F1-score, and accuracy. Random Forest and XGBoost were found to be highly efficient and the best-performing methods across all performance metrics, all datasets, and both train-test split ratios. They performed comparatively better on the combined dataset, which contains 2768 instances, indicating the importance of a large dataset for better results. The 90-10 split also produced better results than the 50-50 split for all datasets and for almost all models.
References:
I. Agliata, A., Giordano, D., Bardozzo, F., Bottiglieri, S., Facchiano, A., & Tagliaferri, R. (2023). Machine learning as a support for the diagnosis of type 2 diabetes. International Journal of Molecular Sciences, 24(7), 6775.
II. Aguilera-Venegas, G., López-Molina, A., Rojo-Martínez, G., & Galán-García, J. L. (2023). Comparing and tuning machine learning algorithms to predict type 2 diabetes mellitus. Journal of Computational and Applied Mathematics, 427, 115115.
III. Alenizi, A. S., & Al-karawi, K. A. (2023). Machine learning approach for diabetes prediction. International Congress on Information and Communication Technology, 745–756. Springer.
IV. Alzyoud, M., Alazaidah, R., Aljaidi, M., Samara, G., Qasem, M., Khalid, M., & Al-Shanableh, N. (2024). Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science, 8(1), 179–188.
V. Barik, S., Mohanty, S., Mohanty, S., & Singh, D. (2021). Analysis of prediction accuracy of diabetes using classifier and hybrid machine learning techniques. Intelligent and Cloud Computing: Proceedings of ICICC 2019, Volume 2, 399–409. Springer.
VI. Chang, V., Bailey, J., Xu, Q. A., & Sun, Z. (2023). Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Computing and Applications, 35(22), 16157–16173.
VII. Ebrahim, O. A., & Derbew, G. (2023). Application of supervised machine learning algorithms for classification and prediction of type-2 diabetes disease status in Afar regional state, Northeastern Ethiopia 2021. Scientific Reports, 13(1), 7779.
VIII. Febrian, M. E., Ferdinan, F. X., Sendani, G. P., Suryanigrum, K. M., & Yunanda, R. (2023). Diabetes prediction using supervised machine learning. Procedia Computer Science, 216, 21–30.
IX. Gündoğdu, S. (2023). Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique. Multimedia Tools and Applications, 82(22), 34163–34181.
X. Hossain, M. R., & Ismail, M. T. (2020). Empirical mode decomposition based on theta method for forecasting daily stock price. Journal of Information and Communication Technology, 19(4), 533–558.
XI. Hossain, M. R., Ismail, M. T., & Hossain, M. J. (2022). Enhancing Stock Price Prediction Using Empirical Mode Decomposition, Rolling Forecast and Combining Statistical Methods. International Journal of Computing and Digital Systems, 12(1), 1343–1356.
XII. Ismail, L., Materwala, H., Tayefi, M., Ngo, P., & Karduck, A. P. (2022). Type 2 diabetes with artificial intelligence machine learning: methods and evaluation. Archives of Computational Methods in Engineering, 29(1), 313–333.
XIII. Khanam, J. J., & Foo, S. Y. (2021). A comparison of machine learning algorithms for diabetes prediction. ICT Express, 7(4), 432–439.
XIV. Kumar, N., Singh, P., Kumari, S., & Singh, B. K. (2023). Predicting Diabetes Using Machine Learning. 5th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 1737–1742. IEEE.
XV. Modak, S. K. S., & Jha, V. K. (2024). Diabetes prediction model using machine learning techniques. Multimedia Tools and Applications, 83(13), 38523–38549.
XVI. Naz, H., & Ahuja, S. (2020). Deep learning approach for diabetes prediction using PIMA Indian dataset. Journal of Diabetes & Metabolic Disorders, 19, 391–403.
XVII. Oikonomou, E. K., & Khera, R. (2023). Machine learning in precision diabetes care and cardiovascular risk prediction. Cardiovascular Diabetology, 22(1), 259.
XVIII. Pima Indians Diabetes Database. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database. Accessed 16 Oct. 2024.
XIX. Sakib, S., Yasmin, N., Tasawar, I. K., Aziz, A., Siddique, M. A. B., & Khan, M. M. R. (2021). Performance analysis of machine learning approaches in diabetes prediction. IEEE 9th Region 10 Humanitarian Technology Conference (R10-HTC), 1–6. IEEE.
XX. Saxena, S., Mohapatra, D., Padhee, S., & Sahoo, G. K. (2023). Machine learning algorithms for diabetes detection: a comparative evaluation of performance of algorithms. Evolutionary Intelligence, 1–17.
XXI. Singh, K. (2024, January 17). Type 2 Diabetes Dataset. IEEE. https://ieee-dataport.org/documents/type-2-diabetes-dataset
XXII. Srivastava, R., & Dwivedi, R. K. (2022). A survey on diabetes mellitus prediction using machine learning algorithms. ICT Systems and Sustainability: Proceedings of ICT4SD 2021, Volume 1, 473–480. Springer.
XXIII. Whig, P., Gupta, K., Jiwani, N., Jupalle, H., Kouser, S., & Alam, N. (2023). A novel method for diabetes classification and prediction with Pycaret. Microsystem Technologies, 29(10), 1479–1487.
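
The evaluation protocol summarized in the abstract (two train-test split ratios and four performance metrics) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `train_test_split_indices` and `binary_metrics`, the random seed, the rounding rule for the test-set size, and the toy labels are all hypothetical choices made for the example. Only the split ratios, the metric names, and the instance count of 2768 come from the abstract.

```python
import random


def train_test_split_indices(n_samples, test_fraction, seed=0):
    """Shuffle indices 0..n_samples-1 and return (train_indices, test_indices).

    Assumption: the test-set size is round(n_samples * test_fraction).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Split sizes for the combined dataset (2768 instances) under both ratios.
for test_fraction in (0.5, 0.1):
    train_idx, test_idx = train_test_split_indices(2768, test_fraction)
    print(f"test fraction {test_fraction}: "
          f"{len(train_idx)} train, {len(test_idx)} test")

# Toy labels, made up for illustration; these are not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

With these assumptions, the 50-50 split of 2768 instances yields 1384 training and 1384 test samples, while the 90-10 split yields 2491 training and 277 test samples, so the 90-10 setting trains on considerably more data. In practice, a library such as scikit-learn offers equivalent utilities (for example `train_test_split` and `precision_score`), and the six classifiers named in the abstract would be fitted on the training indices and scored on the test indices.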