A predictive machine learning framework for diabetes

Year 2024, Volume: 8 Issue: 3, 583 – 592, 28.07.2024

https://doi.org/10.31127/tuje.1434305

Abstract

Diabetes, a non-communicable disease, is characterized by excess glucose in the bloodstream. In 2022, an estimated 422 million people were living with the disease globally. The impact of diabetes on the world economy was estimated at $1.31 trillion in 2015, and the disease was implicated in the deaths of 5 million adults between the ages of 20 and 79 years globally. If left untreated for an extended time, it could result in a host of other health complications. Predictive models that supplement the diagnostic process and aid the early detection of diabetes are therefore important. The current study develops a machine learning framework for the prediction of diabetes, expected to aid medical practitioners in the early detection of the disease. The dataset used in this investigation was sourced from the Kaggle database. It consists of 100,000 entries, with 8,500 diabetics and 91,500 non-diabetics, and is therefore imbalanced. The dataset was modified to obtain a balanced dataset of 8,500 entries each for the diabetic and non-diabetic classes. Of the fifteen classifiers compared, the Gradient Boosting classifier (GBC), Adaptive Boosting classifier (ADA), and Light Gradient Boosting Machine (LGBM) were the three best performers. The proposed framework is a stacked model consisting of GBC, ADA, and LGBM, with the ADA classifier as the meta-model. This model achieved an average accuracy, area under the curve (AUC), recall, precision, and F1-score of 91.12 ± 0.75 %, 97.83 ± 0.29 %, 92.03 ± 1.55 %, 90.40 ± 1.01 %, and 91.12 ± 0.77 %, respectively. The selling point of the proposed framework is its high recall of 92.03 ± 1.55 %, indicating that the model is sensitive to both the diabetic and the non-diabetic classes.
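The balancing step and the reported metrics can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that the balanced set was obtained by random undersampling of the majority class, and that each "±" value is a standard deviation across evaluation folds. The function names `undersample`, `binary_metrics`, and `summarize` are hypothetical, and the stacked GBC/ADA/LGBM model itself is not reproduced here.

```python
import random
from statistics import mean, stdev


def undersample(rows, label_of, seed=0):
    """Balance two classes by randomly dropping majority-class rows.

    Assumption: the abstract only says the dataset was "modified";
    random undersampling is one plausible way to reach equal class sizes.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1 with class 1 (diabetic) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def summarize(fold_values):
    """Return (mean, sample standard deviation) over per-fold scores.

    Assumption: this is how a "mean ± spread" figure would be formed.
    """
    return mean(fold_values), stdev(fold_values)
```

Applied to a dataset with the class counts given in the abstract, `undersample` would return 8,500 rows per class; `binary_metrics` would then be evaluated on each fold and the per-fold values passed to `summarize`.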

Keywords

Classification, Diabetes, Prediction, Accuracy, Recall

Ethical Statement

Not Applicable

Supporting Institution

Obafemi Awolowo University

Acknowledgments

Thanks

References

  • WHO. (2023). Diabetes, Diabetes Report. https://www.who.int/health-topics/diabetes#tab=tab_1
  • IDF (2021). Facts & figures. https://idf.org/about-diabetes/diabetes-facts-figures/
  • Woldaregay, A. Z., Årsand, E., Botsis, T., Albers, D., Mamykina, L., & Hartvigsen, G. (2019). Data-driven blood glucose pattern classification and anomalies detection: machine-learning applications in type 1 diabetes. Journal of medical Internet research, 21(5), e11030. https://doi.org/10.2196/11030
  • Sabitha, E., & Durgadevi, M. (2022). Improving the diabetes Diagnosis prediction rate using data preprocessing, data augmentation and recursive feature elimination method. International Journal of Advanced Computer Science and Applications, 13(9), 921-930. https://doi.org/10.14569/IJACSA.2022.01309107
  • Choubey, S., Agrahari, S., Shaw, A., Dhar, S., Sarma, R. R., Singh, S. K., Das, P., & Saha, B. (2023). Diabetes Prediction Using ML. International Journal for Research in Applied Science and Engineering Technology, 11(6), 4209-4212. https://doi.org/10.22214/ijraset.2023.54415
  • Marcovecchio, M. L. (2017). Complications of acute and chronic hyperglycemia. US Endocrinol, 13(1), 17-21. https://doi.org/10.17925/USE.2017.13.01.17
  • El_Jerjawi, N. S., & Abu-Naser, S. S. (2018). Diabetes prediction using artificial neural network. International Journal of Advanced Science and Technology, 121, 54-64. http://dx.doi.org/10.14257/ijast.2018.121.05
  • Hasan, M. K., Alam, M. A., Das, D., Hossain, E., & Hasan, M. (2020). Diabetes prediction using ensembling of different machine learning classifiers. IEEE Access, 8, 76516-76531. https://doi.org/10.1109/ACCESS.2020.2989857
  • Temurtas, H., Yumusak, N., & Temurtas, F. (2009). A comparative study on diabetes disease diagnosis using neural networks. Expert Systems with Applications, 36(4), 8610-8615. https://doi.org/10.1016/j.eswa.2008.10.032
  • Bashir, M., Naem, E., Taha, F., Konje, J. C., & Abou-Samra, A. B. (2019). Outcomes of type 1 diabetes mellitus in pregnancy; effect of excessive gestational weight gain and hyperglycaemia on fetal growth. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 13(1), 84-88. https://doi.org/10.1016/j.dsx.2018.08.030
  • Hammer, M., Storey, S., Hershey, D. S., Brady, V. J., Davis, E., Mandolfo, N., Bryant, A. L., & Olausson, J. (2019). Hyperglycemia and Cancer: A State-of-the-Science Review. Oncology Nursing Forum, 46(4), 459-472. https://doi.org/10.1188/19.ONF.459-472
  • Storey, S., Von Ah, D., & Hammer, M. (2017). Measurement of hyperglycemia and impact on the health outcomes in people with cancer: challenges and opportunities. Oncology Nursing Forum, 44(4), E141. https://doi.org/10.1188/17.ONF.E141-E151
  • Griffin, S. J., Little, P. S., Hales, C. N., Kinmonth, A. L., & Wareham, N. J. (2000). Diabetes risk score: towards earlier detection of type 2 diabetes in general practice. Diabetes/metabolism Research and Reviews, 16(3), 164-171. https://doi.org/10.1002/1520-7560(200005/06)16:3<164::AID-DMRR103>3.0.CO;2-R
  • Park, P. J., Griffin, S. J., Sargeant, L., & Wareham, N. J. (2002). The performance of a risk score in predicting undiagnosed hyperglycemia. Diabetes Care, 25(6), 984-988. https://doi.org/10.2337/diacare.25.6.984
  • Lindstrom, J., & Tuomilehto, J. (2003). The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care, 26(3), 725-731. https://doi.org/10.2337/diacare.26.3.725
  • Heikes, K. E., Eddy, D. M., Arondekar, B., & Schlessinger, L. (2008). Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care, 31(5), 1040-1045. https://doi.org/10.2337/dc07-1150
  • Stern, M. P., Williams, K., & Haffner, S. M. (2002). Identification of persons at high risk for type 2 diabetes mellitus: do we need the oral glucose tolerance test?. Annals of Internal Medicine, 136(8), 575-581. https://doi.org/10.7326/0003-4819-136-8-200204160-00006
  • Kodama, S., Fujihara, K., Horikawa, C., Kitazawa, M., Iwanaga, M., Kato, K., … & Sone, H. (2022). Predictive ability of current machine learning algorithms for type 2 diabetes mellitus: A meta‐analysis. Journal of Diabetes Investigation, 13(5), 900-908. https://doi.org/10.1111/jdi.13736
  • Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., & Chouvarda, I. (2017). Machine learning and data mining methods in diabetes research. Computational and Structural Biotechnology Journal, 15, 104-116. https://doi.org/10.1016/j.csbj.2016.12.005
  • Nai-Arun, N., & Moungmai, R. (2015). Comparison of classifiers for the risk of diabetes prediction. Procedia Computer Science, 69, 132-142. https://doi.org/10.1016/j.procs.2015.10.014
  • Olisah, C. C., Smith, L., & Smith, M. (2022). Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Computer Methods and Programs in Biomedicine, 220, 106773. https://doi.org/10.1016/j.cmpb.2022.106773
  • Singh, A., Halgamuge, M. N., & Lakshmiganthan, R. (2017). Impact of different data types on classifier performance of random forest, naive bayes, and k-nearest neighbors algorithms. International Journal of Advanced Computer Science and Applications, 8(12), 1-10.
  • Tejedor, M., Woldaregay, A. Z., & Godtliebsen, F. (2020). Reinforcement learning application in diabetes blood glucose control: A systematic review. https://doi.org/10.1016/j.artmed.2020.101836
  • Kononenko, I. (2001). Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine, 23(1), 89-109. https://doi.org/10.1016/S0933-3657(01)00077-X
  • Asfaw, T. A. (2019). Prediction of diabetes mellitus using machine learning techniques. International Journal of Computer Engineering and Technology, 10(4), 145-148. https://doi.org/10.34218/ijcet.10.4.2019.004
  • Yu, W., Liu, T., Valdez, R., Gwinn, M., & Khoury, M. J. (2010). Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. BMC Medical Informatics and Decision Making, 10, 1-7. https://doi.org/10.1186/1472-6947-10-16
  • MacMahon, H., Naidich, D. P., Goo, J. M., Lee, K. S., Leung, A. N., Mayo, J. R., … & Bankier, A. A. (2017). Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner Society 2017. Radiology, 284(1), 228-243. https://doi.org/10.1148/radiol.2017161659
  • Maniruzzaman, M., Rahman, M. J., Al-MehediHasan, M., Suri, H. S., Abedin, M. M., El-Baz, A., & Suri, J. S. (2018). Accurate diabetes risk stratification using machine learning: role of missing value and outliers. Journal of Medical Systems, 42, 92. https://doi.org/10.1007/s10916-018-0940-7
  • Ahuja, R., Sharma, S. C., & Ali, M. (2019). A diabetic disease prediction model based on classification algorithms. Annals of Emerging Technologies in Computing (AETiC), 3(3), 44-52. https://doi.org/10.33166/AETiC.2019.03.005
  • Butt, U. M., Letchmunan, S., Ali, M., Hassan, F. H., Baqir, A., & Sherazi, H. H. R. (2021). Machine learning based diabetes classification and prediction for healthcare applications. Journal of Healthcare Engineering, 2021(1), 9930985. https://doi.org/10.1155/2021/9930985
  • Roy, K., Ahmad, M., Waqar, K., Priyaah, K., Nebhen, J., Alshamrani, S. S., … & Ali, I. (2021). An enhanced machine learning framework for type 2 diabetes classification using imbalanced data with missing values. Complexity, 2021(1), 9953314. https://doi.org/10.1155/2021/9953314
  • Muhammad, L. J., Algehyne, E. A., & Usman, S. S. (2020). Predictive supervised machine learning models for diabetes mellitus. SN Computer Science, 1(5), 240. https://doi.org/10.1007/s42979-020-00250-8
  • Lai, H., Huang, H., Keshavjee, K., Guergachi, A., & Gao, X. (2019). Predictive models for diabetes mellitus using machine learning techniques. BMC Endocrine Disorders, 19, 1-9. https://doi.org/10.1186/s12902-019-0436-6
  • Abnoosian, K., Farnoosh, R., & Behzadi, M. H. (2023). Prediction of diabetes disease using an ensemble of machine learning multi-classifier models. BMC Bioinformatics, 24(1), 337. https://doi.org/10.1186/s12859-023-05465-z
  • Mustafa, M. (2023). A Comprehensive Dataset for Predicting Diabetes with Medical & Demographic Data. https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset
  • Morris, A., & Misra, H. (2002). Confusion matrix based posterior probabilities correction.
  • Allen, G. D., & Goldsby, D. (2014). Confusion theory and assessment. International Journal of Innovative Science, Engineering & Technology, 1(10), 436-443.
  • Tharwat, A. (2021). Classification assessment methods. Applied Computing and Informatics, 17(1), 168-192. https://doi.org/10.1016/j.aci.2018.08.003


There are 38 references in total.

Details

Primary Language: English
Subjects: Data Communications
Section: Articles
Authors

Danjuma Maza, Obafemi Awolowo University, Ile-Ife, Nigeria (ORCID: 0000-0002-7079-2301)

Joshua Olufemi Ojo, Obafemi Awolowo University, Ile-Ife, Nigeria (ORCID: 0009-0002-5977-9613)

Grace Olubumi Akinlade, Obafemi Awolowo University, Ile-Ife, Nigeria (ORCID: 0000-0002-0974-5629)

Early Pub Date: 15 July 2024
Publication Date: 28 July 2024
Submission Date: 9 February 2024
Acceptance Date: 17 April 2024
Published in Issue: Year 2024, Volume: 8, Issue: 3

Cite

APA: Maza, D., Ojo, J. O., & Akinlade, G. O. (2024). A predictive machine learning framework for diabetes. Turkish Journal of Engineering, 8(3), 583-592. https://doi.org/10.31127/tuje.1434305
AMA: Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. July 2024;8(3):583-592. doi:10.31127/tuje.1434305
Chicago: Maza, Danjuma, Joshua Olufemi Ojo, and Grace Olubumi Akinlade. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering 8, no. 3 (July 2024): 583-92. https://doi.org/10.31127/tuje.1434305.
EndNote: Maza D, Ojo JO, Akinlade GO (01 July 2024) A predictive machine learning framework for diabetes. Turkish Journal of Engineering 8 3 583–592.
IEEE: D. Maza, J. O. Ojo, and G. O. Akinlade, “A predictive machine learning framework for diabetes”, TUJE, vol. 8, no. 3, pp. 583–592, 2024, doi: 10.31127/tuje.1434305.
ISNAD: Maza, Danjuma et al. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering 8/3 (July 2024), 583-592. https://doi.org/10.31127/tuje.1434305.
JAMA: Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. 2024;8:583–592.
MLA: Maza, Danjuma et al. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering, vol. 8, no. 3, 2024, pp. 583-92, doi:10.31127/tuje.1434305.
Vancouver: Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. 2024;8(3):583-92.
