Empirical Evaluation of Different Algorithms to Assess The Probability of Diabetes in its Early Stages

Volume 5 Issue 2 2024

Author(s):

Rania Ashraf* Liaquat University of Medical and Health Sciences, Jamshoro, Pakistan, raniya844@yahoo.com

Roz Nisha Liaquat University of Medical and Health Sciences, Jamshoro, Pakistan, rosenisha734@gmail.com

Fahad Shamim Liaquat University of Medical and Health Sciences, Jamshoro, Pakistan, fahad.shamim@lumhs.edu.pk

Shahzad Nasim The Begum Nusrat Bhutto Women University, Sukkur, Pakistan, shahzadnasim@live.com

Sarmad Shams Liaquat University of Medical and Health Sciences, Jamshoro, Pakistan, sarmad.shams@lumhs.edu.pk

Abstract High blood sugar is a symptom of diabetes, an incurable and fatal metabolic disorder. The primary cause of the disease is a hormone imbalance that impairs the action of insulin, the hormone that regulates the uptake of sugar from the blood. In diabetes, the body either cannot make sufficient insulin or cannot adequately use the insulin it produces. Almost 1.6 million people die each year from this disease. Early diagnosis can help reduce the severity of the disease and extend life expectancy. Since the medical data of diabetic individuals display a recognizable pattern, diabetes can be predicted in its early stages using machine learning algorithms, which offers another route to early diagnosis without a glucose screening test. In this paper, early-stage diabetes is predicted using machine learning. The study evaluates eight machine learning algorithms individually on a dataset of 521 instances with 17 features. Each model is assessed not only with accuracy and the confusion matrix; AUC, F-score, recall, precision, TPR, and FPR are also examined to help improve the algorithms' performance. The results are validated using 5-fold cross-validation. The AdaBoost classifier records the lowest accuracy, 82.89%. The best accuracy, 93.4%, is achieved by Random Forest (RF), and the Support Vector Machine (SVM) matches it at 93.4%. Still, SVM exhibits a higher F-score and recall than RF, making it the best-fit classifier for this study. The remaining classifiers also performed well, each with an accuracy above 80%. The findings indicate that the SVM classifier is the most effective machine learning technique for binary classification datasets and can be used to predict early-stage diabetes.
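The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`confusion_matrix`, `metrics`, `k_fold_indices`) and the toy labels are hypothetical, and the fold splitter is a simplified, unshuffled stand-in for whatever 5-fold procedure the study used. AUC is omitted because it requires predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four outcomes of a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fp, tn, fn)


def _safe_div(num, den):
    # Return 0.0 instead of raising when a class is absent.
    return num / den if den else 0.0


def metrics(cm):
    """Derive accuracy, precision, recall (TPR), FPR and F-score."""
    precision = _safe_div(cm.tp, cm.tp + cm.fp)
    recall = _safe_div(cm.tp, cm.tp + cm.fn)
    return {
        "accuracy": _safe_div(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "precision": precision,
        "recall_tpr": recall,
        "fpr": _safe_div(cm.fp, cm.fp + cm.tn),
        "f_score": _safe_div(2 * precision * recall, precision + recall),
    }


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Assumption: no shuffling or stratification, which a real
    evaluation would normally add.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels (hypothetical, not taken from the paper's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)

# Fold sizes for a dataset of 521 instances, the size quoted in the abstract.
fold_sizes = [len(test) for _, test in k_fold_indices(521, k=5)]
```

In a setup like this, each classifier would be trained on `train_idx` and scored on `test_idx` for every fold, and the per-fold metrics averaged. Comparing F-score and recall alongside accuracy is what lets two models with the same accuracy be ranked.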
Keywords Diabetes, Decision Tree, Naïve Bayes, Random Forest.
Year 2024
Volume 5
Issue 2
Type Research paper, manuscript, article
Recognized by Higher Education Commission of Pakistan, HEC
Journal Name ILMA Journal of Technology & Software Management
Publisher Name ILMA University
ISSN no (E, Electronic) 2790-590X
ISSN no (P, Print) 2709-2240
Country Pakistan
City Karachi
Institution Type University
Journal Type Open Access
Manuscript Processing Blind Peer Reviewed
Format PDF
Paper Link https://ijtsm.ilmauniversity.edu.pk/arc/Vol5/i2/pdf1.pdf
Page 1-10