Prediction of Diabetes Using Machine Learning Techniques
Keywords:
Diabetes Mellitus, Consistency-based Feature Selection, Correlation-based Feature Selection, Machine learning, prediction
Abstract
Diabetes Mellitus is a metabolic disorder that occurs when the body cannot produce sufficient insulin. Its prevalence has surged worldwide, necessitating improved methods for early and accurate prediction. Machine learning (ML) techniques have proven effective for this task, and this study applies them to predict diabetes. To improve learning efficiency and prediction performance, feature selection was employed to retain, from the entire feature set, only the features that contribute most to the prediction target. Three machine learning algorithms (Support Vector Machine, Random Forest and Decision Tree) were applied to the Pima Indians diabetes dataset, and consistency-based and correlation-based feature selection techniques were used to improve prediction performance and reduce dimensionality. The experimental results show that the performance of all three models improved when feature selection was used. For instance, Support Vector Machine achieved an accuracy of 81.74% after feature selection, compared with 79.13% before its application. Random Forest achieved an accuracy of 80.08% using the consistency-based feature selection method, compared with 77.78% before its application.
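The abstract does not say how correlation-based feature selection was implemented, so the following is only a minimal sketch of the general idea: score a feature subset by a merit that rewards correlation with the class and penalises redundancy among the features, then grow the subset greedily. Several things here are assumptions made for illustration and are not taken from the study: the use of absolute Pearson correlation as the correlation measure, the greedy forward search, the function names `pearson`, `cfs_merit` and `cfs_forward_select`, and the small synthetic dataset, which stands in for the Pima Indians data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(columns, target, subset):
    """Merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean absolute feature-class correlation and r_ff the mean
    absolute feature-feature correlation (Pearson is an assumption here).
    """
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(columns, target, selected + [i]), i) for i in remaining]
        merit, feature = max(scored)
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


# Synthetic stand-in data (NOT the Pima Indians dataset): two informative
# features, one near-duplicate of the first, and one pure-noise feature.
rng = random.Random(0)
n = 200
signal_a = [rng.gauss(0, 1) for _ in range(n)]
signal_b = [rng.gauss(0, 1) for _ in range(n)]
near_copy_a = [a + rng.gauss(0, 0.05) for a in signal_a]
noise = [rng.gauss(0, 1) for _ in range(n)]
target = [1 if a + b > 0 else 0 for a, b in zip(signal_a, signal_b)]

columns = [signal_a, near_copy_a, signal_b, noise]
selected, merit = cfs_forward_select(columns, target)
print("selected feature indices:", selected)
print("subset merit:", round(merit, 3))
```

On real data, the selected column indices would be used to reduce the feature matrix before training each classifier, so that accuracy with and without selection can be compared on the same train/test split.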