Filter-Based Feature Selection for Type II Diabetes Prediction

Publication year: 1404 (Solar Hijri)
Document type: Journal article
Language: English
Pages: 8 (PDF)

National scientific document ID: JR_CCS-6-3_003

Indexing date: 26 Mehr 1404

Abstract:

Aims: Type 2 diabetes mellitus is a major global health challenge, and early prediction is key to prevention. This study compared three filter-based feature selection methods (ANOVA (f-classif), mutual information, and Chi-square test) for identifying predictors of type 2 diabetes and assessed their impact on the performance of logistic regression. Instrument & Methods: This retrospective study analyzed data from 3,203 adults aged 35-70 years from Yasuj, Kohgiluyeh and Boyer-Ahmad Province, Iran, gathered between 2020 and 2022 in the Dena-PERSIAN cohort, including 402 (12.55%) individuals with type 2 diabetes. Preprocessing included imputation, normalization, and class balancing using the synthetic minority oversampling technique. Each method ranked predictors, and the top five features were used to train logistic regression models. Model performance was evaluated on a test set using accuracy, precision, recall, and F1-score. Findings: Fasting blood sugar and age consistently emerged as dominant predictors across methods. ANOVA highlighted metabolic factors (triglycerides, fatty liver, and kidney stones), while mutual information emphasized high-density lipoprotein cholesterol and lifestyle behaviors, and the Chi-square test prioritized categorical comorbidities. Logistic regression achieved the strongest performance with ANOVA and mutual information (accuracy and F1=0.84), slightly outperforming the Chi-square test (accuracy and F1=0.82). Conclusion: ANOVA and mutual information produced clinically meaningful and stable feature subsets for type 2 diabetes prediction, centered on fasting glucose, age, and fatty liver.
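
The ANOVA ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it computes the one-way ANOVA F statistic per feature in plain Python (the same statistic that scikit-learn's `f_classif` reports) and keeps the top-ranked features. The function names `anova_f_score` and `top_k_features`, the feature names, and all numeric values are hypothetical and invented for illustration; they are not data from the Dena-PERSIAN cohort. The sketch keeps the top two features of three, whereas the study kept the top five, and it omits imputation, normalization, class balancing, and the logistic regression model.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples = len(values)
    n_groups = len(groups)
    grand_mean = mean(values)

    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (value - mean(g)) ** 2 for g in groups.values() for value in g
    )

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_samples - n_groups)
    return ms_between / ms_within if ms_within > 0 else float("inf")


def top_k_features(columns, labels, k):
    """Rank features by F score (highest first) and return the top k names."""
    scores = {
        name: anova_f_score(values, labels) for name, values in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 4 non-diabetic (0) and 4 diabetic (1) individuals.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "fasting_blood_sugar": [90, 95, 88, 92, 140, 150, 135, 145],
    "age": [40, 45, 50, 55, 50, 55, 60, 65],
    "uninformative": [1, 2, 3, 4, 1, 2, 3, 4],
}

selected, scores = top_k_features(columns, labels, k=2)
for name in selected:
    print(f"{name}: F = {scores[name]:.2f}")
```

A filter method like this scores each feature independently of the downstream classifier, which is why the same ranking step can be swapped for mutual information or the chi-square test before the selected features are passed to a logistic regression model.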

Authors

M. Ghaderzadeh

Department of Medical Informatics, Boukan School of Medical Sciences, Urmia University of Medical Sciences, Urmia, Iran

C. Salehnasab

Social Determinants of Health Research Center, Yasuj University of Medical Sciences, Yasuj, Iran