Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
Publication Year: 1397 (Solar Hijri calendar)
Type: Journal paper
Language: English
Pages: 14 (PDF)
Document National Code: JR_JADM-6-2_003
Index date: 10 July 2019
Abstract
Data mining methods, applied as a decision support system, can help predict the survival of new patients. They also give health researchers a way to investigate the relationship between risk factors and cancer survival. However, because datasets associated with breast cancer survival are imbalanced, building accurate survival prognosis models remains a challenge for researchers. This study aims to develop a predictive model for the 5-year survivability of breast cancer patients and to discover relationships between certain predictive variables and survival. The dataset was obtained from the SEER database. First, the effectiveness of two synthetic oversampling methods, Borderline SMOTE and the Density-based Synthetic Oversampling method (DSO), is investigated for solving the class imbalance problem. Then a combination of particle swarm optimization (PSO) and correlation-based feature selection (CFS) is used to identify the most important predictive variables. Finally, to build the predictive model, three classifiers, a decision tree (C4.5), a Bayesian network, and logistic regression, are applied to the cleaned dataset. Assessment metrics such as accuracy, sensitivity, specificity, and G-mean are used to evaluate the performance of the proposed hybrid approach, and the area under the ROC curve (AUC) is used to evaluate the performance of the feature selection method. Results show that, among all combinations, DSO + PSO_CFS + C4.5 performs best in terms of accuracy, sensitivity, G-mean, and AUC, with values of 94.33%, 0.930, 0.939, and 0.939, respectively.
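The assessment metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `binary_metrics`, the convention that label 1 marks the minority (positive) class, and the toy labels are all assumptions made for illustration. It uses the standard definitions, with G-mean taken as the square root of sensitivity times specificity.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and G-mean for binary labels.

    `positive` is the label treated as the positive class (an assumption here:
    1 marks the minority class).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Toy labels (invented, not SEER data): 2 minority cases, 8 majority cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10  # a classifier that ignores the minority class
print(binary_metrics(y_true, always_majority))
```

On this toy input, a classifier that always predicts the majority class reaches an accuracy of 0.8 but a sensitivity and G-mean of 0, which is why accuracy alone can be misleading on imbalanced data and why G-mean is reported alongside it.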
Authors
S. Miri Rostami
Faculty of Computer and IT Engineering, Shiraz University of Technology, Shiraz, Iran.
M. Ahmadzadeh
Faculty of Computer and IT Engineering, Shiraz University of Technology, Shiraz, Iran.