Use of bioinformatics models to determine the differences between features of proteins expressed in malignant and benign breast cancers

M. Ebrahimi; N. Shamabadi; E. Ebrahimie

Published in: 1st National Congress of Information Technology in Health System

Publication year: 1388 (Solar Hijri)

Document type: Conference paper

Language: English

The full text of this paper has not been provided and is not available.

This paper is classified under the following subject categories:

Medical Sciences > Cancer

Permanent link to this paper:

https://civilica.com/doc/288765

National scientific document ID:

NCITHS01_001

Indexing date: 14 Shahrivar 1393

Abstract:

Breast cancer is the most common cancer among women and the second leading cause of cancer death in women. So far, many approaches have been used to analyze and detect benign and malignant forms of this cancer. It is crucial to understand the features of proteins expressed by various types of breast cancer. Herein we compared features of proteins expressed in malignant cancers, benign cancers, and both, using various screening techniques (anomaly detection, feature selection), clustering methods (K-Means, TwoStep cluster), decision tree models (C&RT, CHAID, Exhaustive CHAID, QUEST, C5.0), and generalized rule induction (GRI) models to search for patterns of similarity in each group. We found that in all proteins the N-terminal amino acid was Met, and 57 out of 838 protein features were ranked as important (p > 0.05) in feature selection modeling. The number of peer groups was 2, with 1 anomalous record in each group, and no changes were found in the number of clusters when K-Means and TwoStep clustering were performed on datasets with or without feature selection filtering. The depth of the trees generated by the various decision tree models varied from 5 branches (in the QUEST model) to 2 branches (in the C5.0 model). Performance evaluation of the decision tree models tested here showed that C&RT was the best and CHAID was the worst. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of the various decision tree models when feature-selected datasets were used, but the number of peer groups in clustering models was reduced significantly (p < 0.05) compared to datasets without feature selection. In all decision tree models, the frequency of Ile-Ile was the most important feature for decision tree rule sets and all GRI association rules (100). The importance of sequence-based classification and of the frequency of Ile-Ile in the prediction of malignant and benign breast cancer is discussed in this paper.
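
The abstract singles out the frequency of the Ile-Ile dipeptide as the most important feature. It does not say how that frequency was computed, so the following is only a minimal sketch under one plausible assumption: the frequency is the count of overlapping adjacent residue pairs equal to "II" (one-letter code for Ile-Ile), divided by the total number of adjacent pairs in the sequence. The function name `dipeptide_frequencies` and the toy sequence are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def dipeptide_frequencies(sequence: str) -> dict[str, float]:
    """Relative frequency of each overlapping dipeptide in a protein sequence.

    Assumption (not stated in the abstract): frequency = pair count divided
    by the number of adjacent residue pairs, i.e. len(sequence) - 1.
    """
    seq = sequence.strip().upper()
    # Slide a window of width 2 over the one-letter amino acid codes.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical toy sequence; it starts with Met (M) only to echo the
# N-terminal observation in the abstract.
toy = "MIIKAIIL"
freqs = dipeptide_frequencies(toy)
ile_ile = freqs.get("II", 0.0)  # frequency of Ile-Ile under the assumption above
```

A value such as `ile_ile` could then serve as one column in a feature table passed to a decision tree or rule induction model; which normalization the authors actually used is not recoverable from this record.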

Keywords:

Bioinformatics, modeling, breast cancer, malignant, benign

Authors

M. Ebrahimi

Bioinformatics Research Group, Green Research Center, Qom University, Qom, IRAN

N. Shamabadi

E. Ebrahimie