Performance evaluation of different machine learning classification models on expression profiles of tumor educated platelets data
Published in: The First International Conference and the Tenth National Bioinformatics Conference of Iran
Publication year: 1400 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper has not been provided and is not available.
National scientific document ID:
IBIS10_032
Indexing date: 5 Tir 1401
Abstract:
Because liquid biopsy is less invasive than tissue biopsy, liquid biopsy biomarkers for the early detection of cancer have drawn research interest. Expression profiles of tumor-educated platelets (TEPs) in liquid biopsy can serve as one such biomarker. Machine learning classifiers trained on the feature space derived from TEP expression data make it possible to predict sample categories. Here, the aim is to evaluate the performance of different classification models for cancer diagnosis based on platelet expression profiles. First, expression profiles of TEPs from 230 patients with breast, liver, colorectal, brain, pancreatic, and lung cancers, together with profiles of 55 healthy individuals, were downloaded from the GEO database (GSE68086). The data were then normalized using the edgeR package (R software version 4.1.0), and the 2000 genes with the highest variance were selected. Next, several classification models, namely SVM, LDA, logistic regression, boosting, classification tree, and random forest, were evaluated on the feature-selected data using 10-fold cross-validation. In addition, the variable importance of the selected genes was obtained using the polynomial SVM. Pathway enrichment analysis was then performed on the H, C6, and C7 gene sets of the MSigDB database using the preranked GSEA method. The results showed that the polynomial SVM had the highest performance on the validation set (accuracy ~ 95%, mean AUC ~ 0.994, sd AUC ~ 0.0093). The linear SVM model had the second-best performance on the validation set (mean AUC ~ 0.9917). In the pathway enrichment analysis, 10 immunological pathways were enriched in cancer samples compared to healthy donors. Overall, the results showed that the polynomial SVM can be a well-performing model for classifying TEP data, and they indicate that the expression profile of TEPs can be considered a candidate biomarker in liquid biopsy.
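The evaluation loop described in the abstract (keep the highest-variance genes, then score a classifier by AUC over stratified folds and report the mean and standard deviation) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it uses the Python standard library rather than R, it substitutes a simple nearest-centroid scorer for the SVM, it treats the task as binary (1 = cancer, 0 = healthy), and all function names are hypothetical. The abstract does not say whether gene selection was done inside each fold; this sketch does it on the training folds only.

```python
import random
import statistics


def top_variance_features(X, k):
    """Return the (sorted) indices of the k columns of X with the highest sample variance."""
    n_features = len(X[0])
    variances = [statistics.variance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds that roughly preserve class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, n_folds=10, k=2000, seed=0):
    """Mean and standard deviation of per-fold AUC for a nearest-centroid stand-in classifier."""
    aucs = []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Assumption: features are selected on the training folds only.
        keep = top_variance_features([X[i] for i in train_idx], k)
        project = lambda i: [X[i][j] for j in keep]
        cancer = centroid([project(i) for i in train_idx if y[i] == 1])
        healthy = centroid([project(i) for i in train_idx if y[i] == 0])
        # Higher score means the sample sits closer to the cancer centroid.
        scores = [sq_dist(project(i), healthy) - sq_dist(project(i), cancer) for i in test_idx]
        aucs.append(auc(scores, [y[i] for i in test_idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Calling `cross_validated_auc(X, y)` with `X` as a list of per-sample expression vectors and `y` as 0/1 labels returns a `(mean, sd)` pair analogous to the mean AUC and sd AUC figures quoted in the abstract. Every fold must contain both classes for the per-fold AUC to be defined, which is why the folds are stratified.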
Authors
Sajedeh Bahonar
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Fahimeh Palizban
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran