Using Classification and K‑means Methods to Predict Breast Cancer Recurrence in Gene Expression Data

Mohammadreza Sehhati; Mohammad Amin Tabatabaiefar; Ali Haji Gholami; Mohammad Sattari

Article title: Using Classification and K‑means Methods to Predict Breast Cancer Recurrence in Gene Expression Data
National article ID: JR_JMSI-12-2_004
Published in 1401 (Solar Hijri calendar)

Author details:

Mohammadreza Sehhati - Medical Image and Signal Processing Research Center, Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Mohammad Amin Tabatabaiefar - Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran, Pediatric Inherited Diseases Research Center, Research Institute for Primordial Prevention of Non Communicable Disease, Isfahan Univer
Ali Haji Gholami - Department of Hematology-Oncology, Isfahan University of Medical Sciences, Isfahan, Iran
Mohammad Sattari - Health Information Technology Research Center, Isfahan University of Medical Sciences, Isfahan, Iran

Abstract:

Background: Breast cancer is a type of cancer that starts in the breast tissue and affects about 10% of women at different stages of their lives. In this study, we applied a new method to predict recurrence in biological networks built from gene expression data. Method: The method comprises data collection, clustering, identification of differentiating genes, and classification. Eight techniques were implemented on 12,172 genes and 200 samples: random forest, support vector machine, neural network, hidden Markov model, joint mutual information, random forest + k‑means, neural network + k‑means, and support vector machine + k‑means. Results: Thirty genes were identified as differentiating genes and used for classification. Random forest + k‑means performed better than the other techniques. Two techniques, neural network + k‑means and random forest + k‑means, performed better than the others in identifying high-risk cases. Conclusion: Thirty of the 12,172 genes were selected for classification, and the use of clustering improved the performance of the classification techniques.
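
The abstract does not state how k‑means is coupled with the classifiers, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that gene profiles are clustered with k‑means, that the best-separating gene in each cluster is kept as a "differentiating gene", and that a classifier is then trained on those genes. A nearest-centroid classifier stands in for random forest to keep the example dependency-free. The data are synthetic, and every name (`select_genes`, `separation`, `make_data`, and so on) is hypothetical.

```python
import random
from statistics import mean, pstdev


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def separation(values, y):
    """Hypothetical score: gap between class means over the overall spread."""
    low = [v for v, t in zip(values, y) if t == 0]
    high = [v for v, t in zip(values, y) if t == 1]
    return abs(mean(low) - mean(high)) / (pstdev(values) or 1.0)


def select_genes(X, y, k):
    """Cluster gene profiles, then keep the best-separating gene per cluster."""
    profiles = [list(col) for col in zip(*X)]  # one row per gene
    labels = kmeans(profiles, k)
    chosen = []
    for c in range(k):
        members = [g for g, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(max(members, key=lambda g: separation(profiles[g], y)))
    return chosen


def fit_centroids(X, y, genes):
    """Stand-in classifier: mean expression of the chosen genes per class."""
    return {
        t: [mean(row[g] for row, lab in zip(X, y) if lab == t) for g in genes]
        for t in set(y)
    }


def predict(centroids, row, genes):
    """Assign the class whose centroid is nearest on the chosen genes."""
    x = [row[g] for g in genes]
    return min(centroids, key=lambda t: dist2(x, centroids[t]))


def make_data(n_samples=60, n_genes=40, n_informative=5, seed=1):
    """Synthetic data: the first n_informative genes are shifted in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(2.0 * t if g < n_informative else 0.0, 1.0)
          for g in range(n_genes)] for t in y]
    return X, y


X, y = make_data()
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
genes = select_genes(X_train, y_train, k=5)  # selection uses training data only
centroids = fit_centroids(X_train, y_train, genes)
predictions = [predict(centroids, row, genes) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("selected genes:", genes, "held-out accuracy:", accuracy)
```

Selecting genes on the training split only, as done here, avoids leaking held-out labels into the feature selection. In a setting like the one described, `k` would presumably be chosen so that the number of retained genes matches the desired panel size (thirty in the abstract), but that mapping is an assumption of this sketch.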

Keywords:

Classification, gene, K‑means

Dedicated article page and full-text download: https://civilica.com/doc/1700815/