Protein Secondary Structure Prediction, Feature Extraction or Exploiting the Very Amino Acid Sequence?

Leila Khalatbar; Mohammad Reza Kangavar

National paper ID: CITCONF03_545
Published in the Third International Conference on Applied Research in Computer Engineering and Information Technology in 1394 (Solar Hijri calendar)

Author details:

Leila Khalatbar - Department of Computer Engineering, Faculty of Electrical, IT and Computer Science, Qazvin Branch, Islamic Azad
Mohammad Reza Kangavar - Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract:

All vital processes that keep an organism alive are performed by proteins. Proteins are created through the transcription and translation of DNA sequences. Protein functions vary with the biological task each protein undertakes. Hence protein function prediction is one of the most crucial, beneficial and complicated tasks in bioinformatics; it supports drug and enzyme design and disease diagnosis. Protein function depends strongly on protein structure. As a result, the most common means of determining a protein's function is to predict it from its structure. However, predicting function directly from tertiary structure is far too challenging. Protein secondary structure prediction is an intermediate step that eases this complication. Methods for protein secondary structure prediction fall into two major categories: experimental methods and computational methods. Experimental methods are very expensive, time-consuming and inapplicable to some proteins. As a result, computational methods attracted special attention, which grew with the advent of machine learning approaches. Most of these approaches extract numerical features from the protein sequence (the primary structure) and then apply computational methods or learning strategies that operate on numerical data. Nevertheless, it is believed on biological grounds that the protein sequence contains all the information needed to adopt its secondary and tertiary structure. Consequently, the current research aims to use the very amino acid sequence, as the richest source of information, to predict protein secondary structure. To this end, a novel compound dissimilarity measure based on LZ complexity and n-gram patterns is proposed, which can comprehensively capture sequence similarity from different aspects. This measure is then employed in a fuzzy KNN framework.
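The abstract does not give the exact form of the compound measure. As a rough illustration only, the sketch below computes Lempel-Ziv (LZ76) complexity, a normalized LZ-based distance between two sequences, and a cosine distance between their n-gram count profiles, then mixes the two with a weight. The LZ normalization, the use of cosine distance, n = 2, the weight `alpha`, and all function names are assumptions for illustration, not the authors' definitions.

```python
from collections import Counter
import math


def lz_complexity(s: str) -> int:
    """Lempel-Ziv (1976) complexity: number of phrases in the exhaustive-history parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        l = 1
        # Extend the candidate phrase while it can still be copied from the
        # history s[:i + l - 1] (the copy source may overlap the phrase itself).
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        phrases += 1  # s[i:i + l] is a new phrase (or the tail of the string)
        i += l
    return phrases


def lz_distance(a: str, b: str) -> float:
    """Assumed normalized LZ distance: how many new phrases each sequence adds to the other."""
    ca, cb = lz_complexity(a), lz_complexity(b)
    cab, cba = lz_complexity(a + b), lz_complexity(b + a)
    denom = max(cab, cba)
    return 0.0 if denom == 0 else max(cab - ca, cba - cb) / denom


def ngram_distance(a: str, b: str, n: int = 2) -> float:
    """Cosine distance between the n-gram count profiles of two sequences."""
    pa = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    pb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    norm = math.sqrt(sum(v * v for v in pa.values())) * math.sqrt(sum(v * v for v in pb.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def compound_dissimilarity(a: str, b: str, alpha: float = 0.5, n: int = 2) -> float:
    """Hypothetical weighted combination of the LZ and n-gram views (alpha is an assumed weight)."""
    return alpha * lz_distance(a, b) + (1 - alpha) * ngram_distance(a, b, n)
```

For example, `compound_dissimilarity("ACDEFGHIK", "ACDEFGHIR")` should be small while `compound_dissimilarity("ACDEFGHIK", "WWWWPPPPY")` should be large. Note that this LZ distance is not exactly zero for identical sequences, because concatenating a sequence with itself adds one phrase; the n-gram term, by contrast, is zero in that case.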
Fuzzy KNN is known for its strength in modeling complicated patterns and implementing irregular class boundaries, provided it is paired with an efficient dissimilarity measure. The experimental results confirm the competence of the proposed method.
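A minimal fuzzy KNN sketch in the style of Keller et al. (1985), with crisp training memberships and inverse-distance weighting controlled by the fuzzifier `m`. Several details are assumptions not stated in the abstract: each training item is a fixed-length window of residues labelled with the three-state class (H, E, C) of its central residue; the toy windows and labels are invented for illustration; and a simple normalized Hamming dissimilarity stands in for the paper's measure, which would instead be passed in through the hypothetical `dissim` parameter.

```python
from collections import defaultdict
from typing import Callable, Sequence


def hamming_dissim(a: str, b: str) -> float:
    """Placeholder dissimilarity (assumption): fraction of mismatched positions."""
    return sum(x != y for x, y in zip(a, b)) / max(len(a), len(b), 1)


def fuzzy_knn(query: str,
              train: Sequence[tuple[str, str]],
              dissim: Callable[[str, str], float],
              k: int = 5,
              m: float = 2.0,
              eps: float = 1e-9) -> dict[str, float]:
    """Fuzzy KNN with crisp training memberships; m > 1 is the fuzzifier.

    Returns a membership degree in [0, 1] for each class found among the k nearest neighbours.
    """
    # Score every training window once, then keep the k least dissimilar.
    scored = sorted(((dissim(query, seq), label) for seq, label in train), key=lambda t: t[0])[:k]
    weights: dict[str, float] = defaultdict(float)
    total = 0.0
    for d, label in scored:
        w = 1.0 / (d + eps) ** (2.0 / (m - 1.0))  # closer neighbours weigh more
        weights[label] += w  # crisp membership: 1 for its own class, 0 elsewhere
        total += w
    return {label: w / total for label, w in weights.items()}


# Hypothetical toy windows (length 5) labelled by the state of the central residue.
toy_train = [("AELLK", "H"), ("EELLK", "H"), ("VTVIV", "E"),
             ("TVYVE", "E"), ("GPNGS", "C"), ("PGDGS", "C")]
memberships = fuzzy_knn("AELLR", toy_train, hamming_dissim, k=3)
predicted = max(memberships, key=memberships.get)
```

Unlike a crisp KNN vote, the result is a graded membership per class, so a residue whose neighbours are split between helix and coil keeps that ambiguity instead of being forced into one label before any later post-processing.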

Keywords:

Protein secondary structure, Machine learning, Classification, Fuzzy K-nearest Neighbor, Dissimilarity measure, LZ complexity, N-gram patterns

Paper page and full-text download: https://civilica.com/doc/467116/