An N-Gram Descriptor to Predict Protein-Protein Interactions Considering Spatial Structure of Proteins
Paper title: An N-Gram Descriptor to Predict Protein-Protein Interactions Considering Spatial Structure of Proteins
National paper ID: ICET01_009
Published in the International Conference on Engineering and Information Technology, 1396 (Solar Hijri)
Authors:
Samaneh Aghajanbaglo - School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Ali Montajab - Faculty of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran
Abstract:
Proteins, which are among the most important polymers, are made of monomers called amino acids. Interactions between proteins are essential for the biological functions of living cells. Besides the experimental methods developed for detecting Protein-Protein Interactions (PPIs), considerable effort has been devoted in recent years to developing computational approaches that use various data resources, such as sequence information. However, finding an appropriate feature encoding to characterize protein sequences is a major challenge for such methods. In the presented work, each protein sequence is represented by a vector using an N-Gram feature encoding method, and a Relaxed Variable Kernel Density Estimator (RVKDE) is used as the machine learning tool to predict the interaction between protein pairs. A significance calculation and the solvent accessible surface of proteins are then applied to the feature vectors. Moreover, a property called the Undirected property, which reduces the dimensionality of the vector space, is introduced by considering the spatial structure of proteins. The results show that, among the N-Gram descriptors, 2-Gram(20) achieves the best prediction performance. In addition, 2-Gram(20) with the Undirected property improves the F-measure by 2.5% on the Human Protein Reference Database (HPRD).
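The abstract does not spell out the encoding, so the following is only a minimal sketch of what a 2-Gram(20) descriptor might look like, under stated assumptions: the alphabet is the 20 standard amino acids, features are normalized counts of adjacent residue pairs, and the Undirected property is read as treating a pair and its reverse (for example AC and CA) as the same feature. The function name `two_gram_vector` and the `undirected` flag are hypothetical and not taken from the paper. The significance calculation, the solvent accessible surface step, and the RVKDE classifier are not covered here.

```python
from itertools import combinations_with_replacement, product

# Assumption: "20" in 2-Gram(20) refers to the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def two_gram_vector(sequence, undirected=False):
    """Return normalized 2-gram frequencies for a protein sequence.

    Hypothetical helper: with undirected=True, a pair and its reverse
    are merged into one feature (an assumed reading of the paper's
    Undirected property).
    """
    if undirected:
        # 20 * 21 / 2 = 210 order-insensitive pairs
        keys = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]
    else:
        # 20 * 20 = 400 ordered pairs
        keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

    counts = dict.fromkeys(keys, 0)
    for i in range(len(sequence) - 1):
        gram = sequence[i:i + 2]
        if undirected:
            # Canonical form: residues in alphabetical order
            gram = "".join(sorted(gram))
        if gram in counts:  # skip pairs with non-standard residues
            counts[gram] += 1

    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in keys]


directed = two_gram_vector("MKTAYIAKQR")
merged = two_gram_vector("MKTAYIAKQR", undirected=True)
print(len(directed), len(merged))
```

Under these assumptions the directed descriptor has 400 dimensions and the merged one has 210, which illustrates the kind of dimensionality reduction the abstract attributes to the Undirected property.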
Keywords: Protein-Protein Interaction, N-Gram Feature Encoding, Undirected Property, Sequence Information, Solvent Accessible Surface
Paper page and full-text download: https://civilica.com/doc/631605/