Improving the prediction of physical protein interaction by Balanced Random Forest interprotein residue contact predictions using sequence covariation information

Publication year: 1400
Document type: Conference paper
Language: English

National scientific document ID: IBIS10_126

Indexing date: 5 Tir 1401

Abstract:

Protein-protein interactions are essential for most cellular processes. Many protein interactions exist, and a large number of protein sequences have unknown interacting partners. Predicting protein interactions from sequence information has always been a great challenge. These predictions become more challenging when the goal is to detect physical, rather than functional, protein interactions specifically. Therefore, developing new approaches for the accurate sequence-based prediction of physical protein interactions could be an important advance in computational biology. Inter-protein residue positions that are spatially related exhibit correlated patterns of sequence evolution in multiple sequence alignments. These coevolution signals can be exploited for the prediction of physical protein interactions. It has been shown that feeding the norm values of the whole covariation information of protein heterodimers into Support Vector Machines (SVM) can accurately predict the possibility of physical interaction of those dimers from sequence information. In the present study, Balanced Random Forest (BRF) models were trained with the covariations of inter-protein residues at different hypothetical interacting sites, and the models were then employed to predict possible inter-protein residue contacts. Instead of considering the whole coevolutionary information, these BRF predictions make it possible to take into account the covariation information of only the more probable physically interacting residues when predicting protein dimers at the higher, protein-level scale. BRF predicted those more probable contacting residues as the positive class and the other interacting pairs of amino acids as the negative class.
After the BRF predictions, the previously computed covariation scores of negatively predicted residue partners were set to zero, thereby removing the contribution of those pairs from the final calculation of the norm values. The results of the current study indicated that feeding the updated norm values of the residue-residue covariation matrices, obtained after the BRF predictions, into SVM models could significantly increase the accuracy of the final protein interaction predictions at the protein family level.
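
The masking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which matrix norm was used, so the Frobenius norm here is an assumption, and the toy covariation scores, the hard-coded contact predictions (standing in for the output of a trained BRF model), and the helper names `frobenius_norm` and `zero_negative_pairs` are all hypothetical and not taken from the study.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm: square root of the sum of squared entries.

    Assumption: the abstract only says "norm values"; the specific
    norm used by the authors is not given.
    """
    return math.sqrt(sum(value * value for row in matrix for value in row))


def zero_negative_pairs(cov, predicted_contact):
    """Return a copy of `cov` with scores set to 0.0 wherever the
    residue pair was predicted as the negative (non-contact) class."""
    if len(cov) != len(predicted_contact):
        raise ValueError("matrix and mask must have the same number of rows")
    masked = []
    for cov_row, mask_row in zip(cov, predicted_contact):
        if len(cov_row) != len(mask_row):
            raise ValueError("matrix and mask rows must have the same length")
        masked.append(
            [score if keep else 0.0 for score, keep in zip(cov_row, mask_row)]
        )
    return masked


# Toy inter-protein covariation matrix (made-up values):
# rows index residues of protein A, columns index residues of protein B.
cov = [
    [3.0, 0.5, 0.2],
    [0.1, 4.0, 0.3],
]

# Hypothetical classifier output: True = predicted contact (positive class).
predicted_contact = [
    [True, False, False],
    [False, True, False],
]

norm_before = frobenius_norm(cov)
masked = zero_negative_pairs(cov, predicted_contact)
norm_after = frobenius_norm(masked)

print(f"norm before masking: {norm_before:.4f}")
print(f"norm after masking:  {norm_after:.4f}")
```

In this sketch, the updated norm would be the feature passed on to a downstream classifier such as an SVM; only the pairs predicted as contacts contribute to it.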

Authors

Sara Salmanian

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

Hamid Pezeshk

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran (currently visiting Department of Mathematics and Statistics, Concordia University, Montreal, Canada)- School of Biological Sciences, Institute

Mehdi Sadeghi

National Institute of Genetic Engineering and Biotechnology, Tehran, Iran