Improving the Hierarchical Classification of Protein Families and Model Interpretation with the Grad-CAM Method and Transformers

Publication year: 1404 (Solar Hijri calendar)
Document type: Journal article
Language: English

This paper is 14 pages long and is available in PDF format.

National scientific document ID:

JR_JADM-13-3_001

Indexing date: 12 Shahrivar 1404

Abstract:

In the era of massive data, analyzing bioinformatics data and discovering its functions is very important. Sequencing techniques are generating sequences at a rapidly increasing rate, and researchers face many sequences whose functions are unknown. One of the essential operations in bioinformatics is the classification of sequences to discover unknown proteins. There are two approaches to classifying sequences: traditional and modern. Traditional methods rely on sequence alignment, which has a high computational cost. Modern methods classify proteins using extracted features; DeepFam is one such method. This research improves the DeepFam model, with a special focus on extracting features that differentiate the sequences of different categories. As the model improved, the features tended to be more generic. The Grad-CAM method was used to analyze the extracted features and to interpret the layers of the improved network. We then used the fitting vector from the transformer model to check the performance of Grad-CAM. The COG database, a massive database of protein sequences, was used to check the accuracy of the presented method. We show that extracting more efficient features allows the conserved regions in the sequences to be discovered more accurately, which helps to classify the proteins better. One of the critical advantages of the presented method is that it maintains the necessary flexibility as the number of categories increases, and its classification accuracy in three tests is higher than that of other methods.

Authors

Naeimeh Mohammad Karimi

Computer Engineering Department, Yazd University, Yazd, Iran.

Mehdi Rezaeian

Computer Engineering Department, Yazd University, Yazd, Iran.
