An Overview of Multimodal Natural Language Processing Based on Artificial Intelligence: From Text Translation to Subject-Specific Analysis
Published in: The Fourth Computer Engineering, Information Technology and Communications Students Conference
Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English
Length: 7 pages (PDF)
National scientific document ID:
CICTC04_062
Indexing date: 21 Bahman 1404
Abstract:
This review article examines new architectures and solutions in multimodal natural language processing (NLP), a field that has made remarkable progress in converting multimedia inputs (text, image, audio) into one another. The tasks covered include translating among text, audio, and images, recognizing and generating image captions, and analyzing surrounding data. First, the architectures of convolutional neural networks, transformers, and various multimodal coding models are analyzed; then the advantages, challenges, and future research directions are discussed.
Keywords:
Authors
Ammar Arab
Student, Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran
Ahmad Sharif
Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran