An Overview of Multimodal Natural Language Processing Based on Artificial Intelligence: From Text Translation to Subject-Specific Analysis

This review examines recent architectures and solutions in multimodal natural language processing (NLP), a field that has made remarkable progress in converting between input modalities (text, image, and audio). The tasks covered include translation among text, audio, and images, recognition and generation of image captions, and analysis of surrounding data. The article first analyzes convolutional neural networks, transformers, and various multimodal encoding models, then discusses the advantages, challenges, and directions for future research.

Keywords:

Multimodal NLP, Image Captioning, Artificial Intelligence, Machine Translation, Transformer

Authors

Ammar Arab

Student, Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran

Ahmad Sharif

Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran

Permanent link to this paper:

https://civilica.com/doc/2530724

National scientific document ID:

CICTC04_062

Indexing date: 21 Bahman 1404

How to Cite This Paper:

To cite this paper in your research, you can use the following entry in your reference list:

Arab, Ammar and Sharif, Ahmad, 1404, An Overview of Multimodal Natural Language Processing Based on Artificial Intelligence: From Text Translation to Subject-Specific Analysis, the fourth Computer Engineering, Information Technology and Communications Students Conference, Tehran, https://civilica.com/doc/2530724
