Learning an Efficient Text Augmentation Strategy: A Case Study in Sentiment Analysis
Published in: International Journal of Web Research, Vol. 6, Issue 2
Publication year: 1402 (Solar Hijri)
Document type: Journal article
Language: English
Length: 9 pages (PDF)
National scientific document ID:
JR_IJWR-6-2_006
Indexing date: 27 Farvardin 1403
Abstract:
Contemporary machine learning models, such as deep neural networks, require substantial labeled datasets for proper training. In areas such as natural language processing, however, a shortage of labeled data can lead to overfitting. Data augmentation, which transforms data points while preserving their class labels so that they provide additional useful training information, has become an effective strategy for addressing this challenge. This paper introduces a text augmentation method for sentiment analysis that combines reinforcement learning with deep learning. The technique uses a Deep Q-Network (DQN) as the reinforcement learning method to search for an efficient augmentation strategy over four text augmentation transformations: random deletion, synonym replacement, random swapping, and random insertion. Various deep learning networks, including CNN, Bi-LSTM, Transformer, BERT, and XLNet, were evaluated for the training phase. Experimental findings show that the proposed technique can achieve an accuracy of 65.1% with only 20% of the dataset and 69.3% with 40% of the dataset. Furthermore, with just 10% of the dataset, the method yields an F1-score of 62.1%, rising to 69.1% with 40% of the dataset, outperforming previous approaches. Evaluation on the SemEval dataset demonstrates that reinforcement learning can efficiently augment text datasets for improved sentiment analysis results.
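The four transformations named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the toy `SYNONYMS` table, and the parameter values in `ACTIONS` are all assumptions made for illustration. In the described method, a DQN agent would choose among such actions; here the action is simply passed in by name.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch);
# a real system would likely draw on a lexical resource instead.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_insertion(words, n, rng):
    """Insert a synonym of a random known word at a random position, up to n times."""
    out = list(words)
    for _ in range(n):
        known = [w for w in out if w in SYNONYMS]
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out

# Action set an agent could choose from; the parameter values are illustrative.
ACTIONS = {
    "delete": lambda w, rng: random_deletion(w, 0.2, rng),
    "synonym": lambda w, rng: synonym_replacement(w, 1, rng),
    "swap": lambda w, rng: random_swap(w, 1, rng),
    "insert": lambda w, rng: random_insertion(w, 1, rng),
}

def augment(sentence, action, seed=0):
    """Apply one named transformation to a whitespace-tokenized sentence."""
    rng = random.Random(seed)
    return " ".join(ACTIONS[action](sentence.split(), rng))
```

For example, `augment("the movie was good", "swap")` returns the same four words in a different order. All four operations are intended to leave the sentiment label unchanged, which is what allows the augmented sentences to be reused as labeled training data.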
Authors
Mehdy Roayaei
Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran