Topic Detection on COVID-19 Tweets: A Comparative Study on Clustering and Transfer Learning Models
Published in: Tabriz Journal of Electrical Engineering, Vol: 52, Issue: 4
Publish Year: 1401
Document type: Journal article
Language: Persian
This paper is 11 pages long and is available in PDF format.
National scientific document ID:
JR_TJEE-52-4_007
Indexing date: 7 Esfand 1401
Abstract:
Automatic topic detection is becoming indispensable in social media analysis because of the large volume of text that users generate. Clustering-based methods are among the most important and current families of topic detection techniques, and this research surveys that family broadly. The paper studies the three main components of clustering-based topic detection: embedding methods, distance metrics, and clustering algorithms. Transfer learning, and with it pretrained language models and word embeddings, has attracted considerable attention in recent years. Given the importance of embedding methods, the paper compares the effectiveness of five embedding methods, ranging from earlier to recent ones. For the study, the authors implemented two commonly used distance metrics and five clustering algorithms that are important in topic detection. Because COVID-19 has become a trending topic on social networks in recent years, the study uses a dataset of tweets collected over one month with COVID-19-related hashtags. More than 7500 experiments were performed to determine tunable parameters. All combinations of embedding methods, distance metrics, and clustering algorithms (50 combinations) were then evaluated using the Silhouette metric. The results show that T5 strongly outperforms the other embedding methods, cosine distance is slightly better than the other distance metric, and DBSCAN is superior to the other clustering algorithms.
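The evaluation step described in the abstract, scoring a clustering with the Silhouette metric under a chosen distance, can be illustrated with a minimal sketch. This is not the authors' implementation. The four two-dimensional vectors stand in for tweet embeddings and are purely hypothetical, as are the two label assignments and all function names. The abstract names cosine distance but not the second metric, so Euclidean distance is used here as an assumed stand-in.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_score(points, labels, distance):
    """Mean silhouette coefficient over all points for one clustering."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy "embeddings": two tight groups pointing in different directions.
points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
good_labels = [0, 0, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1]   # mixes the two groups

metrics = {"cosine": cosine_distance, "euclidean": euclidean_distance}
scores = {
    (name, tag): silhouette_score(points, labels, dist)
    for name, dist in metrics.items()
    for tag, labels in (("good", good_labels), ("bad", bad_labels))
}

for key, value in sorted(scores.items()):
    print(key, round(value, 3))
```

A score near 1 indicates compact, well-separated clusters, and a negative score indicates points that sit closer to another cluster than to their own. Looping over every combination of embedding method, distance metric, and clustering algorithm in the same way would produce a comparison grid like the one the abstract describes.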
Keywords:
Authors
Elnaz Zafarani-Moattar (الناز زعفرانی معطر)
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Mohammad Reza Kangavari (محمدرضا کنگاوری)
Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Amir Masoud Rahmani (امیر مسعود رحمانی)
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran