Topic Detection on COVID-19 Tweets: A Comparative Study on Clustering and Transfer Learning Models

Publication year: 1401 (Solar Hijri)
Document type: Journal article
Language: Persian
Views: 152

This paper is 11 pages long and is available for download in PDF format.

National scientific document ID:

JR_TJEE-52-4_007

Indexing date: 7 Esfand 1401

Abstract:

Automatic topic detection has become indispensable in social media analysis because of the large volume of text that users generate. Clustering-based methods are among the most important and current approaches to topic detection, and the goal of this research is a broad study of this category. To that end, this paper examines the main components of clustering-based topic detection: embedding methods, distance metrics, and clustering algorithms. Transfer learning, and with it pretrained language models and word embeddings, has received considerable attention in recent years. Given the importance of embedding methods, the paper compares the effectiveness of five embedding methods, ranging from earlier to recent ones. For the study, the authors implemented two commonly used distance metrics and five clustering algorithms that are important in topic detection. Because COVID-19 has been a trending topic on social networks in recent years, the study uses a dataset of one month of tweets collected with COVID-19-related hashtags. More than 7500 experiments were performed to tune the parameters. All combinations of embedding methods, distance metrics, and clustering algorithms (50 combinations) were then evaluated using the Silhouette metric. The results show that T5 strongly outperforms the other embedding methods, cosine distance is slightly better than the other distance metric, and DBSCAN is superior to the other clustering algorithms.
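
The evaluation step mentioned in the abstract, scoring a clustering with the Silhouette metric under a chosen distance metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy two-dimensional vectors (standing in for tweet embeddings) are assumptions made for illustration, and Euclidean distance is assumed as the second metric because the abstract names only cosine. Noise labels such as those produced by DBSCAN are not treated specially here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(
    points: Sequence[Vector],
    labels: Sequence[int],
    distance: Callable[[Vector, Vector], float],
) -> float:
    """Mean silhouette coefficient over all points."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette metric needs at least two clusters")

    scores: List[float] = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy "embeddings": two tight groups pointing in different directions.
toy_points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
good_labels = [0, 0, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1]   # splits each group across clusters

for name, metric in [("euclidean", euclidean_distance), ("cosine", cosine_distance)]:
    good = silhouette_score(toy_points, good_labels, metric)
    bad = silhouette_score(toy_points, bad_labels, metric)
    print(f"{name}: good={good:.3f} bad={bad:.3f}")
```

A higher score indicates tighter, better separated clusters, so the same function can rank different combinations of embedding, distance metric, and clustering output against each other.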

Authors

Elnaz Zafarani-Moattar

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Mohammad Reza Kangavari

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Amir Masoud Rahmani

Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
