ParsNER-Social: A Corpus for Named Entity Recognition in Persian Social Media Texts

Publication Year: 1400 (Solar Hijri)
Document type: Journal article
Language: English

This paper is 13 pages long and is available for download in PDF format.

National scientific document ID: JR_JADM-9-2_005

Indexing date: 20 Mordad 1400

Abstract:

Named Entity Recognition (NER) is an essential prerequisite for many natural language processing tasks. All public corpora for Persian named entity recognition, such as ParsNERCorp and ArmanPersoNERCorpus, are based on the Bijankhan corpus, which originates from the Hamshahri newspaper in 2004. Consequently, most published Persian NER models are tuned specifically for news data and are not flexible enough to be applied to other text categories, such as social media texts. This study introduces ParsNER-Social, a corpus built from social media sources for training Persian named entity recognition models. The corpus consists of 205,373 tokens and their NER tags, crawled from social media content, including 10 Telegram channels in 10 different categories. Furthermore, three supervised methods are trained and evaluated on the ParsNER-Social corpus: two conditional random field models as baselines and one state-of-the-art deep learning model with six different configurations. The experiments show that the monolingual Persian models based on Bidirectional Encoder Representations from Transformers (MLBERT) outperform the other approaches on the ParsNER-Social corpus. Among the different configurations of the MLBERT models, the ParsBERT+BERT-TokenClass model obtained an F1-score of 89.65%.
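To make the reported metric concrete, the following is a minimal sketch of how an entity-level F1-score can be computed from token-level tags. It rests on assumptions that the abstract does not confirm: that the tags follow a BIO scheme, that a predicted entity counts as correct only when both its boundaries and its type match exactly, and that labels such as PER, LOC, and ORG exist in the tag set. The function names are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag continues the open span only if its type matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated as a span start (a lenient choice).
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between gold and predicted tags."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels; the real ParsNER-Social tag set may differ.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred_tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
score = entity_f1(gold_tags, pred_tags)
```

In this toy example one of the two gold entities is recovered and one of the two predicted entities is correct, so precision, recall, and F1 are all 0.5. Whether the paper scores at the entity level or the token level is not stated in the abstract.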

Authors

M. Asgari-Bidhendi

Computer Engineering School, Iran University of Science and Technology, Tehran, Iran.

B. Janfada

Computer Engineering School, Iran University of Science and Technology, Tehran, Iran.

O. R. Roshani Talab

Computer Engineering School, Iran University of Science and Technology, Tehran, Iran.

B. Minaei-Bidgoli

School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.
