A Transformer-based Approach for Persian Text Chunking

Publication year: 1401
Document type: Journal article
Language: English

This paper is 12 pages long and is available for download in PDF format.


National scientific document ID: JR_JADM-10-3_007

Indexing date: 9 Mehr 1401

Abstract:

Over the last few years, text chunking has played a significant role among sequence labeling tasks. Although a wide variety of methods have been proposed for shallow parsing in English, most approaches proposed for text chunking in Persian are based on simple, traditional concepts. In this paper, we propose using state-of-the-art transformer-based contextualized models, namely BERT and XLM-RoBERTa, as the backbone of our models. A Conditional Random Field (CRF) layer, a combination of Bidirectional Long Short-Term Memory (BiLSTM) and CRF, and a simple dense layer are each placed on top of the transformer-based models to improve performance in predicting chunk labels. Moreover, we provide a new dataset for noun phrase chunking in Persian, consisting of annotated Persian news text. Our experiments reveal that XLM-RoBERTa achieves the best performance among all the architectures tried on the proposed dataset. The results also show that a single CRF layer yields better results than a dense layer, and even better than the combination of BiLSTM and CRF.
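
To illustrate why a CRF layer on top of per-token scores can help when predicting chunk labels, the following is a minimal sketch and not the authors' implementation. It assumes a BIO tag set for noun phrases (B-NP, I-NP, O), hand-picked emission scores standing in for the transformer's output, and a hypothetical transition table that penalizes the invalid O → I-NP move. A dense layer picks each token's best tag independently, whereas Viterbi decoding, as used with a CRF, picks the best whole sequence.

```python
# Illustrative sketch only: the tag set, scores, and penalties are assumptions,
# not values from the paper.
TAGS = ["B-NP", "I-NP", "O"]


def greedy_decode(emissions):
    """Pick the highest-scoring tag per token independently (dense-layer style)."""
    return [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in emissions]


def viterbi_decode(emissions, transitions, start):
    """Return the best tag sequence under emission + transition scores."""
    n = len(TAGS)
    # score[j] = best score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n)]
    backptrs = []
    for row in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag to recover the path
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    return [TAGS[j] for j in reversed(path)]


NEG = -1e4  # large penalty for structurally invalid moves
# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j]
TRANSITIONS = [
    [0.0, 0.0, 0.0],  # from B-NP
    [0.0, 0.0, 0.0],  # from I-NP
    [0.0, NEG, 0.0],  # from O: O -> I-NP is invalid in BIO
]
START = [0.0, NEG, 0.0]  # a sequence cannot start with I-NP

# Hypothetical emission scores for a three-token sentence
EMISSIONS = [
    [0.2, 0.1, 1.0],  # token 1 favors O
    [0.9, 1.0, 0.3],  # token 2 favors I-NP slightly over B-NP
    [0.1, 1.2, 0.4],  # token 3 favors I-NP
]

greedy = greedy_decode(EMISSIONS)
crf = viterbi_decode(EMISSIONS, TRANSITIONS, START)
print(greedy)
print(crf)
```

With these assumed scores, the independent decoder produces O, I-NP, I-NP, which is not a valid BIO sequence, while Viterbi decoding produces O, B-NP, I-NP. In a trained CRF the transition scores are learned rather than set by hand.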

Authors

P. Kavehzadeh

Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

M. M. Abdollah Pour

Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

S. Momtazi

Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.
