Mehr: A Persian Coreference Resolution Corpus

Hassan Haji Mohammadi; Alireza Talebpour; Ahamd Mahmoudi Aznaveh; Samaneh Yazdani

Article title: Mehr: A Persian Coreference Resolution Corpus
National article ID: JR_JADM-11-3_006
Published in 1402 (Solar Hijri)

Author details:

Hassan Haji Mohammadi - Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran.
Alireza Talebpour - Department of Computer Engineering, Shahid Beheshti University, Tehran, Iran.
Ahamd Mahmoudi Aznaveh - Department of Computer Engineering, Shahid Beheshti University, Tehran, Iran.
Samaneh Yazdani - Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran.

Abstract:

Coreference resolution is one of the essential tasks of natural language processing. This task identifies all in-text expressions that refer to the same real-world entity. Coreference resolution is used in other areas of natural language processing, such as information extraction, machine translation, and question answering. This article presents a new Persian coreference resolution corpus named the Mehr corpus. The article's primary goal is to develop a Persian coreference corpus that resolves some of the shortcomings of the previous Persian corpus while maintaining high inter-annotator agreement. The corpus annotates coreference relations for noun phrases, named entities, pronouns, and nested named entities. Two baseline pronoun resolution systems are developed, and their results are reported. The corpus comprises 400 documents and about 170k tokens. Annotation is done with the WebAnno preprocessing tool.

Keywords:

Natural Language Processing, Mention, Anaphora resolution, Antecedent

Article page and full-text download: https://civilica.com/doc/1880599/