CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

A new model for Persian multi-part words edition based on statistical machine translation

Paper title: A new model for Persian multi-part words edition based on statistical machine translation
National paper ID: JR_JADM-4-1_004
Published in: Issue 1, Volume 4, 1395 (Solar Hijri calendar)
Authors:

M. Zahedi - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.
A. Arjomandzadeh - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.

Abstract:
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common misuse of the space causes serious problems for Persian text processing and text readability. To address these problems, this work proposes a new model for correcting spacing in multi-part words. The proposed method is based on the statistical machine translation paradigm, in which text in a source language is translated into text in a destination language using statistical models whose parameters are derived from the analysis of bilingual text corpora. The method applies statistical machine translation techniques, treating unedited multi-part words as the source language and space-edited multi-part words as the destination language. The results show that the proposed method can correct the spacing of Persian multi-part words with a statistically significant accuracy rate.
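
The source/destination framing can be illustrated with a minimal sketch. It is not the paper's fertility-based IBM model or its decoder: it swaps in a toy count-based lookup over adjacent token pairs, only to show how unedited and space-edited text form a parallel corpus. It assumes the half-space is encoded as the zero-width non-joiner (U+200C), and the function names `degrade`, `make_parallel_pairs`, `learn_joins`, and `correct` are hypothetical.

```python
from collections import Counter

ZWNJ = "\u200c"  # zero-width non-joiner, assumed encoding of the half-space


def degrade(edited):
    """Simulate the common error: every half-space becomes a regular space."""
    return edited.replace(ZWNJ, " ")


def make_parallel_pairs(edited_sentences):
    """Build (source, target) pairs: unedited text -> space-edited text."""
    return [(degrade(s), s) for s in edited_sentences]


def learn_joins(pairs):
    """Toy stand-in for a translation model (an assumption, not the paper's).

    For each adjacent pair of word parts, count how often the target joins
    them with a half-space versus a space, and keep the pairs that are
    joined more than half of the time.
    """
    joined, seen = Counter(), Counter()
    for _, target in pairs:
        words = target.split(" ")
        # Parts joined by a half-space inside one word.
        for word in words:
            parts = word.split(ZWNJ)
            for a, b in zip(parts, parts[1:]):
                joined[(a, b)] += 1
                seen[(a, b)] += 1
        # Parts separated by a regular space between two words.
        for w1, w2 in zip(words, words[1:]):
            seen[(w1.split(ZWNJ)[-1], w2.split(ZWNJ)[0])] += 1
    return {pair for pair in joined if joined[pair] / seen[pair] > 0.5}


def correct(source, joins):
    """Rewrite unedited text, using a half-space where a learned pair occurs."""
    tokens = source.split(" ")
    out = tokens[0]
    for prev, cur in zip(tokens, tokens[1:]):
        out += (ZWNJ if (prev, cur) in joins else " ") + cur
    return out


# Usage: one correctly spaced training sentence ("I go"), then correction
# of the same sentence typed with a regular space.
training = ["من می\u200cروم"]
joins = learn_joins(make_parallel_pairs(training))
fixed = correct("من می روم", joins)
```

A real statistical machine translation system would replace `learn_joins` with alignment and language models estimated from a large edited corpus; the lookup above only shows the data flow from unedited source to space-edited destination.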

Keywords:
Persian Multi-Part Words, Statistical Machine Translation, Fertility-based IBM Model, Syntax-Based Decoder, Spacing Rules

Paper page and full-text download: https://civilica.com/doc/894165/