CIVILICA We Respect the Science
Publisher of Iranian Journals and Conference Proceedings
Paper title:

A new model for Persian multi-part words edition based on statistical machine translation

Number of Pages: 8
Year: 2016
COI code: JR_JADM-4-1_004
Paper Language: English

Authors:

  M. Zahedi - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.
  A. Arjomandzadeh - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.

Abstract:

Multi-part words in English are hyphenated: a hyphen separates the parts. Persian also has multi-part words. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common misuse of the space character causes serious problems for Persian text processing and text readability. To address these problems, this work proposes a new model for correcting spacing in multi-part words. The proposed method is based on the statistical machine translation paradigm, in which text in a source language is translated into text in a destination language using statistical models whose parameters are derived from the analysis of bilingual text corpora. The method applies statistical machine translation techniques by treating unedited multi-part words as the source language and space-edited multi-part words as the destination language. The results show that the proposed method can edit Persian multi-part words and improves spacing correction with a statistically significant accuracy rate.
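To make the source/destination framing in the abstract concrete, the following is a minimal sketch of how training pairs for such a "translation" task might be built. It is not the authors' implementation. It assumes that the half-space is encoded as U+200C (ZERO WIDTH NON-JOINER) and that correctly edited text is available. The names `make_parallel_pair`, `tokens`, and `edited_corpus` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Assumption: the Persian "half-space" is encoded as U+200C
# (ZERO WIDTH NON-JOINER); the abstract does not name a code point.
ZWNJ = "\u200c"


def make_parallel_pair(edited: str) -> Tuple[str, str]:
    """Return (unedited, edited) for one correctly spaced sentence.

    The unedited side simulates the common typing error by replacing
    every half-space with a plain space. Hypothetical helper, not
    taken from the paper.
    """
    unedited = edited.replace(ZWNJ, " ")
    return unedited, edited


def tokens(text: str) -> List[str]:
    """Split on plain spaces only, so a half-space stays inside its token."""
    return text.split(" ")


# Two illustrative, correctly spaced inputs (not from the paper's corpus).
edited_corpus = [
    "می" + ZWNJ + "روم",              # "I go"
    "کتاب" + ZWNJ + "ها را خواندم",   # "I read the books"
]

for source, target in map(make_parallel_pair, edited_corpus):
    # The source side has more tokens than the target side wherever a
    # half-space was replaced by a space.
    print(len(tokens(source)), "source tokens ->", len(tokens(target)), "target tokens")
```

In this framing, one destination token can correspond to several source tokens. That may be why a fertility-based alignment model appears among the keywords, although this page does not say so explicitly.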

Keywords:

Persian Multi-Part Words, Statistical Machine Translation, Fertility-based IBM Model, Syntax-Based Decoder, Spacing Rules

Permalink:

https://www.civilica.com/Paper-JR_JADM-JR_JADM-4-1_004.html

How to cite this paper:

To refer to this article in your research, use the following entry in the references section:
Zahedi, M. & A. Arjomandzadeh, 2016, A new model for Persian multi-part words edition based on statistical machine translation, Journal of Artificial Intelligence & Data Mining 4 (1), https://www.civilica.com/Paper-JR_JADM-JR_JADM-4-1_004.html
In the text, wherever the article or one of its findings is mentioned, give the following in parentheses after the mention:
First time: (Zahedi, M. & A. Arjomandzadeh, 2016)
Second and later times: (Zahedi & Arjomandzadeh, 2016)
For a complete overview of how to cite, see the CIVILICA citation guide.

Scientometrics

The University/Research Center Information:
Type: state university
Paper No.: 7051
In University Ranking and Scientometrics, Iranian universities and research centers are evaluated based on their scientific papers.

WHAT IS COI?

COI is a national code assigned to every Iranian conference and journal paper. The COI of each paper can be verified online.