Published in: Journal of Artificial Intelligence & Data Mining, Volume: 4, Issue: 1
COI code: JR_JADM-4-1_004
Paper Language: English
Title: A new model for Persian multi-part words edition based on statistical machine translation
Authors:
M. Zahedi - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.
A. Arjomandzadeh - School of Computer Engineering & Information Technology, University of Shahrood, Shahrood, Iran.
Abstract: In English, multi-part words are hyphenated, with a hyphen separating the parts. Persian also has multi-part words. According to Persian morphology, a half-space character is required to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common misuse of the space character causes serious problems for Persian text processing and text readability. To address these problems, this work proposes a new model for correcting spacing in multi-part words. The proposed method is based on the statistical machine translation paradigm, in which text in a source language is translated into text in a destination language on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The method applies statistical machine translation techniques by treating unedited multi-part words as the source language and space-edited multi-part words as the destination language. The results show that the proposed method can correct the spacing of Persian multi-part words with a statistically significant accuracy rate.
Keywords: Persian Multi-Part Words, Statistical Machine Translation, Fertility-based IBM Model, Syntax-Based Decoder, Spacing Rules
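Illustration: the idea of treating unedited text as a source language and space-edited text as a destination language can be sketched with a toy example. The sketch below is not the paper's fertility-based IBM model or syntax-based decoder; it is a much simpler count-based lookup table, given only to show the source/destination framing. The function names `train` and `correct`, the sample word pairs, and the assumption that the half-space is encoded as the zero-width non-joiner (U+200C) are all illustrative choices, not details taken from the paper.

```python
from collections import Counter, defaultdict

ZWNJ = "\u200c"  # zero-width non-joiner, assumed here to encode the Persian half-space


def train(pairs):
    """Build a toy translation table from (unedited, edited) sentence pairs.

    For every adjacent word pair in the unedited text, count which separator
    (space or half-space) appears at that boundary in the edited text.
    """
    table = defaultdict(Counter)
    for source, target in pairs:
        src_words = source.split(" ")
        # Split the edited text on both separators, remembering which was used.
        tgt_words, seps, current = [], [], ""
        for ch in target:
            if ch in (" ", ZWNJ):
                tgt_words.append(current)
                seps.append(ch)
                current = ""
            else:
                current += ch
        tgt_words.append(current)
        if src_words != tgt_words:
            continue  # skip pairs that differ in more than spacing
        for left, right, sep in zip(src_words, src_words[1:], seps):
            table[(left, right)][sep] += 1
    return table


def correct(text, table):
    """Rewrite each word boundary with the separator seen most often in training."""
    words = text.split(" ")
    out = [words[0]]
    for left, right in zip(words, words[1:]):
        counts = table.get((left, right))
        sep = counts.most_common(1)[0][0] if counts else " "  # unseen pair: keep the space
        out.append(sep + right)
    return "".join(out)


# Hypothetical training pairs: unedited (space) and edited (half-space) forms.
pairs = [
    ("می روم", "می" + ZWNJ + "روم"),
    ("کتاب ها را", "کتاب" + ZWNJ + "ها را"),
]
table = train(pairs)
fixed = correct("من می روم", table)
print(fixed)
```

A lookup table like this only handles word pairs seen in training; a statistical translation model of the kind named in the keywords would instead estimate probabilities that generalize beyond exact matches.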
How to cite this paper: To refer to this article in your research, you can use the following entry in your references section:
Zahedi, M. & A. Arjomandzadeh, 2016, A new model for Persian multi-part words edition based on statistical machine translation, Journal of Artificial Intelligence & Data Mining 4 (1), https://www.civilica.com/Paper-JR_JADM-JR_JADM-4-1_004.html
In the text, wherever the article or one of its findings is mentioned, give the following details in parentheses after the mention:
First citation: (Zahedi, M. & A. Arjomandzadeh, 2016)
Second and subsequent citations: (Zahedi & Arjomandzadeh, 2016)
The University/Research Center Information:
Type: state university
Paper No.: 7051