Extracting Persian-English Parallel Sentences from Document Level Aligned Comparable Corpus using Bi-Directional Translation

Publish Year: 1393
Type: Journal paper
Language: English

This paper is 7 pages long and is available for download in PDF format.

Document National Code:

JR_ACSIJ-3-5_008

Index date: 3 November 2014

Abstract

Bilingual parallel corpora are very important in various fields of natural language processing (NLP). The quality of a Statistical Machine Translation (SMT) system depends strongly on the amount of training data. For low-resource language pairs such as Persian-English, there are not enough parallel sentences to build an accurate SMT system. This paper describes a new approach that uses Wikipedia as a comparable corpus to extract Persian-English parallel sentences and eventually improve SMT system performance. This new approach is also applicable to other low-resource language pairs. To calculate the similarity score between two sentences, a novel bi-directional translation-based information retrieval system is proposed. A length penalty score is introduced to increase the accuracy of the extracted corpus. Using the extracted parallel sentences, the performance of an existing Persian-English SMT system is improved drastically.

Authors

Ebrahim Ansari

Department of Computer Science and Engineering, Shiraz University, Shiraz, Fars, Iran

Mohammad Hadi Sadreddin

Department of Computer Science and Engineering, Shiraz University, Shiraz, Fars, Iran

Alireza Tabebordba

Department of Computer Science and Engineering, Shiraz University, Shiraz, Fars, Iran

Richard WALLAC

Distributed Systems Architecture Research Group, Complutense University, Madrid, Spain