Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language

Saman Namdar; Hesham Faili; Shahram Khadivi


Publication venue: International Journal of Information and Communication Technology Research (IJICT), Vol. 5, Issue 1

Publication year: 1391 (Solar Hijri calendar)

Document type: Journal article

Language: English

Length: 14 pages (PDF)

Permanent link to this paper:

https://civilica.com/doc/1425805

National scientific document ID:

JR_ITRC-5-1_005

Indexing date: 22 Farvardin 1401

Abstract:

Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. In this approach, model parameters are derived from an analysis of a parallel corpus, and SMT quality depends on how well word translations can be learned. Enriching SMT with a suitable morphological analyser dramatically reduces both the number of out-of-vocabulary words and the dictionary size. This effect can be even more significant for a highly inflectional, low-resource language such as Persian. Choosing a suitable granularity for word segments may improve alignment quality in the parallel corpus. In this paper, different segmentation schemes and combinations of word segments are explored in Persian-to-English SMT experiments, and the scheme that yields the best one-to-one alignment, called the En-like scheme, is proposed. Using this scheme, Persian-to-English translation quality improves by about 3 BLEU points over the phrase-based SMT baseline.
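As a rough illustration of why segmentation reduces out-of-vocabulary (OOV) words and dictionary size, the Python sketch below splits a few hypothetical romanized Persian suffixes (`ha` plural, `am` "my", `ash` "his/her") off toy word forms. It then compares vocabulary size and OOV rate before and after segmentation. The suffix list, the helpers `segment`, `seg_all`, and `oov_rate`, and the data are all illustrative assumptions. This is not the paper's En-like scheme, only a minimal longest-suffix splitter.

```python
from typing import List

# Hypothetical romanized Persian suffixes (toy list, NOT the paper's En-like scheme):
# "ha" = plural, "am" = "my", "ash" = "his/her".
SUFFIXES = ("ha", "am", "ash")


def segment(word: str) -> List[str]:
    """Split off at most one known suffix (longest match first), marking it with '+'."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        # Require a stem of at least 3 characters to avoid trivial splits.
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return [word[: -len(suf)], "+" + suf]
    return [word]


def seg_all(tokens: List[str]) -> List[str]:
    """Segment every token and flatten the pieces into one token list."""
    return [piece for tok in tokens for piece in segment(tok)]


def oov_rate(train_tokens: List[str], test_tokens: List[str]) -> float:
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    unseen = [tok for tok in test_tokens if tok not in vocab]
    return len(unseen) / len(test_tokens)


# Toy data (hypothetical): 4 stems x 3 suffixes, with two combinations held out of training.
train = ["ketabha", "ketabam", "daftarha", "daftaram", "daftarash",
         "medadam", "medadash", "dastha", "dastam", "dastash"]
test = ["ketab", "ketabash", "medadha"]

print("surface:  ", len(set(train)), f"{oov_rate(train, test):.2f}")
print("segmented:", len(set(seg_all(train))), f"{oov_rate(seg_all(train), seg_all(test)):.2f}")
```

On this toy data, segmentation shrinks the vocabulary from 10 to 7 types and drops the OOV rate from 1.00 to 0.00. The held-out forms decompose into a stem and a suffix that were each seen in training. A real Persian segmenter must also resolve ambiguous splits that this longest-suffix rule ignores.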

Keywords:

Statistical Machine Translation, Segmentation Schemes, Lexical Granularities, Morpheme, Persian Language
