ArmanTTS single-speaker Persian dataset

Publication year: 1401 (Solar Hijri calendar)
Document type: Conference paper
Language: English
This paper is 5 pages long and is available for download in PDF format.

National scientific document ID: CEITCONF06_046

Indexing date: 26 Khordad 1402

Abstract:

TTS, or text-to-speech, is a complicated process that can be accomplished through appropriate modeling using deep learning methods. Implementing deep learning models requires a suitable dataset. Since little work has been done in this field for the Persian language, this paper introduces a single-speaker dataset, ArmanTTS. We compared the characteristics of this dataset with those of various prevalent datasets to show that ArmanTTS meets the necessary standards for training a Persian text-to-speech model. We also combined Tacotron 2 and HiFi-GAN to design a model that receives phonemes as input and outputs the corresponding speech. A MOS of 4.0 was obtained for real speech, 3.87 for the vocoder prediction, and 2.98 for the synthetic speech generated by the TTS model.
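The abstract reports its results as MOS (mean opinion score) values. As a rough illustration of what such a number summarizes, the sketch below averages listener ratings on the usual 1 to 5 scale and adds a normal-approximation 95% confidence interval. The abstract does not describe the paper's rating protocol, so the function name `mean_opinion_score`, the 1 to 5 scale, the confidence interval, and the example ratings are all assumptions made for illustration, not the authors' procedure or data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings.

    Hypothetical helper: the scale and the confidence interval are
    assumptions, not taken from the paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation 95% confidence interval half-width.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Made-up ratings for illustration only; NOT data from the paper.
example_ratings = [4, 3, 3, 2, 3, 4, 3, 2]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a gap such as the one between vocoder output and full TTS output is larger than the listener-to-listener variation.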

Authors

Mohammad Hasan Shamgholi

MSc Student, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Vahid Saeedi

MSc Graduate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Javad Peymanfard

PhD Candidate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Leila Alhabib

BSc Student, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Hossein Zeinali

Assistant Professor, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran