ArmanTTS single-speaker Persian dataset

Mohammd Hasan Shamgholi; Vahid Saeedi; Javad Peymanfard; Leila Alhabib; Hossein Zeinali

Paper title: ArmanTTS single-speaker Persian dataset
National paper ID: CEITCONF06_046
Published in the First International and Sixth National Conference on Computer, Information Technology and Applications of Artificial Intelligence, 1401 (Solar Hijri calendar)

Author details:

Mohammd Hasan Shamgholi - MSc Student, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Vahid Saeedi - MSc Graduate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Javad Peymanfard - PhD Candidate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Leila Alhabib - BSc Student, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
Hossein Zeinali - Assistant Professor, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Abstract:

Text-to-speech (TTS) is a complex task that can be addressed through appropriate modeling with deep learning methods. Training deep learning models requires a suitable dataset. Since little work has been done in this field for the Persian language, this paper introduces ArmanTTS, a single-speaker dataset. We compare the characteristics of this dataset with those of several widely used datasets to show that ArmanTTS meets the standards needed for training a Persian text-to-speech model. We also combine Tacotron 2 and HiFi-GAN into a model that takes phonemes as input and produces the corresponding speech as output. The mean opinion score (MOS) was 4.0 for real speech, 3.87 for the vocoder's predictions, and 2.98 for the synthetic speech generated by the TTS model.
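
The abstract reports its results as MOS values. As a rough illustration of how such a score is commonly computed, the sketch below averages listener ratings on a 1-5 scale and attaches a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical, invented for this example; they are not the paper's evaluation code or data, and the abstract does not state how its scores were aggregated.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, for illustration only (not data from the paper).
example_ratings = [4, 5, 3, 4, 4, 5, 3, 5]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Running the same function separately on ratings for real recordings, vocoder output, and full TTS output would yield three comparable scores of the kind listed in the abstract.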

Keywords:

Dataset; Vocoders; Acoustic models

Dedicated paper page and full-text download: https://civilica.com/doc/1675610/