Speech Emotion Recognition using Enriched Spectrogram and Deep Convolutional Neural Network Transfer Learning

B. Z. Mansouri; H.R. Ghaffary; A. Harimi

Published in: Journal of Artificial Intelligence & Data Mining, Vol. 10, Issue 4

Publication year: 1401 (Solar Hijri)

Document type: Journal article

Language: English

This paper is 10 pages long and is available in PDF format.

This paper is classified under the following subject areas:

Artificial intelligence > Neural networks

Permanent link to this paper:

https://civilica.com/doc/1570956

National scientific document ID:

JR_JADM-10-4_008

Indexing date: 28 Azar 1401

Abstract:

Speech emotion recognition (SER) is a challenging field of research that has attracted attention over the last two decades. Feature extraction has been reported as the most challenging issue in SER systems, and deep neural networks have partially solved this problem in some other applications. To address it, we propose a novel enriched spectrogram computed by fusing wide-band and narrow-band spectrograms, so that it benefits from both high temporal resolution and high spectral resolution. We then feed the resulting spectrogram images to a pre-trained deep convolutional neural network, ResNet152. In place of the last layer of ResNet152, we add five layers to adapt the model to the present task. All experiments are performed on the popular EmoDB dataset using the leave-one-speaker-out technique, which guarantees the speaker independence of the model. The model achieves an accuracy of 88.97%, which shows the efficiency of the proposed approach compared with other state-of-the-art methods.
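
The abstract names the leave-one-speaker-out protocol but does not say how the folds were built. As an illustration only, the following Python sketch shows one common way to form such folds: every utterance of one speaker is held out for testing, and the remaining speakers form the training set. The function name, the `(speaker_id, features, label)` tuple layout, and the toy data are assumptions made for this sketch. They are not taken from the paper and are not real EmoDB utterances.

```python
from collections import defaultdict


def leave_one_speaker_out(samples):
    """Yield (held_out_speaker, train, test) once per speaker.

    `samples` is a list of (speaker_id, features, label) tuples.
    This tuple layout is a hypothetical convention for the sketch.
    """
    # Group utterances by speaker so a whole speaker can be held out.
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample[0]].append(sample)

    for held_out in sorted(by_speaker):
        test = by_speaker[held_out]
        train = [
            s
            for speaker, group in by_speaker.items()
            if speaker != held_out
            for s in group
        ]
        yield held_out, train, test


# Toy data: three made-up speakers with placeholder feature vectors.
toy = [
    ("spk01", [0.1, 0.2], "anger"),
    ("spk01", [0.3, 0.1], "neutral"),
    ("spk02", [0.5, 0.4], "anger"),
    ("spk03", [0.2, 0.9], "sadness"),
    ("spk03", [0.7, 0.6], "neutral"),
]

folds = list(leave_one_speaker_out(toy))
for held_out, train, test in folds:
    # The held-out speaker never appears in the training split.
    assert held_out not in {s[0] for s in train}
    print(held_out, len(train), len(test))
```

Because each fold's test speaker is absent from training, the accuracy averaged over folds reflects performance on unseen speakers. This is the sense in which such a protocol supports a speaker-independence claim.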

Keywords:

Wideband and narrowband spectrogram, ResNet152, DCNN, Transfer learning, Speech emotion recognition

Authors

B. Z. Mansouri

Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.

H.R. Ghaffary

Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.

A. Harimi

Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.
