Improving speech emotion recognition using audio transformer and features fusion

Publish Year: 1402 (Solar Hijri, 2023)
Document type: Conference paper
Language: English

This paper is 8 pages long and available for download in PDF format.




National scientific document ID: ICAISV01_004

Indexing date: 6 Shahrivar 1402 (28 August 2023)

Abstract:

The purpose of speech emotion recognition is to recognize different speaker emotions by extracting and classifying salient features from a pre-processed speech signal. In this paper, a baseline method based on the fusion of features extracted from pre-trained AlexNet, BiLSTM, and Wav2vec2.0 models is improved for speech emotion recognition. To this end, as in the baseline model, spectrogram, MFCC, and raw-signal features are used, respectively. To improve the performance of the baseline model, on the one hand, the first and second derivatives of the MFCC are extracted in addition to the MFCC itself. On the other hand, for feature extraction from the concatenated vector, the Audio Transformer with Patchout (PaSST) replaces the BiLSTM of the baseline model. Then, an attention unit is used to exploit the effective information extracted from the MFCC and the spectrogram, and also to weight the Wav2vec2.0 output. Finally, the features extracted from AlexNet and PaSST, together with the weighted Wav2vec2.0 output, are fused and fed to a Softmax classifier. Experiments show that the proposed algorithm reaches a weighted accuracy of 61.56% on the RAVDESS dataset.
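The abstract mentions appending the first and second derivatives of the MFCC to the MFCC features. The paper does not give its exact formula, so as a minimal sketch, the widely used regression-based delta computation (the same form used by HTK and librosa) over a window of ±n frames could look like this; the frame and coefficient counts below are illustrative, not taken from the paper:

```python
import numpy as np

def delta(feat, n=2):
    """Delta (derivative) features: regression slope over a window of
    +/- n frames, with edge padding at the boundaries."""
    # feat: (num_frames, num_coeffs) feature matrix, e.g. MFCCs
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    out = np.zeros_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        # weighted differences of frames i steps ahead and behind
        out[t] = sum(i * (padded[t + n + i] - padded[t + n - i])
                     for i in range(1, n + 1)) / denom
    return out

mfcc = np.random.randn(100, 13)                    # hypothetical MFCC matrix
d1 = delta(mfcc)                                   # first derivative (delta)
d2 = delta(d1)                                     # second derivative (delta-delta)
stacked = np.concatenate([mfcc, d1, d2], axis=1)   # (100, 39) combined features
```

Stacking the static coefficients with their deltas and delta-deltas triples the feature dimension and gives the downstream model explicit short-term dynamics, which is the stated motivation for adding the derivatives.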

Authors

Fateme Mehrpouyan

Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Mazandaran, Iran

Mehdi Ezoji

Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Mazandaran, Iran