A Voice Activity Detection Algorithm Using Sparse Non-negative Matrix Factorization-based Model Learning in Spectro-Temporal Domain

S. Mavaddati

Published in: International Journal of Engineering (IJE), Vol. 36, Issue 8

Publication year: 1402 (Solar Hijri calendar; 2023)

Document type: Journal article

Language: English

Length: 11 pages (PDF)

Permanent link to this paper:

https://civilica.com/doc/1707580

National scientific document ID:

JR_IJE-36-8_008

Indexing date: 10 Mordad 1402 (1 August 2023)

Abstract:

Voice activity detectors separate the silence and speech segments of a speech signal so that background noise can be eliminated. This paper proposes a novel voice activity detector that uses spectro-temporal features extracted from an auditory model of the speech signal. After the scale, rate, and frequency features are extracted from this feature space, a sparse structured principal component analysis algorithm is used to retain the basic components of these features and reduce the dimension of the learning data. The resulting feature vectors are then used to learn the models with a sparse non-negative matrix factorization algorithm. Model learning is carried out so that each feature vector is represented at a suitable sparsity rate over the selected atoms. Voice activity in the input frames is detected by computing the energy of each frame's sparse representation over the composite model. If this energy exceeds a specified threshold, the input frame has a structure similar to the atoms of the learned models, and the frame is judged to contain voice. The proposed detector was compared with baseline methods and classifiers in this field. The results, obtained in the presence of stationary, non-stationary, and periodic noise, show that the proposed method based on model learning with spectro-temporal features can correctly detect silence and speech activity.
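
The detection rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the composite model is the concatenation of a speech dictionary and a noise dictionary, that the energy is measured on the speech-atom coefficients, and that the dictionaries are learned with plain multiplicative updates and an L1 penalty. The function names (`sparse_nmf`, `sparse_code`, `detect_voice`), the parameter values, and the synthetic data are all hypothetical, and the spectro-temporal feature extraction and sparse structured PCA steps are omitted.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, seed=0):
    """Approximate V (features x frames) by W @ H with an L1 penalty on H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + EPS
    H = rng.random((n_atoms, V.shape[1])) + EPS
    for _ in range(n_iter):
        # Multiplicative update of the atoms, then unit-norm scaling so the
        # penalty on H cannot be evaded by inflating W.
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + EPS
        # The sparsity term in the denominator shrinks small coefficients.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return W, H


def sparse_code(W, frames, sparsity=0.1, n_iter=200):
    """Non-negative sparse coefficients of each frame over a fixed dictionary W."""
    H = np.full((W.shape[1], frames.shape[1]), 0.1)
    WtV, WtW = W.T @ frames, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + sparsity + EPS)
    return H


def detect_voice(frames, W_speech, W_noise, threshold, sparsity=0.1):
    """Flag a frame as voiced when its speech-atom coefficient energy exceeds threshold."""
    W = np.hstack([W_speech, W_noise])  # assumed form of the composite model
    H = sparse_code(W, frames, sparsity)
    energy = np.sum(H[: W_speech.shape[1]] ** 2, axis=0)
    return energy > threshold, energy


# Synthetic stand-in for feature vectors: "speech" occupies the first 10
# feature dimensions and "noise" the last 10 (purely illustrative).
rng = np.random.default_rng(1)
B_speech = rng.random((20, 3)); B_speech[10:] = 0
B_noise = rng.random((20, 3)); B_noise[:10] = 0
W_speech, _ = sparse_nmf(B_speech @ rng.random((3, 100)), n_atoms=4)
W_noise, _ = sparse_nmf(B_noise @ rng.random((3, 100)), n_atoms=4)

voiced = B_speech @ (0.5 + rng.random((3, 5))) + B_noise @ rng.random((3, 5))
silent = B_noise @ rng.random((3, 5))
decisions, energy = detect_voice(np.hstack([voiced, silent]),
                                 W_speech, W_noise, threshold=0.5)
print(decisions)
```

In this toy setup the first five frames contain the synthetic speech component and the last five contain noise only. The threshold of 0.5 is arbitrary; in practice it would have to be tuned on held-out data.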

Keywords:

Voice activity detector, Spectro-temporal domain, Sparse structured principal component analysis, Sparse non-negative matrix factorization

Authors

S. Mavaddati

Faculty of Engineering and Technology, University of Mazandaran, Babolsar, Iran
