CIVILICA We Respect the Science
(Specialized publisher of national conference proceedings / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

A Deep Human Action Representation For Retrieval Application

Article title: A Deep Human Action Representation For Retrieval Application
National article ID: JR_MSEEE-2-1_003
Published in: 1401 (Iranian calendar)
Authors:

Mohsen Ramezani - Department of Computer Science, University of Kurdistan, Sanandaj, Iran
Fardin Akhlaghian Tab - Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
Farzin Yaghmaee - Department of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Abstract:
Human action retrieval is a challenging research area with widespread applications in surveillance, search engines, and human-computer interaction. Current methods represent actions by building models from global and local features. These methods do not consider the semantics of actions when creating the model, so their final retrieval results are not satisfactory: each action is not treated as a sequence of sub-actions, and the model is instead built from scattered local or global features. Furthermore, current action retrieval methods avoid incorporating Convolutional Neural Networks (CNNs) in the representation procedure because of a lack of training data, even though CNNs could improve the final representation. In the present paper, we propose a CNN-based human action representation method for retrieval applications. In this method, the video is first segmented into sub-actions, and each action is represented by the sequence of keyframes extracted from these segments. The sequence of keyframes is then passed to a pre-trained CNN to extract deep spatial features of the action. Next, a 1D average pooling is applied to combine the sequence of spatial features and represent the temporal changes with a lower-dimensional vector. Finally, the Dynamic Time Warping technique is used to find the best match between the representation vectors of two videos. Experiments on real video datasets, for both retrieval and recognition applications, show that the created action models outperform other representation methods.
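The pipeline described in the abstract (per-keyframe deep features, 1D temporal average pooling, then Dynamic Time Warping to match two videos) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature matrices stand in for pre-trained CNN outputs, and the pooling window size is an assumed parameter.

```python
import numpy as np

def temporal_avg_pool(features, window=2):
    """1D average pooling over the temporal axis.

    features: (T, D) array, one D-dimensional CNN feature per keyframe.
    Returns a (T // window, D) array of window-averaged features,
    truncating any trailing frames that do not fill a window.
    """
    T = (len(features) // window) * window
    return features[:T].reshape(-1, window, features.shape[1]).mean(axis=1)

def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two feature
    sequences a (n, D) and b (m, D), using Euclidean frame costs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match moves.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

# Two hypothetical videos with different numbers of keyframes:
video_a = np.random.rand(8, 512)   # 8 keyframes, 512-d CNN features
video_b = np.random.rand(6, 512)
rep_a = temporal_avg_pool(video_a, window=2)
rep_b = temporal_avg_pool(video_b, window=2)
score = dtw_distance(rep_a, rep_b)  # lower score = closer match
```

In a retrieval setting, the query video's pooled representation would be compared against every database video with `dtw_distance`, and results ranked by ascending score; DTW tolerates the differing sequence lengths that pooling and variable video durations produce.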

Keywords:
action, deep features, key-frame, sub-action, CNN

Article page and full-text download: https://civilica.com/doc/2078862/