Human Activity Recognition using Two-stream Attention Based Bi-LSTM Networks
Abstract
Human Activity Recognition (HAR) is a challenging task that aims to identify the actions of humans from various data sources. Recently, deep learning methods have been applied to HAR using RGB (Red, Green and Blue) videos, which capture the spatial and temporal information of human actions. However, most of these methods rely on hand-crafted features or pre-trained models that may not be optimal for HAR. In this thesis, we propose a novel method for HAR using Two-stream Attention Based Bi-LSTM Networks (TAB-BiLSTM) in RGB videos. Our method consists of two components: a spatial stream and a temporal stream. The spatial stream uses a convolutional neural network (CNN) to extract features from RGB frames, while the temporal stream uses an optical flow network to capture the motion information. Both streams are fed into an attention-based bidirectional long short-term memory (Bi-LSTM) network, which learns long-term dependencies and focuses on the most relevant features for HAR. The attention mechanism is implemented by multiplying the outputs of the spatial and temporal streams, applying a softmax activation, and then multiplying the result with the temporal stream output again. In this way, the attention mechanism can weigh the importance of each feature based on both streams. We evaluate our method on four benchmark datasets: UCF11, UCF50, UCF101, and NTU RGB+D. The method achieves state-of-the-art results on all four datasets, with accuracies of 98.3%, 97.1%, 92.1%, and 89.5%, respectively, demonstrating its effectiveness and robustness for HAR in RGB videos.
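As a rough illustration of the attention fusion described above (softmax of the element-wise product of the two stream outputs, re-applied to the temporal stream), the following is a minimal PyTorch sketch. The feature dimension, hidden size, class count, softmax axis, and temporal pooling are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Attention as described in the abstract:
    softmax(spatial * temporal) * temporal."""

    def forward(self, spatial, temporal):
        # spatial, temporal: (batch, time, features)
        scores = spatial * temporal          # element-wise cross-stream interaction
        weights = F.softmax(scores, dim=1)   # normalize over time (assumed axis)
        return weights * temporal            # re-weight the temporal features


class TABBiLSTM(nn.Module):
    """Two-stream attention-based Bi-LSTM sketch (illustrative sizes)."""

    def __init__(self, feat_dim=2048, hidden=512, num_classes=101):
        super().__init__()
        # One Bi-LSTM per stream; inputs are assumed per-frame CNN features
        # (RGB stream) and optical-flow network features (motion stream).
        self.spatial_lstm = nn.LSTM(feat_dim, hidden,
                                    batch_first=True, bidirectional=True)
        self.temporal_lstm = nn.LSTM(feat_dim, hidden,
                                     batch_first=True, bidirectional=True)
        self.attention = AttentionFusion()
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, rgb_feats, flow_feats):
        s, _ = self.spatial_lstm(rgb_feats)    # (B, T, 2*hidden)
        t, _ = self.temporal_lstm(flow_feats)  # (B, T, 2*hidden)
        fused = self.attention(s, t)           # (B, T, 2*hidden)
        # Average over time before classification (an assumed pooling choice).
        return self.classifier(fused.mean(dim=1))


# Usage example: a batch of 8 clips, 16 frames each, 2048-d per-frame features.
model = TABBiLSTM()
rgb = torch.randn(8, 16, 2048)
flow = torch.randn(8, 16, 2048)
logits = model(rgb, flow)  # shape: (8, 101)
```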