dc.description.abstract | Action recognition is a central problem in computer vision which is also known as action recognition or object detection. Action is any meaningful movement of the human and it is used to convey information or to interact naturally without any mechanical devices. It is of utmost importance in designing an intelligent and efficient human–computer interface. The applications of action recognition are manifold, ranging from sign language through medical rehabilitation to virtual reality. Human action recognition is motivated by some of the applications such as video retrieval, Human robot interaction, to interact with deaf and dumb people etc. In any Action Recognition System, a video stream can be captured by using a fixed camera, which may be mounted on the computer or somewhere else. Then some preprocessing steps are done for removing the noise caused because of illumination effects, blurring, false contour etc. Background subtraction is done to remove the static or slowly varying background. In this thesis, multiple background subtraction algorithms are tested and then one of them selected for action recognition system. Background subtraction is also known as foreground/background segmentation or background segmentation or foreground extraction. These terms are frequently used interchangeably in this thesis. The selection of background segmentation algorithm is done on the basis of result of these algorithms on the action database. Good background segmentation result provides a more robust basis for object class recognition. The following four methods for extracting the foreground which are tested: (1) Frame Difference, (2) Background Subtraction, (3) Adaptive Gaussian Mixture Model (Adaptive GMM) [25], and (4) Improved Adaptive Gaussian Mixture Model (Improved Adaptive GMM) [26] in which the last one gives the best result. Now the action region can be extracted in the original video sequences with the help of extracted foreground object. The next step is the feature extraction which deals with the extraction of the important feature (like corner points, optical flow, shape, motion vectors etc.) from the image frame which can be used for tracking in the video frame sequences. Feature reduction is an optional step which basically reduces the dimension of the feature vector. In order to recognize actions, any learning and classification algorithm can be employed. The System is trained by using a training dataset. Then, a new video can be classified according to the action occurring in the video. Following three features are applied for the action recognition task: (1) distance between centroid and corner point, (2) optical flow motion estimation [28, 29], (3) discrete Fourier transform (DFT) of the image block. Among these the proposed DFT feature plays very important role in uniquely identifying any specific action from the database. The proposed novel action recognition model uses discrete Fourier transform (DFT) of the small image block.
<p/>For the experimentation, MuHAVi data [33] and DA-IICT data are used which includes various kinds of actions of various actors. Following two supervised recognition techniques are used: K-nearest neighbor (KNN) and the classifier using Mahalanobis metric. KNN is parameterized classification techniques where K parameter is to be optimized. Mahalanobis Classifier is non-parameterized classification technique, so no need to worry about parameter optimization. To check the accuracy of the proposed algorithm, Sensitivity and False alarm rate test is performed. The results of this tests show that the proposed algorithm proves to be quite accurate in action recognition in video. And to compare result of the recognition system confusion matrices are created and then compared with other recognition techniques. All the experiments are performed in MATLAB®. | |