Facial expression recognition: feature-based approaches to deep learning techniques
Facial expression recognition (FER) is a pattern recognition problem that has attracted the attention of computer vision researchers for the last three decades. The problem remains open because of challenges such as blurring, illumination variation, pose variation, and face images captured in unconstrained environments. Early work combined hand-crafted features with classical classifiers, and a wide range of both features and classifiers has been studied. Hand-crafted features associated with changes in expression are hard to extract because of individual differences and variations in emotional states. With the introduction of deep neural networks (DNNs) and convolutional neural networks (CNNs), FER techniques have changed, both in efficiency and in their ability to handle the challenges mentioned above. The modular approach presented here mimics the human ability to identify a person from a limited part of the face: facial parts such as the eyes, nose, lips, and forehead contribute most to the expression recognition task. This thesis covers FER approaches ranging from classical feature-based methods to deep learning techniques. First, we propose a two-dimensional Taylor expansion for facial feature extraction that also handles local illumination. Most existing procedures deal only with global illumination variations and therefore perform poorly under the natural, usually uncontrolled, illumination variations found in the real world. To address illumination variation, we introduce the Laplace-Logarithmic (LL) domain to further improve performance. We apply the proposed 2D Taylor expansion in the facial feature extraction phase and formulate the 2DTFP method.
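The 2DTFP descriptor itself is not specified here, so the following is only a minimal Python sketch, assuming NumPy, of the two ingredients the paragraph names: a logarithmic transform (so that a multiplicative illumination gain becomes an additive offset) followed by the first- and second-order coefficients of the 2D Taylor expansion, f(x+dx, y+dy) ≈ f + f_x dx + f_y dy + ½(f_xx dx² + 2 f_xy dx dy + f_yy dy²), plus the Laplacian f_xx + f_yy. Reading "Laplace-Logarithmic domain" as "Laplacian of the log image" is an assumption, and the function names and the grid pooling in taylor_feature_vector are hypothetical.

```python
import numpy as np

def log_image(image, eps=1e-6):
    # Log transform: a multiplicative gain g*I becomes log g + log I,
    # an additive constant that every derivative below removes.
    return np.log(np.maximum(np.asarray(image, dtype=float), eps))

def taylor_terms(image):
    """Per-pixel first- and second-order 2D Taylor coefficients of the log
    image, estimated with finite differences (np.gradient)."""
    f = log_image(image)
    fy, fx = np.gradient(f)        # axis 0 = rows (y), axis 1 = columns (x)
    fxy, fxx = np.gradient(fx)     # d/dy and d/dx of f_x
    fyy, _ = np.gradient(fy)       # d/dy of f_y
    laplacian = fxx + fyy          # Laplace operator applied in the log domain
    return np.stack([fx, fy, fxx, fxy, fyy, laplacian], axis=-1)

def taylor_feature_vector(image, grid=(4, 4)):
    """Hypothetical pooling: mean |coefficient| per grid cell, concatenated."""
    terms = np.abs(taylor_terms(image))
    feats = []
    for band in np.array_split(terms, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            feats.append(cell.mean(axis=(0, 1)))   # 6 values per cell
    return np.concatenate(feats)
```

Because every coefficient is a derivative of the log image, scaling the input by a constant gain leaves the feature vector unchanged up to rounding; spatially varying illumination is only partly removed.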
In our second FER approach, we propose a histogram of second-order gradients (HSOG) for feature extraction. Most popular local image descriptors in the literature, such as SIFT, HOG, DAISY, LBP, and GLOH, use only first-order gradient information, which relates to the slope and elasticity (e.g., length, area) of a surface, and therefore only partly characterize the geometric properties of an image. We exploit a local image descriptor that extracts the histogram of second-order gradients (HSOG), which captures local curvatures from differential geometry, i.e., cliffs, ridges, summits, valleys, basins, and so on. The shape index is computed from these curvatures, and its different values correspond to different local shapes, which in turn correspond to different facial expressions. Much prior work in this field extracts local texture features and uses them for classification. Because this information is highly local, the feature vector for the full image has a very high dimension, which poses computational challenges for real-time expression recognition. Dimensionality reduction methods have recently been used successfully in image recognition tasks. Here we propose two dimensionality reduction methods: E-PCA (Euler Principal Component Analysis) and CS-ONPP (Orthogonal Neighborhood Preserving Projection with a class-similarity-based neighborhood). They shorten the feature vector by a large margin while maintaining the same recognition accuracy. Classical FER methods perform well in certain well-controlled settings. The fundamental issue with approaches based on hand-crafted features is that they require domain expertise and do not generalize well to complex datasets. Deep learning is fast becoming a go-to tool for many artificial intelligence problems because it outperforms other approaches, and even humans, on many of them.
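To illustrate how curvature maps to a shape index, here is a minimal NumPy sketch; it is not the HSOG descriptor as implemented in the thesis. It assumes the principal curvatures can be approximated by the eigenvalues k1 ≥ k2 of the image Hessian (a small-slope approximation) and uses a Koenderink-style shape index S = (2/π) arctan((k1 + k2)/(k1 - k2)), which lies in [-1, 1]; sign conventions vary across papers. With the convention below, S is near +1 at an intensity minimum (cup or basin), near -1 at an intensity maximum (cap or summit), near 0 at a saddle, and ridges and valleys fall in between. The per-cell histogram in shape_index_histogram is a hypothetical stand-in for the HSOG pooling step.

```python
import numpy as np

def shape_index(image, eps=1e-12):
    """Shape index in (-1, 1) from the eigenvalues of the image Hessian,
    used here as an approximation of the principal curvatures."""
    f = np.asarray(image, dtype=float)
    fy, fx = np.gradient(f)          # first-order gradients (rows = y)
    fxy, fxx = np.gradient(fx)       # second-order terms of f_x
    fyy, _ = np.gradient(fy)         # second-order term of f_y
    # Eigenvalues of the symmetric 2x2 Hessian [[fxx, fxy], [fxy, fyy]].
    mean = (fxx + fyy) / 2.0
    root = np.sqrt(((fxx - fyy) / 2.0) ** 2 + fxy ** 2)
    k1, k2 = mean + root, mean - root      # k1 >= k2
    # arctan2 with a positive denominator keeps the umbilic case k1 == k2
    # well defined (S tends to +/-1 instead of dividing by zero).
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2 + eps)

def shape_index_histogram(image, bins=8, grid=(2, 2)):
    """Hypothetical HSOG-like pooling: normalized shape-index histogram
    per grid cell, concatenated into one vector of length bins * cells."""
    s = shape_index(image)
    feats = []
    for band in np.array_split(s, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            h, _ = np.histogram(cell, bins=bins, range=(-1.0, 1.0))
            feats.append(h / max(cell.size, 1))
    return np.concatenate(feats)
```

On a synthetic bowl x² + y² the center pixel gets S close to +1, on -(x² + y²) close to -1, and on the saddle x² - y² exactly 0, which is the kind of shape distinction the descriptor relies on.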
A DNN has millions of parameters. Finding a good set of parameters requires a lot of training data, and even with enough data, training generally takes many iterations and is expensive in computing resources. Fine-tuning addresses this by adjusting the parameters of an already-trained network so that it adapts to the new task at hand. Here we propose two deep learning-based methods. The first, DNNFG (DNN based on Fourier transform followed by Gabor filtering), uses the pre-trained VGG16 model with fine-tuning to extract facial features. VGG16 was chosen for its effective performance in visual recognition and its fast convergence. It has about 138 million parameters and consists of 13 convolutional layers followed by 3 fully connected (FC) layers. Since VGG was not designed for FER, we modified the architecture to suit our requirements. The second method is 2DNN (Double-channel based Deep Neural Network), which uses the VGGFace architecture. VGGFace is trained on 2.6M face images of 2.6k different people; its architecture is the same as VGG16, and only the training images differ. To adapt VGGFace to the FER problem, we fine-tune it, and the resulting network readily exploits both local and global information about the expressions. The DNN-based methods improve recognition accuracy compared with classical approaches. FER experiments are performed on four benchmark databases: JAFFE, VIDEO, CK+, and OULU-CASIA. In summary, this thesis addresses classical facial expression recognition approaches and their shortcomings, then moves to deep learning-based approaches to handle those shortcomings; the deep learning methods perform well compared with hand-crafted methods. The thesis also shows experimentally that a modular approach performs better than a holistic approach.
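The exact DNNFG preprocessing is not detailed here. One way to combine a Fourier transform with Gabor filtering is to apply a Gabor filter bank as a pointwise product in the frequency domain, since convolution in space becomes multiplication after the FFT. The minimal NumPy sketch below assumes a complex Gabor filter whose frequency response is a Gaussian centred on the carrier frequency; the function names and parameter values (freqs, n_orient, sigma_f) are illustrative assumptions, not the thesis's settings. The resulting magnitude maps could then be fed to a fine-tuned network such as VGG16, which is not shown.

```python
import numpy as np

def gabor_frequency_response(shape, freq, theta, sigma_f):
    """Gaussian bump centred on the carrier frequency (cycles/pixel) at
    orientation theta, sampled on the np.fft frequency grid."""
    rows, cols = shape
    v = np.fft.fftfreq(rows)[:, None]   # vertical frequency (cycles/pixel)
    u = np.fft.fftfreq(cols)[None, :]   # horizontal frequency
    u0, v0 = freq * np.cos(theta), freq * np.sin(theta)
    return np.exp(-((u - u0) ** 2 + (v - v0) ** 2) / (2.0 * sigma_f ** 2))

def gabor_via_fft(image, freqs=(0.1, 0.2), n_orient=4, sigma_f=0.03):
    """Filter the image in the Fourier domain with a small Gabor bank and
    return magnitude maps of shape (len(freqs) * n_orient, H, W)."""
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(img)          # forward FFT, computed once
    maps = []
    for f in freqs:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            g = gabor_frequency_response(img.shape, f, theta, sigma_f)
            # Multiply in frequency = (circular) convolution in space.
            maps.append(np.abs(np.fft.ifft2(spectrum * g)))
    return np.stack(maps)
```

For a grating with vertical stripes, the orientation-0 channel at the matching frequency responds strongly while the orthogonal channel is nearly zero. Because np.fft treats the image as periodic, responses near the borders wrap around; padding the face crop before filtering avoids this.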
- PhD Theses