M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3

Search Results

Now showing 1 - 6 of 6
  • Item (Open Access)
    Human Activity Recognition using Two-stream Attention Based Bi-LSTM Networks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Bhandarkar, Vaidehi; Khare, Manish
    Human Activity Recognition (HAR) is a challenging task that aims to identify the actions of humans from various data sources. Recently, deep learning methods have been applied to HAR using RGB (Red, Green and Blue) videos, which capture the spatial and temporal information of human actions. However, most of these methods rely on hand-crafted features or pre-trained models that may not be optimal for HAR. In this thesis, we propose a novel method for HAR using Two-stream Attention Based Bi-LSTM Networks (TAB-BiLSTM) in RGB videos. Our method consists of two components: a spatial stream and a temporal stream. The spatial stream uses a convolutional neural network (CNN) to extract features from RGB frames, while the temporal stream uses an optical flow network to capture the motion information. Both streams are fed into an attention-based bidirectional long short-term memory (Bi-LSTM) network, which learns the long-term dependencies and focuses on the most relevant features for HAR. The attention mechanism is implemented by multiplying the outputs of the spatial and temporal streams, applying a softmax activation, and then multiplying the result with the temporal stream output again. This way, the attention mechanism can weigh the importance of each feature based on both streams. We evaluate our method on four benchmark datasets: UCF11, UCF50, UCF101, and NTU RGB. This method achieves state-of-the-art results on all datasets, with accuracies of 98.3%, 97.1%, 92.1%, and 89.5%, respectively, demonstrating its effectiveness and robustness for HAR in RGB videos.
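    A minimal illustrative sketch of the attention fusion described above, written in PyTorch. It is not the thesis code: the feature dimensions, layer sizes, and the use of the last Bi-LSTM step for classification are assumptions made only to show how the multiply-softmax-multiply weighting could be wired up.

        import torch
        import torch.nn as nn

        class TwoStreamAttentionBiLSTM(nn.Module):
            def __init__(self, feat_dim=512, hidden=256, num_classes=101):
                super().__init__()
                self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
                self.fc = nn.Linear(2 * hidden, num_classes)

            def forward(self, spatial_feats, temporal_feats):
                # Both inputs: (batch, time, feat_dim) per-frame features from the
                # CNN (spatial) and optical-flow (temporal) streams.
                attn = torch.softmax(spatial_feats * temporal_feats, dim=-1)  # joint attention weights
                fused = attn * temporal_feats                                 # re-weighted temporal features
                out, _ = self.bilstm(fused)                                   # long-term dependencies
                return self.fc(out[:, -1])                                    # class scores from the last step

        # Example: 8 clips, 16 frames each, 512-dimensional features per frame.
        model = TwoStreamAttentionBiLSTM()
        scores = model(torch.randn(8, 16, 512), torch.randn(8, 16, 512))
        print(scores.shape)  # torch.Size([8, 101])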
  • Item (Open Access)
    Quantile Regression and Deep Learning Models for Air Quality Analysis and Prediction in Delhi City
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Jha, Gaurav; Anand, Pritam
    Quantile regression models have gained popularity among researchers in recent years. The mean regression model estimates the mean of y_i given x, but in some applications estimating only the mean of y_i given x is not very informative, and the quantiles of y_i given x are of greater interest. This thesis presents a data-driven analysis and prediction of air quality in the Delhi metro city using quantile regression and deep learning models. The main objectives are to investigate the monthly trend and correlation of PM2.5, PM10, NO2 and SO2 concentrations and temperature; to compare different regression models such as linear, quadratic, kernel, and quantile regression for estimating the PM2.5, PM10, NO2 and SO2 concentrations from the temperature variables; and to compare different deep learning models such as gated recurrent units (GRUs), vanilla (simple) long short-term memory (LSTM) networks, convolutional neural network - long short-term memory (CNN-LSTM) networks, and support vector regression (SVR) for time series forecasting of pollution levels. The data used in this study is the Delhi air quality data from 2015 to 2020, which contains various pollutants and environmental factors. The results show that quantile regression is more flexible, robust, and informative than the other models, and can capture the variability and diversity of the PM2.5, PM10, NO2 and SO2 distributions over distinct quantiles or percentiles. The results also show that deep learning models are effective and powerful tools for time series forecasting on pollution data; among them, the SVR model is superior to the other models. The study aims to contribute to the scientific knowledge and practical solutions for air quality prediction and analysis.
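    A minimal sketch, using statsmodels on synthetic data (not the Delhi dataset), of fitting conditional quantiles of a pollutant concentration given temperature alongside an ordinary mean regression; the variable names and the toy data-generating process are assumptions for illustration only.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        temperature = rng.uniform(5, 45, size=500)                    # toy predictor
        pm25 = 120 - 1.5 * temperature + rng.gamma(2.0, 10.0, 500)    # skewed, heteroscedastic toy target

        X = sm.add_constant(temperature)
        ols_fit = sm.OLS(pm25, X).fit()                               # conditional mean only
        print(f"OLS mean:  intercept={ols_fit.params[0]:.1f}, slope={ols_fit.params[1]:.2f}")

        for q in (0.1, 0.5, 0.9):                                     # lower, median, and upper quantiles
            fit = sm.QuantReg(pm25, X).fit(q=q)
            print(f"quantile {q}: intercept={fit.params[0]:.1f}, slope={fit.params[1]:.2f}")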
  • Item (Open Access)
    Location aware tumor segmentation on MRI images
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Jadiya, Kevin; Gohel, Bakul
    In our research, we introduce an innovative approach to the segmentation of brain tumors, utilizing a convolutional neural network (CNN) architecture that incorporates localization awareness. This approach represents a significant advancement in tumor segmentation, as it effectively addresses two critical challenges encountered in this field: limited resources and the requirement for precise localization. To overcome these challenges, our methodology leverages 2D slices during training and integrates registration operations for MRI images during application. The proposed method is evaluated extensively on the BRATS-2018 dataset and its augmented dataset version, encompassing distinct variations of CNN-based models. Furthermore, it exhibits computational efficiency during inference, enabling the segmentation of the entire brain in a matter of seconds. The outcomes of our research position our deep learning model as a promising tool with immense potential for both research purposes and clinical applications, offering good segmentation outcomes.
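    A minimal PyTorch sketch of one generic way to give a 2D slice segmentation CNN localization awareness: normalized (x, y) coordinate maps are concatenated to the input channels, so the network can condition on where in a registered volume a pattern occurs. The layer sizes, channel counts, and the coordinate-channel idea itself are illustrative assumptions, not the architecture proposed in the thesis.

        import torch
        import torch.nn as nn

        def add_coord_channels(x):
            # x: (batch, channels, H, W) MRI slices -> (batch, channels + 2, H, W)
            b, _, h, w = x.shape
            ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
            return torch.cat([x, ys, xs], dim=1)

        class LocationAwareSegNet(nn.Module):
            def __init__(self, in_ch=4, num_classes=4):          # e.g. 4 MRI modalities, 4 tissue labels
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(in_ch + 2, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, num_classes, 1),                # per-pixel class scores
                )

            def forward(self, slices):
                return self.net(add_coord_channels(slices))

        model = LocationAwareSegNet()
        logits = model(torch.randn(2, 4, 128, 128))
        print(logits.shape)  # torch.Size([2, 4, 128, 128])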
  • Item (Open Access)
    Analytical study of color spaces for object recognition in convolutional neural networks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2019) Oza, Urvi; Kumar, Pankaj
    In this work we present an analytical study of the classical problem of image recognition/classification using different color spaces and deep convolutional neural networks (CNNs). In the current decade, deep CNN architectures for solving the image recognition problem have become very popular because of their high speed and accuracy in detection results. Usually, most such deep learning architectures or networks are applied to image datasets where the images are in the RGB color space. In this work, we have analysed the performance of three popular CNNs by providing input images in different color spaces. We describe the design of our novel experiment and present results on whether such deep learning networks (CNNs) for the object recognition task are invariant to color spaces or not. Our experimental results clearly show that different color spaces give different performance results for image classification. We have compared the results in terms of test accuracy, test loss, and validation loss. The three CNN architectures covered in our study are VGGNet, ResNet, and GoogleNet, and the five color spaces used for analysis are RGB, rgb (normalized RGB), YCbCr, HSV, and CIE-Lab. Our objective was to find how CNNs perform in different color spaces and whether a color space can be clearly identified in which CNN performance on the object recognition task is best. Our study shows that CNNs are "variant" to color spaces. Normalized RGB (rgb) and HSV very distinctly did not perform as well as RGB, YCbCr, and CIE-Lab. Of the latter three well-performing color spaces, none could be identified as a clear winner. The three different deep CNNs performed differently on the three different color spaces.
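    A minimal sketch, using OpenCV and NumPy (not necessarily the tooling of this study), of preparing the same batch of images in each of the compared color spaces so that an identical CNN can be trained once per color space; the scaling choices are illustrative assumptions.

        import cv2
        import numpy as np

        def to_color_space(rgb_batch, space):
            # rgb_batch: (N, H, W, 3) uint8 images in RGB channel order.
            converted = []
            for img in rgb_batch:
                if space == "RGB":
                    converted.append(img.astype(np.float32) / 255.0)
                elif space == "rgb":                         # normalized RGB (chromaticity)
                    f = img.astype(np.float32)
                    converted.append(f / (f.sum(axis=-1, keepdims=True) + 1e-6))
                elif space == "HSV":                         # note: OpenCV stores hue in [0, 179] for uint8
                    converted.append(cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32) / 255.0)
                elif space == "YCbCr":
                    converted.append(cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb).astype(np.float32) / 255.0)
                elif space == "CIE-Lab":
                    converted.append(cv2.cvtColor(img, cv2.COLOR_RGB2Lab).astype(np.float32) / 255.0)
            return np.stack(converted)

        batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
        for space in ("RGB", "rgb", "HSV", "YCbCr", "CIE-Lab"):
            print(space, to_color_space(batch, space).shape)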
  • Item (Open Access)
    Environmental Sound Classification (ESC) using Handcrafted and Learned Features
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2017) Agrawal, Dharmeshkumar Maheshchandra; Patil, Hemant A.
    "Environmental Sound Classification (ESC) is an important research field due to its application in various field such as hearing aids, road surveillance system for security and safety purpose, etc. ESC task was earlier done using Coefficients (MFCC) feature set and Gaussian Mixture Model (GMM) classifier. Recently, deep-learning based approaches are used for ESC task such as Convolutional Neural Network (CNN) based classification which built an end-to-end system for ESC on CNN framework. The ESC task is a quite challenging problem as of environmental sounds that contains the various categories of sounds are difficult to classify. In this thesis, we proposed two new and different feature sets for ESC task, namely, handcrafted feature set (i.e., signal processing-based approach), and data-driven feature set (i.e., machine learning-based approach). In handcrafted feature set, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for ESC task. In this thesis, we have used two classifiers, namely, GMM using cepstral features, and CNN using spectral features. We performed experiments on two datasets, namely, ESC-50, and UrbanSound8K. We compared TEO-based coefficients with MFCC and Gammatone cepstral coefficients (GTCC), in which GTCC used mean square energy. The result shows score-level fusion of proposed TEO-based Gammatone feature-set and MFCC gave better performance than MFCC on both datasets by using GMM and CNN classifiers. This shows that proposed TEO-based Gammatone features contain complementary information, which is helpful in ESC task. In data-driven feature set, we use Convolutional Restricted Boltzmann Machine (ConvRBM) to learn filterbank from the raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model the audio signals of arbitrary lengths. ConvRBM is trained using annealed dropout technique and parameters are optimized using Adam optimization. The subband filters of ConvRBM learned from the ESC-50 database resemble Fourier basis in the mid-frequency range, while some of the low frequency subband filters resemble Gammatone basis. We have used our proposed model as a front-end for the ESC task with supervised CNN as a back-end."
  • Item (Open Access)
    Object Recognition using Self Learned Features
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2016) Parikh, Ketul D.; Joshi, Manjunath V.
    A great deal of research has been centered around developing algorithms for learning features from unlabelled information. Much progress has been made on benchmark datasets by utilizing progressively complex unsupervised learning algorithms and deep models. However, the time required to train such deep networks is a major drawback. This thesis presents a generalized trainable framework for object detection in static images. In this work, we have used a Convolutional Neural Network (CNN) for training and obtained good classification results in terms of accuracy. The main idea is to learn features from the data itself (in an unsupervised way) and then apply a classifier (in a supervised way) to classify. We have used a CNN to extract useful hierarchical features using natural images [39] as training images. The learned convolutional kernels (weights) are applied onto the MNIST and CIFAR-10 datasets to extract their features. We then use a CNN network for classification. Despite the simplicity of our network, we achieve accuracy as good as previously published results on the MNIST and CIFAR-10 datasets.
    Keywords: Object recognition, deep learning, Deep Neural Network (DNN), Convolutional Neural Network (CNN).
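    A minimal PyTorch sketch of the general recipe described above: convolutional kernels obtained separately (here simply random tensors standing in for the unsupervised learning stage) are frozen and used as a feature extractor, and only a supervised classifier head is trained on top. The layer sizes and toy data are illustrative assumptions, not the thesis pipeline.

        import torch
        import torch.nn as nn

        feature_extractor = nn.Sequential(        # kernels assumed to come from unsupervised learning
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        for p in feature_extractor.parameters():
            p.requires_grad = False               # keep the self-learned kernels fixed

        classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 4 * 4, 10))  # supervised head, 10 classes
        optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()

        # One toy training step on random 28x28 "MNIST-like" images.
        images, labels = torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))
        logits = classifier(feature_extractor(images))
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        print(loss.item())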