M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
Search Results (4)
Item Open Access Semantic Segmentation Based Object Detection for Autonomous Driving (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Prajapati, Harsh; Maiti, Tapas Kumar

This research focuses on solving the autonomous driving problem, which is necessary to meet the increasing demand for autonomous systems in today's world. The key aspect in addressing this challenge is the real-time identification and recognition of objects within the driving environment. To accomplish this, we employ the semantic segmentation technique, integrating computer vision, machine learning, deep learning, the PyTorch framework, image processing, and the Robot Operating System (ROS). Our approach involves creating an experimental setup using an edge device, specifically a Raspberry Pi, in conjunction with the ROS framework. By deploying a deep learning model on the edge device, we aim to build a robust and efficient autonomous system that can accurately identify and recognize objects in real time.

Item Open Access Position Estimation of Intelligent Artificial Systems Using 3D Point Cloud (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Patel, Vraj; Maiti, Tapas Kumar

Point cloud data represents the three-dimensional reality captured by sensors such as LiDAR scanners, depth cameras, and stereo cameras. The capacity of point clouds to provide rich geometric information about the surroundings makes them essential in many applications: robotics, autonomous cars, augmented reality, virtual reality, and 3D reconstruction all use point clouds, which enable object detection, localization, mapping, scene comprehension, and immersive visualization. Lightweight LiDAR SLAM therefore has significant implications for fields including robotics, autonomous navigation, and augmented reality.
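As the abstract notes, raw point clouds are voluminous, and keeping their processing lightweight is a core concern. A standard first step is voxel-grid downsampling, sketched below in plain Python; the voxel size and sample cloud are illustrative assumptions, not values from the thesis.

```python
# Hypothetical sketch: voxel-grid downsampling of a 3D point cloud.
# Points falling into the same voxel are averaged into one representative
# point, shrinking the cloud while preserving coarse geometry.

def voxel_downsample(points, voxel_size):
    """points: iterable of (x, y, z) tuples; returns one mean point per voxel."""
    voxels = {}
    for p in points:
        key = tuple(int(c // voxel_size) for c in p)   # integer voxel index
        voxels.setdefault(key, []).append(p)
    # Average the points collected in each occupied voxel.
    return [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in voxels.values()]

cloud = [(0.10, 0.10, 0.00), (0.12, 0.11, 0.02), (5.0, 5.0, 5.0)]
print(voxel_downsample(cloud, 1.0))   # two points survive: one per occupied voxel
```

Libraries such as Open3D provide the same operation, but the idea is just spatial hashing followed by averaging.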
Developing compact and efficient LiDAR SLAM systems makes it possible to unlock the potential of lightweight platforms, enabling their deployment in a wide range of applications that require real-time mapping and localization capabilities while ensuring practicality, portability, and cost-effectiveness.

Working with point clouds, on the other hand, presents substantial complications. Primary issues include managing a vast volume of data, dealing with noise and outliers, handling occlusions and missing data, and conducting efficient processing and analysis. Furthermore, point clouds frequently require complicated registration, segmentation, feature extraction, and interpretation methods, which are computationally costly. Addressing these issues is critical to realizing the full potential of point cloud data in a variety of real-world applications.

SLAM is a key technique in robotics and computer vision that addresses the challenge of estimating a robot's pose while constructing a map of its environment. It finds applications in driverless cars, drones, and augmented reality, enabling autonomous navigation without external infrastructure or GPS. Challenges include sensor noise, drift, and uncertainty, requiring robust sensor calibration, motion modeling, and data association. Real-time speed, computing constraints, and memory limitations are important considerations. Advanced techniques such as feature extraction, point cloud registration, loop closure detection, and Graph-SLAM optimization algorithms are used, and sensor fusion, map representation, and data association techniques are vital for reliable SLAM performance.

The aim is to create a compact and lightweight LiDAR-based SLAM system that can be easily integrated into various platforms without compromising the accuracy and reliability of the SLAM algorithms.
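The point cloud registration step mentioned above can be illustrated with the closed-form least-squares rigid alignment of two 2D scans with known correspondences; this is one inner step of ICP-style scan matching, which in full would re-estimate correspondences and iterate. The formulation is standard; it is a sketch, not the thesis implementation.

```python
import math

# Hypothetical sketch of 2D rigid scan alignment (the core of scan matching):
# find (theta, tx, ty) so that rotating src by theta and translating by
# (tx, ty) best matches dst in the least-squares sense, given one-to-one
# point correspondences.

def align_2d(src, dst):
    n = len(src)
    csx = sum(p[0] for p in src) / n           # source centroid
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n           # destination centroid
    cdy = sum(q[1] for q in dst) / n
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py = px - csx, py - csy            # center both point sets
        qx, qy = qx - cdx, qy - cdy
        s_cross += px * qy - py * qx           # cross-term accumulator
        s_dot += px * qx + py * qy             # dot-term accumulator
    theta = math.atan2(s_cross, s_dot)         # optimal rotation angle
    tx = cdx - (csx * math.cos(theta) - csy * math.sin(theta))
    ty = cdy - (csx * math.sin(theta) + csy * math.cos(theta))
    return theta, tx, ty
```

Applying a known rotation and translation to a scan and feeding both copies to `align_2d` recovers the transform exactly; chaining such estimates between successive scans is the basic building block of LiDAR odometry.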
Hence, we implemented a lightweight SLAM algorithm on our dataset under various background situations, with a few modifications to the existing SLAM algorithm to improve the results. We performed SLAM using a LiDAR sensor alone, without an IMU or GPS sensor. The

Item Open Access Common Object Segmentation in Dynamic Image Collection using Attention Mechanism (Dhirubhai Ambani Institute of Information and Communication Technology, 2022) Baid, Sana; Hati, Avik

Semantic segmentation of image groups is a crucial task in computer vision that aims to identify shared objects in multiple images. This work presents a deep neural network framework that exploits congruity between images, thereby co-segmenting common objects. The proposed network is an encoder-decoder network in which the encoder extracts high-level semantic feature descriptors and the decoder generates segmentation masks. Co-segmentation between the images is boosted by an attention mechanism that leverages semantic similarity between feature descriptors. This attention mechanism is responsible for understanding the correspondence between the features, thereby determining the shared objects. The resultant masks localize the shared foreground objects while suppressing everything else as background. We have explored multiple attention mechanisms in a two-image input setup and have extended the best-performing model to a dynamic image input setup. The term "dynamic image" connotes that a varying number of images can be input to the model simultaneously, with the result being the segmentation of the common object from all of the input images. The model is trained end to end on an image group dataset generated from the PASCAL VOC 2012 [7] dataset. Experiments were conducted on other benchmark datasets as well, and the results indicate the superiority of our model.
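The similarity-driven attention idea described above can be sketched with plain cosine matching between per-location feature descriptors; this is an illustration of the principle, not the thesis architecture, and the toy feature vectors are assumptions.

```python
import math

# Hypothetical sketch: each spatial location of image A is scored by its
# best cosine match among the locations of image B, so locations that
# depict the shared object score high and everything else scores low.

def cross_attention_scores(feat_a, feat_b):
    """feat_a, feat_b: lists of feature vectors, one per spatial location.
    Returns, for each location in A, its best cosine similarity to B."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    a = [normalize(v) for v in feat_a]
    b = [normalize(v) for v in feat_b]
    return [max(sum(x * y for x, y in zip(va, vb)) for vb in b) for va in a]
```

A feature direction present in both images yields a score near 1, while a feature unique to one image scores low; in a co-segmentation network, a decoder turns exactly this kind of score map into a foreground mask.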
Moreover, an important advantage of the proposed model is that it runs in linear time, as opposed to the quadratic time complexity observed in most related works.

Item Open Access Comparative Study: Neural Networks on MCUs at the Edge (2021) Anand, Harshita; Bhatt, Amit

Computer vision has evolved enormously over the years: processors and cameras have shrunk while growing in computational power, and they have become affordable, making it feasible to integrate vision onto embedded systems. It has several critical applications that require high accuracy and fast real-time response in order to achieve a good user experience. Neural networks (NNs) are an attractive choice for embedded vision architectures owing to their superior performance and better accuracy in comparison to traditional processing algorithms. Because security and latency issues make larger systems unattractive for certain time-dependent applications, an always-on system is required; such an application has a highly constrained power budget and typically must run on tiny microcontroller systems with limited memory and compute capability. The NN design must account for these constraints. We performed NN model explorations and evaluated embedded vision applications, including person detection, object detection, image classification, and facial recognition, on resource-constrained microcontrollers. We trained a variety of neural network architectures from the literature, comparing their accuracies and memory/compute requirements. We show that NN architectures can be optimized to fit within the computational and memory budgets of microcontroller systems without sacrificing accuracy. We also examine the depth-wise separable convolutional neural network (DS-CNN) and the convolutional neural network (CNN), both of which are utilized in the MobileNet architecture.
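The saving that makes the depth-wise separable convolution attractive on microcontrollers can be seen with a quick weight count; the layer sizes in the example are illustrative assumptions, not figures from the thesis.

```python
# Back-of-envelope weight counts for a standard convolution versus the
# depth-wise separable convolution used as MobileNet's building block.

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv plus a point-wise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out

print(conv_params(3, 32, 64))      # 18432
print(ds_conv_params(3, 32, 64))   # 2336, roughly 8x fewer weights
```

The same factoring reduces multiply-accumulate operations by a similar ratio, which is why DS-CNNs fit memory and compute budgets that standard CNNs exceed.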
This thesis presents a comparative analysis of the performance of edge devices in the field of embedded computer vision. The three parameters under major focus in this study are latency, accuracy, and millions of operations.
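Of these parameters, latency is typically measured empirically with a small timing harness of the following shape; this is a generic sketch, and the workload is a stand-in for a real model inference call, not part of the study.

```python
import time

# Hypothetical sketch: time repeated invocations of a workload and report
# the mean latency in milliseconds, as done when profiling models on
# edge hardware.

def mean_latency_ms(fn, runs=50):
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1e3

print(mean_latency_ms(lambda: sum(range(10_000))))
```

Averaging over many runs smooths out scheduler jitter, which matters on both desktop hosts and microcontroller targets.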