M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
Search Results
Item Open Access
Explanations by Counterfactual Argument in Recommendation Systems (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Pathak, Yash; Rana, Arpit
Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) rely on complex models. Because of their complexity, these models behave as black boxes, which raises the question of how far their decision process can be trusted, especially in high-cost decision scenarios. To address this problem, users of these systems can ask for an explanation of a decision, which the system can provide in various ways. One way of generating such explanations is through Counterfactual (CF) arguments. Although there is a debate on whether AI should generate these explanations by correlation or by causal inference, in Recommendation Systems (RecSys) the aim is to generate them with a minimum number of Oracle calls and with near-optimal length (e.g., in terms of interactions). In this study we analyze the nature of CFs and different methods of generating them (e.g., model-agnostic approaches, Genetic Algorithms (GA)), along with their quality measures. Extensive experiments show that CFs can be generated through multiple approaches and that selecting optimal CFs improves the explanations.

Item Open Access
Anomalies Detection in Radon Time Series for Earthquake Prediction Using Machine Learning Techniques (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Gorasiya, Raghav; Chaudhury, Bhaskar
Radioactive radon gas emission from soil and water is a significant precursor to earthquakes. Meteorological parameters such as temperature, pressure, humidity, rainfall, and wind speed influence the radon gas emission from media such as soil and water. In this study, radioactive soil radon gas has been investigated for earthquake prediction. Before seismic events, radon gas emission is also affected by seismic energies.
These seismic energies are responsible for the changes inside the earth's crust, which cause earthquakes. Our focus in this work is first to predict the radon gas concentration using Machine Learning algorithms and then to identify anomalies before and after the seismic events using standard confidence interval methods. We experimented with different machine learning models for a detailed comparative study of radon concentration predictions. The dataset is divided into different settings of training and testing data. The testing data includes seismic samples only. The models are trained on non-seismic day samples and some of the seismic day samples, and tested on seismic day samples. After acceptable predictions, anomaly detection can be done on the test data. A simple test of the mean plus two standard deviations has been used to identify the original measured radon values that fall outside this prediction confidence interval. These values are then considered anomalies.

Item Open Access
Performance and power prediction on disparate computer systems (2020) Amrutiya, Aditya
Performance and power prediction is an active area of research due to its applications in the advancement of hardware-software co-development. Several empirical machine-learning models, such as linear models, tree-based models, and neural networks, are used for evaluating the performance of machine learning models. Furthermore, the prediction model's accuracy may differ depending on the performance data collected for different software types (compute-bound, memory-bound) and different hardware (simulation-based or physical systems). Our results for performance prediction show that the tree-based machine-learning models, consisting of bagging and boosting models that help to improve weak learners, outperform all other models with a median absolute percentage error (MedAPE) of less than 5%.
We have also observed that in physical systems, the prediction accuracy of memory-bound applications is higher than that of compute-bound algorithms due to manufacturer variability in processors. Moreover, the prediction accuracy is higher on simulation-based hardware than on physical systems due to its deterministic nature. We have used transfer learning to solve two problems: cross-platform prediction and cross-systems prediction. Our results show a prediction error of 15% for cross-systems prediction and a prediction error of 17% for cross-platform prediction from a simulation-based X86 to ARM system, using the best-performing tree-based machine-learning model. For predicting power consumption along with performance, we have employed several univariate or multivariate machine learning models in our experiments. Our results show that runtime and power prediction accuracies of more than 80% and 90%, respectively, are achieved by the multivariate deep neural network model in cross-platform prediction. Similarly, for cross-system prediction, a runtime accuracy of 90% and a power accuracy of 75% are achieved by the multivariate deep neural network.

Item Open Access
Machine learning in financial data EPS estimates (2020) Sharma, Rohan; Joshi, M.V.
The project "EPS Estimates" is, as the name suggests, a study of the Earnings Per Share figures released by companies annually and quarterly. The project aims to arrive at a better consensus methodology for the EPS estimates given by different brokers and to give clients a better idea of what the EPS figures will be. Various statistical methods and machine learning models are used for this purpose, and this report compares them.
The intuition behind the models, their shortcomings, and some insights into them are also included in this report.

Item Open Access
ML-based clients prioritization and ranking algorithm (2020) Sharma, Rajat; Sasidhar, P S Kalyan
Kristal.AI is an AI-powered Digital Wealth Management Platform. It is one of the leading firms in the fintech industry and provides its customers with a platform for wealth investments. It has a very experienced committee for handling customer queries and an AI-driven advisory algorithm that recommends portfolios to customers according to their profiles. Having stepped into the AI-driven world, the company wants to implement an AI-driven algorithm for client prioritization and ranking, so that its Relation Management team can focus on the platform's higher-potential users rather than on users who may not be worth the time, since some users sign up out of curiosity but do not intend to enroll as authenticated clients of the company. Tackling this problem requires an automated AI-based algorithm that filters the higher-potential users from the data and ranks them according to their likelihood of becoming authenticated, registered, KYC-approved clients of the company. Together with the company's Data Science team, I tackled this problem by creating a Machine Learning based client prioritization and ranking algorithm that takes the company's raw data as input on a daily basis and generates a list of clients with the ranks in which they are to be followed up. For this, weeks of Exploratory Data Analysis were carried out to select the crucial features, and a regression model (Gaussian Process Regression) was created and optimized to give the desired output.
This model gave an accuracy of about 82% and a precision of about 84% on the test set.

Item Open Access
Performance and power modeling on disparate computer systems using machine learning (2020) Kumar, Rajat; Mankodi, Amit
Performance and power prediction is an active area of research due to its applications in the advancement of hardware-software co-development. We have performed experiments to evaluate the performance of several machine learning models. Our results for performance prediction show that the tree-based machine-learning models outperform all other models with a median absolute percentage error (MedAPE) of less than 5%, followed by bagging and boosting models that help to improve weak learners. We have collected performance data both from simulation-based hardware and from physical systems, and observed that prediction accuracy is higher on simulation-based hardware than on physical systems due to its deterministic nature. Moreover, in physical systems, the prediction accuracy of memory-bound applications is higher than that of compute-bound algorithms due to manufacturer variability in processors. Furthermore, our results show a prediction error of 15% for cross-systems prediction, and for cross-platform prediction an error of 17% for simulation-based X86 to ARM prediction and 23% for a physical Intel Core to Intel Xeon system, using the best-performing tree-based machine-learning model. We have employed several univariate or multivariate machine learning models for our experiments. Our results show that runtime and power prediction accuracies of more than 80% and 90%, respectively, are achieved by the multivariate deep neural network model in cross-platform prediction.
Similarly, for cross-system prediction, a runtime accuracy of 90% and a power accuracy of 75% are achieved by the multivariate deep neural network.

Item Open Access
VIU content access layer intelligent & flexible content selection (2020) Marakana, Meet; Banerjee, Asim
For OTT media streaming products like VIU, it is very important to increase the consumption of media content as much as possible. For the platform to get the highest benefit, the user must stay on the platform and consume a large amount of content. Resolving this problem is essential for surviving in markets with many competitors, such as the Indian market. The problem is to increase the engagement time between the customer and the platform, which can be addressed by improving content selection. To do so, the company should customize its homepage in favour of content that appeals to the user. The system must also behave dynamically, as each user has different preferences. With this approach, we can improve the engagement time of the users and hence solve our problem. CAL is the solution to this problem, and it handles all the issues that we had in the past. Users now get their preferred content from a combination of various content selectors, which select content based on user preference. Trending APIs, recommendation APIs, and BecauseYouHaveWatched APIs are the content selectors used for generating intelligent content selection for the user. We are trying to build a system that provides intelligent and flexible content selection and supports flexible consumption patterns. It supports a plug-and-play model for additional content selection algorithms, which means the system does not need to be updated when a new content selector service joins it in the future. Providing the plug-and-play feature requires a discovery service. I have developed the content selector registry, which is a discovery service API.
It manages the availability of the content selectors that reside inside the Kubernetes cluster. I have also written a Google Cloud Function that stores data in BigQuery by initiating Dataflow. Later, the data in BigQuery will be used to generate insights and KPI metrics.

Item Open Access
Crime information extraction from news articles (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Gohel, Prashant; Jat, P.M.
In the modern era, all news reports are available in digital form. Most news agencies publish them on their websites, where they are freely available. This motivates us to try extracting some information from online news reports. While understanding natural language text for information extraction is a complex task, we hope that extracting information such as crime type, crime location, and some profile information of the accused and the victim should be feasible. In this work we pulled about 1000 crime news articles from the NDTV and Indian Express websites. Hand tagging was done for the crime location and crime type of all articles. Through this work we show that a combination of LSTM and CNN based solutions can be used effectively for extracting crime location. Using this technique we get 95.58% precision and 94.54% recall. Further, we found the determination of crime type relatively easier. Through a simple keyword-based classification approach we get 95% precision. We also tried topic modeling for crime type extraction; it does not yield any improvement, and we get 79% precision. Keywords: crime related named entities, deep learning, neural network, LSTM, CNN, NER, NLP

Item Open Access
Set labeling of graphs (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kumar, Lokesh; Muthu, Rahul
Given a universal set and its subsets, an intersection graph can be characterized as a graph in which each vertex is assigned one distinct subset of the given universal set and any two non-adjacent vertices have no element in common in their respective sets.
This was first studied by Erdős. For the Kneser graph and the Petersen graph, adjacency is characterized by disjointness. This motivates us to look at disjointness instead of intersection. This report contains results on asymptotic bounds for valid labelings of some special classes of graphs, such as Harary graphs, split graphs, bipartite graphs, disjoint complete graphs, and complete multipartite graphs. The parameters relevant to the study of labelings of the vertices of a graph are the minimum possible label size (ILN), the minimum possible universe size (USN), and their uniform versions, UILN and UUSN. We have also proposed a framework for labeling disconnected graphs.

Item Open Access
The Study of Vertex Coloring Algorithms Using Heuristic Approaches (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Lodha, Pratik; Muthu, Rahul
Graph vertex coloring is one of the most studied NP-complete optimization problems (READ, 1972) [2]. The problem is: given a graph G, determine the number of colors required to color G so that no two adjacent vertices share the same color. The minimum number of colors required to color a graph is known as the chromatic number and is denoted by χ(G). Using existing properties of eccentricity, BFS, DFS (West, 2000) [3], and graph components, we have proposed three new heuristic algorithms to obtain an approximate chromatic number of a given graph G. These approaches are as follows: 1. Eccentricity based coloring, 2. DFS based coloring, and 3. Maximum degree based coloring.
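
The last abstract names a maximum degree based coloring heuristic without describing it. As a rough illustration only, the sketch below shows a generic largest-degree-first greedy coloring. This is an assumption about what such a heuristic might look like, not the algorithm proposed in the dissertation. The function names and the example graph are hypothetical, and the number of colors it uses is only an upper bound on the chromatic number.

```python
from typing import Dict, Hashable, Set

# Undirected graph as an adjacency map: vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def greedy_coloring_by_degree(graph: Graph) -> Dict[Hashable, int]:
    """Color vertices in order of decreasing degree (illustrative heuristic).

    Each vertex receives the smallest color index not already used by one
    of its colored neighbours.
    """
    order = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
    colors: Dict[Hashable, int] = {}
    for v in order:
        used = {colors[u] for u in graph[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def is_proper_coloring(graph: Graph, colors: Dict[Hashable, int]) -> bool:
    """Check that no edge joins two vertices of the same color."""
    return all(colors[u] != colors[v] for u in graph for v in graph[u])


# Hypothetical example: a cycle on five vertices (an odd cycle).
cycle5: Graph = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

if __name__ == "__main__":
    coloring = greedy_coloring_by_degree(cycle5)
    print("proper:", is_proper_coloring(cycle5, coloring))
    print("colors used:", len(set(coloring.values())))
```

A greedy pass like this runs in time roughly proportional to the number of edges plus the cost of sorting the vertices, which is why such heuristics are used to approximate the chromatic number when exact computation is impractical.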