M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3


Search Results

Now showing 1 - 10 of 12
  • ItemOpen Access
    User Stories to Concept Map: An approach to Visualise Dependencies
    (2021) Shah, Dishant; Tiwari, Saurabh
    "Writing user stories that capture the user's perspective is the first step of requirements gathering in the agile framework. Because user stories are written informally and with few restrictions, they are prone to inherent NL issues such as ambiguity, incompleteness and inter-dependencies. In this thesis work, we propose an approach to automatically generate a conceptual model (i.e., concept maps) from user stories. The approach also identifies the inter-dependencies between user stories and subsequently analyses the incompleteness among them. It uses natural language processing (NLP) techniques to identify linguistic patterns; these patterns are then mapped into concepts and attributes, and concept maps are generated by applying the proposed heuristic rules. After generating concept maps from the user stories of a software system, we recognize the dependencies between the concepts of a single user story and are able to identify inter-dependencies across the set of user stories. We evaluated the applicability of the proposed approach by experimenting on 22 publicly available projects. On average, we found that the generated concept maps capture inter-dependencies with 94.7% accuracy. We have developed tool support for realising the proposed approach."
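    The pattern-to-concept pipeline the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the thesis's actual heuristic rules: the story template regex, the choice of role and object as concepts, and the shared-concept dependency test are all assumptions.

    ```python
    import re

    # Hypothetical sketch of mapping a user-story template to concepts and
    # flagging inter-dependencies; the regex, the concept choices and the
    # shared-concept test are illustrative assumptions, not the thesis's rules.
    STORY_RE = re.compile(
        r"As an? (?P<role>[\w ]+?), I want to (?P<action>\w+)"
        r"(?: the| a| an)? (?P<object>[\w ]+?)(?:,| so that|\.|$)",
        re.IGNORECASE,
    )

    def extract_concepts(story):
        """Concepts of a single story: here, its role and its object."""
        m = STORY_RE.search(story)
        return {m.group("role").lower(), m.group("object").lower()} if m else set()

    def interdependencies(stories):
        """Indices of story pairs that share at least one concept."""
        concepts = [extract_concepts(s) for s in stories]
        return [(i, j)
                for i in range(len(stories))
                for j in range(i + 1, len(stories))
                if concepts[i] & concepts[j]]
    ```

    Two stories that mention the same object (e.g. "order history") would then be flagged as inter-dependent; a real system would use full NLP parsing rather than a fixed template.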
  • ItemOpen Access
    Commonsense validation and explanation
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Makwana, Vivek H.; Lalchandani, Jayprakash
    Common-sense reasoning[1] is a field of artificial intelligence and machine learning that focuses on helping computers understand and interact with people more naturally by finding ways to collect everyday assumptions and teach them to computers. Common-sense reasoning has been most successful in the field of natural language processing (NLP). Without common sense, it will not be easy to build versatile, unsupervised NLP systems in an increasingly digital and mobile world. When we talk to each other, in person or online, we try to be interesting and take advantage of new ways to express things. There is more to this than one would think. If we ask, "Can you put an elephant into the fridge?", you could answer the question quite easily even though, in all probability, you had never pictured an elephant in a fridge. This is an example of how we as humans not only know about the world but also know how to apply our knowledge to things we have not thought about before. How to evaluate whether a system has this sense-making capability remains a challenging question. Existing benchmarks measure common-sense knowledge indirectly and without explanation. In this thesis, we directly test whether a system can differentiate natural language statements that make sense from those that do not. A system is also asked to identify the most relevant reason why a given statement is against common sense. We compare models trained on large-scale language modeling tasks with human performance, showing that sense-making still poses distinct challenges for such systems.
  • ItemOpen Access
    Apparel attributes classification using deep learning
    (2020) Desai, Harsh Sanjaykumar; Jat, P.M
    Apparel attributes classification finds practical applications in e-commerce. The project was carried out for www.Blibli.com, an e-commerce platform in Indonesia and a partner of Coviam Technologies. This report describes an approach to classifying attributes such as material, neck/collar and sleeve type specific to various apparels using Natural Language Processing and Deep Learning techniques. Products classified by attribute are used as filters on the search results page to enhance and improve the website's search mechanism. We classified 95% of apparel products based on the material attribute and achieved 87% test accuracy on neck/collar attribute classification. The report is divided into four main parts: Introduction, Dataset Preparation, Methodology and Experimentation. Lastly, other similar work performed during the internship, along with future work, is discussed.
  • ItemOpen Access
    Clickbait detection using deep learning techniques
    (2020) Parikh, Apurva Ketanbhai; Majumder, Prasenjit
    With the growing shift towards news consumption primarily through social media sites such as Twitter and Facebook, most news agencies promote their stories on social media platforms. Some agencies publish fake news on social media to generate revenue by enticing users to click on their articles. To increase the number of readers, agencies use eye-catching headlines accompanied by the article link, which lure the reader into reading the article. These attractive headlines are called clickbaits. Usually, a clickbait article does not meet the reader's expectations. In this work, we develop an end-to-end clickbait detection system using the Transformer-based model Bidirectional Encoder Representations from Transformers (BERT). We also identified a few clickbait-specific features which, we hypothesised, can be utilised along with the BERT model to build a better classifier. Our proposed BERT-based approach significantly outperformed the baseline paper, which utilised a BiLSTM.
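    The abstract mentions clickbait-specific features combined with BERT but does not list them. The sketch below shows the kind of shallow headline features commonly used in clickbait detection; the feature names and the cue-word list are assumptions for illustration, not the author's actual feature set.

    ```python
    import re

    # Illustrative clickbait-style headline features; the thesis does not
    # list its features, so these cues are assumptions drawn from common
    # practice in clickbait detection, not the author's actual feature set.
    CUE_WORDS = {"you", "this", "these", "why", "what", "how", "won't"}

    def clickbait_features(headline):
        tokens = headline.lower().split()
        return {
            "starts_with_number": bool(re.match(r"\d", headline)),
            "has_cue_word": any(t.strip("?!.,'") in CUE_WORDS for t in tokens),
            "has_exclamation": "!" in headline,
            "num_tokens": len(tokens),
        }
    ```

    In a combined model, such features could be concatenated with the BERT `[CLS]` representation before the final classification layer.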
  • ItemOpen Access
    Augmenting dialogue generation using dialogue act embeddings: a transfer learning approach
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Bisht, Abhimanyu Singh; Majumder, Prasenjit
    This work looks at contemporary end-to-end dialogue systems with the aim of improving dialogue generation in an open-domain setting. It provides an overview of popular literature in the domain of dialogue generation, followed by a brief look at how human dialogue is understood from the perspectives of linguistics and cognitive science. We extract useful ideas from these domains of research and implement them in a transfer learning approach in which a pretrained language model is supplemented with dialogue act information through special embeddings. The hypothesis behind the proposed approach is that dialogue act information will aid the generation process. The proposed approach is then compared with a baseline on the DailyDialog[12] dataset using perplexity as the evaluation metric. Though the proposed approach is a significant improvement over the baseline, ablation analysis shows that the contribution of the dialogue act embeddings to this improvement is marginal.
  • ItemOpen Access
    Disease-specific biomedical literature mining: A visual interface
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2017) Sen, Neelasha; Majumder, Prasenjit
    "The vast amount of online resources available in the biomedical domain makes it challenging for a user to fulfill an information need. Most existing approaches for biomedical text mining try to alleviate the problem of information abundance through information extraction, summarization, etc., and present the output in text format. In this study we present an approach to extract potential research topics from available biomedical literature and analyse them in a temporal and geographical framework. To fulfill this task we utilize the UMLS resource and the geotext library. We identify important milestones in the lifecycle of each research topic using an approach based on topic novelty and published volume. Closely related concepts are identified and represented as graphs. We have also designed a novel visual interface for representing the multifaceted information extracted by this approach. The experiments have been performed on MEDLINE citations accessible through PubMed. We evaluate the performance of our approach using the precision metric. The results of our approach are presented and scope for improvement identified."
  • ItemOpen Access
    Formal semantic analysis and modeling of natural language Wh-question
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Patel, Rupali; Dasgupta, Sourish
    The problem of Natural Language Query Formalization is to understand the semantics of a user-given query in natural language (NL) and then translate the query into a formal language (FL) such that the FL semantic interpretation is equivalent to the NL interpretation. Such linguistic-analysis-based formalization can serve as a more accurate query analyzer when compared to statistical analyzers. In this thesis work we propose a linguistic-analysis-based query model, called the Description Logic based Wh-Query Model, that syntactically characterizes wh-queries in English and has complete semantic equivalence to Description Logics (DL). This work also includes rules to identify desire dependency in the case of complex and compound queries. We evaluate the query characterization coverage using the Microsoft Encarta query dataset and the OWLS-TC V4.0 service query dataset.
  • ItemOpen Access
    SMS query processing for information retrieval
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Shinghal, Khushboo; Majumder, Prasenjit
    SMS text messaging is one of the fastest and most popular communication modes on mobile phones these days. This study presents a query processing system for information retrieval when the queries are Short Message Service (SMS) texts. SMS text contains various user improvisations and typographical errors. The proposed approach uses approximate string matching techniques and context extraction to normalize SMS queries with minimal linguistic resources. We have tested the system on the FIRE 2011 SMS-based FAQ retrieval corpus. The results seem encouraging.
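    The approximate string matching step can be sketched as a Levenshtein edit distance against a lexicon. The lexicon, the distance cutoff, and the token-by-token strategy below are illustrative assumptions; the thesis additionally uses context extraction, which this token-level sketch omits.

    ```python
    # Sketch of SMS-token normalization by approximate string matching.
    # The lexicon and cutoff are illustrative assumptions; the thesis also
    # uses context extraction, which this token-level sketch omits.
    def edit_distance(a, b):
        """Classic dynamic-programming Levenshtein distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    LEXICON = ["please", "message", "address", "tomorrow", "great"]

    def normalize_token(token, max_dist=2):
        """Replace a noisy token with its nearest lexicon word, if close enough."""
        best = min(LEXICON, key=lambda w: edit_distance(token, w))
        return best if edit_distance(token, best) <= max_dist else token

    def normalize_query(sms):
        return " ".join(normalize_token(t) for t in sms.lower().split())
    ```

    Heavy SMS abbreviations (e.g. "plz", "tmrw") exceed a small edit-distance cutoff, which is why context and abbreviation handling matter beyond pure string distance.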
  • ItemOpen Access
    Part of speech tagging for Gujarati text
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Dave, Mainak; Pandya, Abhinay
    Part-of-speech (POS) tagging is the process of assigning to each word in a natural language sentence the lexical category that best suits both the definition of the word and the context of the sentence in which it is used. Part-of-speech tagging is an important part of Natural Language Processing (NLP) and is useful for most NLP applications; it is often a primary step in NLP tasks such as chunking, parsing, etc. Gujarati is the state language of Gujarat, a western state of India, and is spoken by 70 percent of the state's population. More than 46 million people worldwide consider Gujarati their first language. Apart from Gujarat, it is widely spoken in the states of Maharashtra, Rajasthan, Karnataka and Madhya Pradesh, and also around the world. Natural language processing of Gujarati is in its early stage of existence, and a Gujarati POS tagger is a core component for most NLP applications: information retrieval, machine translation, shallow parsing and word sense disambiguation can all work more effectively and efficiently with the existence of a POS tagger. Our focus in this work is to develop an effective POS tagger for Gujarati text. The main task of this thesis is to build a system which can automatically annotate part-of-speech for Gujarati texts with the help of various machine learning algorithms. We have used the tag set defined by IIIT Hyderabad. We have used two machine learning techniques: Hidden Markov Models (HMM) and Conditional Random Fields (CRF). Since Gujarati is a morphologically rich language, we can use a Morphological Analyzer (MA) to restrict the set of possible tags for a given word. The Gujarati language is based on the Paninian framework, so its rules of morphology are well defined; hence we have defined morphological rules for Gujarati.
    While the MA helps us restrict the possible choice of tags for a given word, one can also use prefix/suffix information (i.e., the sequence of first/last few characters of a word) to further improve the models. The HMM model uses suffix information during the smoothing process, while the CRF uses suffixes as features. (http://www.oclc.org/languagesets/educational/languages/india.htm)
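    The HMM side of such a tagger decodes the best tag sequence with the Viterbi algorithm. The two-tag model and the hand-set probabilities below are illustrative assumptions; a real tagger estimates them from an annotated Gujarati corpus and smooths unseen words with suffix information, as described above.

    ```python
    import math

    # Toy Viterbi decoder for an HMM tagger. The two-tag model and the
    # hand-set probabilities are illustrative assumptions; a real tagger
    # estimates them from annotated Gujarati text and smooths unseen
    # words using suffix information.
    TAGS = ["NN", "VB"]
    START = {"NN": 0.7, "VB": 0.3}
    TRANS = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
    EMIT = {
        "NN": {"dog": 0.4, "cat": 0.4, "runs": 0.1, "walk": 0.1},
        "VB": {"dog": 0.05, "cat": 0.05, "runs": 0.45, "walk": 0.45},
    }

    def viterbi(words):
        # best[t] = (log-probability of best path ending in tag t, that path)
        best = {t: (math.log(START[t] * EMIT[t][words[0]]), [t]) for t in TAGS}
        for w in words[1:]:
            best = {
                t: max((p + math.log(TRANS[prev][t] * EMIT[t][w]), path + [t])
                       for prev, (p, path) in best.items())
                for t in TAGS
            }
        return max(best.values())[1]
    ```

    Restricting the candidate tags per word via a morphological analyzer, as the thesis proposes, would simply shrink the inner loop over `TAGS` for each word.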
  • ItemOpen Access
    Shallow parsing of Gujarati text
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) Dave, Vidhi; Pandya, Abhinay
    Shallow parsing is the process of assigning tags to the minimal, non-recursive phrases of a sentence. It is useful for many applications, such as question answering and information retrieval, where full parsing is not needed. Gujarati is one of the main languages of India and the 26th most spoken native language in the world, with more than 50 million speakers worldwide. Natural language processing of Gujarati is in its infancy. Nowadays much data is available in Gujarati on websites, but due to the lack of resources it is hard for users to retrieve it efficiently. Shallow parsing of Gujarati can therefore ease other tasks such as machine translation, information extraction and retrieval. In this thesis, we have worked on automatic shallow parsing of Gujarati. 400 sentences were manually tagged, and different machine learning techniques, namely Hidden Markov Models and Conditional Random Fields, have been used. We achieved good accuracy, similar to that of a Hindi chunker, even though the resources available for Gujarati are very limited. The best performance is achieved using CRF with contextual information and part-of-speech tags.
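    The contextual information behind the best CRF result can be sketched as a per-token feature function over the word, its POS tag, and its neighbours. The exact feature template used in the thesis is not given, so the features below are illustrative assumptions.

    ```python
    # Illustrative CRF feature template for shallow parsing (chunking):
    # current word, its POS tag, a word suffix, and the neighbouring POS
    # tags. The thesis does not list its exact template, so these
    # feature choices are assumptions.
    def chunk_features(words, pos_tags, i):
        feats = {
            "word": words[i].lower(),
            "pos": pos_tags[i],
            "suffix2": words[i][-2:].lower(),
        }
        if i > 0:
            feats["prev_pos"] = pos_tags[i - 1]
        else:
            feats["BOS"] = True          # beginning of sentence
        if i < len(words) - 1:
            feats["next_pos"] = pos_tags[i + 1]
        else:
            feats["EOS"] = True          # end of sentence
        return feats
    ```

    A CRF toolkit would consume one such feature dictionary per token, paired with BIO chunk labels, to learn the chunker.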