M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3

Search Results

Now showing 1 - 7 of 7
  • Item (Open Access)
    Automated Analysis of Natural Language Textual Specifications: Conformance and Non-Conformance with Requirement Templates (RTs)
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Balwani, Shivani; Tiwari, Saurabh
    Natural Language (NL) is widely adopted as the primary method of expressing software requirements, although determining its superiority is challenging. Empirical evidence suggests that NL is the most commonly used notation in the industry for specifying requirements. One of the main advantages of NL is its accessibility to various stakeholders, requiring minimal training for understanding. Additionally, NL possesses universality, allowing its application across diverse problem domains. However, the unrestricted use of NL requirements can result in ambiguities. To address this issue and restrict the usage of NL requirements, Requirement Templates (RTs) are employed. RTs have a fixed syntactic structure and consist of predefined slots. When requirements are structured using RTs, ensuring they conform to the specified template is crucial. Manually verifying the conformity of requirements to RTs becomes a tedious task due to the large size of industry requirement documents, and it also introduces the possibility of errors. Furthermore, rewriting requirements to conform to the template structure when they initially do not conform presents a significant challenge. To overcome these issues, we propose a tool-assisted approach that automatically verifies whether Functional Requirements (FRs) conform to RTs. It provides a recommendation for a Template Non-Conformance (TNC) requirement by generating a semantically identical requirement that conforms to the template structure. Our study focused on two well-known RTs, namely, Easy Approach to Requirements Syntax (EARS) and RUPP's, for checking conformance and making recommendations. We utilized Natural Language Processing (NLP) techniques and applied our approach to industrial and publicly available case studies. Our results demonstrate that the tool-based approach facilitates requirement analysis and aids in recommending requirements based on their conformity with RTs.
    Furthermore, we have developed an approach to assess the testability of Non-Functional Requirements (NFRs) by analyzing the associated acceptance criteria. We evaluated the applicability of this approach by applying it to various case studies and determining the testability of the NFRs.
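    Because RTs such as EARS have a fixed syntactic shape with predefined slots, a first-pass conformance check can be sketched with regular expressions. This is a minimal illustration under our own assumptions, not the thesis's tool: it covers only the ubiquitous and event-driven EARS forms, and the function and pattern names are ours.

```python
import re

# Minimal sketch of two EARS patterns; a real checker would use NLP
# parsing and cover all EARS templates, not just these two.
EARS_PATTERNS = [
    re.compile(r"^The [\w ]+ shall .+", re.IGNORECASE),           # ubiquitous
    re.compile(r"^When .+, the [\w ]+ shall .+", re.IGNORECASE),  # event-driven
]

def conforms_to_ears(requirement):
    """Return True if the requirement matches a supported EARS pattern."""
    return any(p.match(requirement.strip()) for p in EARS_PATTERNS)
```

    A requirement that fails this check would be flagged as Template Non-Conformance (TNC) and become a candidate for rewriting.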
  • Item (Open Access)
    Sentence detection
    (2020) Shah, Pushya; Mitra, Suman K.
    Sentence detection is a very important task for any natural language processing (NLP) application. The accuracy and performance of all downstream NLP tasks, such as sentiment analysis, text classification, named entity recognition (NER), relation extraction, etc., depend on the accuracy of correctly detected sentence boundaries. The clinical domain is very different from the general language domain: clinical sentence structure and vocabulary differ from general English. As a result, available sentence boundary detection tools do not perform well on the clinical domain, and a dedicated sentence detection model is required for clinical documents. ezDI Solutions (India) LLP has developed such a system that can detect sentence boundaries. We examined the Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM) algorithms and used a BiLSTM-BERT hybrid model for sentence boundary detection on medical corpora.
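    As a point of contrast with the learned BiLSTM-BERT model, a naive rule-based splitter illustrates why clinical text is hard: period-based splitting must special-case abbreviations like "Pt." and "mg.". The abbreviation list below is purely illustrative, not the system's lexicon.

```python
# Hedged sketch of a rule-based baseline, not the BiLSTM-BERT model.
# The abbreviation set is illustrative; clinical text has many more.
ABBREVIATIONS = {"dr.", "pt.", "b.i.d.", "q.d.", "mg."}

def split_sentences(text):
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "?", "!")) and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences
```

    Any abbreviation missing from the list causes a spurious split, which is exactly the brittleness a learned model avoids.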
  • Item (Open Access)
    What does BERT learn about questions
    (2020) Tyagi, Akansha; Majumder, Prasenjit
    Recent research in Question Answering (QA) is highly motivated by the introduction of the BERT [5] model. This model has gained considerable attention since researchers at Google AI Language claimed state-of-the-art results over various NLP tasks, including QA. On one side, where the introduction of end-to-end pipeline models consisting of an IR and an RC model has opened the scope of research in two different areas, the new BERT representations alone show a significant improvement in the performance of a QA system. In this study, we cover several pipeline models, like R3: Reinforced Ranker-Reader [15], the Re-Ranker Model [16], and the Interactive Retriever-Reader Model [4], along with the transformer-based QA system, i.e., BERT. The motivation of this work is to deeply understand the black-box BERT model and to identify what BERT learns about the question in order to predict the correct answer for it from a given context. We discuss all the experiments that we performed to understand BERT's behavior from different perspectives. All experiments were performed using the SQuAD dataset. We also used the LRP [3] technique for a better understanding and analysis of the experimental results. Along with studying what the model learns, we have also tried to find what the model does not learn. For this, we analyzed various examples from the dataset to determine the types of questions for which the model predicts an incorrect answer. Finally, we present the overall findings about the BERT model in the conclusion section.
  • Item (Open Access)
    Crime information extraction from news articles
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Gohel, Prashant; Jat, P.M.
    In the modern era all news reporting is available in digital form. Most news agencies put it on their websites, where it is freely available. This motivates us to try extracting some information from online news reporting. While understanding natural language text for information extraction is a complex task, we hope that extracting information like crime type, crime location, and some profile information of the accused and victim should be feasible. In this work we pulled about 1000 crime news articles from the NDTV and Indian Express websites. Hand tagging was done for crime location and crime types of all articles. Through this work we show that a combination of LSTM- and CNN-based solutions can be effectively used for extracting crime location. Using this technique we get 95.58% precision and 94.54% recall. Further, we found determination of crime type relatively easier: through a simple keyword-based classification approach we get 95% precision. We also tried topic modeling for crime type extraction, but it did not gain any improvement, giving 79% precision. Keywords: crime related named entities, deep learning, neural network, LSTM, CNN, NER, NLP
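    The keyword-based crime-type classification the abstract mentions can be sketched as a lexicon lookup. The keyword lists below are our own illustrative guesses, not the thesis's lexicon, and the label set is an assumption.

```python
# Illustrative crime-type lexicon; the thesis's actual keyword lists
# and label set are not reproduced here.
CRIME_KEYWORDS = {
    "theft":  ["stole", "stolen", "robbed", "burglary", "theft"],
    "murder": ["murder", "killed", "stabbed", "homicide"],
    "fraud":  ["fraud", "scam", "cheated", "embezzled"],
}

def classify_crime(article):
    """Return the crime type whose keywords occur most often, or None."""
    text = article.lower()
    scores = {label: sum(kw in text for kw in kws)
              for label, kws in CRIME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

    Crime-location extraction, by contrast, needed the LSTM/CNN sequence model, since locations cannot be enumerated in a fixed lexicon.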
  • Item (Open Access)
    Distant supervision for relation extraction
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Doshi, Prarthana; Jat, PM
    Relation Extraction (RE) is one of the important tasks of Information Extraction. Information Extraction is used to get data from natural language text. Relation extraction is done using different methods. Most techniques found in the area of relation extraction use labelled data. The downside of using labelled data is that it is very costly to generate, as it requires human labour to understand each sentence and its entities and label it accordingly. There is a large amount of natural language data available, and it is increasing day by day. So, supervised techniques may not scale and adapt well to real-time dynamic data. The issue of human annotation is addressed by the recent approach of distant supervision. Distant supervision is a task that attempts automatic labelling of data. This is realized by extracting facts from publicly available knowledge bases like Wikidata, DBpedia, etc. Most of these knowledge bases are freely available. The assumption of distant supervision is that if there is a relation between entities in the knowledge base, then a sentence in which those entities are present together represents that relation. But there are some problems associated with distant supervision, like an incomplete knowledge base or the wrong label problem. Most techniques in the area of relation extraction have used available NLP tools for feature extraction. These tools themselves have errors. In this work, we explore a convolutional neural network for the task, which does not require NLP-based preprocessing. To avoid the wrong label problem, we have used selective attention over instances. It treats the problem as a multi-instance problem, and we have concluded that it gives better results. We have also used a CNN with a context model, where the input of the model is divided into three parts based on the entity positions. This helps the model to understand the sentence representation, and the model performs well compared to the basic CNN model.
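    The distant-supervision assumption stated above can be sketched in a few lines. The toy knowledge base is purely illustrative (a real system would draw facts from Wikidata or DBpedia), and the second example below shows the wrong-label problem the abstract mentions.

```python
# Illustrative toy knowledge base; a real system queries Wikidata/DBpedia.
KB = {
    ("Paris", "France"): "capital_of",
    ("Einstein", "Ulm"): "born_in",
}

def distant_label(sentence):
    """Label a sentence with every KB relation whose entity pair co-occurs in it."""
    return [(e1, rel, e2)
            for (e1, e2), rel in KB.items()
            if e1 in sentence and e2 in sentence]
```

    Note that the second sentence in the test below gets labelled capital_of even though it never asserts that relation; this noise is precisely why selective attention over instances is needed.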
  • Item (Open Access)
    Characterization of NON-ISA factual sentences in English
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Kansagara, Vishal J.; Dasgupta, Sourish
    The ability to reason over extracted knowledge has made Ontology Learning a subject of intensive study for the past couple of years. As a domain scales, the time taken to manually conceptualize the domain knowledge increases, making way for automatic learning tools. The thesis approach is to characterize Natural Language text documents by taking sentences one by one from the documents and producing the subject, the object, the relation between them, and the subject and object modifiers (for a particular sentence), which helps convert a Natural Language text document into a semantically equivalent DL (Description Logic) expression. This DL expression is used to generate an ontology automatically.
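    A deliberately naive sketch of the sentence-to-triple step described above, handling only simple copular sentences; the real characterization needs full parsing and modifier extraction, and the function name here is ours, not the thesis's.

```python
def extract_triple(sentence):
    """Split a simple copular sentence into a (subject, relation, object) triple."""
    words = sentence.rstrip(".").split()
    if "is" in words:
        i = words.index("is")
        return (" ".join(words[:i]), "is", " ".join(words[i + 1:]))
    return None  # non-copular sentences need real parsing
```

    A sentence like "A dog is an animal" then maps to the DL subsumption Dog ⊑ Animal, whereas NON-ISA sentences require richer relations than subsumption.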
  • Item (Open Access)
    Improving the Quality of Data to be used for Text Classification task
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2021) Mehta, Stuti; Mitra, Suman; Roy, Anil
    Text Classification is one of the most basic tasks of Natural Language Processing (NLP), and efforts have been made to improve the performance of the task by making changes to the classifier model. For any NLP problem, the data used is a very important aspect of getting the best possible solution. However, not much work has been done to improve data quality. Our work aims at improving the quality of the data for the Text Classification task. This is done by removing some semantically difficult samples from the data, changing the training set and thereby improving the data quality. In order to improve the quality of the data and remove the samples which negatively affect classifier performance, various methods have been considered. A novel method has been used to define difficulty: a penalty function is used to represent the difficulty of a sample, and based on the penalty value associated with the sample, its difficulty is determined. These difficult samples are then removed, and the newly obtained training set, an improved version of the original, is used for classification. By training the classifier model on the newly obtained training set, an improvement is observed in the performance of the classifier. Thus, our work mainly improves the quality of the data by removing difficult samples. Various methods have been used for finding the difficult samples, and comparisons have been drawn from the results obtained.
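    One plausible reading of the penalty-based filtering is sketched below, taking the penalty to be a sample's misclassification rate across repeated held-out runs; the thesis's actual penalty function may be defined differently, and the threshold is an assumption.

```python
def filter_difficult(samples, runs_per_sample, threshold=0.5):
    """Keep samples whose penalty (misclassification rate over runs) is <= threshold.

    runs_per_sample[i] is a list of (predicted_label, gold_label) pairs
    for samples[i], collected over repeated classifier runs.
    """
    kept = []
    for sample, runs in zip(samples, runs_per_sample):
        penalty = sum(pred != gold for pred, gold in runs) / len(runs)
        if penalty <= threshold:
            kept.append(sample)
    return kept
```

    The surviving samples form the improved training set on which the classifier is retrained.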