Abstractive text summarization using universal dependency labels, LSA and graph-based approach
With the advancements in technology and the increase in the information available on the Internet, it has become a tiring task to go through each and every document on the net to get the gist of each. This would be far easier if a summary of each document, highlighting its key concepts, were readily available, ideally one very close to a human-generated summary. In this paper we propose a methodology for summarizing documents using Universal Dependency Labels, Latent Semantic Analysis and a Word Graph-based approach. As a first step, we perform syntactic analysis and pre-processing of the document, which involves tokenization, NER (Named Entity Recognition), pronoun resolution, etc. Next, we modify each sentence by extracting its logical-form triplet, viz. the subject, predicate and object entities, using the Universal Dependency Labels. The modified sentences are then checked for similarity or relatedness using a similarity measure. Sentences with a similarity score above a pre-defined value are used to create the word graph, and each sentence is used only once in constructing the graph. The graph is then used for sentence compression by finding the K shortest paths from the start node to the end node. A new edge weight formula is used to select the lightest of the K paths, which is taken as the most informative. The new sentences are then combined with the sentences whose similarity score is below the pre-defined value to obtain the modified document. Finally, the key sentences related to the key topics of the document are identified using Latent Semantic Analysis to give the required abstractive summary of the document.
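The triplet-extraction step can be illustrated with a minimal sketch. It assumes each sentence has already been parsed into Universal Dependencies (for example, read from CoNLL-U parser output), represented here by a hypothetical `Token` record. The label choices (`nsubj`/`nsubj:pass` for the subject, `obj` with a fallback to `obl` for the object, `aux`/`aux:pass` kept with the verb) and the function names are illustrative assumptions, not the dissertation's exact rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    idx: int      # 1-based position in the sentence (CoNLL-U ID column)
    form: str     # surface word
    deprel: str   # Universal Dependency relation to the head
    head: int     # idx of the head token; 0 marks the root


def subtree_text(tokens: List[Token], top: Token) -> str:
    """Join `top` and all of its descendants in sentence order."""
    keep = {top.idx}
    changed = True
    while changed:  # grow the set until no new dependents are found
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep:
                keep.add(t.idx)
                changed = True
    return " ".join(t.form for t in tokens if t.idx in keep)


def first_child(children: List[Token], labels: Tuple[str, ...]) -> Optional[Token]:
    """Return the first child whose label appears earliest in `labels`."""
    for label in labels:
        for t in children:
            if t.deprel == label:
                return t
    return None


def extract_triplet(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Hypothetical rules: predicate = root plus auxiliaries, subject = nsubj
    subtree, object = obj subtree (else obl). None if no full triplet."""
    root = next((t for t in tokens if t.deprel == "root"), None)
    if root is None:
        return None
    children = [t for t in tokens if t.head == root.idx]
    subj = first_child(children, ("nsubj", "nsubj:pass"))
    obj = first_child(children, ("obj", "obl"))
    if subj is None or obj is None:
        return None
    # Keep auxiliaries next to the verb, in sentence order.
    pred = sorted([t for t in children if t.deprel in ("aux", "aux:pass")] + [root],
                  key=lambda t: t.idx)
    predicate = " ".join(t.form for t in pred)
    return subtree_text(tokens, subj), predicate, subtree_text(tokens, obj)


# "The student wrote a long thesis ." with hand-written UD annotations
sentence = [
    Token(1, "The", "det", 2),
    Token(2, "student", "nsubj", 3),
    Token(3, "wrote", "root", 0),
    Token(4, "a", "det", 6),
    Token(5, "long", "amod", 6),
    Token(6, "thesis", "obj", 3),
    Token(7, ".", "punct", 3),
]
print(extract_triplet(sentence))  # ('The student', 'wrote', 'a long thesis')
```

Joining the three parts (e.g. "The student wrote a long thesis") gives one possible form of the modified sentence; sentences for which no full triplet is found are simply left unmodified in this sketch.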
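The similarity check that decides which sentences go into a word graph might look like the following sketch. It assumes bag-of-words cosine similarity and a greedy grouping in which every sentence joins at most one group, mirroring the "used only once" constraint. The measure, the `threshold=0.5` default and the names `cosine` and `group_sentences` are illustrative assumptions, since the abstract does not name the similarity measure or the pre-defined value.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * cb[word] for word, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sentences(sentences: List[str],
                    threshold: float = 0.5) -> Tuple[List[List[int]], List[int]]:
    """Greedily group related sentences; each index is used at most once.

    Returns (groups, leftovers): groups of size >= 2 would feed a word graph,
    leftovers are passed through unchanged to the modified document.
    """
    used = set()
    groups: List[List[int]] = []
    leftovers: List[int] = []
    for i in range(len(sentences)):
        if i in used:
            continue
        used.add(i)
        group = [i]
        for j in range(i + 1, len(sentences)):
            if j not in used and cosine(sentences[i], sentences[j]) > threshold:
                group.append(j)
                used.add(j)
        if len(group) > 1:
            groups.append(group)
        else:
            leftovers.append(i)
    return groups, leftovers


docs = ["the cat sat on the mat", "the cat sat on a mat", "stocks fell sharply today"]
print(group_sentences(docs))  # ([[0, 1]], [2])
```

Here the first two sentences score about 0.87 and form one group, while the third shares no words with either and is kept as a leftover, corresponding to the sentences the abstract describes as scoring below the pre-defined value.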
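Word-graph compression with K shortest paths can be sketched as follows, assuming a simplified graph: one node per lowercased word, a start and an end node, and one edge per adjacent word pair. The edge weight `1 / bigram count` and the reranking by average edge weight with a minimum length are hypothetical stand-ins, not the dissertation's new edge weight formula, which the abstract does not state. The K shortest simple paths are found with a plain best-first enumeration rather than Yen's algorithm, which is adequate for the small graphs built from a handful of sentences.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

START, END = "<s>", "</s>"
Graph = Dict[str, Dict[str, float]]


def build_word_graph(sentences: List[str]) -> Graph:
    """One node per lowercased word plus START/END; one edge per adjacent word pair."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for s in sentences:
        words = [START] + s.lower().split() + [END]
        for u, v in zip(words, words[1:]):
            counts[(u, v)] += 1
    graph: Graph = defaultdict(dict)
    for (u, v), c in counts.items():
        graph[u][v] = 1.0 / c  # hypothetical weight: frequent transitions are cheaper
    return graph


def k_shortest_paths(graph: Graph, k: int) -> List[Tuple[float, List[str]]]:
    """Up to k simple START->END paths in order of increasing total weight."""
    heap: List[Tuple[float, List[str]]] = [(0.0, [START])]
    found: List[Tuple[float, List[str]]] = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)  # cheapest partial path first
        if path[-1] == END:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(path[-1], {}).items():
            if nxt not in path:  # forbid cycles so every path is simple
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found


def compress(sentences: List[str], k: int = 10, min_words: int = 3) -> Optional[str]:
    """Pick the lightest candidate by average edge weight (hypothetical reranking)."""
    candidates = []
    for cost, path in k_shortest_paths(build_word_graph(sentences), k):
        words = path[1:-1]
        if len(words) >= min_words:
            candidates.append((cost / (len(path) - 1), words))
    return " ".join(min(candidates)[1]) if candidates else None


cluster = ["john bought a red car yesterday", "john bought a car"]
print(compress(cluster))  # john bought a car
```

The same graph also contains the path "john bought a car yesterday", which appears in neither input sentence; whether such a fused path wins depends entirely on the weighting, which is where a purpose-built edge weight formula matters.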
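The final LSA step can be sketched with a term-sentence matrix and its top singular vectors. To stay dependency-free, this sketch computes the right singular vectors as eigenvectors of `A^T A` using power iteration with deflation; a library SVD such as `numpy.linalg.svd` would be the usual choice in practice. The selection rule (for each of the top-k singular vectors, keep the not-yet-chosen sentence with the largest absolute weight) is one common LSA scheme and an assumption here, as are raw term counts as matrix entries and the function names.

```python
import math
import random
from typing import List


def term_sentence_matrix(sentences: List[str]) -> List[List[int]]:
    """Rows = vocabulary terms, columns = sentences, entries = raw term counts."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    return [[words.count(term) for words in tokenized] for term in vocab]


def top_right_singular_vectors(A: List[List[int]], k: int,
                               iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Top-k eigenvectors of A^T A (= right singular vectors of A) by power iteration."""
    n = len(A[0])
    gram = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vectors: List[List[float]] = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in vectors:  # project out vectors already found (deflation)
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return vectors  # remaining singular values are numerically zero
            v = [x / norm for x in w]
        vectors.append(v)
    return vectors


def lsa_select(sentences: List[str], k: int) -> List[str]:
    """For each top singular vector, keep the unchosen sentence with the largest |weight|."""
    chosen: List[int] = []
    for v in top_right_singular_vectors(term_sentence_matrix(sentences), k):
        for i in sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True):
            if i not in chosen:
                chosen.append(i)
                break
    return [sentences[i] for i in sorted(chosen)]  # keep document order


doc = [
    "solar panels convert sunlight into electricity",
    "solar panels and sunlight produce electricity",
    "the football match ended in a draw",
]
print(lsa_select(doc, 2))
```

With `k=2` the first singular vector covers the two solar-energy sentences and the second covers the football sentence, so the result contains one sentence per topic. In the pipeline the abstract describes, `doc` would be the modified document (compressed sentences plus the unchanged low-similarity ones), and `k` would control the summary length.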
- M Tech Dissertations