Publication:
Design and analysis of microblog-based summarization system

dc.contributor.affiliationDA-IICT, Gandhinagar
dc.contributor.authorModha, Sandip
dc.contributor.authorMajumder, Prasenjit
dc.contributor.authorMandl, Thomas
dc.contributor.authorSingla, Rishab
dc.contributor.researcherModha, Sandip (201221001)
dc.date.accessioned2025-08-01T13:09:15Z
dc.date.issued02-11-2021
dc.description.abstractA daily summary or digest from microblogs allows social media users to stay up to date on what happened today on their favorite topic. Summarizing microblogs is a non-trivial task. This paper presents a summarization system built over the Twitter stream to summarize a topic for a given duration. Tweet ranking is the primary task in designing a microblog-based summarization system. After ranking, selecting the relevant tweets is the crucial task for any summarization system because of the massive volume of tweets in the Twitter stream. In addition, the summarization system should include novel tweets in the summary or digest. The measure of relevance is typically the similarity score obtained from different text similarity algorithms, which measure the similarity between the user's information need and each tweet: the more similar the tweet, the higher the score. We therefore need to choose a threshold on this score that minimizes false-positive judgments. In this paper, we propose novel threshold estimation methods to find optimal values for these thresholds and evaluate them against thresholds determined via grid search. According to the results, these methods estimate the thresholds with reasonable accuracy. Previous research has set these thresholds empirically and heuristically, whereas our work suggests a method that exploits statistical features of the ranked list to estimate them. We used language models to rank the tweets and to select relevant tweets. For any language model, the choice of smoothing technique and its parameters is critical. The results are also compared with the standard probabilistic ranking algorithm BM25. Learning-to-rank strategies are also implemented and show substantial improvement on some of the result metrics. Experiments were performed on standard benchmarks: the TREC Microblog 2015, TREC RTS 2016, and TREC RTS 2017 datasets. Different variants of normalized discounted cumulative gain (nDCG), the standard official evaluation metric of TREC, namely nDCG-1, nDCG-0, and nDCG-p, are used in this study. We also performed a comprehensive failure analysis of our experiments and identified key issues that can be addressed in the future.
dc.identifier.citationSandip Modha, Prasenjit Majumder, Thomas Mandl & Rishab Singla, "Design and analysis of microblog-based summarization system," Social Network Analysis and Mining, Springer, vol. 11, issue 1, Article No. 114, ISSN: 1869-5450, 2021, doi: 10.1007/s13278-021-00830-3.
dc.identifier.doi10.1007/s13278-021-00830-3
dc.identifier.issn1869-5469
dc.identifier.scopus2-s2.0-85118751698
dc.identifier.urihttps://ir.daiict.ac.in/handle/dau.ir/1775
dc.identifier.wosWOS:000714035300001
dc.language.isoen
dc.publisherSpringer
dc.relation.ispartofseriesVol. 11; No. 1
dc.source Social Network Analysis and Mining
dc.source.urihttps://link.springer.com/article/10.1007/s13278-021-00830-3
dc.titleDesign and analysis of microblog-based summarization system
dspace.entity.typePublication
relation.isAuthorOfPublication2157d717-1c67-4d71-b314-ed3eddebf251
relation.isAuthorOfPublication.latestForDiscovery2157d717-1c67-4d71-b314-ed3eddebf251
