Design and analysis of microblog-based summarization system

Modha, Sandip; Majumder, Prasenjit; Mandl, Thomas; Singla, Rishab


dc.contributor.affiliation	DA-IICT, Gandhinagar
dc.contributor.author	Modha, Sandip
dc.contributor.author	Majumder, Prasenjit
dc.contributor.author	Mandl, Thomas
dc.contributor.author	Singla, Rishab
dc.contributor.researcher	Modha, Sandip (201221001)
dc.date.accessioned	2025-08-01T13:09:15Z
dc.date.issued	02-11-2021
dc.description.abstract	A daily summary or digest from microblogs allows social media users to stay up to date on what happened that day on their favorite topics. Summarizing microblogs is a non-trivial task. This paper presents a summarization system built over the Twitter stream to summarize a topic for a given duration. Tweet ranking is the primary task in designing a microblog-based summarization system. After ranking, selecting the relevant tweets is the crucial task for any summarization system because of the massive volume of tweets in the Twitter stream. In addition, the summarization system should include novel tweets in the summary or digest. The measure of relevance is typically the similarity score obtained from different text similarity algorithms, which measure the similarity between the user's information need and each tweet: the more similar the tweet, the higher the score. A threshold on this score must therefore be chosen so as to minimize false-positive judgments. In this paper, we propose novel threshold estimation methods to find optimal values for these thresholds and evaluate them against thresholds determined via grid search. According to the results, these methods estimate the thresholds with reasonable accuracy. Previous research has set these thresholds empirically and heuristically; our work suggests a method that exploits statistical features of the ranked list to estimate them. We used language models to rank the tweets and to select relevant tweets. For any language model, the choice of smoothing technique and its parameters is critical. The results are also compared with the standard probabilistic ranking algorithm BM25. Learning-to-rank strategies are also implemented and show substantial improvement on some of the result metrics. Experiments were performed on standard benchmarks: the TREC Microblog 2015, TREC RTS 2016, and TREC RTS 2017 datasets. Different variants of normalized discounted cumulative gain (nDCG), the standard official evaluation metric of TREC, are used in this study: nDCG-1, nDCG-0, and nDCG-p. We also performed a comprehensive failure analysis of our experiments and identified key issues that can be addressed in the future.
dc.identifier.citation	Sandip Modha, Prasenjit Majumder, Thomas Mandl & Rishab Singla, "Design and analysis of microblog-based summarization system," Social Network Analysis and Mining, Springer, vol. 11, issue 1, Article No. 114, ISSN: 1869-5450, 2021, doi: 10.1007/s13278-021-00830-3.
dc.identifier.doi	10.1007/s13278-021-00830-3
dc.identifier.issn	1869-5469
dc.identifier.scopus	2-s2.0-85118751698
dc.identifier.uri	https://ir.daiict.ac.in/handle/dau.ir/1775
dc.identifier.wos	WOS:000714035300001
dc.language.iso	en
dc.publisher	Springer
dc.relation.ispartofseries	Vol. 11; No. 1
dc.source	Social Network Analysis and Mining
dc.source.uri	https://link.springer.com/article/10.1007/s13278-021-00830-3
dc.title	Design and analysis of microblog-based summarization system
dspace.entity.type	Publication
relation.isAuthorOfPublication	2157d717-1c67-4d71-b314-ed3eddebf251
relation.isAuthorOfPublication.latestForDiscovery	2157d717-1c67-4d71-b314-ed3eddebf251

Collections

Journal Article
