Publication: Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments
dc.contributor.affiliation | DA-IICT, Gandhinagar | |
dc.contributor.author | Madhu, Hiren | |
dc.contributor.author | Satapara, Shrey | |
dc.contributor.author | Modha, Sandip | |
dc.contributor.author | Mandl, Thomas | |
dc.contributor.author | Majumder, Prasenjit | |
dc.contributor.researcher | Satapara, Shrey (202111005) | |
dc.date.accessioned | 2025-08-01T13:09:15Z | |
dc.date.issued | 01-04-2023 | |
dc.description.abstract | The spread of Hate Speech on online platforms is a severe issue for societies and requires the identification of offensive content by platforms. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of the message only. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversational-based Hate Speech classification with an approach for collecting context from long conversations for code-mixed Hindi (ICHCL dataset). Overall, our benchmark experiments show that the inclusion of context can improve classification performance over a baseline. Furthermore, we develop a novel processing pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier. This pipeline achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another KNN, SentBERT, and ABC weighting-based pipeline yields an F1 Macro of 0.807, which gives the best results among traditional classifiers. So even a KNN model gives better results with an optimized BERT than a vanilla BERT model. | |
dc.format.extent | 1-16 | |
dc.identifier.citation | Hiren Madhu, Shrey Satapara, Sandip Modha, Thomas Mandl, Prasenjit Majumder, "Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments," Expert Systems with Applications, Elsevier, ISSN: 0957-4174, vol. 215, Article no. 119342, pp. 1-16, 1 Apr. 2023, doi: 10.1016/j.eswa.2022.119342. [Published date: 25 Nov. 2022] | |
dc.identifier.doi | 10.1016/j.eswa.2022.119342 | |
dc.identifier.issn | 0957-4174 | |
dc.identifier.scopus | 2-s2.0-85145576108 | |
dc.identifier.uri | https://ir.daiict.ac.in/handle/dau.ir/1777 | |
dc.identifier.wos | WOS:000895345700005 | |
dc.language.iso | en | |
dc.publisher | Elsevier | |
dc.relation.ispartofseries | Vol. 215; No. | |
dc.source | Expert Systems with Applications | |
dc.source.uri | https://www.sciencedirect.com/science/article/pii/S0957417422023600?via%3Dihub | |
dc.title | Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 2157d717-1c67-4d71-b314-ed3eddebf251 | |
relation.isAuthorOfPublication.latestForDiscovery | 2157d717-1c67-4d71-b314-ed3eddebf251 |