Publication:
Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments

dc.contributor.affiliationDA-IICT, Gandhinagar
dc.contributor.authorMadhu, Hiren
dc.contributor.authorSatapara, Shrey
dc.contributor.authorModha, Sandip
dc.contributor.authorMandl, Thomas
dc.contributor.authorMajumder, Prasenjit
dc.contributor.researcherSatapara, Shrey (202111005)
dc.date.accessioned2025-08-01T13:09:15Z
dc.date.issued01-04-2023
dc.description.abstractThe spread of Hate Speech on online platforms is a severe issue for societies and requires platforms to identify offensive content. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of the message only. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversation-based Hate Speech classification with an approach for collecting context from long conversations for code-mixed Hindi (ICHCL dataset). Overall, our benchmark experiments show that the inclusion of context can improve classification performance over a baseline. Furthermore, we develop a novel pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier. This pipeline achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another pipeline, based on KNN, SentBERT, and ABC weighting, yields a macro F1 score of 0.807, the best result among traditional classifiers. Thus, even a KNN model paired with an optimized BERT gives better results than a vanilla BERT model.
dc.format.extent1-16
dc.identifier.citationHiren Madhu, Shrey Satapara, Sandip Modha, Thomas Mandl, Prasenjit Majumder, "Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments," Expert Systems with Applications, Elsevier, ISSN: 0957-4174, vol. 215, Article no. 119342, pp. 1-16, 1 Apr. 2023, doi: 10.1016/j.eswa.2022.119342. [Published date: 25 Nov. 2022]
dc.identifier.doi10.1016/j.eswa.2022.119342
dc.identifier.issn0957-4174
dc.identifier.scopus2-s2.0-85145576108
dc.identifier.urihttps://ir.daiict.ac.in/handle/dau.ir/1777
dc.identifier.wosWOS:000895345700005
dc.language.isoen
dc.publisherElsevier
dc.relation.ispartofseriesVol. 215; No.
dc.sourceExpert Systems with Applications
dc.source.urihttps://www.sciencedirect.com/science/article/pii/S0957417422023600?via%3Dihub
dc.titleDetecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments
dspace.entity.typePublication
relation.isAuthorOfPublication2157d717-1c67-4d71-b314-ed3eddebf251
relation.isAuthorOfPublication.latestForDiscovery2157d717-1c67-4d71-b314-ed3eddebf251
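The abstract's core idea, classifying a post jointly with the tweets that precede it in a conversation, and the macro F1 metric it reports can be illustrated with a minimal sketch. It assumes each conversation is stored as posts with parent pointers (source tweet, then comment, then reply); this layout is an assumption, not the documented ICHCL schema. `Post`, `build_context`, `joint_input`, the `" [SEP] "` separator, and the `HOF`/`NOT` labels are hypothetical. In the described pipelines the joined string would be passed to an encoder such as a fine-tuned SentBERT; that step is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Post:
    """One node of a conversation: a source tweet, a comment, or a reply (hypothetical layout)."""
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None marks the source tweet


def build_context(post_id: str, posts: Dict[str, Post]) -> List[str]:
    """Collect ancestor texts, ordered from the source tweet down to the direct parent."""
    chain: List[str] = []
    current = posts[post_id].parent_id
    while current is not None:
        chain.append(posts[current].text)
        current = posts[current].parent_id
    return chain[::-1]  # walked upward, so reverse to get top-down order


def joint_input(post_id: str, posts: Dict[str, Post], sep: str = " [SEP] ") -> str:
    """Join the context and the current post into one string for an encoder.

    The " [SEP] " separator is an illustrative assumption, not the paper's format.
    """
    return sep.join(build_context(post_id, posts) + [posts[post_id].text])


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    posts = {
        "t1": Post("t1", "source tweet"),
        "c1": Post("c1", "a comment", "t1"),
        "r1": Post("r1", "a reply", "c1"),
    }
    print(joint_input("r1", posts))  # source tweet [SEP] a comment [SEP] a reply
    print(round(macro_f1(["HOF", "NOT", "HOF", "NOT"],
                         ["HOF", "NOT", "NOT", "NOT"]), 4))  # 0.7333
```

Macro F1 averages the per-class scores without weighting by class frequency, so a minority offensive class counts as much as the majority class when comparing pipelines.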
