Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments

Madhu, Hiren; Satapara, Shrey; Modha, Sandip; Mandl, Thomas; Majumder, Prasenjit

Date

1 Apr. 2023

Publisher

Elsevier

Abstract

The spread of Hate Speech on online platforms is a severe issue for societies and requires the identification of offensive content by platforms. Research has modeled Hate Speech recognition as a text classification problem that predicts the class of a message based on the text of the message only. However, context plays a huge role in communication. In particular, for short messages, the text of the preceding tweets can completely change the interpretation of a message within a discourse. This work extends previous efforts to classify Hate Speech by considering the current and previous tweets jointly. In particular, we introduce a clearly defined way of extracting context. We present the development of the first dataset for conversational-based Hate Speech classification with an approach for collecting context from long conversations for code-mixed Hindi (ICHCL dataset). Overall, our benchmark experiments show that the inclusion of context can improve classification performance over a baseline. Furthermore, we develop a novel processing pipeline for processing the context. The best-performing pipeline uses a fine-tuned SentBERT paired with an LSTM as a classifier. This pipeline achieves a macro F1 score of 0.892 on the ICHCL test dataset. Another KNN, SentBERT, and ABC weighting-based pipeline yields an F1 Macro of 0.807, which gives the best results among traditional classifiers. So even a KNN model gives better results with an optimized BERT than a vanilla BERT model.
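
The abstract describes two ideas: classifying a message together with the messages that precede it, and reporting results as macro F1. The sketch below illustrates both under explicit assumptions. The abstract does not give the authors' exact context-extraction procedure, so the code is only a hypothetical illustration. It assumes that each message stores the id of the message it replies to, and that the context of a message is its full chain of ancestors back to the source tweet. The names `Message`, `build_context_input` and `macro_f1`, the `" [SEP] "` separator, and the `HOF`/`NOT` labels are placeholders chosen for this example. The macro F1 function follows the standard definition, the unweighted mean of per-class F1 scores. The code does not reproduce the SentBERT/LSTM or KNN pipelines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    # Hypothetical record for one tweet, comment, or reply in a conversation.
    msg_id: str
    text: str
    parent_id: Optional[str] = None  # None marks the source tweet


def build_context_input(target_id: str, messages: Dict[str, Message],
                        sep: str = " [SEP] ") -> str:
    """Join the target's ancestors (oldest first) with the target's own text.

    Assumption: context = every ancestor up to the source tweet.
    """
    chain: List[str] = []
    current: Optional[str] = target_id
    while current is not None:
        msg = messages[current]
        chain.append(msg.text)
        current = msg.parent_id
    # The walk collected target -> root, so reverse it to read root -> target.
    return sep.join(reversed(chain))


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Usage with a toy three-level thread: source tweet -> comment -> reply.
msgs = {
    "t1": Message("t1", "source tweet"),
    "c1": Message("c1", "a comment", parent_id="t1"),
    "r1": Message("r1", "a reply", parent_id="c1"),
}
print(build_context_input("r1", msgs))  # source tweet [SEP] a comment [SEP] a reply
print(macro_f1(["HOF", "NOT", "HOF", "NOT"],
               ["HOF", "HOF", "HOF", "NOT"]))  # about 0.733
```

Macro F1 gives each class equal weight regardless of its frequency. This is why it is a common choice when offensive and non-offensive messages are imbalanced.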

Citation

Hiren Madhu, Shrey Satapara, Sandip Modha, Thomas Mandl, Prasenjit Majumder, "Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments," Expert Systems with Applications, Elsevier, ISSN: 0957-4174, vol. 215, Article no. 119342, pp. 1-16, 1 Apr. 2023, doi: 10.1016/j.eswa.2022.119342. [Published date: 25 Nov. 2022]

URI

https://ir.daiict.ac.in/handle/dau.ir/1777

Collections

Journal Article
