Coarse and Fine-Grained Conversational Hate Speech and Offensive Content Identification in Code-Mixed Languages using Fine-Tuned Multilingual Embedding


Abstract

Hateful and offensive tweets and comments are increasing on social media platforms such as Facebook and Twitter, and they affect our social lives. As a result, there is a growing need to identify online posts that violate accepted norms. For resource-rich languages such as English, the problem of identifying hateful and offensive posts has been well investigated. However, it remains unexplored for languages with limited resources, such as Marathi. Code-mixing occurs frequently on social media, so identifying conversational hate and offensive posts and comments in code-mixed languages is also challenging and unexplored. For three different tasks of the HASOC 2022 shared task, we proposed approaches for recognizing offensive language on Twitter in Marathi and two code-mixed languages (Hinglish and German). Some tasks can be expressed as binary classification (coarse-grained: deciding whether hate or offensive content is present or absent), while others can be expressed as multi-class classification (fine-grained: further categorizing hate and offensive tweets as Standalone Hate or Contextual Hate). We concatenate the parent, comment, and reply texts to create a dataset with additional context. We use multilingual Bidirectional Encoder Representations from Transformers (mBERT), a pre-trained model, to obtain contextual representations of tweets. We carried out several trials using various pre-processing methods and pre-trained models. Finally, the highest-scoring models were used for our submissions to the competition, in which our team (irlab@iitbhu) ranked second out of 14, seventh out of 11, sixth out of 10, fourth out of 7, and fifth out of 6 for ICHCL task 1, ICHCL task 2, Marathi subtask 3A, subtask 3B, and subtask 3C, respectively.
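
The context-building step mentioned in the abstract (concatenating parent, comment, and reply) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' code: the `build_context` helper, the field names `parent`, `comment`, and `reply`, and the `[SEP]` separator string are all hypothetical, since the abstract does not specify how the turns are joined.

```python
from typing import Optional

# Assumed separator; the abstract does not say which one (if any) was used.
SEP = " [SEP] "


def build_context(parent: str,
                  comment: Optional[str] = None,
                  reply: Optional[str] = None) -> str:
    """Join the available turns of a thread, oldest first, into one string."""
    # Skip turns that are missing or empty so short threads still work.
    turns = [t.strip() for t in (parent, comment, reply) if t and t.strip()]
    return SEP.join(turns)


# Hypothetical threads of different depths.
examples = [
    {"parent": "original tweet", "comment": "a comment on it",
     "reply": "a reply to the comment"},
    {"parent": "original tweet", "comment": "a comment on it", "reply": None},
    {"parent": "original tweet", "comment": None, "reply": None},
]
texts = [build_context(**ex) for ex in examples]
```

Each resulting string could then be passed to a multilingual tokenizer and classifier. Whether the turns are best joined with a separator token, encoded as a sentence pair, or truncated from the oldest turn is a design choice the abstract leaves open.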

Publication
Forum for Information Retrieval Evaluation

Supriya Chanda
Research Scholar (2018-2024)

Supriya Chanda (pronounced Supriyo) completed his Ph.D. in the Department of Computer Science and Engineering, IIT (BHU), Varanasi. He carried out his research under the guidance of Dr. Sukomal Pal at the Information Retrieval Lab.