Crossing Borders: Multilingual Hate Speech Detection


Abstract

With technology use growing rapidly, particularly among younger generations, the prevalence of hate speech on the internet has become an urgent global concern. This paper addresses that need with an investigation of three hate speech detection tasks across a diverse set of languages. The first task is hate and offensive speech classification in Gujarati and Sinhala at the sentence level. The second task is fine-grained BIO tagging, which marks where hate speech occurs within a sentence. The third task is hate speech classification in Bengali, Bodo, and Assamese using social media data, labelling content as hateful or not. Using deep learning techniques tailored to each language’s characteristics, this research contributes to the development of robust and culturally sensitive hate speech detection systems, which are needed to make online spaces safer and to foster cross-cultural understanding.

Publication
Forum for Information Retrieval Evaluation


Supriya Chanda
Research Scholar (2018-2024)

Supriya Chanda (pronounced Supriyo) completed his Ph.D. in the Department of Computer Science and Engineering, IIT (BHU), Varanasi. He conducted his research under the guidance of Dr. Sukomal Pal at the Information Retrieval Lab.