Code-Mixed Information Retrieval


Code-mixing, the mixing of lexical items and grammatical features from multiple languages within a single sentence, is prevalent worldwide. With the rise of online social networking, many users converse in their native languages using foreign scripts. In India, people often write their native languages in Roman script on social media. This is especially true of migrants, who form online communities to share information and experiences relevant to their needs. For example, Bengali speakers from West Bengal who migrate to cities like Delhi or Bangalore create groups such as “Bengali in Delhi” on platforms such as Facebook and WhatsApp. They seek advice on various local issues, and these groups became crucial during the COVID-19 pandemic for sharing experiences and navigating frequently changing government guidelines. These conversations typically involve code-mixed text, with users employing informal, colloquial language, often transliterated into Roman script. This informal, unstandardized writing makes it difficult to identify and highlight relevant answers within these discussions, particularly for those seeking similar information later. Our task aims to develop a mechanism to pinpoint the most relevant answers in these code-mixed conversations. The focus is on Roman-transliterated Bengali mixed with English.

Supriya Chanda
Research Scholar (2018-2024)

Supriya Chanda (pronounced Supriyo) completed his Ph.D. in the Department of Computer Science and Engineering, IIT (BHU), Varanasi. He carried out his research under the guidance of Dr. Sukomal Pal at the Information Retrieval Lab.