Code-Mixed Sentiment Analysis

Sentiment analysis on code-mixed text is essential for understanding user opinions, particularly in regions where people frequently switch between languages in conversation. Traditional sentiment analysis models struggle with such data because of the complexity introduced by mixing languages and the lack of large labeled datasets. This project aimed to develop a robust model for classifying the sentiment (positive, negative, or neutral) of code-mixed social media text. A dataset was collected from social media platforms such as Facebook and Twitter; preprocessing included tokenization, language identification, and normalization of the code-mixed text. The hybrid model combined traditional machine learning algorithms such as Support Vector Machines (SVM) with deep learning techniques such as Bidirectional Long Short-Term Memory (BiLSTM) networks. The project also used multilingual embeddings to capture the semantics of words across languages. A key challenge was handling context-sensitive elements such as sarcasm and slang. Fine-tuning transformer models such as BERT and multilingual BERT improved accuracy, significantly enhancing sentiment classification performance on code-mixed data.
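
The preprocessing steps named above (tokenization, normalization, and language identification) can be illustrated with a minimal sketch. This is not the project's actual pipeline: the Hindi-English language pair, the tiny `HINDI_HINTS` lexicon, the function names, and the rule of squeezing repeated characters are all assumptions made for illustration. A real system would likely use a trained token-level language identifier.

```python
import re

# Hypothetical mini-lexicon of romanized Hindi words (illustration only).
HINDI_HINTS = {"bahut", "accha", "hai", "nahi", "kya", "yaar"}

# URLs, @mentions and #hashtags stay whole; other punctuation is split off.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+|[^\w\s]")


def tokenize(text):
    """Split raw social media text into tokens."""
    return TOKEN_RE.findall(text)


def normalize(token):
    """Lowercase and squeeze any character repeated 3+ times to one."""
    return re.sub(r"(.)\1{2,}", r"\1", token.lower())


def tag_language(token):
    """Naive lexicon lookup: 'hi', 'en', or 'other' for non-words."""
    if token.startswith(("http://", "https://", "@", "#")):
        return "other"
    if not token.isalpha():
        return "other"
    return "hi" if token in HINDI_HINTS else "en"


def preprocess(text):
    """Return (normalized_token, language_tag) pairs for one post."""
    tokens = [normalize(tok) for tok in tokenize(text)]
    return [(tok, tag_language(tok)) for tok in tokens]


if __name__ == "__main__":
    for pair in preprocess("Movie bahut acchaaaa hai!!! #love"):
        print(pair)
```

Squeezing elongated spellings before the lexicon lookup lets a form like "acchaaaa" match its base entry, at the cost of occasionally over-shortening legitimate words. The resulting language tags could then feed a downstream classifier as extra features.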

Supriya Chanda
Research Scholar (2018-2024)

Supriya Chanda (pronounced Supriyo) completed his Ph.D. in the Department of Computer Science and Engineering, IIT (BHU), Varanasi. He conducted his research under the guidance of Dr. Sukomal Pal at the Information Retrieval Lab.