EmoSense is a multi-modal emotion detection system that combines Automatic Speech Recognition (ASR), Text Emotion Analysis, Speech Emotion Recognition (SER), and Facial Emotion Detection. It analyzes emotions expressed through speech, text, and facial expressions, and is intended to support personalized experiences, tailored interventions, and analytics across various domains.
The goal is to build novel models for each emotion detection task.
Emotions are integral to human communication and interactions, yet accurately detecting and interpreting them presents significant challenges. Existing emotion detection systems often rely on single modalities, such as text or speech, leading to limited accuracy and depth of analysis. Inconsistent or inaccurate emotion detection can hinder personalized user experiences, effective mental health assessments, and interactive technologies.
Problems addressed:
- Limited accuracy and depth of emotion analysis with single-modal systems.
- Challenges in understanding emotions expressed through speech, text, and facial expressions.
- Inconsistent and inaccurate emotion detection hindering personalized experiences and effective assessments.
EmoSense is a multi-modal emotion detection system designed to analyze and interpret human emotions through several channels. By integrating Automatic Speech Recognition (ASR), Text Emotion Analysis, Speech Emotion Recognition (SER), and Facial Emotion Detection, EmoSense covers emotional expression in speech, text, and facial cues. The project targets applications in healthcare, education, customer service, and entertainment, where emotion-aware systems can tailor experiences and improve interactions.
- Automatic Speech Recognition - Whisper Large v3
- Text Emotion Analysis - Fine-tuned RoBERTa for Sequence Classification
- Face Detection - Fine-tuned YOLOv8
- Face Emotion Detection - Fine-tuned VGG19
- Speech Emotion Recognition - Novel SER model as described here
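The components listed above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: every name in it (`Segment`, `EmoSenseResult`, `run_pipeline`, and the five callables) is hypothetical, and the models themselves are passed in as plain functions so that the flow is visible without loading Whisper, RoBERTa, YOLOv8, VGG19, or the SER model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Segment:
    """One transcribed span of audio (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class EmoSenseResult:
    """Per-modality outputs, kept separate (hypothetical structure)."""
    labeled_transcript: List[Tuple[Segment, str]] = field(default_factory=list)
    speech_emotions: List[Tuple[Segment, str]] = field(default_factory=list)
    face_emotions: List[Tuple[int, str]] = field(default_factory=list)  # (frame index, label)


def run_pipeline(
    audio: Any,
    frames: Iterable[Any],
    transcribe: Callable[[Any], List[Segment]],          # ASR stage
    classify_text: Callable[[str], str],                 # text emotion stage
    classify_speech: Callable[[Any, float, float], str], # SER stage
    detect_faces: Callable[[Any], List[Any]],            # face detection stage
    classify_face: Callable[[Any], str],                 # face emotion stage
) -> EmoSenseResult:
    result = EmoSenseResult()

    # Audio branch: transcribe, then label each segment by its text and by its sound.
    for seg in transcribe(audio):
        result.labeled_transcript.append((seg, classify_text(seg.text)))
        result.speech_emotions.append((seg, classify_speech(audio, seg.start, seg.end)))

    # Video branch: detect faces in each frame, then label every detected face.
    for index, frame in enumerate(frames):
        for face in detect_faces(frame):
            result.face_emotions.append((index, classify_face(face)))

    return result
```

Keeping the three outputs separate mirrors the way the results are reported per input: one series of face emotions, one series of speech emotions, and a labeled transcript. Whether and how the modalities are fused is not specified here, so the sketch does not attempt it.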
Here are the results of EmoSense for 5 videos sourced from YouTube:
Input 1 : Dramatic Film Monologue : The Society
Graphs: Face Emotions | Speech Emotions
Labeled Transcript: Transcript can be found here
Graphs: Face Emotions | Speech Emotions
Labeled Transcript: Transcript can be found here
Graphs: Face Emotions | Speech Emotions
Labeled Transcript: Transcript can be found here
Input 4 : Closeup monologue from The Women
Graphs: Face Emotions | Speech Emotions
Labeled Transcript: Transcript can be found here
Graphs: Face Emotions | Speech Emotions
Labeled Transcript: Transcript can be found here