TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG
2510.14922v1
cs.AI, cs.CL, cs.LG, eess.AS, eess.SP
2025-10-18
Авторы:
Annisaa Fitri Nurfidausi, Eleonora Mancini, Paolo Torroni
Abstract
Depression is a widespread mental health disorder, yet its automatic
detection remains challenging. Prior work has explored unimodal and multimodal
approaches, with multimodal systems showing promise by leveraging complementary
signals. However, existing studies are limited in scope, lack systematic
comparisons of features, and suffer from inconsistent evaluation protocols. We
address these gaps by systematically exploring feature representations and
modelling strategies across EEG, together with speech and text. We evaluate
handcrafted features versus pre-trained embeddings, assess the effectiveness of
different neural encoders, compare unimodal, bimodal, and trimodal
configurations, and analyse fusion strategies with attention to the role of
EEG. Consistent subject-independent splits are applied to ensure robust,
reproducible benchmarking. Our results show that (i) the combination of EEG,
speech and text modalities enhances multimodal detection, (ii) pretrained
embeddings outperform handcrafted features, and (iii) carefully designed
trimodal models achieve state-of-the-art performance. Our work lays the
groundwork for future research in multimodal depression detection.