CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic
2511.03102v1
cs.CL, cs.AI
2025-11-07
Авторы:
Saad Mankarious, Ayah Zirikly
Abstract
Mental health disorders affect millions worldwide, yet early detection
remains a major challenge, particularly for Arabic-speaking populations where
resources are limited and mental health discourse is often discouraged due to
cultural stigma. While substantial research has focused on English-language
mental health detection, Arabic remains significantly underexplored, partly due
to the scarcity of annotated datasets. We present CARMA, the first
automatically annotated large-scale dataset of Arabic Reddit posts. The dataset
encompasses six mental health conditions, such as Anxiety, Autism, and
Depression, and a control group. CARMA surpasses existing resources in both
scale and diversity. We conduct qualitative and quantitative analyses of
lexical and semantic differences between users, providing insights into the
linguistic markers of specific mental health conditions. To demonstrate the
dataset's potential for further mental health analysis, we perform
classification experiments using a range of models, from shallow classifiers to
large language models. Our results highlight the promise of advancing mental
health detection in underrepresented languages such as Arabic.
Ссылки и действия
Дополнительные ресурсы: