Causality Guided Representation Learning for Cross-Style Hate Speech Detection
2510.07707v1
cs.CL, cs.AI, cs.LG
2025-10-11
Авторы:
Chengshuai Zhao, Shu Wan, Paras Sheth, Karan Patwa, K. Selçuk Candan, Huan Liu
Abstract
The proliferation of online hate speech poses a significant threat to the
harmony of the web. While explicit hate is easily recognized through overt
slurs, implicit hate speech is often conveyed through sarcasm, irony,
stereotypes, or coded language -- making it harder to detect. Existing hate
speech detection models, which predominantly rely on surface-level linguistic
cues, fail to generalize effectively across diverse stylistic variations.
Moreover, hate speech spread on different platforms often targets distinct
groups and adopts unique styles, potentially inducing spurious correlations
between them and labels, further challenging current detection approaches.
Motivated by these observations, we hypothesize that the generation of hate
speech can be modeled as a causal graph involving key factors: contextual
environment, creator motivation, target, and style. Guided by this graph, we
propose CADET, a causal representation learning framework that disentangles
hate speech into interpretable latent factors and then controls confounders,
thereby isolating genuine hate intent from superficial linguistic cues.
Furthermore, CADET allows counterfactual reasoning by intervening on style
within the latent space, naturally guiding the model to robustly identify hate
speech in varying forms. CADET demonstrates superior performance in
comprehensive experiments, highlighting the potential of causal priors in
advancing generalizable hate speech detection.
Ссылки и действия
Дополнительные ресурсы: