Quantifying Feature Importance for Online Content Moderation
2510.19882v1
cs.CY, cs.AI, cs.LG, cs.SI
2025-10-25
Authors:
Benedetta Tessa, Alejandro Moreo, Stefano Cresci, Tiziano Fagni, Fabrizio Sebastiani
Abstract
Accurately estimating how users respond to moderation interventions is
paramount for developing effective and user-centred moderation strategies.
However, this requires a clear understanding of which user characteristics are
associated with different behavioural responses, which is the goal of this
work. We investigate the informativeness of 753 socio-behavioural, linguistic,
relational, and psychological features in predicting the behavioural changes
of 16.8K users affected by a major moderation intervention on Reddit. To reach
this goal, we frame the problem in terms of "quantification", a task
well-suited to estimating shifts in aggregate user behaviour. We then apply a
greedy feature selection strategy with the twofold goal of (i) identifying the
features that are most predictive of changes in user activity, toxicity, and
participation diversity, and (ii) estimating their importance. Our results
identify a small set of features that are consistently informative
across all tasks, and show that many others are either task-specific or
of limited utility altogether. We also find that predictive performance varies
according to the task, with changes in activity and toxicity being easier to
estimate than changes in diversity. Overall, our results pave the way for the
development of accurate systems that predict user reactions to moderation
interventions. Furthermore, our findings highlight the complexity of
post-moderation user behaviour, and indicate that effective moderation should
be tailored not only to user traits but also to the specific objective of the
intervention.
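
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels, uses two standard quantification baselines (classify-and-count and its adjusted variant), and drives a generic greedy forward selection with a caller-supplied error function. The feature names and the `toy_error` function are hypothetical stand-ins; in practice the error function would train a quantifier on the chosen feature subset and return its prevalence-estimation error on held-out data.

```python
from typing import Callable, List, Sequence, Tuple


def classify_and_count(predictions: Sequence[int]) -> float:
    """Estimate positive-class prevalence as the share of positive predictions."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(
    predictions: Sequence[int], tpr: float, fpr: float
) -> float:
    """Correct the raw count with the classifier's true/false positive rates.

    tpr and fpr are assumed to have been estimated on held-out labelled data.
    """
    raw = classify_and_count(predictions)
    if tpr <= fpr:  # correction undefined or unreliable; fall back to raw count
        return raw
    return min(1.0, max(0.0, (raw - fpr) / (tpr - fpr)))


def greedy_forward_selection(
    candidates: Sequence[str],
    error_fn: Callable[[List[str]], float],
    max_features: int,
) -> List[Tuple[str, float]]:
    """Add, one at a time, the feature that lowers the error the most.

    Returns the selected features in order, each paired with the error
    reached after adding it. Stops when no candidate improves the error.
    """
    selected: List[str] = []
    history: List[Tuple[str, float]] = []
    remaining = list(candidates)
    best_error = float("inf")
    while remaining and len(selected) < max_features:
        # Score every remaining feature when appended to the current subset.
        error, feature = min((error_fn(selected + [f]), f) for f in remaining)
        if error >= best_error:
            break
        selected.append(feature)
        remaining.remove(feature)
        history.append((feature, error))
        best_error = error
    return history


def toy_error(subset: List[str]) -> float:
    """Hypothetical error function: each feature shifts the error by a fixed amount."""
    gains = {"activity_rate": 0.5, "toxicity_score": 0.3, "random_noise": -0.1}
    return 1.0 - sum(gains[f] for f in subset)


history = greedy_forward_selection(
    ["random_noise", "toxicity_score", "activity_rate"], toy_error, max_features=3
)
```

In this sketch, the drop in error recorded at each step of `history` can serve as a rough proxy for a feature's importance; whether the paper measures importance this way is not stated in the abstract.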