Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
2510.06350v1
cs.CY, cs.AI, cs.CL, cs.HC, cs.LG
2025-10-10
Authors:
Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke
Abstract
Online communities rely on a mix of platform policies and community-authored
rules to define acceptable behavior and maintain order. However, these rules
vary widely across communities, evolve over time, and are enforced
inconsistently, posing challenges for transparency, governance, and automation.
In this paper, we model the relationship between rules and their enforcement at
scale, introducing ModQ, a novel question-answering framework for
rule-sensitive content moderation. Unlike prior classification or
generation-based approaches, ModQ conditions on the full set of community rules
at inference time and identifies which rule best applies to a given comment. We
implement two model variants, extractive and multiple-choice QA, and train
them on large-scale datasets from Reddit and Lemmy, the latter of which we
construct from publicly available moderation logs and rule descriptions. Both
models outperform state-of-the-art baselines in identifying moderation-relevant
rule violations, while remaining lightweight and interpretable. Notably, ModQ
models generalize effectively to unseen communities and rules, supporting
low-resource moderation settings and dynamic governance environments.
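
To make the question-answering framing concrete, the sketch below shows one plausible way to turn a comment and a community's full rule set into inputs for the two variants named above. The abstract does not specify the input format, so everything here is an illustrative assumption rather than the authors' implementation: the function names, the newline separator, and the choice to treat the comment as the question and the concatenated rules as the context are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractiveExample:
    """Hypothetical container for one extractive-QA input."""
    question: str                      # the comment under review
    context: str                       # all community rules joined into one passage
    rule_spans: List[Tuple[int, int]]  # (start, end) character offsets of each rule


def build_extractive_example(comment: str, rules: List[str]) -> ExtractiveExample:
    # Assumption: rules are joined with a single newline, and an extractive
    # model would answer by selecting the span of the rule that applies.
    spans: List[Tuple[int, int]] = []
    cursor = 0
    for i, rule in enumerate(rules):
        if i > 0:
            cursor += 1  # account for the newline separator
        spans.append((cursor, cursor + len(rule)))
        cursor += len(rule)
    return ExtractiveExample(comment, "\n".join(rules), spans)


def build_multiple_choice_example(comment: str, rules: List[str]) -> List[Tuple[str, str]]:
    # Assumption: each rule is one answer option; a multiple-choice model
    # would score every (comment, rule) pair and pick the highest-scoring one.
    return [(comment, rule) for rule in rules]
```

Because both builders take the rule list as an argument at call time, the same code applies unchanged to a community whose rules were never seen in training, which is the property the abstract emphasizes.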