DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
2510.17489v1
cs.CL, cs.LG
2025-10-22
Авторы:
Yongxin He, Shan Zhang, Yixuan Cao, Lei Ma, Ping Luo
Abstract
Detecting AI-involved text is essential for combating misinformation,
plagiarism, and academic misconduct. However, AI text generation includes
diverse collaborative processes (AI-written text edited by humans,
human-written text edited by AI, and AI-generated text refined by other AI),
where various or even new LLMs could be involved. Texts generated through these
varied processes exhibit complex characteristics, presenting significant
challenges for detection. Current methods model these processes rather crudely,
primarily employing binary classification (purely human vs. AI-involved) or
multi-classification (treating human-AI collaboration as a new class). We
observe that representations of texts generated through different processes
exhibit inherent clustering relationships. Therefore, we propose DETree, a
novel approach that models the relationships among different processes as a
Hierarchical Affinity Tree structure, and introduces a specialized loss
function that aligns text representations with this tree. To facilitate this
learning, we developed RealBench, a comprehensive benchmark dataset that
automatically incorporates a wide spectrum of hybrid texts produced through
various human-AI collaboration processes. Our method improves performance in
hybrid text detection tasks and significantly enhances robustness and
generalization in out-of-distribution scenarios, particularly in few-shot
learning conditions, further demonstrating the promise of training-based
approaches in OOD settings. Our code and dataset are available at
https://github.com/heyongxin233/DETree.
Ссылки и действия
Дополнительные ресурсы: