Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
2510.08073v1
cs.CV, cs.LG
2025-10-11
Authors:
Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan
Abstract
AI-generated videos have achieved near-perfect visual realism (e.g., Sora),
creating an urgent need for reliable detection mechanisms. However, detecting such
videos faces significant challenges in modeling high-dimensional spatiotemporal
dynamics and identifying subtle anomalies that violate physical laws. In this
paper, we propose a physics-driven AI-generated video detection paradigm based
on probability flow conservation principles. Specifically, we propose a
statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the
ratio of spatial probability gradients to temporal density changes, explicitly
capturing deviations from natural video dynamics. Leveraging pre-trained
diffusion models, we develop an NSG estimator that combines spatial gradient
approximation with motion-aware temporal modeling, avoiding complex motion
decomposition while preserving physical constraints. Building on this, we
propose an NSG-based video detection method (NSG-VD) that computes the Maximum
Mean Discrepancy (MMD) between NSG features of the test and real videos as a
detection metric. Finally, we derive an upper bound on NSG feature distances
between real and generated videos, proving that generated videos exhibit
amplified discrepancies due to distributional shifts. Extensive experiments
confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall
and 10.75% in F1-Score. The
source code is available at https://github.com/ZSHsh98/NSG-VD.
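The detection step described in the abstract, comparing NSG features of test videos against those of real videos via MMD, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it treats NSG features as precomputed fixed-length vectors (their estimation from a diffusion model is not shown), uses a fixed-bandwidth Gaussian RBF kernel with the biased (V-statistic) estimator of squared MMD, and flags a test set when the statistic exceeds a user-chosen threshold. The names `rbf_kernel`, `mmd_squared` and `detect`, the bandwidth and the threshold are all hypothetical; the released code may use a different (possibly learned) kernel and decision rule.

```python
import math
import random
from typing import List, Sequence

# A feature vector; here assumed (hypothetically) to be a precomputed NSG feature.
Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs, including i == j terms.
    """
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def detect(test_feats: List[Vector], real_feats: List[Vector],
           threshold: float, bandwidth: float = 1.0) -> bool:
    """Hypothetical decision rule: flag the test features as generated when
    their MMD^2 to the real reference set exceeds `threshold`."""
    return mmd_squared(test_feats, real_feats, bandwidth) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for NSG features: two "real" sets and one shifted set.
    real_a = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
    real_b = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
    shifted = [[rng.gauss(1.5, 1.0) for _ in range(4)] for _ in range(30)]
    print("real vs real:   ", mmd_squared(real_a, real_b, bandwidth=2.0))
    print("shifted vs real:", mmd_squared(shifted, real_b, bandwidth=2.0))
```

With a positive-definite kernel such as the RBF, the biased estimator is the squared distance between the two empirical mean embeddings, so it is never negative and is exactly zero for identical samples; an unbiased variant drops the i == j terms at the cost of occasionally going negative. In this toy run the shifted set should yield a clearly larger statistic than the two real sets, mirroring the abstract's claim that distributional shift amplifies NSG feature discrepancies.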