Towards Safer and Understandable Driver Intention Prediction
2510.09200v1
cs.CV, cs.AI, cs.HC
2025-10-14
Авторы:
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai, Carlo Masone, C V Jawahar
Abstract
Autonomous driving (AD) systems are becoming increasingly capable of handling
complex tasks, mainly due to recent advances in deep learning and AI. As
interactions between autonomous systems and humans increase, the
interpretability of decision-making processes in driving systems becomes
increasingly crucial for ensuring safe driving operations. Successful
human-machine interaction requires understanding the underlying representations
of the environment and the driving task, which remains a significant challenge
in deep learning-based systems. To address this, we introduce the task of
interpretability in maneuver prediction before they occur for driver safety,
i.e., driver intent prediction (DIP), which plays a critical role in AD
systems. To foster research in interpretable DIP, we curate the eXplainable
Driving Action Anticipation Dataset (DAAD-X), a new multimodal, ego-centric
video dataset to provide hierarchical, high-level textual explanations as
causal reasoning for the driver's decisions. These explanations are derived
from both the driver's eye-gaze and the ego-vehicle's perspective. Next, we
propose Video Concept Bottleneck Model (VCBM), a framework that generates
spatio-temporally coherent explanations inherently, without relying on post-hoc
techniques. Finally, through extensive evaluations of the proposed VCBM on the
DAAD-X dataset, we demonstrate that transformer-based models exhibit greater
interpretability than conventional CNN-based models. Additionally, we introduce
a multilabel t-SNE visualization technique to illustrate the disentanglement
and causal correlation among multiple explanations. Our data, code and models
are available at: https://mukil07.github.io/VCBM.github.io/
Ссылки и действия
Дополнительные ресурсы: