CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation
2510.00726v1
cs.RO, cs.AI, cs.LG
2025-10-04
Authors:
Giovanni Minelli, Giulio Turrisi, Victor Barasuol, Claudio Semini
Abstract
Learning robotic manipulation policies through supervised learning from
demonstrations remains challenging when policies encounter execution variations
not explicitly covered during training. While incorporating historical context
through attention mechanisms can improve robustness, standard approaches
process all past states in a sequence without explicitly modeling the temporal
structure that demonstrations may include, such as failure and recovery
patterns. We propose a Cross-State Transition Attention Transformer that
employs a novel State Transition Attention (STA) mechanism to modulate standard
attention weights based on learned state evolution patterns, enabling policies
to better adapt their behavior based on execution history. Our approach
combines this structured attention with temporal masking during training, where
visual information is randomly removed from recent timesteps to encourage
temporal reasoning from historical context. Evaluation in simulation shows that
STA consistently outperforms standard cross-attention and temporal modeling
approaches like TCN and LSTM networks across all tasks, achieving more than 2x
improvement over cross-attention on precision-critical tasks.
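The abstract says only that STA "modulate[s] standard attention weights based on learned state evolution patterns"; it does not give the formulation. The sketch below is one plausible reading, not the authors' method. It computes ordinary scaled dot-product cross-attention from a current query over past timesteps, then rescales each weight by a sigmoid gate computed from the state transition s_j - s_{j-1}, and renormalizes. The names `transition_modulated_attention` and `w_trans` are hypothetical, and the multiplicative gate is an assumed stand-in for whatever "learned state evolution patterns" means in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def transition_modulated_attention(
    query: Vec,
    keys: Sequence[Vec],
    values: Sequence[Vec],
    states: Sequence[Vec],
    w_trans: Vec,
) -> Tuple[List[float], List[float]]:
    """Hypothetical STA-style sketch (assumption, not the paper's equations).

    Standard cross-attention weights over past timesteps are multiplied by a
    gate derived from each state transition, then renormalized to sum to 1.
    """
    d = len(query)
    base = softmax([dot(query, k) / math.sqrt(d) for k in keys])

    gates = []
    for j in range(len(states)):
        # The first timestep has no predecessor, so its transition is zero.
        prev = states[j - 1] if j > 0 else states[j]
        delta = [a - b for a, b in zip(states[j], prev)]
        gates.append(sigmoid(dot(w_trans, delta)))  # hypothetical learned gate

    modulated = [b * g for b, g in zip(base, gates)]
    z = sum(modulated)  # > 0 because every sigmoid gate is > 0
    weights = [m / z for m in modulated]

    # Weighted sum of value vectors with the modulated weights.
    out = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return out, weights
```

With `w_trans` set to zeros every gate equals 0.5, so the weights reduce to plain cross-attention; a nonzero `w_trans` shifts attention toward (or away from) timesteps where the state changed in a particular direction, which is the kind of history-dependent reweighting the abstract describes.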
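The temporal masking step ("visual information is randomly removed from recent timesteps") can be sketched as a training-time augmentation. The abstract does not specify the masking probability, how many recent steps are hidden, or how "removal" is represented; here, as assumptions, a mask is applied with probability `p_mask`, the number of hidden steps k is drawn uniformly from 1..`max_masked`, and removed features are replaced by zeros. The function name and both parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_recent_visual(
    visual: Sequence[Sequence[float]],
    p_mask: float = 0.5,
    max_masked: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[bool]]:
    """Hypothetical augmentation: with probability p_mask, zero the visual
    features of the k most recent timesteps (k uniform in 1..max_masked).

    Older timesteps are left untouched, so the policy must rely on history
    to infer what happened in the masked window.
    """
    rng = rng or random.Random()
    out = [list(v) for v in visual]  # copy so the caller's data is not mutated
    masked = [False] * len(out)
    if out and rng.random() < p_mask:
        k = rng.randint(1, min(max_masked, len(out)))  # number of steps to hide
        for t in range(len(out) - k, len(out)):
            out[t] = [0.0] * len(out[t])  # "remove" visual info by zeroing
            masked[t] = True
    return out, masked
```

Returning the boolean `masked` flags alongside the features is a design choice: a model could receive them as an extra input so it can distinguish a genuinely dark frame from a removed one.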