Self-localization on a 3D map by fusing global and local features from a monocular camera
2510.26170v1
cs.RO, cs.CV
2025-11-01
Авторы:
Satoshi Kikuch, Masaya Kato, Tsuyoshi Tasaki
Abstract
Self-localization on a 3D map by using an inexpensive monocular camera is
required to realize autonomous driving. Self-localization based on a camera
often uses a convolutional neural network (CNN) that can extract local features
that are calculated by nearby pixels. However, when dynamic obstacles, such as
people, are present, CNN does not work well. This study proposes a new method
combining CNN with Vision Transformer, which excels at extracting global
features that show the relationship of patches on whole image. Experimental
results showed that, compared to the state-of-the-art method (SOTA), the
accuracy improvement rate in a CG dataset with dynamic obstacles is 1.5 times
higher than that without dynamic obstacles. Moreover, the self-localization
error of our method is 20.1% smaller than that of SOTA on public datasets.
Additionally, our robot using our method can localize itself with 7.51cm error
on average, which is more accurate than SOTA.
Ссылки и действия
Дополнительные ресурсы: