NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection
2510.11632v1
cs.CV, cs.AI, cs.LG, I.2.6; I.2.9; I.2.10; I.4.8; I.4.10; I.5.1; I.5.4
2025-10-15
Authors:
Krittin Chaowakarn, Paramin Sangwongngam, Nang Htet Htet Aung, Chalie Charoenlarpnopparut
Abstract
Recent studies in 3D object detection for autonomous vehicles aim to enrich
features by using multi-modal setups or by extracting local patterns within
LiDAR point clouds. However, multi-modal methods face significant challenges
in feature alignment, and features extracted purely from local patterns can be
too simplistic for complex 3D object detection tasks. In this paper, we
propose a novel model, NV3D, which utilizes local features acquired from voxel
neighbors in the form of normal vectors, computed on a per-voxel basis using
K-nearest neighbors (KNN) and principal component analysis (PCA). This
informative feature enables
NV3D to determine the relationship between the surface and pertinent target
entities, including cars, pedestrians, or cyclists. During the normal vector
extraction process, NV3D offers two distinct sampling strategies, normal vector
density-based sampling and field-of-view (FOV)-aware bin-based sampling, which
allow up to 55% of the data to be eliminated while maintaining performance. In
addition, we apply element-wise attention fusion, which, like a standard
attention mechanism, takes voxel features as the query and value and normal
vector features as the key. Our method is trained on the KITTI dataset and has
demonstrated
superior performance in car and cyclist detection, owing to the distinctive
spatial shapes of these objects. On the validation set, NV3D without sampling
achieves 86.60% and 80.18% mean Average Precision (mAP) for cars and cyclists,
respectively, exceeding the baseline Voxel R-CNN by 2.61% and 4.23% mAP. With
both sampling strategies enabled, NV3D achieves 85.54% mAP in car detection,
exceeding the baseline by 1.56% mAP even though roughly 55% of voxels are
filtered out.
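The per-voxel normal estimation with KNN and PCA mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not NV3D's implementation. It assumes that each non-empty voxel is represented by the centroid of its points, that neighbors are found by brute-force KNN over those centroids, and that the normal is the eigenvector of the neighborhood covariance with the smallest eigenvalue. The helper names `voxel_centroids` and `knn_pca_normals`, the voxel size, `k`, and the convention of flipping each normal toward a viewpoint are all hypothetical choices made for illustration.

```python
import numpy as np

def voxel_centroids(points, voxel_size):
    """Return one centroid per non-empty voxel (assumed voxel representative)."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)              # flat voxel index for every point
    n_vox = inverse.max() + 1
    sums = np.zeros((n_vox, 3))
    np.add.at(sums, inverse, points)           # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n_vox)
    return sums / counts[:, None]

def knn_pca_normals(centers, k=8, viewpoint=(0.0, 0.0, 0.0)):
    """Estimate a unit normal per voxel from its k nearest centroids via PCA."""
    k = min(k, len(centers))
    # Brute-force squared distances between all centroids, shape (M, M)
    diff = centers[:, None, :] - centers[None, :, :]
    d2 = np.einsum('ijk,ijk->ij', diff, diff)
    nbr_idx = np.argsort(d2, axis=1)[:, :k]    # k nearest, including the voxel itself
    nbrs = centers[nbr_idx]                    # (M, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum('mki,mkj->mij', centered, centered) / k   # (M, 3, 3)
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    normals = eigvecs[:, :, 0].copy()          # direction of least variance
    # Resolve the sign ambiguity by pointing normals toward the viewpoint (assumption)
    to_view = np.asarray(viewpoint, dtype=float) - centers
    flip = np.einsum('ij,ij->i', normals, to_view) < 0
    normals[flip] *= -1.0
    return normals

# Usage: points sampled on the plane z = 0 should give normals close to (0, 0, 1)
xs = np.linspace(0.05, 1.95, 20)
gx, gy = np.meshgrid(xs, xs)
points = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)
centers = voxel_centroids(points, voxel_size=0.5)
normals = knn_pca_normals(centers, k=8, viewpoint=(0.0, 0.0, 10.0))
print(centers.shape)  # (16, 3)
```

The brute-force distance matrix grows quadratically with the number of voxels, so at LiDAR scale a spatial index such as a k-d tree would be the more usual way to find neighbors.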
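The abstract states only that the fusion is element-wise and that voxel features serve as the query and value while normal vector features serve as the key. One plausible reading, sketched below purely as an assumption, replaces the softmax over tokens with a per-channel sigmoid gate computed from the query-key product. The function name `elementwise_attention_fusion`, the linear projections `w_q`, `w_k`, `w_v`, the sigmoid gate, and the 1/√d scaling are hypothetical and are not taken from the paper.

```python
import numpy as np

def elementwise_attention_fusion(voxel_feat, normal_feat, w_q, w_k, w_v):
    """Hypothetical element-wise attention fusion.

    voxel_feat:  (M, C_v) per-voxel features, used for query and value
    normal_feat: (M, C_n) per-voxel normal-vector features, used for key
    w_q, w_v:    (C_v, D) projections; w_k: (C_n, D) projection
    """
    q = voxel_feat @ w_q             # queries from voxel features
    k = normal_feat @ w_k            # keys from normal-vector features
    v = voxel_feat @ w_v             # values from voxel features
    d = q.shape[-1]
    # Per-voxel, per-channel score instead of a dot product across tokens
    scores = q * k / np.sqrt(d)
    gate = 1.0 / (1.0 + np.exp(-scores))   # sigmoid keeps each weight in [0, 1]
    return gate * v                  # re-weight the values channel by channel

# Usage with random features for 5 voxels
rng = np.random.default_rng(0)
M, C_v, C_n, D = 5, 16, 3, 8
voxel_feat = rng.standard_normal((M, C_v))
normal_feat = rng.standard_normal((M, C_n))
w_q = rng.standard_normal((C_v, D))
w_k = rng.standard_normal((C_n, D))
w_v = rng.standard_normal((C_v, D))
fused = elementwise_attention_fusion(voxel_feat, normal_feat, w_q, w_k, w_v)
print(fused.shape)  # (5, 8)
```

Under this reading, the normal vectors never contribute values directly; they only modulate how strongly each channel of the voxel features passes through, which matches the query/key/value roles the abstract assigns.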
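As a quick consistency check on the reported results, the baseline Voxel R-CNN scores implied by the stated gains can be recovered by subtraction. This assumes the gains are absolute differences in mAP and that the first pair of figures refers to cars and cyclists, in that order.

```latex
\begin{aligned}
\text{Car, no sampling:}\quad & 86.60 - 2.61 = 83.99 \\
\text{Cyclist, no sampling:}\quad & 80.18 - 4.23 = 75.95 \\
\text{Car, both samplings:}\quad & 85.54 - 1.56 = 83.98
\end{aligned}
```

The two implied car baselines differ by 0.01, most plausibly due to rounding in the reported figures. The same arithmetic shows that enabling both sampling strategies costs 86.60 − 85.54 = 1.06 mAP on cars relative to NV3D without sampling.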