Adapting SAM with Dynamic Similarity Graphs for Few-Shot Parameter-Efficient Small Dense Object Detection: A Case Study of Chickpea Pods in Field Conditions
2509.25805v1
cs.CV, I.4.6; I.2.10; I.5.1; I.4.8
2025-10-02
Authors:
Xintong Jiang, Yixue Liu, Mohamed Debbagh, Yu Tian, Valerio Hoyos-Villegas, Viacheslav Adamchuk, Shangpeng Sun
Abstract
Parameter-Efficient Fine-Tuning (PEFT) of foundation models for agricultural
computer vision tasks remains challenging due to limited training data and
complex field conditions. This study introduces a Dynamic Similarity-based
Graph Adaptation (DSGA) module to adapt the Segment Anything Model (SAM) under
extreme data constraints for precise foreground and instance segmentation of
small dense objects in complex agricultural environments. Through dynamic
similarity graph construction with a learnable polynomial decay-initialized
weight ranking mechanism and adaptive local feature aggregation, DSGA
establishes robust spatial and dynamic similarity representations with only
4.00M trainable parameters, or 4.26% of the original SAM's parameter count. Integrating
this graph-based feature adaptation with Low-Rank Adaptation (LoRA) creates a
complementary optimization framework that effectively captures both local and
global dependencies in image embeddings while preserving model stability and
parameter efficiency. Experimental results on a challenging chickpea pod
dataset demonstrated that DSGA with LoRA achieved superior performance across
multiple metrics evaluated under 2, 4, 8 and 10 shots, with progressive
performance gains as shot count increased. Compared with baseline SAM
fine-tuning, DSGA with LoRA improved Structure-measure by 17.31% and adaptive
F-measure by 62.36%. Comprehensive ablation studies and
visualization analyses through Grad-CAM and t-SNE validated the framework's
effectiveness in feature discrimination. The proposed adaptation demonstrated
practical utility for automated agricultural monitoring applications, achieving
accurate pod counting with an adjusted R-squared of 0.8987 for images with 10
to 120 pods under challenging field conditions.
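
The abstract describes DSGA only at a high level, so the following is a minimal NumPy sketch of one plausible reading of "dynamic similarity graph construction" with rank-based neighbor weights, not the authors' implementation. The function names, the cosine-similarity metric, the top-k neighbor rule, the decay formula (1 - r/k)^p, and the residual connection are all assumptions made for illustration. In a trained module the rank weights would be learnable parameters that are only initialized by the decay schedule; here they are a fixed array.

```python
import numpy as np


def polynomial_decay_weights(k, power=2.0):
    """Hypothetical initialization: weight for neighbor rank r is (1 - r/k)^power.

    Rank 0 is the most similar neighbor and receives the largest weight.
    The weights are normalized to sum to 1.
    """
    ranks = np.arange(k)
    weights = (1.0 - ranks / k) ** power
    return weights / weights.sum()


def similarity_graph_aggregate(features, k=4, rank_weights=None):
    """Build a top-k cosine-similarity graph and aggregate neighbor features.

    features: (n, d) array of token or patch embeddings, with n > k.
    Returns the adapted features (n, d) and the neighbor indices (n, k).
    The graph is "dynamic" in the sense that it is rebuilt from the
    current features on every call (an assumption about the paper's usage).
    """
    if rank_weights is None:
        rank_weights = polynomial_decay_weights(k)

    # Cosine similarity between every pair of feature vectors.
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-8
    normed = features / norms
    sim = normed @ normed.T

    # Exclude self-loops so a node is never its own neighbor.
    np.fill_diagonal(sim, -np.inf)

    # Indices of the k most similar nodes, ordered from most to least similar.
    neighbors = np.argsort(-sim, axis=1)[:, :k]

    # Weighted sum of neighbor features, weighted by similarity rank.
    gathered = features[neighbors]  # (n, k, d)
    aggregated = (rank_weights[None, :, None] * gathered).sum(axis=1)

    # Residual connection keeps the original embedding intact (assumption).
    return features + aggregated, neighbors


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(16, 8))  # 16 hypothetical patch embeddings
adapted, neighbor_idx = similarity_graph_aggregate(embeddings, k=4)
```

Weighting neighbors by rank rather than by raw similarity keeps the aggregation scale stable across images, which is one possible motivation for a ranking mechanism; the paper itself should be consulted for the actual design.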