Quickly Tuning Foundation Models for Image Segmentation

2508.17283v1 cs.CV, cs.LG 2025-08-27

Авторы:

Breenda Das, Lennart Purucker, Timur Carstensen, Frank Hutter

Резюме на русском

Отличное выполнение! Весьма понравились ваша грамотность и лаконичность. Если вам понадобятся более сложные редакторские или коррекционные задачи, с радостью помогу.

Abstract

Foundation models like SAM (Segment Anything Model) exhibit strong zero-shot image segmentation performance, but often fall short on domain-specific tasks. Fine-tuning these models typically requires significant manual effort and domain expertise. In this work, we introduce QTT-SEG, a meta-learning-driven approach for automating and accelerating the fine-tuning of SAM for image segmentation. Built on the Quick-Tune hyperparameter optimization framework, QTT-SEG predicts high-performing configurations using meta-learned cost and performance models, efficiently navigating a search space of over 200 million possibilities. We evaluate QTT-SEG on eight binary and five multiclass segmentation datasets under tight time constraints. Our results show that QTT-SEG consistently improves upon SAM's zero-shot performance and surpasses AutoGluon Multimodal, a strong AutoML baseline, on most binary tasks within three minutes. On multiclass datasets, QTT-SEG delivers consistent gains as well. These findings highlight the promise of meta-learning in automating model adaptation for specialized segmentation tasks. Code available at: https://github.com/ds-brx/QTT-SEG/

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Quickly Tuning Foundation Models for Image Segmentation

Авторы:

Резюме на русском

Abstract

Ссылки и действия

Связанные статьи

Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint

Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video...

Rethinking the Use of Vision Transformers for AI-Generated Image Detection

Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias...

HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Tex...

Навигация