📊 Статистика дайджестов

Всего дайджестов: 35039 Добавлено сегодня: 432

Последнее обновление: сегодня

📄 Structured Output Regularization: a framework for few-shot transfer learning

2025-10-14

Авторы:

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Traditional transfer learning typically reuses large pre-trained networks by freezing some of their weights and adding task-specific layers. While this approach is computationally efficient, it limits the model's ability to adapt to domain-specific features and can still lead to overfitting with very limited data. To address these limitations, we propose Structured Output Regularization (SOR), a simple yet effective framework that freezes the internal network structures (e.g., convolutional filt...

ID: 2510.08728v1 cs.CV, cs.LG, stat.ML

arXiv PDF

📄 Detecting spills using thermal imaging, pretrained deep learning models, and a robotic platform

2025-10-14

Авторы:

Gregory Yeghiyan, Jurius Azar, Devson Butani, Chan-Jin Chung

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper presents a real-time spill detection system that utilizes pretrained deep learning models with RGB and thermal imaging to classify spill vs. no-spill scenarios across varied environments. Using a balanced binary dataset (4,000 images), our experiments demonstrate the advantages of thermal imaging in inference speed, accuracy, and model size. We achieve up to 100% accuracy using lightweight models like VGG19 and NasNetMobile, with thermal models performing faster and more robustly acro...

ID: 2510.08770v1 cs.CV, cs.LG, cs.RO

arXiv PDF

📄 PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning

2025-10-14

Авторы:

Daiki Yoshikawa, Takashi Matsubara

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-language models have achieved remarkable success in multi-modal representation learning from large-scale pairs of visual scenes and linguistic descriptions. However, they still struggle to simultaneously express two distinct types of semantic structures: the hierarchy within a concept family (e.g., dog $\preceq$ mammal $\preceq$ animal) and the compositionality across different concept families (e.g., "a dog in a car" $\preceq$ dog, car). Recent works have addressed this challenge by empl...

ID: 2510.08919v1 cs.CV, cs.LG

arXiv PDF

📄 Denoised Diffusion for Object-Focused Image Augmentation

2025-10-14

Авторы:

Nisha Pillai, Aditi Virupakshaiah, Harrison W. Smith, Amanda J. Ashworth, Prasanna Gowda, Phillip R. Owens, Adam R. Rivers, Bindu Nanduri, Mahalingam Ramkumar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Modern agricultural operations increasingly rely on integrated monitoring systems that combine multiple data sources for farm optimization. Aerial drone-based animal health monitoring serves as a key component but faces limited data availability, compounded by scene-specific issues such as small, occluded, or partially visible animals. Transfer learning approaches often fail to address this limitation due to the unavailability of large datasets that reflect specific farm conditions, including va...

ID: 2510.08955v1 cs.CV, cs.LG

arXiv PDF

📄 Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation

2025-10-14

Авторы:

Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

AI-based colorization has shown remarkable capability in generating realistic color images from grayscale inputs. However, it poses risks of copyright infringement -- for example, the unauthorized colorization and resale of monochrome manga and films. Despite these concerns, no effective method currently exists to prevent such misuse. To address this, we introduce the first defensive paradigm, Uncolorable Examples, which embed imperceptible perturbations into grayscale images to invalidate unaut...

ID: 2510.08979v1 cs.CV, cs.LG

arXiv PDF

📄 Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels

2025-10-14

Авторы:

Weitong Kong, Zichao Zeng, Di Wen, Jiale Wei, Kunyu Peng, June Moh Goo, Jan Boehm, Rainer Stiefelhagen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Accurate perception is critical for vehicle safety, with LiDAR as a key enabler in autonomous driving. To ensure robust performance across environments, sensor types, and weather conditions without costly re-annotation, domain generalization in LiDAR-based 3D semantic segmentation is essential. However, LiDAR annotations are often noisy due to sensor imperfections, occlusions, and human errors. Such noise degrades segmentation accuracy and is further amplified under domain shifts, threatening sy...

ID: 2510.09035v1 cs.CV, cs.LG, cs.RO

arXiv PDF

📄 MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation

2025-10-14

Авторы:

Akira Takahashi, Shusuke Takahashi, Yuki Mitsufuji

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce MMAudioSep, a generative model for video/text-queried sound separation that is founded on a pretrained video-to-audio model. By leveraging knowledge about the relationship between video/text and audio learned through a pretrained audio generative model, we can train the model more efficiently, i.e., the model does not need to be trained from scratch. We evaluate the performance of MMAudioSep by comparing it to existing separation models, including models based on both deterministic ...

ID: 2510.09065v1 cs.SD, cs.CV, cs.LG, eess.AS

arXiv PDF

📄 A Novel Multi-branch ConvNeXt Architecture for Identifying Subtle Pathological Features in CT Scans

2025-10-14

Авторы:

Irash Perera, Uthayasanker Thayasivam

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Intelligent analysis of medical imaging plays a crucial role in assisting clinical diagnosis, especially for identifying subtle pathological features. This paper introduces a novel multi-branch ConvNeXt architecture designed specifically for the nuanced challenges of medical image analysis. While applied here to the specific problem of COVID-19 diagnosis, the methodology offers a generalizable framework for classifying a wide range of pathologies from CT scans. The proposed model incorporates a ...

ID: 2510.09107v1 cs.CV, cs.LG

arXiv PDF

📄 Training Feature Attribution for Vision Models

2025-10-14

Авторы:

Aziz Bacha, Thomas George

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Deep neural networks are often considered opaque systems, prompting the need for explainability methods to improve trust and accountability. Existing approaches typically attribute test-time predictions either to input features (e.g., pixels in an image) or to influential training examples. We argue that both perspectives should be studied jointly. This work explores *training feature attribution*, which links test predictions to specific regions of specific training images and thereby provides ...

ID: 2510.09135v1 cs.CV, cs.LG

arXiv PDF

📄 Zero-shot image privacy classification with Vision-Language Models

2025-10-14

Авторы:

Alina Elena Baia, Alessio Xompero, Andrea Cavallaro

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

While specialized learning-based models have historically dominated image privacy prediction, the current literature increasingly favours adopting large Vision-Language Models (VLMs) designed for generic tasks. This trend risks overlooking the performance ceiling set by purpose-built models due to a lack of systematic evaluation. To address this problem, we establish a zero-shot benchmark for image privacy classification, enabling a fair comparison. We evaluate the top-3 open-source VLMs, accord...

ID: 2510.09253v1 cs.CV, cs.LG, cs.MM

arXiv PDF

Показано 381 - 390 из 863 записей