📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception

2025-11-26

Авторы:

Miguel Carvalho, Helder Dias, Bruno Martins

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language Models (VLMs) often struggle with tasks that require fine-grained image understanding, such as scene-text recognition or document analysis, due to perception limitations and visual fragmentation. To address these challenges, we introduce CropVLM as an external low-cost method for boosting performance, enabling VLMs to dynamically ''zoom in'' on relevant image regions, enhancing their ability to capture fine details. CropVLM is trained using reinforcement learning, without using h...

ID: 2511.19820v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization

2025-11-26

Авторы:

Chengyue Huang, Mellon M. Zhang, Robert Azarcon, Glen Chou, Zsolt Kira

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language-Action (VLA) models inherit strong priors from pretrained Vision-Language Models (VLMs), but naive fine-tuning often disrupts these representations and harms generalization. Existing fixes -- freezing modules or applying uniform regularization -- either overconstrain adaptation or ignore the differing roles of VLA components. We present MAPS (Module-Wise Proximity Scheduling), the first robust fine-tuning framework for VLAs. Through systematic analysis, we uncover an empirical or...

ID: 2511.19878v1 cs.CV, cs.AI, cs.CL, cs.LG, cs.RO

arXiv PDF

📄 Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models

2025-11-25

Авторы:

Dachuan Zhao, Weiyue Li, Zhenda Shen, Yushu Qiu, Bowen Xu, Haoyu Chen, Yongchao Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and misaligned predictions in downstream tasks. Such behavior undermines fairness and distorts the intended alignment between vision and language. Recent post-hoc approaches attempt to mitigate bias by replacing the most attribute-correlated embedding coordinates with neutral values. However, our systematic analy...

ID: 2511.18123v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Fantastic Bugs and Where to Find Them in AI Benchmarks

2025-11-25

Авторы:

Sang Truong, Yuheng Tu, Michael Hardy, Anka Reuel, Zeyu Tang, Jirayu Burapacheep, Jonathan Perera, Chibuike Uwakwe, Ben Domingue, Nick Haber, Sanmi Koyejo

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark questions is not only infeasible but also a critical bottleneck for reliable evaluation. In this work, we introduce a framework for systematic benchmark revision that leverages statistical analysis of response patterns to flag potentially invalid questions for further expert review. Our approach builds on a ...

ID: 2511.16842v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation

2025-11-21

Авторы:

Jiaheng Zhang, Daqiang Zhang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The integration of Large Language Models (LLMs) into explainable recommendation systems often leads to a performance-efficiency trade-off in end-to-end architectures, where joint optimization of ranking and explanation can result in suboptimal compromises. To resolve this, we propose Prism, a novel decoupled framework that rigorously separates the recommendation process into a dedicated ranking stage and an explanation generation stage. Inspired by knowledge distillation, Prism leverages a pow...

ID: 2511.16543v1 cs.IR, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 VisPlay: Self-Evolving Vision-Language Models from Images

2025-11-21

Авторы:

Yicheng He, Chengsong Huang, Zongxia Li, Jiaxin Huang, Yonghui Yang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement learning (RL) provides a principled framework for improving Vision-Language Models (VLMs) on complex reasoning tasks. However, existing RL approaches often rely on human-annotated labels or task-specific heuristics to define verifiable rewards, both of which are costly and difficult to scale. We introduce VisPlay, a self-evolving RL framework that enables VLMs to autonomously improve their reasoning abilities using large amounts of unlabeled image data. Starting from a single base ...

ID: 2511.15661v2 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 VisPlay: Self-Evolving Vision-Language Models from Images

2025-11-21

Авторы:

Yicheng He, Chengsong Huang, Zongxia Li, Jiaxin Huang, Yonghui Yang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

ID: 2511.15661v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation

2025-11-20

Авторы:

Firdavs Nasriddinov, Rafal Kocielnik, Anima Anandkumar, Andrew J. Hung

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We c...

ID: 2511.15159v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

2025-11-19

Авторы:

Satanu Ghosh, Collin Holgate, Neal R. Brodnik, Doug Downey, Samantha Daly, Tresa M. Pollock, Samuel Carton

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizeability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple des...

ID: 2511.12036v1 cs.CE, cond-mat.mtrl-sci, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization

2025-11-19

Авторы:

Yuhan Chen, Yuxuan Liu, Long Zhang, Pengzhi Gao, Jian Luan, Wei Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Multi-turn interaction remains challenging for online reinforcement learning. A common solution is trajectory-level optimization, which treats each trajectory as a single training sample. However, this approach can be inefficient and yield misleading learning signals: it applies uniform sampling across tasks regardless of difficulty, penalizes correct intermediate actions in failed trajectories, and incurs high sample-collection costs. To address these issues, we propose STEP (Success-rate-aware...

ID: 2511.13091v1 cs.AI, cs.CL, cs.LG

arXiv PDF

Показано 21 - 30 из 196 записей