📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment

2025-12-05

Авторы:

Huy Nghiem, Swetasudha Panda, Devashish Khatwani, Huy V. Nguyen, Krishnaram Kenthapadi, Hal Daumé

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid unsafe compliance without over-refusing benign queries. We present an iterative post-deployment alignment framework that applies Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) to refine models against domain-specific safety signals. Using the CARES-18K benchmark for adversarial r...

ID: 2512.04210v1 cs.AI, cs.CL, cs.CY

arXiv PDF

📄 DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

2025-12-05

Авторы:

Fangyu Lei, Jinxiang Meng, Yiming Huang, Junjie Zhao, Yitong Zhang, Jianwen Luo, Xin Zou, Ruiyi Yang, Wenbo Shi, Yan Gao, Shizhu He, Zuo Wang, Qian Liu, Yang Wang, Ke Wang, Jun Zhao, Kang Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that convert those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows. Data engineering (DE) tasks require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines from scratch and evolving existing systems under evolving requi...

ID: 2512.04324v1 cs.CL, cs.AI

arXiv PDF

📄 Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

2025-12-05

Авторы:

Kai-Po Chang, Wei-Yuan Cheng, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent advancement in multimodal LLMs (MLLMs) has demonstrated their remarkable capability to generate descriptive captions for input videos. However, these models suffer from factual inaccuracies in the generated descriptions, causing severe hallucination issues. While prior works have explored alleviating hallucinations for static images, jointly mitigating visual object and temporal action hallucinations for dynamic videos remains a challenging and unsolved task. To tackle this challenge, we ...

ID: 2512.04356v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 MASE: Interpretable NLP Models via Model-Agnostic Saliency Estimation

2025-12-05

Авторы:

Zhou Yang, Shunyan Luo, Jiazhen Zhu, Fang Jin

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Deep neural networks (DNNs) have made significant strides in Natural Language Processing (NLP), yet their interpretability remains elusive, particularly when evaluating their intricate decision-making processes. Traditional methods often rely on post-hoc interpretations, such as saliency maps or feature visualization, which might not be directly applicable to the discrete nature of word data in NLP. Addressing this, we introduce the Model-agnostic Saliency Estimation (MASE) framework. MASE offer...

ID: 2512.04386v1 cs.CL, cs.AI

arXiv PDF

📄 Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning

2025-12-05

Авторы:

Darshan Fofadiya

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Autoregressive Language Models (LLMs) trained on Next-Token Prediction (NTP) often suffer from ``Topic Drift'' where the generation wanders away from the initial prompt due to a reliance on local associations rather than global planning \citep{holtzman2019curious}. While scaling model size mitigates this \citep{brown2020language}, the fundamental myopia of the NTP objective remains. In this work, we introduce the Idea-Gated Transformer, a novel architecture that separates semantic planning from ...

ID: 2512.03343v1 cs.CL, cs.AI

arXiv PDF

📄 Fine-grained Narrative Classification in Biased News Articles

2025-12-05

Авторы:

Zeba Afroz, Harsh Vardhan, Pawan Bhakuni, Aanchal Punia, Rajdeep Kumar, Md. Shad Akhtar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Narratives are the cognitive and emotional scaffolds of propaganda. They organize isolated persuasive techniques into coherent stories that justify actions, attribute blame, and evoke identification with ideological camps. In this paper, we propose a novel fine-grained narrative classification in biased news articles. We also explore article-bias classification as the precursor task to narrative classification and fine-grained persuasive technique identification. We develop INDI-PROP, the first ...

ID: 2512.03582v1 cs.CL, cs.AI

arXiv PDF

📄 SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting

2025-12-05

Авторы:

Hanxiu Zhang, Yue Zheng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods -- whether behavior-based or structural -- suffer from vulnerabilities such as false claim attacks or susceptible to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting ...

ID: 2512.03620v1 cs.CR, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 AlignCheck: a Semantic Open-Domain Metric for Factual Consistency Assessment

2025-12-05

Авторы:

Ahmad Aghaebrahimian

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models have significantly advanced natural language processing tasks, but remain prone to generating incorrect or misleading but plausible arguments. This issue, known as hallucination, is particularly concerning in high-stakes domains like clinical applications, where factual inaccuracies can have severe consequences. Existing evaluation metrics fail to adequately assess factual consistency and lack interpretability, making diagnosing and mitigating errors difficult. We propose a...

ID: 2512.03634v1 cs.CL, cs.AI

arXiv PDF

📄 Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective

2025-12-05

Авторы:

Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, Chongxuan Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement Learning (RL) has proven highly effective for autoregressive language models, but adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. The core difficulty lies in likelihood approximation: while autoregressive models naturally provide token-level conditional probabilities essential for token-level RL objectives (e.g., GRPO), dLLMs generate sequences through iterative non-autoregressive denoising steps that lack this factorization. To ad...

ID: 2512.03759v1 cs.CL, cs.AI, cs.LG

arXiv PDF

📄 In-Context Representation Hijacking

2025-12-05

Авторы:

Itay Yona, Amir Sarid, Michael Karasik, Yossi Gandelsman

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce \textbf{Doublespeak}, a simple \emph{in-context representation hijacking} attack against large language models (LLMs). The attack works by systematically replacing a harmful keyword (e.g., \textit{bomb}) with a benign token (e.g., \textit{carrot}) across multiple in-context examples, provided a prefix to a harmful request. We demonstrate that this substitution leads to the internal representation of the benign token converging toward that of the harmful one, effectively embedding th...

ID: 2512.03771v1 cs.CL, cs.AI, cs.CR, cs.LG

arXiv PDF

Показано 81 - 90 из 7506 записей