📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study

2025-10-08

Авторы:

Ayan Majumdar, Feihao Chen, Jinghui Li, Xiaozhen Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases, creating a regulatory need for data auditing and developing scalable bias-detection methods. Although prior work has investigated biases in text datasets and related detection methods, these studies remain narrow in scope. They typically focus on a single content type (e.g., hate speech), cover limited demographic axes, overlook biases affecting multiple demograp...

ID: 2510.04641v1 cs.CL, cs.CY, cs.LG

arXiv PDF

📄 Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models

2025-10-08

Авторы:

Leander Girrbach, Stephan Alaniz, Genevieve Smith, Trevor Darrell, Zeynep Akata

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations ar...

ID: 2510.03721v1 cs.CV, cs.CL, cs.CY, cs.LG

arXiv PDF

📄 Know Thyself? On the Incapability and Implications of AI Self-Recognition

2025-10-08

Авторы:

Xiaoyan Bai, Aryan Shrivastava, Ari Holtzman, Chenhao Tan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Self-recognition is a crucial metacognitive capability for AI systems, relevant not only for psychological analysis but also for safety, particularly in evaluative scenarios. Motivated by contradictory interpretations of whether models possess self-recognition (Panickssery et al., 2024; Davidson et al., 2024), we introduce a systematic evaluation framework that can be easily applied and updated. Specifically, we measure how well 10 contemporary larger language models (LLMs) can identify their ow...

ID: 2510.03399v1 cs.AI, cs.CL, cs.CY, cs.LG

arXiv PDF

📄 Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards

2025-10-08

Авторы:

Faisal Hamman, Chenyang Zhu, Anoop Kumar, Xujun Peng, Sanghamitra Dutta, Daben Liu, Alfy Samuel

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

RAG systems are increasingly deployed in high-stakes domains where users expect outputs to be consistent across semantically equivalent queries. However, existing systems often exhibit significant inconsistencies due to variability in both the retriever and generator (LLM), undermining trust and reliability. In this work, we focus on information consistency, i.e., the requirement that outputs convey the same core content across semantically equivalent inputs. We introduce a principled evaluation...

ID: 2510.04392v1 cs.CL, cs.AI, cs.CY, cs.LG

arXiv PDF

📄 AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives

2025-10-08

Авторы:

Khalid Mehtab Khan, Anagha Kulkarni

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Identifying cultural capital (CC) themes in student reflections can offer valuable insights that help foster equitable learning environments in classrooms. However, themes such as aspirational goals or family support are often woven into narratives, rather than appearing as direct keywords. This makes them difficult to detect for standard NLP models that process sentences in isolation. The core challenge stems from a lack of awareness, as standard models are pre-trained on general corpora, leavi...

ID: 2510.04983v2 cs.CL, cs.AI, cs.CY, cs.LG

arXiv PDF

📄 BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

2025-10-04

Авторы:

Xin Xu, Xunzhi He, Churan Zhi, Ruizhe Chen, Julian McAuley, Zexue He

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing studies on bias mitigation methods for large language models (LLMs) use diverse baselines and metrics to evaluate debiasing performance, leading to inconsistent comparisons among them. Moreover, their evaluations are mostly based on the comparison between LLMs' probabilities of biased and unbiased contexts, which ignores the gap between such evaluations and real-world use cases where users interact with LLMs by reading model responses and expect fair and safe outputs rather than LLMs' p...

ID: 2510.00232v1 cs.CL, cs.AI, cs.CY, cs.LG

arXiv PDF

📄 Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

2025-10-04

Авторы:

Erfan Shayegani, Keegan Hines, Yue Dong, Nael Abu-Ghazaleh, Roman Lutz, Spencer Whitehead, Vidhisha Balachandran, Besmira Nushi, Vibhav Vineet

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmar...

ID: 2510.01670v1 cs.AI, cs.CL, cs.CR, cs.CY, cs.LG

arXiv PDF

📄 Secure Multi-Modal Data Fusion in Federated Digital Health Systems via MCP

2025-10-04

Авторы:

Aueaphum Aueawatthanaphisut

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Secure and interoperable integration of heterogeneous medical data remains a grand challenge in digital health. Current federated learning (FL) frameworks offer privacy-preserving model training but lack standardized mechanisms to orchestrate multi-modal data fusion across distributed and resource-constrained environments. This study introduces a novel framework that leverages the Model Context Protocol (MCP) as an interoperability layer for secure, cross-agent communication in multi-modal feder...

ID: 2510.01780v1 cs.CR, cs.AI, cs.CY, cs.LG

arXiv PDF

📄 Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform

2025-10-03

Авторы:

Yong Oh Lee, Byeonghun Bang, Joohyun Lee, Sejun Oh

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As personalized learning gains increasing attention in mathematics education, there is a growing demand for intelligent systems that can assess complex student responses and provide individualized feedback in real time. In this study, we present a personalized auto-grading and feedback system for constructive geometry tasks, developed using large language models (LLMs) and deployed on the Algeomath platform, a Korean online tool designed for interactive geometric constructions. The proposed syst...

ID: 2509.25529v1 cs.CY, cs.LG

arXiv PDF

📄 Toward Preference-aligned Large Language Models via Residual-based Model Steering

2025-10-01

Авторы:

Lucio La Cava, Andrea Tagarelli

## Контекст Одна из основных проблем с Large Language Models (LLMs) заключается в том, чтобы выравнять их беспристрастную продуктивность с целями и предпочтениями пользователей. Несмотря на то, что существуют методы, такие как Reinforcement Learning from Human Feedback (RLHF) и Direct Preference Optimization (DPO), эти подходы требуют больших объемов данных, дорогостоящей оптимизации и постоянной адаптации модели к конкретным задачам. Это приводит к значительным затратам времени и ресурсов. Для решения этой проблемы необходимо разработать метод, который бы становился более эффективным, гибким и менее дешевле, не требовал бы огромных вычислительных мощностей и мог бы использоваться в разных сценариях применения. ## Метод Метод, предложенный в работе, называется **Preference alignment of Large Language Models via Residual Steering (PaLRS)**, и является тренировочно-свободным подходом. Он использует "резидуальные потоки" (residual streams), отражающие динамику нелинейных связей в модели, для извлечения легких в использовании векторов управления. Такие векторы могут быть применены во время инференса, чтобы направить модель на поведение, соответствующее предпочтениям пользователя. Метод требует сравнительно малого количества примеров (например, одного из сто до пользовательских предпочтений) для создания этих векторов управления. Это позволяет подстраивать модель под задачи и пользовательские требования без необходимости снова тренировать модель, а также обеспечивает высокую эффективность и гибкость. ## Результаты Авторы проверили PaLRS на различных опен-сорсных LLMs, включая модели малого и среднего масштаба. На бенчмарк-задачах, таких как математическое разумание и генерация кода, модели с PaLRS-встраиваемыми векторами управления показали значительные улучшения в производительности. Эти модели сохранили свои общие качественные показатели, такие как гибкость и базовые функциональные возможности, не потеряв в общей точности и галости. Кроме того, PaLRS показала значительные экономии времени и ресурсов по сравнению с Direct Preference Optimization (DPO), в то же время оставаясь более эффективной и перспективной альтернативой. ## Значимость Предлагаемый подход имеет широкие применения в области адаптации LLMs к пользовательским предпочтениям. Он может использоваться в сферах, где требуется высокая гибкость и эффективность в настройке моделей на особые задачи (например, генерация кода, медицинские задачи, специализированные задачи технического письма). Одним из преимуществ PaLRS является его тренировочно-свободный характер, который позволяет избежать времязатратных и ресурсоемких процессов оптимизации. Благодаря этому, PaLRS может быть широко использован в сценариях, где не

Annotation:

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method t...

ID: 2509.23982v1 cs.CL, cs.AI, cs.CY, cs.LG, cs.NE

arXiv PDF

Показано 31 - 40 из 67 записей