📊 Digest statistics

Total digests: 34022
Added today: 82

Last updated: today

📄 DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models

2025-12-03

Authors:

Olivia Kim

Annotation:

Prompt design plays a critical role in the reasoning performance of large language models (LLMs), yet the impact of prompt specificity (how detailed or vague a prompt is) remains understudied. This paper introduces DETAIL, a framework for evaluating LLM performance across varying levels of prompt specificity. We generate multi-level prompts using GPT-4, quantify specificity via perplexity, and assess correctness using GPT-based semantic equivalence. Experiments on 30 novel reasoning tasks acro...

ID: 2512.02246v1 cs.CL, cs.AI

arXiv PDF
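
The DETAIL annotation says prompt specificity is quantified via perplexity. As a minimal sketch, assuming per-token natural-log probabilities from some scoring language model (the abstract does not say which model is used, or how perplexity values are mapped to specificity levels), perplexity is the exponential of the mean negative log-likelihood. The function name `perplexity` and the example log-probabilities are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum_i log p(x_i | x_<i))
    The log-probabilities are assumed to come from some scoring language
    model; the abstract does not say which model DETAIL uses.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities for two rewordings of the same prompt.
prompt_a = [math.log(0.5), math.log(0.5)]
prompt_b = [math.log(0.5), math.log(0.25)]
print(perplexity(prompt_a), perplexity(prompt_b))
```

Here `prompt_a` scores 2.0 and `prompt_b` about 2.83 (the square root of 8). Lower perplexity means the scoring model found the text more predictable; whether DETAIL treats higher or lower values as more specific is not stated in the truncated annotation.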

📄 HealthContradict: Evaluating Biomedical Knowledge Conflicts in Language Models

2025-12-03

Authors:

Boya Zhang, Alban Bornet, Rui Yang, Nan Liu, Douglas Teodoro

Annotation:

How do language models use contextual information to answer health questions? How are their responses impacted by conflicting contexts? We assess the ability of language models to reason over long, conflicting biomedical contexts using HealthContradict, an expert-verified dataset comprising 920 unique instances, each consisting of a health-related question, a factual answer supported by scientific evidence, and two documents presenting contradictory stances. We consider several prompt settings, ...

ID: 2512.02299v1 cs.CL, cs.AI

arXiv PDF

📄 Memory-Augmented Knowledge Fusion with Safety-Aware Decoding for Domain-Adaptive Question Answering

2025-12-03

Authors:

Lei Fu, Xiang Chen, Kaige Gao, Xinyue Huang, Kejian Tong

Annotation:

Domain-specific question answering (QA) systems for services face unique challenges in integrating heterogeneous knowledge sources while ensuring both accuracy and safety. Existing large language models often struggle with factual consistency and context alignment in sensitive domains such as healthcare policies and government welfare. In this work, we introduce Knowledge-Aware Reasoning and Memory-Augmented Adaptation (KARMA), a novel framework designed to enhance QA performance in care scenari...

ID: 2512.02363v1 cs.CL, cs.AI

arXiv PDF

📄 ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce

2025-12-03

Authors:

Zheng Fang, Donghao Xie, Ming Pang, Chunyuan Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo

Annotation:

Relevance modeling in e-commerce search remains challenged by semantic gaps in term-matching methods (e.g., BM25) and neural models' reliance on the scarcity of domain-specific hard samples. We propose ADORE, a self-sustaining framework that synergizes three innovations: (1) A Rule-aware Relevance Discrimination module, where a Chain-of-Thought LLM generates intent-aligned training data, refined via Kahneman-Tversky Optimization (KTO) to align with user behavior; (2) An Error-type-aware Data Syn...

ID: 2512.02555v1 cs.CL, cs.AI, cs.IR

arXiv PDF

📄 Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs

2025-12-03

Authors:

Julian Ma, Jun Wang, Zafeirios Fountas

Annotation:

Large language models (LLMs) excel at explicit reasoning, but their implicit computational strategies remain underexplored. Decades of psychophysics research show that humans intuitively process and integrate noisy signals using near-optimal Bayesian strategies in perceptual tasks. We ask whether LLMs exhibit similar behaviour and perform optimal multimodal integration without explicit training or instruction. Adopting the psychophysics paradigm, we infer computational principles of LLMs from sy...

ID: 2512.02719v1 cs.CL, cs.AI, cs.CV, cs.LG, q-bio.NC

arXiv PDF
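
The annotation asks whether LLMs perform optimal cue combination. In the psychophysics literature, the usual benchmark for optimally integrating independent Gaussian cues is inverse-variance weighting: each cue is weighted by its precision 1/σ², and the fused variance is 1/Σ(1/σ_i²), which is never larger than the smallest single-cue variance. The sketch below computes that benchmark; whether the paper uses exactly this observer model is an assumption, and `combine_cues` and the cue values are hypothetical.

```python
from typing import Sequence, Tuple


def combine_cues(means: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Fuse independent Gaussian cues by inverse-variance (precision) weighting.

    Returns the combined mean and variance an ideal observer with a flat
    prior would compute: weights w_i = (1/var_i) / sum_j(1/var_j).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per cue and at least one cue")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    # Precision-weighted average of the cue estimates.
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    # Fused variance is the inverse of the summed precisions.
    return mean, 1.0 / total_precision


# Hypothetical cues: cue A reports 10.0 (variance 1.0), cue B reports 14.0 (variance 3.0).
fused_mean, fused_var = combine_cues([10.0, 14.0], [1.0, 3.0])
print(fused_mean, fused_var)
```

With these values the fused estimate is 11.0 with variance 0.75 (up to floating-point rounding): it is pulled toward the more reliable cue A, and its variance is lower than either cue's alone, which is the signature of optimal integration that such studies test for.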

📄 An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation

2025-12-03

Authors:

Daiki Shirafuji, Tatsuhiko Saito, Yasutomo Kimura

Annotation:

Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored "editing" LLM parameters to mitigate social bias with model merging approaches; however, there is no empirical comparison. In this work, we empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applying 13 open weight models in the GPT, LLaM...

ID: 2512.02689v1 cs.CL, cs.AI

arXiv PDF
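
The merging survey lists Linear and SLERP among the seven algorithms it compares. Below is a minimal sketch of spherical linear interpolation between two flattened weight vectors, assuming it is applied tensor by tensor with a single interpolation factor `t` (the survey's exact settings are not given in the truncated annotation). Plain linear interpolation, a common form of two-model linear merging, serves as the fallback when the angle between the vectors is undefined or degenerate. The function names and the 2-D example vectors are hypothetical.

```python
import math
from typing import List, Sequence


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation (1 - t) * a + t * b, element-wise."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Sequence[float], b: Sequence[float], t: float,
          eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        # Angle is undefined for a zero vector; fall back to linear interpolation.
        return lerp(a, b, t)
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard acos against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel or antiparallel vectors: the SLERP weights divide by ~0.
        return lerp(a, b, t)
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Hypothetical 2-D stand-ins for the same parameter tensor in two models.
model_a = [1.0, 0.0]
model_b = [0.0, 1.0]
print(slerp(model_a, model_b, 0.5))
print(lerp(model_a, model_b, 0.5))
```

For these orthogonal unit vectors, SLERP at t = 0.5 stays on the unit circle (about 0.707 in each coordinate), whereas linear interpolation gives (0.5, 0.5) with norm about 0.707; preserving the norm is the usual motivation for SLERP over plain averaging.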

📄 SurveyEval: Towards Comprehensive Evaluation of LLM-Generated Academic Surveys

2025-12-03

Authors:

Jiahao Zhao, Shuaixing Zhang, Nan Xu, Lei Wang

Annotation:

LLM-based automatic survey systems are transforming how users acquire information from the web by integrating retrieval, organization, and content synthesis into end-to-end generation pipelines. While recent works focus on developing new generation pipelines, how to evaluate such complex systems remains a significant challenge. To this end, we introduce SurveyEval, a comprehensive benchmark that evaluates automatically generated surveys across three dimensions: overall quality, outline coherence...

ID: 2512.02763v1 cs.CL, cs.AI

arXiv PDF

📄 Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages

2025-12-03

Authors:

Lechen Zhang, Yusheng Zhou, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens

Annotation:

System prompts provide a lightweight yet powerful mechanism for conditioning large language models (LLMs) at inference time. While prior work has focused on English-only settings, real-world deployments benefit from having a single prompt to operate reliably across languages. This paper presents a comprehensive study of how different system prompts steer models toward accurate and robust cross-lingual behavior. We propose a unified four-dimensional evaluation framework to assess system prompts i...

ID: 2512.02841v1 cs.CL, cs.AI, cs.HC, cs.LG

arXiv PDF

📄 Fine-Tuned Large Language Models for Logical Translation: Reducing Hallucinations with Lang2Logic

2025-12-03

Authors:

Muyu Pan, Dheeraj Kodakandla, Mahfuza Farooque

Annotation:

Recent advances in natural language processing (NLP), particularly large language models (LLMs), have motivated the automatic translation of natural language statements into formal logic without human intervention. This enables automated reasoning and facilitates debugging, finding loop invariants, and adhering to specifications in software systems. However, hallucinations (incorrect outputs generated by LLMs) are challenging, particularly for logical translation tasks requiring precision. This wo...

ID: 2512.02987v1 cs.CL, cs.AI

arXiv PDF

📄 AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models

2025-12-02

Authors:

Yann Le Beux, Oluchi Audu, Oche D. Ankeli, Dhananjay Balakrishnan, Melissah Weya, Marie D. Ralaiarinosy, Ignatius Ezeani

Annotation:

Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and enabling harmful stereotypes in applications across various domains. To address this gap, we introduce AfriStereo, the first open-source African stereotype dataset and evaluation framework grounded in local socio-cultural contexts. Through community-engaged efforts across Senegal, Kenya, and Nigeria, we collected 1,163 stereotypes spanning gender, ethnicity, religion, age, a...

ID: 2511.22016v1 cs.CL, cs.AI, cs.LG

arXiv PDF

Showing 41-50 of 2042 entries