📊 Статистика дайджестов

Всего дайджестов: 34123 Добавлено сегодня: 101

Последнее обновление: сегодня

📄 RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables

2025-11-08

Авторы:

Nikhil Abhyankar, Purvi Chaurasia, Sanchit Kabra, Ananya Srivastava, Vivek Gupta, Chandan K. Reddy

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing tabular reasoning benchmarks mostly test models on small, uniform tables, underrepresenting the complexity of real-world data and giving an incomplete view of Large Language Models' (LLMs) reasoning abilities. Real tables are long, heterogeneous, and domain-specific, mixing structured fields with free text and requiring multi-hop reasoning across thousands of tokens. To address this gap, we introduce RUST-BENCH, a benchmark of 7966 questions from 2031 real-world tables spanning two doma...

ID: 2511.04491v1 cs.CL, cs.AI, cs.DB, cs.IR, cs.LG

arXiv PDF

📄 Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering

2025-11-08

Авторы:

Christos-Nikolaos Zacharopoulos, Revekka Kyriakoglou

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As Large Language Models (LLMs) become integral to human-centered applications, understanding their personality-like behaviors is increasingly important for responsible development and deployment. This paper systematically evaluates six LLMs, applying the Big Five Inventory-2 (BFI-2) framework, to assess trait expressions under varying sampling temperatures. We find significant differences across four of the five personality dimensions, with Neuroticism and Extraversion susceptible to temperatur...

ID: 2511.04499v1 cs.CL, cs.AI

arXiv PDF

📄 RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

2025-11-08

Авторы:

Joshua Gao, Quoc Huy Pham, Subin Varghese, Silwal Saurav, Vedhus Hoskere

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains a significant challenge. Existing evaluation frameworks often rely on heuristic-based metrics that fail to capture domain-specific nuances and other works utilize LLM-as-a-Judge approaches that lack validated alignment with human judgment. This paper introduces RAGalyst, an automated, human-aligned ...

ID: 2511.04502v1 cs.CL, cs.AI

arXiv PDF

📄 Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

2025-11-08

Авторы:

Amir Zur, Atticus Geiger, Ekdeep Singh Lubana, Eric Bigelow

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

When a language model generates text, the selection of individual tokens might lead it down very different reasoning paths, making uncertainty difficult to quantify. In this work, we consider whether reasoning language models represent the alternate paths that they could take during generation. To test this hypothesis, we use hidden activations to control and predict a language model's uncertainty during chain-of-thought reasoning. In our experiments, we find a clear correlation between how unce...

ID: 2511.04527v1 cs.CL, cs.AI

arXiv PDF

📄 OceanAI: A Conversational Platform for Accurate, Transparent, Near-Real-Time Oceanographic Insights

2025-11-07

Авторы:

Bowen Chen, Jayesh Gajbhar, Gregory Dusek, Rob Redmon, Patrick Hogan, Paul Liu, DelWayne Bohnenstiehl, Dongkuan Xu, Ruoying He

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Artificial intelligence is transforming the sciences, yet general conversational AI systems often generate unverified "hallucinations" undermining scientific rigor. We present OceanAI, a conversational platform that integrates the natural-language fluency of open-source large language models (LLMs) with real-time, parameterized access to authoritative oceanographic data streams hosted by the National Oceanic and Atmospheric Administration (NOAA). Each query such as "What was Boston Harbor's high...

ID: 2511.01019v2 cs.CL, cs.AI, cs.CE, cs.LG, physics.ao-ph

arXiv PDF

📄 Reading Between the Lines: The One-Sided Conversation Problem

2025-11-07

Авторы:

Victoria Ebert, Rishabh Singh, Tuochao Chen, Noah A. Smith, Shyamnath Gollakota

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Conversational AI is constrained in many real-world settings where only one side of a dialogue can be recorded, such as telemedicine, call centers, and smart glasses. We formalize this as the one-sided conversation problem (1SC): inferring and learning from one side of a conversation. We study two tasks: (1) reconstructing the missing speaker's turns for real-time use cases, and (2) generating summaries from one-sided transcripts. Evaluating prompting and finetuned models on MultiWOZ, DailyDialo...

ID: 2511.03056v1 cs.CL, cs.AI, cs.LG

arXiv PDF

📄 CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic

2025-11-07

Авторы:

Saad Mankarious, Ayah Zirikly

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Mental health disorders affect millions worldwide, yet early detection remains a major challenge, particularly for Arabic-speaking populations where resources are limited and mental health discourse is often discouraged due to cultural stigma. While substantial research has focused on English-language mental health detection, Arabic remains significantly underexplored, partly due to the scarcity of annotated datasets. We present CARMA, the first automatically annotated large-scale dataset of Ara...

ID: 2511.03102v1 cs.CL, cs.AI

arXiv PDF

📄 Control Barrier Function for Aligning Large Language Models

2025-11-07

Авторы:

Yuya Miyaoka, Masaki Inoue

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the CBF safety filter to the predicted token generated from the baseline LLM, to intervene in the generated text. The safety filter includes two significant advantages: this safety filter is an add-on type, allowing it to be used for alignment purposes without fine-tuning the baseline LLM, ...

ID: 2511.03121v2 cs.CL, cs.AI, cs.SY, eess.SY

arXiv PDF

📄 Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment

2025-11-07

Авторы:

Srishti Yadav, Jasmina Gajcin, Erik Miehling, Elizabeth Daly

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Understanding how different stakeholders perceive risks in AI systems is essential for their responsible deployment. This paper presents a framework for stakeholder-grounded risk assessment by using LLMs, acting as judges to predict and explain risks. Using the Risk Atlas Nexus and GloVE explanation method, our framework generates stakeholder-specific, interpretable policies that shows how different stakeholders agree or disagree about the same risks. We demonstrate our method using three real-w...

ID: 2511.03152v1 cs.CL, cs.AI

arXiv PDF

📄 LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval

2025-11-07

Авторы:

Wenchang Lei, Ping Zou, Yue Wang, Feng Sun, Lei Zhao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models (LLMs) exhibit strong semantic understanding, yet struggle when user instructions involve ambiguous or conceptually misaligned terms. We propose the Language Graph Model (LGM) to enhance conceptual clarity by extracting meta-relations-inheritance, alias, and composition-from natural language. The model further employs a reflection mechanism to validate these meta-relations. Leveraging a Concept Iterative Retrieval Algorithm, these relations and related descriptions are dyna...

ID: 2511.03214v1 cs.CL, cs.AI

arXiv PDF

Показано 351 - 360 из 2054 записей