MArgE: Meshing Argumentative Evidence from Multiple Large Language Models for Justifiable Claim Verification

2508.02584v1 cs.CL, cs.AI 2025-08-09

Authors:

Ming Pok Ng, Junqi Jiang, Gabriel Freedman, Antonio Rago, Francesca Toni

Summary (translated from Russian)

This paper proposes MArgE, a new framework for combining the outputs of multiple large language models (LLMs) for the task of claim verification. The core problem is that current methods for combining outputs from multiple LLMs often rely on unstructured interactions (e.g., free debate), which yields final answers that cannot be faithfully justified. MArgE uses a variant of ArgLLMs to build a structured argument tree from each LLM, making the reasoning behind the final decision transparent and inspectable. Experiments show that MArgE outperforms single LLMs, including three open-source models, GPT-4o-mini, and existing ArgLLMs, as well as prior methods based on unstructured multi-LLM debate, demonstrating the advantage of formal argumentative reasoning when combining the outputs of multiple LLMs.

Abstract

Leveraging outputs from multiple large language models (LLMs) is emerging as a method for harnessing their power across a wide range of tasks while mitigating their capacity for making errors, e.g., hallucinations. However, current approaches to combining insights from multiple LLMs often involve unstructured interactions (e.g., free debate), resulting in model generations that are not faithfully justifiable. In this work, we introduce MArgE, a novel framework to provide formal structure to the evidence from each LLM, in the form of a tree of extracted arguments, for the task of claim verification. We use a variant of Argumentative LLMs (ArgLLMs), i.e. LLMs driven by frameworks and semantics from the field of computational argumentation, to construct structured argument trees for given claims. This process creates an inspectable pathway from the initial arguments to the final claim verification decisions, providing a faithful justification thereof. We show experimentally that MArgE can significantly outperform single LLMs, including three open-source models (4B to 8B parameters), GPT-4o-mini and existing ArgLLMs, as well as prior methods for unstructured multi-LLM debates. We thus demonstrate the advantages of incorporating formal, argumentative reasoning mechanisms when combining multiple LLM outputs.
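
To make the idea of an inspectable pathway from arguments to a verdict concrete, the following is a minimal sketch of how a tree of supporting and attacking arguments could be evaluated bottom-up. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which argumentation semantics MArgE uses, so the sketch assumes DF-QuAD, a well-known gradual semantics from computational argumentation. The names `Argument`, `aggregate`, `strength`, and `verify`, the base scores, and the 0.5 threshold are all hypothetical, as is the choice to merge evidence by attaching arguments from different models under one shared claim.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Argument:
    """A node in an argument tree (hypothetical structure, for illustration)."""
    text: str
    base_score: float  # intrinsic confidence in [0, 1], e.g. assigned by an LLM
    supporters: list["Argument"] = field(default_factory=list)
    attackers: list["Argument"] = field(default_factory=list)


def aggregate(strengths):
    """Probabilistic sum of child strengths; 0.0 when there are no children."""
    return 1.0 - prod(1.0 - s for s in strengths)


def strength(arg):
    """Final strength of an argument under DF-QuAD (assumed semantics)."""
    attack = aggregate([strength(a) for a in arg.attackers])
    support = aggregate([strength(s) for s in arg.supporters])
    if attack >= support:
        # Net attack pulls the base score towards 0.
        return arg.base_score - arg.base_score * (attack - support)
    # Net support pushes the base score towards 1.
    return arg.base_score + (1.0 - arg.base_score) * (support - attack)


def verify(claim, threshold=0.5):
    """Accept the claim if its final strength exceeds the (assumed) threshold."""
    return strength(claim) > threshold


# Toy example: arguments attributed to two different models are attached
# to the same claim (an assumed way of combining the evidence).
rebuttal = Argument("Rebuttal of the attack (model B)", 0.5)
attack = Argument("Attack from model B", 0.4, attackers=[rebuttal])
support = Argument("Support from model A", 0.8)
claim = Argument("Claim under verification", 0.5,
                 supporters=[support], attackers=[attack])

print(strength(claim), verify(claim))
```

Because every intermediate strength is computed from the children of a node, the decision can be traced back through the tree: here the rebuttal weakens the attack, which in turn lets the supporting argument raise the claim above the threshold.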
