Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective

2512.04691v1 cs.AI, cs.CL, cs.MA 2025-12-06

Authors:

Jae Hee Lee, Anne Lauscher, Stefano V. Albrecht

Abstract

Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these systems have shown promise in enhancing capabilities and enabling complex tasks, they also pose significant ethical challenges. This position paper outlines a research agenda aimed at ensuring the ethical behavior of multi-agent systems of LLMs (MALMs) from the perspective of mechanistic interpretability. We identify three key research challenges: (i) developing comprehensive evaluation frameworks to assess ethical behavior at individual, interactional, and systemic levels; (ii) elucidating the internal mechanisms that give rise to emergent behaviors through mechanistic interpretability; and (iii) implementing targeted parameter-efficient alignment techniques to steer MALMs towards ethical behaviors without compromising their performance.


Related papers

AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scie...

DataSage: Multi-agent Collaboration for Insight Discovery with External Knowledg...

SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning

AI Founding Fathers: A Case Study of GIS Search in Multi-Agent Pipelines

Solving a Million-Step LLM Task with Zero Errors
