MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
2511.01594v1
cs.RO, cs.CV, I.2.9; I.2.11; I.2.6; I.4.8
2025-11-06
Authors:
Renjun Gao, Peiyan Zhong
Abstract
Multimodal large language models (MLLMs) have shown remarkable capabilities
in cross-modal understanding and reasoning, offering new opportunities for
intelligent assistive systems. Existing systems, however, still struggle with
risk-aware planning, user personalization, and grounding language plans into
executable skills in cluttered homes. We introduce MARS, a Multi-Agent Robotic
System powered by MLLMs for assistive intelligence, designed for smart home
robots supporting people with disabilities. The system integrates four agents:
a visual perception agent for extracting semantic and spatial features from
environment images, a risk assessment agent for identifying and prioritizing
hazards, a planning agent for generating executable action sequences, and an
evaluation agent for iterative optimization. By combining multimodal perception
with hierarchical multi-agent decision-making, the framework enables adaptive,
risk-aware, and personalized assistance in dynamic indoor environments.
Experiments on multiple datasets demonstrate the superior overall performance
of the proposed system in risk-aware planning and coordinated multi-agent
execution compared with state-of-the-art multimodal models. The proposed
approach also highlights the potential of collaborative AI for practical
assistive scenarios and provides a generalizable methodology for deploying
MLLM-enabled multi-agent systems in real-world environments.
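
The four-agent structure described in the abstract (perception, risk assessment, planning, and evaluation with iterative refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the `HAZARD_SEVERITY` table, the severity scale, and the feedback rule are hypothetical stand-ins, and each agent is a plain function where the paper would presumably call an MLLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hazard:
    name: str
    severity: int  # hypothetical scale: higher means more urgent


# Hypothetical lookup table standing in for an MLLM-based risk judgment.
HAZARD_SEVERITY = {"knife": 3, "spilled water": 2, "cable": 1}


def perceive(image_description: str) -> List[str]:
    """Perception agent stand-in: parse a comma-separated scene description."""
    return [o.strip() for o in image_description.split(",") if o.strip()]


def assess_risks(objects: List[str]) -> List[Hazard]:
    """Risk agent stand-in: identify hazards and order them by severity."""
    hazards = [Hazard(o, HAZARD_SEVERITY[o]) for o in objects if o in HAZARD_SEVERITY]
    return sorted(hazards, key=lambda h: h.severity, reverse=True)


def plan(hazards: List[Hazard], feedback: List[str]) -> List[str]:
    """Planning agent stand-in: one action per hazard, revised using feedback."""
    steps = [f"mitigate {h.name}" for h in hazards]
    if "confirm" in feedback:
        steps.append("confirm with user")
    return steps


def evaluate(steps: List[str], hazards: List[Hazard]) -> List[str]:
    """Evaluation agent stand-in: request a user confirmation step if missing."""
    if hazards and "confirm with user" not in steps:
        return ["confirm"]
    return []


def run_pipeline(image_description: str, max_rounds: int = 3) -> List[str]:
    """Run perception and risk assessment once, then iterate plan/evaluate."""
    hazards = assess_risks(perceive(image_description))
    feedback: List[str] = []
    steps: List[str] = []
    for _ in range(max_rounds):
        steps = plan(hazards, feedback)
        feedback = evaluate(steps, hazards)
        if not feedback:  # evaluator has no further objections
            break
    return steps


if __name__ == "__main__":
    print(run_pipeline("cable, knife, sofa"))
```

In this sketch the evaluator's feedback is passed back to the planner until no objections remain or `max_rounds` is reached, which is one simple way to realize the "iterative optimization" the abstract mentions; the actual system may coordinate its agents differently.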