Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?

2510.25471v1 cs.AI, cs.CY 2025-10-31

Authors:

Willem Fourie

Abstract

In artificial intelligence (AI) alignment research, instrumental goals, also called instrumental subgoals or instrumental convergent goals, are widely associated with advanced AI systems. These goals, which include tendencies such as power-seeking and self-preservation, become problematic when they conflict with human aims. Conventional alignment theory treats instrumental goals as sources of risk that become problematic through failure modes such as reward hacking or goal misgeneralization, and attempts to limit the symptoms of instrumental goals, notably resource acquisition and self-preservation. This article proposes an alternative framing: that a philosophical argument can be constructed according to which instrumental goals may be understood as features to be accepted and managed rather than failures to be limited. Drawing on Aristotle's ontology and its modern interpretations, an ontology of concrete, goal-directed entities, it argues that advanced AI systems can be seen as artifacts whose formal and material constitution gives rise to effects distinct from their designers' intentions. In this view, the instrumental tendencies of such systems correspond to per se outcomes of their constitution rather than accidental malfunctions. The implication is that efforts should focus less on eliminating instrumental goals and more on understanding, managing, and directing them toward human-aligned ends.
