3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG
2510.04536v1
cs.GR, cs.AI, cs.CV
2025-10-08
Авторы:
Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Satoshi Ohshima, Takahiro Katagiri
Abstract
This paper proposes "3Dify," a procedural 3D computer graphics (3D-CG)
generation framework utilizing Large Language Models (LLMs). The framework
enables users to generate 3D-CG content solely through natural language
instructions. 3Dify is built upon Dify, an open-source platform for AI
application development, and incorporates several state-of-the-art LLM-related
technologies such as the Model Context Protocol (MCP) and Retrieval-Augmented
Generation (RAG). For 3D-CG generation support, 3Dify automates the operation
of various Digital Content Creation (DCC) tools via MCP. When DCC tools do not
support MCP-based interaction, the framework employs the Computer-Using Agent
(CUA) method to automate Graphical User Interface (GUI) operations. Moreover,
to enhance image generation quality, 3Dify allows users to provide feedback by
selecting preferred images from multiple candidates. The LLM then learns
variable patterns from these selections and applies them to subsequent
generations. Furthermore, 3Dify supports the integration of locally deployed
LLMs, enabling users to utilize custom-developed models and to reduce both time
and monetary costs associated with external API calls by leveraging their own
computational resources.
Ссылки и действия
Дополнительные ресурсы: