Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video
2511.03227v2
cs.HC, cs.AI, cs.MM
2025-11-07
Authors:
Alexander Htet Kyaw, Lenin Ravindranath Sivalingam
Abstract
We present a node-based storytelling system for multimodal content
generation. The system represents stories as graphs of nodes that can be
expanded, edited, and iteratively refined through direct user edits and
natural-language prompts. Each node can integrate text, images, audio, and
video, allowing creators to compose multimodal narratives. A task selection
agent routes between specialized generative tasks that handle story generation,
node structure reasoning, node diagram formatting, and context generation. The
interface supports targeted editing of individual nodes, automatic branching
for parallel storylines, and node-based iterative refinement. Our results
demonstrate that node-based editing supports control over narrative structure
and iterative generation of text, images, audio, and video. We report
quantitative outcomes on automatic story outline generation and qualitative
observations of editing workflows. Finally, we discuss current limitations such
as scalability to longer narratives and consistency across multiple nodes, and
outline future work toward human-in-the-loop and user-centered creative AI
tools.
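
To make the node-graph representation and task routing described in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `StoryNode`, `StoryGraph`, and `select_task`, the field layout, and the keyword lists are all hypothetical, and a keyword match stands in for the task selection agent, whose actual mechanism the abstract does not specify. Only the four task names come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MEDIA_FIELDS = {"text", "image", "audio", "video"}


@dataclass
class StoryNode:
    """One story beat; the schema is a hypothetical illustration."""
    node_id: str
    text: str = ""
    image: Optional[str] = None  # e.g. a path to generated media (assumed)
    audio: Optional[str] = None
    video: Optional[str] = None
    children: List[str] = field(default_factory=list)


class StoryGraph:
    """Stores nodes by id; several children under one parent form branches."""

    def __init__(self) -> None:
        self.nodes: Dict[str, StoryNode] = {}

    def add_node(self, node_id: str, text: str = "",
                 parent: Optional[str] = None) -> StoryNode:
        if node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node_id}")
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent: {parent}")
        node = StoryNode(node_id, text)
        self.nodes[node_id] = node
        if parent is not None:
            self.nodes[parent].children.append(node_id)
        return node

    def edit_node(self, node_id: str, **updates: str) -> StoryNode:
        """Targeted edit: change only the named fields of a single node."""
        node = self.nodes[node_id]
        for name, value in updates.items():
            if name not in MEDIA_FIELDS:
                raise AttributeError(f"not an editable field: {name}")
            setattr(node, name, value)
        return node


def select_task(prompt: str) -> str:
    """Keyword stand-in for the task selection agent (assumption)."""
    p = prompt.lower()
    if any(w in p for w in ("branch", "connect", "reorder", "split")):
        return "node_structure_reasoning"
    if any(w in p for w in ("layout", "format", "diagram")):
        return "node_diagram_formatting"
    if "context" in p:
        return "context_generation"
    return "story_generation"


graph = StoryGraph()
graph.add_node("intro", "A storm closes the mountain pass.")
graph.add_node("stay", "The travelers wait it out.", parent="intro")
graph.add_node("leave", "The travelers push on.", parent="intro")
graph.edit_node("stay", image="stay.png")
print(graph.nodes["intro"].children)
print(select_task("Add a branch where the hero turns back"))
```

Storing children as id lists keeps each edit local to one node, which is the property the abstract emphasizes; a real system would presumably replace `select_task` with a model-driven router.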