RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance
2510.22684v1
cs.CV, cs.CL
2025-10-29
Authors:
Jiuniu Wang, Gongjie Zhang, Quanhao Qian, Junlong Gao, Deli Zhao, Ran Xu
Abstract
Scalable Vector Graphics (SVGs) are fundamental to digital design and robot
control, encoding not only visual structure but also motion paths in
interactive drawings. In this work, we introduce RoboSVG, a unified multimodal
framework for generating interactive SVGs guided by textual, visual, and
numerical signals. Given an input query, the RoboSVG model first produces
multimodal guidance, then synthesizes candidate SVGs through dedicated
generation modules, and finally refines them under numerical guidance to yield
high-quality outputs. To support this framework, we construct RoboDraw, a
large-scale dataset of one million examples, each pairing an SVG generation
condition (e.g., text, image, and partial SVG) with its corresponding
ground-truth SVG code. The RoboDraw dataset enables systematic study of four
tasks: basic generation (Text-to-SVG, Image-to-SVG) and interactive
generation (PartialSVG-to-SVG, PartialImage-to-SVG). Extensive experiments
demonstrate that RoboSVG achieves superior query compliance and visual fidelity
across tasks, establishing a new state of the art in versatile SVG generation.
The dataset and source code of this project will be publicly available soon.