SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
2510.10910v1
cs.CV, eess.IV
2025-10-15
Авторы:
Honghui Yuan, Keiji Yanai
Abstract
With the rapid development of diffusion models, style transfer has made
remarkable progress. However, flexible and localized style editing for scene
text remains an unsolved challenge. Although existing scene text editing
methods have achieved text region editing, they are typically limited to
content replacement and simple styles, which lack the ability of free-style
transfer. In this paper, we introduce SceneTextStylizer, a novel training-free
diffusion-based framework for flexible and high-fidelity style transfer of text
in scene images. Unlike prior approaches that either perform global style
transfer or focus solely on textual content modification, our method enables
prompt-guided style transformation specifically for text regions, while
preserving both text readability and stylistic consistency. To achieve this, we
design a feature injection module that leverages diffusion model inversion and
self-attention to transfer style features effectively. Additionally, a region
control mechanism is introduced by applying a distance-based changing mask at
each denoising step, enabling precise spatial control. To further enhance
visual quality, we incorporate a style enhancement module based on the Fourier
transform to reinforce stylistic richness. Extensive experiments demonstrate
that our method achieves superior performance in scene text style
transformation, outperforming existing state-of-the-art methods in both visual
fidelity and text preservation.
Ссылки и действия
Дополнительные ресурсы: