Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis

2508.04551v1 cs.CV 2025-08-09

Authors:

Angang Zhang, Fang Deng, Hao Chen, Zhongjian Chen, Junyan Li

Summary

Despite progress in virtual try-on (VTON), the inverse task, virtual try-off (VTOFF), which aims to reconstruct garment templates from images of dressed people, remains significantly underexplored. Most works treat the two tasks in isolation: VTON specializes in dressing, while VTOFF specializes in garment extraction, which gives too little weight to their relationship. We propose the Two-Way Garment Transfer Model (TWGTM), which is, to the best of our knowledge, the first unified framework for joint clothing-centric image synthesis that solves both tasks simultaneously. The model uses dual-conditioned guidance from the latent and pixel spaces of reference images to merge the two tasks into one. To resolve the asymmetry in mask dependence between VTON and VTOFF, we devise a phased training paradigm that addresses this problem progressively. Experiments on the DressCode and VITON-HD datasets confirm the effectiveness and competitiveness of our approach.

Abstract

While recent advances in virtual try-on (VTON) have achieved realistic garment transfer to human subjects, its inverse task, virtual try-off (VTOFF), which aims to reconstruct canonical garment templates from dressed humans, remains critically underexplored and lacks systematic investigation. Existing works predominantly treat them as isolated tasks: VTON focuses on garment dressing while VTOFF addresses garment extraction, thereby neglecting their complementary symmetry. To bridge this fundamental gap, we propose the Two-Way Garment Transfer Model (TWGTM), to the best of our knowledge, the first unified framework for joint clothing-centric image synthesis that simultaneously resolves both mask-guided VTON and mask-free VTOFF through bidirectional feature disentanglement. Specifically, our framework employs dual-conditioned guidance from both latent and pixel spaces of reference images to seamlessly bridge the dual tasks. On the other hand, to resolve the inherent mask dependency asymmetry between mask-guided VTON and mask-free VTOFF, we devise a phased training paradigm that progressively bridges this modality gap. Extensive qualitative and quantitative experiments conducted across the DressCode and VITON-HD datasets validate the efficacy and competitive edge of our proposed approach.

