📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
Авторы:
Christian Marinoni, Riccardo Fosco Gramaccioni, Kazuki Shimada, Takashi Shibuya, Yuki Mitsufuji, Danilo Comminiello
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Although audio generation has been widely studied over recent years,
video-aligned audio generation still remains a relatively unexplored frontier.
To address this gap, we introduce StereoSync, a novel and efficient model
designed to generate audio that is both temporally synchronized with a
reference video and spatially aligned with its visual context. Moreover,
StereoSync also achieves efficiency by leveraging pretrained foundation models,
reducing the need for extensive training while maintai...
Авторы:
Riccardo Fosco Gramaccioni, Christian Marinoni, Eleonora Grassucci, Giordano Cicchetti, Aurelio Uncini, Danilo Comminiello
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
In this work, we present FoleyGRAM, a novel approach to video-to-audio
generation that emphasizes semantic conditioning through the use of aligned
multimodal encoders. Building on prior advancements in video-to-audio
generation, FoleyGRAM leverages the Gramian Representation Alignment Measure
(GRAM) to align embeddings across video, text, and audio modalities, enabling
precise semantic control over the audio generation process. The core of
FoleyGRAM is a diffusion-based audio synthesis model con...