QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution

2508.04485v1 cs.CV 2025-08-09

Authors:

Bowen Chai, Zheng Chen, Libo Zhu, Wenbo Li, Yong Guo, Yulun Zhang

Summary

Existing diffusion-based VSR models deliver strong performance but are too resource-intensive for practical deployment. Quantization can reduce this cost, but it has to account for VSR-specific characteristics such as temporal dependencies and high fidelity requirements. The authors propose QuantVSR, a low-bit quantization model for real-world video super-resolution. Its spatio-temporal complexity aware (STCA) mechanism measures the spatial and temporal complexity of each layer on a calibration dataset and uses these statistics to assign a layer-specific rank to a low-rank full-precision (FP) auxiliary branch. The FP and low-bit branches are then refined jointly, and a learnable bias alignment (LBA) module reduces biased quantization errors. Experiments show that QuantVSR achieves performance comparable to the FP model and significantly outperforms recent low-bit quantization methods. Code is available on GitHub: [https://github.com/bowenchai/QuantVSR](https://github.com/bowenchai/QuantVSR).

Abstract

Diffusion models have shown superior performance in real-world video super-resolution (VSR). However, the slow processing speeds and heavy resource consumption of diffusion models hinder their practical application and deployment. Quantization offers a potential solution for compressing the VSR model. Nevertheless, quantizing VSR models is challenging due to their temporal characteristics and high fidelity requirements. To address these issues, we propose QuantVSR, a low-bit quantization model for real-world VSR. We propose a spatio-temporal complexity aware (STCA) mechanism, where we first utilize the calibration dataset to measure both spatial and temporal complexities for each layer. Based on these statistics, we allocate layer-specific ranks to the low-rank full-precision (FP) auxiliary branch. Subsequently, we jointly refine the FP and low-bit branches to achieve simultaneous optimization. In addition, we propose a learnable bias alignment (LBA) module to reduce the biased quantization errors. Extensive experiments on synthetic and real-world datasets demonstrate that our method obtains comparable performance with the FP model and significantly outperforms recent leading low-bit quantization methods. Code is available at: https://github.com/bowenchai/QuantVSR.
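The abstract does not give formulas for STCA or LBA, so the following is only a rough sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes the spatial and temporal complexities are combined by simple addition, that ranks are split proportionally under a fixed budget, and that bias alignment can be approximated by a closed-form mean-error offset (the paper's module is learnable). The names `allocate_ranks`, `quantize_floor`, and `bias_offset` are hypothetical.

```python
import math
from statistics import mean


def allocate_ranks(spatial, temporal, total_rank, min_rank=1):
    """Split a rank budget across layers in proportion to their complexity.

    Assumption: the combined score is spatial + temporal; the paper may
    combine the two statistics differently.
    """
    scores = [s + t for s, t in zip(spatial, temporal)]
    spare = total_rank - min_rank * len(scores)
    if spare < 0:
        raise ValueError("total_rank is too small for min_rank per layer")
    total = sum(scores)
    if total <= 0:
        raise ValueError("complexity scores must sum to a positive value")
    # Every layer gets min_rank; the spare budget follows the scores.
    # int() truncates, so the ranks never exceed total_rank in sum.
    return [min_rank + int(spare * sc / total) for sc in scores]


def quantize_floor(values, step):
    """Toy truncating quantizer: snaps each value down to a multiple of step.

    Truncation is chosen here only because it produces a visibly biased
    error; it is not the quantizer used in the paper.
    """
    return [math.floor(v / step) * step for v in values]


def bias_offset(fp_out, q_out):
    """Mean gap between full-precision and quantized outputs.

    Stand-in for a learned bias: adding this offset to the quantized
    outputs removes the average (biased) part of the quantization error.
    """
    return mean(f - q for f, q in zip(fp_out, q_out))


# Usage: more complex layers receive a larger rank.
ranks = allocate_ranks([1.0, 3.0], [1.0, 3.0], total_rank=10)

# Usage: measure the bias of the quantized outputs and shift them back.
fp_out = [0.1, 0.4, 0.7, 1.0]
q_out = quantize_floor(fp_out, step=0.25)
offset = bias_offset(fp_out, q_out)
aligned = [q + offset for q in q_out]
```

In this toy setting the offset is computed once from calibration-style data; a learnable variant would instead treat it as a parameter and optimize it together with the low-rank branch.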

