CVPR 2026

Adaptive Video Distillation:
Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

A unified framework that incorporates adaptive regression loss and temporal regularization into distribution matching distillation for high-quality, few-step video generation.

Yuyang You¹,* Yongzhi Li²,† Jiahui Li² Yadong Mu¹,‡ Quan Chen²,‡ Peng Jiang²

¹ Peking University ² Kuaishou Technology

† Project Leader · ‡ Corresponding Authors

INTRODUCTION

Overview

We incorporate an Adaptive Regression Loss and a Temporal Regularization Loss into Distribution Matching Distillation (DMD) to mitigate oversaturation and low motion dynamics in few-step video generation. The approach also allows Supervised Fine-tuning (SFT) to run concurrently with distillation, which makes style transfer possible within the same training run.

🎯 Adaptive Regression ⏳ Temporal Regularization ⚡ Few-Step Generation 🎨 Style Transfer via SFT

RESULTS

Video Results

High-quality videos generated with 4-step sampling by the student model that our method distills from Wan2.1-T2V-1.3B.

APPROACH

Method

Our method distills a pre-trained teacher model s_data into a few-step video generator G_φ. The training procedure consists of the following steps:

1. A batch of real video-text pairs is sampled from the dataset. The videos are perturbed with noise, and the student model denoises them to reconstruct the originals. A regression loss is computed between each reconstructed video and its ground-truth video, then adaptively weighted using our Loss Mean Cache to produce the final adaptive regression loss.
2. Text conditions are sampled from the dataset, and the student model generates a video from pure noise under each condition. The denoised output is used to compute a temporal regularization loss and a distribution matching loss.
3. The generator G_φ is updated via gradient descent on the combined losses. The parameters ξ of s_{gen, ξ} in DMD are updated separately, following DMD2.
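
The page does not spell out how the Loss Mean Cache computes its weights or how the three losses are balanced, so the following is only a minimal sketch of one plausible reading. It assumes the cache keeps a running mean of the raw regression loss per noise-level bucket and scales each new loss by the inverse of that mean, so that no single noise level dominates. The class `LossMeanCache`, the helper functions, and the parameters `num_buckets`, `momentum`, `lambda_temporal`, and `lambda_reg` are all hypothetical names introduced for illustration, not the authors' implementation.

```python
class LossMeanCache:
    """Hypothetical cache: running mean of the raw regression loss per noise-level bucket."""

    def __init__(self, num_buckets=10, momentum=0.9, eps=1e-8):
        self.num_buckets = num_buckets
        self.momentum = momentum
        self.eps = eps
        self.means = {}  # bucket index -> running mean of the raw loss

    def bucket(self, t):
        # Map a noise level t in [0, 1] to a bucket index in [0, num_buckets - 1].
        return min(int(t * self.num_buckets), self.num_buckets - 1)

    def update(self, t, loss):
        # Exponential moving average; the first observation initializes the mean.
        b = self.bucket(t)
        if b not in self.means:
            self.means[b] = loss
        else:
            self.means[b] = self.momentum * self.means[b] + (1 - self.momentum) * loss
        return self.means[b]

    def weight(self, t):
        # Assumed weighting rule: inverse of the cached mean; 1.0 for an unseen bucket.
        b = self.bucket(t)
        if b not in self.means:
            return 1.0
        return 1.0 / (self.means[b] + self.eps)


def adaptive_regression_loss(cache, t, raw_loss):
    # Step 1: record the raw loss, then rescale it by the cached statistics.
    cache.update(t, raw_loss)
    return cache.weight(t) * raw_loss


def generator_loss(dmd_loss, temporal_loss, adaptive_reg_loss,
                   lambda_temporal=1.0, lambda_reg=1.0):
    # Step 3: weighted sum of the three losses (the coefficients are assumptions).
    return dmd_loss + lambda_temporal * temporal_loss + lambda_reg * adaptive_reg_loss
```

The sketch uses plain floats to stay self-contained. In an autograd framework, the cached weight would normally be treated as a constant (detached from the computation graph) so that gradients flow only through the raw regression loss.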

APPLICATION

Fine-tuning during Distillation

Because distillation trains on real data, the student model can learn new knowledge from that data, so fine-tuning happens within the distillation process itself. This makes style transfer possible without a separate training stage.

CITATION

BibTeX