CVPR 2026

Adaptive Video Distillation:
Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

A unified framework that incorporates adaptive regression loss and temporal regularization into distribution matching distillation for high-quality, few-step video generation.

Yuyang You1,*   Yongzhi Li2,†   Jiahui Li2   Yadong Mu1,‡   Quan Chen2,‡   Peng Jiang2
1 Peking University 2 Kuaishou Technology

† Project Leader  ·  ‡ Corresponding Authors

Overview

We incorporate an Adaptive Regression Loss and a Temporal Regularization Loss into Distribution Matching Distillation (DMD) to mitigate oversaturation and low motion dynamics in few-step video generation. Our approach also allows Supervised Fine-Tuning (SFT) to run concurrently with distillation, which enables effective style transfer.

🎯 Adaptive Regression  ·  Temporal Regularization  ·  Few-Step Generation  ·  🎨 Style Transfer via SFT

Video Results

High-quality video examples generated by the student model distilled from Wan2.1-T2V-1.3B using 4-step sampling with our method.

Method

Our method distills a pre-trained teacher model s_{data} into a few-step video generator G_φ. The training procedure consists of the following steps:

  1. A batch of real video-text pairs is sampled from the dataset. After applying noise perturbations to the videos, the student model performs denoising reconstruction. A regression loss is computed between the reconstructed video and the ground-truth video. Subsequently, this loss is adaptively weighted using our Loss Mean Cache to produce the final adaptive regression loss.
  2. Text conditions are sampled from the dataset to guide the student model in generating a video from pure noise. The denoised output from this process is used to compute a temporal regularization loss and a distribution matching loss.
  3. Finally, the generator G_φ is updated via gradient descent using the combined losses. The s_{gen,ξ} in DMD is updated separately, following the methodology of DMD2.
Method Overview
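
The steps above can be sketched in code. This page does not give the formula behind the Loss Mean Cache or the weights used to combine the losses, so everything below is a hypothetical illustration rather than the authors' implementation: the `LossMeanCache` class, its inverse-running-mean weighting, the timestep bucketing, and the `lambda_reg` / `lambda_temp` coefficients are all assumptions. The temporal regularization and distribution matching terms are passed in as plain numbers because their definitions are not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class LossMeanCache:
    """Hypothetical running mean of regression losses, kept per noise-level bucket."""
    num_buckets: int = 10
    momentum: float = 0.9
    eps: float = 1e-8
    means: dict = field(default_factory=dict)

    def bucket(self, t: float) -> int:
        # Map a noise level t in [0, 1] to a bucket index in [0, num_buckets - 1].
        return min(int(t * self.num_buckets), self.num_buckets - 1)

    def update(self, t: float, loss: float) -> None:
        # Exponential moving average; the first observation initializes the bucket.
        b = self.bucket(t)
        if b not in self.means:
            self.means[b] = loss
        else:
            self.means[b] = self.momentum * self.means[b] + (1 - self.momentum) * loss

    def weight(self, t: float) -> float:
        # Assumed weighting: inverse of the cached mean, or 1.0 for an unseen bucket.
        b = self.bucket(t)
        if b not in self.means:
            return 1.0
        return 1.0 / (self.means[b] + self.eps)


def adaptive_regression_loss(cache: LossMeanCache, t: float, raw_loss: float) -> float:
    """Step 1: reweight the raw regression loss using the cache, then update the cache."""
    w = cache.weight(t)        # weight comes from past losses only
    cache.update(t, raw_loss)  # record the current loss for later steps
    return w * raw_loss


def generator_loss(adaptive_reg: float, temporal_reg: float, dmd: float,
                   lambda_reg: float = 1.0, lambda_temp: float = 1.0) -> float:
    """Step 3: combine the three terms used to update the generator (assumed weighted sum)."""
    return dmd + lambda_reg * adaptive_reg + lambda_temp * temporal_reg


if __name__ == "__main__":
    cache = LossMeanCache()
    for raw in (4.0, 2.0, 3.0):
        print(adaptive_regression_loss(cache, t=0.95, raw_loss=raw))
```

In a real training loop the losses would be tensors and the cached weight would be treated as a constant (no gradient), so that only the regression term itself contributes to the update of G_φ. The inverse-mean choice is only one plausible way to keep high-noise timesteps from dominating the regression signal.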

Fine-tuning during Distillation

The real video-text data used during distillation lets the student model learn new knowledge, so fine-tuning takes place within the distillation process itself. This makes style transfer possible without a separate fine-tuning stage.

Style transfer results

BibTeX

@misc{you2026adaptivevideodistillationmitigating,
      title={Adaptive Video Distillation: Mitigating Oversaturation
             and Temporal Collapse in Few-Step Generation},
      author={Yuyang You and Yongzhi Li and Jiahui Li
              and Yadong Mu and Quan Chen and Peng Jiang},
      year={2026},
      eprint={2603.21864},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.21864},
}