
Figure 1. Lvmin Zhang looking at his looooooong TODO list at 1:30 am.

FramePack: Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
Lvmin Zhang, Shengqu Cai, Muyang Li, Gordon Wetzstein, Maneesh Agrawala
FramePack is a neural network structure for next-frame video prediction. It compresses input frames based on importance, enabling longer contexts by packing more frames into a fixed length. Drift prevention methods reduce error accumulation during generation.
ArXiv v1 / ArXiv P1 (coming soon) /
FramePack /
FramePack-P1 /
Code
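The idea of packing more frames into a fixed context by compressing them according to importance can be illustrated with a small sketch. The halving schedule below is an assumption for illustration (not the paper's exact compression scheme): the most recent frame keeps its full token budget, and older frames are progressively compressed so the total context length stays bounded.

```python
def pack_context(num_frames, base_tokens, min_tokens=1):
    """Allocate a per-frame token budget (illustrative sketch only).

    Age 0 is the most recent frame and gets the full budget; each
    older frame's budget is halved (an assumed schedule), floored at
    min_tokens, so the packed context length stays bounded no matter
    how many past frames are included.
    """
    budgets = []
    tokens = base_tokens
    for age in range(num_frames):
        budgets.append(max(tokens, min_tokens))
        tokens //= 2
    return budgets

# 16 past frames with 512 tokens for the newest frame:
# the geometric decay keeps the total near 2 * base_tokens.
budgets = pack_context(16, 512)
```

Because the budgets form a (truncated) geometric series, doubling the number of past frames adds only a handful of tokens, which is what makes long contexts affordable.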

PaintsAlter: Generating Past and Future in Digital Painting Processes
ACM Transactions on Graphics (SIGGRAPH 2025)
Lvmin Zhang, Chuan Yan, Yuwei Guo, Jinbo Xing, Maneesh Agrawala
The PaintsAlter framework generates past and future drawing process states from a single user canvas image. It repurposes video diffusion models to learn a set-to-set mapping, enabling generation of non-contiguous preceding or succeeding drawing steps.
PaintsUndo /
PaintsAlter /
Code

IC-Light: Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
International Conference on Learning Representations (ICLR) 2025 Oral (1%)
Lvmin Zhang, Anyi Rao and Maneesh Agrawala
IC-Light is a training method for diffusion-based relighting models. It imposes a consistent light transport principle, based on linear light blending, to enable scalable training that modifies only illumination while preserving intrinsic image properties.
OpenReview /
Code
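The consistent light transport principle mentioned above rests on the linearity of light: in linear RGB, a scene's appearance under two light sources combined equals the sum of its appearances under each source alone. A minimal numpy sketch of that identity, usable as a training-time consistency residual (function name and toy data are illustrative, not the paper's code):

```python
import numpy as np

def light_transport_residual(img_a, img_b, img_ab):
    """Mean deviation from the linear light transport identity.

    In linear RGB, I(L_A + L_B) = I(L_A) + I(L_B); a relighting model
    can be penalized on this residual so that edits change only the
    illumination, not intrinsic image properties (illustrative loss).
    """
    return np.abs(img_a + img_b - img_ab).mean()

# Toy Lambertian scene: appearance = albedo * shading, exactly linear
# in the lighting, so the residual vanishes up to float error.
rng = np.random.default_rng(0)
albedo = rng.random((4, 4, 3))
shading_a = rng.random((4, 4, 1))
shading_b = rng.random((4, 4, 1))

img_a = albedo * shading_a
img_b = albedo * shading_b
img_ab = albedo * (shading_a + shading_b)  # both lights on

residual = light_transport_residual(img_a, img_b, img_ab)
```

For real photographs the identity only holds in linear (not gamma-encoded) color space, which is why linear light blending is the natural formulation.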

Transparent Image Layer Diffusion using Latent Transparency
(LayerDiffuse)
ACM Transactions on Graphics (SIGGRAPH 2024)
Lvmin Zhang and Maneesh Agrawala
LayerDiffuse is an approach for generating transparent images or layers with latent diffusion models. It introduces a "latent transparency" offset, encoding the alpha channel into the latent space via finetuning while preserving original model quality.
ArXiv /
Code

Adding Conditional Control to Text-to-Image Diffusion Models
(ControlNet)
International Conference on Computer Vision (ICCV) 2023 Best Paper (Marr Prize)
Lvmin Zhang, Anyi Rao and Maneesh Agrawala
ControlNet is a neural network architecture adding spatial conditioning to large text-to-image models. It locks a pretrained model's weights while a trainable copy, connected via zero convolutions, learns controls like edges, depth, and human pose.
ArXiv /
Code /
Code(v1.1)
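The locked-weights-plus-trainable-copy design described above hinges on zero-initialized convolutions: at the start of training the trainable branch contributes exactly nothing, so the pretrained model's behavior is preserved. A minimal numpy sketch of that initialization property (class and function names are illustrative, and the stand-in blocks are not the real model):

```python
import numpy as np

class ZeroConv1x1:
    """1x1 convolution with zero-initialized weight and bias.

    At initialization it maps any input to zeros, so the trainable
    copy it gates adds nothing to the locked branch; once training
    updates the weights, the conditioning signal flows through.
    """
    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))  # zero init
        self.bias = np.zeros(channels)                # zero init

    def __call__(self, x):  # x: (H, W, C)
        return x @ self.weight.T + self.bias

def controlnet_block(locked_branch, trainable_copy, zero_conv, x, cond):
    # Frozen pretrained weights process x; the trainable copy sees the
    # spatial conditioning; the zero conv gates its contribution.
    return locked_branch(x) + zero_conv(trainable_copy(cond))

# Stand-ins for a frozen block and its trainable clone (illustrative).
locked = lambda t: t * 2.0
copy = lambda c: c * 2.0
zc = ZeroConv1x1(3)

x = np.random.default_rng(1).random((8, 8, 3))
cond = np.random.default_rng(2).random((8, 8, 3))
out = controlnet_block(locked, copy, zc, x, cond)
# At init, out equals the pretrained branch's output exactly.
```

This is why training can start from any conditioning signal (edges, depth, pose) without degrading the pretrained model on step one.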

Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation
ACM Transactions on Graphics (SIGGRAPH ASIA 2022, Journal Track)
Lvmin Zhang, Tien-Tsin Wong, and Yuxin Liu
The "sprites" in real-world cartoons are unique: artists may draw arbitrary sprite animations for expressiveness, or they may reduce their workload by tweening and adjusting existing content. Can we exploit these properties to reverse-engineer the original sprites in digital animation?
Know more ...

SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
IEEE International Conference on Computer Vision (ICCV) 2021 Oral (3%)
Lvmin Zhang, Jinyue Jiang, Yi Ji, and Chunping Liu
A flexible shadow drawing tool for line drawings, supporting interactive editing of cartoon-style shadows.
Know more ...

Generating Digital Painting Lighting Effects via RGB-space Geometry
ACM Transactions on Graphics (Presented in SIGGRAPH 2020)
Lvmin Zhang, Edgar Simo-Serra, Yi Ji, and Chunping Liu
A project investigating how artists apply lighting effects to their artworks and how we can assist that workflow. The main idea is to observe artists' real painting behaviors and procedures so that we can model the illumination in their artworks.
Know more ...

Generating Manga from Illustrations via Mimicking Manga Workflow
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Lvmin Zhang, Xinrui Wang, Qingnan Fan, Yi Ji, and Chunping Liu
We propose a data-driven framework to convert a digital illustration into three corresponding components: manga line drawing, regular screentone, and irregular screen texture. These components can be directly composed into manga images and can be further retouched for richer manga creations.
Know more ...

User-Guided Line Art Flat Filling with Split Filling Mechanism
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Lvmin Zhang, Chengze Li, Edgar Simo-Serra, Yi Ji, Tien-tsin Wong, and Chunping Liu
We present a deep learning framework for user-guided line art flat filling that explicitly controls the "influence areas" of the user's colour scribbles, i.e., the areas where the scribbles should propagate, in order to manipulate the colours of image details and avoid colour contamination between scribbles, while leveraging data-driven colour generation to facilitate content creation. Know more ...

DanbooRegion: An Illustration Region Dataset
European Conference on Computer Vision (ECCV) 2020
Lvmin Zhang, Yi Ji, and Chunping Liu
Regions are a fundamental element of various cartoon animation techniques and artistic painting applications, and obtaining satisfactory regions is essential to their success. To support diverse region-based cartoon applications, we use a semi-automatic method to annotate regions for in-the-wild artworks. Know more ...

Erasing Appearance Preservation in Optimization-based Smoothing
European Conference on Computer Vision (ECCV) 2020 Spotlight (5%)
Lvmin Zhang, Chengze Li, Yi Ji, Chunping Liu, and Tien-tsin Wong
Optimization-based smoothing can be formulated as a smoothing energy plus an appearance preservation energy. We show that partially "erasing" the appearance preservation energy facilitates the smoothing.
Know more ...

Two-stage Sketch Colorization
ACM Transactions on Graphics (SIGGRAPH Asia 2018)
Lvmin Zhang, Chengze Li, Tien-tsin Wong, Yi Ji, and Chunping Liu
With the advances of neural networks, automatic and semi-automatic sketch colorization have become feasible and practical. We present state-of-the-art semi-automatic (as well as automatic) colorization from line art. Our improvement comes from a divide-and-conquer scheme: we divide this complex colorization task into two simpler subtasks with clearer goals, drafting and refinement. Know more ...