
Figure 1. Lvmin Zhang looking at his looooooong TODO list at 1:30 am.

FramePack: Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
Lvmin Zhang, Shengqu Cai, Muyang Li, Gordon Wetzstein, Maneesh Agrawala
FramePack is a neural network structure for next-frame video prediction. It compresses input frames based on importance, enabling longer contexts by packing more frames into a fixed length. Drift prevention methods reduce error accumulation during generation.
ArXiv v1 / ArXiv P1 (coming soon) /
FramePack /
FramePack-P1 /
Code
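The idea of packing more frames into a fixed context by compressing them according to importance can be illustrated with a small sketch. The halving schedule below is an assumption for illustration (not the paper's exact compression scheme): the most recent frame keeps its full token budget, and older frames are progressively compressed so the total context length stays bounded.

```python
def pack_context(num_frames, base_tokens, min_tokens=1):
    """Allocate a per-frame token budget (illustrative sketch only).

    Age 0 is the most recent frame and gets the full budget; each
    older frame's budget is halved (an assumed schedule), floored at
    min_tokens, so the packed context length stays bounded no matter
    how many past frames are included.
    """
    budgets = []
    tokens = base_tokens
    for age in range(num_frames):
        budgets.append(max(tokens, min_tokens))
        tokens //= 2
    return budgets

# 16 past frames with 512 tokens for the newest frame:
# the geometric decay keeps the total near 2 * base_tokens.
budgets = pack_context(16, 512)
```

Because the budgets form a (truncated) geometric series, doubling the number of past frames adds only a handful of tokens, which is what makes long contexts affordable.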

PaintsAlter: Generating Past and Future in Digital Painting Processes
ACM Transactions on Graphics (SIGGRAPH 2025)
Lvmin Zhang, Chuan Yan, Yuwei Guo, Jinbo Xing, Maneesh Agrawala
The PaintsAlter framework generates past and future drawing process states from a single user canvas image. It repurposes video diffusion models to learn a set-to-set mapping, enabling generation of non-contiguous preceding or succeeding drawing steps.
PaintsUndo /
PaintsAlter /
Code

IC-Light: Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
International Conference on Learning Representations (ICLR) 2025 Oral (1%)
Lvmin Zhang, Anyi Rao and Maneesh Agrawala
IC-Light is a training method for diffusion-based relighting models. It imposes a consistent light transport principle, based on linear light blending, to enable scalable training that modifies only illumination while preserving intrinsic image properties.
OpenReview /
Code
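The consistent light transport principle mentioned above rests on the linearity of light: in linear RGB, a scene's appearance under two light sources combined equals the sum of its appearances under each source alone. A minimal numpy sketch of that identity, usable as a training-time consistency residual (function name and toy data are illustrative, not the paper's code):

```python
import numpy as np

def light_transport_residual(img_a, img_b, img_ab):
    """Mean deviation from the linear light transport identity.

    In linear RGB, I(L_A + L_B) = I(L_A) + I(L_B); a relighting model
    can be penalized on this residual so that edits change only the
    illumination, not intrinsic image properties (illustrative loss).
    """
    return np.abs(img_a + img_b - img_ab).mean()

# Toy Lambertian scene: appearance = albedo * shading, exactly linear
# in the lighting, so the residual vanishes up to float error.
rng = np.random.default_rng(0)
albedo = rng.random((4, 4, 3))
shading_a = rng.random((4, 4, 1))
shading_b = rng.random((4, 4, 1))

img_a = albedo * shading_a
img_b = albedo * shading_b
img_ab = albedo * (shading_a + shading_b)  # both lights on

residual = light_transport_residual(img_a, img_b, img_ab)
```

For real photographs the identity only holds in linear (not gamma-encoded) color space, which is why linear light blending is the natural formulation.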

Transparent Image Layer Diffusion using Latent Transparency
(LayerDiffuse)
ACM Transactions on Graphics (SIGGRAPH 2024)
Lvmin Zhang and Maneesh Agrawala
LayerDiffuse is an approach for generating transparent images or layers with latent diffusion models. It introduces a "latent transparency" offset, encoding the alpha channel into the latent space via finetuning while preserving original model quality.
ArXiv /
Code

Adding Conditional Control to Text-to-Image Diffusion Models
(ControlNet)
International Conference on Computer Vision (ICCV) 2023 Best Paper (Marr Prize)
Lvmin Zhang, Anyi Rao and Maneesh Agrawala
ControlNet is a neural network architecture adding spatial conditioning to large text-to-image models. It locks a pretrained model's weights while a trainable copy, connected via zero convolutions, learns controls like edges, depth, and human pose.
ArXiv /
Code /
Code(v1.1)
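The locked-weights-plus-trainable-copy design described above hinges on zero-initialized convolutions: at the start of training the trainable branch contributes exactly nothing, so the pretrained model's behavior is preserved. A minimal numpy sketch of that initialization property (class and function names are illustrative, and the stand-in blocks are not the real model):

```python
import numpy as np

class ZeroConv1x1:
    """1x1 convolution with zero-initialized weight and bias.

    At initialization it maps any input to zeros, so the trainable
    copy it gates adds nothing to the locked branch; once training
    updates the weights, the conditioning signal flows through.
    """
    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))  # zero init
        self.bias = np.zeros(channels)                # zero init

    def __call__(self, x):  # x: (H, W, C)
        return x @ self.weight.T + self.bias

def controlnet_block(locked_branch, trainable_copy, zero_conv, x, cond):
    # Frozen pretrained weights process x; the trainable copy sees the
    # spatial conditioning; the zero conv gates its contribution.
    return locked_branch(x) + zero_conv(trainable_copy(cond))

# Stand-ins for a frozen block and its trainable clone (illustrative).
locked = lambda t: t * 2.0
copy = lambda c: c * 2.0
zc = ZeroConv1x1(3)

x = np.random.default_rng(1).random((8, 8, 3))
cond = np.random.default_rng(2).random((8, 8, 3))
out = controlnet_block(locked, copy, zc, x, cond)
# At init, out equals the pretrained branch's output exactly.
```

This is why training can start from any conditioning signal (edges, depth, pose) without degrading the pretrained model on step one.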

Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation
ACM Transactions on Graphics (SIGGRAPH ASIA 2022, Journal Track)
Lvmin Zhang, Tien-Tsin Wong, and Yuxin Liu
The "sprites" in real-world cartoons are unique: artists may draw arbitrary sprite animations for expressiveness, or they may reduce their workload by tweening and adjusting existing content. Can we exploit these properties to reverse-engineer the original sprites in digital animation?
Know more ...

SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
IEEE International Conference on Computer Vision (ICCV) 2021 Oral (3%)
Lvmin Zhang, Jinyue Jiang, Yi Ji, and Chunping Liu
A flexible shadow drawing tool for line drawings, supporting interactive editing of cartoon-style shadows.
Know more ...

Generating Digital Painting Lighting Effects via RGB-space Geometry
ACM Transactions on Graphics (Presented in SIGGRAPH 2020)
Lvmin Zhang, Edgar Simo-Serra, Yi Ji, and Chunping Liu
A project investigating how artists apply lighting effects to their artworks and how we can assist that workflow. The main idea is to observe artists' real painting behaviors and procedures so that we can model the illumination in their artworks.
Know more ...

Generating Manga from Illustrations via Mimicking Manga Workflow
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Lvmin Zhang, Xinrui Wang, Qingnan Fan, Yi Ji, and Chunping Liu
We propose a data-driven framework to convert a digital illustration into three corresponding components: manga line drawing, regular screentone, and irregular screen texture. These components can be directly composed into manga images and can be further retouched for richer manga creations.
Know more ...

User-Guided Line Art Flat Filling with Split Filling Mechanism
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Lvmin Zhang, Chengze Li, Edgar Simo-Serra, Yi Ji, Tien-tsin Wong, and Chunping Liu
We present a deep learning framework for user-guided line art flat filling that explicitly controls the "influence areas" of the user's colour scribbles, i.e., the areas where the scribbles should propagate, in order to manipulate the colours of image details and avoid colour contamination between scribbles, while leveraging data-driven colour generation to facilitate content creation. Know more ...

DanbooRegion: An Illustration Region Dataset
European Conference on Computer Vision (ECCV) 2020
Lvmin Zhang, Yi Ji, and Chunping Liu
Regions are a fundamental element of various cartoon animation techniques and artistic painting applications, and obtaining satisfactory regions is essential to their success. To support diverse region-based cartoon applications, we use a semi-automatic method to annotate regions for in-the-wild artworks. Know more ...

Erasing Appearance Preservation in Optimization-based Smoothing
European Conference on Computer Vision (ECCV) 2020 Spotlight (5%)
Lvmin Zhang, Chengze Li, Yi Ji, Chunping Liu, and Tien-tsin Wong
Optimization-based smoothing can be formulated as a smoothing energy plus an appearance preservation energy. We show that partially "erasing" the appearance preservation energy facilitates the smoothing.
Know more ...

Two-stage Sketch Colorization
ACM Transactions on Graphics (SIGGRAPH Asia 2018)
Lvmin Zhang, Chengze Li, Tien-tsin Wong, Yi Ji, and Chunping Liu
With the advances of neural networks, automatic and semi-automatic sketch colorization have become feasible and practical. We present state-of-the-art semi-automatic (as well as automatic) colorization from line art. Our improvement comes from a divide-and-conquer scheme: we divide this complex colorization task into two simpler subtasks with clearer goals, drafting and refinement. Know more ...