FramePack-P1, Additional Results, Batch 1

(More batches of results will be uploaded soon. The models and paper will also be uploaded soon.)

Planned Anti-drifting and History Discretization

FramePack-P1 is the next version of FramePack. The “P” stands for plan, prepare, prevision, plot, pre-arrange, and so on.

FramePack-P1 is based on FramePack with two new designs: Planned Anti-Drifting (d) and History Discretization (e).

Planned Anti-Drifting first predicts sections that lie far ahead of the next section, and only then generates the nearby sections. The far sections serve as planned endpoints, which reduces drifting “between” endpoints: frames generated between two endpoints do not drift away from them.
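The far-first ordering can be illustrated with a small scheduling sketch. This is not the FramePack-P1 implementation: the function name `planned_order`, the fixed `stride` between endpoints, and the choice to fill each gap right after its endpoint are all assumptions made for illustration. It only shows what "generate the far section before the nearby ones" could look like as a generation order.

```python
from typing import List


def planned_order(num_sections: int, stride: int) -> List[int]:
    """Return section indices in a far-first generation order.

    Hypothetical schedule: every `stride`-th section (plus the final one)
    is treated as a planned endpoint and generated before the sections
    that precede it.
    """
    if num_sections <= 0 or stride <= 0:
        raise ValueError("num_sections and stride must be positive")

    # Endpoints: sections stride-1, 2*stride-1, ... and always the last section.
    endpoints = list(range(stride - 1, num_sections, stride))
    if not endpoints or endpoints[-1] != num_sections - 1:
        endpoints.append(num_sections - 1)

    order: List[int] = []
    start = 0
    for end in endpoints:
        order.append(end)                # far section (planned endpoint) first
        order.extend(range(start, end))  # then the nearby sections before it
        start = end + 1
    return order


print(planned_order(10, 4))  # [3, 0, 1, 2, 7, 4, 5, 6, 9, 8]
```

In this toy schedule, sections 0 to 2 are generated only after section 3 already exists, so they are bounded on both sides rather than extended open-endedly.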

History Discretization converts all history into discrete tokens by applying K-Means directly to the entire dataset. The aim is a history representation with no obvious gap between training and inference. This reduces drifting “over” planned endpoints (the endpoints themselves will not drift). It is inspired by a tentative observation that LLMs operating on discrete tokens tend to suffer less from drift than autoregressive video diffusion models.
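As a rough picture of what K-Means discretization of history means, the sketch below fits a codebook on a toy dataset and maps history vectors to the index of their nearest centroid. Everything here is an assumption for illustration: the feature dimension, the codebook size, the random toy data, and the helper names `fit_codebook` and `nearest` are hypothetical, and the page does not say which features are clustered or how the tokens are fed back to the model.

```python
import random
from typing import List, Sequence


def nearest(vec: Sequence[float], codebook: List[List[float]]) -> int:
    """Index of the centroid closest to `vec` (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
    return dists.index(min(dists))


def fit_codebook(data: List[List[float]], k: int,
                 iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means over the whole dataset."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, k)]  # init from k data points
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for v in data:
            clusters[nearest(v, codebook)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Toy stand-in for dataset features: 300 vectors of dimension 4.
rng = random.Random(0)
dataset = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(300)]
codebook = fit_codebook(dataset, k=8)

# Discretize a short history: each vector becomes an integer token in [0, 8).
history = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
history_tokens = [nearest(v, codebook) for v in history]
history_quantized = [codebook[t] for t in history_tokens]
```

Because both real frames (during training) and generated frames (during inference) are snapped to the same finite codebook, small errors in generated history map to the same tokens, which is one way to read the "no obvious gap" goal stated above.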

Single-Prompt 70-Second Anti-drifting Stress Tests (>2100 frames)

Each section is 1 second long. The training data for this test consists of common clips/shots of about 10 seconds or less, so this is a challenging stress test that extrapolates beyond the training scope. We show relatively dynamic results to demonstrate that the model does not sacrifice motion dynamic range.
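The numbers above can be checked with a few lines of arithmetic. The frame rate of 30 fps is an assumption inferred from "70 seconds" and ">2100 frames"; it is not stated on the page, and the actual videos contain slightly more than 2100 frames.

```python
TOTAL_SECONDS = 70
SECTION_SECONDS = 1
FPS = 30  # assumption: implied by 70 s and roughly 2100 frames, not stated on the page
MAX_TRAIN_CLIP_SECONDS = 10

num_sections = TOTAL_SECONDS // SECTION_SECONDS        # 70 one-second sections
frames_per_section = FPS * SECTION_SECONDS             # 30 frames each
total_frames = num_sections * frames_per_section       # 2100 frames
extrapolation_factor = TOTAL_SECONDS / MAX_TRAIN_CLIP_SECONDS  # 7.0x the longest training clip

print(num_sections, frames_per_section, total_frames, extrapolation_factor)
```

Under these assumptions the test generates about seven times the duration of the longest training clips.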

More 15-second results

Videos are compressed with H.264 at CRF 18 for faster loading.
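For readers who want to reproduce a comparable compression setting, the sketch below builds an ffmpeg command using the libx264 encoder with `-crf 18`. The page does not say which tool produced the videos, so the use of ffmpeg, the helper name `h264_crf18_command`, and the file names are assumptions.

```python
from typing import List


def h264_crf18_command(src: str, dst: str) -> List[str]:
    """Build (but do not run) an ffmpeg command for H.264 at CRF 18."""
    return [
        "ffmpeg",
        "-i", src,          # input video
        "-c:v", "libx264",  # H.264 encoder
        "-crf", "18",       # constant rate factor; lower means higher quality
        dst,
    ]


cmd = h264_crf18_command("input.mp4", "output_crf18.mp4")
print(" ".join(cmd))
# To actually encode, pass `cmd` to subprocess.run(cmd, check=True).
```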

Multiple-prompt Results (Prompt travelling)

Videos are compressed with H.264 at CRF 18 for faster loading.

Each result iterates over 12 prompts, with each prompt lasting about 3.5 seconds: The man waves hands. The man laughs. The man talks. The man dances. The man waves hands. The man scratches his head. The man talks. The man dances. The man waves hands. The man spins around. The man talks. The man dances.
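The prompt sequence above can be written as a simple time schedule. This is only an illustrative sketch: the helpers `build_schedule` and `prompt_at` are hypothetical, "about 3.5 seconds" is treated as exactly 3.5 seconds, and the page does not describe how the model actually switches prompts.

```python
from typing import List, Tuple

PROMPTS = [
    "The man waves hands.", "The man laughs.", "The man talks.",
    "The man dances.", "The man waves hands.", "The man scratches his head.",
    "The man talks.", "The man dances.", "The man waves hands.",
    "The man spins around.", "The man talks.", "The man dances.",
]
SECONDS_PER_PROMPT = 3.5  # "about 3.5 seconds" in the text; treated as exact here


def build_schedule(prompts: List[str], seconds: float) -> List[Tuple[float, float, str]]:
    """Return (start, end, prompt) intervals laid end to end."""
    return [(i * seconds, (i + 1) * seconds, p) for i, p in enumerate(prompts)]


def prompt_at(t: float, schedule: List[Tuple[float, float, str]]) -> str:
    """Return the prompt active at time `t` (in seconds)."""
    for start, end, prompt in schedule:
        if start <= t < end:
            return prompt
    raise ValueError(f"time {t} is outside the schedule")


schedule = build_schedule(PROMPTS, SECONDS_PER_PROMPT)
total_seconds = schedule[-1][1]     # 12 * 3.5 = 42.0
print(total_seconds, prompt_at(4.0, schedule))
```

With these assumptions each multiple-prompt result is about 42 seconds long, and the prompt active at 4.0 seconds is "The man laughs."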

Text-to-video Anti-drifting Stress Tests (>2100 frames, no reference image)

We use common benchmarking prompts.

Videos are compressed with H.264 at CRF 18 for faster loading.


Prompt 1: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it's tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.

Prompt 2: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Prompt 3: A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.

Prompt 4: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.

Prompt 5: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.

Prompt 6: A modern Japanese urban street where the view moves smoothly forward, passing multiple stylish houses with dark wood and light wood facades, minimalist concrete walls, and lush greenery.

More results

This web page is the first batch of results. More batches will be uploaded soon, and the model, repo, and paper will be updated soon.

BibTeX

@inproceedings{zhang2025framepack,
    title={Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models},
    author={Lvmin Zhang and Shengqu Cai and Muyang Li and Gordon Wetzstein and Maneesh Agrawala},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
}

@article{zhang2025framepackv1,
    title={Packing Input Frame Contexts in Next-Frame Prediction Models for Video Generation},
    author={Lvmin Zhang and Maneesh Agrawala},
    journal={arXiv},
    year={2025}
}