VideoCrafter is an open-source toolbox for generating and editing video content.

It currently includes the following three models:

Base T2V: Universal Text-to-Video Generation

This is a base text-to-video (T2V) generative model built on the Latent Video Diffusion Model (LVDM). It synthesizes realistic videos from input text descriptions.

“Campfire at night in a snowy forest with starry sky in the background.”

“Cars running on the highway at night.”

VideoLoRA: Personalized Text-to-Video Generation with LoRA

Starting from a pretrained LVDM, you can create your own video generative model by fine-tuning it on a set of video clips or images that depict a specific concept.
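As background for the fine-tuning step above, here is a minimal sketch of the general LoRA idea: the pretrained weight stays frozen and only a low-rank update is learned. This illustrates the generic formulation, not VideoCrafter's training code; the helper names `matmul` and `lora_weight` and the toy matrices are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def lora_weight(w0, b, a, alpha):
    """Return w0 + (alpha / r) * (b @ a), where r is the LoRA rank.

    w0 is the frozen pretrained weight with shape (d_out, d_in);
    b has shape (d_out, r) and a has shape (r, d_in). Only a and b are trained.
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# Toy example (assumed values): a 2x2 frozen weight with a rank-1 update.
w0 = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # shape (2, 1)
a = [[0.5, 0.5]]     # shape (1, 2)
w = lora_weight(w0, b, a, alpha=1.0)
print(w)
```

Because only the two small factors are stored, a trained style can be distributed as a compact add-on to the shared base model.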

Below are the generated results of VideoLoRA models trained on four different styles of video clips.

Given a sentence describing the video content together with the LoRA trigger word (specified during LoRA training), the model generates a video in the desired style (or theme/concept).

Results of feeding the prompt “A monkey is playing a piano, ${trigger_word}” to the four VideoLoRA models:

“Loving Vincent style”

“frozen movie style”

“MakotoShinkaiYourName style”

“coco style”
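The prompt pattern above can be sketched in a few lines of Python. This only shows how the content sentence and trigger word combine into one prompt string; it does not call any VideoCrafter API, and `build_prompts` is a hypothetical helper name.

```python
from string import Template

# Prompt pattern from the text: a content sentence followed by the trigger word.
PROMPT_TEMPLATE = Template("A monkey is playing a piano, ${trigger_word}")

# Trigger words of the four VideoLoRA models listed above.
TRIGGER_WORDS = [
    "Loving Vincent style",
    "frozen movie style",
    "MakotoShinkaiYourName style",
    "coco style",
]


def build_prompts(template, trigger_words):
    """Fill the template once per trigger word and return the prompt strings."""
    return [template.substitute(trigger_word=word) for word in trigger_words]


prompts = build_prompts(PROMPT_TEMPLATE, TRIGGER_WORDS)
for prompt in prompts:
    print(prompt)
```

Each resulting prompt would be passed to the VideoLoRA model trained with the matching trigger word.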

VideoControl: Video Generation with More Conditional Controls

Plugging a lightweight adapter module into the T2V model lets generation follow more detailed control signals, such as depth.

Input text: “Ironman is fighting against the enemy, big fire in the background, photorealistic, 4k”

