[Proof-of-Concept] DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

TL;DR Typical diffusion models create images from input text. DreamPose, presented at ICCV 2023, extends this functionality by generating a video from a single image of a human model and a pose sequence represented by DensePose. Problem statements Common diffusion models are able to generate images based on given text. However, they can neither produce an animated sequence nor be conditioned on an input pose sequence. Method Apply the following modifications to a diffusion model:...
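The excerpt cuts off before listing the modifications, but the paper's core conditioning change is to concatenate a few consecutive DensePose frames with the noisy input latents of the denoising UNet. A minimal PyTorch sketch of that idea, where `PoseConditionedUNet`, `base_unet`, and the channel counts are assumptions rather than DreamPose's actual code:

```python
import torch
import torch.nn as nn

class PoseConditionedUNet(nn.Module):
    """Hypothetical sketch: condition a denoising UNet on DensePose maps by
    concatenating them with the noisy latents along the channel axis."""

    def __init__(self, base_unet: nn.Module, latent_ch: int = 4, pose_ch: int = 10):
        super().__init__()
        self.base_unet = base_unet
        # Map the concatenated (latent + pose) channels back to the channel
        # count the pretrained UNet expects.
        self.in_proj = nn.Conv2d(latent_ch + pose_ch, latent_ch, kernel_size=1)
        with torch.no_grad():
            # Identity on the latent channels, zeros on the pose channels, so
            # the modified model initially behaves like the pretrained one.
            self.in_proj.weight.zero_()
            self.in_proj.bias.zero_()
            for c in range(latent_ch):
                self.in_proj.weight[c, c, 0, 0] = 1.0

    def forward(self, noisy_latents, pose_maps, timestep, cond_emb):
        # pose_maps: a few consecutive DensePose frames stacked channel-wise,
        # which smooths motion between adjacent output frames.
        x = torch.cat([noisy_latents, pose_maps], dim=1)
        return self.base_unet(self.in_proj(x), timestep, cond_emb)
```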

November 18, 2023 · 3 min · 437 words

[Summary] Break-A-Scene: Extracting Multiple Concepts from a Single Image

TL;DR Fine-tuning a diffusion model on a single image so it can generate images conditioned on user-provided concepts. Problem statements Diffusion models are not able to generate new images of user-provided concepts, and methods that enable this capability (e.g., DreamBooth) require several input images that contain the desired concept. Method The method consists of two phases. In the first, the model weights are frozen and textual handles are optimized to reconstruct the input image. This is done with a large learning rate; since the weights are frozen, the model's generalization is not harmed....
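A minimal PyTorch sketch of that first phase, under the assumption that `model` is the frozen denoising network and `handles` are the learned token embeddings; all names are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def optimize_handles(model, handles, image_latents, scheduler,
                     steps: int = 400, lr: float = 5e-4):
    """Illustrative phase 1: all model weights are frozen; only the textual
    'handles' (token embeddings) are optimized to reconstruct the image."""
    for p in model.parameters():
        p.requires_grad_(False)       # frozen weights preserve generalization
    handles.requires_grad_(True)
    opt = torch.optim.AdamW([handles], lr=lr)  # large LR is safe: weights frozen

    for _ in range(steps):
        t = torch.randint(0, scheduler.num_train_timesteps, (image_latents.size(0),))
        noise = torch.randn_like(image_latents)
        noisy = scheduler.add_noise(image_latents, noise, t)
        # Standard diffusion objective: predict the added noise, conditioned
        # on the handle embeddings instead of a plain text prompt.
        loss = F.mse_loss(model(noisy, t, handles), noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return handles
```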

July 21, 2023 · 2 min · 340 words

[Summary] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

TL;DR To enable more controllable image diffusion, MultiDiffusion introduces patch-wise generation under a global constraint. Problem statements Diffusion models lack user controllability, and methods that offer such control require costly fine-tuning. Method The method can be reduced to the following algorithm. At each time step t: (1) extract patches from the global image I_{t-1}; (2) execute the de-noising step to generate the patches J_{i,t}; (3) combine the patches by averaging their pixel values to create the global image I_t. For the panorama use case: simply generate N images with overlapping regions between them....
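The fusion step described above maps almost directly to code. A minimal PyTorch sketch, with `denoise_patch` standing in for a single de-noising step of the underlying diffusion model (names are illustrative):

```python
import torch

def fuse_step(global_latent, denoise_patch, patch_coords, patch_size):
    """One MultiDiffusion-style fusion step (illustrative): de-noise
    overlapping patches of I_{t-1} independently, then average them
    back into the global image I_t."""
    acc = torch.zeros_like(global_latent)   # running sum of denoised patches
    cnt = torch.zeros_like(global_latent)   # patches covering each pixel

    for y, x in patch_coords:
        patch = global_latent[..., y:y + patch_size, x:x + patch_size]
        j = denoise_patch(patch)             # ordinary per-patch de-noising step
        acc[..., y:y + patch_size, x:x + patch_size] += j
        cnt[..., y:y + patch_size, x:x + patch_size] += 1

    return acc / cnt.clamp(min=1)            # average over overlapping regions
```

Calling `fuse_step` once per time step, with `patch_coords` laid out as overlapping windows across a wide latent, is exactly the panorama use case from the excerpt.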

May 19, 2023 · 1 min · 125 words