[Summary] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

TL;DR ControlNet is a framework for controlling the content of images generated by diffusion models. The process involves taking a trained diffusion model, freezing its weights, cloning some of its building blocks, and training the cloned weights with a conditioning input image. Method Architecture. Given a pretrained diffusion model, the ControlNet model is created by: Freezing the parameters of the original model. Cloning some of the original model blocks into a trainable copy....
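A minimal PyTorch sketch of this locked-copy/trainable-copy idea, assuming a toy `EncoderBlock` that stands in for one block of a pretrained diffusion U-Net and zero-initialized 1x1 convolutions linking the two paths; this is an illustration of the pattern, not the official ControlNet code:

```python
# Sketch of the ControlNet pattern: freeze the original block, train a clone on a condition.
import copy
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Hypothetical stand-in for one block of a pretrained diffusion U-Net."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

class ControlledBlock(nn.Module):
    def __init__(self, pretrained_block: nn.Module, channels: int = 64):
        super().__init__()
        # Trainable copy of the block; it receives the conditioning signal.
        self.trainable = copy.deepcopy(pretrained_block)
        # Frozen ("locked") original block.
        self.locked = pretrained_block
        for p in self.locked.parameters():
            p.requires_grad = False
        # Zero-initialized convolutions so training starts from the frozen model's behavior.
        self.zero_in = nn.Conv2d(channels, channels, 1)
        self.zero_out = nn.Conv2d(channels, channels, 1)
        for conv in (self.zero_in, self.zero_out):
            nn.init.zeros_(conv.weight)
            nn.init.zeros_(conv.bias)

    def forward(self, x, cond):
        # Frozen path plus a residual from the trainable copy fed with the condition.
        return self.locked(x) + self.zero_out(self.trainable(x + self.zero_in(cond)))

block = ControlledBlock(EncoderBlock())
x = torch.randn(1, 64, 32, 32)      # intermediate feature map
cond = torch.randn(1, 64, 32, 32)   # encoded conditioning image (e.g., edges, pose)
print(block(x, cond).shape)         # torch.Size([1, 64, 32, 32])
```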

March 2, 2024 · 2 min · 316 words

[Summary] Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

TL;DR This work enables interactive editing of a GAN-generated image by translating (“dragging”) any point in the image to a target location. Problem statements GAN-based image generation takes a noise vector and produces an image. There is a need for localized, controlled image manipulation, such as moving a region to a different location in the image. Method Given a GAN-generated image, the user provides the source coordinates (q) and the destination coordinates (p)...
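A toy sketch of the drag step as latent optimization, assuming a hypothetical `ToyGenerator`, a single source point q and destination p, and a simplified motion-supervision loss on the generator's feature map; the paper's feature-space point tracking and StyleGAN2 backbone are omitted here:

```python
# Sketch: optimize the latent so the feature at q moves one step toward p.
import torch
import torch.nn.functional as F

class ToyGenerator(torch.nn.Module):
    """Hypothetical generator returning a (1, 1, 16, 16) feature map from a latent."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(64, 16 * 16), torch.nn.Tanh())

    def forward(self, w):
        return self.net(w).view(1, 1, 16, 16)

def sample_at(feat, yx):
    # Bilinear sample of the feature map at a (possibly fractional) pixel coordinate.
    h, w = feat.shape[-2:]
    gx = 2 * float(yx[1]) / (w - 1) - 1
    gy = 2 * float(yx[0]) / (h - 1) - 1
    grid = torch.tensor([[[[gx, gy]]]], dtype=feat.dtype)
    return F.grid_sample(feat, grid, align_corners=True)

g = ToyGenerator()
w_latent = torch.randn(1, 64, requires_grad=True)   # latent code being optimized
q = torch.tensor([4.0, 4.0])                         # source (handle) point
p = torch.tensor([10.0, 12.0])                       # destination (target) point
opt = torch.optim.Adam([w_latent], lr=1e-2)

for _ in range(50):
    feat = g(w_latent)
    d = (p - q) / (p - q).norm()                     # unit step from q toward p
    # Motion supervision: the feature one step ahead should match the (detached) feature at q.
    loss = F.l1_loss(sample_at(feat, q + d), sample_at(feat, q).detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
```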

October 14, 2023 · 1 min · 206 words

[Summary] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

TL;DR To enable more controllable image diffusion, MultiDiffusion introduces patch-wise generation under a global constraint. Problem statements Diffusion models lack user controllability, and methods that offer such control require costly fine-tuning. Method The method can be reduced to the following algorithm: At each time step t: Extract patches from the global image I_{t-1}. Execute the denoising step to generate the patches J_{i,t}. Combine the patches by averaging their pixel values to create the global image I_t. For the panorama use case: simply generate N images with overlapping regions between them....
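A minimal sketch of one fused denoising step following this algorithm, assuming a placeholder `denoise_step` for the base model's reverse-diffusion update and a hypothetical crop size and stride:

```python
# Sketch of a MultiDiffusion-style step: denoise overlapping crops, average them back together.
import torch

def denoise_step(patch: torch.Tensor, t: int) -> torch.Tensor:
    # Placeholder for the pretrained model's denoising update on a single patch.
    return patch - 0.01 * torch.randn_like(patch)

def multidiffusion_step(image: torch.Tensor, t: int, size: int = 64, stride: int = 32) -> torch.Tensor:
    """One global step: extract patches from I_{t-1}, denoise each, average into I_t."""
    _, _, h, w = image.shape
    acc = torch.zeros_like(image)   # sum of denoised patch values per pixel
    cnt = torch.zeros_like(image)   # how many patches cover each pixel
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            crop = image[:, :, y:y + size, x:x + size]
            out = denoise_step(crop, t)                 # J_{i,t} in the post's notation
            acc[:, :, y:y + size, x:x + size] += out
            cnt[:, :, y:y + size, x:x + size] += 1
    return acc / cnt.clamp(min=1)                       # averaged global image I_t

# Panorama-style latent: wider than the model's native crop, covered by overlapping patches.
latent = torch.randn(1, 4, 64, 192)
for t in reversed(range(5)):
    latent = multidiffusion_step(latent, t)
```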

May 19, 2023 · 1 min · 125 words