[Concept] Reinforcement learning from human feedback (RLHF)

TL;DR Machine learning models require a loss function to tune their parameters. Designing a loss function that reflects ambiguous human values is challenging; for example, it is not clear how to formulate a loss function that represents what is funny or ethical. To address this, a reward model is trained on human feedback. The reward model takes the model’s output and predicts a reward score, which the model then uses to optimize its parameters....

December 9, 2023 · 2 min · 350 words
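To make the reward-model idea in the excerpt above concrete, here is a minimal sketch. It assumes a pairwise-preference setup (a human marks one of two outputs as better) and a logistic loss on the score difference, which is one common choice and not necessarily the one the post describes. The linear `reward` function, the feature vectors, and all names are hypothetical stand-ins for a real neural reward model.

```python
import math

def reward(weights, features):
    """Linear reward model: score = w . x (a stand-in for a neural network)."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(comparisons, n_features, lr=0.5, epochs=200):
    """Fit weights so that preferred outputs score higher than rejected ones.

    comparisons: list of (preferred_features, rejected_features) pairs,
    each pair standing in for one human judgment (an assumption).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Derivative of -log(sigmoid(margin)) with respect to the margin.
            grad_margin = -1.0 / (1.0 + math.exp(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_margin * (preferred[i] - rejected[i])
    return weights

# Hypothetical two-dimensional features describing model outputs.
comparisons = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.7], [0.2, 0.6]),
    ([0.7, 0.1], [0.3, 0.9]),
]
weights = train_reward_model(comparisons, n_features=2)
```

After training, `reward(weights, features)` gives a scalar score for a new output; in an RLHF pipeline that score would play the role of the optimization signal described in the excerpt.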

[Proof-of-Concept] DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion

TL;DR Typical diffusion models create images from input text. DreamPose, presented at ICCV 2023, extends this functionality by generating a video from an image of a person and a pose sequence represented by DensePose. Problem statements Common diffusion models are able to generate images based on a given text. However, they cannot produce an animated sequence, nor can they be conditioned on an input pose sequence. Method Apply the following modifications to a diffusion model:...

November 18, 2023 · 3 min · 437 words

[Summary] CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

TL;DR A new video representation consisting of (i) a canonical image that aggregates the static contents and (ii) a temporal deformation field that reconstructs the video frames when applied to the canonical image. Problem statements Video processing comes at a high cost, and naively processing frames one by one results in poor cross-frame consistency. Method High-level objective. The proposed representation should have the following characteristics: Fitting capability for faithful video reconstruction. Semantic correctness of the canonical image to ensure the performance of image processing algorithms....

October 27, 2023 · 2 min · 361 words
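The two-part representation in the excerpt above (a canonical image plus a per-frame deformation field) can be illustrated with a toy sketch. The integer pixel offsets, nearest-neighbor sampling, and border clamping used here are simplifying assumptions for illustration, and `reconstruct_frame` is a hypothetical name, not the paper's implementation.

```python
def reconstruct_frame(canonical, deformation):
    """Sample the canonical image at the location each frame pixel maps to.

    deformation[y][x] = (dy, dx) is the offset from frame pixel (y, x) to
    its source location in the canonical image (integer offsets assumed).
    """
    height, width = len(canonical), len(canonical[0])
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = deformation[y][x]
            # Clamp to the image border so every lookup stays in bounds.
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            row.append(canonical[src_y][src_x])
        frame.append(row)
    return frame

canonical = [[1, 2, 3],
             [4, 5, 6]]
# Every frame pixel looks one column to the left, so content shifts right.
shift_right = [[(0, -1)] * 3 for _ in range(2)]
frame = reconstruct_frame(canonical, shift_right)
```

Because every frame is produced from the same canonical image, an edit applied once to `canonical` shows up consistently in all reconstructed frames, which is the source of the temporal consistency the excerpt refers to.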

[Summary] Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

TL;DR This work enables interactive editing of a GAN’s generated image by translating (“dragging”) any point in the image to a target location. Problem statements GAN-based image generation takes a noise vector to generate an image. There is a need for localized, controlled image manipulation, such as moving a region to a different location in the image. Method Given a GAN-generated image, a user input of the source coordinates (q) and the coordinates of the destination (p)...

October 14, 2023 · 1 min · 206 words

[Summary] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Tags: Diffusion Models, Image Editing, Controllability

TL;DR To enable more controllable image diffusion, MultiDiffusion introduces patch-wise generation with a global constraint. Problem statements Diffusion models lack user controllability, and methods that offer such control require costly fine-tuning. Method The method can be reduced to the following algorithm. At each time step t: (1) extract patches from the global image I_{t-1}; (2) execute the de-noising step to generate the patches J_{i,t}; (3) combine the patches by averaging their pixel values to create the global image I_t. For the panorama use case: simply generate N images with overlapping regions between them....

May 19, 2023 · 1 min · 146 words
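The per-step algorithm in the MultiDiffusion excerpt above (extract patches, de-noise each patch, average the overlaps) can be sketched as follows. This is a toy version on a small 2D list: `toy_denoise` is a hypothetical placeholder for the diffusion model's de-noising step, and the horizontal sliding-window layout is an assumption chosen to mirror the panorama use case.

```python
def extract_windows(width, window, stride):
    """Return start columns of overlapping windows that cover [0, width)."""
    starts = list(range(0, width - window + 1, stride))
    if starts[-1] != width - window:
        starts.append(width - window)  # make sure the right edge is covered
    return starts

def fuse_step(image, window, stride, denoise_patch):
    """One fusion step: de-noise each patch, then average overlapping pixels."""
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for start in extract_windows(width, window, stride):
        patch = [row[start:start + window] for row in image]
        denoised = denoise_patch(patch)
        for r in range(height):
            for c in range(window):
                total[r][start + c] += denoised[r][c]
                count[r][start + c] += 1
    return [[total[r][c] / count[r][c] for c in range(width)]
            for r in range(height)]

def toy_denoise(patch):
    """Placeholder for the model's de-noising step (hypothetical)."""
    return [[0.5 * value for value in row] for row in patch]

# I_{t-1}: a 2 x 8 "image"; windows of width 4 overlap by 2 columns.
image = [[float(c) for c in range(8)] for _ in range(2)]
fused = fuse_step(image, window=4, stride=2, denoise_patch=toy_denoise)
```

In a real sampler, `fuse_step` would be called once per time step t, with its output `I_t` fed back in as the next input; averaging the overlapping regions is what ties the independently de-noised patches into one coherent global image.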

[Summary] Break-A-Scene: Extracting Multiple Concepts from a Single Image

Tags: Diffusion Models, Concept Extraction, Image Generation

TL;DR Fine-tuning of a diffusion model using a single image to generate images conditioned on user-provided concepts. Problem statements Diffusion models are not able to generate a new image of user-provided concepts. Methods that enable this capability (e.g., DreamBooth) require several input images that contain the desired concept. Method The method consists of two phases....

July 21, 2023 · 2 min · 362 words