Diffusion Model Study Notes

Basic modules of the Stable Diffusion model

Basic tutorial: https://comfyanonymous.github.io/ComfyUI_tutorial_vn/
CLIP: the text encoder. It encodes prompt text into embeddings that condition the main model.
Sampler: takes the main Stable Diffusion model, the positive and negative prompts encoded by the CLIP model, and a latent image (which can be blank) as inputs
1. the sampler takes the input latent image, adds noise to it, and then denoises it using the main model
2. the positive and negative prompts are passed to the model at each sampling step
3. the sampler outputs the denoised latent image
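The three sampler steps above can be sketched as a toy loop. This is only an illustration under loud assumptions: `toy_model` is a hypothetical stand-in for the real diffusion model, each prompt is reduced to a single float, the update rule is not a real scheduler, and the way the two predictions are combined (classifier-free guidance with a `cfg_scale` parameter) is one common way a negative prompt is used, not something stated in these notes.

```python
import random

def toy_model(latent, cond, t):
    # Hypothetical stand-in for the diffusion model: it "predicts" the noise
    # as the gap between each latent value and the conditioning value.
    # The timestep t is accepted but unused by this toy.
    return [x - cond for x in latent]

def sample(model, positive, negative, latent, steps=20, cfg_scale=2.0, seed=0):
    rng = random.Random(seed)
    # Step 1: add noise to the input latent (a blank latent is all zeros).
    x = [v + rng.gauss(0.0, 1.0) for v in latent]
    for step in range(steps):
        t = 1.0 - step / steps  # runs from 1.0 down towards 0.0
        # Step 2: both prompts are passed to the model at every sampling step.
        eps_pos = model(x, positive, t)
        eps_neg = model(x, negative, t)
        # Guidance (assumed): start from the negative prediction and push
        # towards the positive one.
        eps = [n + cfg_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]
        # Remove half of the predicted noise (toy update, not a real scheduler).
        x = [v - 0.5 * e for v, e in zip(x, eps)]
    # Step 3: return the denoised latent.
    return x

blank_latent = [0.0] * 4
denoised = sample(toy_model, positive=1.0, negative=0.0, latent=blank_latent)
```

In this toy setup the latent is pulled towards `negative + cfg_scale * (positive - negative)`, so a larger `cfg_scale` moves the result further away from the negative conditioning.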
VAE: decodes an image from latent space to pixel space (and encodes a pixel image into latent space, e.g. for img2img)
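To get a feel for what the VAE step does to sizes, here is a small shape calculation. It assumes the Stable Diffusion 1.x layout (4 latent channels, 8x spatial downscaling, RGB output); other model families may differ, and the helper names are made up for this note.

```python
def latent_to_pixel_shape(latent_shape, scale=8, pixel_channels=3):
    """Map a (channels, height, width) latent shape to the decoded image shape."""
    _, h, w = latent_shape
    return (pixel_channels, h * scale, w * scale)

def pixel_to_latent_shape(pixel_shape, scale=8, latent_channels=4):
    """Map a (channels, height, width) image shape to the encoded latent shape."""
    _, h, w = pixel_shape
    # Assumes height and width are multiples of `scale`.
    return (latent_channels, h // scale, w // scale)

print(latent_to_pixel_shape((4, 64, 64)))    # (3, 512, 512)
print(pixel_to_latent_shape((3, 512, 512)))  # (4, 64, 64)
```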
Prompting: (word:1.5) gives "word" a weight of 1.5, i.e. 1.5 times the default emphasis of 1.0
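A minimal sketch of how the `(word:1.5)` syntax could be split into text and weight pairs. This is a simplified parser written for these notes, not ComfyUI's actual implementation: it ignores nested and escaped parentheses, and `parse_prompt` is a made-up name.

```python
import re

# Matches "(some text:1.5)"; nested or escaped parentheses are not handled.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_prompt(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        # Plain text between the previous match and this one.
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            parts.append((plain, 1.0))
        parts.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    # Whatever is left after the last weighted span.
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

print(parse_prompt("a photo of a (cat:1.5), sitting"))
# [('a photo of a', 1.0), ('cat', 1.5), ('sitting', 1.0)]
```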