Stable Diffusion explained

What are Diffusion Models?

Generative Modeling by Estimating Gradients of the Data Distribution.

Figure 1. [Source] UNet architecture with residual connections and attention layers.
Figure 2. [Source] DDPM training and sampling.
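
As a concrete companion to Figure 2, here is a minimal PyTorch-style sketch of the DDPM training step: sample a random timestep, noise the clean image with the closed-form forward process, and regress the injected noise. The network `eps_model` and the linear beta schedule are illustrative assumptions, not something prescribed by this post.

```python
import torch
import torch.nn.functional as F

# Assumed linear beta schedule (illustrative; DDPM uses 1e-4 to 0.02 over 1000 steps).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{\alpha}_t

def ddpm_training_loss(eps_model, x0):
    """One DDPM training step: predict the noise added at a random timestep.

    eps_model: any network mapping (x_t, t) -> predicted noise (assumed interface).
    x0:        batch of clean images, shape (B, C, H, W).
    """
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)      # random timesteps
    eps = torch.randn_like(x0)                           # Gaussian noise to inject
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    # Closed-form forward process: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return F.mse_loss(eps_model(x_t, t), eps)            # simple (unweighted) DDPM loss
```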

An information-theoretic analysis of the diffusion process:

Figure 3. [Source] System entropy indicates how much information the system contains, i.e. the randomness of the system.
Figure 4. [Source] Entropy analysis of the diffusion system: the change in information during a one-step process.

Since \(X^{(t)}\) is obtained by adding Gaussian noise to \(X^{(t-1)}\), and a Gaussian perturbation maximizes the entropy increase, we have \(H_q(X^{(t)}|X^{(t-1)}) \ge H_q(X^{(t-1)}|X^{(t)})\). In other words, the reverse conditional \(X^{(t-1)}|X^{(t)}\) has less randomness than the corresponding Gaussian noise.
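
To spell out this upper bound (a sketch using standard entropy identities, in the notation of the quoted paper): by the symmetry of mutual information,

\[
H_q\big(X^{(t-1)} \mid X^{(t)}\big) = H_q\big(X^{(t)} \mid X^{(t-1)}\big) + H_q\big(X^{(t-1)}\big) - H_q\big(X^{(t)}\big) \le H_q\big(X^{(t)} \mid X^{(t-1)}\big),
\]

where the inequality holds because each forward step injects fresh Gaussian noise, so the marginal entropy does not decrease along the forward trajectory: \(H_q(X^{(t)}) \ge H_q(X^{(t-1)})\).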

“A lower bound on the entropy difference can be established by observing that additional steps in a Markov chain do not increase the information available about the initial state in the chain, and thus do not decrease the conditional entropy of the initial state.” (page 12 of the original paper)

For the lower bound, the extra uncertainty about the state relative to the initial state, i.e. \(H_q(X^{(t)}|X^{(0)}) - H_q(X^{(t-1)}|X^{(0)})\), could possibly have been introduced entirely during the last step from \(t-1\) to \(t\); removing that gained uncertainty from the upper bound yields the lower bound.
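
Concretely (a sketch following the same reasoning): conditioning on the extra variable \(X^{(0)}\) can only reduce entropy, and the forward step is Markov, so \(H_q(X^{(t)} \mid X^{(t-1)}, X^{(0)}) = H_q(X^{(t)} \mid X^{(t-1)})\) and

\[
H_q\big(X^{(t-1)} \mid X^{(t)}\big) \ge H_q\big(X^{(t-1)} \mid X^{(t)}, X^{(0)}\big) = H_q\big(X^{(t)} \mid X^{(t-1)}\big) + H_q\big(X^{(t-1)} \mid X^{(0)}\big) - H_q\big(X^{(t)} \mid X^{(0)}\big).
\]

Subtracting the uncertainty gained during the last step, \(H_q(X^{(t)} \mid X^{(0)}) - H_q(X^{(t-1)} \mid X^{(0)})\), from the upper bound therefore gives the lower bound.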

Based on this, the Improved DDPM paper chooses to learn the variances, improving the model’s log-likelihood by representing them as a log-domain interpolation between the upper and lower bounds.

Figure 5. [Source] Variances represented as a log-domain interpolation.

Interpolating between these bounds is a natural way to model the variance, since the variance at any step should capture how much the distribution changes between timesteps.

These bounds are very useful for having the model estimate the variance at any timestep in the diffusion process. The Improved DDPM paper notes that \(\beta_t\) and \(\tilde{\beta}_t\) represent the two extremes of the variance (the cases where the equal sign holds): one when the original image \(x_0\) is pure Gaussian noise, and the other when \(x_0\) is a single value. Any input image falls between a pure Gaussian and a single-valued image, making it intuitive to interpolate between these two extremes.
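
A minimal sketch of that interpolation, in the spirit of the Improved DDPM parameterization \(\Sigma_\theta = \exp(v \log \beta_t + (1 - v) \log \tilde{\beta}_t)\); names such as `v` for the model's raw per-pixel output are illustrative assumptions:

```python
import torch

def interpolated_variance(v, betas, alphas_cumprod, t):
    """Improved-DDPM style learned variance:
    exp(v * log(beta_t) + (1 - v) * log(beta_tilde_t)).

    v:              raw model output, same shape as x_t (one component per pixel).
    betas:          forward-process schedule beta_t, shape (T,).
    alphas_cumprod: cumulative products alpha_bar_t, shape (T,).
    t:              integer timesteps, shape (B,).
    """
    beta_t = betas[t].view(-1, 1, 1, 1)
    a_bar_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    # alpha_bar_{t-1}, with the convention that it equals 1 at the first step.
    a_bar_prev = torch.where(
        (t > 0).view(-1, 1, 1, 1),
        alphas_cumprod[(t - 1).clamp(min=0)].view(-1, 1, 1, 1),
        torch.ones_like(a_bar_t),
    )
    # Lower extreme: posterior variance beta_tilde_t (x_0 known exactly);
    # clamped to avoid log(0) at the very first timestep.
    beta_tilde_t = ((1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t).clamp(min=1e-20)
    # Log-domain interpolation between the two extremes (cf. Figure 5).
    return torch.exp(v * torch.log(beta_t) + (1.0 - v) * torch.log(beta_tilde_t))
```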

Stable Diffusion practice

The Annotated Diffusion Model.

GitHub repository of PyTorch denoising diffusion by lucidrains.

OpenAI guided-diffusion.

References