The Physics Behind Diffusion Models
From Brownian motion to Stable Diffusion — how physics shapes the world of generative AI.
Introduction
Diffusion models have transformed how machines generate images — from creating photorealistic art to driving large-scale creative systems.
But beneath all the neural network abstractions lies something beautifully simple and universal: physics.
At their core, diffusion models are governed by the laws of stochastic dynamics — the same equations that describe the random motion of particles in fluids.
What began as a physical theory of diffusion has evolved into the mathematical backbone of modern generative AI.
Let’s begin with the physical world and climb gradually toward the equations that drive Stable Diffusion 3.
1. Diffusion in the Physical World
In the physical world, diffusion describes how particles spread out over time due to random motion — like ink dispersing in water.
The key idea is that random motion tends to smooth out differences in concentration, pushing systems toward equilibrium.
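As a toy illustration of that smoothing, here is a minimal NumPy sketch: a cloud of particles released at one point (a drop of ink) spreads out, its standard deviation growing like the square root of time. The particle count, step size, and step count are arbitrary illustrative choices.

```python
import numpy as np

# Toy picture of physical diffusion: many particles (a drop of ink)
# start at the origin and take independent Gaussian steps. The cloud's
# spread grows like sqrt(t), flattening the initially sharp
# concentration profile toward equilibrium.
rng = np.random.default_rng(0)

n_particles, n_steps, dt = 10_000, 100, 0.01
positions = np.zeros(n_particles)          # all ink starts at x = 0
for _ in range(n_steps):
    positions += np.sqrt(dt) * rng.standard_normal(n_particles)

# Total elapsed time is 1.0, so the spread should be close to sqrt(1.0).
print(f"std after t = {n_steps * dt:.1f}: {positions.std():.3f}")
```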
Now, imagine applying this intuition to images.
An image can be represented as a point in a high-dimensional space, where each axis corresponds to a pixel intensity. Since such spaces are impossible to visualize, we simplify to three dimensions.
Here, the x and y axes represent pixel features, and the z-axis represents the probability density — the likelihood that a configuration of pixels forms a real image.
A tall, narrow peak on this surface represents a clear, realistic image. A flat region corresponds to pure noise, where no structure exists.
When we add random noise to an image, these peaks begin to flatten and spread, destroying the structure of the original data.
At the limit, the probability surface becomes uniform — every point equally random, no structure left.
But here’s the key insight: if we can learn the reverse path, the trajectory from noise back to structure, we can generate images from scratch by simply inverting diffusion itself.
2. From Motion to Mathematics
We begin with the classical equation of motion:

$$dx = v \, dt$$

where $v$ represents velocity and $x$ represents position.
When velocity depends on both position and time, this becomes:

$$dx = v(x, t) \, dt$$
However, this deterministic formulation cannot describe the random fluctuations inherent in diffusion.
To incorporate randomness, we add a stochastic term derived from Brownian motion, known as the Wiener process $dW$:

$$dx = v(x, t) \, dt + dW$$
This marks the transition from classical to stochastic systems — a particle now moves under the influence of both deterministic and random forces.
3. Adding Control: The Jitter Function
To control how strongly noise affects the system, we rename the velocity field to a general drift $f(x, t)$ and introduce a scaling function $g(x, t)$ on the noise:

$$dx = f(x, t) \, dt + g(x, t) \, dW$$

Here, $f(x, t)$ acts as the drift or restoring term, driving the system toward stable configurations, while $g(x, t)$ acts as the diffusion term, injecting randomness.
This equation captures the interplay between order and chaos: the deterministic structure of $f(x, t) \, dt$ and the stochastic energy of $g(x, t) \, dW$.
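To see how such an equation is simulated in practice, here is a minimal Euler–Maruyama sketch. The particular $f$ and $g$ below (a restoring drift toward zero and a constant noise scale) are toy choices for illustration, not prescribed by the theory.

```python
import numpy as np

# Euler-Maruyama integration of dx = f(x, t) dt + g(x, t) dW.
# f and g are toy choices: a restoring drift that pulls the particle
# toward 0, and a constant noise scale.
def f(x, t):
    return -x

def g(x, t):
    return 0.5

rng = np.random.default_rng(1)
dt, n_steps = 0.01, 500
x = 2.0                                        # start away from equilibrium
for step in range(n_steps):
    t = step * dt
    dW = np.sqrt(dt) * rng.standard_normal()   # Wiener increment
    x += f(x, t) * dt + g(x, t) * dW           # drift + jitter
print(f"final x: {x:.3f}")                     # hovers near 0, jittered
```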
4. The Forward Diffusion Process
In most diffusion-based generative models, these functions are chosen as:

$$f(x, t) = -\tfrac{1}{2} \beta(t) \, x, \qquad g(t) = \sqrt{\beta(t)}$$

Substituting these into our stochastic equation gives the forward diffusion process:

$$dx = -\tfrac{1}{2} \beta(t) \, x \, dt + \sqrt{\beta(t)} \, dW$$

Here, $\beta(t)$ is the noise schedule, controlling how quickly noise is added over time.
This forward process gradually corrupts a structured image $x_0$ into pure noise $x_T$, step by step, through continuous stochastic perturbation.
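As a sketch, here is that forward process discretized with Euler–Maruyama and applied to a toy "image" (a flat vector of pixels). The linear schedule and its endpoints are illustrative assumptions in the spirit of common VP-SDE setups.

```python
import numpy as np

# Forward diffusion dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dW applied
# to a toy "image", with an assumed linear noise schedule on t in [0, 1].
rng = np.random.default_rng(2)

BETA_MIN, BETA_MAX = 0.1, 20.0                 # illustrative schedule endpoints
def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

x = np.ones(64)                                # structured "image": all pixels = 1
n_steps = 1000
dt = 1.0 / n_steps
for step in range(n_steps):
    t = step * dt
    dW = np.sqrt(dt) * rng.standard_normal(x.shape)
    x += -0.5 * beta(t) * x * dt + np.sqrt(beta(t)) * dW

# After the full schedule, x is statistically close to pure N(0, 1) noise.
print(f"mean {x.mean():+.3f}, std {x.std():.3f}")
```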
5. Why Add Jitter?
One might ask: why not simply rely on pure Brownian motion ($dW$), without any scaling?
The answer lies in the nature of Brownian motion itself.
Brownian motion preserves the mean of the process — if we average multiple diffused images, we would get back the original mean image.
While that might sound ideal, it limits the randomness and diversity of generated samples.
To make the process more exploratory and allow the system to visit new trajectories in latent space, we add jitter through $g(x, t)$.
This scaling of randomness makes diffusion models not just accurate, but creative — each run can take a slightly different path, producing unique outcomes.
6. Learning the Gradient: The Compass of Diffusion
The training goal of a diffusion model is to learn the score function, which is the gradient of the log-probability density:

$$\nabla_x \log p_t(x)$$
This gradient points toward regions of higher probability — that is, it tells us the direction in which the data distribution becomes denser.
In a sense, it acts like a compass guiding us up the probability landscape from noise toward meaningful structure.
We take the logarithm because it converts products of probabilities into sums and simplifies the optimization process.
Because the logarithm is monotonically increasing, its gradient points in the same direction as the gradient of the density itself, consistently guiding us toward high-density regions: the peaks that correspond to real images.
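To make the score concrete, consider a 1-D Gaussian density $p(x) = \mathcal{N}(x; \mu, \sigma^2)$, for which the score has a closed form:

$$\log p(x) = -\frac{(x - \mu)^2}{2\sigma^2} - \tfrac{1}{2}\log\left(2\pi\sigma^2\right), \qquad \nabla_x \log p(x) = -\frac{x - \mu}{\sigma^2}$$

The gradient points from $x$ straight back toward the mean $\mu$, and grows with distance: exactly the compass behavior described above.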
7. The Reverse Diffusion Process
In 1982, Brian Anderson introduced the reverse-time diffusion equation, which allows us to invert the forward process.
While time cannot actually be run backward for physical matter, the equation is perfectly usable in virtual systems like generative modeling.
The reverse-time SDE is expressed as:

$$dx = \left[ f(x, t) - g(x, t)^2 \, \nabla_x \log p_t(x) \right] dt + g(x, t) \, d\bar{W}$$

where time runs backward from $T$ to $0$ and $d\bar{W}$ is a reverse-time Wiener process.
This formulation defines the process of denoising: reconstructing structure from randomness using the learned gradient $\nabla_x \log p_t(x)$.
At each step, the model predicts how to move slightly “uphill” toward the clean data distribution.
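Here is a minimal sketch of this reverse-time integration. To keep it self-contained, the toy data distribution is a single Gaussian $\mathcal{N}(3, 0.25^2)$, whose marginal score under the VP-SDE is known in closed form and stands in for a trained network; the schedule values match the forward sketch above.

```python
import numpy as np

# Reverse-time Euler-Maruyama for dx = [f - g^2 * score] dt + g dW-bar,
# integrating t from 1 down to 0. Toy setup: data ~ N(3, 0.25^2), so the
# exact marginal score replaces a trained network.
rng = np.random.default_rng(3)

BETA_MIN, BETA_MAX = 0.1, 20.0
def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha(t):                                  # exp(-0.5 * integral of beta)
    return np.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))

M0, S0 = 3.0, 0.25                             # toy data mean and std
def score(x, t):                               # exact Gaussian marginal score
    a = alpha(t)
    var = (a * S0) ** 2 + (1.0 - a**2)
    return -(x - a * M0) / var

n_steps = 1000
dt = 1.0 / n_steps
x = rng.standard_normal(5000)                  # start from pure noise at t = 1
for step in range(n_steps):
    t = 1.0 - step * dt
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)          # f - g^2 * score
    x -= drift * dt                                             # step backward in time
    x += np.sqrt(beta(t) * dt) * rng.standard_normal(x.shape)   # injected noise
print(f"mean {x.mean():.3f} (target 3.0), std {x.std():.3f} (target 0.25)")
```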
8. Why Add Noise in the Reverse Process
If we did not add noise during the reverse process, every particle would follow a nearly identical deterministic path toward the tallest peak of the distribution.
This would cause all generated samples to converge into one blurred, average image.
By reintroducing small amounts of noise at each step, particles can deviate slightly from one another, exploring multiple modes of the distribution instead of collapsing onto a single one.
This stochasticity is crucial for maintaining the diversity and richness of generated outputs.
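This effect can be checked on a toy problem. The sketch below runs the same reverse drift twice over a two-mode data distribution, once with the injected noise and once with the noise term simply deleted; the mixture parameters are arbitrary choices, and the exact mixture score again stands in for a trained network.

```python
import numpy as np

# Reverse the VP-SDE over a two-mode toy distribution (equal mixture of
# N(-3, 0.25^2) and N(+3, 0.25^2)), with and without the injected noise,
# keeping the drift unchanged in both runs.
rng = np.random.default_rng(4)

BETA_MIN, BETA_MAX = 0.1, 20.0
beta = lambda t: BETA_MIN + t * (BETA_MAX - BETA_MIN)
alpha = lambda t: np.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))

MEANS, S0 = np.array([-3.0, 3.0]), 0.25
def score(x, t):                               # exact score of the mixture marginal
    a = alpha(t)
    var = (a * S0) ** 2 + (1.0 - a**2)
    d = x[:, None] - a * MEANS                 # distance to each mode's center
    logw = -d**2 / (2 * var)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # per-sample mode responsibilities
    return (w * (-d / var)).sum(axis=1)

def reverse(with_noise):
    x = rng.standard_normal(4000)
    n_steps = 1000
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = 1.0 - step * dt
        x -= (-0.5 * beta(t) * x - beta(t) * score(x, t)) * dt
        if with_noise:
            x += np.sqrt(beta(t) * dt) * rng.standard_normal(x.shape)
    return x

for with_noise in (True, False):
    x = reverse(with_noise)
    # Without the injected noise, samples still reach both modes but pile
    # up far too tightly on the centers: diversity within a mode is lost.
    print(f"noise={with_noise}: within-mode std {x[x > 0].std():.3f} (data: 0.25)")
```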
9. Deterministic Diffusion: The DDIM Approach
Later, in the DDIM (Denoising Diffusion Implicit Models) paper, researchers proposed removing the stochastic noise term altogether, making the process deterministic.
This gives rise to the following ordinary differential equation (ODE):

$$dx = \left[ f(x, t) - \tfrac{1}{2} g(t)^2 \, \nabla_x \log p_t(x) \right] dt$$
The DDIM approach produces results that are similar to the stochastic version but allows for faster and more predictable sampling.
Each step moves deterministically along the learned gradient field, removing uncertainty while preserving structure.
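A sketch of this deterministic sampling in ODE form (the continuous-time view of DDIM-style sampling) follows, reusing the same illustrative Gaussian target and schedule as before. Note there is no $dW$ term: a fixed starting noise always maps to the same sample.

```python
import numpy as np

# Probability-flow-ODE sampling, dx/dt = f - 0.5 * g^2 * score, in the
# same toy Gaussian setting as the reverse-SDE sketch above.
BETA_MIN, BETA_MAX = 0.1, 20.0
beta = lambda t: BETA_MIN + t * (BETA_MAX - BETA_MIN)
alpha = lambda t: np.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))

M0, S0 = 3.0, 0.25
def score(x, t):                               # exact Gaussian marginal score
    a = alpha(t)
    return -(x - a * M0) / ((a * S0) ** 2 + (1.0 - a**2))

n_steps = 1000
dt = 1.0 / n_steps
x = np.random.default_rng(5).standard_normal(5000)   # fixed seed, fixed output
for step in range(n_steps):
    t = 1.0 - step * dt
    dxdt = -0.5 * beta(t) * x - 0.5 * beta(t) * score(x, t)
    x -= dxdt * dt                                    # deterministic Euler step
print(f"mean {x.mean():.3f}, std {x.std():.3f}")      # approaches N(3, 0.25^2)
```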
10. The ODE Perspective and Yang Song’s 2021 Model
In 2021, Yang Song and colleagues formalized this connection in Score-Based Generative Modeling through Stochastic Differential Equations.
They showed that solving this deterministic ODE produces results nearly identical to the stochastic process — but with greater control and computational efficiency.
The generative ODE takes the same form:

$$\frac{dx}{dt} = f(x, t) - \tfrac{1}{2} g(t)^2 \, \nabla_x \log p_t(x)$$
By eliminating the random term $dW$, the ODE formulation transforms diffusion from a noisy simulation into a controlled generative process, one that can be tuned precisely for quality or diversity.
11. SDE versus ODE
The choice between stochastic and deterministic diffusion depends on the goal.
When diversity in generated samples is the priority, the stochastic SDE formulation is preferred.
When speed and fine-grained control are required, the deterministic ODE formulation is the better choice.
Modern models such as Stable Diffusion 3 rely heavily on ODE-based sampling because it balances visual quality with generation speed.
12. Physics Meets Intelligence
Diffusion models embody the perfect fusion of physics and machine learning.
They model how order emerges from randomness — a process that nature has been performing for billions of years.
We begin with noise, let it wander through controlled randomness, and gradually guide it home, step by step, until it forms coherent structure once again.
The same mathematics that describes molecules drifting in water now describes how an AI dreams a painting into existence.
References
Anderson, B. D. O. (1982). Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3), 313–326.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS 2020.
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. ICLR 2021.
Song, J., Meng, C., & Ermon, S. (2021). Denoising Diffusion Implicit Models. ICLR 2021.
Written by Vishesh Shekhawat — exploring the intersection of physics, AI, and human creativity.