Loss and Reparameterization

VAEs optimize a cleverly designed loss function that balances two goals. First, reconstruction loss measures how well the decoder rebuilds the original input, often using mean squared error for images:

$\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|^2$

where $x$ is the input and $\hat{x}$ the reconstruction.

Second, KL divergence regularizes the latent distribution to stay close to a standard normal prior $\mathcal{N}(0, I)$:

$\mathcal{L}_{\text{KL}} = D_{\text{KL}}\big(q(z \mid x) \,\|\, p(z)\big)$

This prevents the model from memorizing training data and encourages a structured latent space. The full VAE loss is $\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \mathcal{L}_{\text{KL}}$, where $\beta$ tunes the tradeoff.
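
To make the two terms concrete, here is a minimal PyTorch sketch (not from the original post) of the combined loss. It assumes the encoder produces a mean mu and log-variance logvar for each input in a batch, the decoder produces a reconstruction x_hat, and the KL term uses the closed-form expression for a diagonal Gaussian against the standard normal prior.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Reconstruction term plus beta-weighted KL term, averaged over the batch."""
    # Reconstruction: squared error summed over pixels.
    recon = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + beta * kl
```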

Training requires the reparameterization trick to make sampling differentiable. Instead of sampling $z \sim q(z \mid x)$ directly (which breaks gradients), we compute $z = \mu + \sigma \odot \epsilon$ where $\epsilon \sim \mathcal{N}(0, I)$. This moves the randomness into a noise variable that does not depend on the network's parameters, allowing backpropagation through $\mu$ and $\sigma$.
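
A small sketch of the trick itself, again assuming the encoder returns mu and logvar (the log of $\sigma^2$):

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients through mu and sigma."""
    std = torch.exp(0.5 * logvar)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)    # the randomness lives in eps, which has no learnable parameters
    return mu + std * eps
```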

The Magic of Smooth Latent Spaces

VAEs shine in their latent space properties. Because encodings form distributions pulled toward a Gaussian prior, the space becomes continuous and interpolable. Encode two cat images, then linearly interpolate their latent vectors, and the decoder produces convincing morphs between them. This smoothness comes directly from the KL regularization, which keeps the encoded distributions overlapping and anchored to the same prior, so nearby latent points decode to similar outputs.
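
As an illustration, a hypothetical interpolation helper might look like the following, where decoder is the trained decoder network and z_a, z_b are the latent codes (for example, the posterior means) of the two encoded images; the names are assumptions, not part of any specific library.

```python
import torch

def interpolate(decoder, z_a, z_b, steps=8):
    """Decode evenly spaced points on the line between two latent codes."""
    alphas = torch.linspace(0.0, 1.0, steps)
    # If the latent space is smooth, each intermediate code decodes to a plausible image.
    return [decoder((1 - alpha) * z_a + alpha * z_b) for alpha in alphas]
```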

Latent dimensions also capture semantic features. Some dimensions might encode broad attributes like “animal” or “background,” while others handle details like fur texture. This structure makes VAEs ideal for tasks beyond pure reconstruction.
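
One common way to probe this is a latent traversal: vary a single dimension of a code while holding the rest fixed and watch what changes in the decoded output. The sketch below assumes decoder accepts a single latent vector z; again, the names are illustrative.

```python
import torch

def traverse_dimension(decoder, z, dim, values=(-3.0, -1.5, 0.0, 1.5, 3.0)):
    """Decode variations of z that sweep one latent dimension across a few standard deviations."""
    outputs = []
    for v in values:
        z_mod = z.clone()
        z_mod[dim] = v  # change only the chosen coordinate
        outputs.append(decoder(z_mod))
    return outputs
```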
