Loss and Reparameterization
VAEs optimize a cleverly designed loss function that balances two goals. First, a reconstruction loss measures how well the decoder rebuilds the original input, often using mean squared error for images:

$$\mathcal{L}_{\text{recon}} = \lVert x - \hat{x} \rVert^2,$$

where $x$ is the input and $\hat{x}$ the reconstruction.
Second, a KL divergence term regularizes the latent distribution $q(z \mid x) = \mathcal{N}(\mu, \sigma^2)$ to stay close to a standard normal prior $p(z) = \mathcal{N}(0, I)$:

$$\mathcal{L}_{\text{KL}} = D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big) = -\tfrac{1}{2}\sum_{j}\big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big).$$
This prevents the model from memorizing training data and encourages a structured latent space. The full VAE loss is $\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta\,\mathcal{L}_{\text{KL}}$, where $\beta$ tunes the tradeoff.
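For concreteness, here is a minimal PyTorch-style sketch of that combined loss. The function name `vae_loss` and the `beta` argument are illustrative choices, not code from any particular library.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term: mean squared error between input and reconstruction.
    recon = F.mse_loss(x_hat, x, reduction="sum")

    # KL divergence between N(mu, sigma^2) and the standard normal prior,
    # using the closed form given above (log_var = log sigma^2).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return recon + beta * kl
```

Setting `beta` above 1 pushes the posterior harder toward the prior at the cost of reconstruction quality; setting it below 1 does the reverse.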
Training requires the reparameterization trick to make sampling differentiable. Instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly (which breaks gradients), we compute $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$. This moves the randomness into a noise term that does not depend on the network's parameters, allowing backpropagation through $\mu$ and $\sigma$.
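In code, the trick is only a few lines. This is a sketch under the common convention that the encoder outputs the log-variance; the function name is assumed for illustration.

```python
import torch

def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)      # epsilon ~ N(0, I), independent of the network
    return mu + eps * std            # z = mu + sigma * epsilon
```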
The Magic of Smooth Latent Spaces
VAEs shine in their latent space properties. Because encodings form distributions pulled toward a Gaussian prior, the space becomes continuous and interpolable. Encode two cat images, then linearly interpolate their latent vectors: the decoder spits out convincing morphs between them. This smoothness arises directly from the KL regularization, which keeps encodings overlapping around the prior rather than scattered into isolated clusters.
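A latent interpolation can be sketched as below, assuming hypothetical `encoder` and `decoder` modules where the encoder returns the posterior mean and log-variance as in the setup above.

```python
import torch

def interpolate(encoder, decoder, x_a, x_b, steps=8):
    mu_a, _ = encoder(x_a)               # use the posterior means as latent codes
    mu_b, _ = encoder(x_b)
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1 - t) * mu_a + t * mu_b    # straight line between the two codes
        frames.append(decoder(z))        # each point decodes to a plausible image
    return torch.stack(frames)
```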
Latent dimensions also capture semantic features. Some dimensions may end up encoding broad attributes like “animal” or “background,” while others handle details like fur texture. This structure makes VAEs ideal for tasks beyond pure reconstruction.