Why the Loss Landscape Metaphor is Misleading
July 22, 2025
The phrase “loss landscape” conjures images of rolling hills, sharp valleys, and saddle points: neat topographical features that we can reason about geometrically. But this intuition, while useful as a first approximation, can be deeply misleading when applied to modern neural networks, whose parameters live in $\mathbb{R}^d$ with $d$ easily in the millions or billions.
The Dimensionality Problem
When we plot a “loss landscape,” we are projecting a function $L: \mathbb{R}^d \to \mathbb{R}$ onto a 2D plane. The choice of projection directions dramatically affects what we see. Consider a simple quadratic:

$$L(\theta) = \tfrac{1}{2}\,\theta^\top H\,\theta$$

where $H$ is the Hessian. In 2D, a saddle point is a single critical point with one positive and one negative eigenvalue. In $d$ dimensions, a critical point can have any combination of positive and negative eigenvalues. The probability that a random critical point is a local minimum (all eigenvalues positive) decreases exponentially with $d$.
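To make the eigenvalue-counting argument concrete, here is a small sketch. It uses a symmetric Gaussian random matrix as a crude stand-in for the Hessian at a random critical point; that model is an illustrative assumption, not a claim about any trained network's Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hessian(d):
    """Symmetric Gaussian matrix: a crude stand-in for the Hessian at a
    random critical point (an assumption, not a trained network's Hessian)."""
    A = rng.standard_normal((d, d))
    return (A + A.T) / 2.0

# Count positive eigenvalues: a local minimum would need all d of them positive.
for d in (2, 10, 100):
    n_pos = int((np.linalg.eigvalsh(random_hessian(d)) > 0).sum())
    print(f"d = {d:3d}: {n_pos}/{d} eigenvalues positive")
```

At $d = 100$ the eigenvalue signs come out mixed, so the critical point is a saddle with many descent directions, not a minimum.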
What This Means in Practice
For a network with $d$ parameters, a critical point drawn from a random distribution has probability roughly $2^{-d}$ of being a true local minimum, under the naive model where each Hessian eigenvalue is independently positive or negative with equal probability. This is effectively zero. Most critical points in high dimensions are saddle points with a vast number of escape directions.
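The exponential decay can be checked empirically with a quick Monte Carlo sketch. It again assumes a Gaussian symmetric matrix as the Hessian model; real random-matrix ensembles exhibit eigenvalue repulsion and decay even faster than $2^{-d}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def frac_all_positive(d, trials=2000):
    """Fraction of random symmetric Gaussian matrices whose eigenvalues are
    all positive, i.e. whose critical point would be a local minimum.
    (The Gaussian model is an illustrative assumption.)"""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((d, d))
        if np.linalg.eigvalsh((A + A.T) / 2.0)[0] > 0:
            hits += 1
    return hits / trials

for d in (1, 2, 4, 8):
    print(f"d = {d}: fraction of local minima ~ {frac_all_positive(d):.4f}")
```

Even at $d = 8$ the observed fraction is already close to zero.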
The practical implication: SGD doesn’t get “stuck” in local minima in the way the landscape metaphor suggests. It gets slowed down by saddle points, and even that effect diminishes with added stochastic noise.
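A toy illustration of that claim: on the two-dimensional saddle $f(x, y) = x^2 - y^2$, exact gradient descent started on the stable manifold never leaves the saddle, while an arbitrarily small injected perturbation (a hypothetical stand-in for SGD's minibatch noise; the scale `1e-3` is arbitrary) escapes quickly:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy saddle: f(x, y) = x^2 - y^2; the origin has one positive and one
# negative curvature direction.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def steps_to_escape(noise_scale, lr=0.05, max_steps=10_000):
    """Gradient steps until |y| > 1, starting on the saddle's stable manifold.
    noise_scale stands in for SGD's minibatch noise (a modeling assumption).
    Returns max_steps if the iterate never escapes."""
    p = np.array([1.0, 0.0])
    for t in range(max_steps):
        if abs(p[1]) > 1.0:
            return t
        p = p - lr * grad(p) + noise_scale * rng.standard_normal(2)
    return max_steps

print("noiseless GD:", steps_to_escape(0.0))   # never escapes the saddle
print("noisy GD:    ", steps_to_escape(1e-3))  # escapes quickly
```

The escape is fast because the negative-curvature direction amplifies any small $y$ component multiplicatively at each step.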
A Better Mental Model
Instead of landscapes, think about level sets and gradient flow. The quantity that matters is not the shape of the surface, but the spectrum of the Hessian and how it evolves along the training trajectory. The condition number $\kappa = \lambda_{\max} / \lambda_{\min}$ of that spectrum tells you far more about optimization difficulty than any 2D picture ever could.
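One way to see why the spectrum governs difficulty: on a quadratic with condition number $\kappa$, gradient descent with the classical optimal fixed step size $2/(\lambda_{\min} + \lambda_{\max})$ needs a number of steps that grows roughly linearly in $\kappa$. A minimal sketch, assuming a diagonal Hessian and an arbitrary tolerance:

```python
import numpy as np

def gd_steps(eigs, tol=1e-6, max_steps=100_000):
    """Gradient descent on f(x) = 0.5 * x^T diag(eigs) x from x0 = ones,
    with the classical optimal fixed step size 2 / (lambda_min + lambda_max).
    Returns the number of steps until ||x|| < tol."""
    eigs = np.asarray(eigs, dtype=float)
    lr = 2.0 / (eigs.min() + eigs.max())
    x = np.ones_like(eigs)
    for t in range(max_steps):
        if np.linalg.norm(x) < tol:
            return t
        x = x - lr * eigs * x
    return max_steps

for kappa in (10, 100, 1000):
    print(f"condition number {kappa:5d}: {gd_steps([1.0, float(kappa)])} steps")
```

Each tenfold increase in $\kappa$ costs roughly tenfold more steps, which no 2D slice of the surface would reveal.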