
Diffusion models such as DALL·E, Stable Diffusion, and Imagen should, in principle, just regurgitate what they’ve seen during training. Instead, they seem to improvise. That’s the paradox Giulio Biroli points out: “If they worked perfectly, they should just memorize … but they don’t—they actually produce new samples.”
Here’s the secret: creativity is baked into the denoising process itself. Mason Kamb, a graduate student studying applied physics at Stanford University, and physicist Surya Ganguli showed that it’s not magic but math. They built an analytical model, the equivariant local score (ELS) machine, that reproduces not just noisy reconstructions but the creative flair of trained diffusion models. Their key insight: limitations like locality (images are built patch by patch, each patch seeing only its immediate neighborhood) and translational equivariance (a shift of the input produces the same shift of the output) aren’t bugs. They’re the source of novelty, reports Quanta Magazine.
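To make those two properties concrete, here’s a minimal sketch, a toy illustration of my own rather than the paper’s actual ELS machine: a “denoiser” that averages each value with only its nearest neighbors is local by construction, and that locality automatically buys translational equivariance, so denoising a shifted input gives the shifted denoised output.

```python
# Toy sketch (not the ELS machine itself) of the two inductive biases:
# locality and translational equivariance, on a 1-D "image".

def local_denoise(signal, radius=1):
    """Average each entry with its neighbors (circular boundary).
    Local: output[i] depends only on signal[i-radius .. i+radius]."""
    n = len(signal)
    return [
        sum(signal[(i + d) % n] for d in range(-radius, radius + 1))
        / (2 * radius + 1)
        for i in range(n)
    ]

def shift(signal, k):
    """Circularly shift a sequence by k positions."""
    return signal[-k:] + signal[:-k]

x = [0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]

# Translational equivariance falls out of locality for free:
# denoising a shifted input equals shifting the denoised output.
assert local_denoise(shift(x, 3)) == shift(local_denoise(x), 3)
```

The point of the sketch is that nothing here was trained to be equivariant; any operation that looks only at a fixed local window behaves this way by construction.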
In experiments, the ELS machine matched trained diffusion models’ outputs with around 90% accuracy, a striking level of agreement in machine learning. In other words, creativity fell out as a deterministic, predictable consequence of architectural constraints, not as inscrutable emergent behavior.
They liken this to biological morphogenesis—cells assemble complex forms without any central planner—and occasionally slip up (hello, extra fingers). Similarly, diffusion models stitch together images from local patches, and small misalignments or imbalances become creative quirks.
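The stitching idea can be caricatured in a few lines of code. This is my own toy analogy, not the authors’ model: a generator that only enforces local patch consistency with its training data can still assemble globally novel outputs, pieces that are each individually familiar, combined into something never seen whole.

```python
import random

# Toy analogy (not the paper's model): stitch outputs from local patches
# of training data. Every small patch of the result is "valid", but the
# whole can be a combination that never appeared in training.

TRAINING = ["abcabd", "abdabc"]  # two tiny "training images" (strings)

# Record, for every length-2 patch, which characters can follow it locally.
follow = {}
for s in TRAINING:
    for i in range(len(s) - 2):
        follow.setdefault(s[i:i + 2], set()).add(s[i + 2])

def generate(seed, start="ab", length=6):
    """Extend `start` one character at a time, consulting only the last
    two characters (a purely local rule) at each step."""
    rng = random.Random(seed)
    out = start
    while len(out) < length:
        out += rng.choice(sorted(follow[out[-2:]]))
    return out

sample = generate(seed=0)
# Every length-3 patch of the sample occurs in some training string,
# even when the sample as a whole is not a training string.
assert all(any(sample[i:i + 3] in s for s in TRAINING)
           for i in range(len(sample) - 2))
```

With these two training strings, the local rule can also emit novel wholes like "abcabc" or "abdabd": locally flawless, globally unprecedented, which is the same flavor of quirk as a morphogenesis slip-up.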
What this really means is that novelty in AI isn’t magic; it’s structure. Creativity, in this context, is a by-product of how these systems are built, not something apart from their architecture. It’s a powerful framework, one that helps demystify AI creativity and may even hint at how human creativity works.