Artificial Intelligence · Part 2 of 3

How a Polynomial Becomes a Bell Curve

The mathematics of concentration — from symmetry to certainty

Deriving the Density

The symmetry argument from Article 1 tells us the width of the distribution — variance 1/d — but nothing about its shape. To find the shape, we need the actual density. The derivation is short and the punchline is worth it.
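The variance-1/d claim is easy to check empirically before we derive the shape. A minimal sketch (helper name is mine; uniform points on the sphere are generated by normalizing Gaussian vectors, a standard trick):

```python
import math
import random

def sphere_coordinate_samples(d, n, seed=0):
    """Sample the first coordinate of n points drawn uniformly from the
    unit sphere in d dimensions, by normalizing Gaussian vectors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        samples.append(v[0] / norm)
    return samples

d, n = 100, 20000
xs = sphere_coordinate_samples(d, n)
var = sum(x * x for x in xs) / n  # mean is 0 by symmetry
print(var, 1 / d)  # the two numbers should be close
```

The empirical variance lands near 1/d, but the histogram's shape is exactly what this number does not tell us.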

Slicing the Sphere

The idea is the same one we used for the hat-box theorem in Article 1: slice the sphere horizontally and ask how much surface area sits at each height. Two effects compete — circumference and slant — and at d = 3 they cancelled exactly. What happens in other dimensions? The slant stays the same, but the circumference factor changes — and that turns out to be all we need.

Fix the first coordinate at some value x_1 — this is the same as slicing at a given height. The remaining d - 1 coordinates must satisfy x_2^2 + \cdots + x_d^2 = 1 - x_1^2, so they live on a smaller sphere of radius r = \sqrt{1 - x_1^2} in d{-}1 dimensions.

How does the surface area of that smaller sphere scale with r? Start with what you know: a circle (2D) has circumference \propto r^1, an ordinary sphere (3D) has surface area \propto r^2. In general, surface area in k dimensions scales as r^{k-1} — it is a (k{-}1)-dimensional measurement, so scaling the radius by r scales each of those k - 1 dimensions. Our slice lives in d{-}1 dimensions, so its surface area scales as r^{d-2}.
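The r^{k-1} scaling can be confirmed against the closed-form surface area S_k(r) = 2\pi^{k/2} r^{k-1} / \Gamma(k/2), a standard formula the article does not derive; a short sketch (function name is mine):

```python
import math

def sphere_surface_area(k, r):
    """Surface area of the sphere of radius r in R^k:
    S_k(r) = 2 * pi^(k/2) / Gamma(k/2) * r^(k-1)."""
    return 2 * math.pi ** (k / 2) / math.gamma(k / 2) * r ** (k - 1)

for k in (2, 3, 4, 10):
    ratio = sphere_surface_area(k, 2.0) / sphere_surface_area(k, 1.0)
    print(k, ratio, 2 ** (k - 1))  # doubling r scales area by 2^(k-1)
```

At k = 2 the formula gives circumference 2\pi r, at k = 3 it gives 4\pi r^2, and doubling the radius always multiplies the area by 2^{k-1}, exactly the scaling the argument needs.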

The slant factor is the same 1/\cos\phi effect from the hat-box. Since x_1 = \sin\phi (the height on the unit sphere), we have \cos\phi = \sqrt{1 - x_1^2}, so the slant contributes (1 - x_1^2)^{-1/2} regardless of dimension. Putting the two together:

Deriving the density
  1. Slice. Fix x_1. The remaining coordinates live on a sphere of radius r = \sqrt{1 - x_1^2} in d{-}1 dimensions.
  2. Circumference. Surface area scales as r^{d-2}:
    \text{circumference} \;\propto\; (\sqrt{1 - x_1^2})^{d-2} \;=\; (1 - x_1^2)^{(d-2)/2}
  3. Slant. The tilt of the surface — the hat-box effect — contributes:
    \text{slant} \;\propto\; (1 - x_1^2)^{-1/2}
  4. Multiply.
    p(x_1) \;\propto\; \underbrace{(1 - x_1^2)^{(d-2)/2}}_{\text{circumference}} \;\times\; \underbrace{(1 - x_1^2)^{-1/2}}_{\text{slant}} \;=\; (1 - x_1^2)^{(d-3)/2}
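The four steps above can be checked end to end: sample uniform points on the sphere and compare the empirical distribution of x_1 against (1 - x_1^2)^{(d-3)/2}, normalized by the exact constant \sqrt{\pi}\,\Gamma((d{-}1)/2)/\Gamma(d/2) (a Beta-function identity not derived in the text). A sketch with hypothetical helper names:

```python
import math
import random

def density(x, d):
    """p(x) = (1 - x^2)^((d-3)/2) / Z, with the exact normalizer
    Z = sqrt(pi) * Gamma((d-1)/2) / Gamma(d/2)."""
    Z = math.sqrt(math.pi) * math.gamma((d - 1) / 2) / math.gamma(d / 2)
    return (1 - x * x) ** ((d - 3) / 2) / Z

def first_coordinate_samples(d, n, seed=1):
    """First coordinate of n uniform points on the unit sphere in R^d."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append(v[0] / math.sqrt(sum(c * c for c in v)))
    return out

d, n = 5, 50000
xs = first_coordinate_samples(d, n)
# Empirical fraction in the band [-0.2, 0.2] vs. the density integrated over it.
emp = sum(1 for x in xs if -0.2 <= x <= 0.2) / n
step = 0.001
theory = sum(density(-0.2 + (i + 0.5) * step, d) * step for i in range(400))
print(emp, theory)  # the two fractions should agree closely
```

At d = 3 the normalizer works out to 2, so density(x, 3) returns the flat 1/2 of the hat-box theorem.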

One number — (d{-}3)/2 — controls everything. The circumference pushes probability toward the equator (exponent +(d{-}2)/2), and the slant pushes it toward the poles (exponent -1/2). At d = 3 the exponent is zero: (1 - x_1^2)^0 = 1, a flat uniform distribution — the hat-box theorem, recovered algebraically. At every other dimension, one effect wins.

Here is what the formula gives across all our dimensions:

  d     Exponent (d{-}3)/2   Name
  2     -1/2                 Arcsine distribution
  3     0                    Uniform on [-1, 1]
  4     1/2                  Semicircle
  5     1                    Quadratic
  10    7/2                  Becoming Gaussian
  200   197/2                Approximately \mathcal{N}(0, 1/200)

One exponent, ticking upward with dimension, produces every shape we saw in the histograms. At d = 2 the slant wins (U-shaped), at d = 3 the two effects draw (flat), and for large d the circumference dominates so overwhelmingly that the density collapses into a narrow spike around zero.
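One way to watch the spike form is to track the density's half-width: the x where (1 - x^2)^{(d-3)/2} falls to half its peak value. It shrinks like 1/\sqrt{d}, consistent with the variance-1/d spread. A small sketch (helper name is mine; valid for d > 3):

```python
import math

def half_width(d):
    """The x where (1 - x^2)^((d-3)/2) drops to half its peak (d > 3):
    solve (1 - x^2)^((d-3)/2) = 1/2 for x."""
    return math.sqrt(1 - 2 ** (-2 / (d - 3)))

for d in (10, 100, 1000, 10000):
    # half_width * sqrt(d) settles near sqrt(2 ln 2) ~ 1.177,
    # confirming the 1/sqrt(d) collapse toward zero
    print(d, half_width(d), half_width(d) * math.sqrt(d))
```

The rescaled half-width converging to \sqrt{2\ln 2} is precisely the constant you get from the Gaussian limit derived next.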

Why the Gaussian Emerges

The large-d histograms look Gaussian. We can now see exactly why, straight from the density we just derived:

p(x_1) \;\propto\; (1 - x_1^2)^{(d-3)/2}

From Article 1, we already know the spread: variance 1/d, standard deviation 1/\sqrt{d}. If the distribution really is becoming Gaussian, it should be \mathcal{N}(0, 1/d). Let us check by substituting x_1 = t/\sqrt{d}, so that t counts standard deviations from zero:

p(x_1) \;\propto\; (1 - x_1^2)^{(d-3)/2} \;=\; \left(1 - \frac{t^2}{d}\right)^{(d-3)/2}

As d \to \infty, the exponent (d-3)/2 behaves like d/2 (the shift by 3/2 is negligible), so the classic limit (1 - a/n)^n \to e^{-a} applies with a = t^2, followed by a square root:

\left(1 - \frac{t^2}{d}\right)^{(d-3)/2} \;\xrightarrow{d \to \infty}\; e^{-t^2/2}

That is the shape of a standard Gaussian. The polynomial density literally morphs into a bell curve, and the width we measured empirically in Article 1 falls out as well, since we rescaled by exactly 1/\sqrt{d} to get here.
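The convergence is easy to verify numerically: hold t fixed, evaluate the rescaled density at increasing d, and watch it approach e^{-t^2/2} (function name is mine):

```python
import math

def rescaled_density(t, d):
    """(1 - t^2/d)^((d-3)/2): the sphere density at x = t/sqrt(d),
    up to normalization."""
    return (1 - t * t / d) ** ((d - 3) / 2)

for d in (10, 200, 10_000):
    # at t = 1 the limit is exp(-1/2) ~ 0.6065
    print(d, rescaled_density(1.0, d), math.exp(-0.5))
```

Already at d = 200, the dimension from our table, the polynomial sits within a fraction of a percent of the Gaussian value.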

Polynomial → Gaussian Convergence

Next: Why Your AI Search Works

We now have the full picture: the exact density and the Gaussian limit. In the final article, we see what concentration means for the dot products that power modern AI — and find that the noise floor is not a problem to solve, but a guarantee you get for free.