Artificial Intelligence · Part 2 of 3

How a Polynomial Becomes a Bell Curve

The mathematics of concentration — from symmetry to certainty

Deriving the Density

The symmetry argument from Article 1 tells us the width of the distribution — variance 1/d — but nothing about its shape. To find the shape, we need the actual density. The derivation is short and the punchline is worth it.
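The variance-1/d claim is easy to check empirically before we derive the shape. A minimal sketch (helper name is mine; uniform points on the sphere are generated by normalizing Gaussian vectors, a standard trick):

```python
import math
import random

def sphere_coordinate_samples(d, n, seed=0):
    """Sample the first coordinate of n points drawn uniformly from the
    unit sphere in d dimensions, by normalizing Gaussian vectors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        samples.append(v[0] / norm)
    return samples

d, n = 100, 20000
xs = sphere_coordinate_samples(d, n)
var = sum(x * x for x in xs) / n  # mean is 0 by symmetry
print(var, 1 / d)  # the two numbers should be close
```

The empirical variance lands near 1/d, but the histogram's shape is exactly what this number does not tell us.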

Slicing the Sphere

The idea is the same one we used for the hat-box theorem in Article 1: slice the sphere horizontally and ask how much surface area sits at each height. Two effects compete — circumference and slant — and at d = 3 they cancelled exactly. What happens in other dimensions? The slant stays the same, but the circumference factor changes — and that turns out to be all we need.

Fix the first coordinate at some value x_1 — this is the same as slicing at a given height. The remaining d - 1 coordinates must satisfy x_2^2 + \cdots + x_d^2 = 1 - x_1^2, so they live on a smaller sphere of radius r = \sqrt{1 - x_1^2} in d{-}1 dimensions.

How does the surface area of that smaller sphere scale with r? Start with what you know: a circle (2D) has circumference \propto r^1, an ordinary sphere (3D) has surface area \propto r^2. In general, surface area in k dimensions scales as r^{k-1} — it is a (k{-}1)-dimensional measurement, so scaling the radius by r scales each of those k - 1 dimensions. Our slice lives in d{-}1 dimensions, so its surface area scales as r^{d-2}.
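The r^{k-1} scaling can be confirmed against the closed-form surface area S_k(r) = 2\pi^{k/2} r^{k-1} / \Gamma(k/2), a standard formula the article does not derive; a short sketch (function name is mine):

```python
import math

def sphere_surface_area(k, r):
    """Surface area of the sphere of radius r in R^k:
    S_k(r) = 2 * pi^(k/2) / Gamma(k/2) * r^(k-1)."""
    return 2 * math.pi ** (k / 2) / math.gamma(k / 2) * r ** (k - 1)

for k in (2, 3, 4, 10):
    ratio = sphere_surface_area(k, 2.0) / sphere_surface_area(k, 1.0)
    print(k, ratio, 2 ** (k - 1))  # doubling r scales area by 2^(k-1)
```

At k = 2 the formula gives circumference 2\pi r, at k = 3 it gives 4\pi r^2, and doubling the radius always multiplies the area by 2^{k-1}, exactly the scaling the argument needs.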

The slant factor is the same 1/\cos\phi effect from the hat-box. Since x_1 = \sin\phi (the height on the unit sphere), we have \cos\phi = \sqrt{1 - x_1^2}, so the slant contributes (1 - x_1^2)^{-1/2} regardless of dimension. Putting the two together:

Deriving the density
  1. Slice. Fix x_1. The remaining coordinates live on a sphere of radius r = \sqrt{1 - x_1^2} in d{-}1 dimensions.
  2. Circumference. Surface area scales as r^{d-2}:
    \text{circumference} \;\propto\; (\sqrt{1 - x_1^2})^{d-2} \;=\; (1 - x_1^2)^{(d-2)/2}
  3. Slant. The tilt of the surface — the hat-box effect — contributes:
    \text{slant} \;\propto\; (1 - x_1^2)^{-1/2}
  4. Multiply.
    p(x_1) \;\propto\; \underbrace{(1 - x_1^2)^{(d-2)/2}}_{\text{circumference}} \;\times\; \underbrace{(1 - x_1^2)^{-1/2}}_{\text{slant}} \;=\; (1 - x_1^2)^{(d-3)/2}
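The four steps above can be checked end to end: sample uniform points on the sphere and compare the empirical distribution of x_1 against (1 - x_1^2)^{(d-3)/2}, normalized by the exact constant \sqrt{\pi}\,\Gamma((d{-}1)/2)/\Gamma(d/2) (a Beta-function identity not derived in the text). A sketch with hypothetical helper names:

```python
import math
import random

def density(x, d):
    """p(x) = (1 - x^2)^((d-3)/2) / Z, with the exact normalizer
    Z = sqrt(pi) * Gamma((d-1)/2) / Gamma(d/2)."""
    Z = math.sqrt(math.pi) * math.gamma((d - 1) / 2) / math.gamma(d / 2)
    return (1 - x * x) ** ((d - 3) / 2) / Z

def first_coordinate_samples(d, n, seed=1):
    """First coordinate of n uniform points on the unit sphere in R^d."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append(v[0] / math.sqrt(sum(c * c for c in v)))
    return out

d, n = 5, 50000
xs = first_coordinate_samples(d, n)
# Empirical fraction in the band [-0.2, 0.2] vs. the density integrated over it.
emp = sum(1 for x in xs if -0.2 <= x <= 0.2) / n
step = 0.001
theory = sum(density(-0.2 + (i + 0.5) * step, d) * step for i in range(400))
print(emp, theory)  # the two fractions should agree closely
```

At d = 3 the normalizer works out to 2, so density(x, 3) returns the flat 1/2 of the hat-box theorem.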

One number — (d{-}3)/2 — controls everything. The circumference pushes probability toward the equator (exponent +(d{-}2)/2), and the slant pushes it toward the poles (exponent -1/2). At d = 3 the exponent is zero: (1 - x_1^2)^0 = 1, a flat uniform distribution — the hat-box theorem, recovered algebraically. At every other dimension, one effect wins.

Here is what the formula gives across all our dimensions:

  d     Exponent (d{-}3)/2   Name
  2     -1/2                 Arcsine distribution
  3     0                    Uniform on [-1, 1]
  4     1/2                  Semicircle
  5     1                    Quadratic
  10    7/2                  Becoming Gaussian
  200   197/2                Approximately \mathcal{N}(0, 1/200)

One exponent, ticking upward with dimension, produces every shape we saw in the histograms. At d = 2 the slant wins (U-shaped), at d = 3 the two effects draw (flat), and for large d the circumference dominates so overwhelmingly that the density collapses into a narrow spike around zero.
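One way to watch the spike form is to track the density's half-width: the x where (1 - x^2)^{(d-3)/2} falls to half its peak value. It shrinks like 1/\sqrt{d}, consistent with the variance-1/d spread. A small sketch (helper name is mine; valid for d > 3):

```python
import math

def half_width(d):
    """The x where (1 - x^2)^((d-3)/2) drops to half its peak (d > 3):
    solve (1 - x^2)^((d-3)/2) = 1/2 for x."""
    return math.sqrt(1 - 2 ** (-2 / (d - 3)))

for d in (10, 100, 1000, 10000):
    # half_width * sqrt(d) settles near sqrt(2 ln 2) ~ 1.177,
    # confirming the 1/sqrt(d) collapse toward zero
    print(d, half_width(d), half_width(d) * math.sqrt(d))
```

The rescaled half-width converging to \sqrt{2\ln 2} is precisely the constant you get from the Gaussian limit derived next.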

Why the Gaussian Emerges

The large-d histograms look Gaussian. We can now see exactly why, straight from the density we just derived:

p(x_1) \;\propto\; (1 - x_1^2)^{(d-3)/2}

From Article 1, we already know the spread: variance 1/d, standard deviation 1/\sqrt{d}. If the distribution really is becoming Gaussian, it should be \mathcal{N}(0, 1/d). Let us check by substituting x_1 = t/\sqrt{d}, so that t counts standard deviations from zero:

p(x_1) \;\propto\; (1 - x_1^2)^{(d-3)/2} \;=\; \left(1 - \frac{t^2}{d}\right)^{(d-3)/2}

As d \to \infty, the exponent (d-3)/2 behaves like d/2 (the shift by 3/2 is negligible), so the classic limit (1 - a/n)^n \to e^{-a} applies with a = t^2, followed by a square root:

\left(1 - \frac{t^2}{d}\right)^{(d-3)/2} \;\xrightarrow{d \to \infty}\; e^{-t^2/2}

That is the shape of a standard Gaussian. The polynomial density literally morphs into a bell curve, and the width we measured empirically in Article 1 falls out as well, since we rescaled by exactly 1/\sqrt{d} to get here.
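The convergence is easy to verify numerically: hold t fixed, evaluate the rescaled density at increasing d, and watch it approach e^{-t^2/2} (function name is mine):

```python
import math

def rescaled_density(t, d):
    """(1 - t^2/d)^((d-3)/2): the sphere density at x = t/sqrt(d),
    up to normalization."""
    return (1 - t * t / d) ** ((d - 3) / 2)

for d in (10, 200, 10_000):
    # at t = 1 the limit is exp(-1/2) ~ 0.6065
    print(d, rescaled_density(1.0, d), math.exp(-0.5))
```

Already at d = 200, the dimension from our table, the polynomial sits within a fraction of a percent of the Gaussian value.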

Polynomial → Gaussian Convergence

Next: Why Your AI Search Works

We now have the full picture: the exact density and the Gaussian limit. In the final article, we see what concentration means for the dot products that power modern AI — and find that the noise floor is not a problem to solve, but a guarantee you get for free.