How Dyson obtained his Brownian motion in 1962

2014/01/04



Looking for an answer to a colleague’s question, I was reading Freeman J. Dyson’s seminal 1962 paper “A Brownian-Motion Model for the Eigenvalues of a Random Matrix”, J. Math. Phys. 3, 1191 (1962).

The question was how Dyson’s original proof of his result on the Brownian motion of eigenvalues of random matrices was carried out. I will try to reproduce Dyson’s arguments. His physics paper omits some of the computations; I will also not go into all the technical details, but rather work out finite cases at an elementary level.

Arguments similar to Dyson’s original ones also appear (in a more advanced form) in this post by T. Tao. He deals with the proof of Theorem 2 (Dyson’s Brownian motion), but does not discuss the proof of Theorem 1 (density of eigenvalues) in that post.


1. Formulation

Let \(H=[H_{ij}]_{i,j=1}^{N}\) be a random Hermitian matrix of size \(N\times N\). Such a matrix depends on \(N^{2}\) real parameters, so the space of such matrices \(\mathcal{H}\) can be identified with \(\mathbb{R}^{N^{2}}\) as

\[H\longleftrightarrow (H_{11},\ldots,H_{NN}; \sqrt 2 \mathrm{Re}(H_{12}),\sqrt 2 \mathrm{Re}(H_{13}),\ldots,\sqrt 2 \mathrm{Re}(H_{N-1,N}); \sqrt 2 \mathrm{Im}(H_{12}),\sqrt 2 \mathrm{Im}(H_{13}),\ldots,\sqrt 2 \mathrm{Im}(H_{N-1,N}))\]

Considering a standard multidimensional Gaussian measure on \(\mathbb{R}^{N^{2}}\) (so that the coordinates are independent standard normal random variables), one arrives at the distribution on Hermitian matrices called the Gaussian Unitary Ensemble (GUE).

In more detail, consider the following probability density on the space \(\mathcal{H}\) of \(N\times N\) Hermitian matrices:

\begin{equation}\label{mu_defn} \displaystyle d\mu^{(u)}(H)= \frac{1}{Z_N(u)} e^{-(2u)^{-1}\cdot \mathrm{Trace}(H^2)}dH. \end{equation}

Here \(u>0\) in \eqref{mu_defn} is a scaling constant playing the role of the variance (the case \(u=0\) corresponds to the delta measure at the zero matrix). Indeed, \(\mathrm{Trace}(H^{2})=\sum_{i} H_{ii}^{2}+\sum_{i<j}\big((\sqrt 2\,\mathrm{Re}(H_{ij}))^{2}+(\sqrt 2\,\mathrm{Im}(H_{ij}))^{2}\big)\) is the sum of squares of the \(N^{2}\) coordinates above, so under \(\mu^{(u)}\) these coordinates are independent \(N(0,u)\) random variables. The first result I would like to discuss is the following:
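To make this concrete, here is a minimal numpy sketch (my own illustration, not from Dyson’s paper) of sampling from \(\mu^{(u)}\): symmetrizing a matrix of i.i.d. standard complex Gaussians produces exactly the variances prescribed by the identification above.

```python
import numpy as np

rng = np.random.default_rng()

def sample_gue(N, u):
    # Diagonal entries ~ N(0, u); real and imaginary parts of each
    # off-diagonal entry ~ N(0, u/2), i.e. the N^2 coordinates in the
    # identification above are i.i.d. N(0, u).
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return np.sqrt(u) * (A + A.conj().T) / 2

H = sample_gue(4, u=1.0)
print(np.linalg.eigvalsh(H))  # the eigenvalues are real, as H is Hermitian
```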

Theorem 1.

The probability density of the eigenvalues (i.e., of the spectrum) of a random Hermitian matrix with the above distribution \(\mu^{(u)}\) is given by

\[\displaystyle \mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N)= \frac{1}{\tilde Z_N(u)} \prod_{i<j}(\lambda_i- \lambda_j)^{2} \prod_{i=1}^{N}e^{-\lambda_i^{2}/(2u)},\]

where \(\tilde Z_N(u)\) is a normalization constant. One can think of the eigenvalues as ordered: \(\lambda_1\ge \lambda_2\ge \ldots\ge \lambda_N\). Note that they are all real, being the eigenvalues of a Hermitian matrix.
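As a quick sanity check of Theorem 1 (again my own sketch), take \(N=2\), \(u=1\). Under the density above, \(\lambda_1-\lambda_2\) enters through the weight \((\lambda_1-\lambda_2)^{2}\) against a Gaussian under which \(\lambda_1-\lambda_2\sim N(0,2)\), so \(\mathbb{E}(\lambda_1-\lambda_2)^{2}=\mathbb{E}\xi^{4}/\mathbb{E}\xi^{2}=12/2=6\) for \(\xi\sim N(0,2)\). Monte Carlo over GUE samples agrees:

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical E (lambda_1 - lambda_2)^2 over 2 x 2 GUE samples with u = 1;
# Theorem 1 predicts the value 6.
vals = []
for _ in range(200_000):
    A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    lam = np.linalg.eigvalsh((A + A.conj().T) / 2)
    vals.append((lam[1] - lam[0]) ** 2)
print(np.mean(vals))  # approximately 6
```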

The second result is about Markov stochastic dynamics on the space of Hermitian matrices \(\mathcal{H}\) and on the eigenvalues (this is Dyson’s result from his 1962 paper cited above). Namely, consider a standard Brownian motion on \(\mathcal{H}\) identified with \(\mathbb{R}^{N^{2}}\) and starting from zero. After time \(t>0\), the distribution of the matrix will be \(\mu^{(t)}\) (each coordinate is an independent \(N(0,t)\) random variable).

Theorem 2 (Dyson).

The Brownian motion on \(\mathcal{H}\) described above reduces to a Markovian evolution of the eigenvalues \((\lambda_1,\ldots,\lambda_N)\). This evolution is a diffusion whose Markov generator is the second-order differential operator (acting on functions of \((\lambda_1,\ldots,\lambda_N)\)) given by

\[\displaystyle D=\frac12\sum_{i=1}^{N} \frac{\partial^{2}}{\partial \lambda_i^{2}}+ \sum_{i=1}^{N} \Big(\sum_{j\colon j\ne i}\frac{1}{\lambda_i- \lambda_j}\Big) \frac{\partial}{\partial \lambda_i}\]

In more modern terms, the eigenvalues \((\lambda_1,\ldots,\lambda_N)\) satisfy the following coupled stochastic differential equations:

\[\displaystyle d \lambda_i=d W_i+ \sum_{j\colon j\ne i}\frac{dt}{\lambda_i- \lambda_j},\]

where \(dW_i\) are independent standard Brownian motions. The diffusion of the eigenvalues \((\lambda_1,\ldots,\lambda_N)\) is called the Dyson Brownian motion.

(By Theorem 1, the distribution of the eigenvalues after time \(t>0\) is \(\mu_{spec}^{(t)}\).)
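One can watch this evolution numerically. Below is a naive Euler-Maruyama discretization of the SDE above (a sketch only, assuming a well-separated starting configuration; the step size and the behavior near collisions of the \(\lambda_i\)’s are handled crudely, and the process started at zero would need more care).

```python
import numpy as np

rng = np.random.default_rng()

def dyson_bm(lam0, t_max, dt=1e-4):
    # Naive Euler-Maruyama scheme for
    # d lambda_i = dW_i + sum_{j != i} dt / (lambda_i - lambda_j).
    lam = np.array(lam0, dtype=float)
    for _ in range(int(t_max / dt)):
        diffs = lam[:, None] - lam[None, :]   # lambda_i - lambda_j
        np.fill_diagonal(diffs, np.inf)       # drop the j = i terms
        drift = (1.0 / diffs).sum(axis=1)     # Coulomb-type repulsion
        lam += drift * dt + np.sqrt(dt) * rng.standard_normal(len(lam))
    return lam

print(np.sort(dyson_bm(np.linspace(-2, 2, 5), t_max=1.0)))
```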

Remark.

In his paper, Dyson proves similar results for the other two Gaussian invariant ensembles, namely the Orthogonal and Symplectic ones, and also explains a physical interpretation of the dynamics: the eigenvalues move as if repelled by Coulomb electric forces.

Remark.

Theorem 1 can be proven without Theorem 2, by computing the Jacobian of the map taking a Hermitian matrix \(H\) to its eigenvalues \((\lambda_1,\ldots,\lambda_N)\). For a direct computation (including an explicit \(2\times 2\) example), see the book by Deift and Gioev.

This computation also has a strong algebraic side, as the Jacobian arises from the interplay of the Lie group and Lie algebra structures related to the invariance of the measure \(\mu^{(u)}\) on \(\mathcal{H}\) under conjugation by unitary matrices.

Let me now show how Dyson proved Theorem 2, and then deduced Theorem 1 from Theorem 2.

2. Perturbation of eigenvalues and proof of Theorem 2

To understand how the eigenvalues evolve, let us start the Brownian motion on \(\mathcal{H}\) (which is a simpler object than the eigenvalue dynamics) from a fixed matrix \(X\) (equivalently, one can wait for some time \(T>0\) and then condition on the matrix being \(X\) at this time \(T\)).

Since we project to the eigenvalues, we may assume that the basis has been changed so that \(X\) is already diagonal, with eigenvalues \((x_1,\ldots,x_N)\). After a small time \(t\to 0\), the matrix \(X\) turns into \(X+\varepsilon G\), where \(\varepsilon\sim \sqrt t\) and \(G\) is a matrix from \(\mathcal{H}\) with the distribution \(\mu^{(1)}\).

The next question is how the eigenvalues \((x_1(\varepsilon),\ldots,x_N(\varepsilon))\) of \(X+\varepsilon G\) depend on \(\varepsilon\). This can be answered by means of perturbation theory. In short, the \(x_i(\varepsilon)\)’s should be written as power series in (possibly fractional) powers of \(\varepsilon\) (in our case, integer powers suffice), and then plugged into the characteristic equation for the eigenvalues. The correct ansatz for this problem is just the Taylor expansion

\[\displaystyle x_i(\varepsilon)=x_i+x_i'\varepsilon+x_i''\frac{\varepsilon^{2}}{2} +\ldots\]

\(\spadesuit\) Let me work out the case \(N=2\) in an elementary way. We have

\[\displaystyle \det(X+ \varepsilon G- \lambda)=\det \begin{bmatrix} x_1- \lambda+\varepsilon g_1& \varepsilon g_{12}\\ \varepsilon \bar g_{12}& x_2- \lambda+ \varepsilon g_2 \end{bmatrix}= (x_1- \lambda)(x_2- \lambda)+ \varepsilon \big(g_1(x_2- \lambda)+g_2(x_1- \lambda)\big)+ \varepsilon^{2}\big(g_1 g_2-|g_{12}|^{2}\big).\]

Plugging in \(\lambda=x_1(\varepsilon)=x_1+x_1'\varepsilon+x_1''\frac{\varepsilon^{2}}{2}\) (and then similarly \(\lambda=x_2(\varepsilon)\)), one gets

\[\displaystyle \det(X+ \varepsilon G- x_1(\varepsilon))= \varepsilon\left(x_2-x_1\right) \left(g_1-x_1'\right) +\varepsilon^{2}\Big( \left(g_1-x_1'\right) \left(g_2-x_1'\right)-|g_{12}|^2+\frac{1}{2} \left(x_1-x_2\right) x_1'' \Big)+O(\varepsilon^{3}).\]

We want the above expression to vanish up to \(O(\varepsilon^{3})\), so the coefficients of \(\varepsilon\) and \(\varepsilon^{2}\) must be zero. This gives

\[\displaystyle x_1'=g_1,\qquad x_1''=\frac{2 |g_{12}|^2}{x_1-x_2}, \qquad x_1(\varepsilon)=x_1+\varepsilon g_1+ \frac{\varepsilon^{2} |g_{12}|^2}{x_1-x_2}\]

Since \(G\) has the distribution \(\mu^{(1)}\) and \(\varepsilon\sim\sqrt t\), the term \(\varepsilon g_1\) is a Brownian increment (\(g_1\sim N(0,1)\)), while \(\varepsilon^{2}|g_{12}|^{2}\approx t\,\mathbb{E}|g_{12}|^{2}=t\) produces a drift. Thus the above formula for \(x_1(\varepsilon)\) means that (up to some formal justifications; see this wikipedia article for a more informal discussion) the quantity \(x_1(t)\) satisfies the stochastic differential equation

\[\displaystyle dx_1(t)=dW_1+\frac{dt}{x_1(t)-x_2(t)},\qquad x_1(0)=x_1.\]

This (together with a similar result for \(x_2\)) concludes the proof of Theorem 2 for the case \(N=2\). \(\spadesuit\)
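The expansion is also easy to test numerically (a small sketch, not in the original paper): for a fixed \(G\) and decreasing \(\varepsilon\), the top eigenvalue of \(X+\varepsilon G\) should deviate from \(x_1+\varepsilon g_1+\varepsilon^{2}|g_{12}|^{2}/(x_1-x_2)\) only at order \(\varepsilon^{3}\).

```python
import numpy as np

x1, x2 = 1.0, -1.0
X = np.diag([x1, x2])
g1, g2, g12 = 0.3, -0.7, 0.4 + 0.2j            # a fixed Hermitian perturbation
G = np.array([[g1, g12], [np.conj(g12), g2]])

for eps in [1e-1, 1e-2, 1e-3]:
    exact = np.linalg.eigvalsh(X + eps * G).max()
    approx = x1 + eps * g1 + eps**2 * abs(g12)**2 / (x1 - x2)
    print(eps, abs(exact - approx))            # the error decays like eps**3
```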

For general \(N\), one needs to consider the matrix

\[X+\varepsilon G- \lambda=\begin{bmatrix} x_1- \lambda+\varepsilon g_1& \varepsilon g_{12} & \ldots & \varepsilon g_{1,N}\\ \varepsilon \bar g_{12} & x_2- \lambda+\varepsilon g_2 & \ldots &\varepsilon g_{2,N}\\ \vdots & \vdots & \ddots & \vdots\\ \varepsilon \bar g_{1,N} & \varepsilon \bar g_{2,N} & \ldots & x_N- \lambda+ \varepsilon g_N \end{bmatrix},\]

plug in \(\lambda=x_i(\varepsilon)=x_i+x_i'\varepsilon+x_i''\frac{\varepsilon^{2}}{2}\), and expand the determinant of the resulting matrix in powers of \(\varepsilon\). The constant term vanishes, and one only needs to compute the coefficients of \(\varepsilon\) and \(\varepsilon^{2}\). This computation can be carried out, and it gives the desired formulas of Theorem 2, recorded below.
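Let me record the outcome of this computation: it is the standard second-order perturbation formula for Hermitian matrices, specializing to the \(N=2\) formula above,

\[\displaystyle x_i(\varepsilon)=x_i+\varepsilon g_i+\varepsilon^{2}\sum_{j\colon j\ne i}\frac{|g_{ij}|^{2}}{x_i-x_j}+O(\varepsilon^{3}).\]

Taking \(\varepsilon\sim\sqrt t\) and using \(\mathbb{E}|g_{ij}|^{2}=1\) for \(i\ne j\) (under \(\mu^{(1)}\)), the second-order term turns into exactly the drift \(\sum_{j\colon j\ne i}dt/(\lambda_i-\lambda_j)\) of Theorem 2.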

See also this post by T. Tao for a full computation in different notation.

3. Fokker-Planck (Kolmogorov forward; Smoluchowski) equation and proof of Theorem 1

Now let us deduce Theorem 1 (about the distribution of the eigenvalues of a random Hermitian matrix distributed as \(\mu^{(u)}\)) from Theorem 2. The starting idea is that the measures \(\mu^{(u)}\) on the space \(\mathcal{H}\) of Hermitian matrices are mapped one into another by the Brownian motion on \(\mathcal{H}\). Theorem 2 shows that the Brownian motion on \(\mathcal{H}\) projects into the Dyson Brownian motion (with Markov generator \(D\)) on eigenvalues.

Thus, if we prove that the measures \(\mu^{(u)}_{spec}\) on eigenvalues are mapped into one another by the Dyson Brownian motion, then Theorem 1 will be established; see the picture (all arrows represent Markov transition kernels):

Diagram related to the Fokker-Planck equation for Dyson BM

In order to see that the measures \(\mu^{(u)}_{spec}\) on eigenvalues are mapped into one another by the Dyson Brownian motion, we have to show that the following Fokker-Planck equation holds (here, by a slight abuse of notation, \(\mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N)\) denotes the probability density of the measure \(\mu^{(u)}_{spec}\)):

\[\displaystyle \frac{\partial}{\partial u}\mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N)=D^*\mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N),\]

where \(D^*\) is the adjoint of the generator of the Dyson Brownian motion, which is given by

\[\displaystyle (D^*f)(\lambda_1,\ldots,\lambda_N)=\frac12\sum_{i=1}^{N} \frac{\partial^{2}f}{\partial \lambda_i^{2}}(\lambda_1,\ldots,\lambda_N)- \sum_{i=1}^{N} \frac{\partial}{\partial \lambda_i} \left(\Big(\sum_{j\colon j\ne i}\frac{1}{\lambda_i- \lambda_j}\Big) f(\lambda_1,\ldots,\lambda_N)\right)\]

(see, e.g., Wikipedia for more details). Recall that \(D^{*}\) is characterized by \(\int (Df)\,g=\int f\,(D^{*}g)\); integrating by parts twice in the second-order term and once in the drift term yields the formula above. Note also that in the above discussion of the picture we implicitly used the uniqueness of the solution to the Fokker-Planck equation, which holds in our situation.

First, we need to understand the dependence of the normalization constant \(\tilde Z_N(u)\) in \(\mu^{(u)}_{spec}\) on \(u\).

Lemma.

We have \(\tilde Z_N(u)=Cu^{N^2/2}\), where \(C\) does not depend on \(u\).

\(\spadesuit\) The proof of this lemma is a simple scaling argument. Since the density is symmetric in the \(\lambda_i\)’s, passing to unordered eigenvalues amounts to the factor \(1/N!\), so we can write

\[\displaystyle \tilde Z_N(u)=\frac{1}{N!} \int_{-\infty}^{\infty} d \lambda_1 \ldots \int_{-\infty}^{\infty} d \lambda_N \prod_{i=1}^{N}e^{-\lambda_i^{2}/(2u)} \prod_{i<j}|\lambda_i- \lambda_j|^{2}.\]

Changing the variables in the integral (multiplying each \(\lambda_i\) by \(1/\sqrt u\)) extracts an overall factor of \(u^{N^{2}/2}\) and reduces the integral to \(\tilde Z_N(1)\), which is a constant not depending on \(u\), as desired. \(\spadesuit\)
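The scaling can also be confirmed symbolically (a sketch using sympy, for \(N=2\), where \(u^{N^{2}/2}=u^{2}\)):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = sp.symbols('u', positive=True)

# tilde Z_2(u) up to the constant factor 1/2!: integrate the N = 2
# eigenvalue density over the whole plane
Z = sp.integrate(
    sp.exp(-(x**2 + y**2) / (2 * u)) * (x - y)**2,
    (x, -sp.oo, sp.oo), (y, -sp.oo, sp.oo),
)
print(sp.simplify(Z))  # 4*pi*u**2, proportional to u**(N**2/2) = u**2
```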

Now, we can directly prove that \(\frac{\partial}{\partial u}\mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N)=D^*\mu^{(u)}_{spec}(\lambda_1,\ldots,\lambda_N)\). For simplicity, consider the case \(N=3\). We have for the time derivative:

\[\displaystyle \frac{\frac{\partial}{\partial u}\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)}{\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)}=\frac{\lambda_1^{2}+\lambda_2^{2}+\lambda_3^{2}-9 u}{2 u^{2}}.\]

The first-order (drift) term for \(i=1\) looks as follows:

\[\displaystyle \frac{\frac{\partial}{\partial \lambda_1} \left(\sum_{j\ne 1}\frac{1}{\lambda_1- \lambda_j}\, \mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3) \right) }{\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)} =\frac{1}{(\lambda_1-\lambda_2)^{2}}+\frac{1}{(\lambda_1-\lambda_3)^{2}}+\frac{\lambda_1 (-2 \lambda_1+\lambda_2+\lambda_3)+4 u}{(\lambda_1-\lambda_2)(\lambda_1-\lambda_3)\, u},\]

so that all the first-order terms in \(D^*\) together contribute

\[\displaystyle - \frac{1}{\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)} \sum_{i=1}^{3} \frac{\partial}{\partial \lambda_i} \left(\Big(\sum_{j\colon j\ne i}\frac{1}{\lambda_i- \lambda_j}\Big) \mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)\right) =-\frac{2}{(\lambda_1-\lambda_2)^{2}}-\frac{2}{(\lambda_1-\lambda_3)^{2}}-\frac{2}{(\lambda_2-\lambda_3)^{2}}+\frac{3}{u}.\]

Finally, the second derivatives together give

\[\displaystyle \frac1{\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3)}\frac{1}{2} \sum_{i=1}^{3} \frac{\partial^{2}}{\partial \lambda_i^{2}}\mu^{(u)}_{spec}(\lambda_1,\lambda_2,\lambda_3) =\frac{2}{(\lambda_1-\lambda_2)^{2}}+\frac{2}{(\lambda_1-\lambda_3)^{2}}+\frac{2}{(\lambda_2-\lambda_3)^{2}}+\frac{\lambda_1^{2}+\lambda_2^{2}+\lambda_3^{2}-15 u}{2u^{2}}.\]

Indeed, adding the last two displayed contributions, the \(2/(\lambda_i-\lambda_j)^{2}\) terms cancel, and \(\frac{3}{u}+\frac{\lambda_1^{2}+\lambda_2^{2}+\lambda_3^{2}-15u}{2u^{2}}=\frac{\lambda_1^{2}+\lambda_2^{2}+\lambda_3^{2}-9u}{2u^{2}}\) matches the time derivative computed above. We thus readily see that the Fokker-Planck equation holds for \(N=3\). For general \(N\), the computation goes along the same lines. Thus, Theorem 1 is also established.
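The same identity can be checked symbolically (a sketch with sympy; the constant \(C\) from the Lemma drops out, but the factor \(u^{-9/2}\) must be kept, since \(\partial/\partial u\) acts on it):

```python
import sympy as sp

l1, l2, l3 = sp.symbols('l1 l2 l3', real=True)
u = sp.symbols('u', positive=True)
lam = [l1, l2, l3]

# density of mu_spec^(u) for N = 3, up to the u-independent constant C
mu = u**sp.Rational(-9, 2) \
    * sp.exp(-(l1**2 + l2**2 + l3**2) / (2 * u)) \
    * (l1 - l2)**2 * (l1 - l3)**2 * (l2 - l3)**2

# D* mu: the diffusion term minus the divergence of the Coulomb drift
rhs = sp.Rational(1, 2) * sum(sp.diff(mu, li, 2) for li in lam) \
    - sum(sp.diff(sum(mu / (li - lj) for lj in lam if lj != li), li)
          for li in lam)

lhs = sp.diff(mu, u)                  # left-hand side of Fokker-Planck
print(sp.simplify((lhs - rhs) / mu))  # prints 0
```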

Wigner's semicircle law