Facts about Markov chains

2014/02/10



I have collected a number of facts about Markov chains that were discussed in Lectures 5-9 of the graduate probability course in Spring 2014.

Finite Markov chains

Absorbing chains

Definition.

A finite Markov chain is called absorbing if it has at least one absorbing state (a state \(i\) with \(p_{ii}=1\)), and from every state it is possible to reach an absorbing state.

The quantities one can compute for an absorbing chain include the expected number of visits to each transient state, the expected time until absorption, and the probabilities of absorption in each absorbing state.

These quantities are computed using the fundamental matrix \(N=(I-Q)^{-1}\), where \(Q\) is the block of transitions among the transient states in the canonical form of \(P\).
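
As a concrete illustration, here is a minimal numerical sketch (using numpy; the gambler's-ruin chain below is an illustrative example of mine, not one from the lectures) of how \(N\) produces these quantities:

```python
import numpy as np

# Gambler's ruin on {0, 1, 2, 3, 4}: states 0 and 4 are absorbing, and the
# walk moves one step up or down with probability 1/2 each.
# Canonical form: Q = transitions among the transient states 1, 2, 3,
#                 R = transitions from transient to absorbing states 0, 4.
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],
              [0.0, 0.0],
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(3) - Q)  # N[i, j] = expected visits to j from i
t = N @ np.ones(3)                # expected number of steps until absorption
B = N @ R                         # B[i, j] = probability of absorption in j

print(t)  # [3. 4. 3.]
print(B)  # e.g. starting from state 1: ruin (state 0) with probability 3/4
```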

Ergodic chains

Definition.

A Markov chain is called ergodic if it is possible to reach any state \(i\) from any other state \(j\) (possibly in more than one step).

Definition.

A Markov chain is called regular if there exists \(n\) such that all entries of the \(n\)-th power \(P^n\) of the transition matrix \(P\) are positive.

If the chain is regular, then it is ergodic. The converse is not true: for example, the two-state chain with \(p_{12}=p_{21}=1\) is ergodic, but \(P^n\) alternates between \(P\) and \(I\), so no power of \(P\) is strictly positive.
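
A quick numerical check of this example:

```python
import numpy as np

# The deterministically alternating two-state chain: ergodic, since each
# state is reachable from the other, but not regular, since every power
# of P contains zeros.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
for n in range(1, 5):
    print(n)
    print(np.linalg.matrix_power(P, n))  # alternates between P and I
```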

Theorem (Perron-Frobenius).

If \(P\) is a transition matrix of a finite regular Markov chain, then \(P^n\to W\) as \(n\to\infty\), where \(W\) is a matrix all of whose rows are equal to the same strictly positive probability vector \(w\), and \(w\) is the unique stationary probability distribution of the chain.
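
One can check the convergence numerically; the 3-state chain below is an illustrative example of mine, not from the lectures:

```python
import numpy as np

# A regular 3-state chain: P itself is already strictly positive.
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

W = np.linalg.matrix_power(P, 100)
print(W)  # all rows are (numerically) identical and strictly positive

w = W[0]                      # the common row
print(np.allclose(w @ P, w))  # True: w is the stationary distribution
```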

There is a version of this theorem for ergodic, but not necessarily regular, Markov chains:

Theorem.

If \(P\) is a transition matrix of a finite ergodic Markov chain, then the averages \(A_n=\frac{1}{n}(I+P+\dots+P^{n-1})\) converge to a matrix \(W\) all of whose rows are equal to the same strictly positive probability vector \(w\); this \(w\) is again the unique stationary probability distribution, even though \(P^n\) itself need not converge.
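
For the alternating chain above, \(P^n\) does not converge, but the averages do; a small numerical sketch:

```python
import numpy as np

# The alternating chain again: P^n does not converge (it flips between
# P and I), but the averages (I + P + ... + P^(n-1)) / n do converge.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

n = 1000
A = sum(np.linalg.matrix_power(P, k) for k in range(n)) / n
print(A)  # approximately [[0.5, 0.5], [0.5, 0.5]]
```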

The quantities one can compute for an ergodic chain include the mean return time to each state and the mean first passage times \(m_{ij}\) (the expected number of steps to reach state \(j\) starting from state \(i\)).

The mean return time to state \(i\) is equal to \(1/w_i\), where \(w_i\) is the corresponding component of the stationary probability distribution. The quantities \(m_{ij}\) can be computed in two ways: by solving the linear system \(m_{ij}=1+\sum_{k\neq j}p_{ik}m_{kj}\), or via the fundamental matrix \(Z=(I-P+W)^{-1}\), which gives \(m_{ij}=(z_{jj}-z_{ij})/w_j\).
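
Here is a sketch of both computations in numpy (the standard finite-chain recipe; the example matrix is made up):

```python
import numpy as np

# An illustrative ergodic chain (not from the lectures).
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])
n = P.shape[0]

# Stationary distribution w: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
w = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
w = w / w.sum()

# Way 1: fundamental matrix Z = (I - P + W)^{-1}, m_ij = (z_jj - z_ij) / w_j.
W = np.outer(np.ones(n), w)
Z = np.linalg.inv(np.eye(n) - P + W)
M1 = (np.diag(Z)[None, :] - Z) / w[None, :]

# Way 2: for each target j, solve m_ij = 1 + sum_{k != j} p_ik m_kj.
M2 = np.zeros((n, n))
for j in range(n):
    idx = [k for k in range(n) if k != j]
    A = np.eye(n - 1) - P[np.ix_(idx, idx)]
    M2[idx, j] = np.linalg.solve(A, np.ones(n - 1))

print(np.allclose(M1, M2))  # True: both ways agree
print(1 / w)                # mean return times 1 / w_i
```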

Markov chains: classification of states

Let \(f_{jj}\) be the probability that the Markov chain, started at state \(j\), will ever return to state \(j\). We have

\[f_{jj}=\sum_{n=1}^{\infty}f_{jj}(n),\]

where \(f_{jj}(n)\) is the probability that the first return happens after exactly \(n\) steps.

We also can define

\[\mu_j:=\sum_{n=1}^{\infty}n f_{jj}(n)\]

to be the mean return time to state \(j\) (with the convention \(\mu_j=\infty\) if \(f_{jj}<1\), i.e. if return is not certain).
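
For intuition, \(\mu_j\) can be estimated by simulation; a minimal sketch with a made-up finite chain (every state of a finite ergodic chain is positive persistent, so \(\mu_j\) is finite):

```python
import numpy as np

# Estimate mu_0 by averaging simulated first-return times.
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

def first_return_time(j):
    """Run the chain from state j until it first returns to j."""
    state, steps = j, 0
    while True:
        state = rng.choice(len(P), p=P[state])
        steps += 1
        if state == j:
            return steps

samples = [first_return_time(0) for _ in range(20000)]
print(np.mean(samples))  # approaches mu_0 (= 1/w_0, see above) as samples grow
```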

Classification of states.

A state \(j\) of a Markov chain can be:

- transient, if \(f_{jj}<1\);
- persistent (also called recurrent), if \(f_{jj}=1\); a persistent state is called null persistent if \(\mu_j=\infty\), and positive persistent if \(\mu_j<\infty\).

Theorem.

A state \(j\) is persistent if and only if

\[\sum_{n=0}^{\infty} p_{jj}(n)=\infty.\]
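
A classical application of this criterion is the simple symmetric random walk on \(\mathbb{Z}\): here \(p_{00}(2n)=\binom{2n}{n}/4^n\sim 1/\sqrt{\pi n}\), so the series diverges and the state \(0\) is persistent. A numerical illustration (mine, not from the lectures):

```python
# Partial sums of p_00(2n) = C(2n, n) / 4^n for the symmetric walk on Z,
# computed via the recursion p(2n) = p(2n-2) * (2n-1) / (2n).
p = 1.0      # p_00(0) = 1
total = 0.0
for n in range(1, 100001):
    p *= (2 * n - 1) / (2 * n)
    total += p
    if n in (10, 100, 1000, 10000, 100000):
        print(n, round(total, 2))  # grows like 2*sqrt(n/pi): divergence
```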

Definition.

States \(i\) and \(j\) intercommunicate if \(p_{ij}(n)>0\) and \(p_{ji}(m)>0\) for some \(n\) and \(m\).

Theorem.

Intercommunicating states \(i\) and \(j\) are either both transient or both persistent.

Therefore, all states of the Markov chain are separated into classes:

\[T\sqcup C_1\sqcup C_2 \sqcup C_3\sqcup\ldots,\]

where \(T\) is the collection of all transient states, and the \(C_i\) are ergodic classes: once the chain arrives in \(C_i\), it can never leave this class, and all states within each \(C_i\) intercommunicate.

One can compute the probability that a chain, starting from a transient state, will end up in a given class \(C_j\). If the number of classes \(C_1,C_2,\ldots\) is finite, then these probabilities are computed in the same way as absorption probabilities for absorbing finite Markov chains.
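
One way to carry this out (a sketch with a made-up chain): lump each ergodic class \(C_j\) into a single absorbing state and reuse \(B=NR\) from the absorbing case.

```python
import numpy as np

# Made-up chain: transient states {0, 1}; ergodic classes C1, C2.
# Q holds transitions among the transient states; each column of R is the
# total probability of jumping from a transient state into that class.
Q = np.array([[0.2, 0.3],
              [0.4, 0.1]])
R = np.array([[0.4, 0.1],
              [0.2, 0.3]])

N = np.linalg.inv(np.eye(2) - Q)
B = N @ R             # B[i, j] = P(chain ends up in class C_j | start at i)
print(B)
print(B.sum(axis=1))  # each row sums to 1
```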

Infinite Markov chains

Existence of stationary distribution

Assume that we have an infinite Markov chain with transition matrix \(P\) such that all its states are persistent and intercommunicate. The notion of a stationary distribution is more subtle in the infinite case.

Definition.

A vector \(v=(v_1,v_2,\ldots)^T\) with \(v_i\ge 0\) is called a stationary vector if

\[\sum_{i}v_i p_{ij}=v_j\]

for any state \(j\).

If \(\sum_{i}v_i<\infty\), then it is possible to normalize \(v\) (divide each \(v_i\) by the finite sum \(\sum_{i}v_i\)) to get a stationary probability distribution.

Definition.

Let \(T_k\) be the (random) time of the first return to state \(k\). By persistence, \(T_k<\infty\) with probability one.

Definition.

Let \(\rho_i(k)\) be the expected number of times that the chain visits state \(i\) between two successive visits to state \(k\).

Lemma.

\(\sum_{i}\rho_i(k)=\mu_k=E(T_k)\).

Corollary.

The state \(k\) is positive persistent if and only if \(\sum_{i}\rho_i(k)=\mu_k=E(T_k)<\infty\).

Lemma.

\(\rho_i(k)<\infty\) for all \(k,i\).

Theorem.

The vector \(v=(\rho_1(k),\rho_2(k),\ldots)^T\) (for any fixed state \(k\)) is a stationary vector of the Markov chain.

Conclusion.

For every persistent Markov chain with intercommunicating states and for every state \(k\), we have a positive stationary vector \(\rho(k)\). To get a stationary distribution, we would normalize it to a probability distribution. But \(\sum_i \rho_i(k)=\mu_k\), so this is possible only if \(k\) is positive persistent.
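
This can be checked numerically on a small finite (hence positive persistent) chain: the average visit counts per excursion from \(k\), once normalized, approximate the stationary distribution. A Monte Carlo sketch (the chain is illustrative):

```python
import numpy as np

# Count visits to each state between successive returns to k = 0, average
# over many excursions, and normalize.
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])
k, n_excursions = 0, 20000

visits = np.zeros(3)
state = k
for _ in range(n_excursions):
    while True:                       # one excursion: from k back to k
        visits[state] += 1            # the start at k is counted once
        state = rng.choice(3, p=P[state])
        if state == k:
            break

rho = visits / n_excursions           # rho_i(k): mean visits per excursion
print(rho.sum())                      # approximately mu_k
print(rho / rho.sum())                # approximately the stationary pi
```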

Theorem.

For every persistent Markov chain with intercommunicating states, either all states are positive persistent, and then there exists a unique stationary probability distribution \(\pi\) with \(\pi_k=1/\mu_k\); or all states are null persistent, and then no stationary probability distribution exists.

Convergence to stationary distribution

We will consider positive persistent Markov chains with all states intercommunicating. We need one more condition:

Definition.

Let \(i\) be a state. The period of state \(i\) is, by definition,

\[d(i)=\gcd\{n\ge 1\colon p_{ii}(n)>0\}.\]

State \(i\) is called aperiodic if \(d(i)=1\), and the chain is called aperiodic if all its states are aperiodic.

In fact, since all states intercommunicate, it is enough to have a single aperiodic state: then all states are aperiodic.

Aperiodic chains play the same role here that regular chains play in the finite case.
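
For a finite chain, the period can be computed directly from the definition; a simple sketch (scanning \(n\) up to a fixed bound, which is enough for small examples like this one):

```python
from math import gcd
import numpy as np

# d(i) = gcd{ n : p_ii(n) > 0 }, computed by scanning powers of P.
def period(P, i, n_max=50):
    d = 0
    Pn = np.eye(len(P))
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            d = gcd(d, n)
    return d

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period(P, 0))  # 2: the alternating chain has period 2
```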

Theorem.

Let \(P\) be the transition matrix of a positive persistent, aperiodic Markov chain with all states intercommunicating. Then

\[P(X_n=k\mid X_0=i)\to \pi_k\]

as \(n\to\infty\), where \(\pi=(\pi_1,\pi_2,\pi_3,\ldots)^T\), with \(\pi_k=1/\mu_k\), is the unique stationary probability distribution.
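
A simulation sketch for a genuinely infinite example (my own, not from the lectures): the reflecting random walk on \(\{0,1,2,\ldots\}\) that steps down with probability \(0.6\) and up with probability \(0.4\) (staying at \(0\) instead of stepping below it) is positive persistent and aperiodic, and its stationary distribution is geometric, \(\pi_k=\frac{1}{3}\left(\frac{2}{3}\right)^k\).

```python
import numpy as np

# Reflecting random walk on {0, 1, 2, ...}: down with probability 0.6
# (staying at 0 instead of stepping below it), up with probability 0.4.
# Positive persistent, aperiodic; stationary law pi_k = (1/3) * (2/3)^k.
rng = np.random.default_rng(2)
n_steps, n_chains = 200, 50000

x = np.zeros(n_chains, dtype=int)
for _ in range(n_steps):
    up = rng.random(n_chains) < 0.4
    x = np.where(up, x + 1, np.maximum(x - 1, 0))

for k in range(4):
    print(k, (x == k).mean(), (1 / 3) * (2 / 3) ** k)  # empirical vs. pi_k
```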