These notes are written for my PhD student, a mathematician
who needs to know a little Statistical Mechanics for his thesis introduction. YMMV.
He has asked:
What is Statistical Mechanics?
Who is interested in it?
What is the relationship between the partition function and Observables?
...let's start with those questions.
%{{{ 1
\subsection{What? Why? (Click Me)}
Statistical Mechanics is part of Physics.
Physics might be characterised, in the large, as the
scientific exercise
(as opposed to involuntary reflex) of modelling the
observable physical world.
That is,
the representation of the physical world by something `simpler',
which nonetheless captures some of
the physical world's humanistically essential features.
There are various phases to this exercise, such as:
(i) deciding which toy is the model;
(ii) working out what the model itself does;
and
(iii) interpreting this behaviour
as a prediction for the physical world.
The simple toys at our disposal (such as real toys, and
systems of equations in mathematics)
either exist themselves in the physical world, or are abstractions
formulated by creatures living in the physical world.
In particular, scientists have had notable success summarizing large amounts of
observational data from the physical world with certain relatively simple
mathematical models.
A very successful such model is, reasonably, regarded as close to nature
itself; and hence fundamental.
Key to this is the expectation that such a model, pushed into an as yet
unobserved
(but suitably nearby) regime, will correctly predict the result of observations
subsequently made there.
There has been notable success too in this predictive aspect of Physics,
and great technological benefits have accrued.
...So where does Statistical Mechanics fit in?
"The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved."
Paul Dirac
In this quote Dirac points out that the problems of Physics do not end,
by any means, with the determination of fundamental principles.
They include such fundamental problems; and also problems of computation.
(Indeed the subject we are going to describe here was, in its original
historical development, assumed to lie on the fundamental side;
only a better understanding of its setting later showed otherwise.)
An example of the laws that Dirac is referring to would be Newton's laws,
which do a good job of determining the classical dynamics of a single particle
moving through a given force-field.
Two-body systems are also manageable, but after that, even though it may
well still be Newtonian (or some other well-understood) laws that apply in
principle, exact dynamics will simply not be computationally accessible.
At least some understanding of the modelling of many-body systems is needed in order to work with a number of important materials and systems (magnets, magnetic recording materials, LCDs, non-perturbative QFT, etc.). In each such case, the key dynamical components of the system are numerous, and interact with each other. Thus the force fields affecting the movement of one component are caused by the others; and when it moves, its own field changes, moving the others.
The solution:
The equilibrium Statistical Mechanical approach to such problems is to try to model only certain special types of observation that could be made on the system. One then models these observations by weighted averages over all possible instantaneous states of the system. In other words, dynamics is not modelled directly (questions about dynamics are not asked directly). As far as is appropriate, dynamics is encoded in the weightings -- the probabilities assigned to states.
It is most convenient to pass to an example. We shall choose a bar magnet. We shall assume that the metal crystal lattice is essentially fixed (the formation of the lattice is itself a significant problem, but we will have enough on our plate). The set of states of the system that we allow will be the possible orientations of the atomic magnetic dipoles (not their positions, which are fixed at the lattice sites). What next? %}}} %{{{ 2
\subsection{Classical reminders (Click Me)}
A good rule of thumb when analysing a physical system is: ``follow the energy''. (This raises many questions, all of which we ignore.) The kinetic energy of a system of $N$ point particles with masses $m_i$ and velocities $v_i$ is \[ E_{kin} = \sum_{i=1}^{N} \frac{1}{2} m_i v_i^2 \]

What can affect a particle's subsequent velocity, and hence change its kinetic energy? That is, what causes $\frac{dv}{dt}$ to be non-zero? A force: \[ F = m \frac{dv}{dt} \] Thus we also need to understand the forces acting on the particles. For example: if the particles are really pointlike then they interact pairwise via the Coulomb force \[ F_1 = \frac{q_1 q_2}{4 \pi \epsilon_0} \frac{\underline{r}_{12}}{r_{12}^3} = -F_2 \] Here $q_1,q_2$ are the charges (perhaps in coulombs); $\epsilon_0$ is a constant (depending on that unit choice); and $\underline{r}_{12} = \underline{r}_1 -\underline{r}_2$.

For a moment we can think of this as a force field created by the second particle, acting on any charged first particle. This is a conservative force field, meaning that there is a function $\phi(\underline{r})$ such that \[ F = - \nabla \phi \] The function $\phi(\underline{r})$ is part of the potential energy of the first particle. In other words its `total energy' is \[ E = \frac{1}{2} m v^2 + \phi \] In practice, since $\phi$ is only defined up to an additive constant, $E$ itself is not so significant as changes in $E$. %
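
For concreteness, here is a minimal numerical sketch in Python of the total energy of a small system of charged point particles: kinetic energy plus pairwise Coulomb potential energy, as in the formulas above. The function name, unit choices and any particular input arrays are illustrative assumptions, not part of the notes proper.
\begin{verbatim}
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (SI units)

def total_energy(m, q, r, v):
    """Kinetic plus pairwise Coulomb potential energy.

    m, q : length-N arrays of masses (kg) and charges (C)
    r, v : (N, 3) arrays of positions (m) and velocities (m/s)
    """
    kinetic = 0.5 * np.sum(m * np.sum(v ** 2, axis=1))
    potential = 0.0
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            r_ij = np.linalg.norm(r[i] - r[j])
            potential += q[i] * q[j] / (4 * np.pi * EPS0 * r_ij)
    return kinetic + potential
\end{verbatim}
Note that adding a constant to the potential term would shift $E$ uniformly without changing any force, which is the sense in which changes in $E$ matter more than $E$ itself.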
\subsection{Stats/Gibbs canonical distribution (Click Me)}
Notice that the system energy $E$ depends on the velocities and positions of all the atoms in the system. There are $10^{23}$ or so atoms in a handful of Earthbound matter, so we are not going to be able to keep track of them all (nor do we really want to). We would rather know about the bulk, averaged behaviour of the matter.

Let us call the inaccessible complete microscopic specification of all positions and velocities in the system a `microstate'. Then for each microstate $\sigma$ we know, in principle, the total energy $E(\sigma)$. We could ask: What is the probability $P$ of finding the system, at any given instant, in a specific microstate? Then we could compute an expected value for some bulk observation ${\mathcal O}$ by a weighted average over the microstates: \begin{equation} \label{initexpect} \langle {\mathcal O} \rangle = \sum_{\sigma} {\mathcal O}(\sigma) P(\sigma) \end{equation} In principle the probability $P$ could depend on every aspect of $\sigma$. This would make computation very hard. At the other extreme, $P$ could be independent of $\sigma$. But this turns out to be a problematic assumption for a number of Mathematical and Physical reasons. Another working assumption would be that two microstates are equally likely if they have the same energy; i.e. that $P$ depends on $\sigma$ just through $E$. That is, that $P$ depends only on the total energy of the system. Let us try this.

The next question is: How does $P$ depend on $E$? What is the function $P(E)$? If we have a large system, then we could consider describing it in two parts (left and right side, say), separated by some notional boundary, with the total microstate $\sigma$ being made up of $\sigma_L$ and $\sigma_R$. These halves are in contact, of course, along the boundary. But if the system is also in contact with other systems (so that energy is not required to be locally conserved), then it is plausible to assume that the states of the two halves are independent variables. In this case \[ P(\sigma) = P(\sigma_L) P(\sigma_R) \] as for such probabilities in general. Similarly, the total energy \[ E = E_L + E_R + E_{int} \] (where $ E_{int} $ is the interaction energy between the halves) is reasonably approximated by \[ E \sim E_L + E_R \]

(Why is this reasonable?!... Clearly the kinetic energy is localised in each of the two halves. The potential energy is made up of contributions from all pairs, including pairs with one in each half. But we assume that the pair potential is greater for pairs that are closer together; and that the boundary is a structure of lower dimension than the system overall. In this sense $E_{int}$ is localised in the boundary (pairs that are close together but in separate halves are necessarily close to the boundary); while being part of the overall potential energy, which is spread with essentially constant density over the whole system. Thus $E_{int}$ is a vanishing proportion of the whole energy for a large system. (We shall return to these core Physical assumptions of Statistical Mechanics later. They imply an intrinsic restriction in Statistical Mechanics to treating interactions that are, in a suitable sense, short-range. Fortunately this seems Physically justifiable.))

The $L$ and $R$ subsystems will each have their own `energy-only' probability function.
Thus we have something like \begin{equation} \label{ppp1} P(E_L+E_R) = P_L(E_L) P_R(E_R) \end{equation} In this expression $E_L$ and $E_R$ are independent variables, so \[ \frac{\partial P(E_L+E_R)}{\partial E_L} = \frac{\partial P(E_L+E_R)}{\partial E_R} \] so $ P_L'(E_L) P_R(E_R) = P_L(E_L) P_R'(E_R) $, so \[ \frac{P_L'(E_L)}{P_L(E_L)} = \frac{P_R'(E_R)}{P_R(E_R)} \] This separates: the left-hand side depends only on $E_L$ and the right-hand side only on $E_R$, so both must equal a constant. We write $-\beta$ for this constant of separation. We have $P'_L(E_L) = -\beta P_L(E_L)$ (and similarly for $R$). This is solved by a function of the form \[ P(E) = C \exp(-\beta E) \] where $C$ is any constant. In our case $C$ is determined by the normalisation \[ \sum_{\sigma} P(E(\sigma)) =1 \]

The separation constant $\beta$ is interesting, since it is the only thing (other than the form of the function itself) that connects the subsystems. We will see later that this connection corresponds (inversely) to a notion of temperature.
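
The functional-equation argument can be spot-checked numerically. The following Python sketch (the energy levels and the value of $\beta$ are arbitrary illustrative choices) verifies that if each half is given Boltzmann weights at the same $\beta$, the product of the halves' probabilities agrees with the Boltzmann distribution built from the total energy $E_L + E_R$.
\begin{verbatim}
import numpy as np

beta = 1.3                         # illustrative inverse temperature
E_L = np.array([0.0, 0.7, 1.1])    # illustrative energy levels, left half
E_R = np.array([0.2, 0.9])         # illustrative energy levels, right half

def boltzmann(E, beta):
    """Normalised weights P(E) = exp(-beta E) / sum_E' exp(-beta E')."""
    w = np.exp(-beta * np.asarray(E, dtype=float))
    return w / w.sum()

# product of independent probabilities for the two halves ...
P_joint = np.outer(boltzmann(E_L, beta), boltzmann(E_R, beta))

# ... equals the Boltzmann distribution of the combined system,
# whose energies are E_L + E_R over all combined microstates
E_tot = E_L[:, None] + E_R[None, :]
P_total = boltzmann(E_tot.ravel(), beta).reshape(E_tot.shape)

assert np.allclose(P_joint, P_total)
\end{verbatim}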
\subsection{Partition Function (Click Me)}
The normalisation function for our system \[ Z(\beta) =\sum_{\sigma} \exp(-\beta E(\sigma)) \] ($Z$ from the German \emph{Zustandssumme}, `sum over states') is called the partition function. That is, for given $\beta$, \[ P(E) = \frac{\exp(-\beta E)}{Z} \]

Recall that, by our derivation, $\beta$ represents the effect of thermal (energetic) contact with the universe of other systems. Our usual notion of the bulk effect of neighbouring systems on the energetics of a given system, at least where long-time-stable (equilibrium) properties are concerned, is the notion of temperature. Thus $\beta$ encodes temperature. How specifically does it do this? See later.

First we want to consider the pay-off for the analysis we have made so far. The idea was that we would be able to compute time-averaged bulk properties of the system. To produce a concrete example, we are going to need to make a concrete choice for $E$. If $S$ is the set of all possible instantaneous states of the system, then \[ E : S \rightarrow {\mathbb R} \] associates a real energy value to each state. We now formulate choices for $S$ and $E$ via a long series of simplifying, but not trivialising, assumptions.
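
Before specialising $E$, here is a minimal Python sketch of the pay-off: equation (\ref{initexpect}) with Boltzmann weights, for any finite state set, energy function and observable. The function names and the toy two-level example are purely illustrative assumptions.
\begin{verbatim}
import numpy as np

def thermal_average(states, energy, observable, beta):
    """<O> = sum_s O(s) exp(-beta E(s)) / Z, with Z the partition function."""
    E = np.array([energy(s) for s in states], dtype=float)
    O = np.array([observable(s) for s in states], dtype=float)
    weights = np.exp(-beta * E)
    Z = weights.sum()
    return (O * weights).sum() / Z

# toy example: a single two-level degree of freedom with energies 0 and 1
states = [0, 1]
energy = lambda s: float(s)
occupation = lambda s: float(s)   # observable: is the upper level occupied?

print(thermal_average(states, energy, occupation, beta=1.0))
# prints exp(-1)/(1 + exp(-1)), roughly 0.269
\end{verbatim}
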
[Figure: square lattice array of spins]
\subsection{The Potts Model (Click Me)}
To summarize, we have the $Q$-state Potts model partition function \[ Z = Z(G) = \sum_{s \in S} \exp\left(\beta \sum_{(i,j) \in G} \delta_{s(i),s(j)} \right) \] Here $S$ is the set of maps from the vertices of the graph $G$ to a fixed set of $Q$ spin `states', and the inner sum runs over the edges $(i,j)$ of $G$. Note that, fixing $Q$ ($Q=2$, say), this is simply a polynomial in \[ x= \exp(\beta) \] for each choice of graph $G$.

Let's start with an almost trivial example. If $Q=2$ and the graph is $K_2$ ($K_n$ is the complete graph on $n$ vertices) we have \[ Z(K_2) =2x+2 \] Notice that with this choice of energy function it does not matter exactly what form the $Q$ distinct spin `states' take, since the energy depends only on whether two spins are equal or not. For definiteness let us take the set \[ \{ 1,2,...,Q \} \]

Our case $Q=2$ coincides almost exactly with the famous Ising model. The only difference is that there it is conventional to take the set \[ \{ +1, -1 \} \] and then to replace $\delta_{s(i),s(j)}$ by the product $s(i)s(j)$, giving \[ Z_{Ising}(K_2) = 2x + 2x^{-1} \] This differs from the Potts answer by an overall factor, and by $\beta \rightarrow \beta/2$ (both here and for arbitrary $G$). We will see later that these changes are essentially trivial, and we will not trouble even to remark on them again thereafter.
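
As a sanity check on such small examples, here is a brute-force Python sketch of the $Q$-state Potts partition function for a graph given as an edge list (the vertex labelling and the tested value of $\beta$ are illustrative assumptions). It reproduces $Z(K_2)=2x+2$ for $Q=2$.
\begin{verbatim}
import itertools
import numpy as np

def potts_Z(edges, n_vertices, Q, beta):
    """Q-state Potts partition function, summing over all Q**n_vertices states."""
    Z = 0.0
    for s in itertools.product(range(Q), repeat=n_vertices):
        satisfied = sum(1 for (i, j) in edges if s[i] == s[j])  # equal-spin edges
        Z += np.exp(beta * satisfied)
    return Z

beta = 0.7                  # illustrative value
x = np.exp(beta)
# K_2: two vertices joined by a single edge, with Q = 2
assert np.isclose(potts_Z([(0, 1)], n_vertices=2, Q=2, beta=beta), 2 * x + 2)
\end{verbatim}
Of course the number of terms grows as $Q^{n}$ in the number of vertices $n$, so direct enumeration is hopeless for anything but tiny graphs; this is one reason the analytic structure of $Z$ is worth studying.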
\subsection{Physics in the partition function (Click Me)}
The partition function $Z$ is `just' a normalising factor, but \[ -\frac{d \ln Z}{d \beta} = \frac{1}{Z} \sum_{s\in S} E(s) \exp(-\beta E(s)) = \langle E \rangle \] which satisfies our definition of an observable (equation \ref{initexpect}). Indeed it is an important observable, called the `internal energy' (scale this by $1/N$, where $N$ is the number of spins in the system, for the `energy density' $U$). In light of this, we see that the analysis of $Z$ does contain Physics!

Suppose that our energy function is quantised, taking integer values: \[ E:S \rightarrow {\mathbb Z} \] (as it is in the Potts case). Then $Z$ is a polynomial (or at worst a Laurent polynomial) in $x = \exp(\beta)$. Accordingly the only interesting analytic structure it has is its zeros. How can the zeros of a polynomial reveal physics?...
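
Here is a small Python illustration of the identity above (the toy spectrum and the value of $\beta$ are arbitrary): a finite-difference derivative of $\ln Z$ with respect to $\beta$ reproduces, up to sign, the weighted average $\langle E \rangle$ computed directly.
\begin{verbatim}
import numpy as np

def log_Z(energies, beta):
    """ln Z for a finite list of state energies, with weights exp(-beta E)."""
    return np.log(np.sum(np.exp(-beta * np.asarray(energies, dtype=float))))

def mean_energy(energies, beta):
    """<E> as the Boltzmann-weighted average over states."""
    E = np.asarray(energies, dtype=float)
    w = np.exp(-beta * E)
    return (E * w).sum() / w.sum()

energies = [0.0, 1.0, 1.0, 2.0]   # toy spectrum
beta, h = 0.9, 1e-6

# -d(ln Z)/d(beta), by central finite difference, should equal <E>
numeric = -(log_Z(energies, beta + h) - log_Z(energies, beta - h)) / (2 * h)
assert np.isclose(numeric, mean_energy(energies, beta))
\end{verbatim}
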
[Figure: cubic lattice Ising model partition function]