1 Introduction and Historical Overview
2 Basic Statistical Notions
2.1 Probability Theory and Random Variables
2.2 Entropy
2.3 Examples of probability distributions
2.3.1 Gaussian distribution
2.3.2 Binomial and Poisson distribution
2.3.3 Ising model
2.3.4 Site percolation
2.3.5 Random walk on a lattice
2.4 Ensembles in Classical Mechanics
2.5 Ensembles in Quantum Mechanics (Statistical Operators / Density Matrices)
3 Time-evolving ensembles
3.1 Boltzmann Equation in Classical Mechanics
3.2 Boltzmann Equation, Approach to Equilibrium in Quantum Mechanics
4 Equilibrium Ensembles
4.1 Generalities
4.2 Micro-Canonical Ensemble
4.2.1 Micro-Canonical Ensemble in Classical Mechanics
4.2.2 Micro-Canonical Ensemble in Quantum Mechanics
4.2.3 Mixing entropy of the ideal gas
4.3 Canonical Ensemble
4.3.1 Canonical Ensemble in Quantum Mechanics
4.3.2 Canonical Ensemble in Classical Mechanics
4.3.3 Equidistribution Law and Virial Theorem in the Canonical Ensemble
4.4 Grand Canonical Ensemble
4.5 Summary of different equilibrium ensembles
4.6 Approximation methods
4.6.1 The cluster expansion
4.6.2 Peierls contours
5 The Ideal Quantum Gas
5.1 Hilbert Spaces, Canonical and Grand Canonical Formulations
5.2 Degeneracy pressure for free fermions
5.3 Spin Degeneracy
5.4 Black Body Radiation
5.5 Degenerate Bose Gas
6 The Laws of Thermodynamics
6.1 The Zeroth Law
6.2 The First Law
6.3 The Second Law
6.4 Cyclic processes
6.4.1 The Carnot Engine
6.4.2 General Cyclic Processes
6.4.3 The Diesel Engine
6.5 Thermodynamic potentials
6.6 Chemical Equilibrium
6.7 Phase Co-Existence and Clausius-Clapeyron Relation
6.8 Osmotic Pressure
A Dynamical Systems and Approach to Equilibrium
A.1 The Master Equation
A.2 Properties of the Master Equation
A.3 Relaxation time vs. ergodic time
A.4 Monte Carlo methods and Metropolis algorithm
A.5 Eigenstate thermalization
B Exercises
B.1 Exercises for chapter 2
B.2 Exercises for chapter 3
B.3 Exercises for chapter 4
B.4 Exercises for chapter 5
B.5 Exercises for chapter 6
List of Figures
1.1 Boltzmann's tomb with his famous entropy formula engraved at the top
2.1 Graphical expression for the first four moments.
2.2 Sketch of a well-potential $\mathcal{W}$.
2.3 Evolution of a phase space volume under the flow map $\Phi_t$.
2.4 Sketch of the situation described in the proof of Poincaré recurrence.
3.1 Classical scattering of particles in the "fixed target frame".
3.2 Pressure on the walls due to the impact of particles.
3.3 Sketch of the air-flow across a wing.
4.1 Gas in a piston maintained at pressure $P$.
4.2 The joint number of states for two systems in thermal contact.
4.3 Number of states with energies lying between $E-\Delta E$ and $E$.
4.4 Two gases separated by a removable wall.
4.5 A small system in contact with a large heat reservoir.
4.6 Distribution and velocity of stars in a galaxy.
4.7 Sketch of a potential $\mathcal{V}$ of a lattice with a minimum at $Q_0$.
4.8 A small system coupled to a large heat and particle reservoir.
4.9 A Peierls contour.
5.1 The potential $\mathcal{V}(\vec{r})$ occurring in (5.46).
5.2 Lowest-order Feynman diagram for photon-photon scattering in Quantum Electrodynamics.
5.3 Sketch of the Planck distribution for different temperatures.
6.1 The triple point of ice, water and vapor in the $(P,T)$ phase diagram.
6.2 A large system divided into subsystems I and II by an imaginary wall.
6.3 Change of system from initial state $i$ to final state $f$ along two different paths.
6.4 A curve $\gamma:[0,1] \rightarrow \mathbb{R}^{2}$.
6.5 Sketch of the submanifolds $\mathcal{A}$.
6.6 Adiabatics of the ideal gas.
6.7 Carnot cycle for an ideal gas. The solid lines indicate isotherms and the dashed lines indicate adiabatics.
6.8 The Carnot cycle in the $(T,S)$-diagram.
6.9 A generic cyclic process in the $(T,S)$-diagram.
6.10 A generic cyclic process divided into two parts by an isotherm at temperature $T_I$.
6.11 The process describing the Diesel engine in the $(P,V)$-diagram.
6.12 The phase boundary between a solution and a solute.
6.13 Imaginary phase diagram for the case of 6 different phases. At each point on a phase boundary which is not an intersection point, $\varphi=2$ phases are supposed to coexist. At each intersection point $\varphi=4$ phases are supposed to coexist.
6.14 Phase boundary of a vapor-solid system in the $(P,T)$-diagram.
Chapter 1
Introduction and Historical Overview
As the name suggests, thermodynamics historically developed as an attempt to understand phenomena involving heat. This notion is intimately related to irreversible processes involving, typically, many essentially randomly excited degrees of freedom. The proper understanding of this notion, as well as of the 'laws' that govern it, took the better part of the $19^{\text{th}}$ century. The basic rules that were observed, essentially empirically, were clarified and laid out in the so-called "laws of thermodynamics". These laws are still useful today, and will most likely outlive the particular microscopic models of physical systems that we use.
Before the laws of thermodynamics were identified, other theories of heat were also considered. A curious example from the $17^{\text{th}}$ century is a theory of heat proposed by J. Becher. He put forward the idea that heat was carried by special heat particles. This idea was refuted by scientists such as A.L. de Lavoisier, who showed that the existence of such a particle did not explain, and was in fact inconsistent with, the phenomenon of burning, which he instead correctly associated with chemical processes involving oxygen. Heat had already previously been associated with friction, especially through the work of B. Thompson, who showed that in this process work (mechanical energy) is converted to heat. That heat transfer can generate mechanical energy was in turn exemplified through the steam engine as developed by inventors such as J. Watt, J. Trevithick, and T. Newcomen - the key technical invention of the $18^{\text{th}}$ and $19^{\text{th}}$ centuries. A broader theoretical description of processes involving heat transfer was put forward in 1824 by N.L.S. Carnot, who emphasized in particular the importance of the notion of equilibrium. The quantitative understanding of the relationship between heat and energy was found by J.P. Joule and R. Mayer, who were the first to state clearly that heat is a form of energy. This finally led to the principle of conservation of energy put forward by H. von Helmholtz in 1847.
Parallel to this largely phenomenological view of heat, there were also early attempts to understand the phenomenon from a microscopic angle. This viewpoint seems to have been first stated in a transparent fashion by D. Bernoulli in 1738 in his work on hydrodynamics, in which he proposed that heat is transferred from regions with energetic molecules (high internal energy) to regions with less energetic molecules (low internal energy). The microscopic viewpoint ultimately led to the modern 'bottom up' view of heat developed by J.C. Maxwell, J. Stefan and especially L. Boltzmann. According to Boltzmann, heat is associated with a quantity called "entropy", which increases in irreversible processes. In the context of equilibrium states, entropy can be understood as a measure of the number of accessible states at a defined energy, according to his famous formula
\begin{equation*}
S=k_{\mathrm{B}} \log W ,
\end{equation*}
which Planck later had engraved on Boltzmann's tomb at the Wiener Zentralfriedhof:
Figure 1.1: Boltzmann's tomb with his famous entropy formula engraved at the top.
The formula thereby connects a macroscopic, phenomenological quantity $S$ to the microscopic states of the system (counted by $W(E)=$ number of accessible states of energy $E$). His proposal to relate entropy to counting problems for microscopic configurations, and thereby to ideas from probability theory, was entirely new and ranks as one of the major intellectual accomplishments in physics. The systematic understanding of the relationship between the distributions of microscopic states of a system and macroscopic quantities such as $S$ is the subject of statistical mechanics. That subject nowadays goes well beyond the original goal of understanding the phenomenon of heat, but is more broadly aimed at the analysis of systems with a large number of, typically interacting, degrees of freedom and at their description in an "averaged", or "statistical", or "coarse grained" manner. As such, statistical mechanics has found an ever growing number of applications to many diverse areas of science, such as
Neural networks and other networks
Financial markets
Data analysis and mining
Astronomy
Black hole physics
and many more. Here is an, obviously incomplete, list of some key innovations in the subject:
Timeline
$17^{\text{th}}$ century:
Ferdinand II, Grand Duke of Tuscany: quantitative measurement of temperature
$18^{\text{th}}$ century:
A. Celsius, C. von Linné: Celsius temperature scale
A.L. de Lavoisier: basic calorimetry
D. Bernoulli: basics of kinetic gas theory
B. Thompson (Count Rumford): mechanical energy can be converted to heat
$19^{\text{th}}$ century:
1802 J. L. Gay-Lussac: heat expansion of gases
1824 N.L.S.Carnot: thermodynamic cycles and heat engines
1847 H. von Helmholtz: energy conservation ($1^{\text{st}}$ law of thermodynamics)
1848 W. Thomson (Lord Kelvin): definition of absolute thermodynamic temperature scale based on Carnot processes
1850 W. Thomson and H. von Helmholtz: impossibility of a perpetuum mobile ($2^{\text{nd}}$ law)
1857 R. Clausius: equation of state for ideal gases
1860 J.C. Maxwell: distribution of the velocities of particles in a gas
1865 R. Clausius: new formulation of the $2^{\text{nd}}$ law of thermodynamics, notion of entropy
1877 L. Boltzmann: $S=k_{\mathrm{B}} \log W$
1876 (as well as 1896 and 1909): controversies concerning entropy; the objection that Poincaré recurrence is not compatible with macroscopic irreversibility
1894 W. Wien: black body radiation
$20^{\text{th}}$ century:
1900 M. Planck: radiation law $\rightarrow$ Quantum Mechanics
1911 P. Ehrenfest: foundations of Statistical Mechanics
1924 Bose-Einstein statistics
1925 Fermi-Pauli statistics
1931 L. Onsager: theory of irreversible processes
1937 L. Landau: phase transitions, later extended to superconductivity by Ginzburg
1930's W. Heisenberg, E. Ising, R. Peierls,... : spin models for magnetism
1943 S. Chandrasekhar, R.H. Fowler: applications of statistical mechanics in astrophysics
1956 J. Bardeen, L.N. Cooper, J.R. Schrieffer: explanation of superconductivity
1956-58 L. Landau: theory of Fermi liquids
1960's T. Matsubara, E. Nelson, K. Symanzik,... : application of Quantum Field Theory methods to Statistical Mechanics
1970's L. Kadanoff, K.G. Wilson, W. Zimmermann, F. Wegner,...: renormalization group methods in Statistical Mechanics
1973 J. Bardeen, B. Carter, S. Hawking, J. Bekenstein, R.M. Wald, W.G. Unruh,....: laws of black hole mechanics, Bekenstein-Hawking entropy
1975 - Neural networks
1985 - Statistical physics in economy
Chapter 2
Basic Statistical Notions
2.1 Probability Theory and Random Variables
Statistical mechanics is an intrinsically probabilistic description of a system, so we do not ask questions like "What is the velocity of the $N^{\text{th}}$ particle?" but rather questions of the sort "What is the probability of the $N^{\text{th}}$ particle having velocity between $v$ and $v+\Delta v$?" in an ensemble of particles. Thus, basic notions and manipulations from probability theory can be useful, and we now introduce some of these, without any attention paid to mathematical rigor.
A random variable $x$ can have different outcomes forming a set $\Omega=\{x_{1}, x_{2}, \ldots\}$, e.g. for tossing a coin $\Omega_{\text{coin}}=\{\text{head}, \text{tail}\}$, for a dice $\Omega_{\text{dice}}=\{1,2,3,4,5,6\}$, or for the velocity of a particle $\Omega_{\text{velocity}}=\{\vec{v}=(v_{x}, v_{y}, v_{z}) \in \mathbb{R}^{3}\}$, etc.
An event is a subset $E \subset \Omega$ (not all subsets need to be events).
A probability measure is a map that assigns a number $P(E)$ to each event, subject to the following general rules:
(i) $P(E) \geq 0$.
(ii) $P(\Omega)=1$.
(iii) If $E \cap E^{\prime}=\varnothing$, then $P(E \cup E^{\prime})=P(E)+P(E^{\prime})$.
In mathematics, the data $(\Omega, P, \{E\})$ is called a probability space, and the above axioms basically correspond to the axioms for such spaces. For instance, for a fair dice the probabilities would be $P_{\text{dice}}(\{1\})=\ldots=P_{\text{dice}}(\{6\})=\frac{1}{6}$, and $E$ could be any subset of $\{1,2,3,4,5,6\}$. In practice, probabilities are determined by repeating the experiment (independently) many times, e.g. throwing the dice very often. Thus, the "empirical definition" of the probability of an event $E$ is
\begin{equation*}
P(E)=\lim_{N \rightarrow \infty} \frac{N_{E}}{N} , \tag{2.1}
\end{equation*}
where $N_{E}=$ number of times $E$ occurred, and $N=$ total number of experiments. For one real variable $x \in \Omega \subset \mathbb{R}$, it is common to write the probability of an event $E \subset \mathbb{R}$ formally as
\begin{equation*}
P(E)=\int_{E} p(x) d x \tag{2.2}
\end{equation*}
Here, $p(x)$ is the probability density "function", defined formally by:
\begin{equation*}
p(x)\, d x=P((x, x+d x)) .
\end{equation*}
The axioms for $p$ formally imply that we should have
\begin{equation*}
p(x) \geq 0 , \qquad \int_{-\infty}^{\infty} p(x)\, d x=1 .
\end{equation*}
A mathematically more precise way to think about the quantity $p(x)\,dx$ is provided by measure theory, i.e. we should really think of $p(x)\,dx=d\mu(x)$ as defining a measure, and of $\{E\}$ as the corresponding collection of measurable subsets. A typical case is that $p$ is a smooth (or even just integrable) function on $\mathbb{R}$ and that $dx$ is the Lebesgue measure, with $E$ from the set of all Lebesgue measurable subsets of $\mathbb{R}$. However, we can also consider more pathological cases, e.g. by allowing $p$ to have certain singularities. It is possible to define "singular" measures $d\mu$ relative to the Lebesgue measure $dx$ which are not writable as $p(x)\,dx$ with $p$ an integrable function which is non-negative almost everywhere. An example of this is the Dirac measure, which is formally written as
\begin{equation*}
d\mu(x)=\sum_{i} p_{i}\, \delta(x-x_{i})\, dx ,
\end{equation*}
where $p_{i} \geq 0$ and $\sum_{i} p_{i}=1$. Nevertheless, we will, by abuse of notation, stick with the informal notation $p(x)\,dx$. We can also consider several random variables, such as $x=(x_{1}, \ldots, x_{N}) \in \Omega=\mathbb{R}^{N}$. The probability density function would now be - again formally - a function $p(x) \geq 0$ on $\mathbb{R}^{N}$ with total integral of 1.
Of course, as the example of the coin shows, one can and should also consider discrete probability spaces such as $\Omega=\{1, \ldots, N\}$, with the events $E$ being all possible subsets. For the elementary event $\{n\}$ the probability $p_{n}=P(\{n\})$ is then a non-negative number, and $\sum_{i} p_{i}=1$. The collection $\{p_{1}, \ldots, p_{N}\}$ completely characterizes the probability distribution.
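The empirical definition (2.1), the relative frequency $N_E/N$, is easy to illustrate numerically. The following is a minimal sketch (our own illustration, not part of the notes) estimating the probability of the event $E=\{2,4,6\}$ ("even outcome") for a fair dice:

```python
import random

random.seed(0)

# Sample space and event E = "even outcome" for a fair dice; P(E) = 1/2.
omega = [1, 2, 3, 4, 5, 6]
E = {2, 4, 6}

def empirical_probability(n_trials):
    """Estimate P(E) as N_E / N by repeating the experiment N times."""
    n_E = sum(1 for _ in range(n_trials) if random.choice(omega) in E)
    return n_E / n_trials

estimate = empirical_probability(100_000)
print(estimate)  # close to 0.5
```

With the seed fixed the run is reproducible; the estimate fluctuates around $1/2$ with spread of order $1/\sqrt{N}$.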
Let us collect some standard notions and terminology associated with probability spaces:
The expectation value $\langle F(x)\rangle$ of a function $\mathbb{R}^{N} \supset \Omega \ni x \mapsto F(x) \in \mathbb{R}$ ("observable") of a random variable is
\begin{equation*}
\langle F(x)\rangle:=\int_{\Omega} F(x) p(x) d^{N} x . \tag{2.4}
\end{equation*}
Here, the function $F(x)$ should be such that this expression is actually well-defined, i.e. $F$ should be integrable with respect to the probability measure $d\mu=p(x)\, d^{N}x$.
The moments $m_{n}$ of a probability density function $p$ of one real variable $x \in \Omega=\mathbb{R}$ are defined by
\begin{equation*}
m_{n}:=\left\langle x^{n}\right\rangle=\int_{-\infty}^{\infty} x^{n} p(x) d x \tag{2.5}
\end{equation*}
Note that it is not automatically guaranteed that the moments are well-defined, and the same remark applies to the expressions given below. The probability distribution $p$ can be reconstructed from the moments under certain conditions. This is known as the "Hamburger moment problem".
The characteristic function $\tilde{p}$ of a probability density function of one real variable is its Fourier transform, defined as
\begin{equation*}
\tilde{p}(k)=\int_{-\infty}^{\infty} d x e^{-i k x} p(x)=\left\langle e^{-i k x}\right\rangle=\sum_{n=0}^{\infty} \frac{(-i k)^{n}}{n!}\left\langle x^{n}\right\rangle \tag{2.6}
\end{equation*}
The Fourier inversion formula gives
\begin{equation*}
p(x)=\frac{1}{2 \pi} \int_{-\infty}^{\infty} d k e^{i k x} \tilde{p}(k) \tag{2.7}
\end{equation*}
The cumulants $\langle x^{n}\rangle_{c}$ are defined via
\begin{equation*}
\log \tilde{p}(k)=\sum_{n=1}^{\infty} \frac{(-i k)^{n}}{n!}\langle x^{n}\rangle_{c} .
\end{equation*}
There is an important combinatorial scheme relating moments to cumulants. The result expressed by this combinatorial scheme is called the linked cluster theorem, and a variant of it will appear when we discuss the cluster expansion. In order to state and illustrate the content of the linked cluster theorem, we represent the first four moments graphically as follows:
Figure 2.1: Graphical expression for the first four moments.
A blob indicates a connected moment, also called a 'cluster'. The linked cluster theorem states that the numerical coefficients in front of the various terms can be obtained by finding the number of ways to break the points into clusters of this type. A proof of the linked cluster theorem can be obtained as follows: we write
\begin{equation*}
\tilde{p}(k)=\exp\left(\sum_{n=1}^{\infty} \frac{(-i k)^{n}}{n!}\langle x^{n}\rangle_{c}\right)=\sum_{\{i_{n}\}} \prod_{n} \frac{1}{i_{n}!}\left(\frac{(-i k)^{n}}{n!}\langle x^{n}\rangle_{c}\right)^{i_{n}}
\end{equation*}
and compare the coefficient of $(-i k)^{m}/m!$ with the moment expansion (2.6). This gives
\begin{equation*}
\langle x^{m}\rangle=\sum_{\{i_{n}\}}{}^{\prime}\, \frac{m!}{\prod_{n} i_{n}!\,(n!)^{i_{n}}} \prod_{n}\langle x^{n}\rangle_{c}^{\, i_{n}} ,
\end{equation*}
where $\sum^{\prime}$ is restricted to $\sum_{n} n\, i_{n}=m$. The claimed graphical expansion follows because $\frac{m!}{\prod_{n} i_{n}!\,(n!)^{i_{n}}}$ is the number of ways to break $m$ points into clusters characterized by the numbers $\{i_{n}\}$, where $i_{n}$ is the number of clusters with $n$ points.
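The combinatorial factor appearing in the proof can be checked by direct enumeration. The sketch below (our own illustration of the counting, not from the notes) sums $m!/\prod_n i_n!\,(n!)^{i_n}$ over all $\{i_n\}$ with $\sum_n n\,i_n=m$; the total is the number of ways to partition $m$ labelled points into clusters of any sizes, i.e. the Bell numbers $1, 2, 5, 15, 52, \ldots$:

```python
import math

def cluster_count(m):
    """Sum m! / prod_n (i_n! * (n!)**i_n) over all {i_n} with sum_n n*i_n = m.

    Each term counts the ways to break m labelled points into clusters with
    i_n clusters of size n; the total is the Bell number B_m."""
    total = 0

    def recurse(remaining, max_size, counts):
        nonlocal total
        if remaining == 0:
            denom = 1
            for n, i_n in counts.items():
                denom *= math.factorial(i_n) * math.factorial(n) ** i_n
            total += math.factorial(m) // denom
            return
        # enumerate cluster sizes in non-increasing order so each {i_n} occurs once
        for n in range(min(remaining, max_size), 0, -1):
            counts[n] = counts.get(n, 0) + 1
            recurse(remaining - n, n, counts)
            counts[n] -= 1
            if counts[n] == 0:
                del counts[n]

    recurse(m, m, {})
    return total

print([cluster_count(m) for m in range(1, 6)])  # [1, 2, 5, 15, 52]
```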
Let $p$ be a probability density on the product space $\Omega=\Omega_{A} \times \Omega_{B}=\{x=(x_{A}, x_{B}): x_{A} \in \Omega_{A}, x_{B} \in \Omega_{B}\}$. If the density $p$ on $\Omega$ is factorized, as in
\begin{equation*}
p(x_{A}, x_{B})=p_{A}(x_{A})\, p_{B}(x_{B}) ,
\end{equation*}
then we say that the variables $x_{A}$ and $x_{B}$ are independent. If $F_{A}$ is an observable for $x_{A}$ and $F_{B}$ an observable for $x_{B}$, then for independent random variables $x_{A}, x_{B}$ one has
\begin{equation*}
\langle F_{A}(x_{A})\, F_{B}(x_{B})\rangle=\langle F_{A}(x_{A})\rangle\,\langle F_{B}(x_{B})\rangle ,
\end{equation*}
and one says that $A$ and $B$ are uncorrelated.
The notion of independence generalizes immediately to any "Cartesian product" $\Omega=\Omega_{1} \times \ldots \times \Omega_{N}$ of probability spaces. In the case of independent, identically distributed real random variables $x_{i}, i=1, \ldots, N$, there is an important theorem characterizing the limit as $N \rightarrow \infty$, which is treated in more detail in problem B.1. Basically it says that (under certain assumptions about $p$, e.g. finite mean $\mu$ and variance $\sigma^{2}$) the random variable $y=\frac{\sum_{i}(x_{i}-\mu)}{\sqrt{N}}$ has, for large $N$, a Gaussian distribution with mean 0 and spread $\sigma$; equivalently, the sample mean $\frac{1}{N}\sum_{i} x_{i}$ has spread $\sigma/\sqrt{N}$ around $\mu$. Thus, in this sense, a sum of a large number of independent, identically distributed random variables is approximately distributed as a Gaussian random variable. This so-called "Central Limit Theorem" explains, in some sense, the empirical observation that the random variables appearing in various applications are distributed as Gaussians.
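The Central Limit Theorem is easy to observe numerically. In the sketch below (our own illustration), the $x_i$ are uniform on $[0,1]$, so $\mu = 1/2$ and $\sigma^2 = 1/12$; the rescaled sums $y=\sum_i(x_i-\mu)/\sqrt{N}$ indeed come out with mean close to $0$ and spread close to $\sigma \approx 0.289$:

```python
import math
import random

random.seed(1)

N = 400            # number of terms x_i in each sum
samples = 2000     # number of independent realizations of y
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and spread of uniform on [0, 1]

# Build many realizations of y = sum(x_i - mu) / sqrt(N).
ys = []
for _ in range(samples):
    s = sum(random.random() for _ in range(N))
    ys.append((s - N * mu) / math.sqrt(N))

mean_y = sum(ys) / samples
var_y = sum((y - mean_y) ** 2 for y in ys) / samples

print(round(mean_y, 3))              # close to 0
print(round(math.sqrt(var_y), 3))    # close to sigma = 0.289...
```

A histogram of `ys` would show the familiar bell shape, even though each individual $x_i$ is uniform, not Gaussian.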
2.2 Entropy
An important quantity associated with a probability distribution is its "information entropy". Let $\{p_{i}\}$ be a probability distribution for a random variable taking values in $\Omega=\{x_{1}, \ldots, x_{N}\}$. If the probability $p_{i}$ for finding $x_{i}$ is very small, then we should be surprised when $x_{i}$ occurs. A measure of surprise for the event $x_{i}$ is
\begin{equation*}
\text { surprise at seeing event } x_{i}=\log \frac{1}{p_{i}} \tag{2.13}
\end{equation*}
because (i) the surprise is larger the smaller $p_{i}$ is, and (ii) the surprise at independent events (see above) should be additive, so we should take a logarithm. The average surprise is
\begin{equation*}
\text { average surprise }=\left\langle\log \frac{1}{p_{i}}\right\rangle=-\sum_{i} p_{i} \log p_{i} \tag{2.14}
\end{equation*}
This average surprise is defined to be the "information entropy":
Definition: Let $\Omega=\{x_{1}, \ldots, x_{N}\}$ and let $\{p_{i}\}$ be a probability distribution. The quantity
\begin{equation*}
S_{\text{inf}} := -k_{\mathrm{B}} \sum_{i} p_{i} \log p_{i} \tag{2.15}
\end{equation*}
is called information entropy. (Our convention is that $0 \log 0=0$, and $\log$ is the natural logarithm.)
The factor $k_{\mathrm{B}}$ is merely inserted here to be consistent with the conventions in statistical physics. In the context of computer science, it is dropped, and the natural log is replaced by the logarithm with base 2, which is natural to use if we think of information encoded in bits. The information entropy is also sometimes defined with the opposite sign, since more entropy means less information. It can be shown that the information entropy (in computer science normalization) is roughly equal to the average (with respect to the given probability distribution) number of yes/no questions necessary to determine whether a given event has occurred (cf. problem B.4).
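In the computer-science normalization just mentioned ($k_{\mathrm{B}}$ dropped, logarithm base 2), the entropy of a fair coin is exactly 1 bit, matching the single yes/no question needed to determine the outcome. A minimal sketch (our own helper, not from the notes):

```python
import math

def entropy_bits(probs):
    """Information entropy -sum_i p_i log2 p_i, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair coin: 1.0 bit
print(entropy_bits([1.0, 0.0]))    # certain outcome: 0.0 bits
print(entropy_bits([0.25] * 4))    # uniform on 4 outcomes: 2.0 bits
```

The uniform distribution on $2^k$ outcomes gives $k$ bits, i.e. $k$ binary questions.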
Maximum entropy principle: A practical application of information entropy is as follows: suppose one has an ensemble whose probability distribution $\{p_{i}: i=1, \ldots, n\}$ is not completely known. One would like to make a good guess about $\{p_{i}\}$ based on some partial information, such as a finite number of moments or other observables. Thus, suppose that $F_{A}(x), A=1,2, \ldots, m$ are $m<n$ observables for which $\langle F_{A}(x)\rangle=f_{A}$ are known. Then a good guess, representing in some sense a minimal bias about $\{p_{i}\}$, is to maximize $S_{\text{inf}}$ subject to the constraints $\langle F_{A}(x)\rangle=f_{A}$. In the case when the observables are the mean value $\mu$ and variance $\sigma$, the distribution obtained in this way is the Gaussian. So the Gaussian is, in this sense, our best guess if we only know $\mu$ and $\sigma$ (cf. problem B.3).
The maximum entropy principle is typically analyzed with the help of Lagrange multipliers. For the constraint $\sum_{i} p_{i}=1$ we take a multiplier $\mu$, and for the other constraints $\sum_{i} p_{i} F_{A}(x_{i})=f_{A}$ we take multipliers $\lambda_{A}$. Then we first look at the unconstrained maximization problem
\begin{equation*}
\phi(p_{1}, \ldots, p_{n})=S(p_{1}, \ldots, p_{n})+\mu\Big(\sum_{i} p_{i}-1\Big)+\sum_{A} \lambda_{A}\Big(\sum_{i} p_{i} F_{A}(x_{i})-f_{A}\Big) \rightarrow \text{maximum} .
\end{equation*}
Setting $\partial \phi / \partial p_{i}=0$ gives, up to normalization, $p_{i} \propto \exp\big(\sum_{A} \lambda_{A} F_{A}(x_{i})/k_{\mathrm{B}}\big)$, where $\mu$ and the $\lambda_{A}$ are fixed by the constraints. The constraints are linear in the $p_{i}$ and $S(\{p_{i}\})$ is a concave function. Thus a stationary point that is a solution to the constraints is automatically a maximum.
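As a numerical illustration of the maximum entropy principle (our own example, not from the notes): on $\Omega=\{1,\ldots,6\}$ with the single observable $F(x)=x$ constrained to $\langle x\rangle = 4.5$, the stationary distribution has the exponential form $p_i \propto e^{\lambda x_i}$, and the multiplier $\lambda$ can be found by bisection, since the mean is monotonically increasing in $\lambda$:

```python
import math

xs = [1, 2, 3, 4, 5, 6]   # outcomes of a dice
target_mean = 4.5         # the known constraint <x> = 4.5

def maxent_dist(lam):
    """Distribution p_i proportional to exp(lam * x_i), normalized to 1."""
    w = [math.exp(lam * x) for x in xs]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean(p):
    return sum(x * pi for x, pi in zip(xs, p))

# Bisection on lam: mean(maxent_dist(lam)) increases monotonically with lam.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean(maxent_dist(mid)) < target_mean:
        lo = mid
    else:
        hi = mid

p = maxent_dist(0.5 * (lo + hi))
print([round(pi, 4) for pi in p])   # tilted toward the large outcomes
print(round(mean(p), 6))            # 4.5
```

For `target_mean = 3.5` the bisection returns $\lambda = 0$, i.e. the uniform distribution, which is the unbiased guess when only normalization is known.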
2.3 Examples of probability distributions
We next give some important examples of probability distributions and related models in statistical mechanics:
2.3.1 Gaussian distribution
The Gaussian distribution for one real random variable $x \in \Omega=\mathbb{R}$ is defined by the following probability density:
\begin{equation*}
p(x)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\Big(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\Big) .
\end{equation*}
We find $\mu=\langle x\rangle$ and $\sigma^{2}=\langle x^{2}\rangle-\langle x\rangle^{2}=\langle x^{2}\rangle_{c}$. The higher moments are all expressible in terms of $\mu$ and $\sigma$ in a systematic fashion. For example:
\begin{equation*}
\langle x^{3}\rangle=\mu^{3}+3 \mu \sigma^{2} , \qquad \langle x^{4}\rangle=\mu^{4}+6 \mu^{2} \sigma^{2}+3 \sigma^{4} .
\end{equation*}
The generating function for the moments is $\langle e^{-i k x}\rangle=e^{-i k \mu} e^{-\sigma^{2} k^{2}/2}$. The $N$-dimensional generalization of the Gaussian distribution ($\Omega=\mathbb{R}^{N}$) is expressed in terms of a "covariance matrix" $C$, which is symmetric and real with positive eigenvalues, and a vector $\vec{\mu}$. It is
\begin{equation*}
p(\vec{x})=\frac{1}{\sqrt{(2\pi)^{N} \det C}} \exp\Big(-\frac{1}{2}(\vec{x}-\vec{\mu})^{T} C^{-1}(\vec{x}-\vec{\mu})\Big) .
\end{equation*}
The first two moments are $\langle x_{i}\rangle=\mu_{i}$, $\langle x_{i} x_{j}\rangle=C_{i j}+\mu_{i} \mu_{j}$, so $C_{i j}$ is the generalization of the variance and $\mu_{i}$ that of the mean value.
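The systematic expressions for the higher Gaussian moments, e.g. $\langle x^4\rangle = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$, can be checked numerically. The sketch below (our own check, plain trapezoidal integration, not part of the notes) verifies the first few moments for one choice of $\mu, \sigma$:

```python
import math

mu, sigma = 1.3, 0.7

def p(x):
    """Gaussian probability density with mean mu and spread sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def moment(n, half_width=10.0, steps=100_000):
    """<x^n> by the trapezoidal rule on [mu - 10 sigma, mu + 10 sigma];
    the truncated tails are negligible (density ~ e^-50 there)."""
    a, b = mu - half_width * sigma, mu + half_width * sigma
    h = (b - a) / steps
    s = 0.5 * (a ** n * p(a) + b ** n * p(b))
    for k in range(1, steps):
        x = a + k * h
        s += x ** n * p(x)
    return s * h

print(round(moment(1), 6))  # ~ mu
print(round(moment(2), 6))  # ~ sigma**2 + mu**2
print(round(moment(4), 6))  # ~ mu**4 + 6*mu**2*sigma**2 + 3*sigma**4
```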
2.3.2 Binomial and Poisson distribution
The binomial distribution occurs naturally when we perform a yes/no probability experiment independently a number of times. Fix $N$ and let $\Omega=\{0,1, \ldots, N\}$. Then the events are subsets of $\Omega$, such as $\{n\}$. We think of $n$ as the number of times an outcome $A$ occurs in $N$ trials, where $0 \leq q \leq 1$ is the probability for the event $A$ in a single trial. The probability for $n$ occurrences is
\begin{equation*}
P(\{n\})=\binom{N}{n} q^{n}(1-q)^{N-n} .
\end{equation*}
The Poisson distribution is the limit of the binomial distribution for $N \rightarrow \infty$ when $n$ is fixed and $q=\frac{\alpha}{N}$, with $\alpha$ fixed (rare events). It is given by ($n \in\{0,1,2, \ldots\}=\Omega$):
\begin{equation*}
p_{n}=\frac{\alpha^{n}}{n!} e^{-\alpha} .
\end{equation*}
A standard application of the Poisson distribution is radioactive decay: let $q=\lambda \Delta t$ be the decay probability in a small time interval $\Delta t=\frac{T}{N}$. If $n$ denotes the number of decays during the time $T$, then for $N \rightarrow \infty$ its probability is Poisson distributed,
\begin{equation*}
p_{n}=\frac{(\lambda T)^{n}}{n!} e^{-\lambda T} .
\end{equation*}
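The convergence of the binomial distribution to the Poisson distribution can be checked directly. The sketch below (our own illustration, not from the notes) compares the two for $q = \alpha/N$ with $\alpha = 2$ and increasing $N$:

```python
import math

alpha = 2.0

def binom_pmf(n, N, q):
    """Binomial probability of n successes in N independent trials."""
    return math.comb(N, n) * q ** n * (1 - q) ** (N - n)

def poisson_pmf(n, a):
    """Poisson probability a^n e^-a / n!."""
    return a ** n * math.exp(-a) / math.factorial(n)

# Maximal deviation over small n for increasing N: shrinks toward 0.
for N in (10, 100, 10_000):
    q = alpha / N
    dev = max(abs(binom_pmf(n, N, q) - poisson_pmf(n, alpha)) for n in range(8))
    print(N, round(dev, 6))
```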
2.3.3 Ising model

The Ising model is basically a probability distribution for spins on a lattice. For each lattice site $i$ (atom), there is a spin taking values $\sigma_{i} \in\{\pm 1\}$. So, an individual spin configuration $C$ is a collection $C=\{\sigma_{i}\}$ of spin values, one for each site. In $d$ dimensions, the lattice is usually taken to be a volume $V=[0, L]^{d} \subset \mathbb{Z}^{d}$. The number of lattice sites is then $|V|$, and the set of possible configurations $C=\{\sigma_{i}\}$ is $\Omega=\{C\}=\{-1,1\}^{|V|}$, since each spin can take precisely two values. In the Ising model, one assigns to each configuration an energy
\begin{equation*}
H(C)=-J \sum_{\langle ik\rangle} \sigma_{i} \sigma_{k}-b \sum_{i} \sigma_{i} ,
\end{equation*}
where $J, b$ are parameters, and where the first sum is over all lattice bonds $\langle ik\rangle$ (nearest-neighbor pairs) in the volume $V$. The second sum is over all lattice sites in $V$. The probability of a configuration is then given by the Boltzmann weight
\begin{equation*}
p(C)=\frac{1}{Z} e^{-\beta H(C)} , \qquad \beta=\frac{1}{k_{\mathrm{B}} T} .
\end{equation*}
A large coupling constant $J \gg 1$ energetically favors adjacent spins to be parallel, and a large $b \gg 1$ favors spins to be preferentially up ($+1$). The coupling $b$ can thus be thought of as an external magnetic field. $Z=Z(V, J, b)$ is a normalization constant ensuring that all the probabilities add up to unity. For the computation of $Z$ in dimension $d=1$ and for $b=0$, see problem B.5. Of particular interest in the Ising model are the mean magnetization $m=|V|^{-1} \sum_{i}\langle\sigma_{i}\rangle$ and the free energy density $f=-\beta^{-1}|V|^{-1} \log Z$, see problem B.16. Another quantity of interest is the two-point function $\langle\sigma_{i} \sigma_{j}\rangle$ in the limit of large $V \rightarrow \mathbb{Z}^{d}$ (called the "thermodynamic limit") and a large separation between $i$ and $j$.
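For small chains, $Z$ can be computed by brute-force summation over all $2^{|V|}$ configurations. The sketch below (our own illustration, with $d=1$, free boundary conditions and $b=0$) compares the enumeration against the standard closed form $Z = 2\,(2\cosh\beta J)^{L-1}$ for an open chain of $L$ spins:

```python
import math
from itertools import product

def Z_brute(L, beta, J):
    """Partition function of an open Ising chain of L spins with b = 0,
    by summing the Boltzmann weight over all 2^L spin configurations."""
    total = 0.0
    for spins in product((-1, 1), repeat=L):
        energy = -J * sum(spins[i] * spins[i + 1] for i in range(L - 1))
        total += math.exp(-beta * energy)
    return total

def Z_exact(L, beta, J):
    """Closed form for the open chain, b = 0 (cf. problem B.5)."""
    return 2.0 * (2.0 * math.cosh(beta * J)) ** (L - 1)

beta, J = 0.7, 1.0
print(round(Z_brute(8, beta, J), 6))
print(round(Z_exact(8, beta, J), 6))  # agrees with the brute-force sum
```

The brute-force cost grows as $2^L$, which is exactly why the thermodynamic limit needs smarter tools (transfer matrices, expansions).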
2.3.4 Site percolation
Consider a finite lattice $V=[0, L]^{d} \subset \mathbb{Z}^{d}$. Now we occupy each lattice site $i$ randomly with a spin $\sigma_{i}=+1$ with probability $q$ and with spin $\sigma_{i}=-1$ with probability $1-q$. If we assume that these choices are independent, then the probability of a configuration $C=\{\sigma_{i}\}$ of spin values is
\begin{equation*}
p(C)=q^{N_{+}}(1-q)^{N_{-}} ,
\end{equation*}
with $N_{\pm}$ the number of $+$ or $-$ spins. The probability space is, as before, $\Omega=\{C\}$. The physical interpretation of such a model can be, for example, that sites occupied by $+$ spins are conducting, whereas those occupied by $-$ spins are not. A cluster is a set of sites containing only $+$ spins such that between any two sites of the cluster there is a path containing only $+$ spins. If we have a long cluster spanning from one side of the lattice to the opposite side, then the lattice is a conductor; otherwise it is an insulator.
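Identifying the clusters of a given configuration is a simple flood-fill (breadth-first search) over nearest neighbors. A sketch of such a helper for $d=2$ (our own illustration, on a hand-picked configuration):

```python
from collections import deque

def clusters(grid):
    """Return the sorted sizes of the +1 clusters in a 2D array of +-1 spins,
    where cluster sites are connected through nearest neighbors."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] != 1 or (r0, c0) in seen:
                continue
            # flood-fill the cluster containing (r0, c0)
            queue = deque([(r0, c0)])
            seen.add((r0, c0))
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols \
                            and grid[rr][cc] == 1 and (rr, cc) not in seen:
                        seen.add((rr, cc))
                        queue.append((rr, cc))
            sizes.append(size)
    return sorted(sizes)

grid = [[+1, +1, -1, -1],
        [-1, +1, -1, +1],
        [-1, -1, -1, +1],
        [+1, -1, +1, +1]]
print(clusters(grid))  # [1, 3, 4]
```

A spanning cluster ("conductor") would show up here as a cluster touching both opposite boundaries; that check is a small extension of the same search.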
Interesting observables are, for example:
$n_{s}(C)$: The normalized number of clusters of size $s$ in $C$, i.e., the number of such clusters divided by the number of all lattice sites (i.e., $L^{d}$).
$\chi_{i}(C)$: This is one if $i$ belongs to a cluster in $C$ and zero otherwise.
$\xi(C)$: The average over all sizes of clusters in $C$, etc.
We can then form expectation values as usual, for instance
\begin{equation*}
\left\langle n_{s}\right\rangle=\sum_{C} n_{s}(C) p(C)=\text { average normalized number of clusters of size } s, \tag{2.27}
\end{equation*}
so $s\langle n_{s}\rangle$ is the probability that an arbitrarily chosen site belongs to a cluster of size $s$. We can also consider
\begin{equation*}
N=\left\langle\sum_{s=1}^{\infty} n_{s}\right\rangle=\text { average normalized number of clusters. } \tag{2.28}
\end{equation*}
The probability $w_{s}$ that a site occupied by a $+$ spin belongs to a (finite) cluster of size $s$ is
\begin{equation*}
w_{s}=\frac{s\langle n_{s}\rangle}{\sum_{s^{\prime}=1}^{\infty} s^{\prime}\langle n_{s^{\prime}}\rangle} , \tag{2.29}
\end{equation*}
and the mean size of the cluster containing a randomly chosen occupied site is therefore
\begin{equation*}
S=\sum_{s=1}^{\infty} s \frac{s\left\langle n_{s}\right\rangle}{\sum_{s^{\prime}=1}^{\infty} s^{\prime}\left\langle n_{s^{\prime}}\right\rangle} \tag{2.30}
\end{equation*}
Physically, one is interested in the question of how, for example, $S$ depends on $q$. Can the clusters become macroscopically large for sufficiently large $q$? If this is the case, then the material is conducting; otherwise it is an insulator, and the transition from one behavior to the other is thought of as a "phase transition". To study this question more precisely, one takes $L \rightarrow \infty$ and asks whether $S$ can diverge as $q$ approaches some critical value $q_{*}$, i.e. when $q \rightarrow q_{*}$. Near such a critical value, one expects for instance
\begin{equation*}
S \propto\left(q-q_{*}\right)^{-\gamma} \tag{2.31}
\end{equation*}
and calls $\gamma$ a "critical exponent". Other examples of critical exponents can also be defined. For instance, let $P$ be the probability that a given "$+$" site belongs to an infinite cluster. This number can be computed as
\begin{equation*}
P=\frac{1}{q}\left(q-\sum_{s=1}^{\infty} s\left\langle n_{s}\right\rangle\right), \tag{2.32}
\end{equation*}
where the limit $L \rightarrow \infty$ is again understood in the end. To see this, consider an arbitrary lattice site. It either has spin $-$, or it has spin $+$ and therefore belongs to some finite cluster of size $s$ or to an infinite cluster. This means $1=(1-q)+\sum_{s=1}^{\infty} s\left\langle n_{s}\right\rangle+q P$, which gives the statement. Near the critical value $q_{*}$ we then likewise expect
\begin{equation*}
P \propto \begin{cases}0 & \text { for } q<q_{*} \\ \left(q-q_{*}\right)^{\beta} & \text { for } q \geqslant q_{*}\end{cases} \tag{2.33}
\end{equation*}
The numbers $\gamma, \beta$ and similar other quantities are called "critical exponents". Analogous quantities can be defined for many systems, and their determination is one of the standard problems in statistical mechanics, since the appearance of a macroscopic cluster signals a change of a macroscopic property of the system (conductor vs. insulator in this model). The importance of the values of the exponents is that we may expect them to be independent of the precise nature of the model at the microscopic level. For instance, we would expect to get the same values if we replace the cubic lattice by some other lattice which is not too different.
Unfortunately, the determination of critical points and critical exponents is a very complicated business, and we will not have the time to introduce the methods for doing this in this lecture course. By way of an example, let us treat the trivial case $d=1$ (chain) of the percolation model, which can be treated by elementary means. In this example, we can immediately calculate the normalized cluster number $\left\langle n_{s}\right\rangle$: On the one hand, as we have already noted, the probability that an arbitrarily chosen site $i$ belongs to a cluster of size $s$ is given by $s\left\langle n_{s}\right\rangle$. On the other hand, it equals $s q^{s}(1-q)^{2}$, because $s$ consecutive sites have to carry spin $+$ (factor $q^{s}$), the left and right boundaries of the cluster have to be occupied by $-$ (factor $(1-q)^{2}$), and there are $s$ clusters of size $s$
containing the chosen site $i$ (factor $s$). Thus, we conclude that
\begin{equation*}
\left\langle n_{s}\right\rangle=q^{s}(1-q)^{2} .
\end{equation*}
As long as there is no infinite cluster, we also have $\sum_{s=1}^{\infty} s\left\langle n_{s}\right\rangle=q$, because the probabilities that an arbitrarily chosen site belongs to a cluster of size $s$ add up to the probability that this site is occupied by a $+$ spin. As a consequence, our formula for $S$ gives
\begin{equation*}
S=\frac{1}{q} \sum_{s=1}^{\infty} s^{2} q^{s}(1-q)^{2}=\frac{1+q}{1-q} .
\end{equation*}
So we read off that $q_{*}=1$ and that $\gamma=1$ for $d=1$.
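The elementary derivation above can be checked numerically. The following short Python script (an added illustration, not part of the notes; chain length and number of trials are arbitrary choices) samples random chains, counts clusters of $+$ spins, and compares the result with $\left\langle n_{s}\right\rangle=q^{s}(1-q)^{2}$ and the mean cluster size $S=(1+q)/(1-q)$.

```python
import random

# Monte Carlo check of the d = 1 percolation results: sample random
# chains, count clusters of + spins, and compare with the exact
# normalized cluster numbers <n_s> = q^s (1-q)^2 and S = (1+q)/(1-q).
random.seed(0)
q, L, trials = 0.4, 10_000, 50

counts = {}  # cluster size s -> number of clusters seen
for _ in range(trials):
    spins = [random.random() < q for _ in range(L)]
    run = 0
    for s in spins + [False]:      # sentinel closes a final run of +
        if s:
            run += 1
        elif run:
            counts[run] = counts.get(run, 0) + 1
            run = 0

for s in (1, 2, 3):
    n_s = counts[s] / (L * trials)          # estimate of <n_s>
    print(s, round(n_s, 4), round(q**s * (1 - q)**2, 4))

# mean cluster size S = sum_s s^2 <n_s> / sum_s s <n_s>
S = sum(s * s * c for s, c in counts.items()) / sum(s * c for s, c in counts.items())
print(round(S, 3), round((1 + q) / (1 - q), 3))
```

Boundary clusters are counted as well, which introduces only an $O(1/L)$ error.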
2.3.5 Random walk on a lattice
A walk $\omega$ in a volume $V$ of a lattice as in the Ising model can be characterized by the sequence of sites $\omega=\left(x, i_{1}, i_{2}, \ldots, i_{N-1}, y\right)$ encountered by the walker, where $x$ is the fixed beginning and $y$ the fixed endpoint. The number of sites in the walk is denoted $l(\omega)$ ($=N+1$ in the example), and the number of self-intersections is denoted by $n(\omega)$. The set of walks from $x$ to $y$ is our probability space $\Omega_{x, y}$, and a natural probability distribution is
\begin{equation*}
p(\omega)=\frac{1}{Z} e^{-\mu l(\omega)-g n(\omega)} .
\end{equation*}
Here, $\mu, g$ are positive constants. For $\mu \gg 1$, short walks between $x$ and $y$ are favored, and for $g \gg 1$, self-avoiding walks are favored. $Z=Z_{x, y}(V, \mu, g)$ is a normalization constant ensuring that the probabilities add up to unity. Of interest are e.g. the "free energy density" $f=|V|^{-1} \log Z$, or the average number of steps the walk spends in a given subset $S \subset V$, given by $\langle \#\{S \cap \omega\}\rangle$.
In general, such observables are very difficult to calculate, but for $g=0$ (unconstrained walks) there is a nice connection between $Z$ and the Gaussian distribution, which is the starting point to obtain many further results. Let $\partial_{\alpha} f(i)=f\left(i+\vec{e}_{\alpha}\right)-f(i)$ be the "lattice partial derivative" of a function $f(i)$ defined on the lattice sites $i \in V$, in the direction of the $\alpha$-th unit vector $\vec{e}_{\alpha}, \alpha=1, \ldots, d$. Let $\sum_{\alpha} \partial_{\alpha}^{2}=\Delta$ be the "lattice Laplacian". The lattice Laplacian can be identified with a matrix $\Delta_{i j}$ of size $|V| \times|V|$ defined by $\Delta f(i)=\sum_{j} \Delta_{i j} f(j)$. Define the covariance matrix as $C=\left(-\Delta+m^{2}\right)^{-1}$ and consider the corresponding Gaussian measure for the variables $\left\{\phi_{i}\right\} \in \mathbb{R}^{|V|}$ (one real variable per lattice site in $V$). One shows that
\begin{equation*}
Z_{x, y}=\left\langle\phi_{x} \phi_{y}\right\rangle=C_{x y}
\end{equation*}
for $g=0, \mu=\log \left(2 d+m^{2}\right)$ (exercise).
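On a small lattice the content of this exercise can be verified directly. The following Python sketch (numpy; an added illustration on a periodic chain, $d=1$, with arbitrary $m^{2}$) compares $C=(-\Delta+m^{2})^{-1}$ with the sum over walks weighted by $e^{-\mu l(\omega)}$, where a walk with $n$ steps has $l(\omega)=n+1$ sites and the number of $n$-step walks from $x$ to $y$ is $(A^{n})_{xy}$ for the adjacency matrix $A$.

```python
import numpy as np

# Check that C = (-Delta + m^2)^{-1} equals the weighted walk sum
# sum_omega e^{-mu l(omega)} with mu = log(2d + m^2), on a small
# periodic chain (d = 1).
L, m2, d = 8, 1.0, 1

A = np.zeros((L, L))               # adjacency matrix of the ring
for i in range(L):
    A[i, (i + 1) % L] = A[i, (i - 1) % L] = 1

lap = A - 2 * d * np.eye(L)        # lattice Laplacian: f(i+1)+f(i-1)-2f(i)
C = np.linalg.inv(-lap + m2 * np.eye(L))

# an n-step walk carries weight (2d + m^2)^{-(n+1)}
w = 1.0 / (2 * d + m2)
walk_sum = sum(np.linalg.matrix_power(A, n) * w ** (n + 1) for n in range(200))

assert np.allclose(C, walk_sum, atol=1e-8)
print(C[0, 3], walk_sum[0, 3])
```

The geometric series converges because the largest eigenvalue of $A$ is $2d < 2d+m^{2}$.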
2.4 Ensembles in Classical Mechanics
The basic ideas of probability theory outlined in the previous sections can be used for the statistical description of systems obeying the laws of classical mechanics. Consider a classical system of $N$ particles, described by $6 N$ phase space coordinates which we abbreviate as
\begin{equation*}
(P, Q)=\left(\vec{p}_{1}, \ldots, \vec{p}_{N}, \vec{x}_{1}, \ldots, \vec{x}_{N}\right) .
\end{equation*}
The probability distribution $\rho(P, Q)$ represents our limited knowledge about the system, which, in reality, is of course supposed to be described by a single trajectory $(P(t), Q(t))$ in phase space. In practice, we cannot know this trajectory precisely except for a very small number of particles $N$, and, in some sense, we do not really want to know the precise trajectory at all. The idea behind ensembles is rather that the time evolution (= the phase space trajectory $(P(t), Q(t))$) typically scans the entire accessible phase space (or sufficiently large parts of it), such that the time average of an observable $F$ equals the ensemble average of $F$, i.e. in many cases we expect to have
\begin{equation*}
\lim _{T \rightarrow \infty} \frac{1}{T} \int_{0}^{T} F(P(t), Q(t)) d t=\int_{\Omega} F(P, Q) \rho(P, Q) d^{3 N} P d^{3 N} Q \tag{2.41}
\end{equation*}
for a suitable (stationary) probability density function $\rho$. This is closely related to the "ergodic theorem", which in turn is related to the fact that the equations of motion are derivable from a (time-independent) Hamiltonian. Hamilton's equations are
$$
\frac{d x_{i \alpha}}{d t}=\frac{\partial H}{\partial p_{i \alpha}}, \quad \frac{d p_{i \alpha}}{d t}=-\frac{\partial H}{\partial x_{i \alpha}},
$$
where $i=1, \ldots, N$ and $\alpha=1,2,3$. The Hamiltonian $H$ is typically of the form
$$
H(P, Q)=\sum_{i=1}^{N}\left(\frac{\vec{p}_{i}^{\,2}}{2 m}+\mathcal{W}\left(\vec{x}_{i}\right)\right)+\sum_{i<j} \mathcal{V}\left(\vec{x}_{i}-\vec{x}_{j}\right)
$$
if there are no internal degrees of freedom. It is a standard theorem in classical mechanics that $E=H(P, Q)$ is conserved under time evolution. Let us imagine a well-potential $\mathcal{W}(\vec{x})$ as in the following picture:
Figure 2.2: Sketch of a well-potential W\mathcal{W}.
Then $\Omega_{E}=\{(P, Q) \mid H(P, Q)=E\}$ is compact for sufficiently large $\mathcal{W}_{0}$. We call $\Omega_{E}$ the energy surface. The "hyper"-area of this surface is denoted by $\left|\Omega_{E}\right|$, so
\begin{equation*}
\left|\Omega_{E}\right| \equiv \int_{\Omega_{E}} d S \equiv \int_{\Omega} \delta(H(P, Q)-E) d^{3 N} P d^{3 N} Q \tag{2.44}
\end{equation*}
Particle trajectories do not leave this surface by energy conservation. If Hamilton's equations admit other constants of motion, then it is natural to define a corresponding surface with respect to all constants of motion.
An important feature of the dynamics given by Hamilton's equations is
Liouville's Theorem: The flow map $\Phi_{t}:(P, Q) \mapsto(P(t), Q(t))$ is area-preserving.
Figure 2.3: Evolution of a phase space volume under the flow map Phi_(t)\Phi_{t}.
Proof of the theorem: Let $\left(P^{\prime}, Q^{\prime}\right)=(P(t), Q(t))$, such that $(P(0), Q(0))=(P, Q)$. Then we have
\begin{equation*}
d^{3 N} P^{\prime} d^{3 N} Q^{\prime}=J_{P, Q}(t)\, d^{3 N} P d^{3 N} Q,
\end{equation*}
and we would like to show that $J_{P, Q}(t)=1$ for all $t$. Let us write the Jacobian as $J_{P, Q}(t)=\frac{\partial\left(P^{\prime}, Q^{\prime}\right)}{\partial(P, Q)}$. Since the flow evidently satisfies $\Phi_{t+t^{\prime}}(P, Q)=\Phi_{t^{\prime}}\left(\Phi_{t}(P, Q)\right)$, the chain rule and the properties of the Jacobian imply $J_{P, Q}\left(t+t^{\prime}\right)=J_{P, Q}(t) J_{P^{\prime}, Q^{\prime}}\left(t^{\prime}\right)$. We now show that $\partial J_{P, Q} / \partial t(0)=0$. For small $t$, we can expand as follows:
\begin{aligned}
J_{P, Q}(t) & =\frac{\partial\left(P^{\prime}, Q^{\prime}\right)}{\partial(P, Q)} \\
& =\operatorname{det}\left[\left(\begin{array}{cc}
\mathbb{1}_{3 N \times 3 N} & 0 \\
0 & \mathbb{1}_{3 N \times 3 N}
\end{array}\right)+t\left(\begin{array}{cc}
-\partial_{P} \partial_{Q} H & -\partial_{Q}^{2} H \\
\partial_{P}^{2} H & \partial_{Q} \partial_{P} H
\end{array}\right)+\mathcal{O}\left(t^{2}\right)\right] \\
& =1+t \operatorname{tr}\left(\begin{array}{cc}
-\partial_{P} \partial_{Q} H & -\partial_{Q}^{2} H \\
\partial_{P}^{2} H & \partial_{Q} \partial_{P} H
\end{array}\right)+\mathcal{O}\left(t^{2}\right) \\
& =1+t(\underbrace{-\sum_{\alpha, i} \frac{\partial^{2} H}{\partial x_{i \alpha} \partial p_{i \alpha}}+\sum_{\alpha, i} \frac{\partial^{2} H}{\partial p_{i \alpha} \partial x_{i \alpha}}}_{=0})+\mathcal{O}\left(t^{2}\right) \\
& =1+\mathcal{O}\left(t^{2}\right) .
\end{aligned}
(Here we used $\operatorname{det}(I+t A)=1+t \operatorname{tr} A+\mathcal{O}\left(t^{2}\right)$ for any matrix $A$.) This implies
\begin{equation*}
\frac{\partial J_{P, Q}}{\partial t}(t)=J_{P, Q}(t)\, \frac{\partial J_{P^{\prime}, Q^{\prime}}}{\partial t^{\prime}}(0)=0 \quad \text { for all } t .
\end{equation*}
Together with $J_{P, Q}(0)=1$, this gives the result $J_{P, Q}(t)=1$ for all $t$, i.e. the flow is area-preserving.
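Liouville's theorem can also be illustrated numerically. The Python sketch below (an added example; the pendulum Hamiltonian $H=p^{2}/2+(1-\cos x)$ and all parameter values are arbitrary choices) integrates Hamilton's equations and evaluates the Jacobian of the flow map by finite differences, checking $\det J_{P,Q}(t) \approx 1$.

```python
import math

def rhs(p, x):
    # Hamilton's equations for H = p^2/2 + (1 - cos x):
    # pdot = -dH/dx = -sin x, xdot = dH/dp = p
    return -math.sin(x), p

def flow(p, x, t, dt=1e-3):
    # classical RK4 integration of the flow map Phi_t
    for _ in range(int(round(t / dt))):
        k1p, k1x = rhs(p, x)
        k2p, k2x = rhs(p + dt / 2 * k1p, x + dt / 2 * k1x)
        k3p, k3x = rhs(p + dt / 2 * k2p, x + dt / 2 * k2x)
        k4p, k4x = rhs(p + dt * k3p, x + dt * k3x)
        p += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    return p, x

# Jacobian of Phi_t at (p0, x0) via central finite differences
p0, x0, t, h = 0.3, 1.0, 5.0, 1e-5
col_p = [(a - b) / (2 * h) for a, b in zip(flow(p0 + h, x0, t), flow(p0 - h, x0, t))]
col_x = [(a - b) / (2 * h) for a, b in zip(flow(p0, x0 + h, t), flow(p0, x0 - h, t))]
detJ = col_p[0] * col_x[1] - col_p[1] * col_x[0]
print(detJ)   # close to 1, as Liouville's theorem demands
```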
If we start with a classical ensemble, described by a probability distribution $\rho(P, Q)$ on phase space, then its time evolution is defined as
\begin{equation*}
\rho_{t}(P, Q):=\rho\left(\Phi_{-t}(P, Q)\right) . \tag{2.49}
\end{equation*}
The reason for the "$-$" is as follows: the probability to be at $(P, Q)$ at time $t$ should be given by the probability for having been at $\Phi_{-t}(P, Q)$ at time $0$. Note that the time evolution of observables (corresponding to the Heisenberg picture in Quantum Mechanics) is defined oppositely:
\begin{equation*}
F_{t}(P, Q):=F\left(\Phi_{t}(P, Q)\right) .
\end{equation*}
Using Liouville's theorem, one then easily checks that $\langle F\rangle_{\rho_{t}}=\left\langle F_{t}\right\rangle_{\rho}$.
Differentiating (2.49) with respect to $t$ gives
\begin{equation*}
\frac{\partial \rho_{t}}{\partial t}=\left\{H, \rho_{t}\right\},
\end{equation*}
where $\{\cdot, \cdot\}$ denotes the Poisson bracket. An ensemble is called stationary if $\rho_{t}$ remains constant, i.e. $\partial \rho_{t} / \partial t=0$. It follows that an ensemble is stationary if and only if $\rho$ Poisson-commutes with $H$, that is $\{\rho, H\}=0$. Examples of stationary ensembles are thus the functions of $H$, e.g.
\begin{equation*}
\rho(P, Q)=\frac{1}{Z_{\beta}} e^{-\beta H(P, Q)},
\end{equation*}
where $\beta>0$ is a parameter and the normalization factor $Z_{\beta}$ ensures that $\rho$ is properly normalized, so $Z_{\beta}=\int e^{-\beta H} d^{3 N} P d^{3 N} Q$. We will come back to this ensemble below.
The flow $\Phi_{t}$ is not only area-preserving on the entire phase space, but also on the energy surface $\Omega_{E}$ (with the natural integration element understood). Such area-preserving flows under certain conditions imply that the phase space average equals the time average, cf. (2.41). This is expressed by the ergodic theorem:
Theorem: Let $(P(t), Q(t))$ be dense in $\Omega_{E}$ and $F$ continuous. Then the time average is equal to the ensemble average:
\begin{equation*}
\lim _{T \rightarrow \infty} \frac{1}{T} \int_{0}^{T} F(P(t), Q(t)) d t=\frac{1}{\left|\Omega_{E}\right|} \int_{\Omega_{E}} F(P, Q) d S \tag{2.54}
\end{equation*}
The ergodic theorem may thus be summarized as
time average = ensemble average,
where the "ensemble" is the uniform distribution on $\Omega_{E}$ given by $\rho(P, Q)=1 /\left|\Omega_{E}\right|$ (we will come back to this ensemble later in the context of "$S=k_{B} \log W$", where it is also called the "micro-canonical ensemble"). In quantum theory, there is an analogue of this phenomenon going under the name "eigenstate thermalization", which is outlined in the appendix.
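As a toy illustration of "time average = ensemble average" (an added sketch, not from the notes): for the 1d harmonic oscillator $H=(p^{2}+x^{2})/2$ in units with $m=\omega=1$, the energy surface is a circle of radius $\sqrt{2E}$ which the trajectory fills completely, so both averages of $F=x^{2}$ equal $E$.

```python
import math

E = 0.5
r = math.sqrt(2 * E)               # radius of the energy surface

# time average of x(t)^2 along the exact trajectory x(t) = r sin t
T, n = 1000.0, 200_000
dt = T / n
time_avg = sum((r * math.sin(i * dt)) ** 2 for i in range(n)) * dt / T

# ensemble average: uniform measure on the circle Omega_E
n_ang = 100_000
ens_avg = sum((r * math.sin(2 * math.pi * k / n_ang)) ** 2
              for k in range(n_ang)) / n_ang

print(time_avg, ens_avg)   # both close to E = 0.5
```

For one degree of freedom the energy surface is one-dimensional, so ergodicity is automatic here; for many particles it is exactly the hypothesis that can fail.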
The key hypothesis is that the orbit lies dense in $\Omega_{E}$ and that this surface is compact. The first is clearly not the case if there are further constants of motion, since the orbit must then lie on a submanifold of $\Omega_{E}$ corresponding to the particular values of these constants. The Kolmogorov-Arnold-Moser (KAM) theorem shows that small perturbations of systems with sufficiently many constants of motion again possess such invariant submanifolds, i.e. the ergodic theorem does not hold in such cases. Nevertheless, the ergodic theorem remains an important motivation for studying ensembles.
One puzzling consequence of Liouville's theorem is that a trajectory starting at $\left(P_{0}, Q_{0}\right)$ comes back arbitrarily close to that point, a phenomenon called Poincaré recurrence. An intuitive "proof" of this statement can be given as follows:
Figure 2.4: Sketch of the situation described in the proof of Poincaré recurrence.
Let $B_{0}$ be an $\epsilon$-neighborhood of a point $\left(P_{0}, Q_{0}\right)$. For $k \in \mathbb{N}$ define $B_{k}:=\Phi_{k}\left(B_{0}\right)$, which are neighborhoods of $\left(P_{k}, Q_{k}\right)=\Phi_{k}\left(\left(P_{0}, Q_{0}\right)\right)$. Let us assume that the statement of the theorem is wrong. This yields
$$B_{0} \cap B_{k}=\varnothing \quad \forall k \in \mathbb{N} .$$
Then it follows that
$$B_{n} \cap B_{k}=\varnothing \quad \forall n, k \in \mathbb{N}, n \neq k .$$
By Liouville's theorem, all the $B_{k}$ have the same phase space volume, so infinitely many disjoint sets of equal positive volume would have to fit inside $\Omega_{E}$. This clearly contradicts the assumption that $\Omega_{E}$ is compact (hence of finite volume), and therefore the statement of the theorem has to be true.
Historically, the recurrence argument played an important role in early discussions of the notion of irreversibility, i.e. the fact that systems generically tend to approach an equilibrium state, whereas they never seem to spontaneously leave an equilibrium state and evolve back to the (non-equilibrium) initial conditions. To explain the origin of, and the mechanisms behind, irreversibility is one of the major challenges of non-equilibrium thermodynamics, and we shall briefly come back to this point later. For the moment, we simply note that in practice the recurrence time $\tau_{\text {recurrence }}$ would be extremely large compared to the natural scales of the system, such as the equilibration time. We will verify this by investigating the dynamics of a toy model in the appendix. Here we only
give a heuristic explanation. Consider a gas of $N$ particles in a volume $V$. The volume is partitioned into sub-volumes $V_{1}, V_{2}$ of equal size. We start the system in a state where the atoms only occupy $V_{1}$. By the ergodic theorem, we estimate that the fraction of time the system spends in such a state is $\left\langle\chi_{Q \in V_{1}}\right\rangle=2^{-N}$ (for an ideal gas, since each particle independently occupies $V_{1}$ with probability $1/2$), where $\chi_{Q \in V_{1}}$ gives $1$ if all particles are in $V_{1}$, and zero otherwise. For $N=1 \mathrm{~mol}$, i.e. $N=\mathcal{O}\left(10^{23}\right)$, this fraction is astronomically small. So there is no real puzzle!
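The estimate can be made concrete in a few lines of Python (an added sketch; we take the fraction to be $2^{-N}$, one factor $1/2$ per particle of an ideal gas, with $N$ of order Avogadro's number):

```python
import math

# fraction of time with all N particles in one half of the box
N = 6.022e23
log10_fraction = -N * math.log10(2)
print(log10_fraction)   # about -1.8e23, i.e. fraction = 10^(-1.8e23)
```

A number like $10^{-1.8 \times 10^{23}}$ is vastly smaller than anything with physical meaning, which is why Poincaré recurrence is never observed in practice.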
One often has the situation that a system can be divided up into two (or more) parts which can be treated approximately as isolated, or which have some simple well-known interaction. One way to model this situation is to suppose that the phase space is a direct product $\Omega=\Omega_{A} \times \Omega_{B}$, where $A$ and $B$ label the subsystems. For instance, in an $N$-particle system, $\Omega_{A}$ could comprise the phase space coordinates of one (or more) distinguished particle(s), and $\Omega_{B}$ those of all the others. If we have a probability density (ensemble) of the total system, i.e. a function $\rho(P, Q)$ on $\Omega$, we may decide to probe it with observables of system $A$ only, i.e. with observables $F_{A}=F_{A}\left(P_{A}, Q_{A}\right)$ which are functions of the phase space coordinates of $\Omega_{A}$ only. It is then clear that for such an $F_{A}$, we can write (with $P=\left(P_{A}, P_{B}\right), Q=\left(Q_{A}, Q_{B}\right)$)
\begin{equation*}
\left\langle F_{A}\right\rangle_{\rho}=\int_{\Omega_{A}} F_{A}\left(P_{A}, Q_{A}\right)\left[\int_{\Omega_{B}} \rho\left(P_{A}, P_{B}, Q_{A}, Q_{B}\right) d P_{B} d Q_{B}\right] d P_{A} d Q_{A},
\end{equation*}
so it is natural to make the following definition:
Definition: For a phase space $\Omega=\Omega_{A} \times \Omega_{B}$ describing two subsystems $A$ and $B$ of a given system, the reduced probability distribution for $A$ is defined by
\begin{equation*}
\rho_{A}\left(P_{A}, Q_{A}\right):=\int_{\Omega_{B}} \rho\left(P_{A}, P_{B}, Q_{A}, Q_{B}\right) d P_{B} d Q_{B} .
\end{equation*}
The reduced probability distribution is that assigned to the system by an observer having access only to observables of system $A$ (and similarly for $B$). Note that $\rho_{A}$ again satisfies all the axioms of a probability density (on $\Omega_{A}$).
2.5 Ensembles in Quantum Mechanics (Statistical Operators / Density Matrices)
Quantum mechanical systems are of an intrinsically probabilistic nature, so the language of probability theory is, in this sense, not just optional but actually essential. In fact, to
say that the system is in a state $|\Psi\rangle$ really means that, if $A$ is a self-adjoint operator with spectral decomposition $A=\sum_{i} a_{i}|i\rangle\langle i|$, then the probability to measure the value $a_{i}$ is $p_{i}=\left|\langle i \mid \Psi\rangle\right|^{2}$.
Thus, if we assign the state $|\Psi\rangle$ to the system, the set of possible measuring outcomes for $A$ is the probability space $\Omega=\left\{a_{1}, a_{2}, \ldots\right\}$ with (discrete) probability distribution given by $\left\{p_{1}, p_{2}, \ldots\right\}$.
In statistical mechanics we have incomplete information about the state of a quantum mechanical system. In particular, we do not want to prejudice ourselves by ascribing a pure state $|\Psi\rangle$ to the system. Instead, we describe it by a statistical mixture, i.e. an ensemble of pure states. Suppose we believe that the system is in the state $\left|\Psi_{i}\right\rangle$ with probability $\lambda_{i}$, where, as usual, $\sum_{i} \lambda_{i}=1, \lambda_{i} \geqslant 0$. For example, before preparing the state we perform a classical random experiment to determine which state $\left|\Psi_{i}\right\rangle$ we prepare. The states $\left|\Psi_{i}\right\rangle$ should be normalized, i.e. $\left\langle\Psi_{i} \mid \Psi_{i}\right\rangle=1$, but they do not have to be orthogonal or complete. Then the expectation value $\langle A\rangle$ of an operator is defined as
\begin{equation*}
\langle A\rangle=\sum_{i} \lambda_{i}\left\langle\Psi_{i}\right| A\left|\Psi_{i}\right\rangle .
\end{equation*}
Introducing the density matrix $\rho=\sum_{i} \lambda_{i}\left|\Psi_{i}\right\rangle\left\langle\Psi_{i}\right|$, this may also be written as
\begin{equation*}
\langle A\rangle=\operatorname{tr}(\rho A) . \tag{2.60}
\end{equation*}
The density matrix has the properties $\operatorname{tr} \rho=1$, as well as $\rho^{\dagger}=\rho$. Furthermore, for any state $|\Phi\rangle$ we have
\begin{equation*}
\langle\Phi| \rho|\Phi\rangle=\sum_{i} \lambda_{i}\left|\left\langle\Phi \mid \Psi_{i}\right\rangle\right|^{2} \geqslant 0 .
\end{equation*}
The density matrix should be thought of as analogous to the classical probability distribution $\left\{p_{i}\right\}$ given by the eigenvalues $p_{i}$ of $\rho$ (which coincide with the $\lambda_{i}$ if and only if the states $\left|\Psi_{i}\right\rangle$ are orthogonal).
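A quick numerical check of these properties (an added Python/numpy sketch; the two non-orthogonal states and the weights $\lambda_{i}=1/2$ are arbitrary illustrative choices):

```python
import numpy as np

psi1 = np.array([1.0, 0.0])
psi2 = np.array([1.0, 1.0]) / np.sqrt(2)   # NOT orthogonal to psi1
lam = [0.5, 0.5]

# rho = sum_i lambda_i |Psi_i><Psi_i|
rho = sum(l * np.outer(p, p.conj()) for l, p in zip(lam, [psi1, psi2]))

assert np.isclose(np.trace(rho), 1.0)      # tr rho = 1
assert np.allclose(rho, rho.conj().T)      # rho^dagger = rho
evals = np.linalg.eigvalsh(rho)
assert (evals >= -1e-12).all()             # <Phi|rho|Phi> >= 0
# since the states are not orthogonal, the eigenvalues p_i of rho
# differ from the lambda_i
assert not np.allclose(np.sort(evals), np.sort(lam))
print(evals)
```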
In the context of quantum mechanical ensembles one can define a quantity that is closely analogous to the information entropy for ordinary probability distributions. This quantity is
\begin{equation*}
S_{\text {v.N. }}(\rho)=-\operatorname{tr}(\rho \log \rho)
\end{equation*}
and is called the von Neumann entropy associated with $\rho$.
According to the rules of quantum mechanics, the time evolution of a state is described by Schrödinger's equation:
\begin{aligned}
i \hbar \frac{d}{d t}|\Psi(t)\rangle & =H|\Psi(t)\rangle \\
\Rightarrow \quad i \hbar \frac{d}{d t} \rho(t) & =[H, \rho(t)] \equiv H \rho(t)-\rho(t) H .
\end{aligned}
Therefore, an ensemble is stationary if $[H, \rho]=0$. In particular, $\rho$ is stationary if it is of the form
\begin{equation*}
\rho=\sum_{i} f\left(E_{i}\right)\left|\Psi_{i}\right\rangle\left\langle\Psi_{i}\right|,
\end{equation*}
where $\sum_{i} f\left(E_{i}\right)=1$ and $p_{i}=f\left(E_{i}\right)>0$ (here, $E_{i}$ label the eigenvalues of the Hamiltonian $H$ and $\left|\Psi_{i}\right\rangle$ its eigenstates, i.e. $H\left|\Psi_{i}\right\rangle=E_{i}\left|\Psi_{i}\right\rangle$). The characteristic example is given by
\begin{equation*}
\rho=\frac{1}{Z_{\beta}} e^{-\beta H}=\sum_{i} \frac{e^{-\beta E_{i}}}{Z_{\beta}}\left|\Psi_{i}\right\rangle\left\langle\Psi_{i}\right|,
\end{equation*}
where $Z_{\beta}=\sum_{i} e^{-\beta E_{i}}$. More generally, if $\left\{Q_{\alpha}\right\}$ are operators commuting with $H$, then another choice is
\begin{equation*}
\rho=\frac{1}{Z} e^{-\beta\left(H-\sum_{\alpha} \mu_{\alpha} Q_{\alpha}\right)},
\end{equation*}
with suitable constants $\mu_{\alpha}$ and normalization constant $Z$.
We will come back to discuss such ensembles below in chapter 4.
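For a two-level system the canonical density matrix can be written down explicitly; the following Python sketch (an added illustration with arbitrarily chosen energies $E_{0}=0$, $E_{1}=\epsilon$ and inverse temperature $\beta$) checks its basic properties.

```python
import math

# canonical ensemble rho = e^{-beta H}/Z_beta for a two-level system;
# in the energy basis rho is diagonal with populations p_i = e^{-beta E_i}/Z
beta, eps = 2.0, 1.0
Z = 1.0 + math.exp(-beta * eps)              # Z_beta = sum_i e^{-beta E_i}
p0, p1 = 1.0 / Z, math.exp(-beta * eps) / Z

E_avg = p0 * 0.0 + p1 * eps                  # <H> = tr(rho H)
S_vN = -(p0 * math.log(p0) + p1 * math.log(p1))   # von Neumann entropy
print(p0, p1, E_avg, S_vN)
```

As expected, the lower level is more populated ($p_{0}>p_{1}$) and the populations sum to one.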
One often deals with situations in which a system is comprised of two sub-systems $A$ and $B$, described by Hilbert spaces $\mathcal{H}_{A}, \mathcal{H}_{B}$. The total Hilbert space is then $\mathcal{H}=\mathcal{H}_{A} \otimes \mathcal{H}_{B}$ ($\otimes$ is the tensor product). If $\left\{|i\rangle_{A}\right\}$ and $\left\{|j\rangle_{B}\right\}$ are orthonormal bases of $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$, an orthonormal basis of $\mathcal{H}$ is given by $\left\{|i, j\rangle=|i\rangle_{A} \otimes|j\rangle_{B}\right\}$.
Consider a (pure) state $|\Psi\rangle$ in $\mathcal{H}$, i.e. a pure state of the total system. It can be expanded as
\begin{equation*}
|\Psi\rangle=\sum_{i, j} c_{i j}|i, j\rangle, \quad \sum_{i, j}\left|c_{i j}\right|^{2}=1 .
\end{equation*}
Observables describing measurements of subsystem $A$ consist of operators of the form $\tilde{a}=a \otimes \mathbb{1}_{B}$, where $a$ is an operator on $\mathcal{H}_{A}$ and $\mathbb{1}_{B}$ is the identity operator on $\mathcal{H}_{B}$ (similarly,
an observable describing a measurement of system $B$ corresponds to $\tilde{b}=\mathbb{1}_{A} \otimes b$). For such an operator we can write:
\begin{equation*}
\langle\Psi| \tilde{a}|\Psi\rangle=\sum_{i, i^{\prime}, j} c_{i j} \bar{c}_{i^{\prime} j}\left\langle i^{\prime}\right| a|i\rangle_{A}=\operatorname{tr}_{\mathcal{H}_{A}}\left(\rho_{A} a\right), \quad \text { where } \quad \rho_{A}:=\sum_{i, i^{\prime}, j} c_{i j} \bar{c}_{i^{\prime} j}|i\rangle\left\langle i^{\prime}\right| . \tag{2.64}
\end{equation*}
The operator $\rho_{A}$ on $\mathcal{H}_{A}$ by definition satisfies $\rho_{A}^{\dagger}=\rho_{A}$ and, by (2.64), it satisfies $\operatorname{tr} \rho_{A}=1$. It is also not hard to see that $\langle\Phi| \rho_{A}|\Phi\rangle \geqslant 0$. Thus, $\rho_{A}$ defines a density matrix on the Hilbert space $\mathcal{H}_{A}$ of system $A$. One similarly defines $\rho_{B}$ on $\mathcal{H}_{B}$.
Definition: The operator $\rho_{A}$ is called the reduced density matrix of subsystem $A$, and $\rho_{B}$ that of subsystem $B$.
The reduced density matrix reflects the limited information of an observer only having access to a subsystem. The quantity
\begin{equation*}
S_{A}:=S_{\text {v.N. }}\left(\rho_{A}\right)
\end{equation*}
is called the entanglement entropy of subsystem $A$. One shows that $S_{\text {v.N. }}\left(\rho_{A}\right)=S_{\text {v.N. }}\left(\rho_{B}\right)$, so it does not matter which of the two subsystems we use to define it.
Example: Let $\mathcal{H}_{A}=\mathbb{C}^{2}=\mathcal{H}_{B}$ with orthonormal basis $\{|\uparrow\rangle,|\downarrow\rangle\}$ for either system $A$ or $B$. The orthonormal basis of $\mathcal{H}$ is then given by $\{|\uparrow \uparrow\rangle,|\uparrow \downarrow\rangle,|\downarrow \uparrow\rangle,|\downarrow \downarrow\rangle\}$.
(i) Let $|\Psi\rangle=|\uparrow \downarrow\rangle$. Then $\rho_{A}=|\uparrow\rangle\langle\uparrow|$ is a pure state, and the entanglement entropy vanishes, $S_{A}=0$.
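The reduced density matrix and entanglement entropy follow mechanically from the coefficient matrix $c_{ij}$, since $(\rho_{A})_{ii^{\prime}}=\sum_{j} c_{ij}\bar{c}_{i^{\prime}j}$. The Python sketch below treats the product state $|\uparrow\downarrow\rangle$ of example (i) and, for contrast, the Bell state $(|\uparrow\uparrow\rangle+|\downarrow\downarrow\rangle)/\sqrt{2}$; the latter is an added illustration, not part of the example in the text.

```python
import numpy as np

def entropy(rho):
    # von Neumann entropy -tr(rho log rho) via the eigenvalues of rho
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())

product = np.array([[0.0, 1.0], [0.0, 0.0]])           # c_ij for |up down>
bell = np.array([[1.0, 0.0], [0.0, 1.0]]) / np.sqrt(2)

for name, c in (("product", product), ("Bell", bell)):
    rho_A = c @ c.conj().T          # trace over subsystem B
    rho_B = c.T @ c.conj()          # trace over subsystem A
    print(name, entropy(rho_A), entropy(rho_B))   # S(rho_A) = S(rho_B)

S_prod = entropy(product @ product.conj().T)   # 0: unentangled
S_bell = entropy(bell @ bell.conj().T)         # log 2: maximal for qubits
```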
3 Time-evolving ensembles
3.1 Boltzmann Equation in Classical Mechanics
In order to understand the dynamical properties of systems in statistical mechanics, one has to study non-stationary (i.e. time-dependent) ensembles. A key question, already brought up earlier, is whether systems initially described by a non-stationary ensemble will eventually approach an equilibrium ensemble. An important quantitative tool for understanding the approach to equilibrium (e.g. in the case of dilute media or weakly coupled systems) is the Boltzmann equation, which we discuss here in the case of classical mechanics.
Let $\rho$ be an ensemble and $\rho_{t}(P, Q)=\rho\left(\Phi_{-t}(P, Q)\right)$ the time-evolving ensemble defined in the previous chapter. We would like to learn something about this function $\rho_{t}$. It is of course in general impossible to determine it exactly, because we cannot find the particle trajectories $(P(t), Q(t))$ for a system with a large number $N$ of particles. Also, even if we could, the full $\rho_{t}(P, Q)$ as a function of $6 N$ phase space coordinates is in general far more information than we really would like to have in practice. Instead, we often only want to know the time evolution of a relatively small subset of observables. One such observable is the 1-particle density $f_{1}$, which is defined by
\begin{equation*}
f_{1}\left(\vec{p}_{1}, \vec{x}_{1} ; t\right)=N \int \rho_{t}(P, Q) \prod_{i=2}^{N} d^{3} p_{i} d^{3} x_{i} .
\end{equation*}
Analogously, we define the $s$-particle densities $f_{s}$ for $2 \leqslant s \leqslant N$. Note that the $s$-particle densities are, up to normalization, nothing but the reduced probability distributions obtained from $\rho_{t}$ if we divide the total system into a system $A$ consisting of $s$ fixed particles and a system $B$ consisting of the remaining ones; for instance, for $s=1$ we have $f_{1}=N \rho_{A}$.
Here, $H_{s}$ denotes the Hamiltonian of the subsystem consisting of the first $s$ particles,
\begin{equation*}
H_{s}=\sum_{i=1}^{s}\left(\frac{\vec{p}_{i}^{\,2}}{2 m}+\mathcal{W}\left(\vec{x}_{i}\right)\right)+\sum_{1 \leqslant i<j \leqslant s} \mathcal{V}\left(\vec{x}_{i}-\vec{x}_{j}\right),
\end{equation*}
so that in particular $H_{N}=H$. One finds the relations
\begin{equation*}
\underbrace{\frac{\partial f_{s}}{\partial t}-\left\{H_{s}, f_{s}\right\}}_{\text {streaming term }}=\underbrace{\sum_{i=1}^{s} \int d^{3} p_{s+1} d^{3} x_{s+1} \frac{\partial \mathcal{V}\left(\vec{x}_{i}-\vec{x}_{s+1}\right)}{\partial \vec{x}_{i}} \cdot \frac{\partial f_{s+1}}{\partial \vec{p}_{i}}}_{\text {collision term }} \tag{3.5}
\end{equation*}
This system of equations is called the BBGKY hierarchy (for Bogoliubov-Born-Green-Kirkwood-Yvon hierarchy). The first equation $(s=1)$ is given by
\begin{equation*}
\frac{\partial f_{1}}{\partial t}-\left\{H_{1}, f_{1}\right\}=\int d^{3} p_{2} d^{3} x_{2} \frac{\partial \mathcal{V}\left(\vec{x}_{1}-\vec{x}_{2}\right)}{\partial \vec{x}_{1}} \cdot \frac{\partial f_{2}}{\partial \vec{p}_{1}} . \tag{3.6}
\end{equation*}
An obvious feature of the BBGKY hierarchy is that the equation for $f_{1}$ involves $f_{2}$, that for $f_{2}$ involves $f_{3}$, etc. In this sense, the equations for the individual $f_{s}$ are not closed. To get a manageable system, some approximations/truncations are necessary.
The simplest truncation, which leads to the Boltzmann equation, is to assume uncorrelated densities, i.e.
\begin{equation*}
f_{2}\left(\vec{p}_{1}, \vec{x}_{1}, \vec{p}_{2}, \vec{x}_{2} ; t\right) \approx f_{1}\left(\vec{p}_{1}, \vec{x}_{1} ; t\right) f_{1}\left(\vec{p}_{2}, \vec{x}_{2} ; t\right),
\end{equation*}
where the different proportionality factors relating the reduced probability distributions and the $f_{s}$ have been taken into account. With this assumption, the equation (3.6) closes. In general, this assumption is inconsistent with the dynamical equation for $f_{2}$, i.e., it is not preserved under time evolution. However, in a certain limit in which $N \rightarrow \infty$ and the interaction range $d \rightarrow 0$, one can prove the consistency of this truncation. To discuss conditions under which the assumption is approximately valid, one introduces several time scales which have to be sufficiently separated:
(i) Let $v$ be the typical velocity of gas particles (e.g. $v \approx 100 \frac{\mathrm{m}}{\mathrm{s}}$ at room temperature and $1 \mathrm{~atm}$), and let $L$ be the scale over which $\mathcal{W}(\vec{x})$ varies, i.e. the box size. Then $\tau_{v}:=\frac{L}{v}$ is the extrinsic time scale (e.g. $\tau_{v} \approx 10^{-5} \mathrm{~s}$ for $L \approx 1 \mathrm{~mm}$).
(ii) If $d$ is the range of the interaction $\mathcal{V}(\vec{x})$ (e.g. $d \approx 10^{-10} \mathrm{~m}$), then $\tau_{c}:=\frac{d}{v}$ is the collision time (e.g. $\tau_{c} \approx 10^{-12} \mathrm{~s}$). We should have $\tau_{c} \ll \tau_{v}$.
(iii) We can also define the mean free time $\tau_{x} \approx \frac{\tau_{c}}{n d^{3}} \approx \frac{1}{n v d^{2}}$, $n=\frac{N}{V}$, which is the average time between subsequent collisions. We have $\tau_{x} \approx 10^{-8} \mathrm{~s} \gg \tau_{c}$ in our example.
For the above truncation to be valid, these scales should be well separated: $\tau_{v} \gg \tau_{x} \gg \tau_{c}$.
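Plugging in the example numbers quoted above confirms this ordering (an added sketch; the number density $n \approx 2.7\times 10^{25}\,\mathrm{m}^{-3}$, corresponding to an ideal gas at roughly ambient conditions, is an assumed value, as are $v$, $L$, $d$):

```python
v = 100.0          # typical particle speed, m/s
L = 1e-3           # box size, m
d = 1e-10          # interaction range, m
n = 2.7e25         # number density, 1/m^3

tau_v = L / v               # extrinsic time scale  ~ 1e-5  s
tau_c = d / v               # collision time        ~ 1e-12 s
tau_x = 1 / (n * v * d**2)  # mean free time        ~ 4e-8  s
print(tau_v, tau_c, tau_x)
assert tau_c < tau_x < tau_v   # required separation of scales
```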
Since, by the assumptions on the scales, a particle moves freely between two encounters, we can approximate the interaction of two particles as a scattering process, described by the differential cross section $\frac{d \sigma}{d \Omega}$, as indicated in the following sketch:
Figure 3.1: Classical scattering of particles in the "fixed target frame".
Here, $d \sigma=b\, d b\, d \phi$ is the infinitesimal area element shaded in grey, through which a flux of particles passes, and $d \Omega$ is the infinitesimal area element on the unit sphere, indicating into which direction these particles are scattered. In other words, $\left|\frac{d \sigma}{d \Omega}\right|$ is the Jacobian between $\vec{b}$ and $\hat{\Omega}=(\theta, \phi)$. Hence, if $F=\frac{\#\text { of incoming particles }}{\text { area } \cdot \text { time }}$ is the incoming flux of particles, then $\left|\frac{d \sigma}{d \Omega}(\Omega)\right| F d \Omega$ is the number rate of particles being scattered into the area element $d \Omega$.
If, instead of the fixed target frame indicated in the figure, one considers scattering in the center of mass frame and denotes by $\vec{p}=\vec{p}_{1}-\vec{p}_{2}$ and $\vec{p}^{\,\prime}=\vec{p}_{1}^{\,\prime}-\vec{p}_{2}^{\,\prime}$ the relative momenta before and after the scattering (where we assume elastic collisions, i.e., $|\vec{p}|=\left|\vec{p}^{\,\prime}\right|$), one arrives at the Boltzmann equation
\begin{equation*}
\left[\frac{\partial}{\partial t}+\frac{\vec{p}_{1}}{m} \cdot \frac{\partial}{\partial \vec{x}_{1}}-\frac{\partial \mathcal{W}\left(\vec{x}_{1}\right)}{\partial \vec{x}_{1}} \cdot \frac{\partial}{\partial \vec{p}_{1}}\right] f_{1}=\int d^{3} p_{2} \int d^{2} \Omega\left|\frac{d \sigma}{d \Omega}\right| \frac{\left|\vec{p}_{1}-\vec{p}_{2}\right|}{m}\left[f_{1}\left(t, \vec{p}_{1}^{\,\prime}, \vec{x}_{1}\right) f_{1}\left(t, \vec{p}_{2}^{\,\prime}, \vec{x}_{1}\right)-f_{1}\left(t, \vec{p}_{1}, \vec{x}_{1}\right) f_{1}\left(t, \vec{p}_{2}, \vec{x}_{1}\right)\right], \tag{3.8}
\end{equation*}
where $\Omega=(\theta, \phi)$ is the solid angle between $\vec{p}$ and $\vec{p}^{\,\prime}$, and $d^{2} \Omega=\sin \theta\, d \theta\, d \phi$. The integral expression on the right side of the Boltzmann equation (3.8) is called the collision operator and is often denoted as $C\left[f_{1}\right]\left(t, \vec{p}_{1}, \vec{x}_{1}\right)$. It represents the change in the 1-particle distribution due to collisions of particles. The two terms in the brackets $[\ldots]$ under the integral in (3.8) can be viewed as taking into account that particles with momentum $\vec{p}_{1}$ can be created or be lost, respectively, when momentum is transferred from other particles in a collision process.
It is important to know whether $f_{1}(\vec{p}, \vec{x} ; t)$ is stationary, i.e. time-independent. Intuitively, this should be the case when the collision term $C\left[f_{1}\right]$ vanishes. This in turn should happen if
\begin{equation*}
f_{1}\left(\vec{p}_{1}^{\,\prime}\right) f_{1}\left(\vec{p}_{2}^{\,\prime}\right)=f_{1}\left(\vec{p}_{1}\right) f_{1}\left(\vec{p}_{2}\right) \tag{3.9}
\end{equation*}
for all momenta compatible with the conservation laws.
As we will now see, one can derive the functional form of the 1-particle density from this condition. Taking the logarithm on both sides of (3.9) gives, with $F=\log f_{1}$,
\begin{equation*}
F\left(\vec{p}_{1}^{\,\prime}\right)+F\left(\vec{p}_{2}^{\,\prime}\right)=F\left(\vec{p}_{1}\right)+F\left(\vec{p}_{2}\right),
\end{equation*}
whence $F$ must be an additively conserved quantity in elastic collisions, i.e. a linear combination of the collision invariants $\frac{\vec{p}^{\,2}}{2 m}$, $\vec{p}$, and $1$. It follows, after renaming constants, that
\begin{equation*}
f_{1}(\vec{p})=c\, e^{-\frac{\beta}{2 m}\left(\vec{p}-\vec{p}_{0}\right)^{2}} . \tag{3.11}
\end{equation*}
In principle c,beta, vec(p)_(0)c, \beta, \vec{p}_{0} could be functions of vec(x)\vec{x} and tt at this stage, but then the left hand side of the Boltzmann equation does not vanish in general. So (3.11) represents the general stationary homogeneous solution to the Boltzmann equation. It is known as the Maxwell-Boltzmann distribution. The proper normalization is, from intf_(1)d^(3)pd^(3)x=\int f_{1} d^{3} p d^{3} x= N,
The mean kinetic energy is found to be (:( vec(p)^(2))/(2m):)=(3)/(2beta)\left\langle\frac{\vec{p}^{2}}{2 m}\right\rangle=\frac{3}{2 \beta}, so beta=(1)/(K_(B)T)\beta=\frac{1}{\mathrm{~K}_{\mathrm{B}} T} is identified with the inverse temperature of the gas.
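As a quick numerical cross-check of this equipartition result (an illustrative sketch, not part of the derivation; units with $m=1$, $k_B=1$, and all function names are ours), one can sample momenta from (3.11), whose Cartesian components are independent Gaussians of variance $m/\beta$, and verify $\langle\vec p^{\,2}/2m\rangle = 3/(2\beta)$:

```python
import math
import random

def sample_maxwell_boltzmann(beta, m, n_samples, seed=0):
    """Draw momenta from f_1 ~ exp(-beta p^2 / 2m): each Cartesian
    component is Gaussian with variance m/beta."""
    rng = random.Random(seed)
    sigma = math.sqrt(m / beta)
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(n_samples)]

def mean_kinetic_energy(momenta, m):
    return sum((px * px + py * py + pz * pz) / (2 * m)
               for px, py, pz in momenta) / len(momenta)

beta, m = 2.0, 1.0
momenta = sample_maxwell_boltzmann(beta, m, 200_000)
# Equipartition: <p^2/2m> = 3/(2*beta), i.e. 0.75 for beta = 2
print(mean_kinetic_energy(momenta, m))
```

With $\beta=2$ the sample average comes out close to $3/(2\beta)=0.75$, with Monte Carlo error of order $10^{-3}$ at this sample size.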
This interpretation of $\beta$ is reinforced by considering a gas of $N$ particles confined to a box of volume $V$. The pressure of the gas results from a force $K$ acting on a wall element of area $A$, as depicted in the figure below. The force is equal to:
\begin{align*}
K &= \frac{1}{\Delta t} \int d^{3}p \cdot \overbrace{\#\binom{\text{particles impacting } A \text{ during } \Delta t}{\text{with momenta between } \vec{p} \text{ and } \vec{p}+d\vec{p}}}^{\left(f_{1}(\vec{p})\, d^{3}p\right) \cdot\left(A v_{x} \Delta t\right)} \times \overbrace{\binom{\text{momentum transfer}}{\text{in } x\text{-direction}}}^{2 p_{x}} \\
&= \frac{1}{\Delta t} \int_{0}^{\infty} d p_{x} \int_{-\infty}^{\infty} d p_{y} \int_{-\infty}^{\infty} d p_{z}\, f_{1}(\vec{p})\left(A v_{x} \Delta t\right) \cdot\left(2 p_{x}\right).
\end{align*}
Note that the first integral runs over only half of the range of $p_{x}$, because only particles moving towards the wall will hit it.
Together with (3.11) it follows that the pressure $P$ is given by
Comparing with the equation of state for an ideal gas, $PV=N k_{B} T$, we get $\beta=\frac{1}{k_{B}T}$.
Figure 3.2: Pressure on the walls due to the impact of particles.
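The wall-pressure integral above can also be evaluated numerically (a sketch with illustrative parameters, $k_B=1$; the integrals over $p_y, p_z$ are carried out analytically, leaving the 1D Maxwell-Boltzmann marginal in $p_x$):

```python
import math

def pressure_on_wall(n_density, beta, m, p_max=None, steps=200_000):
    """Evaluate P = ∫_{p_x>0} (2 p_x)(p_x/m) f_1(p_x) dp_x by the midpoint rule,
    with the 1D marginal f_1(p_x) = n (beta/(2 pi m))^{1/2} exp(-beta p_x^2/2m)."""
    if p_max is None:
        p_max = 12 * math.sqrt(m / beta)   # tails are negligible beyond this
    norm = n_density * math.sqrt(beta / (2 * math.pi * m))
    dp = p_max / steps
    total = 0.0
    for i in range(steps):
        px = (i + 0.5) * dp
        flux_density = norm * math.exp(-beta * px * px / (2 * m)) * (px / m)
        total += flux_density * (2 * px) * dp   # momentum transfer 2 p_x per impact
    return total

# Ideal-gas law check: P should equal n k_B T = n/beta
n, beta, m = 3.0, 2.0, 1.0
print(pressure_on_wall(n, beta, m))  # ~ n/beta = 1.5
```

The factor $2p_x$ is the momentum transfer per impact and $p_x/m$ the velocity entering the flux, exactly as in the derivation of $K$ above.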
It is noteworthy that, in the presence of external forces, other stationary solutions (but with a non-vanishing collision term) should also be possible. One only has to think of the following situation, representing a stationary air flow across a wing:
Figure 3.3: Sketch of the air-flow across a wing.
In this case we have to deal with a much more complicated $f_{1}$, not equal to the Maxwell-Boltzmann distribution. As the example of an air flow suggests, the Boltzmann equation is also closely related to other equations for fluids, such as the Euler or Navier-Stokes equations, which can be seen to arise as approximations of the Boltzmann equation.
The Boltzmann equation can easily be generalized to a gas consisting of several species $\alpha, \beta, \ldots$ which interact via the 2-body potentials $\mathcal{V}_{\alpha\beta}\left(\vec{x}^{(\alpha)}-\vec{x}^{(\beta)}\right)$. As before, we can define the 1-particle density $f_{1}^{(\alpha)}(\vec{p}, \vec{x}, t)$ for each species. The same derivation leading to the Boltzmann equation now gives the system of equations
This system of equations is of great importance in practice, e.g. for the evolution of the abundances of different particle species in the early universe. In this case
are homogeneous distributions, and the external force $\vec{F}$ on the left hand side of equations (3.14) is related to the expansion of the universe.
i.e. we have the same temperature $T$ and average velocity $\vec{v}_{0}$ for all $\alpha$. In the context of the early universe it is essential to study deviations from equilibrium in order to explain the observed abundances.
In contrast to the original system of equations (Hamilton's equations or the BBGKY hierarchy), the Boltzmann equation is irreversible. This can be seen, for example, by introducing the function
a result which is known as the $\mathbf{H}$-theorem. We just showed that equality holds if and only if $f_{1}$ is given by the Maxwell-Boltzmann distribution. Thus, we conclude that $h(t)$ is an increasing function as long as $f_{1}$ is not equal to the Maxwell-Boltzmann distribution. In particular, the evolution of $f_{1}$, as described by the Boltzmann equation, is irreversible. Since the Boltzmann equation is only an approximation to the full BBGKY hierarchy, which is reversible, there is no mathematical inconsistency. However, it is not clear, a priori, at which stage of the derivation the irreversibility has been allowed to enter. Looking at the approximations (a) and (b) made above, it is clear that the assumption that the 2-particle correlations $f_{2}$ factorize, as in (b), cannot be exactly true, since the outgoing momenta of the particles are correlated. Although this correlation is extremely small after several collisions, it is not exactly zero. Our decision to neglect it can be viewed as one reason for the emergence of irreversibility on a macroscopic scale.
The close analogy between the definition of the Boltzmann $H$-function and the information entropy $S_{\text{inf}}$, as defined in (2.15), together with the monotonicity of $h(t)$, suggests that $h$ should represent some sort of entropy of the system. The $H$-theorem is then viewed as a "derivation" of the $2^{\text{nd}}$ law of thermodynamics (see Chapter 6). However, this point of view is not entirely correct, since $h(t)$ only depends on the 1-particle density $f_{1}$ and not on the higher particle densities $f_{s}$, which in general should also contribute to the entropy. It is not clear how an entropy with sensible properties is to be defined in a completely general situation, in particular when the above approximations (a) and (b) are not justified.
3.2 Boltzmann Equation, Approach to Equilibrium in Quantum Mechanics
A version of the Boltzmann equation and the $H$-theorem can also be derived in the quantum mechanical context. The main difference from the classical case is a somewhat modified collision term: the classical differential cross section is replaced by the quantum mechanical differential cross section (in the Born approximation), and the combination
is somewhat changed in order to accommodate Bose-Einstein resp. Fermi-Dirac statistics (see section 5.1 for an explanation of these terms). This then leads to the corresponding equilibrium distributions in the stationary case. Starting from the quantum Boltzmann equation, one can again derive a corresponding $H$-theorem. Rather than explaining the details, we give a simplified "derivation" of the $H$-theorem, which will also allow us to introduce a simple-minded but very useful approximation of the dynamics of probabilities, discussed in more detail in the Appendix.
The basic idea is to ascribe the approach to equilibrium to an incomplete knowledge of the true dynamics due to perturbations. The true Hamiltonian is written as
where $H_{1}$ is a tiny perturbation over which we do not have control. For simplicity, we assume that the spectrum of the unperturbed Hamiltonian $H_{0}$ is discrete, and we write $H_{0}|n\rangle=E_{n}|n\rangle$. For a typical eigenstate $|n\rangle$ we then have
Let $p_{n}$ be the probability that the system is in the state $|n\rangle$, i.e. we ascribe to the system the density matrix $\rho=\sum_{n} p_{n}|n\rangle\langle n|$. For generic perturbations $H_{1}$, this ensemble is not stationary with respect to the true dynamics because $[\rho, H] \neq 0$. Consequently, the von Neumann entropy of $\rho(t)=e^{itH}\rho\, e^{-itH}$ depends on time. We define this to be the $H$-function
Next, we approximate the dynamics by imagining that our perturbation $H_{1}$ will cause jumps from state $|n\rangle$ to state $|m\rangle$, leading to time-dependent probabilities as described by the master equation${ }^{1}$
where $T_{nm}$ is the transition rate for going from state $|n\rangle$ to state $|m\rangle$. Thus, the approximated, time-dependent density matrix is $\rho(t)=\sum_{n} p_{n}(t)|n\rangle\langle n|$, with $p_{n}(t)$ obeying the master equation. By the latter,
The latter inequality follows from the fact that both terms in parentheses $[\ldots]$ have the same sign, just as in the proof of the classical $H$-theorem (problem B.11). Note that if we had defined $h(t)$ as the von Neumann entropy, using a density matrix $\rho$ that is diagonal in an eigenbasis of the full Hamiltonian $H$ (rather than the unperturbed Hamiltonian), then we would have obtained $[\rho, H]=0$ and consequently $\rho(t)=\rho$, i.e. a constant $h(t)$. Thus, in this approach, the $H$-theorem is viewed as a consequence of our partial ignorance about the system, which prompts us to ascribe to it a density matrix $\rho(t)$ which is diagonal with respect to $H_{0}$. In order to justify working with a density matrix $\rho$ that is diagonal with respect to $H_{0}$ (and therefore also in order to explain the approach to equilibrium), one may argue very roughly as follows: suppose that we start with a system in a state $|\Psi\rangle=\sum_{n} \gamma_{n}|n\rangle$ that is not an eigenstate of the true Hamiltonian $H$. Let us write
$$
|\Psi(t)\rangle=\sum_{n} \gamma_{n}(t)\, e^{-iE_{n}t/\hbar}|n\rangle \equiv e^{-iHt}|\Psi\rangle
$$
for the time evolved state. If there is no perturbation, i.e. $H_{1}=0$, we get
\begin{equation*}
\langle n|\rho(T)|m\rangle=\frac{1}{T} \int_{0}^{T} \gamma_{n}(t)\, \overline{\gamma_{m}(t)}\, e^{\frac{it\left(E_{n}-E_{m}\right)}{\hbar}}\, dt \tag{3.25}
\end{equation*}
For $T \rightarrow \infty$ the oscillating phase factor $e^{it(E_{n}-E_{m})/\hbar}$ is expected to cause the integral to vanish for $E_{n} \neq E_{m}$ (destructive interference). Thus, we expect that $\langle n|\rho(T)|m\rangle \xrightarrow{T \rightarrow \infty} p_{n}\, \delta_{n,m}$. It follows that
where the density matrix $\rho$ is $\rho=\sum_{n} p_{n}|n\rangle\langle n|$. Since $[\rho, H_{0}]=0$, the ensemble described by $\rho$ is stationary with respect to $H_{0}$. The plausibility of this "derivation" rests on the basic assumption that, while $\langle n|H_{1}|n\rangle \ll E_{n}$, it can be large compared to $\Delta E_{n}=E_{n}-E_{n+1}=\mathcal{O}\left(e^{-N}\right)$ (where $N$ is the particle number) and can therefore induce transitions causing the system to equilibrate.
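The master-equation mechanism behind this $H$-theorem can be illustrated numerically (our own toy sketch, with $k_B=1$ and randomly chosen symmetric rates $T_{nm}=T_{mn}$, as assumed in the inequality above): the probabilities relax to equidistribution and $h(t)=-\sum_n p_n\log p_n$ grows monotonically.

```python
import math
import random

def evolve_master_equation(p, T, dt, n_steps):
    """Euler-integrate dp_n/dt = sum_m (T_mn p_m - T_nm p_n) for rate matrix T."""
    n = len(p)
    for _ in range(n_steps):
        dp = [sum(T[m][k] * p[m] - T[k][m] * p[k] for m in range(n))
              for k in range(n)]
        p = [pk + dt * dpk for pk, dpk in zip(p, dp)]
    return p

def h_function(p):
    """h = -sum p log p: entropy of the diagonal ensemble (k_B = 1)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

rng = random.Random(1)
n = 5
T = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        T[i][j] = T[j][i] = rng.uniform(0.1, 1.0)   # symmetric rates

p = [0.9, 0.05, 0.03, 0.01, 0.01]
hs = []
for _ in range(50):
    hs.append(h_function(p))
    p = evolve_master_equation(p, T, dt=0.01, n_steps=10)
# h(t) increases monotonically toward log(n) (equidistribution)
print(all(h2 >= h1 - 1e-9 for h1, h2 in zip(hs, hs[1:])), hs[-1], math.log(n))
```

For symmetric rates the stationary distribution is the uniform one, so $h$ saturates at $\log n$, in line with the micro-canonical ensemble of the next chapter.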
Chapter 4
Equilibrium Ensembles
4.1 Generalities
In the probabilistic description of a system with a large number of constituents one considers probability distributions (= ensembles) $\rho(P, Q)$ on phase space, rather than individual trajectories. In the previous chapter, we gave various arguments leading to the expectation that the time evolution of an ensemble will generally lead to an equilibrium ensemble. The study of such ensembles is the subject of equilibrium statistical mechanics. Standard equilibrium ensembles are:
(a) Micro-canonical ensemble (section 4.2).
(b) Canonical ensemble (section 4.3).
(c) Grand canonical (Gibbs) ensemble (section 4.4).
4.2 Micro-Canonical Ensemble
4.2.1 Micro-Canonical Ensemble in Classical Mechanics
Recall that in classical mechanics the phase space $\Omega$ of a system consisting of $N$ particles without internal degrees of freedom is given by
where $H$ denotes the Hamiltonian of the system. In the micro-canonical ensemble each point of the energy surface $\Omega_{E}$ is considered to be equally likely. In order to write down the corresponding ensemble, i.e. the density function $\rho(P, Q)$, we define the invariant volume $\left|\Omega_{E}\right|$ of $\Omega_{E}$ by
\begin{equation*}
\left|\Omega_{E}\right|:=\lim_{\Delta E \rightarrow 0} \frac{1}{\Delta E} \int_{E-\Delta E \leqslant H(P, Q) \leqslant E} d^{3N}P\, d^{3N}Q, \tag{4.3}
\end{equation*}
which can also be expressed as
\begin{equation*}
\left|\Omega_{E}\right|=\frac{\partial \Phi(E)}{\partial E}, \quad \text{with} \quad \Phi(E)=\int_{H(P, Q) \leqslant E} d^{3N}P\, d^{3N}Q. \tag{4.4}
\end{equation*}
Thus, we can write the probability density of the micro-canonical ensemble as
To avoid subtleties coming from the $\delta$-function at sharp energy, one sometimes replaces this expression by
\begin{equation*}
\rho(P, Q)=\frac{1}{|\{E-\Delta E \leqslant H(P, Q) \leqslant E\}|} \cdot \begin{cases}1, & \text{if } H(P, Q) \in(E-\Delta E, E) \\ 0, & \text{if } H(P, Q) \notin(E-\Delta E, E)\end{cases} \tag{4.6}
\end{equation*}
Strictly speaking, this depends not only on $E$ but also on $\Delta E$. But in typical cases $\left|\Omega_{E}\right|$ depends exponentially on $E$, so there is practically no difference between these two expressions for $\rho(P, Q)$ as long as $\Delta E \lesssim E$. We may alternatively write the second definition as:
As we have already said, in typical cases, changing $S(E)$ in this definition to $k_{B} \log \left|\Omega_{E}\right|$ will not significantly change the result. It is not hard to see that we may equivalently write in either case
i.e. Boltzmann's definition of entropy coincides with the definition of the information entropy (2.15) of the microcanonical ensemble $\rho$. As defined, $S$ is a function of $E$ and implicitly of $V, N$, since these enter the definition of the Hamiltonian and of the phase space. Sometimes one also specifies other constants of motion or parameters of the system besides $E$ when defining $S$. Denoting these constants collectively as $\{I_{\alpha}\}$, one defines $W$ accordingly with respect to $E$ and $\{I_{\alpha}\}$ by replacing the energy surface with:
In this case $S=S\left(E,\{I_{\alpha}\}, N, V\right)$ becomes a function of more variables.
Example:
The ideal gas of $N$ particles in a box has the Hamiltonian $H=\sum_{i=1}^{N}\left(\frac{\vec{p}_{i}^{\,2}}{2m}+\mathcal{W}(\vec{x}_{i})\right)$, where the external potential $\mathcal{W}$ represents the walls of a box of volume $V$. For a box with hard walls we take, for example,
\begin{equation*}
E=\frac{3}{2} N k_{B} T. \tag{4.19}
\end{equation*}
This formula states that for the ideal gas we have the equidistribution law
\begin{equation*}
\frac{\text{average energy}}{\text{degree of freedom}}=\frac{1}{2} k_{B} T. \tag{4.20}
\end{equation*}
One can similarly verify that the abstract definition of $P$ in (4.17) above gives
\begin{equation*}
PV=N k_{B} T, \tag{4.21}
\end{equation*}
which is the familiar "equation of state" for an ideal gas.
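Both relations follow from Boltzmann's formula by differentiation, which can be checked numerically (an illustrative sketch; we keep only the $E$- and $V$-dependent part of $\log W \propto \log\left(V^{N} E^{3N/2}\right)$ for the classical ideal gas, set $k_B=1$, and evaluate $1/T=\partial S/\partial E$ and $P/T=\partial S/\partial V$ by finite differences):

```python
import math

def entropy_ideal_gas(E, V, N, kB=1.0):
    """S = kB log W for the classical ideal gas, up to E- and V-independent
    constants: W ~ V^N E^{3N/2}."""
    return kB * (N * math.log(V) + 1.5 * N * math.log(E))

def partial(f, x, h=1e-6):
    """Central finite difference df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

N, kB = 1000, 1.0
E, V = 1500.0, 2.0
inv_T = partial(lambda e: entropy_ideal_gas(e, V, N), E)      # 1/T = dS/dE
P_over_T = partial(lambda v: entropy_ideal_gas(E, v, N), V)   # P/T = dS/dV
T = 1.0 / inv_T
print(E, 1.5 * N * kB * T)           # equidistribution: E = (3/2) N kB T
print(P_over_T * T * V, N * kB * T)  # equation of state: P V = N kB T
```

The $E$- and $V$-independent constants dropped from $S$ do not affect either derivative, which is why this truncation suffices here.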
In order to further motivate the second relation in (4.17), we consider a system composed of a piston applied to an enclosed gas chamber:
Figure 4.1: Gas in a piston maintained at pressure $P$.
Here, we obviously have $PV=mgz$. The total energy is obtained as
\begin{equation*}
H_{\text{total}}=H_{\text{gas}}(P, Q)+H_{\text{piston}}(p, z)=H_{\text{gas}}(P, Q)+\underbrace{p^{2}/2m}_{\text{kin. energy of piston}}+\underbrace{mgz}_{\text{pot. energy of piston}}, \tag{4.22}
\end{equation*}
where $m$ is the mass of the piston (in a moment, we will let $m \rightarrow \infty$ and at the same time $g \rightarrow 0$). Next, we calculate the reduced probability distribution of the piston, assuming that the total (piston-gas) system is in the micro-canonical ensemble:
using that the force $F=mg$ is also equal to $F=PA$, where $A$ is the area of the piston (so that $V=Az$ is the volume occupied by the gas, hence $mgz=PV$). Now we let $m \rightarrow \infty$, keeping the force $F=mg$ acting on the piston constant. Then the dependence on $p$ clearly drops out. Let us calculate for which $z$ the probability $\rho_{\text{piston}}(z) \equiv \rho_{\text{piston}}(z, p)$ of finding the piston at position $z$ is maximized. This happens when
The quantity $E_{\text{total}}=E_{\text{gas}}+PV$ is also called the enthalpy.
It is instructive to compare the definition of the temperature in (4.17) with the parameter $\beta$ that arose in the Maxwell-Boltzmann distribution (3.11), which we also interpreted as a temperature there. We first ask the following question: What is the probability of finding particle number 1 with momentum lying between $\vec{p}_{1}$ and $\vec{p}_{1}+d\vec{p}_{1}$? The answer is $W(\vec{p}_{1})\, d^{3}p_{1}$, where $W(\vec{p}_{1})$ is given by
This is of course nothing but the reduced probability distribution where system A consists of the phase space coordinate $\vec{p}_{1}$, and system B of all other coordinates $\left(\vec{x}_{1}, \vec{p}_{2}, \vec{x}_{2}, \vec{p}_{3}, \vec{x}_{3}, \ldots\right)$. We wish to calculate this for the ideal gas. To this end we introduce the Hamiltonian $H^{\prime}$ and the kinetic energy $E^{\prime}$ of the remaining atoms:
$$
\frac{\left(\frac{3N}{2}+a\right)!}{\left(\frac{3N}{2}+b\right)!} \approx\left(\frac{3N}{2}\right)^{a-b}, \quad \text{for } a, b \ll \frac{3N}{2},
$$
we see that for a sufficiently large number of particles (e.g. $N=\mathcal{O}\left(10^{23}\right)$)
\begin{equation*}
W\left(\vec{p}_{1}\right) \approx\left(\frac{3N}{4\pi mE}\right)^{\frac{3}{2}}\left(1-\frac{\vec{p}_{1}^{\,2}}{2mE}\right)^{\frac{3N}{2}-\frac{5}{2}}, \tag{4.29}
\end{equation*}
which for large $N$ tends to the Maxwell-Boltzmann factor $e^{-\beta \vec{p}_{1}^{\,2}/2m}$ with $\beta=\frac{3N}{2E}$; recalling $E=\frac{3}{2}N k_{B}T$, this confirms our interpretation of $\beta$ as $\beta=\frac{1}{k_{B}T}$.
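The approach of (4.29) to the Maxwell-Boltzmann form can be made explicit numerically (a sketch; both functions are left unnormalized, and we fix $\beta=1$, $k_B=m=1$, so that $E=\frac{3}{2}N/\beta$):

```python
import math

def w_microcanonical(p, m, E, N):
    """Momentum marginal (4.29) of the microcanonical ensemble, unnormalized."""
    return (1 - p * p / (2 * m * E)) ** (1.5 * N - 2.5)

def w_maxwell(p, m, beta):
    """Maxwell-Boltzmann factor exp(-beta p^2 / 2m), unnormalized."""
    return math.exp(-beta * p * p / (2 * m))

m, beta, p = 1.0, 1.0, 2.0
for N in (100, 10_000, 1_000_000):
    E = 1.5 * N / beta   # fix the temperature so that E = (3/2) N kB T
    print(N, w_microcanonical(p, m, E, N), w_maxwell(p, m, beta))
```

This is the statistical-mechanics analogue of $(1-x/n)^{n}\to e^{-x}$: the huge power of a factor very close to 1 produces the exponential.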
We can also confirm the interpretation of $\beta$ by the following consideration: take two initially isolated systems and put them in thermal contact. The resulting joint probability distribution is given by
For typical systems, the integrand is very sharply peaked at the maximum $\left(E_{1}^{*}, E_{2}^{*}\right)$, as depicted in the following figure:
Figure 4.2: The joint number of states for two systems in thermal contact.
At the maximum we have $\frac{\partial S_{1}}{\partial E}\left(E_{1}^{*}\right)=\frac{\partial S_{2}}{\partial E}\left(E_{2}^{*}\right)$, from which we get the relation:
\begin{equation*}
\frac{1}{T_{1}}=\frac{1}{T_{2}}=\frac{1}{T} \quad \text{(uniformity of temperature)}. \tag{4.34}
\end{equation*}
Since one expects the function to be very sharply peaked at $\left(E_{1}^{*}, E_{2}^{*}\right)$, the integral in (4.33) can be approximated by
which means that the entropy is (approximately) additive. Note that from the condition that $\left(E_{1}^{*}, E_{2}^{*}\right)$ be a genuine maximum (not just a stationary point), one gets the important stability condition
implying $\frac{\partial^{2} S}{\partial E^{2}} \leqslant 0$ when applied to two copies of the same system. We can apply the same considerations if $S$ depends on additional parameters, such as other constants of motion. Denoting the parameters collectively as $X=\left(X_{1}, \ldots, X_{n}\right)$, the stability condition becomes
for any choice of displacements $v_{i}$ (negativity of the Hessian matrix). Thus, in this case, $S$ is a concave function of its arguments. Otherwise, if the Hessian matrix has a positive eigenvalue, e.g. in the $i$-th coordinate direction, then the corresponding displacement $v_{i}$ will drive the system to an inhomogeneous state, i.e. one where the quantity $X_{i}$ takes different values in different parts of the system (different phases).
4.2.2 Microcanonical Ensemble in Quantum Mechanics
Let $H$ be the Hamiltonian of a system with eigenstates $|n\rangle$ and eigenvalues $E_{n}$, i.e. $H|n\rangle=E_{n}|n\rangle$, and consider the density matrix
\begin{equation*}
\rho=\frac{1}{W} \sum_{n: E-\Delta E \leqslant E_{n} \leqslant E}|n\rangle\langle n|, \tag{4.37}
\end{equation*}
where the normalization constant $W$ is chosen such that $\operatorname{tr} \rho=1$. The density matrix $\rho$ is analogous to the distribution function $\rho(P, Q)$ of the classical micro-canonical ensemble, eq. (4.6), since it effectively amounts to giving equal probability to all eigenstates with energies lying between $E-\Delta E$ and $E$. By analogy with the classical case we get
\begin{equation*}
W=\text{number of states between } E-\Delta E \text{ and } E, \tag{4.38}
\end{equation*}
and we define the corresponding entropy $S(E)$ again by
Since $W(E)$ is equal to the number of states with energies lying between $E-\Delta E$ and $E$, it also depends, strictly speaking, on $\Delta E$. But for $\Delta E \lesssim E$ and large $N$, this dependence can be neglected. Note that
\begin{align*}
S_{\mathrm{v.N.}}(\rho) &= -k_{B} \operatorname{tr}(\rho \log \rho)=-k_{B} \sum_{n: E-\Delta E \leqslant E_{n} \leqslant E} \frac{1}{W} \log \frac{1}{W} \\
&= k_{B} \log W \cdot \frac{1}{W} \sum_{n: E-\Delta E \leqslant E_{n} \leqslant E} 1 \\
&= k_{B} \log W,
\end{align*}
so $S=k_{B} \log W$ is equal to the von Neumann entropy of the statistical operator $\rho$ defined in (4.37) above.
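A direct numerical check (our own toy example with an equally spaced spectrum, $k_B=1$) confirms that the microcanonical density matrix, being diagonal with equal weights $1/W$, has von Neumann entropy exactly $k_{B}\log W$:

```python
import math

def von_neumann_entropy(probs, kB=1.0):
    """S = -kB sum p log p for a density matrix diagonal in some basis."""
    return -kB * sum(p * math.log(p) for p in probs if p > 0)

def microcanonical_probs(energies, E, dE):
    """Equal weight 1/W on eigenstates with E - dE <= E_n <= E, zero elsewhere."""
    inside = [E - dE <= e <= E for e in energies]
    W = sum(inside)
    return [1.0 / W if flag else 0.0 for flag in inside], W

energies = [0.5 * n for n in range(100)]   # toy spectrum E_n = n/2
probs, W = microcanonical_probs(energies, E=20.0, dE=5.0)
print(von_neumann_entropy(probs), math.log(W))   # both equal log W
```

Since $\rho$ is diagonal, its eigenvalues are just the probabilities, so the trace formula reduces to the sum computed here.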
Example: Free particle in a box. We consider a free particle ($N=1$) in a box of side lengths $\left(L_{x}, L_{y}, L_{z}\right)$. The Hamiltonian is given by $H=\frac{1}{2m}\left(p_{x}^{2}+p_{y}^{2}+p_{z}^{2}\right)$. We impose boundary conditions such that the normalized wave function $\Psi$ vanishes at the boundary of the box. This yields the eigenstates
where $k_{x}=\frac{\pi n_{x}}{L_{x}}, \ldots$, with $n_{x}=1,2,3,\ldots$
The corresponding energy eigenvalues are given by
since $p_{x}=\frac{\hbar}{i} \frac{\partial}{\partial x}$, etc. Recall that $W$ was defined by
$$
W=\text{number of states } \left|n_{x}, n_{y}, n_{z}\right\rangle \text{ with } E-\Delta E \leqslant E_{n} \leqslant E.
$$
The following figure gives a sketch of this situation (with $k_{z}=0$):
Figure 4.3: Number of states with energies lying between $E-\Delta E$ and $E$.
In the continuum approximation we have (recalling that $\hbar=\frac{h}{2\pi}$):
\begin{align*}
W &= \sum_{E-\Delta E \leqslant E_{n} \leqslant E} 1 \approx \int_{\left\{E-\Delta E \leqslant E_{n} \leqslant E\right\}} d^{3}n \\
&= \int_{\left\{E-\Delta E \leqslant \frac{\hbar^{2}}{2m}\left(k_{x}^{2}+k_{y}^{2}+k_{z}^{2}\right) \leqslant E\right\}} \frac{L_{x} L_{y} L_{z}}{\pi^{3}}\, d^{3}k \\
&= \left(\frac{2m}{\hbar^{2}}\right)^{\frac{3}{2}} \frac{V}{\pi^{3}} \int_{E-\Delta E}^{E} \frac{1}{2} E^{\prime\frac{1}{2}}\, dE^{\prime} \int_{1/8 \text{ of } S^{2}} d^{2}\Omega \\
&= \left.\frac{4\pi}{3} \frac{V}{(2\pi)^{3}}\left(\frac{2mE}{\hbar^{2}}\right)^{\frac{3}{2}}\right|_{E-\Delta E}^{E} \tag{4.42}
\end{align*}
If we compute WW according to the definition in classical mechanics, we would get
\begin{align*}
W &= \int_{\{E-\Delta E \leqslant H \leqslant E\}} d^{3}p\, d^{3}x = V \int_{\left\{E-\Delta E \leqslant \frac{\vec{p}^{\,2}}{2m} \leqslant E\right\}} d^{3}p \\
&= V(2m)^{\frac{3}{2}} \int_{E-\Delta E}^{E} \frac{1}{2} E^{\prime\frac{1}{2}}\, dE^{\prime} \int_{S^{2}} d^{2}\Omega \\
&= \left.\frac{4\pi}{3} V(2mE)^{\frac{3}{2}}\right|_{E-\Delta E}^{E}
\end{align*}
This is just $h^{3}$ times the quantum mechanical result. For the case of $N$ particles, this suggests the following relation${ }^{1}$:
This can be understood intuitively by recalling the uncertainty relation $\Delta p\, \Delta x \gtrsim h$, together with $p \sim \frac{\pi n \hbar}{V^{1/3}}$, $n \in \mathbb{N}$.
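The continuum approximation (4.42) can be tested against an exact count of lattice points (an illustrative sketch with $\hbar=m=1$; agreement holds up to surface corrections, which are small when many states lie in the shell):

```python
import math

def count_states(E, dE, L, m=1.0, hbar=1.0):
    """Exact count of states |n_x, n_y, n_z>, n_i >= 1, with
    E - dE <= (hbar^2 pi^2 / 2 m L^2)(n_x^2 + n_y^2 + n_z^2) <= E."""
    unit = (hbar * math.pi / L) ** 2 / (2 * m)
    n_max = int(math.sqrt(E / unit)) + 2
    count = 0
    for nx in range(1, n_max):
        for ny in range(1, n_max):
            for nz in range(1, n_max):
                e = unit * (nx * nx + ny * ny + nz * nz)
                if E - dE <= e <= E:
                    count += 1
    return count

def continuum_W(E, dE, L, m=1.0, hbar=1.0):
    """Continuum approximation (4.42): (4 pi/3) V/(2 pi)^3 (2 m E/hbar^2)^{3/2}
    evaluated between E - dE and E, with V = L^3."""
    V = L ** 3
    def phi(e):
        return (4 * math.pi / 3) * V / (2 * math.pi) ** 3 \
            * (2 * m * e / hbar ** 2) ** 1.5
    return phi(E) - phi(E - dE)

L, E, dE = 1.0, 8000.0, 800.0   # many states in the shell -> good agreement
print(count_states(E, dE, L), continuum_W(E, dE, L))
```

The restriction $n_i \geqslant 1$ is what produces the $1/8$ octant of $S^2$ in the derivation above.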
4.2.3 Mixing entropy of the ideal gas
A puzzle concerning the definition of entropy in the micro-canonical ensemble (e.g. for an ideal gas) is revealed if we consider the following situation of two chambers, each of which is filled with an ideal gas:
Figure 4.4: Two gases separated by a removable wall.
The total volume is $V=V_{1}+V_{2}$, the total particle number $N=N_{1}+N_{2}$, and the total energy $E=E_{1}+E_{2}$. Both gases are at the same temperature $T$. Using the expression (4.16) for the classical ideal gas, the entropies $S_{i}\left(N_{i}, V_{i}, E_{i}\right)$ are calculated as
with $c_{i}=\frac{N_{i}}{N}$ and $v_{i}=\frac{V_{i}}{V}$. This holds also for an arbitrary number of components and raises the following paradox: if both gases are identical and have the same density, $\frac{N_{1}}{V_{1}}=\frac{N_{2}}{V_{2}}$, then from a macroscopic viewpoint clearly "nothing happens" as the wall is removed. Yet $\Delta S \neq 0$. The resolution of this paradox is that the particles have been treated as distinguishable, i.e. the states
have been counted as microscopically different. However, if both gases are the same, they ought to be treated as indistinguishable. This change results in a different definition of $W$ in the two cases. Namely, depending on the case considered, the correct definition of $W$ is:
\begin{equation*}
W\left(E, V,\left\{N_{i}\right\}\right):= \begin{cases}\left|\Omega\left(E, V,\left\{N_{i}\right\}\right)\right|, & \text{if distinguishable} \\ \frac{1}{\prod_{i} N_{i}!}\left|\Omega\left(E, V,\left\{N_{i}\right\}\right)\right|, & \text{if indistinguishable}\end{cases} \tag{4.49}
\end{equation*}
where $N_{i}$ is the number of particles of species $i$. Thus, the second definition is the physically correct one in our case. With this change (which in turn results in a different definition of the entropy $S$), the mixing entropy of two identical gases is now $\Delta S=0$. In quantum mechanics the symmetry factor $\frac{1}{N!}$ in $W^{\text{qm}}$ (for each species of indistinguishable particles) is automatically included due to the Bose/Fermi alternative, which we shall discuss later, leading to an automatic resolution of the paradox.
The non-zero mixing entropy of two identical gases is seen to be unphysical also at the classical level, because the entropy should be an extensive quantity. Indeed, the arguments of the previous subsection suggest that for $V_{1}=V_{2}=\frac{1}{2}V$ and $N_{1}=N_{2}=\frac{1}{2}N$ we have
\begin{equation*}
S(E, N, V)=N \cdot \sigma(\epsilon, n) \tag{4.52}
\end{equation*}
for some function $\sigma$ of two variables, where $\epsilon=\frac{E}{N}$ is the average energy per particle and $n=\frac{N}{V}$ is the particle density. Hence $S$ is an extensive quantity, i.e. $S$ is proportional to $N$. A non-zero mixing entropy would contradict this extensivity property of $S$.
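The role of the $1/N!$ factor in (4.49) can be made quantitative (a sketch keeping only the $V$- and $N$-dependent part of $\log W$ at fixed temperature, with $k_B=1$): without it, removing the wall between two identical half-systems produces a spurious mixing entropy $N\log 2$; with it, $\Delta S$ vanishes per particle.

```python
import math

def entropy_volume_part(N, V, indistinguishable, kB=1.0):
    """V- and N-dependent part of S = kB log W for an ideal gas at fixed
    temperature: W ~ V^N, divided by N! for indistinguishable particles."""
    S = N * math.log(V)
    if indistinguishable:
        S -= math.lgamma(N + 1)    # lgamma(N+1) = log N!
    return kB * S

def mixing_entropy(N, V, indistinguishable):
    """Delta S when the wall between two equal chambers (N/2, V/2 each) is removed."""
    before = 2 * entropy_volume_part(N // 2, V / 2, indistinguishable)
    after = entropy_volume_part(N, V, indistinguishable)
    return after - before

N, V = 10 ** 6, 1.0
print(mixing_entropy(N, V, indistinguishable=False))  # ~ N log 2: spurious
print(mixing_entropy(N, V, indistinguishable=True))   # ~ 0 per particle
```

The indistinguishable case leaves only a sub-extensive $\mathcal{O}(\log N)$ residue from Stirling corrections, consistent with (4.52).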
4.3 Canonical Ensemble
4.3.1 Canonical Ensemble in Quantum Mechanics
We consider a system (system A) in thermal contact with an (infinitely large) heat reservoir (system B):
Figure 4.5: A small system in contact with a large heat reservoir.
The overall energy $E=E_{A}+E_{B}$ of the combined system is fixed, as are the particle numbers $N_{A}, N_{B}$ of the subsystems. We think of $N_{B}$ as much larger than $N_{A}$; in fact, we shall let $N_{B} \rightarrow \infty$ at the end of our derivation. We accordingly describe the total Hilbert space of the system by a tensor product, $\mathcal{H}=\mathcal{H}_{A} \otimes \mathcal{H}_{B}$. The total Hamiltonian of the combined system is
\begin{equation*}
H=\underbrace{H_{A}}_{\text{system A}}+\underbrace{H_{B}}_{\text{system B}}+\underbrace{H_{AB}}_{\text{interaction (neglected)}}, \tag{4.53}
\end{equation*}
where the interaction term is needed so that the subsystems can exchange energy. Its precise form is not needed, as we shall assume that the interaction strength is arbitrarily small. The Hamiltonians $H_{A}$ and $H_{B}$ of the subsystems A and B act on the Hilbert spaces $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$, and we choose bases so that:
Since $E$ is conserved, the quantum mechanical statistical operator of the combined system is given by the micro-canonical ensemble with density matrix
\begin{equation*}
\rho=\frac{1}{W} \sum_{\substack{n, m: \\ E-\Delta E \leqslant E_{n}^{(A)}+E_{m}^{(B)} \leqslant E}}|n, m\rangle\langle n, m|. \tag{4.54}
\end{equation*}
The reduced density matrix for sub system A is calculated as
rho_(A)=(1)/(W)sum_(n) obrace((sum_(m:E-E_(n)^((A))-Delta E <= E_(m)^((A)) <= E-E_(n)^((A)))1))^(=W_(B)(E-E_(n)^((A))))|n:)_(AA)(:n|.\rho_{A}=\frac{1}{W} \sum_{n} \overbrace{\left(\sum_{m: E-E_{n}^{(A)}-\Delta E \leqslant E_{m}^{(A)} \leqslant E-E_{n}^{(A)}} 1\right)}^{=W_{B}\left(E-E_{n}^{(A)}\right)}|n\rangle_{A A}\langle n| .
Now, using the extensiveness of the entropy $S_{B}$ of system B we find (with $n_{B}=N_{B}/V_{B}$
the particle density and $\sigma_{B}$ the entropy per particle of system B)
Thus, using $\beta=\frac{1}{k_{B} T}$ and $\frac{1}{T}=\frac{\partial S}{\partial E}$, we have for an infinite reservoir
Here we have dropped the subscripts "A" referring to our subsystem, since we can at this point forget about the role of the reservoir B (so $H=H_{A}$, $V=V_{A}$ etc. in this formula). This finally leads to the statistical operator of the canonical ensemble:
In particular, the only quantity characterizing the reservoir that enters the formula is the temperature $T$.
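The key step above can be sketched as follows (a standard reconstruction using the extensivity of $S_{B}$ and $S_{B}=k_{B}\log W_{B}$; notation as in the text):

```latex
W_{B}\left(E-E_{n}^{(A)}\right)
  = e^{S_{B}(E-E_{n}^{(A)})/k_{B}}
  \approx e^{S_{B}(E)/k_{B}}\,
    e^{-\frac{1}{k_{B}}\frac{\partial S_{B}}{\partial E}\, E_{n}^{(A)}}
  = e^{S_{B}(E)/k_{B}}\, e^{-\beta E_{n}^{(A)}},
```

so that, after normalization, $\rho_{A} \propto \sum_{n} e^{-\beta E_{n}^{(A)}} |n\rangle_{A}\,{}_{A}\langle n|$: only the temperature of the reservoir survives in the limit.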
4.3.2 Canonical Ensemble in Classical Mechanics
In the classical case we can make similar considerations as in the quantum mechanical case. Consider the same situation as above. The phase space coordinates of the combined system are divided up as
$$(P, Q)=(\underbrace{P_{A}, Q_{A}}_{\text{system A}}, \underbrace{P_{B}, Q_{B}}_{\text{system B}}) .$$
$H_{AB}$ accounts for the interaction between the particles from both systems and is neglected in the following. By analogy with the quantum mechanical case we get a reduced probability distribution $\rho_{A}$ for subsystem A:
It is then shown precisely as in the quantum mechanical case that the reduced probability distribution $\rho \equiv \rho_{A}$ for system A is given by (for an infinitely large system B):
where $P=P_{A}$, $Q=Q_{A}$, $H=H_{A}$ in this formula. The classical canonical partition function $Z=Z(\beta, N, V)$ for $N$ indistinguishable particles is conventionally fixed by $\left(h^{3N} N!\right)^{-1} \int \rho \, d^{3N}P \, d^{3N}Q = 1$. For an external square well potential ($H(P, Q)=\sum_{i=1}^{3N} \frac{P_{i}^{2}}{2m}+\mathcal{V}_{N}(Q)$) confining the system to a box of volume $V$ this leads to
The quantity $\lambda:=\frac{h}{\sqrt{2\pi m k_{B} T}}$ is sometimes called the "thermal de Broglie wavelength". As a rule of thumb, quantum effects start being significant if $\lambda$ exceeds the typical length scales of the system, such as the mean free path or the system size. Using this definition, we can write
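As a numerical illustration (a minimal sketch; the helium mass and the temperatures are illustrative values, not taken from the text), one can evaluate $\lambda$ and compare it with the interatomic spacing:

```python
import math

# Physical constants (SI units)
h = 6.62607015e-34     # Planck constant, J s
kB = 1.380649e-23      # Boltzmann constant, J/K

def thermal_wavelength(m, T):
    """Thermal de Broglie wavelength lambda = h / sqrt(2 pi m kB T)."""
    return h / math.sqrt(2 * math.pi * m * kB * T)

m_He = 6.6465e-27  # mass of a helium-4 atom, kg (illustrative system)

# At room temperature lambda is far below the interatomic spacing of a
# dilute gas (~3e-9 m at atmospheric pressure): classical regime.
print(thermal_wavelength(m_He, 300.0))  # ~5.0e-11 m
# lambda grows like T^(-1/2), so at ~1 K quantum effects become relevant.
print(thermal_wavelength(m_He, 1.0))    # ~8.7e-10 m
```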
Of course, this form of the partition function applies to classical, not quantum, systems. The unconventional factor of $h^{3N}$ is nevertheless put in by analogy with the quantum mechanical case: one imagines that the "unit" of phase space for $N$ particles (i.e. the phase space measure) is given by $d^{3N}P \, d^{3N}Q /\left(N! h^{3N}\right)$, inspired by the uncertainty principle $\Delta Q \Delta P \sim h$; see e.g. our discussion of the atom in a cube for why the normalized classical partition function then approximates the quantum partition function. The factor $N!$ is motivated by the fact that we want to treat the particles as indistinguishable. A permuted phase space configuration should therefore be viewed as equivalent to the unpermuted one, and since there are $N!$ permutations, the factor $1/N!$ compensates the corresponding overcounting (here we implicitly assume that $\mathcal{V}_{N}$ is symmetric under permutations). For the discussion of the $N!$ factor, see also our discussion of the mixing entropy. In practice, these factors often do not play a major role because the quantities most directly related to thermodynamics are derivatives of
\begin{equation*}
F:=-\beta^{-1} \log Z(\beta, N, V) \tag{4.64}
\end{equation*}
for instance $P=-\partial F /\left.\partial V\right|_{T, N}$; see chapter 6.5 for a detailed discussion of such relations. $F$ is also called the free energy.
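As a quick consistency check, inserting the ideal gas partition function $Z = V^{N}/(N!\lambda^{3N})$ from above into (4.64) reproduces the ideal gas law:

```latex
F = -k_{B}T \log Z
  = -k_{B}T \left( N \log V - \log N! - 3N \log \lambda \right),
\qquad
P = -\left.\frac{\partial F}{\partial V}\right|_{T,N}
  = \frac{N k_{B} T}{V}.
```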
Example:
One may use the formula (4.61) to obtain the barometric formula for the average particle density at a position $\vec{x}$ in a given external potential. In this case the Hamiltonian $H$ is given by
$$H=\sum_{i=1}^{N} \frac{\vec{p}_{i}^{\,2}}{2 m}+\sum_{i=1}^{N} \underbrace{\mathcal{W}\left(\vec{x}_{i}\right)}_{\substack{\text{external potential,} \\ \text{no interaction} \\ \text{between the particles}}},$$
To double-check with our intuition we provide an alternative derivation of this formula: let $P(\vec{x})$ be the pressure at $\vec{x}$ and $\vec{F}(\vec{x})=-\vec{\nabla} \mathcal{W}(\vec{x})$ the force acting on one particle. For the average force density $\vec{f}(\vec{x})$ in equilibrium we thus obtain
The function $A$ should be chosen such that the integrand falls off sufficiently rapidly. For $A(P, Q)=p_{i \alpha}$ and $A(P, Q)=x_{i \alpha}$, respectively, we find
The first of these equations is called the equipartition or equidistribution law.
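The equipartition statement $\langle p_{i\alpha}^{2}/m \rangle = k_{B}T$ can be checked by direct sampling, since each momentum component is Gaussian in the canonical ensemble (a minimal sketch in units with $k_{B}=1$; the parameter values are illustrative):

```python
import random

# Units with kB = 1; illustrative (assumed) parameters.
m, T = 1.0, 2.0
random.seed(0)

# In the canonical ensemble each momentum component is Gaussian with
# variance m*kB*T, so <p^2/m> should equal kB*T per degree of freedom.
n = 200_000
samples = [random.gauss(0.0, (m * T) ** 0.5) for _ in range(n)]
mean_p2_over_m = sum(p * p for p in samples) / (n * m)

print(mean_p2_over_m)  # close to kB*T = 2.0
```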
We split up the potential $\mathcal{V}$ into a part coming from the interactions of the particles and
a part describing an external potential, i.e.
Writing $\vec{x}_{kl} \equiv \vec{x}_{k}-\vec{x}_{l}$ for the relative distance between the $k$-th and the $l$-th particle, we find by a lengthy calculation:
where we assumed that the potential $\mathcal{W}$ is constant within the volume, so that the pressure is also constant. According to the equipartition law we have $\sum_{i \alpha}\left\langle x_{i \alpha} \frac{\partial \mathcal{V}}{\partial x_{i \alpha}}\right\rangle = 3 N k_{B} T$ and therefore obtain the virial law for classical systems
\begin{equation*}
P V=N k_{B} T-\underbrace{\frac{1}{6} \sum_{k \neq l}\left\langle\vec{x}_{k l} \frac{\partial \mathcal{V}}{\partial \vec{x}_{k l}}\right\rangle}_{=0 \text{ for ideal gas}} . \tag{4.83}
\end{equation*}
Thus, interactions tend to increase PP when they are repulsive, and tend to decrease PP when they are attractive. This is of course consistent with our intuition.
A well-known application of the virial law is the following example:
Example: estimation of the mass of distant galaxies:
Figure 4.6: Distribution and velocity of stars in a galaxy.
We use the relations (4.80) we found above,
$$\left\langle\frac{\vec{p}_{1}^{\,2}}{m_{1}}\right\rangle=\left\langle\frac{\partial \mathcal{V}}{\partial \vec{x}_{1}} \vec{x}_{1}\right\rangle=3 k_{B} T$$
assuming that the stars in the outer region have reached thermal equilibrium, so that they can be described by the canonical ensemble. We put $\vec{v}=\vec{p}_{1}/m_{1}$, $v=|\vec{v}|$ and $R=|\vec{x}_{1}|$, and assume that $\left\langle\vec{v}^{\,2}\right\rangle \approx \langle v\rangle^{2}$ as well as
\begin{equation*}
\left\langle\frac{\partial \mathcal{V}}{\partial \vec{x}_{1}} \vec{x}_{1}\right\rangle=m_{1} G \sum_{j \neq 1}\left\langle\frac{m_{j}}{\left|\vec{x}_{1}-\vec{x}_{j}\right|}\right\rangle \approx m_{1} M G\left\langle\frac{1}{R}\right\rangle \approx m_{1} M G \frac{1}{\langle R\rangle} \tag{4.84}
\end{equation*}
supposing that the potential felt by star 1 is dominated by the Newton potential created by the core of the galaxy containing most of the mass $M \approx \sum_{j} m_{j}$. Under these approximations, we conclude that
This relation is useful for estimating $M$ because $\langle R\rangle$ and $\langle v\rangle$ can be measured or estimated. Typically $\langle v\rangle=\mathcal{O}\!\left(10^{2}\,\frac{\mathrm{km}}{\mathrm{s}}\right)$.
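Combining the two relations gives the estimate $M \approx \langle v\rangle^{2}\langle R\rangle/G$. A minimal numerical sketch (the velocity and radius below are assumed, illustrative values):

```python
# Virial estimate M ~ <v>^2 <R> / G with illustrative (assumed) numbers.
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_sun = 1.989e30       # solar mass, kg
kpc = 3.086e19         # kiloparsec in meters

v = 2.0e5              # assumed mean stellar speed, 200 km/s
R = 10 * kpc           # assumed mean orbital radius, 10 kpc

M = v**2 * R / G
print(M / M_sun)       # ~9e10, i.e. on the order of 1e11 solar masses
```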
Continuing with the general discussion, if the potential attains a minimum at $Q=Q_{0}$ we have $\frac{\partial \mathcal{V}}{\partial x_{i \alpha}}\left(Q_{0}\right)=0$, as sketched in the following figure:
Figure 4.7: Sketch of a potential $\mathcal{V}$ of a lattice with a minimum at $Q_{0}$.
Setting $\mathcal{V}\left(Q_{0}\right) \equiv \mathcal{V}_{0}$, we can Taylor expand around $Q_{0}$:
where $\Delta Q=Q-Q_{0}$. In this approximation ($|\Delta Q| \ll 1$, i.e. for small oscillations around the minimum) we have, setting the zero point energy $\mathcal{V}_{0}=0$,
\begin{align*}
\sum_{i, \alpha}\left\langle x_{i \alpha} \frac{\partial \mathcal{V}}{\partial x_{i \alpha}}\right\rangle & \approx 2\langle\mathcal{V}\rangle=\sum_{i, \alpha} k_{B} T=3 N k_{B} T \tag{4.87}\\
\sum_{i, \alpha}\left\langle\frac{p_{i \alpha}^{2}}{m_{i}}\right\rangle & =2\left\langle\sum_{i} \frac{\vec{p}_{i}^{\,2}}{2 m_{i}}\right\rangle=3 N k_{B} T \tag{4.88}
\end{align*}
It follows that the mean energy $\langle H\rangle$ of the system is given by
\begin{equation*}
\langle H\rangle=3 N k_{B} T . \tag{4.89}
\end{equation*}
This relation is called the Dulong-Petit law. For real lattice systems there are deviations from this law at low temperature $T$ through quantum effects and at high temperature $T$ through non-linear effects, which are not captured by the approximation (4.86).
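Numerically, the Dulong-Petit law predicts a constant molar heat capacity $C = \mathrm{d}\langle H\rangle/\mathrm{d}T = 3Nk_{B}$, i.e. $3R$ per mole of lattice sites (a one-line check using standard constants):

```python
# Dulong-Petit: <H> = 3 N kB T implies a constant heat capacity
# C = d<H>/dT = 3 N kB, i.e. 3R per mole of lattice sites.
kB = 1.380649e-23      # Boltzmann constant, J/K
NA = 6.02214076e23     # Avogadro constant, 1/mol

C_molar = 3 * NA * kB
print(C_molar)  # ~24.94 J/(mol K), the classical Dulong-Petit value
```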
Our discussion for classical systems can be adapted to the quantum mechanical context, but there are some changes. Consider the canonical ensemble with statistical operator $\rho=\frac{1}{Z} e^{-\beta H}$. From this it immediately follows that
By using $[a, bc]=[a, b] c+b[a, c]$ and $\vec{p}_{j}=\frac{\hbar}{i} \frac{\partial}{\partial \vec{x}_{j}}$ we obtain
Applying now the same arguments as in the classical case to evaluate the left hand side leads to
\begin{equation*}
P V=\underbrace{\frac{2}{3}\left\langle H_{\mathrm{kin}}\right\rangle}_{\substack{\neq N k_{B} T \text{ for ideal gas} \\ \Rightarrow \text{ quantum effects!}}}-\frac{1}{6} \sum_{k \neq l}\left\langle\vec{x}_{k l} \frac{\partial \mathcal{V}}{\partial \vec{x}_{k l}}\right\rangle . \tag{4.94}
\end{equation*}
For an ideal gas the contribution from the potential is by definition absent, but the contribution from the kinetic piece does not give the same formula as in the classical case, as we will discuss in more detail in chapter 5. Thus, even for an ideal quantum gas ($\mathcal{V}=0$), the classical formula $PV=N k_{B} T$ receives corrections!
4.4 Grand Canonical Ensemble
This ensemble describes the following physical situation: a small system (system A) is coupled to a large reservoir (system B). Energy and particle exchange between A and B are possible.
Figure 4.8: A small system coupled to a large heat and particle reservoir.
The treatment of this ensemble is similar to that of the canonical ensemble. For definiteness, we consider the quantum mechanical case. We have $E=E_{A}+E_{B}$ for the total energy, and $N=N_{A}+N_{B}$ for the total particle number. The total system A+B
is described by the microcanonical ensemble, since $E$ and $N$ are conserved. The Hilbert space for the total system is again a tensor product, and the statistical operator $\rho$ of the total system is accordingly given by
\begin{equation*}
\rho=\frac{1}{W} \cdot \sum_{\substack{E-\Delta E \leqslant E_{n}^{(A)}+E_{m}^{(B)} \leqslant E \\ N_{n}^{(A)}+N_{m}^{(B)}=N}}|n, m\rangle\langle n, m|, \tag{4.95}
\end{equation*}
where the total Hamiltonian of the combined system is
\begin{equation*}
H=\underbrace{H_{A}}_{\text{system A}}+\underbrace{H_{B}}_{\text{system B}}+\underbrace{H_{AB}}_{\text{interaction (neglected)}} . \tag{4.96}
\end{equation*}
We are using notations similar to the canonical ensemble, such as $|n, m\rangle=|n\rangle_{A}|m\rangle_{B}$ and
Note that the particle numbers of the individual subsystems fluctuate, so we describe them by number operators $\hat{N}_{A}, \hat{N}_{B}$ acting on $\mathcal{H}_{A}, \mathcal{H}_{B}$.
The statistical operator for system A is described by the reduced density matrix $\rho_{A}$ for this system, namely by
for some function $\sigma$ of two variables. Now we let $V_{B} \rightarrow \infty$, keeping $\frac{E}{V_{B}}$ and $\frac{N}{V_{B}}$ constant. Arguing precisely as in the case of the canonical ensemble, and using now also the definition of the chemical potential in (4.17), we find
for $N_{B}, V_{B} \rightarrow \infty$. By the same arguments as for the temperature in the canonical ensemble, the chemical potential $\mu$ is the same for both systems in equilibrium. We
obtain for the reduced density matrix of system A:
Thus, only the quantities $\beta$ and $\mu$ characterizing the reservoir (system B) have an influence on system A. Dropping from now on the reference to "A", we can write the statistical operator of the grand canonical ensemble as
where $H$ and $\hat{N}$ are now operators. The constant $Y=Y(\mu, \beta, V)$ is determined by $\operatorname{tr} \rho=1$ and is called the grand canonical partition function. Explicitly:
The grand canonical partition function can be related to the canonical partition function. The Hilbert space of our system (i.e., system A) can be decomposed
Then $[H, \hat{N}]=0$ ($\hat{N}$ has eigenvalue $N$ on $\mathcal{H}_{N}$), and $H$ and $\hat{N}$ are simultaneously diagonalized, with (assuming a discrete spectrum of $H$)
\begin{equation*}
H|\alpha, N\rangle=E_{\alpha, N}|\alpha, N\rangle \quad \text{and} \quad \hat{N}|\alpha, N\rangle=N|\alpha, N\rangle \tag{4.106}
\end{equation*}
which is the desired relation between the canonical and the grand canonical partition function.
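For the classical ideal gas, with $Z(\beta, N, V)=V^{N}/(N!\lambda^{3N})$ as above, the sum over $N$ can be carried out in closed form (a short consistency check):

```latex
Y(\beta,\mu,V) \;=\; \sum_{N=0}^{\infty} z^{N} Z(\beta,N,V)
  \;=\; \sum_{N=0}^{\infty} \frac{1}{N!}\left(\frac{zV}{\lambda^{3}}\right)^{N}
  \;=\; \exp\!\left(\frac{zV}{\lambda^{3}}\right),
\qquad z = e^{\beta\mu}.
```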
We also note that for a potential of the standard form
$$\mathcal{V}_{N}=\sum_{1 \leqslant i<j \leqslant N} \mathcal{V}\left(\vec{x}_{i}-\vec{x}_{j}\right)+\sum_{1 \leqslant i \leqslant N} \mathcal{W}\left(\vec{x}_{i}\right)$$
we may think of the replacement $H_{N} \rightarrow H_{N}-\mu N$ as being due to $\mathcal{W} \rightarrow \mathcal{W}-\mu$. Therefore, for variable particle number $N$, there is no arbitrary additive constant in the 1-particle potential $\mathcal{W}$; it is determined by the chemical potential $\mu$. A larger $\mu$ gives greater statistical weight in $Y$ to states with larger $N$, just as a larger $T$ (smaller $\beta$) gives greater weight to states with larger $E$.
4.5 Summary of different equilibrium ensembles
Let us summarize the equilibrium ensembles we have discussed in this chapter in a table:
| Ensemble | Defining <br> property | Partition <br> function | Statistical operator $\rho$ |
| :--- | :--- | :--- | :--- |
| Microcanonical <br> ensemble | no energy exchange <br> no particle exchange | $W(E, N, V)$ | $\frac{1}{W}[\Theta(H-E+\Delta E)-\Theta(H-E)]$ |
| Canonical <br> ensemble | energy exchange <br> no particle exchange | $Z(\beta, N, V)$ | $\frac{1}{Z} e^{-\beta H}$ |
| Grand canonical <br> ensemble | energy exchange <br> particle exchange | $Y(\beta, \mu, V)$ | $\frac{1}{Y} e^{-\beta(H-\mu \hat{N})}$ |
Table 4.1: Properties of the different equilibrium ensembles.
The relationship between the partition functions $W, Z, Y$ and the corresponding natural thermodynamic "potentials" is summarized in the following table:
Further explanations regarding the various thermodynamic potentials are given below in section 6.7.

| Ensemble | Name of <br> potential | Symbol | Relation with <br> partition function |
| :--- | :--- | :--- | :--- |
| Microcanonical <br> ensemble | Entropy | $S(E, N, V)$ | $S=\mathrm{k}_{\mathrm{B}} \log W$ |
| Canonical <br> ensemble | Free energy | $F(\beta, N, V)$ | $F=-\beta^{-1} \log Z$ |
| Grand canonical <br> ensemble | Gibbs free <br> energy | $G(\beta, \mu, V)$ | $G=-\beta^{-1} \log Y$ |
Table 4.2: Relationship to different thermodynamic potentials.
4.6 Approximation methods
For interacting systems, it is normally impossible to calculate thermodynamic quantities exactly. In these cases, approximations, estimates or numerical methods must be used. In the appendix, we present an example of a numerical method, the Monte Carlo algorithm, which can be turned into an efficient method for numerically evaluating quantities like partition functions. In problem B.16 we discuss the mean field approximation in the example of the Ising model. In the following two subsections we present an example of an expansion technique and an example of a method based on rigorous estimates.
4.6.1 The cluster expansion
For simplicity, we consider a classical system in a box of volume $V$, with $N$-particle Hamiltonian $H_{N}$ given by
where $\mathcal{V}_{ij}=\mathcal{V}\left(\vec{x}_{i}-\vec{x}_{j}\right)$ is the two-particle interaction between the $i$-th and the $j$-th particle. The partition function for the grand canonical ensemble is (see (4.103)):
Here, $\lambda=\frac{h}{\sqrt{2\pi m k_{B} T}}$ is the thermal de Broglie wavelength. To compute the remaining integral over $Q=\left(\vec{x}_{1}, \ldots, \vec{x}_{N}\right)$ is generally impossible, but one can derive an expansion of which the first few terms may often be evaluated exactly. For this we write the integrand as
where we have set $f_{ij} \equiv f\left(\vec{x}_{i}-\vec{x}_{j}\right)=1-e^{-\beta \mathcal{V}_{ij}}$. The idea is that we can think of $\left|f_{ij}\right|$ as small in some situations of interest, e.g. when the gas is dilute (such that $\left|\mathcal{V}_{ij}\right| \ll 1$ in "most of phase space"), or when $\beta$ is small (i.e. for large temperature $T$). With this in mind, we expand the above product as
and substitute the result into the integral $\int_{V^{N}} d^{3N}Q \, e^{-\beta \mathcal{V}_{N}(Q)}$. The general form of the resulting integrals that we need to evaluate is suggested by the following representative example for $N=6$ particles:
To keep track of all the integrals that come up, we introduce the following convenient graphical notation. In our example, it amounts to the following: each circle corresponds to an integration, e.g.
The connected parts of a diagram are called "clusters". Obviously, the integral associated with a graph factorizes into the corresponding integrals for the clusters. Therefore, the "cluster integrals" are the building blocks, and we define
\begin{equation*}
b_{l}(V, \beta)=\frac{1}{l! \lambda^{3l-3} V} \cdot(\text{sum of all } l \text{-point cluster integrals}) . \tag{4.116}
\end{equation*}
The main result in this context, known as the linked cluster theorem${ }^{3}$, is that
where $z=e^{\beta \mu}$ is sometimes called the fugacity. If the $f_{ij}$ are sufficiently small, the first few terms ($b_{1}, b_{2}, b_{3}, \ldots$) will give a good approximation. Explicitly, one finds (exercise):
since the possible 1-,2- and 3-clusters are given by:
As exemplified by the first 3 terms in $b_{3}$, topologically identical clusters (i.e. ones that differ only by a permutation of the particles) give the same cluster integral. Thus, we only need to evaluate the cluster integrals for topologically distinct clusters.
Given an approximation for $\frac{1}{V} \log Y$, one obtains approximations for the equations of state etc. by the general methods described in more detail in section 6.5.
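As a numerical sketch of the simplest cluster integral, the two-point integral $\int f(\vec{x})\, d^{3}x$ can be evaluated for a hard-sphere interaction of assumed diameter $d$ (with this section's convention $f = 1 - e^{-\beta \mathcal{V}}$, so $f=1$ inside the core) and compared with the analytic value $\frac{4}{3}\pi d^{3}$:

```python
import math

# Two-particle integral int f(r) d^3x for a hard-sphere potential of
# (assumed) diameter d, with this section's convention f = 1 - exp(-beta*V):
# f(r) = 1 for r < d (V = infinity there), f(r) = 0 otherwise.
d = 1.0

def f(r):
    return 1.0 if r < d else 0.0

# Radial quadrature of int f(r) 4 pi r^2 dr on [0, 2d] (midpoint rule).
n, rmax = 100_000, 2 * d
h = rmax / n
integral = sum(f((i + 0.5) * h) * 4 * math.pi * ((i + 0.5) * h) ** 2 * h
               for i in range(n))

exact = 4 * math.pi * d**3 / 3   # analytic value of the integral
print(abs(integral - exact) < 1e-3)  # True
```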
4.6.2 Peierls contours
Next, we present an example of a rigorous estimation method, proving the existence of a phase transition in the two-dimensional Ising model. In this model, we have spins $\sigma_{i}=\pm 1$ on a square lattice and the Hamiltonian (energy) is
where the first sum is over all lattice bonds $ik$ and the second sum is over all lattice sites $i$. Note that we shifted the energies in the first term such that a pair of parallel spins has vanishing contribution to the total energy. We present an argument, due to Peierls, showing that under the boundary condition that the spins on the boundary of the lattice are positive, at sufficiently low temperatures, this model shows an equilibrium magnetization. In the following, we set $b=0$.
Each configuration $\left\{\sigma_{i}\right\}=\left\{\sigma_{1}, \sigma_{2}, \ldots\right\}$ of spins is in one-to-one correspondence with a set of (connected) contours separating regions of positive and negative spins, known as Peierls contours. They may be chosen to consist of the line segments lying midway between opposite spins, see figure 4.9.
Figure 4.9: A Peierls contour
Due to our boundary condition, these contours are closed. Each pair of antiparallel spins contributes an energy of $+2J$. Since the total number of antiparallel spin pairs in a configuration corresponding to the contours $C_{1}, \ldots, C_{r}$ is given by the sum of their lengths, $\left|C_{1}\right|+\cdots+\left|C_{r}\right|$, the energy of that configuration is
Consider the totality of configurations having a given contour $C$ as a Peierls contour or domain wall. The sum of their probabilities $P_{C}=\sum_{\{\sigma\} \supset C} Z^{-1} e^{-\beta H(\{\sigma\})}$ is the probability that there is a Peierls contour $C$. We can estimate that probability as follows. For each configuration $\{\sigma\}$ containing $C$, we may define a modified configuration $\left\{\sigma^{\prime}\right\}$ obtained by flipping all spins inside the domain defined by $C$. Then $\left\{\sigma^{\prime}\right\}$ no longer has
the Peierls contour $C$. The set of all distinct configurations $\{\sigma\}$ containing $C$ leads in this way to a set of distinct configurations $\left\{\sigma^{\prime}\right\}$ without $C$, and this set is itself a subset of all configurations without $C$. Since, by (4.121), the ratio of the probabilities of the original and the modified configuration is $e^{-2\beta J|C|}$, and since the sum of the probabilities of configurations without $C$ is at most 1, the probability for $C$ to be a Peierls contour is at most $e^{-2\beta J|C|}$.
Now, if the spin $\sigma_{x}$ at some site $x$ is negative, there must be a Peierls contour surrounding $x$. Therefore, the probability $P_{x}^{-}$ that the spin at $x$ is negative can be estimated by the sum of the probabilities $P_{C} \leqslant e^{-2\beta J|C|}$ for there to be a contour $C$ surrounding $x$. Thus,
where $N(l)$ is the number of contours of length $l$ surrounding $x$. The sum starts at length 4, because the length of a contour is at least 4 times the lattice spacing, which is assumed to be 1.
To get an estimate for $N(l)$, we observe the following. First, we consider the set of all possible shapes of contours of the given length $l$, where two contours are considered to have the same shape if they are congruent after some translation. We follow a given contour from some starting point, from which we have 2 possibilities to proceed (taking into account that the orientation of traversal is irrelevant). At the subsequent sites, we have at most 3 possibilities to continue, as we cannot go straight back. At the last site, there is at most one possibility to close the contour, so that there are at most $2 \times 3^{l-2}$ shapes of closed curves of length $l$. (Actually, this is a vast over-counting, since the curves included in this counting may be neither closed nor free of self-intersections!) Now we impose that $x$ must lie within the contour. Within a contour of length $l$ there can be at most $(l/4)^{2}$ points, since $l/4$ is the side length of a square of circumference $l$. If a contour surrounds $x$, then we may shift it in as many ways as there are points inside it while still keeping $x$ inside. Therefore,
By the integral test, the right hand side converges for $2\beta J > \log 3$, and it tends to zero
for $\beta \rightarrow \infty$, i.e. $T \rightarrow 0$. Hence, for every $m \in (0,1)$, there exists a temperature $T_{0}$ such that $P_{x}^{-} < \frac{1}{2}(1-m)$ for all lower temperatures. Then $P_{x}^{+} > \frac{1}{2}(1+m)$ and thus
for all T < T_(0)T<T_{0}. Since this holds for all lattice sites xx, we conclude that below a certain threshold temperature, the system shows a macroscopic magnetization.
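The right hand side of the Peierls bound can be evaluated numerically (a sketch; $\beta J = 1$ is an assumed illustrative value above the threshold $\frac{1}{2}\log 3$):

```python
import math

# Numerical evaluation of the Peierls bound
#   P_x^- <= sum over even l >= 4 of (l/4)^2 * 2 * 3^(l-2) * exp(-2*beta*J*l),
# using the estimate N(l) <= (l/4)^2 * 2 * 3^(l-2) from the text.

def peierls_bound(betaJ, lmax=400):
    total = 0.0
    for l in range(4, lmax + 1, 2):  # contour lengths on a square lattice are even
        total += (l / 4) ** 2 * 2 * 3 ** (l - 2) * math.exp(-2 * betaJ * l)
    return total

# beta*J = 1.0 (assumed, illustrative): the bound is already far below 1/2,
# so the spin at x is positive with probability greater than 1/2.
print(peierls_bound(1.0) < 0.5)  # True
```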
Chapter 5
The Ideal Quantum Gas
5.1 Hilbert Spaces, Canonical and Grand Canonical Formulations
When discussing the mixing entropy of classical ideal gases in section 4.2.3, we noted that Gibbs' paradox could be resolved by treating the particles of the same gas species as indistinguishable. How should indistinguishable particles be treated in quantum mechanics? If we have $N$ particles, the state vectors $\Psi$ are elements of a Hilbert space, such as $\mathcal{H}_{N}=L^{2}\left(V \times \ldots \times V, d^{3}x_{1} \ldots d^{3}x_{N}\right)$ for particles in a box $V \subset \mathbb{R}^{3}$ without additional quantum numbers. The probability density of finding the $N$ particles at prescribed positions $\vec{x}_{1}, \ldots, \vec{x}_{N}$ is given by $\left|\Psi\left(\vec{x}_{1}, \ldots, \vec{x}_{N}\right)\right|^{2}$. For identical particles, this should be the same as $\left|\Psi\left(\vec{x}_{\sigma(1)}, \ldots, \vec{x}_{\sigma(N)}\right)\right|^{2}$ for any permutation
Thus, the map $\mathcal{U}_{\sigma}: \Psi\left(\vec{x}_{1}, \ldots, \vec{x}_{N}\right) \mapsto \Psi\left(\vec{x}_{\sigma(1)}, \ldots, \vec{x}_{\sigma(N)}\right)$ should be represented by a phase, i.e.
Every permutation $\sigma$ can be expressed as a concatenation of transpositions, i.e. interchanges of two elements. Performing a transposition $\pi$ twice yields the original wave function, hence $\mathcal{U}_{\pi}^{2}=\mathbb{1}$, so $\eta_{\pi}^{2}=1$. It follows that $\eta_{\pi} \in\{\pm 1\}$. Furthermore, from $\mathcal{U}_{\sigma} \mathcal{U}_{\sigma^{\prime}}=\mathcal{U}_{\sigma \sigma^{\prime}}$ it follows that $\eta_{\sigma} \eta_{\sigma^{\prime}}=\eta_{\sigma \sigma^{\prime}}$, and as any permutation $\sigma$ can be expressed as a product of transpositions, the only possible constant assignments for $\eta_{\sigma}$ are therefore
given by
\begin{equation*}
\operatorname{sgn}(\sigma)=(-1)^{\#\{\text{transpositions in } \sigma\}}=(-1)^{\#\{\text{"crossings" in } \sigma\}} \tag{5.2}
\end{equation*}
The second characterization also makes plausible the fact that $\operatorname{sgn}(\sigma)$ is an invariant satisfying $\operatorname{sgn}(\sigma) \operatorname{sgn}\left(\sigma^{\prime}\right)=\operatorname{sgn}\left(\sigma \sigma^{\prime}\right)$.
Example:
Consider the following permutation:
In this example we have $\operatorname{sgn}(\sigma)=+1=(-1)^{4}$.
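The "crossings" characterization of (5.2) lends itself to a direct computation: in the two-line diagram of a permutation, the lines of $i<j$ cross exactly when the permutation inverts their order, so $\operatorname{sgn}(\sigma) = (-1)^{\#\text{inversions}}$ (a small illustrative sketch):

```python
from itertools import combinations

# sgn(sigma) via the "crossings" characterization: lines i < j in the
# two-line diagram cross exactly when the permutation inverts their order,
# so sgn = (-1)^(number of inversions).
def sgn(perm):
    inversions = sum(1 for i, j in combinations(range(len(perm)), 2)
                     if perm[i] > perm[j])
    return (-1) ** inversions

print(sgn([0, 1, 2]))   # identity: +1
print(sgn([1, 0, 2]))   # one transposition: -1
print(sgn([1, 2, 0]))   # 3-cycle = two transpositions: +1
```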
In order to go from the Hilbert space $\mathcal{H}_{N}$ of distinguishable particles such as
The Hilbert spaces for Bosons/Fermions, respectively, are then given by
\begin{equation*}
\mathcal{H}_{N}^{\pm}= \begin{cases}\mathcal{P}_{+} \mathcal{H}_{N} & \text{for Bosons} \\ \mathcal{P}_{-} \mathcal{H}_{N} & \text{for Fermions.}\end{cases} \tag{5.3}
\end{equation*}
In the following, we consider $N$ non-interacting, non-relativistic particles of mass $m$ in a box with volume $V=L^{3}$, together with Dirichlet boundary conditions. The Hamiltonian of the system in either case is given by
where $k_{x}=\frac{\pi n_{x}}{L}, \ldots$, with $n_{x}=1,2,3,\ldots$, and similarly for the $y,z$-components. The product wave functions $\Psi_{\vec{k}_{1}}\left(\vec{x}_{1}\right) \cdots \Psi_{\vec{k}_{N}}\left(\vec{x}_{N}\right)$ do not satisfy the symmetry requirements for Bosons/Fermions. To obtain these we have to apply the projectors $\mathcal{P}_{\pm}$ to the states $\left|\vec{k}_{1}\right\rangle \otimes \cdots \otimes\left|\vec{k}_{N}\right\rangle \in \mathcal{H}_{N}$. We define:
where $c_{\pm}$ is a normalization constant, defined by demanding that ${ }_{\pm}\left\langle\vec{k}_{1}, \ldots, \vec{k}_{N} \mid \vec{k}_{1}, \ldots, \vec{k}_{N}\right\rangle_{\pm}=1$. (We have used the Dirac notation $\langle x \mid \vec{k}\rangle \equiv \Psi_{\vec{k}}(\vec{x})$.) Explicitly, we have:
with $\left|\vec{k}_{1}, \vec{k}_{2}\right\rangle_{-}=0$ if $\vec{k}_{1}=\vec{k}_{2}$. This implements the Pauli principle.
More generally, for an $N$-particle fermion state we have

(c) Bosons with $N=3$: A normalized three-particle boson state with $\vec{k}_{1}=\vec{k}$, $\vec{k}_{2}=\vec{k}_{3}=\vec{p}$ is given by

The normalization factors $c_{+}, c_{-}$ are given in general as follows:

(a) Bosons: Let $n_{\vec{k}}$ be the number of appearances of the mode $\vec{k}$ in $|\vec{k}_{1}, \ldots, \vec{k}_{N}\rangle_{+}$, i.e. $n_{\vec{k}}=\sum_{i} \delta_{\vec{k}, \vec{k}_{i}}$. Then $c_{+}$ is given by

because the term under the second sum is zero unless the permuted $\{\vec{k}\}$'s are identical (this happens $\prod_{\vec{k}} n_{\vec{k}}!$ times for either bosons or fermions), and because for fermions, the occupation numbers $n_{\vec{k}}$ can be either zero or one.
The canonical partition function $Z^{\pm}$ is now defined as:

In general the partition function is difficult to calculate. It is easier to pass momentarily to the grand canonical ensemble, where the particle number $N$ is variable, i.e. it is given by a particle number operator $\hat{N}$ with eigenvalues $N=0,1,2, \ldots$. The Hilbert space is then given by the bosonic $(+)$ or fermionic $(-)$ Fock space

On $\mathcal{H}_{N}^{\pm}$ the particle number operator $\hat{N}$ has eigenvalue $N$. The grand canonical partition function $Y^{\pm}$ is then defined as before (cf. (4.103) and (4.107)):
Another representation of the states in $\mathcal{H}^{\pm}$ is the one based on the occupation numbers $n_{\vec{k}}$:

(a) $|\{n_{\vec{k}}\}\rangle_{+}, \quad n_{\vec{k}}=0,1,2,3, \ldots$ for Bosons,

(b) $|\{n_{\vec{k}}\}\rangle_{-}, \quad n_{\vec{k}}=0,1$ for Fermions.
In the bosonic case, one defines the creation and annihilation operators for a mode $\vec{k}$ as

so they behave as the ladder operators of a system of independent harmonic oscillators (one for each mode $\vec{k}$). In particular, $\hat{N}_{\vec{k}}=a_{\vec{k}}^{\dagger} a_{\vec{k}}$ is the operator counting the number of particles in mode $\vec{k}$.

To arrive at similar creation/annihilation operators in the fermionic case, we first consider a single mode, which can be occupied by no or one particle. In the corresponding basis $\{|0\rangle,|1\rangle\}$, the annihilation and creation operators have the following matrix representations

where $\{A, B\}=A B+B A$ denotes the anticommutator of two operators. Furthermore, we see that $a^{\dagger} a=\hat{N}$ can again be interpreted as the number operator. It is natural to require anticommutation relations also for creation/annihilation operators corresponding to different modes, i.e.

For the sign factors on the right-hand side, which implement the anticommutation relations between operators corresponding to different modes, one chooses an arbitrary but fixed ordering of the wave vectors $\vec{k}$.

Both for bosons and fermions, the operator counting the number of particles in mode $\vec{k}$ is $\hat{N}_{\vec{k}}=a_{\vec{k}}^{\dagger} a_{\vec{k}}$, with eigenvalues $n_{\vec{k}}$. The Hamiltonian may then be written as

where $\epsilon(\vec{k})=\frac{\hbar^{2} \vec{k}^{2}}{2 m}$ for non-relativistic particles. With the formalism of creation and annihilation operators at hand, the grand canonical partition function for bosons and fermions, respectively, may now be calculated as follows:
(a) Bosons ("+"):

Applying similar arguments in the fermionic case, we find for the expected number densities:
\begin{align*}
\bar{n}_{\vec{k}}&=\frac{1}{e^{\beta(\epsilon(\vec{k})-\mu)}-1} && \text{for bosons,}\\
\bar{n}_{\vec{k}}&=\frac{1}{e^{\beta(\epsilon(\vec{k})-\mu)}+1} && \text{for fermions.} \tag{5.32}
\end{align*}
These distributions are called the Bose-Einstein distribution and the Fermi-Dirac distribution, respectively. Note that for bosons, we have to require that the chemical potential is lower than the ground state energy, $\mu<\epsilon(0)$, in order to avoid a diverging particle number. Also note that the particular form of $\epsilon(\vec{k})$ was not important in the derivation; in particular, (5.31) and (5.32) also hold for relativistic particles (see section 5.4). The classical distribution $\bar{n}_{\vec{k}} \propto e^{-\beta \epsilon(\vec{k})}$ is obtained in the limit $\beta \epsilon(\vec{k}) \gg 1$, i.e. $\epsilon(\vec{k}) \gg k_{\mathrm{B}} T$, consistent with our experience that quantum effects are usually only important for energies that are small compared to the temperature.
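The crossover between the quantum distributions (5.32) and the classical one can be made concrete numerically. A minimal sketch, in units with $k_{\mathrm{B}} T=1$ (the values of $\mu$ and the energies are arbitrary illustrative choices):

```python
import math

# Occupation numbers of eq. (5.32) and their common classical limit.
def n_bose(eps, mu, beta):
    """Bose-Einstein occupation number; requires mu < eps."""
    return 1.0 / (math.exp(beta * (eps - mu)) - 1.0)

def n_fermi(eps, mu, beta):
    """Fermi-Dirac occupation number."""
    return 1.0 / (math.exp(beta * (eps - mu)) + 1.0)

def n_classical(eps, mu, beta):
    """Maxwell-Boltzmann occupation, the limit for beta*(eps - mu) >> 1."""
    return math.exp(-beta * (eps - mu))

beta, mu = 1.0, -0.5   # illustrative values, units with k_B T = 1
for eps in (0.1, 1.0, 5.0, 20.0):
    print(f"eps={eps:5.1f}  BE={n_bose(eps, mu, beta):.4e}  "
          f"FD={n_fermi(eps, mu, beta):.4e}  MB={n_classical(eps, mu, beta):.4e}")
```

For $\beta(\epsilon-\mu) \gg 1$ all three occupation numbers agree, while at small energies the Bose-Einstein occupation exceeds the Fermi-Dirac one.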
Our aim is now to calculate the canonical partition function $Z_{N}^{\pm}$ [for more details, see e.g. Ch. 7 of M. Kardar, "Statistical Physics of Particles", Cambridge (2007), which we mostly follow in this section]. Let $|\{\vec{x}\}\rangle_{\pm}$ be an eigenbasis of the position operators. Then, with $\eta \in\{+,-\}$:
where $\Psi_{\vec{k}}(\vec{x}) \in \mathcal{H}_{1}$ are the 1-particle wave functions and
\begin{equation*}
\eta_{\sigma}= \begin{cases}1 & \text{for bosons} \\ \operatorname{sgn}(\sigma) & \text{for fermions.}\end{cases} \tag{5.36}
\end{equation*}
The sum $\sum_{\{\vec{k}_{1}, \ldots, \vec{k}_{N}\}}^{\prime}$ is restricted in order to ensure that each identical particle state appears only once. We may equivalently work in terms of the occupation number representation $|\{n_{\vec{k}}\}\rangle_{\pm}$. It is then clear that
where the factor in the unrestricted sum compensates the over-counting. This gives, with the formulas for $c_{\eta}$ derived above,
\begin{equation*}
{}_{\eta}\langle\{\vec{x}^{\prime}\}|\, \rho\, |\{\vec{x}\}\rangle_{\eta}=\sum_{\{\vec{k}\}} \frac{\prod_{\vec{k}} n_{\vec{k}}!}{N!}\, \frac{1}{\prod_{\vec{k}} n_{\vec{k}}!\, N!} \sum_{\sigma, \sigma^{\prime} \in S_{N}} \frac{\eta_{\sigma} \eta_{\sigma^{\prime}}}{Z_{N}}\, e^{-\beta \sum_{i} \frac{\hbar^{2} \vec{k}_{i}^{2}}{2 m}}\, \Psi_{\sigma^{\prime}\{\vec{k}\}}^{*}(\{\vec{x}^{\prime}\})\, \Psi_{\sigma\{\vec{k}\}}(\{\vec{x}\}) .
\end{equation*}
It is now convenient to work with periodic boundary conditions instead of the Dirichlet boundary conditions used so far, i.e., we require that $\Psi(0, y, z)=\Psi(L, y, z)$, with $L$ the length of the cube, and similarly for the $y$ and $z$ directions. The normalized eigenmodes are then $\Psi_{\vec{k}}=\frac{1}{\sqrt{V}} e^{i \vec{k} \cdot \vec{x}}$ with $\vec{k}=\frac{2 \pi}{L}(n_{x}, n_{y}, n_{z})$, where $n_{x}, n_{y}, n_{z} \in \mathbb{Z}$. Considering that the spacing between two wave vectors is $\frac{2 \pi}{L}$ in every direction, we may replace the sum $\sum_{\vec{k}}$ by $\frac{V}{(2 \pi)^{3}} \int d^{3} k$ in the limit $V \rightarrow \infty$, which yields

with the thermal de Broglie wavelength $\lambda=\frac{h}{\sqrt{2 \pi m k_{\mathrm{B}} T}}$. Relabeling the summation indices then results in

Setting $\vec{x}^{\prime}=\vec{x}$, integrating $\int d^{3 N} x$ on both sides, and using $\operatorname{tr} \rho \stackrel{!}{=} 1$ gives:

The terms with $\sigma \neq \mathrm{id}$ are suppressed for $\lambda \rightarrow 0$ (i.e. for $h \rightarrow 0$ or $T \rightarrow \infty$), so the leading order contribution comes from $\sigma=\mathrm{id}$. The next-to-leading order corrections come from those $\sigma$ having precisely one transposition (there are $\frac{N(N-1)}{2}$ of them). A permutation with precisely one transposition corresponds to an exchange of two particles. Neglecting next-to-next-to-leading order corrections, the canonical partition function is given by
\begin{equation*}
P=n k_{\mathrm{B}} T\left(1-\eta\, n \frac{\lambda^{3}}{2^{5 / 2}}+\ldots\right) \tag{5.44}
\end{equation*}
where $n=\frac{N}{V}$ is the particle density. Comparing to the classical ideal gas, where we had $P=n k_{\mathrm{B}} T$, we see that when $n \lambda^{3}$ is of order 1, quantum effects significantly increase the pressure for fermions $(\eta=-1)$, while they decrease it for bosons $(\eta=+1)$. As we can see by comparing the expression (5.41) with the leading order term in the cluster expansion of the classical gas (see chapter 4.6), this effect is also present for a classical gas to leading order if we include a 2-body potential $\mathcal{V}(\vec{r})$ such that
\begin{equation*}
e^{-\beta \mathcal{V}(\vec{r})}-1=\eta\, e^{-\frac{2 \pi \vec{r}^{2}}{\lambda^{2}}} \quad(\text{from } (5.41)) \tag{5.45}
\end{equation*}
It follows that the potential $\mathcal{V}(\vec{r})$ satisfies
\begin{equation*}
\mathcal{V}(\vec{r})=-k_{\mathrm{B}} T \log \left[1+\eta\, e^{-\frac{2 \pi \vec{r}^{2}}{\lambda^{2}}}\right] \approx-k_{\mathrm{B}} T\, \eta\, e^{-\frac{2 \pi \vec{r}^{2}}{\lambda^{2}}}, \quad \text{for } r \gtrsim \lambda \tag{5.46}
\end{equation*}
A sketch of $\mathcal{V}(\vec{r})$ is given in the following picture:

Figure 5.1: The potential $\mathcal{V}(\vec{r})$ occurring in (5.46).

Thus, we can say that quantum effects lead to an effective potential. For fermions, the resulting correction to the pressure $P$ in (5.44) is called the degeneracy pressure. Note that according to (5.44) the degeneracy pressure is proportional to $k_{\mathrm{B}} T n^{2} \lambda^{3}$, which increases strongly with increasing density $n$. It provides a mechanism to support very dense objects against gravitational collapse, e.g. in neutron stars.
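The sign structure of (5.46), i.e. statistical attraction for bosons and repulsion for fermions, can be checked directly. A minimal sketch (the units $k_{\mathrm{B}} T=1$, $\lambda=1$ are an arbitrary choice):

```python
import math

# Effective statistical pair potential of eq. (5.46) and its large-r approximation.
# eta = +1: bosons (attractive), eta = -1: fermions (repulsive).
def V_eff(r, lam, kT, eta):
    """Exact form of eq. (5.46)."""
    return -kT * math.log(1.0 + eta * math.exp(-2.0 * math.pi * r**2 / lam**2))

def V_approx(r, lam, kT, eta):
    """Asymptotic form of eq. (5.46), valid for r >~ lam."""
    return -kT * eta * math.exp(-2.0 * math.pi * r**2 / lam**2)

kT, lam = 1.0, 1.0
for r in (0.5, 1.0, 2.0):
    print(f"r={r}: bosons {V_eff(r, lam, kT, +1):+.3e}, fermions {V_eff(r, lam, kT, -1):+.3e}")
```

The printed values show $\mathcal{V}<0$ for bosons and $\mathcal{V}>0$ for fermions at all separations, and the exact and asymptotic forms agree rapidly once $r \gtrsim \lambda$.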
5.3 Spin Degeneracy
For particles with spin, the energy levels have a corresponding $g$-fold degeneracy. Since different spin states have the same energy, the Hamiltonian is now given by

It is easy to see that for the grand canonical ensemble this results in the following expressions for the expected number densities $\bar{n}_{\vec{k}}$ and the mean energy $E_{\pm}$:

In the canonical ensemble we find similar expressions. For a non-relativistic gas we get, with $\sum_{\vec{k}} \rightarrow V \int \frac{d^{3} k}{(2 \pi)^{3}}$ for $V \rightarrow \infty$:

Setting $x=\frac{\hbar^{2} k^{2}}{2 m k_{\mathrm{B}} T}$, or equivalently $k=\frac{2 \pi^{1 / 2}}{\lambda} x^{1 / 2}$, and defining the fugacity $z:=e^{\beta \mu}$, we find

Furthermore, we also have the following relation for the pressure $P_{\pm}$ and the grand canonical potential $G_{\pm}=-k_{\mathrm{B}} T \log Y^{\pm}$ (cf. section 4.4):

Taking the logarithm on both sides and taking the large-volume limit $V \rightarrow \infty$ to approximate the sum by an integral as before yields
\begin{align*}
\frac{P_{\pm}}{k_{\mathrm{B}} T} & =\mp g \int \frac{d^{3} k}{(2 \pi)^{3}} \log \left[1 \mp z e^{-\frac{\hbar^{2} k^{2}}{2 m k_{\mathrm{B}} T}}\right] \\
& =\frac{g}{\lambda^{3}} \frac{4}{3 \sqrt{\pi}} \int_{0}^{\infty} \frac{d x\, x^{3 / 2}}{z^{-1} e^{x} \mp 1} \tag{5.57}
\end{align*}
To go to the last line, we used a partial integration in $x$. For $z \ll 1$, i.e. $\mu \beta=\frac{\mu}{k_{\mathrm{B}} T} \ll 0$, one can expand $\bar{n}^{\pm}$ in $z$ around $z=0$. Using the relation
\begin{equation*}
\int_{0}^{\infty} \frac{d x\, x^{m-1}}{z^{-1} e^{x}-\eta}=\eta\,(m-1)! \sum_{n=1}^{\infty} \frac{(\eta z)^{n}}{n^{m}}
\end{equation*}
(which for $\eta z=1$ yields the Riemann $\zeta$-function), one finds that
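The series representation above can be spot-checked numerically. A small sketch comparing a simple quadrature of the integral with the partial sums, for the illustrative choice $m=2$, $z=1/2$ (the cutoff and step count are ad-hoc):

```python
import math

# Check of  int_0^inf x^{m-1}/(z^{-1} e^x - eta) dx = eta * Gamma(m) * sum_{n>=1} (eta z)^n / n^m
# for m = 2, z = 0.5, and both statistics eta = +1 (bosons), eta = -1 (fermions).
def lhs(m, z, eta, upper=60.0, n_steps=200000):
    """Midpoint-rule approximation of the integral, truncated at x = upper."""
    h = upper / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * h
        total += x**(m - 1) / (math.exp(x) / z - eta)
    return total * h

def rhs(m, z, eta, n_terms=200):
    """Partial sum of the series; (m-1)! is Gamma(m) for general m."""
    return eta * math.gamma(m) * sum((eta * z)**n / n**m for n in range(1, n_terms + 1))

for eta in (+1, -1):
    print(eta, lhs(2, 0.5, eta), rhs(2, 0.5, eta))
```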
5.4 Black Body Radiation

There are two possibilities for the helicity ("spin") of a photon, which is either parallel or anti-parallel to $\vec{p}$, corresponding to the polarization of the light. Hence, the degeneracy factor for photons is $g=2$ and the Hamiltonian is given by

Under normal circumstances there is practically no interaction between the photons, so the interaction terms indicated by "..." can be neglected in the previous formula. The following picture is a sketch of a 4-photon interaction, where $\sigma$ denotes the cross section for the corresponding $2 \rightarrow 2$ scattering process obtained from the computational rules of quantum electrodynamics:
Figure 5.2: Lowest-order Feynman diagram for photon-photon scattering in Quantum Electrodynamics.
The mean collision time $\tau$ of the photons is given by
\begin{equation*}
\frac{1}{\tau}=\frac{c \sigma N}{V}=c \sigma n \approx 10^{-44}\, \frac{\mathrm{cm}^{3}}{\mathrm{s}} \times n \tag{5.63}
\end{equation*}
where $N=\langle\hat{N}\rangle$ is the average number of photons inside $V$ and $n=N / V$ their density. Even in extreme places like the interior of the sun, where $T \approx 10^{7}\, \mathrm{K}$, this leads to a mean collision time of about $10^{18}\, \mathrm{s}$. This is more than the age of the universe, which is approximately $10^{17}\, \mathrm{s}$. From this we conclude that we can safely treat the photons as an ideal gas!
By the methods of the previous subsection we find for the grand canonical partition function, with $\mu=0$:

since the degeneracy factor is $g=2$ and photons are bosons. For the grand canonical potential (in the limit $V \rightarrow \infty$) we get${}^{1}$
\begin{align*}
G=-k_{\mathrm{B}} T \log Y & =\frac{2 V}{\beta} \int \frac{d^{3} p}{(2 \pi \hbar)^{3}} \log \left(1-e^{-\beta c p}\right)=\frac{V\left(k_{\mathrm{B}} T\right)^{4}}{\pi^{2}(\hbar c)^{3}} \int_{0}^{\infty} d x\, x^{2} \log \left(1-e^{-x}\right) \\
& =\frac{V\left(k_{\mathrm{B}} T\right)^{4}}{\pi^{2}(\hbar c)^{3}}\left(-\frac{1}{3}\right) \int_{0}^{\infty} \frac{d x\, x^{3}}{e^{x}-1}=-\frac{V\left(k_{\mathrm{B}} T\right)^{4}}{(\hbar c)^{3}} \frac{\pi^{2}}{45} \\
\Rightarrow \quad G & =-\frac{4 \sigma}{3 c} V T^{4} \tag{5.66}
\end{align*}
using $\int_{0}^{\infty} x^{2} \log \left(1-e^{-x}\right) d x=-\frac{1}{3} \int_{0}^{\infty} \frac{x^{3}}{e^{x}-1}\, d x=-2 \zeta(4)=-\frac{\pi^{4}}{45}$. Here, $\sigma=5.67 \times 10^{-8}\, \frac{\mathrm{J}}{\mathrm{s}\, \mathrm{m}^{2}\, \mathrm{K}^{4}}$ is the Stefan-Boltzmann constant.
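As a consistency check, comparing the two expressions for $G$ in (5.66) gives $\sigma=\frac{\pi^{2} k_{\mathrm{B}}^{4}}{60\, \hbar^{3} c^{2}}$, which can be evaluated numerically (SI constant values inserted by hand):

```python
import math

# The two expressions for G in (5.66) imply sigma = pi^2 k_B^4 / (60 hbar^3 c^2).
# SI values (CODATA, rounded):
k_B  = 1.380649e-23    # J / K
hbar = 1.054572e-34    # J s
c    = 2.99792458e8    # m / s

sigma = math.pi**2 * k_B**4 / (60 * hbar**3 * c**2)
print(f"sigma = {sigma:.4e} J / (s m^2 K^4)")   # ≈ 5.670e-08
```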
The entropy was defined as $S:=-k_{\mathrm{B}} \operatorname{tr}(\rho \log \rho)$ with $\rho=\frac{1}{Y} e^{-\beta H}$. One easily finds the relation
As an example, for the sun, with $T_{\text{sun}}=10^{7}\, \mathrm{K}$, the pressure is $P=25{,}000{,}000\, \mathrm{atm}$, and for a H-bomb, with $T_{\text{bomb}}=10^{5}\, \mathrm{K}$, the pressure is $P=0.25\, \mathrm{atm}$.

From (5.69), (5.70) one obtains
\begin{equation*}
P=\frac{1}{3} \frac{E}{V} \quad \Leftrightarrow \quad E=3 P V \tag{5.72}
\end{equation*}
This is also known as the Stefan-Boltzmann law.

${}^{1}$ Here $\zeta$ denotes the Riemann zeta function,
\begin{equation*}
\zeta(s)=\sum_{n \geqslant 1} n^{-s}, \quad \text{for } \operatorname{Re}(s)>1 \tag{5.65}
\end{equation*}
Let $u(\nu)$ be the spectral energy density, i.e., $u(\nu) d \nu$ is the contribution to the total energy density due to radiation in the range of frequencies $[\nu, \nu+d \nu]$. To derive an expression for this, we recall that the average number of photons with wave vector $\vec{k}$ is

Hence, taking the usual conversion from discrete to continuous wave vectors into account, the expected number of photons with wave vector in the range $[\vec{k}, \vec{k}+d \vec{k}]$ is $\frac{2}{e^{\beta c \hbar k}-1} V \frac{d^{3} k}{(2 \pi)^{3}}$. For the expected number of photons with the modulus of the wave vector in the range $[k, k+d k]$, we thus get $\frac{V}{\pi^{2}} \frac{k^{2}}{e^{\beta c \hbar k}-1}\, d k$. The frequency of a wave with wave vector $\vec{k}$ is $\nu=\frac{c}{2 \pi} k$, so the number of photons in the frequency range $[\nu, \nu+d \nu]$ is $\frac{8 \pi V}{c^{3}} \frac{\nu^{2}}{e^{\beta h \nu}-1}\, d \nu$. Multiplying by the energy $h \nu$ per photon and dividing by $V$, we obtain the Planck distribution
This is the famous law found by Planck in 1900, which led to the development of quantum theory! The Planck distribution looks as follows:

Figure 5.3: Sketch of the Planck distribution for different temperatures.

This can be measured by drilling a hole in a cavity and measuring the spectral intensity of the outgoing radiation. An almost perfect black body spectrum is observed in the cosmic microwave background, at $T \simeq 2.7\, \mathrm{K}$.
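The location of the maximum of the Planck distribution can be found numerically: writing $x=h \nu / k_{\mathrm{B}} T$, one has $u(\nu) \propto x^{3} /(e^{x}-1)$, and the condition $u^{\prime}(\nu)=0$ reduces to the transcendental equation $3(1-e^{-x})=x$. A small root-finding sketch:

```python
import math

# Maximizing u(nu) ∝ nu^3 / (e^{h nu / k_B T} - 1): with x = h nu / (k_B T),
# u'(nu) = 0 becomes 3 (1 - e^{-x}) = x, solved here by bisection.
f = lambda x: 3.0 * (1.0 - math.exp(-x)) - x

lo, hi = 1.0, 4.0   # f(1) > 0 > f(4), so the nonzero root lies in between
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(mid) > 0.0:
        lo = mid
    else:
        hi = mid
print(f"h nu_max / (k_B T) = {lo:.4f}")   # ≈ 2.8214
```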
Solving $u^{\prime}(\nu_{\max})=0$, one finds that the maximum of $u(\nu)$ lies at $h \nu_{\max} \approx 2.82\, k_{\mathrm{B}} T$, a relation also known as Wien's law. The following limiting cases are noteworthy:

(i) $h \nu \ll k_{\mathrm{B}} T$:

In this case we have
\begin{equation*}
u(\nu) \approx \frac{8 \pi k_{\mathrm{B}} T \nu^{2}}{c^{3}} \tag{5.75}
\end{equation*}
This formula is valid in particular for $h \rightarrow 0$, i.e. it represents the classical limit. It was known before the Planck formula (as the Rayleigh-Jeans law). It is not only inaccurate for larger frequencies but also fundamentally problematic, since it suggests $\langle H\rangle=E \propto \int d \nu\, u(\nu)=\infty$, which indicates an instability not seen in reality.
(ii) $h \nu \gg k_{\mathrm{B}} T$:

Combining this formula with that for the entropy $S$, eq. (5.68), gives the relation
\begin{equation*}
S=\frac{2 \pi^{4}}{45\, \zeta(3)}\, k_{\mathrm{B}} N \approx 3.6\, N k_{\mathrm{B}} \tag{5.78}
\end{equation*}
where $N \equiv\langle\hat{N}\rangle$ is the mean total particle number from above. Thus, for an ideal photon gas we have $S=\mathcal{O}(1)\, k_{\mathrm{B}} N$, i.e. each photon contributes an amount of order one to $\frac{S}{k_{\mathrm{B}}}$ on average (see problem B.18 for an application of this elementary relation).
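The number $\approx 3.6$ can be reproduced from the standard photon-gas results (assuming, as follows from the formulas of this section, $E / V=\frac{\pi^{2}}{15} \frac{(k_{\mathrm{B}} T)^{4}}{(\hbar c)^{3}}$, $N / V=\frac{2 \zeta(3)}{\pi^{2}}\left(\frac{k_{\mathrm{B}} T}{\hbar c}\right)^{3}$ and $S=\frac{4}{3} E / T$):

```python
import math

# Entropy per photon: with E/V = (pi^2/15)(k_B T)^4/(hbar c)^3,
# N/V = (2 zeta(3)/pi^2)(k_B T/hbar c)^3 and S = (4/3) E/T,
# the ratio S/(k_B N) is a pure number, independent of T and V.
zeta3 = sum(1.0 / n**3 for n in range(1, 200001))   # zeta(3) ≈ 1.2021

s_per_photon = (4.0 / 3.0) * (math.pi**4 / 15.0) / (2.0 * zeta3)
print(f"S / (k_B N) = {s_per_photon:.4f}")   # ≈ 3.6016
```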
5.5 Degenerate Bose Gas
Ideal quantum gases of bosonic particles show a particular behavior for low temperature $T$ and large particle number density $n=\frac{\langle\hat{N}\rangle}{V}$. We first discuss the ideal Bose gas in a finite volume. In this case, the expected particle density was given by

The sum is calculated for sufficiently large volumes again by replacing $\sum_{\vec{k}}$ by $V \int \frac{d^{3} k}{(2 \pi)^{3}}$, which yields
\begin{align*}
n & \approx g \int \frac{d^{3} k}{(2 \pi)^{3}} \frac{1}{e^{\beta(\epsilon(\vec{k})-\mu)}-1} \\
& =\frac{g}{2 \pi^{2}} \int_{0}^{\infty} d k\, \frac{k^{2}}{e^{\beta(\epsilon(k)-\mu)}-1} \tag{5.80}
\end{align*}
The particle density is clearly maximal for $\mu \rightarrow 0$, and its maximal value is given by $n_{c}$ where, with $\epsilon(k)=\frac{\hbar^{2} k^{2}}{2 m}$,
\begin{align*}
n_{c} & =\frac{g}{2 \pi^{2}} \int_{0}^{\infty} d k\, \frac{k^{2}}{e^{\beta \epsilon(k)}-1} \\
& =\frac{g}{2 \pi^{2}}\left(\frac{2 m}{\beta \hbar^{2}}\right)^{3 / 2} \int_{0}^{\infty} \frac{d x\, x^{2}}{e^{x^{2}}-1} \\
& =\frac{g}{2 \pi^{2}}\left(\frac{2 m}{\beta \hbar^{2}}\right)^{3 / 2} \sum_{n=1}^{\infty} \int_{0}^{\infty} d x\, x^{2} e^{-n x^{2}} \\
& =\frac{g}{\lambda^{3}} \zeta\left(\frac{3}{2}\right)
\end{align*}
where $\lambda=\sqrt{\frac{h^{2}}{2 \pi m k_{\mathrm{B}} T}}$ is the thermal de Broglie wavelength. From this we see that $n \leqslant n_{c}$. For a given density $n$, the minimal possible temperature is $T_{c}$ (for $T<T_{c}$, we would have $n>n_{c}$):
\begin{equation*}
T_{c}=\frac{h^{2}}{2 \pi m k_{\mathrm{B}}}\left(\frac{n}{g\, \zeta\left(\frac{3}{2}\right)}\right)^{2 / 3} \tag{5.81}
\end{equation*}
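To get a feeling for the scales in (5.81), here is an order-of-magnitude evaluation for assumed parameters loosely modeled on a dilute alkali gas; the density and degeneracy are illustrative assumptions, not values from the text:

```python
import math

# Order-of-magnitude estimate of T_c from (5.81) for assumed, illustrative
# parameters of a dilute gas of 87Rb atoms.
h, k_B, u = 6.62607e-34, 1.380649e-23, 1.66054e-27   # SI: Planck, Boltzmann, atomic mass unit
m = 87 * u            # mass of an 87Rb atom in kg
n = 1.0e20            # assumed particle density in m^-3
g = 1                 # assumed single spin state
zeta_3_2 = 2.612      # zeta(3/2)

T_c = h**2 / (2 * math.pi * m * k_B) * (n / (g * zeta_3_2))**(2.0 / 3.0)
print(f"T_c ≈ {T_c * 1e9:.0f} nK")   # sub-microkelvin scale
```

The sub-microkelvin result matches the temperature scale at which Bose-Einstein condensation is actually observed in dilute atomic gases.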
Equilibrium states with higher densities $n>n_{c}$ are not possible at finite volume. A new phenomenon occurs, however, at infinite volume, i.e. in the thermodynamic limit $V \rightarrow \infty$. Here, we must be careful because density matrices are only formal (e.g. the partition function $Y \rightarrow \infty$), so it is better to characterize equilibrium states by the so-called KMS condition (for Kubo-Martin-Schwinger). As we will see, new interesting equilibrium states can be found in this way in the thermodynamic limit. They correspond to a Bose condensate, or a gas in a superfluid state.
In the present context, the KMS condition for a Gibbs state $\langle\ldots\rangle$ of the ideal Bose gas is simply

which was already derived earlier. We can put this relation in a more convenient form by recalling that in the case of no spin $(g=1)$ we had the commutation relations $[a_{\vec{k}}, a_{\vec{p}}^{\dagger}]=\delta_{\vec{k}, \vec{p}}$ for the creation/annihilation operators. From this it follows that
So far, we are still at finite volume $V$. In the thermodynamic limit (infinite volume), $V \rightarrow \infty$, we should make the replacements

finite volume: $\vec{k} \in\left(\frac{\pi}{L} \mathbb{Z}\right)^{3}$, with $a_{\vec{k}}$ and $\delta_{\vec{k}, \vec{p}}$ $\quad \longrightarrow \quad$ infinite volume: $\vec{k} \in \mathbb{R}^{3}$, with $a(\vec{k})$ and $\delta^{3}(\vec{k}-\vec{p})$
Thus, we expect that in the thermodynamic limit:

In that limit, the statistical operator $\rho$ of the grand canonical ensemble does not make mathematical sense, because $e^{-\beta H+\beta \mu \hat{N}}$ does not have a finite trace (i.e. $Y=\infty$). Nevertheless, the KMS condition (5.84) still makes perfect sense. We view it as the appropriate substitute for the notion of Gibbs state in the thermodynamic limit. There, it becomes possible to obtain new equilibrium states at given temperature $T$ and chemical potential $\mu$ that are described by certain additional "order parameters", and that are impossible at finite volume. We think of these as describing different phases.

To see this concretely in the case of the ideal Bose gas, we must therefore ask: what are the solutions of the KMS condition (5.84)? For $\mu<0$ the unique solution is the usual Bose-Einstein distribution:

The point is that for $\mu=0$ other solutions are also possible, for instance

for some $n_{0} \geqslant 0$ (the positivity of $n_{0}$ follows from $\langle A^{\dagger} A\rangle \geqslant 0$, valid for any operator $A$ in any state). The particle number density in the thermodynamic limit $(V \rightarrow \infty)$ is best expressed in terms of the creation operators at sharp position $\vec{x}$:

The particle number density at the point $\vec{x}$ is then defined as $\hat{N}(\vec{x}):=a^{\dagger}(\vec{x}) a(\vec{x})$, and therefore we have, for $\mu=0$:

Thus, in this equilibrium state we have a macroscopically large occupation number $n_{0}$ of the zero mode, causing a different particle density at $\mu=0$. The fraction of particles in the zero mode, that is, in the "condensate", can be written using our definition of $T_{c}$ as

for $T$ below $T_{c}$, while $n_{0}=0$ above $T_{c}$. The formation of the condensate can thereby be seen as a phase transition at $T=T_{c}$.
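Since $n_{c}(T) \propto \lambda^{-3} \propto T^{3 / 2}$, the condensate fraction below $T_{c}$ is $n_{0} / n=1-(T / T_{c})^{3 / 2}$; a small sketch:

```python
# Condensate fraction implied by n_c(T) ∝ T^{3/2}: below T_c the thermal cloud
# holds only n (T/T_c)^{3/2} particles, and the remainder occupies the zero mode.
def condensate_fraction(T, T_c):
    """Fraction n_0/n of particles in the condensate (0 above T_c)."""
    return max(0.0, 1.0 - (T / T_c)**1.5)

for t in (0.0, 0.5, 0.9, 1.0, 1.2):
    print(f"T/T_c = {t:3.1f}:  n_0/n = {condensate_fraction(t, 1.0):.3f}")
```

The fraction rises continuously from zero at $T=T_{c}$ to one at $T=0$, as expected for a phase transition.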
We can also write down more general solutions to the KMS condition, for example:

where $f$ is any harmonic function, i.e. a function such that $\vec{\nabla}^{2} f=0$. To understand the physical meaning of these states, we define the particle current operator $\vec{j}(\vec{x})$ as

An example of a harmonic function is $f(\vec{x})=1+i m \vec{v} \cdot \vec{x}$, and in this case one finds the expectation value

This means that the condensate flows in the direction of $\vec{v}$ without leaving equilibrium. Another solution is $f(\vec{x})=f(x, y, z)=x+i y$. In this case one finds
\begin{equation*}
\langle\vec{j}(x, y, z)\rangle=\frac{1}{m}(-y, x, 0)
\end{equation*}
describing a circular motion around the origin (a vortex). The condensate can hence flow or form vortices without leaving equilibrium. This phenomenon goes under the name of superfluidity.
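The two quoted examples suggest the pattern $\langle\vec{j}\rangle \propto \frac{1}{m} \operatorname{Im}(\bar{f}\, \vec{\nabla} f)$ (an assumption about the general form, stated here only to organize the check). Under that assumption, the vortex solution can be verified numerically:

```python
# Finite-difference check that f(x, y) = x + i y is harmonic and that the field
# (1/m) Im( conj(f) grad f ) reproduces the vortex flow (-y, x, 0)/m quoted above.
def f(x, y):
    return complex(x, y)

def current(x, y, m=1.0, h=1e-5):
    """Components (j_x, j_y) of (1/m) Im( conj(f) grad f ), via central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fc = f(x, y).conjugate()
    return ((fc * dfdx).imag / m, (fc * dfdy).imag / m)

def laplacian(x, y, h=1e-4):
    """Five-point stencil for the Laplacian of f."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2

x0, y0 = 0.3, -0.7
print(current(x0, y0))          # expect (-y0, x0) = (0.7, 0.3)
print(abs(laplacian(x0, y0)))   # ≈ 0: f is harmonic
```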
Chapter 6
The Laws of Thermodynamics
The laws of thermodynamics predate the ideas and techniques from statistical mechanics, and are, to some extent, simply consequences of more fundamental ideas derived in statistical mechanics. However, they are still in use today, mainly because:
(i) they are easy to remember.
(ii) they are to some extent universal and model-independent.
(iii) microscopic descriptions are sometimes not known (e.g. black hole thermodynamics) or are not well-developed (non-equilibrium situations).
(iv) they are useful!
The laws of thermodynamics are based on:
(o) The empirical evidence that systems approach a new thermal equilibrium state after being pushed out of equilibrium by an external influence.
(i) The empirical evidence that, for a very large class of macroscopic systems, equilibrium states can generally be characterized by very few parameters. These thermodynamic parameters, often called $X_{1}, \ldots, X_{n}$ in the following, can hence be viewed as "coordinates" on the space of equilibrium systems.

(ii) The idea to perform mechanical work on a system, or to bring equilibrium systems into "thermal contact" with reservoirs, in order to produce new equilibrium states in a controlled way. The key idea here is that these changes (e.g. "heating up a system" through contact with a reservoir system) should be extremely gentle, so that the system is not pushed out of equilibrium too much. One thereby imagines that one can describe such a gradual change of the system by a succession of equilibrium states, i.e. a curve in the space of coordinates $X_{1}, \ldots, X_{n}$ characterizing the different equilibrium states. This idealized notion of an infinitely gentle/slow change is often referred to as "quasi-static".

(iii) Given the notion of quasi-static changes in the space of equilibrium states, one can then postulate certain rules, guided by empirical evidence, that tell us which kinds of changes should be possible and which should not. These are, in essence, the laws of thermodynamics. For example, one knows that if one has access to equilibrium systems at different temperatures, then one system can perform work on the other. The first and second law state more precise conditions about such processes and imply, respectively, the existence of an energy and an entropy function on equilibrium states. The zeroth law just states that being in thermal equilibrium with each other is an equivalence relation for systems, i.e. in particular transitive. It implies the existence of a temperature function labelling the different equivalence classes.
6.1 The Zeroth Law
$0^{\text{th}}$ law of thermodynamics: If two subsystems I, II are separately in thermal equilibrium with a third system, III, then they are in thermal equilibrium with each other.

The $0^{\text{th}}$ law implies the existence of a function
\begin{equation*}
\Theta:\{\text{equilibrium systems}\} \rightarrow \mathbb{R}
\end{equation*}
such that $\Theta$ is equal for systems in thermal equilibrium with each other. To see this, let us imagine that the equilibrium states of the systems I, II and III are parametrized by some coordinates $\{A_{1}, A_{2}, \ldots\}$, $\{B_{1}, B_{2}, \ldots\}$ and $\{C_{1}, C_{2}, \ldots\}$. Since a change in I implies a corresponding change in III, there must be a constraint${}^{1}$

we can proceed by noting that for $\{A_{1}, A_{2}, \ldots, B_{1}, B_{2}, \ldots\}$ which satisfy the last equation, (6.3) must be satisfied for any $\{C_{2}, C_{3}, \ldots\}$! Thus, we let III be our reference system and set $\{C_{2}, C_{3}, \ldots\}$ to any convenient but fixed values. This reduces the condition (6.4) for equilibrium between I and II to:
By bringing this system (for $V \rightarrow \infty$) in contact with any other system, we can measure the (absolute) temperature of the latter. For example, one can define the triple point of the system water-ice-vapor to be at $273.16\, \mathrm{K}$. Together with the definition of $k_{\mathrm{B}}=1.4 \times 10^{-23}\, \frac{\mathrm{J}}{\mathrm{K}}$, this then defines, in principle, the Kelvin temperature scale. Of course, in practice the situation is more complicated because ideal gases do not exist.
Figure 6.1: The triple point of ice water and vapor in the (P,T)(P, T) phase diagram
The Zeroth Law implies in particular: The temperature of a system in equilibrium is constant throughout the system. This has to be the case since subsystems obtained by imaginary walls are in equilibrium with each other, see the following figure:
Figure 6.2: A large system divided into subsystems I and II by an imaginary wall.
6.2 The First Law
$1^{\text{st}}$ law of thermodynamics: The amount of work required to change a thermally isolated system adiabatically from an initial state $i$ to a final state $f$ depends only on $i$ and $f$, not on the path of the process.
Figure 6.3: Change of system from initial state $i$ to final state $f$ along two different paths.
Here, by an "adiabatic change", one means a change without heat exchange. Consider a particle moving in a potential. By fixing an arbitrary reference point $X_{0}$, we can define an energy landscape
\begin{equation*}
E(X)=\int_{X_{0}}^{X} \delta W \tag{6.7}
\end{equation*}
where the integral is along any path connecting $X_{0}$ with $X$, and where $X_{0}$ is a reference point corresponding to the zero of energy. $\delta W$ is the infinitesimal change of work done along the path. In order to define more properly the notion of such integrals of "infinitesimals", we will now make a short mathematical digression on differential forms.
Differentials ("differential forms")
A 1-form (or differential) is an expression of the form
so the integral of an exact 1-form only depends on the beginning and endpoint of the path. An example of a curve $\gamma:[0,1] \rightarrow \mathbb{R}^{2}$ is given in the following figure:
Figure 6.4: A curve $\gamma:[0,1] \rightarrow \mathbb{R}^{2}$.
The converse is also true: the integral is independent of the path $\gamma$ if and only if there exists a function $f$ on $\mathbb{R}^{N}$ such that $\mathrm{d} f=\alpha$, or equivalently, if and only if $\alpha_{i}=\frac{\partial f}{\partial X_{i}}$.
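This path independence is easy to check numerically. The following minimal sketch (with our own choice of test function, not one from the text) takes $f(x,y)=x^{2}y$, so that $\alpha=\mathrm{d}f=2xy\,\mathrm{d}x+x^{2}\,\mathrm{d}y$, and integrates $\alpha$ along two different curves with the same endpoints:

```python
# Exact 1-form alpha = df for the test function f(x, y) = x**2 * y,
# i.e. alpha = 2*x*y dx + x**2 dy  (f is our own illustrative choice).
def alpha(x, y):
    return (2.0 * x * y, x ** 2)

def line_integral(path, n=20000):
    """Integrate alpha along the curve t -> path(t), t in [0, 1]."""
    total = 0.0
    for k in range(n):
        x0, y0 = path(k / n)
        x1, y1 = path((k + 1) / n)
        ax, ay = alpha((x0 + x1) / 2.0, (y0 + y1) / 2.0)  # midpoint rule
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total

def straight(t):   # straight line from (0, 0) to (1, 1)
    return (t, t)

def curved(t):     # a different curve with the same endpoints
    return (t, t ** 3)

I1 = line_integral(straight)
I2 = line_integral(curved)
print(I1, I2)      # both approximately f(1,1) - f(0,0) = 1
```

Both integrals agree with $f(1,1)-f(0,0)=1$ up to discretization error, as the exactness of $\alpha$ requires.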
The notion of a $p$-form generalizes that of a 1-form. It is an expression of the form
where $\alpha_{i_{1} \ldots i_{p}}$ are (smooth) functions of the coordinates $X_{i}$. We declare the $\mathrm{d} X_{i}$ to anticommute,
where $\sigma$ is any permutation of $p$ elements and $\operatorname{sgn} \sigma$ is its signum (see the discussion of fermions in the chapter on the ideal quantum gas). We may now introduce an operator $\mathrm{d}$ with the following properties:
(i) $\mathrm{d}(f g)=\mathrm{d} f\, g+(-1)^{p} f \,\mathrm{d} g$ if $f$ is a $p$-form and $g$ a $q$-form,
(ii) $\mathrm{d}(\lambda f+\eta g)=\lambda \,\mathrm{d} f+\eta \,\mathrm{d} g$ if $f, g$ are $p$-forms and $\lambda, \eta$ are constants,
(iii) $\mathrm{d} f=\sum_{i} \frac{\partial f}{\partial X_{i}} \,\mathrm{d} X_{i}$ for 0-forms $f$,
(iv) $\mathrm{d}^{2} X_{i}=0$.
On scalars (i.e. 0-forms) the operator $\mathrm{d}$ is defined as before, and the rules (i)-(iv) then determine it for any $p$-form. The relation (6.13) can be interpreted as saying that we should think of the differentials $\mathrm{d} X_{i}$, $i=1, \ldots, N$ as "fermionic" or "anti-commuting" variables.${}^{2}$ For instance, we then get for a 1-form $\alpha$:
The expression for $\mathrm{d} \alpha$ of a $p$-form follows similarly by applying the rules (i)-(iv). The rules imply the most important relation for $p$-forms,
Conversely, it can be shown that for any $p+1$-form $f$ on $\mathbb{R}^{N}$ such that $\mathrm{d} f=0$ we must have $f=\mathrm{d} \alpha$ for some $p$-form $\alpha$. This result is often referred to as the Poincaré lemma. An important and familiar example of this from field theory is provided by force fields $\vec{f}$ on $\mathbb{R}^{3}$. The components $f_{i}$ of the force field may be identified with the components of a 1-form $F=\sum f_{i} \,\mathrm{d} X_{i}$. The condition $\mathrm{d} F=0$ is seen to be equivalent to $\vec{\nabla} \times \vec{f}=0$, i.e. we have a conservative force field. Poincaré's lemma implies the existence of a potential $\mathcal{W}$, such that $F=-\mathrm{d} \mathcal{W}$; in vector notation, $\vec{f}=-\vec{\nabla} \mathcal{W}$. A similar statement is shown to hold for $p$-forms.
Just as a 1-form can be integrated over oriented curves (1-dimensional surfaces), a $p$-form can be integrated over an oriented $p$-dimensional surface $\Sigma$. If that surface is parameterized by $N$ functions $X_{i}(t_{1}, \ldots, t_{p})$ of $p$ parameters $(t_{1}, \ldots, t_{p}) \in U \subset \mathbb{R}^{p}$ (the ordering of which defines an orientation of the surface), we define the corresponding integral as
The value of this integral is independent of the chosen parameterization up to a sign which corresponds to our choice of orientation. The most important fact pertaining to integrals of differential forms is Gauss' theorem (also called Stokes' theorem in this context):
In particular, the integral of a form $\mathrm{d} \alpha$ vanishes if the boundary $\partial \Sigma$ of $\Sigma$ is empty.
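The boundary version of this theorem can also be verified numerically in a simple planar example (a sketch with our own choice of form, not one from the text): for $\alpha=-y\,\mathrm{d}x+x\,\mathrm{d}y$ one has $\mathrm{d}\alpha=2\,\mathrm{d}x\,\mathrm{d}y$, so the integral of $\alpha$ around the unit circle must equal the integral of $\mathrm{d}\alpha$ over the unit disc, namely $2\pi$:

```python
import math

# alpha = -y dx + x dy, so d(alpha) = 2 dx dy.  We compare the
# boundary integral over the unit circle with the surface integral
# of d(alpha) over the unit disc (Gauss'/Stokes' theorem).
def boundary_integral(n=100000):
    total = 0.0
    for k in range(n):
        t0 = 2.0 * math.pi * k / n
        t1 = 2.0 * math.pi * (k + 1) / n
        x0, y0 = math.cos(t0), math.sin(t0)
        x1, y1 = math.cos(t1), math.sin(t1)
        xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # midpoint rule
        total += -ym * (x1 - x0) + xm * (y1 - y0)
    return total

b_val = boundary_integral()   # integral of alpha over the circle
s_val = 2.0 * math.pi         # integral of d(alpha) = 2 dx dy over the disc
print(b_val, s_val)           # agree up to discretization error
```

The two numbers agree to high accuracy, illustrating the theorem for $\Sigma$ the unit disc and $\partial\Sigma$ the unit circle.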
Using the language of differentials, the $1^{\text{st}}$ law of thermodynamics may also be stated as saying that, in the absence of heat exchange, the infinitesimal work is an exact 1-form,
\begin{equation*}
\mathrm{d} E=\delta W \tag{6.20}
\end{equation*}
This relation is best viewed as the definition of the infinitesimal heat change delta Q\delta Q. Thus, we could say that the first law is just energy conservation, where energy can consist of either mechanical work or heat. We may then write
from which it can be seen that $\delta Q$ is a 1-form depending on the variables $(E, X_{1}, \ldots, X_{n})$.
An overview of several thermodynamic forces and displacements is given in the following table:
| System | Force $J_{i}$ | Displacement $X_{i}$ |
| :---: | :---: | :---: |
| wire | tension $F$ | length $L$ |
| film | surface tension $\tau$ | area $A$ |
| fluid/gas | pressure $P$ | volume $V$ |
| magnet | magnetic field $\vec{B}$ | magnetization $\vec{M}$ |
| electricity | electric field $\vec{E}$ | polarization $\vec{\Pi}$ |
| | stat. potential $\phi$ | charge $q$ |
| chemical | chemical potential $\mu$ | particle number $N$ |
Table 6.1: Some thermodynamic forces and displacements for various types of systems.
Since $\delta Q$ is not an exact differential (in particular $\mathrm{d} \delta Q \neq 0$) we have
So, there does not exist a function $Q=Q(V, A, N, \ldots)$ such that $\delta Q=\mathrm{d} Q$! Traditionally, one refers to processes where $\delta Q \neq 0$ as "non-adiabatic", i.e. heat is transferred.
6.3 The Second Law
$\mathbf{2}^{\text{nd}}$ law of thermodynamics (Kelvin): There are no processes in which heat goes over from a reservoir, is completely converted to other forms of energy, and nothing else happens.
One important consequence of the $2^{\text{nd}}$ law is the existence of a state function $S$, called entropy. As before, we denote the $n$ "displacement variables" generically by $X_{i} \in \{V, N, \ldots\}$ and the "forces" by $J_{i} \in \{-P, \mu, \ldots\}$, and consider equilibrium states labeled by $(E, \{X_{i}\})$ in an $(n+1)$-dimensional space. We consider within this space the "adiabatic" submanifold $\mathcal{A}$ of all states that can be reached from a given state $(E^{*}, \{X_{i}^{*}\})$ by means of a reversible and quasi-static (i.e. sufficiently slowly performed) process. On this submanifold we must have
i.e. $\int_{\gamma} \delta Q=0$ for any closed curve $\gamma$ in $\mathcal{A}$. Otherwise there would exist processes disturbing the energy balance (through the exchange of heat), and we could then choose a sign of $\delta Q$ such that work is performed by the system by converting heat energy into work, which is impossible by the $2^{\text{nd}}$ law.
We choose a (not uniquely defined) function $S$ labeling the different submanifolds $\mathcal{A}$:
Figure 6.5: Sketch of the submanifolds $\mathcal{A}$.
This means that $\mathrm{d} S$ is proportional to $\mathrm{d} E-\sum_{i=1}^{n} J_{i} \,\mathrm{d} X_{i}$. Thus, at each point $(E, \{X_{i}\})$ there is a function $\Theta(E, X_{1}, \ldots, X_{n})$ such that
$\Theta$ can be identified with the temperature $T\,[\mathrm{K}]$ for a suitable choice of $S=S(E, X_{1}, \ldots, X_{n})$,
which then uniquely defines SS. This is seen for instance by comparing the coefficients in
\begin{equation*}
\frac{1}{T}=\frac{\partial S\left(E,\left\{X_{j}\right\}\right)}{\partial E} \quad \text { and } \quad-\frac{J_{i}}{T}=\frac{\partial S\left(E,\left\{X_{j}\right\}\right)}{\partial X_{i}} \tag{6.29}
\end{equation*}
We recognize the first of those relations as the defining relation for temperature, whereas the second includes the definitions of the pressure and chemical potential. These relations were already stated in the microcanonical ensemble (cf. section 4.2.1). We can now rewrite (6.26) as
By comparing this formula with that for energy conservation for a process without heat transfer, we identify
\begin{equation*}
\delta Q=\text{heat transfer}=T \,\mathrm{d} S \quad \Rightarrow \quad \mathrm{d} S=\frac{\delta Q}{T} \quad \text{(noting that } \mathrm{d}(\delta Q) \neq 0 \text{)}. \tag{6.31}
\end{equation*}
Equation (6.30), which was derived for quasi-static processes, is the most important equation in thermodynamics.
Example: As an illustration, we calculate the adiabatic curves $\mathcal{A}$ for an ideal gas. The defining relation is, with $n=1$ and $X_{1}=V$ in this case,
$$0=\mathrm{d} E+P \,\mathrm{d} V$$
Since $P V=N k_{B} T$ and $E=\frac{3}{2} N k_{B} T$ for the ideal gas, we find
\begin{equation*}
0=\mathrm{d} E+\frac{2}{3} \frac{E}{V} \,\mathrm{d} V \tag{6.33}
\end{equation*}
Thus, we can parametrize the adiabatic $\mathcal{A}$ by $E=E(V)$, such that $\mathrm{d} E=\frac{\partial E(V)}{\partial V} \,\mathrm{d} V$ on $\mathcal{A}$. We then obtain
which hold generally (cf. section 4.2.1, eq. (4.17)). For an ideal gas ($P V=N k_{B} T$ and $E=\frac{3}{2} N k_{B} T$) we thus find
\begin{aligned}
-\frac{\partial E}{\partial V} V & =k_{B} N \frac{\partial E}{\partial S} \\
E & =\frac{3}{2} k_{B} N \frac{\partial E}{\partial S}
\end{aligned}
i.e. the formulas found before in the microcanonical ensemble, for a suitable choice of the entropy $S^{*}=S(E^{*}, V^{*})$ at the reference point.${}^{3}$ The entropy at the reference point depends on $N$ and on the microscopic parameters of the system (e.g. the particle mass $m$), which clearly cannot be determined in the present context, since this information is contained neither in the equations of state nor, of course, in the first law of thermodynamics.
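The adiabat equation (6.33) can be checked numerically: solving $\mathrm{d}E/\mathrm{d}V=-\frac{2}{3}E/V$ should reproduce the invariant $E\,V^{2/3}=\text{const}$ (equivalently $TV^{2/3}=\text{const}$, $PV^{5/3}=\text{const}$). A minimal sketch, with arbitrary units and starting point:

```python
# Integrate dE/dV = -(2/3) E/V along an adiabat of the monoatomic
# ideal gas and check the invariant E * V**(2/3) = const.
def adiabat(E0, V0, V1, steps=20000):
    """Energy E(V1) on the adiabat through (E0, V0), via RK2 steps."""
    E, V = E0, V0
    dV = (V1 - V0) / steps
    for _ in range(steps):
        k1 = -(2.0 / 3.0) * E / V
        k2 = -(2.0 / 3.0) * (E + 0.5 * dV * k1) / (V + 0.5 * dV)  # midpoint
        E += dV * k2
        V += dV
    return E

E0, V0 = 1.0, 1.0
for V1 in (2.0, 4.0, 8.0):
    E1 = adiabat(E0, V0, V1)
    print(V1, E1 * V1 ** (2.0 / 3.0))   # stays at E0 * V0**(2/3) = 1
```

The printed invariant stays at its initial value along the whole adiabat, in agreement with $E\propto V^{-2/3}$.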
6.4 Cyclic processes
6.4.1 The Carnot Engine
We next discuss the Carnot engine for an ideal (monoatomic) gas. As discussed in section 4.2, the ideal gas is characterized by the relations:
\begin{equation*}
E=\frac{3}{2} N k_{B} T=\frac{3}{2} P V . \tag{6.42}
\end{equation*}
We consider the cyclic process consisting of the following steps:
I $\rightarrow$ II: isothermal expansion at $T=T_{H}$,
II $\rightarrow$ III: adiabatic expansion ($\delta Q=0$),
III $\rightarrow$ IV: isothermal compression at $T=T_{C}$,
IV $\rightarrow$ I: adiabatic compression,
where we assume $T_{H}>T_{C}$.
We want to work out the efficiency $\eta$, which is defined as
is the total heat added to the system (analogously, $\Delta Q_{\mathrm{out}}=\int_{III}^{IV} \delta Q$ is the total heat given off by the system into a colder reservoir), and where
$$\Delta W=\oint \delta W=\left(\int_{I}^{II}+\int_{II}^{III}+\int_{III}^{IV}+\int_{IV}^{I}\right) \delta W$$
is the total work done by the system. We may also write $\delta Q=T \,\mathrm{d} S$ and $\delta W=P \,\mathrm{d} V$ (or more generally $\delta W=-\sum_{i=1}^{n} J_{i} \,\mathrm{d} X_{i}$ if other types of mechanical/chemical work are performed by the system). By definition no heat exchange takes place during II $\rightarrow$ III and IV $\rightarrow$ I.
We now wish to calculate $\eta_{\text{Carnot}}$. We can for instance take $P$ and $V$ as the variables to describe the process. We have $P V=$ const. for isothermal processes by (6.42). To calculate the adiabatics, we could use the results from above and change the variables from $(E, V) \rightarrow (P, V)$ using (6.42), but it is just as easy to do this from scratch: We start with $\delta Q=0$ for an adiabatic process. From this it follows that
\begin{equation*}
0=\mathrm{d} E+P \,\mathrm{d} V \tag{6.44}
\end{equation*}
Since on adiabatics we may take $P=P(V)$, this yields
This fundamental relation for the efficiency of a Carnot cycle can also be derived using the variables $(T, S)$ instead of $(P, V)$, which also reveals the distinguished role played by this process. As $\mathrm{d} T=0$ for isotherms and $\mathrm{d} S=0$ for adiabatic processes, the Carnot cycle is just a rectangle in the $T$-$S$ diagram:
Figure 6.8: The Carnot cycle in the $(T, S)$-diagram.
We evidently have for the total heat added to the system:
\begin{equation*}
\Delta Q_{\mathrm{in}}=\int_{I}^{II} \delta Q=\int_{I}^{II} T \,\mathrm{d} S=T_{H}\left(S_{II}-S_{I}\right) \tag{6.52}
\end{equation*}
To compute $\Delta W$, the total mechanical work done by the system, we observe that (as $\oint \mathrm{d} E=0$)
\begin{aligned}
\Delta W & =\oint \delta W=\oint P \,\mathrm{d} V \\
& =\oint(P \,\mathrm{d} V+\mathrm{d} E) \\
& =\oint T \,\mathrm{d} S .
\end{aligned}
If $A$ is the domain enclosed by the rectangular curve describing the process in the $T$-$S$ diagram, Gauss' theorem gives
\begin{aligned}
\Delta W & =\oint T \,\mathrm{d} S=\int_{A} \mathrm{d}(T \,\mathrm{d} S)=\int_{A} \mathrm{d} T \,\mathrm{d} S \\
& =\left(T_{H}-T_{C}\right)\left(S_{II}-S_{I}\right)
\end{aligned}
from which it immediately follows that the efficiency $\eta_{\text{Carnot}}$ is given by
as before. Since $T_{C}>0$, the efficiency can never be $100\%$.
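The rectangle picture makes the efficiency a one-line computation. A minimal numerical sketch (temperatures and the entropy difference are illustrative example values, not from the text):

```python
# Carnot efficiency read off from the rectangle in the T-S diagram:
# Delta W = (T_H - T_C) * dS  and  Delta Q_in = T_H * dS.
def eta_carnot(T_hot, T_cold):
    """Carnot efficiency 1 - T_C / T_H."""
    return 1.0 - T_cold / T_hot

T_H, T_C = 500.0, 300.0   # K, example reservoir temperatures
dS = 2.0                  # S_II - S_I, arbitrary units

work = (T_H - T_C) * dS   # area of the rectangle in the T-S diagram
heat_in = T_H * dS
print(work / heat_in, eta_carnot(T_H, T_C))   # both 0.4
```

The ratio of the rectangle area to the heat injected along the hot isotherm reproduces $\eta_{\text{Carnot}}=1-T_{C}/T_{H}$ exactly.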
6.4.2 General Cyclic Processes
Consider now the more general cycle given by the curve $C$ in the $(T, S)$-diagram depicted in the figure below:
Figure 6.9: A generic cyclic process in the $(T, S)$-diagram.
We define $C_{\pm}$ to be the parts of the boundary curve $C$ where heat is injected resp. given off. Then we have $\mathrm{d} S>0$ on $C_{+}$ and $\mathrm{d} S<0$ on $C_{-}$. For such a process, we define the efficiency $\eta=\eta(C)$ as before by the ratio of net work $\Delta W$ and injected heat $\Delta Q_{\mathrm{in}}$:
The quantities $\Delta W$ and $\Delta Q_{\mathrm{in}}$ are then calculated as
\begin{aligned}
\Delta W & =-\oint_{C} \delta W=\oint_{C}(T \,\mathrm{d} S-\mathrm{d} E)=\oint_{C} T \,\mathrm{d} S \\
\Delta Q_{\text{in}} & =\int_{C_{+}} T \,\mathrm{d} S
\end{aligned}
from which it follows that the efficiency eta=eta(C)\eta=\eta(C) is given by
\begin{equation*}
\eta=\frac{\oint_{C} T \,\mathrm{d} S}{\int_{C_{+}} T \,\mathrm{d} S}=1+\frac{\int_{C_{-}} T \,\mathrm{d} S}{\int_{C_{+}} T \,\mathrm{d} S}=1-\frac{\Delta Q_{\mathrm{out}}}{\Delta Q_{\mathrm{in}}} \tag{6.55}
\end{equation*}
Now, if the curve CC is completely contained between two isotherms at temperatures T_(H) > T_(C)T_{H}>T_{C}, as in the above figure, then
\begin{aligned}
0 \leqslant \int_{C_{+}} T \,\mathrm{d} S & \leqslant T_{H} \int_{C_{+}} \mathrm{d} S \quad\left(\text{as } \mathrm{d} S>0 \text{ on } C_{+}\right), \\
\int_{C_{-}} T \,\mathrm{d} S & \leqslant T_{C} \int_{C_{-}} \mathrm{d} S \leqslant 0 \quad\left(\text{as } \mathrm{d} S<0 \text{ on } C_{-}\right) .
\end{aligned}
The efficiency $\eta_{C}$ of our general cycle $C$ can now be estimated as
where we used the above inequalities as well as $0=\oint \mathrm{d} S=\int_{C_{+}} \mathrm{d} S+\int_{C_{-}} \mathrm{d} S$. Thus, we conclude that an arbitrary process is never more efficient than the Carnot process operating between the same extreme temperatures. This is why the Carnot process plays a distinguished role.
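The bound can be seen concretely for a smooth cycle. The following sketch takes an elliptical cycle $T=T_{0}+a\cos t$, $S=b\sin t$ in the $T$-$S$ diagram (the parameters are our own illustrative choices) and compares its efficiency with the Carnot bound $1-T_{\min}/T_{\max}$:

```python
import math

# An elliptical cycle in the T-S diagram vs. the Carnot cycle
# between its extreme temperatures T0 - a and T0 + a.
T0, a, b = 400.0, 100.0, 1.0    # T = T0 + a*cos(t), S = b*sin(t)

n = 200000
work = 0.0      # closed integral of T dS
heat_in = 0.0   # contribution of the arc C_+ where dS > 0
for k in range(n):
    t = 2.0 * math.pi * (k + 0.5) / n
    dt = 2.0 * math.pi / n
    T = T0 + a * math.cos(t)
    dS = b * math.cos(t) * dt      # (dS/dt) dt
    work += T * dS
    if dS > 0.0:
        heat_in += T * dS

eta = work / heat_in
eta_c = 1.0 - (T0 - a) / (T0 + a)  # Carnot bound between T_min, T_max
print(eta, eta_c)   # the smooth cycle is strictly less efficient
```

For these parameters the ellipse reaches only about $\eta\approx 0.33$, well below the Carnot bound $0.4$, in line with the general estimate above.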
We can get a more intuitive understanding of this important finding by considering the following process:
The heat $\Delta Q_{\text{in}}$ is given by $\Delta Q_{\text{in}}=T_{H} \Delta S$, and as before $\Delta W=\int_{C} T \,\mathrm{d} S=\int_{A} \mathrm{d} T \,\mathrm{d} S$. Thus, $\Delta W$ is the area $A$ enclosed by the closed curve $C$. This is clearly smaller than the area enclosed by the corresponding Carnot cycle (dashed rectangle). Now divide a general cyclic process into $C=C_{1} \cup C_{2}$, as sketched in the following figure:
Figure 6.10: A generic cyclic process divided into two parts by an isotherm at temperature $T_{I}$.
This describes two cyclic processes acting one after the other, where the heat given off during cycle $C_{1}$ is injected during cycle $C_{2}$ at temperature $T_{I}$. It follows from the discussion above that
\begin{equation*}
\eta\left(C_{2}\right)=\frac{\Delta W_{2}}{\Delta Q_{2, \text{in}}} \leqslant \frac{T_{I}-T_{C}}{T_{I}}=1-\frac{T_{C}}{T_{I}} \tag{6.57}
\end{equation*}
which means that the cycle $C_{2}$ is less efficient than the Carnot process acting between temperatures $T_{I}$ and $T_{C}$. It remains to show that the cycle $C_{1}$ is also less efficient than the Carnot cycle acting between temperatures $T_{H}$ and $T_{I}$. The work $\Delta W_{1}$ done along $C_{1}$ is again smaller than the area enclosed by the latter Carnot cycle, i.e. we have $\Delta W_{1} \leqslant\left(T_{H}-T_{I}\right) \Delta S$. Furthermore, we must have $\Delta Q_{1, \text{in}} \geqslant \Delta Q_{1, \text{out}}=T_{I} \Delta S$, which yields
Thus, the cycle $C_{1}$ is less efficient than the Carnot cycle acting between temperatures $T_{H}$ and $T_{I}$. It follows that the cycle $C=C_{1} \cup C_{2}$ must be less efficient than the Carnot cycle acting between temperatures $T_{H}$ and $T_{C}$.
6.4.3 The Diesel Engine
Another example of a cyclic process is the Diesel engine. The idealized version of this process consists of the following 4 steps:
I $\rightarrow$ II: isentropic (adiabatic) compression,
II $\rightarrow$ III: reversible heating at constant pressure,
III $\rightarrow$ IV: adiabatic expansion with work done by the expanding fluid,
IV $\rightarrow$ I: reversible cooling at constant volume.
Figure 6.11: The process describing the Diesel engine in the $(P, V)$-diagram.
As before, we define the thermal efficiency to be
$$\eta_{\text{Diesel}}=\frac{\Delta W}{\Delta Q_{\mathrm{in}}}=\frac{\left(\int_{I}^{II}+\int_{II}^{III}+\int_{III}^{IV}+\int_{IV}^{I}\right) T \,\mathrm{d} S}{\int_{II}^{III} T \,\mathrm{d} S}$$
As in the discussion of the Carnot process we use an ideal gas, with $P V=N k_{B} T$, $E=\frac{3}{2} P V$, and $\mathrm{d} E=T \,\mathrm{d} S-P \,\mathrm{d} V$. Since $\mathrm{d} S=0$ on the paths I $\rightarrow$ II and III $\rightarrow$ IV, it follows that
\begin{equation*}
\eta_{\text{Diesel}}=1+\frac{\int_{IV}^{I} T \,\mathrm{d} S}{\int_{II}^{III} T \,\mathrm{d} S} \tag{6.58}
\end{equation*}
Using (6.42), the integrals in this expression are easily calculated as
\begin{aligned}
\int_{IV}^{I} T \,\mathrm{d} S & =\int_{IV}^{I}(\mathrm{d} E+P \,\mathrm{d} V)=\int_{IV}^{I}\Big(\frac{3}{2} V \,\mathrm{d} P+\frac{5}{2} P \underbrace{\mathrm{d} V}_{=0}\Big) \\
& =\frac{3}{2} N k_{B}\left(T_{I}-T_{IV}\right), \\
\int_{II}^{III} T \,\mathrm{d} S & =\int_{II}^{III}(\mathrm{d} E+P \,\mathrm{d} V)=\int_{II}^{III}\Big(\frac{3}{2} V \underbrace{\mathrm{d} P}_{=0}+\frac{5}{2} P \,\mathrm{d} V\Big) \\
& =\frac{5}{2} N k_{B}\left(T_{III}-T_{II}\right),
\end{aligned}
which means that the efficiency $\eta_{\text{Diesel}}$ is given by
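Combining (6.58) with the two integrals above gives $\eta_{\text{Diesel}}=1-\frac{3}{5}\,\frac{T_{IV}-T_{I}}{T_{III}-T_{II}}$ for the monoatomic gas ($\gamma=5/3$). A minimal numerical sketch, with the compression ratio $r=V_{I}/V_{II}$, cutoff ratio $\rho=V_{III}/V_{II}$, and intake temperature chosen as illustrative example values:

```python
# Evaluate the Diesel efficiency for a monoatomic ideal gas (gamma = 5/3)
# by propagating the temperature through the four corner states.
gamma = 5.0 / 3.0
r, rho = 18.0, 2.0     # compression and cutoff ratios, example values
T_I = 300.0            # K, example intake temperature

T_II = T_I * r ** (gamma - 1.0)             # adiabatic compression I -> II
T_III = T_II * rho                          # isobaric heating II -> III
T_IV = T_III * (rho / r) ** (gamma - 1.0)   # adiabatic expansion III -> IV

# eta_Diesel = 1 - (1/gamma) (T_IV - T_I) / (T_III - T_II),
# i.e. eq. (6.58) evaluated with the integrals above (1/gamma = 3/5).
eta = 1.0 - (1.0 / gamma) * (T_IV - T_I) / (T_III - T_II)
print(eta)   # about 0.81 for these parameters
```

As expected, the result lies below the Carnot efficiency between the extreme temperatures $T_{I}$ and $T_{III}$ of the cycle.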
6.5 Thermodynamic potentials
The first law can be rewritten in terms of other "thermodynamic potentials", which are sometimes useful, and which are naturally related to different equilibrium ensembles.
We start from the $1^{\text{st}}$ law of thermodynamics in the form
By (6.60), $E$ is naturally viewed as a function of $(S, V, N)$ (or more generally of $S$ and $\{X_{i}\}$). To get a thermodynamic potential that naturally depends on $(T, V, N)$ (or more generally, $T$ and $\{X_{i}\}$), we form the free energy
\begin{equation*}
F=E-T S . \tag{6.61}
\end{equation*}
By the first of these equations, the entropy S=S(T,V,N)S=S(T, V, N) is naturally a function of (T,V,N)(T, V, N), which suggests a relation between FF and the canonical ensemble. As discussed in section 4.3, in this ensemble we have
In the first of these equations, $S$ is naturally viewed as a function of the variables $(T, \mu, V)$, suggesting a relationship between $G$ and the grand canonical ensemble. As discussed in section 4.4, in this ensemble we have
We now seek a function $G$ satisfying $S=-\left.\frac{\partial G}{\partial T}\right|_{\mu, V}$ and $N=-\left.\frac{\partial G}{\partial \mu}\right|_{T, V}$. An easy calculation reveals
The second relation can be demonstrated in a similar way (with $N=\langle\hat{N}\rangle$). To get a function $H$ which naturally depends on the variables $(P, T, N)$, we form the free enthalpy (or Gibbs potential)
The free${}^{4}$ enthalpy is often used in the context of chemical processes, because these naturally occur at constant atmospheric pressure. For processes at constant pressure $P$ (isobaric processes) we have
\begin{equation*}
\mathrm{d} H=-S \,\mathrm{d} T+\mu \,\mathrm{d} N \tag{6.74}
\end{equation*}
Assuming that the entropy $S=S(E, V, N_{i}, \ldots)$ is an extensive quantity, we can derive relations between the various potentials. The extensivity of $S$ means that
\begin{equation*}
S\left(\lambda E, \lambda V, \lambda N_{i}\right)=\lambda S\left(E, V, N_{i}\right), \quad \text{for } \lambda>0 \tag{6.75}
\end{equation*}
Taking the partial derivative $\frac{\partial}{\partial \lambda}$ of this expression gives
| Thermodynamic potential | Definition | First Law | Natural variables |
| :---: | :---: | :---: | :---: |
| entropy $S$ | fundamental | $T \,\mathrm{d} S=\mathrm{d} E+P \,\mathrm{d} V-\mu \,\mathrm{d} N$ | $E, V, N$ |
| free energy $F$ | $F=E-T S$ | $\mathrm{d} F=-S \,\mathrm{d} T-P \,\mathrm{d} V+\mu \,\mathrm{d} N$ | $T, V, N$ |
| grand potential $G$ | $G=E-T S-\mu N$ | $\mathrm{d} G=-S \,\mathrm{d} T-P \,\mathrm{d} V-N \,\mathrm{d} \mu$ | $T, V, \mu$ |
| free enthalpy $H$ | $H=E-T S+P V$ | $\mathrm{d} H=-S \,\mathrm{d} T+V \,\mathrm{d} P+\mu \,\mathrm{d} N$ | $T, P, N$ |
Table 6.2: Relationship between various thermodynamic potentials
The relationship between the various potentials can be further elucidated by means of the Legendre transform. This characterization is important because it makes transparent the convexity resp. concavity properties of $G$, $F$ following from the convexity of $S$.
Example: Virial expansion and van der Waals equation of state
As an example for the use of potentials, we employ the cluster expansion discussed in section 4.6 to derive the equation of state for a realistic monoatomic gas. Calculations are left as an exercise (problem B.26).
where $Y$ is the grand canonical partition function, $\lambda$ is the thermal de Broglie wavelength, and $z=e^{\beta \mu}$ is the fugacity. Using the Gibbs-Duhem relation and (6.70), the cluster expansion gives an expansion of the pressure,
This is known as the virial expansion, and the $B_{l}$ are known as the virial coefficients. The first order contribution yields the equation of state of the classical monoatomic ideal gas. It corresponds to the situation where the particles do not interact. For a realistic dilute gas, it is reasonable to expect that low orders in the expansion give a good approximation; we consider the second order. At this order, we obtain
\begin{equation*}
P \approx k_{B} T n\left(1+B_{2}(T) n\right) . \tag{6.80}
\end{equation*}
Let us compute $B_{2}$ under the assumption that the interaction is described by the spherically symmetric two-body potential
where $r_{0}$ is twice the radius of the atoms. This potential models a hard-core repulsion at atomic distances and a moderately decreasing attraction at large distances. Using the integral formula (4.119) for $b_{2}$, in the high-temperature limit we find
is the effective volume of one particle. Plugging this into the second-order virial expansion (6.80), we get
$$P \approx k_{B} T n+\frac{V_{a}}{2}\left(k_{B} T-u_{0}\right) n^{2} .$$
Substituting
$$1+x \approx \frac{1}{1-x},$$
this can be written in the form
\begin{equation*}
\left(P+a\left(\frac{N}{V}\right)^{2}\right)(V-N b) \approx N k_{B} T \tag{6.81}
\end{equation*}
Equation (6.81) is known as the van der Waals equation. The coefficient $b$ can be interpreted as a volume per particle by which the volume available to the system is reduced due to the mutual exclusion of particles. Since $r_{0}$ is the distance of minimal approach, i.e., twice the radius of the particles, $b$ amounts to twice the effective volume of the particles. The coefficient $a$ decreases the pressure $P$ in the system due to the attractive particle interaction. In applications we should bear in mind that the equation can be expected to give a good approximation when the volume per particle available is much larger than the effective volume of the particles, $V/N \gg b$, and for high temperatures, $u_{0} \ll k_{B} T$.
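The size of the van der Waals correction is easy to gauge numerically. In the following sketch the coefficients are obtained by matching (6.80) to (6.81), which gives $b=V_{a}/2$ and $a=V_{a}u_{0}/2$; the microscopic parameters $V_{a}$ and $u_{0}$ are illustrative example values, not from the text:

```python
# Pressure from the van der Waals equation (6.81) vs. the ideal gas law.
kB = 1.380649e-23    # J/K
Va = 1.0e-28         # m^3, effective volume of one particle (example value)
u0 = 1.0e-21         # J, strength of the attraction (example value)

b = Va / 2.0         # excluded volume per particle (from matching (6.80))
a = Va * u0 / 2.0    # attraction coefficient (from matching (6.80))

def P_van_der_waals(n, T):
    """van der Waals pressure as a function of density n = N/V."""
    return n * kB * T / (1.0 - n * b) - a * n ** 2

def P_ideal(n, T):
    return n * kB * T

n, T = 1.0e25, 300.0    # dilute regime: n*b << 1 and u0 < kB*T
print(P_ideal(n, T), P_van_der_waals(n, T))
# In this regime the correction to the ideal gas law is small.
```

For these parameters the relative deviation from the ideal gas law is well below a percent, consistent with the stated validity range of the expansion.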
6.6 Chemical Equilibrium
We consider chemical reactions characterized by a $k$-tuple $\underline{r}=(r_{1}, \ldots, r_{k})$ of integers corresponding to a chemical reaction of the form
is described by $\chi_{1}=\mathrm{C}$, $\chi_{2}=\mathrm{O}_{2}$, $\chi_{3}=\mathrm{CO}_{2}$ and $r_{1}=-1$, $r_{2}=-1$, $r_{3}=+1$, i.e. $\underline{r}=(-1,-1,+1)$. The full system is described by some complicated Hamiltonian $H(V)$ and number operators $\hat{N}_{i}$ for the $i$-th compound. Since the dynamics can change the particle number, we will have $[H(V), \hat{N}_{i}] \neq 0$ in general. We imagine that an entropy $S(E, V, \{N_{i}\})$ can be assigned to an ensemble of states with energy between $E-\Delta E$ and $E$, and average particle numbers $\{N_{i}=\langle\hat{N}_{i}\rangle\}$, but we note that the definition of $S$ in microscopic terms is far from obvious because $\hat{N}_{i}$ is not a constant of motion.
The entropy should be maximized in equilibrium. Since $\underline{N}=(N_{1}, \ldots, N_{k})$ changes by $\underline{r}=(r_{1}, \ldots, r_{k})$ in a reaction, the necessary condition for equilibrium is
Since by definition $\left.\frac{\partial S}{\partial N_{i}}\right|_{V, E}=-\frac{\mu_{i}}{T}$, in equilibrium we must have
Let us now assume that in equilibrium we can use the expression for $\mu_{i}$ of an ideal gas with $k$ distinguishable components and $N_{i}$ indistinguishable particles of the $i$-th component. This is basically the assumption that interactions contribute negligibly to the entropy of the equilibrium state. According to the discussion in section 4.2.3 the total entropy is given by
\begin{equation*}
S=\sum_{i=1}^{k} S_{i}+\Delta S \tag{6.85}
\end{equation*}
where $S_{i}=S(E_{i}, V_{i}, N_{i})$ is the entropy of the $i$-th species, $\Delta S$ is the mixing entropy, and we have
where $c_{i}=\frac{N_{i}}{N}$ is the concentration of the $i$-th component. Let $\bar{\mu}_{i}$ be the chemical potential of the $i$-th species without taking into account the contribution due to the mixing:
with $\Delta h=\sum_{i} r_{i} h_{i}=$ enthalpy increase for one reaction. The above relation is sometimes called the "mass-action law". It is clearly not an exact relation in general, because we have treated the constituents as ideal gases. Nevertheless, it is often a surprisingly good approximation.
6.7 Phase Co-Existence and Clausius-Clapeyron Relation
We consider a system comprised of $k$ compounds with particle numbers $N_{1}, \ldots, N_{k}$. It is assumed that chemical reactions are not possible, so each $N_{i}$ is conserved. The entropy is assumed to be given as a function $S=S(\underline{X})$, where $\underline{X}=(E, V, N_{1}, \ldots, N_{k})$ (here we also include $E$ among the thermodynamic coordinates). We assume that the system is in an equilibrium state with $\varphi$ coexisting pure phases, labeled by $\alpha=1, \ldots, \varphi$. The equilibrium state of each phase $\alpha$ is thus characterized by some vector $\underline{X}^{(\alpha)}$, or rather the corresponding ray $\{\lambda \underline{X}^{(\alpha)} \mid \lambda>0\}$, since we can scale up the volume, energy, and particle numbers by a common positive constant.
Examples:
1) Consider the following example of a phase boundary between coffee and sugar:
Figure 6.12: The phase boundary between solution and a solute.
In this example we have $k=2$ compounds (coffee, sugar) with $\varphi=2$ coexisting phases (solution, sugar at the bottom). The solid phase and the coffee/sugar solution phase are described by vectors
respectively. In this example we need 2 independent parameters to describe phase equilibrium, such as the temperature $T$ of the coffee and the concentration $c$ of sugar, i.e. the sweetness of the coffee.
2) Another example is the ice-vapor-water diagram, where we only have $k=1$ substance (water). At the triple point, we have $\varphi=3$ coexisting phases. At the water-vapor boundary, we have $\varphi=2$ coexisting phases, and we need one parameter to fix where we are on this phase boundary. Away from any phase boundary, only $\varphi=1$ phase is present.
The temperature $T$, pressure $P$, and chemical potentials $\mu_{i}$ must have the same value in each phase, i.e. we have for all $\alpha$:
As an example consider the following phase diagram for 6 phases:
Figure 6.13: Imaginary phase diagram for the case of 6 different phases. At each point on a phase boundary which is not an intersection point, $\varphi=2$ phases are supposed to coexist. At each intersection point $\varphi=4$ phases are supposed to coexist.
From the discussion in the previous sections we know that
(1) $S$ is extensive in equilibrium:
as long as $\sum_{\alpha} \lambda^{(\alpha)}=1$, $\lambda^{(\alpha)} \geqslant 0$. Since the coexisting phases are in equilibrium with each other, we must have "$=$" rather than "$<$" in the above inequality. Otherwise, the entropy would be maximized for some non-trivial linear combination $\underline{X}_{\max}=\sum_{\alpha} \lambda^{(\alpha)} \underline{X}^{(\alpha)}$, and only the one homogeneous phase given by this maximizer $\underline{X}_{\max}$ could be realized.
By (1) and (2) it follows that in the region $C \subset \mathbb{R}^{2+k}$ where several phases can coexist, $S$ is linear, $S(\underline{X})=\underline{\xi} \cdot \underline{X}$ for all $\underline{X} \in C$ with $\underline{\xi}=$ const. in $C$, and $C$ consists of positive linear combinations
in other words, the coexistence region $C$ is the convex cone generated by the vectors $\underline{X}^{(\alpha)}, \alpha=1, \ldots, \varphi$. The set of points in the space $(P, T, \{c_{i}\})$ where equilibrium between $\varphi$ phases holds (i.e. the phase boundaries in a $P$-$T$-$\{c_{i}\}$ diagram) can be characterized as follows. Since $\underline{\xi}$ is constant within the convex cone $C$, we have for any $\underline{X} \in C$, any $\alpha=1, \ldots, \varphi$, and any $I$:
where we denote the $k+2$ components of $\underline{X}$ by $\{X_{I}\}$. Multiplying this equation by $\mathrm{d} X_{I}$ and summing over $I$, this relation can be written as
which must hold in the coexistence region $C$. Since the equation must hold for all $\alpha=1, \ldots, \varphi$, the coexistence region is subject to $\varphi$ constraints, and we therefore need $f=2+k-\varphi$ parameters to describe the coexistence region in the phase diagram. This statement is sometimes called the Gibbs phase rule.
Examples: Consider again the example of a phase boundary between coffee and sugar, where we had $k=2$ compounds (coffee, sugar) with $\varphi=2$ coexisting phases (solution, sugar at the bottom). The phase rule tells us that we need $f=2+2-2=2$ independent parameters to describe phase equilibrium, which is correct. In the ice-vapor-water diagram we only had $k=1$ substance (water). At the triple point, we have $\varphi=3$ coexisting phases and $f=1+2-3=0$, which is consistent because a point is a 0-dimensional manifold. At the water-ice coexistence line, we have $\varphi=2$ and $f=1+2-2=1$, which is the correct dimension of a line.
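The counting in these examples is trivially mechanized. The following sketch (the function name and the sanity check are our own additions, not part of the script) just evaluates $f=2+k-\varphi$ for the cases above:

```python
def gibbs_phase_rule(k, phi):
    """Gibbs phase rule: number of free parameters f = 2 + k - phi
    for k compounds and phi coexisting phases."""
    f = 2 + k - phi
    if f < 0:
        # generically, more than 2 + k phases cannot coexist
        raise ValueError("too many coexisting phases")
    return f

# coffee/sugar: k = 2 compounds, 2 phases (solution, solid sugar)
print(gibbs_phase_rule(2, 2))  # -> 2
# water at the triple point: k = 1, 3 phases
print(gibbs_phase_rule(1, 3))  # -> 0
# water-ice coexistence line: k = 1, 2 phases
print(gibbs_phase_rule(1, 2))  # -> 1
```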
Now consider a 1-component system $(k=1)$, such that $\underline{X}=(E, V, N)$ and $\underline{\xi}=\left(\frac{1}{T}, \frac{P}{T},-\frac{\mu}{T}\right)$. The $\varphi$ different phases are described by
We assume that the particle numbers are equal in both phases, $N^{(1)}=N^{(2)} \equiv N$, which
means that $f=2+k-\varphi=1$. Thus,
As an application, consider a solid (phase 1) in equilibrium with its vapor (phase 2). For the volumes we should have $V^{(1)} \ll V^{(2)}$, from which it follows that $\Delta V=V^{(1)}-V^{(2)} \approx -V^{(2)}$. For the vapor phase, we assume the relations of an ideal gas, $P V^{(2)}=k_{\mathrm{B}} T N^{(2)}=k_{\mathrm{B}} T N$. Substitution for $P$ gives
\begin{equation*}
\frac{\mathrm{d} P}{\mathrm{d} T}=\frac{\Delta Q}{N} \frac{P}{k_{\mathrm{B}} T^{2}}, \quad \text{with } \Delta Q=-\Delta S \cdot T \tag{6.102}
\end{equation*}
Assuming $\Delta q=\frac{\Delta Q}{N}$ to be roughly independent of $T$, we obtain
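Integrating (6.102) with $T$-independent $\Delta q$ gives $P(T)=P_{0}\, e^{-(\Delta q / k_{\mathrm{B}})(1/T-1/T_{0})}$, the exponential phase boundary sketched in Figure 6.14. A small numerical cross-check of this closed form against a direct integration of the ODE (all numbers are purely illustrative, not real material data):

```python
import math

def vapor_pressure(T, T0, P0, dq_over_kB):
    """Closed-form solution of dP/dT = (dq/kB) * P / T^2 with P(T0) = P0,
    assuming the latent heat per particle dq is independent of T."""
    return P0 * math.exp(-dq_over_kB * (1.0 / T - 1.0 / T0))

def vapor_pressure_numeric(T, T0, P0, dq_over_kB, steps=100000):
    """Naive Euler integration of the Clausius-Clapeyron ODE as a cross-check."""
    h = (T - T0) / steps
    P, t = P0, T0
    for _ in range(steps):
        P += h * dq_over_kB * P / t**2
        t += h
    return P

# illustrative parameters: dq/kB = 5000 K, P(300 K) = 1 (arbitrary units)
p_exact = vapor_pressure(320.0, 300.0, 1.0, 5000.0)
p_num = vapor_pressure_numeric(320.0, 300.0, 1.0, 5000.0)
print(p_exact, p_num)  # the two values agree closely
```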
Figure 6.14: Phase boundary of a vapor-solid system in the (P,T)(P, T)-diagram
6.8 Osmotic Pressure
We consider a system made up of two compounds and define
\begin{aligned}
& N_{1}=\text{particle number of "ions" (solute)} \\
& N_{2}=\text{particle number of "water molecules" (solvent)}
\end{aligned}
The corresponding chemical potentials are denoted $\mu_{1}$ and $\mu_{2}$. The grand canonical partition function,
where $Y_{N_{1}}$ is the grand canonical partition function for substance 2 with a fixed number $N_{1}$ of particles of substance 1. Let now $y_{N}:=\frac{1}{V} \frac{Y_{N}}{Y_{0}}$. It then follows that
\begin{equation*}
\log Y=\log Y_{0}+V y_{1}(\mu_{2}, \beta)\, e^{\beta \mu_{1}}+\mathcal{O}\left(e^{2 \beta \mu_{1}}\right) \tag{6.106}
\end{equation*}
Here $y_{1}(\mu_{2}, \beta)$ has no $V$ dependence for large systems, since the free energy $G=-k_{\mathrm{B}} T \log Y \sim V$.
For the (expected) particle number of substance 1 we therefore have
Using $e^{\beta \mu_{1}}=\frac{n_{1}}{y_{1}}+\mathcal{O}(n_{1}^{2})$, which follows from (6.109), we get
\begin{equation*}
P(\mu_{2}, N_{1}, T)=P(\mu_{2}, N_{1}=0, T)+k_{\mathrm{B}} T n_{1}+\mathcal{O}(n_{1}^{2}) \tag{6.111}
\end{equation*}
Here we note that $y_{1}$, which in general is hard to calculate, fortunately does not appear on the right-hand side at this order of approximation.
Consider now two copies of the system, called A and B, separated by a wall which lets through water but not the ions of the solute. The concentration $n_{1}^{(A)}$ of ions on one side of the wall need not be equal to the concentration $n_{1}^{(B)}$ on the other side, so we have different pressures $P^{(A)}$ and $P^{(B)}$. Their difference is
hence, writing $\Delta n=n_{1}^{(A)}-n_{1}^{(B)}$, we obtain the osmotic formula, due to van 't Hoff:
\begin{equation*}
\Delta P=k_{\mathrm{B}} T \Delta n \tag{6.112}
\end{equation*}
In the derivation of this formula we neglected terms of the order n_(1)^(2)n_{1}^{2}, which means that the formula is valid only for dilute solutions!
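To get a feeling for the magnitudes involved, the following sketch evaluates (6.112) for an illustrative (hypothetical) concentration difference of 1 mmol/L at room temperature:

```python
kB = 1.380649e-23  # Boltzmann constant in J/K

def osmotic_pressure_difference(delta_n, T):
    """van 't Hoff formula Delta P = kB * T * Delta n for dilute solutions.
    delta_n: difference in solute number density [1/m^3], T in kelvin."""
    return kB * T * delta_n

# illustrative: 1 mmol/L = 1 mol/m^3 difference in ion concentration
n_per_m3 = 1.0 * 6.02214076e23  # particles per m^3
dp = osmotic_pressure_difference(n_per_m3, 298.0)
print(dp)  # about 2.5e3 Pa -- sizeable even for a very dilute solution
```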
Appendix A
Dynamical Systems and Approach to Equilibrium
A.1 The Master Equation
In this section, we will study a toy model for dynamically evolving ensembles (i.e. non-stationary ensembles) [in this section, we follow mostly Ch. 6 of "Physique Statistique" by A. Georges, M. Mézard, École Polytechnique (2010)]. We will not start from a Hamiltonian description of the dynamics, but rather work with a phenomenological description which is already probabilistic. In this approach, the ensemble is described by a time-dependent probability distribution $\{p_{n}(t)\}$, where $p_{n}(t)$ is the probability of the system to be in state $n$ at time $t$. Since the $p_{n}(t)$ are to be probabilities, we evidently should have $\sum_{n=1}^{N} p_{n}(t)=1$ and $p_{n}(t) \geqslant 0$ for all $t$.
We assume that the time dependence is determined by the dynamical law
where $T_{i j}>0$ is the transition amplitude for going from state $j$ to state $i$ per unit of time. We call this law the "master equation". As already discussed in sec. 3.2, the master equation can be thought of as a version of the Boltzmann equation. In the context of quantum mechanics, the transition amplitudes $T_{i j}$ induced by some small perturbation $H_{1}$ of the dynamics would e.g. be given by Fermi's golden rule, $T_{i j}=\frac{2 \pi}{\hbar}\left|\langle i| H_{1}|j\rangle\right|^{2}$, and would therefore be symmetric in $i$ and $j$, $T_{i j}=T_{j i}$. In this section, we do not assume that the transition amplitude is symmetric, as this would exclude interesting examples.
It is instructive to check that the master equation has the desired properties of keeping $p_{i}(t) \geqslant 0$ and $\sum_{i} p_{i}(t)=1$. The first property is seen as follows. Suppose that $t_{0}$ is the first time that some $p_{i}(t_{0})=0$. From the structure of the master equation, it then follows that $\mathrm{d} p_{i}(t_{0}) / \mathrm{d} t>0$, unless in fact all $p_{j}(t_{0})=0$. The latter is impossible, because the sum of the probabilities equals 1 for all times. Indeed,
An equilibrium state corresponds to a distribution $\{p_{i}^{\mathrm{eq}}\}$ which is constant in time and is a solution of the master equation, i.e.
An important special case is that of symmetric transition amplitudes, which occurs for example if the underlying microscopic dynamics is reversible. In that case, the uniform distribution $p_{i}^{\mathrm{eq}}=\frac{1}{N}$ is always stationary (microcanonical ensemble).
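These two conservation properties are easy to see in action. The following minimal sketch integrates the master equation $\dot{p}_{i}=\sum_{j \neq i}(T_{ij} p_{j}-T_{ji} p_{i})$ for a three-state system with arbitrarily chosen asymmetric rates (the numbers are illustrative, not from the text):

```python
def master_step(p, T, dt):
    """One Euler step of dp_i/dt = sum_{j != i} (T[i][j] p_j - T[j][i] p_i),
    where T[i][j] is the rate for the jump j -> i."""
    n = len(p)
    dp = [sum(T[i][j] * p[j] - T[j][i] * p[i] for j in range(n) if j != i)
          for i in range(n)]
    return [p[i] + dt * dp[i] for i in range(n)]

# arbitrary asymmetric rates T[i][j] (illustrative values only)
T = [[0.0, 1.0, 0.5],
     [2.0, 0.0, 1.0],
     [0.5, 3.0, 0.0]]
p = [1.0, 0.0, 0.0]          # start in state 0 with certainty
for _ in range(20000):       # evolve up to t = 20, well past relaxation
    p = master_step(p, T, 0.001)

print(sum(p))                  # normalization is preserved (up to roundoff)
print(all(x >= 0 for x in p))  # probabilities stay nonnegative
```

After long times the distribution has become stationary: one more step barely changes it.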
Example: Time evolution of a population of bacteria
Consider a population of some kind of bacteria, characterized by the following quantities:
\begin{aligned}
n & =\text{number of bacteria in the population} \\
M & =\text{mortality rate} \\
R & =\text{reproduction rate} \\
p_{n}(t) & =\text{probability that the population consists of } n \text{ bacteria at instant } t
\end{aligned}
\begin{equation*}
R(n-1) p_{n-1}^{\mathrm{eq}}+M(n+1) p_{n+1}^{\mathrm{eq}}=(R+M) n p_{n}^{\mathrm{eq}}, \quad \text{with } n \geqslant 1 \text{ and } p_{1}^{\mathrm{eq}}=0 \tag{A.6}
\end{equation*}
It follows by induction that in this example the only possible equilibrium state is given by
\begin{equation*}
p_{n}^{\mathrm{eq}}= \begin{cases}1 & \text{if } n=0 \\ 0 & \text{if } n \geqslant 1\end{cases} \tag{A.7}
\end{equation*}
i.e. we have equilibrium if and only if all bacteria are dead.
\begin{equation*}
\mathcal{X}_{i j}= \begin{cases}T_{i j} & \text{if } i \neq j \\ -\sum_{k \neq i} T_{k i} & \text{if } i=j\end{cases} \tag{A.9}
\end{equation*}
We immediately find that $\mathcal{X}_{i j} \geqslant 0$ for all $i \neq j$ and $\mathcal{X}_{i i} \leqslant 0$ for all $i$. We have $\mathcal{X}_{i i}<0$ if we assume that for each $i$ there is at least one state $j$ with nonzero transition amplitude $T_{j i}$. We make this assumption from now on. The formal solution of (A.8) is given by the following matrix exponential:
(We also assume that the total number NN of states is finite).
We would now like to understand whether there must always exist an equilibrium state, and if so, how it is approached. An equilibrium distribution must satisfy $0=\sum_{j} \mathcal{X}_{i j} p_{j}^{\mathrm{eq}}$, which is possible if and only if the matrix $\mathcal{X}$ has a zero eigenvalue. Thus, we need some information about the eigenvalues of $\mathcal{X}$. We note that this matrix need not be symmetric, so its eigenvalues, $E$, need not be real, and we are not necessarily able to diagonalize it! Nevertheless, it turns out that the master equation gives us enough information to understand the key features of the eigenvalue distribution. If we define the evolution matrix $A(t)$ by
then, since $A(t)$ maps element-wise positive vectors $\underline{p}=(p_{1}, \ldots, p_{N})$ to vectors with the same property, it easily follows that $A_{i j}(1) \geqslant 0$ for all $i, j$. Hence, by the Perron-Frobenius theorem, the eigenvector $\underline{v}$ of $A(1)$ whose eigenvalue $\lambda_{\max}$ has the largest real part must be element-wise positive, $v_{i} \geqslant 0$ for all $i$, and $\lambda_{\max}$ must be real and positive.
This (up to a rescaling) unique vector $\underline{v}$ must also be an eigenvector of $\mathcal{X}$, with real eigenvalue $E_{\max}=\log \lambda_{\max}$. We next show that any eigenvalue $E$ of $\mathcal{X}$ (possibly $\in \mathbb{C}$) has $\operatorname{Re}(E) \leqslant 0$ by arguing as follows: let $\underline{w}$ be an eigenvector of $\mathcal{X}$ with eigenvalue $E$, i.e. $\mathcal{X} \underline{w}=E \underline{w}$. Then
which follows from the triangle inequality and $\mathcal{X}_{i j} \geqslant 0$ for $i \neq j$. Taking the sum $\sum_{i}$ and using (A.9) then yields $\sum_{i}\left(\mathcal{X}_{i i}+\left|E-\mathcal{X}_{i i}\right|\right)\left|w_{i}\right| \leqslant 0$, and therefore $\left(\mathcal{X}_{i i}+\left|E-\mathcal{X}_{i i}\right|\right)\left|w_{i}\right| \leqslant 0$ for at least one $i$. Since $\mathcal{X}_{i i}<0$, this is impossible unless $\operatorname{Re}(E) \leqslant 0$. It follows that $E_{\max} \leqslant 0$ and hence also $\lambda_{\max} \leqslant 1$. We would now like to argue that in fact $E_{\max}=0$. Assume on the contrary that $E_{\max}<0$. Then
which is impossible, since the evolution preserves $\sum_{i} v_{i}(t)>0$. From this we conclude that $E_{\max}=0$, i.e. $\mathcal{X} \underline{v}=0$, and thus
is an equilibrium distribution. This equilibrium distribution is unique (by the Perron-Frobenius theorem). Since any other eigenvalue $E$ of $\mathcal{X}$ must have $\operatorname{Re}(E)<0$, any distribution $\{p_{i}(t)\}$ must approach this equilibrium state. We summarize our findings:
There exists a unique equilibrium distribution $\{p_{j}^{\mathrm{eq}}\}$.
Any distribution $\{p_{j}(t)\}$ obeying the master equation approaches equilibrium as $\left|p_{j}(t)-p_{j}^{\mathrm{eq}}\right|=\mathcal{O}\left(e^{-t / \tau_{\text{relax}}}\right)$ for all states $j$, where the relaxation timescale is given by $\tau_{\text{relax}}=-1 / E_{1}$, with $E_{1}<0$ the largest non-zero eigenvalue of $\mathcal{X}$.
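These statements can be checked numerically for a small system: the spectrum of $\mathcal{X}$ contains a zero eigenvalue whose (Perron-Frobenius) eigenvector is element-wise positive, and the next eigenvalue sets $\tau_{\text{relax}}$. A sketch with arbitrarily chosen asymmetric rates (illustrative numbers only):

```python
import numpy as np

# illustrative asymmetric rates T[i][j] for the jump j -> i
T = np.array([[0.0, 1.0, 0.5],
              [2.0, 0.0, 1.0],
              [0.5, 3.0, 0.0]])
# X_ij = T_ij for i != j, X_ii = -sum_{k != i} T_ki  (columns sum to zero)
X = T - np.diag(T.sum(axis=0))

evals, evecs = np.linalg.eig(X)
order = np.argsort(-evals.real)          # sort by descending real part
E_max, E_1 = evals[order[0]], evals[order[1]]

print(abs(E_max))                        # ~0: an equilibrium state exists
v = evecs[:, order[0]].real
p_eq = v / v.sum()                       # normalized equilibrium distribution
print(p_eq)                              # element-wise positive
print(-1.0 / E_1.real)                   # relaxation time tau_relax = -1/Re(E_1)
```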
where $\epsilon_{i}$ is the energy of state $i$. Equation (A.16) is called the detailed balance condition. It is easy to see that it implies
\begin{equation*}
p_{i}^{\mathrm{eq}}=e^{-\beta \epsilon_{i}} / Z
\end{equation*}
Thus, in this case, the unique equilibrium distribution is the canonical ensemble, which was motivated already in chapter 4.
If the detailed balance condition is fulfilled, we may pass from $\mathcal{X}_{i j}$, which need not be symmetric, to a symmetric (hence diagonalizable) matrix by a change of basis as follows. If we set $q_{i}(t)=p_{i}(t) e^{\beta \epsilon_{i} / 2}$, we get
is now symmetric. We can diagonalize it with real eigenvalues $\lambda_{n} \leqslant 0$ and real eigenvectors $\underline{w}^{(n)}$, so that $\tilde{\mathcal{X}} \underline{w}^{(n)}=\lambda_{n} \underline{w}^{(n)}$. The eigenvalue $\lambda_{0}=0$ again corresponds to equilibrium, with $w_{i}^{(0)} \propto e^{-\beta \epsilon_{i} / 2}$. Then we can write
where $c_{n}=\underline{q}(0) \cdot \underline{w}^{(n)}$ are the expansion coefficients. We see again that $p_{i}(t)$ converges to the equilibrium state exponentially, with relaxation time $-\frac{1}{\lambda_{1}}<\infty$, where $\lambda_{1}<0$ is the largest non-zero eigenvalue of $\tilde{\mathcal{X}}$.
A.3 Relaxation time vs. ergodic time
We come back to the question why one never observes in practice that a macroscopically large system returns to its initial state. We discuss this in a toy model consisting of $N$ spins. A state of the system is described by a configuration $C$ of spins:
The system has $2^{N}$ possible states $C$, and we let $p_{C}(t)$ be the probability that the system is in the state $C$ at time $t$. Furthermore, let $\tau_{0}$ be the time scale for one update of the system, i.e. a spin flip occurs with probability $\frac{\mathrm{d} t}{\tau_{0}}$ during the time interval $[t, t+\mathrm{d} t]$. We assume that all spin flips are equally likely in our model. This leads to a master equation (A.1) of the form
Here, the first term in the brackets $\{\ldots\}$ describes the increase in probability due to a change $C_{i} \rightarrow C$, where $C_{i}$ differs from $C$ by flipping the $i$-th spin. This change occurs with probability $\frac{1}{N}$ per time $\tau_{0}$. The second term in the brackets $\{\ldots\}$ describes the decrease in probability due to the change $C \rightarrow C_{i}$ for any $i$. It can be checked from the definition of $\mathcal{X}$ that
Furthermore it can be checked that the equilibrium configuration is given by
\begin{equation*}
p_{C}^{\mathrm{eq}}=\frac{1}{2^{N}} \quad \forall C \in\{-1,+1\}^{N} \tag{A.22}
\end{equation*}
Indeed, $\sum_{C^{\prime}} \mathcal{X}_{C C^{\prime}} p_{C^{\prime}}^{\mathrm{eq}}=0$, so in the equilibrium distribution all states $C$ are equally likely for this model.
If we now imagine a discretized version of the process, where at each time step one randomly chosen spin is flipped, then the timescale over which the system returns to the initial condition is estimated by $\tau_{\text{ergodic}} \approx 2^{N} \tau_{0}$, since we have to visit $\mathcal{O}(2^{N})$ states before returning and each step takes time $\tau_{0}$. We claim that this is much larger than the relaxation timescale. To estimate the latter, we choose an arbitrary but fixed spin, say the first one. Then we define $p_{\pm}=\langle\delta(\sigma_{1} \mp 1)\rangle$, where the time-dependent average is calculated with respect to the distribution $\{p_{C}(t)\}$; in other words,
\begin{equation*}
p_{\pm}(t)=\sum_{C: \sigma_{1}= \pm 1} p_{C}(t)=\text{probability for finding the } 1^{\text{st}} \text{ spin up/down at time } t \tag{A.23}
\end{equation*}
The master equation implies an evolution equation for $p_{+}$ (and similarly $p_{-}$), which is obtained by simply summing (A.20) subject to the condition $\sigma_{1}= \pm 1$. This gives:
So for $t \rightarrow \infty$, we have $p_{+}(t) \rightarrow \frac{1}{2}$ at an exponential rate. This means $\frac{1}{2}$ is the equilibrium value of $p_{+}$. Since this holds for any chosen spin, we expect the relaxation time towards equilibrium to be $\tau_{\text{relax}} \approx \frac{N}{2} \tau_{0}$, and we see
A more precise analysis of the relaxation time involves finding the eigenvalues of the $2^{N}$-dimensional matrix $\mathcal{X}_{C C^{\prime}}$: we think of the eigenvectors $\underline{u}_{0}, \underline{u}_{1}, \underline{u}_{2}, \ldots$ with eigenvalues $\lambda_{0}=0, \lambda_{1}, \lambda_{2}, \ldots$ as functions $u_{0}(C), u_{1}(C), \ldots$ where $C=(\sigma_{1}, \ldots, \sigma_{N})$. Then the eigenvalue equation is
\begin{equation*}
u_{0}(C) \equiv u_{0}(\sigma_{1}, \ldots, \sigma_{N})=p_{C}^{\mathrm{eq}}=\frac{1}{2^{N}} \quad \forall C \tag{A.28}
\end{equation*}
Now we define the next $N$ eigenvectors $u_{1}^{j}, j=1, \ldots, N$, by
\begin{equation*}
u_{1}^{j}(\sigma_{1}, \ldots, \sigma_{N})= \begin{cases}\alpha & \text{if } \sigma_{j}=+1 \\ \beta & \text{if } \sigma_{j}=-1\end{cases} \tag{A.29}
\end{equation*}
Imposing the eigenvalue equation gives $\alpha=-\beta$, and then $\lambda_{1}=-\frac{2}{N}$. The eigenvectors are orthogonal to each other. The next set of eigenvectors $u_{2}^{i j}, 1 \leqslant i<j \leqslant N$, is
\begin{equation*}
u_{2}^{i j}(\sigma_{1}, \ldots, \sigma_{N})= \begin{cases}\alpha & \text{if } \sigma_{i}=1, \sigma_{j}=1 \\ -\alpha & \text{if } \sigma_{i}=1, \sigma_{j}=-1 \\ -\alpha & \text{if } \sigma_{i}=-1, \sigma_{j}=1 \\ \alpha & \text{if } \sigma_{i}=-1, \sigma_{j}=-1\end{cases} \tag{A.30}
\end{equation*}
The vectors $u_{2}^{i j}$ are again found to be orthogonal, with eigenvalue $\lambda_{2}=-\frac{4}{N}$. The subsequent eigenvectors are constructed in the same fashion, and we find $\lambda_{k}=-\frac{2 k}{N}$ for the $k$-th set. The general solution of the master equation is given by (A.10)
\begin{equation*}
a_{i_{1} \ldots i_{k}}(t)=a_{i_{1} \ldots i_{k}}(0)\, e^{-2 k t /(N \tau_{0})}
\end{equation*}
This gives the relaxation time for a general distribution. We see that the relaxation time is governed by the exponential with the slowest decay (the term with $k=1$ in the sum), leading to the relaxation time $\tau_{\text{relax}}=N \tau_{0} / 2$ already guessed before. This is exponentially small compared to the ergodic time! For $N=1$ mol we have, approximately
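The spectrum $\lambda_{k}=-2k/N$ with multiplicities $\binom{N}{k}$ can be verified by brute force for a small number of spins. The following sketch builds the $2^{N}$-dimensional matrix $\mathcal{X}_{C C'}$ for $N=3$, $\tau_{0}=1$ and diagonalizes it (the matrix is symmetric here, since all single-spin-flip rates are equal):

```python
import itertools
import numpy as np

N, tau0 = 3, 1.0
states = list(itertools.product([-1, 1], repeat=N))
index = {s: i for i, s in enumerate(states)}

# master-equation matrix: each single-spin flip has rate 1/(N*tau0)
X = np.zeros((2**N, 2**N))
for s in states:
    for i in range(N):
        flipped = s[:i] + (-s[i],) + s[i + 1:]
        X[index[flipped], index[s]] += 1.0 / (N * tau0)
    X[index[s], index[s]] -= 1.0 / tau0   # total outflow rate

evals = np.sort(np.linalg.eigvalsh(X))[::-1]   # descending order
print(np.round(evals, 10))
# expected lambda_k = -2k/(N tau0) with multiplicity binom(N, k):
# for N = 3 this is 0 (once), -2/3 (x3), -4/3 (x3), -2 (once)
```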
The Metropolis algorithm is based in an essential way on the fact that $\tau_{\text{relax}} \ll \tau_{\text{ergodic}}$ for typical systems. The general aim of the algorithm is to efficiently compute expectation values of the form
where $E(C)$ is the energy of the state $C$ and $F$ is some observable. A good example to have in mind is the Ising model on a $d$-dimensional square lattice, where the energy is given by:
and $F=\sigma_{i}$ or $F=\sigma_{i} \sigma_{j}$. Here, a configuration is a set of spins $C=\{\sigma_{1}, \ldots, \sigma_{N}\}$. In this example, as well as in most other models of statistical physics, the number of configurations scales exponentially with the system size; in the present example this number is $2^{N}$. Furthermore, except for a rather special class of models, it seems impossible to do the sum in closed form; for the Ising model this is the case in all dimensions $d \geqslant 3$. Then, already for a cubic lattice of very modest side length such as 10, we are faced with $2^{1000}$ configurations, meaning that it is utterly out of the question to do this sum by simply adding up all terms on a computer.
If we have to evaluate numerically an integral of the form
\begin{equation*}
I=\int_{0}^{1} g(x)\, \mathrm{d} x
\end{equation*}
then if $g(x)$ is sufficiently regular and varies on a scale of order 1, we can do the integral e.g. by generating a sample $X=\{x_{1}, \ldots, x_{m}\}$ chosen according to the uniform probability distribution on $[0,1]$ (the latter means that we have no prior idea of what $g(x)$ looks like). Then we expect that
already for a relatively small number $m$ of points. However, this will fail e.g. if $g(x)$ is very sharply peaked near some point $x_{0}$, say with peak width $10^{-1000}$. It is clear that generically none of the randomly chosen points $X=\{x_{1}, \ldots, x_{m}\}$ will hit the peak, unless we take $m \approx 10^{1000}$, which is out of the question. Roughly speaking, we typically run into the same kind of problem when evaluating the state sum in statistical physics.
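This failure mode is easy to demonstrate. The sketch below (with an illustrative peak width of $10^{-8}$ rather than $10^{-1000}$, which is already enough to break uniform sampling) estimates two integrals with the same naive Monte Carlo method:

```python
import math
import random

random.seed(0)

def mc_integrate(g, m):
    """Naive Monte Carlo estimate of the integral of g over [0, 1]
    using m uniformly distributed sample points."""
    return sum(g(random.random()) for _ in range(m)) / m

# smooth integrand varying on scale ~1: uniform sampling works well
smooth = lambda x: x * x                    # exact integral: 1/3
est_smooth = mc_integrate(smooth, 100000)
print(est_smooth)                           # close to 0.3333

# sharply peaked integrand: normalized Gaussian of width 1e-8 around 0.5
w, x0 = 1e-8, 0.5
peaked = lambda x: math.exp(-((x - x0) / w) ** 2 / 2) / (w * math.sqrt(2 * math.pi))
est_peaked = mc_integrate(peaked, 100000)
# exact integral is ~1, but essentially no uniform sample hits the peak,
# so the estimate is wildly off unless m is astronomically large
print(est_peaked)
```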
Again, a simple-minded method would be to generate a uniformly distributed sample $\tilde{C}_{1}, \ldots, \tilde{C}_{u}$ with $u \gg 1$ and to approximate $\langle F\rangle \approx \sum_{i=1}^{u} F(\tilde{C}_{i}) \frac{e^{-\beta E(\tilde{C}_{i})}}{Z}$. But this is a very bad idea in most cases, since the fraction of configurations for which the quantity $e^{-\beta E(\tilde{C}_{i})}$ is not practically 0 is exponentially small. The idea is instead to generate a sample $C_{1}, \ldots, C_{m}$ of configurations distributed according to $\propto e^{-\beta E(C)}$. But how do we get such samples? We choose any (!) $T_{C, C^{\prime}}$ satisfying the detailed balance condition (A.16):
Then, according to the above discussion, we expect that an initial distribution $p_{C}(t)$ will reach the equilibrium distribution $p_{C}^{\mathrm{eq}}=e^{-\beta E(C)} / Z$ after about $N$ time steps, where $N$ is the system size. If we choose transition amplitudes such that also $\sum_{C^{\prime}} T_{C^{\prime}, C}=1$ for all $C$, the discretized version of the master equation becomes
In the simplest case, the sum is over all configurations $C$ differing from $C^{\prime}$ by flipping precisely one spin. If $C_{i}^{\prime}$ is the configuration obtained from a configuration $C^{\prime}$ by flipping spin $i$, we therefore assume that $T_{C^{\prime}, C}$ is non-zero only if $C=C_{i}^{\prime}$ for some $i$.
Stating the algorithm in a slightly different way, we can say that, for a given configuration, we accept the change $C \rightarrow C^{\prime}$ randomly with probability $T_{C^{\prime}, C}$. A very simple and practical choice for the acceptance probability (in other words $T_{C^{\prime}, C}$) satisfying our conditions is given by
\begin{equation*}
p_{\text{accept}}= \begin{cases}1 & \text{if } E(C^{\prime}) \leqslant E(C) \\ e^{-\beta\left[E(C^{\prime})-E(C)\right]} & \text{if } E(C^{\prime})>E(C)\end{cases} \tag{A.35}
\end{equation*}
We may then summarize the algorithm as follows:
Metropolis Algorithm
(1) Choose an initial configuration CC.
(2) Choose randomly a spin $i$ and determine the change in energy $\delta_{i} E=E(C_{i})-E(C)$ for the new configuration $C_{i}$ obtained by flipping spin $i$.
(3) Choose a uniformly distributed random number $u \in[0,1]$. If $u<e^{-\beta \delta_{i} E}$, change $\sigma_{i} \rightarrow-\sigma_{i}$; otherwise leave $\sigma_{i}$ unchanged.
(4) Rename C_(i)rarr CC_{i} \rightarrow C.
(5) Go back to (2).
Running the algorithm $m$ times, going through approximately $N$ iterations each time, gives the desired sample $C_{1}, \ldots, C_{m}$ distributed approximately according to $e^{-\beta E(C)} / Z$. The expectation value $\langle F\rangle$ is then computed as the average of $F(C)$ over the sample $C_{1}, \ldots, C_{m}$. An important practical point is that the change in energy when we flip one spin is very easy to calculate in the example of the Ising model, because the interaction is local (we only have to compute the $2 d$ terms associated with the nearest neighbors of $i$), as it is in most models.
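The steps (1)-(5) above can be sketched compactly for the 2D Ising model. This is a minimal illustration, not an optimized simulation; the lattice size, temperature, and number of sweeps are arbitrary choices of ours. Note how $\delta_{i} E$ involves only the $2d=4$ nearest neighbors:

```python
import math
import random

random.seed(1)

def delta_E(spins, L, i, j, J=1.0):
    """Energy change E(C_i) - E(C) for flipping spin (i, j) on an L x L
    periodic lattice with E(C) = -J * sum over nearest-neighbor pairs."""
    s = spins[i][j]
    nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
          + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
    return 2.0 * J * s * nn

def metropolis_sweep(spins, L, beta):
    """One sweep = L*L attempted single-spin flips, each accepted with
    probability min(1, exp(-beta * delta_E)) as in (A.35)."""
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        dE = delta_E(spins, L, i, j)
        if dE <= 0.0 or random.random() < math.exp(-beta * dE):
            spins[i][j] = -spins[i][j]

L, beta = 16, 1.0                      # beta well above beta_c ~ 0.44: ordered phase
spins = [[1] * L for _ in range(L)]    # start from the all-up configuration
for _ in range(200):                   # equilibration sweeps
    metropolis_sweep(spins, L, beta)
m = abs(sum(map(sum, spins))) / L**2   # sample estimate of <F> with F = sigma_i
print(m)                               # stays close to 1 deep in the ordered phase
```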
A.5 Eigenstate thermalization
[In this section, we follow Srednicki, J. Phys. A 32 (1999) 1163-1175, Sections 2 and 3.] We have seen previously that in classical systems, the ensemble average of an observable equals the time average under typical conditions. This can be seen as a kind of statement about equilibration. For quantum systems showing quantum chaos, one can give another argument why the system will equilibrate no matter what its initial state was. It does not rely on an incomplete knowledge of the dynamics (as discussed in chapter 3.2) but rather on the specific structure of observables in such systems. Assume that the dynamics is ruled by the time-independent (exact) Hamiltonian $\hat{H}$ with nondegenerate eigenvalues $E_{n}$ and eigenstates $|n\rangle$. Then, in the energy eigenbasis, the diagonal matrix elements of an observable $A$ can be expressed in terms of a function of one variable $E_{n}$, and the off-diagonal matrix elements in terms of a function of two variables $E_{n}$ and $E_{m}$, or in terms of their sum and difference
with a nonnegative smooth function hh having unit integral and chosen so that SS is monotonic. Calculation of the expectation value of AA in the canonical ensemble at temperature TT yields
where in the last term, SS stands for the entropy of the ensemble. Extending this interpretation to the function SS in the integrals and using that the entropy is an extensive quantity, one may evaluate the integrals by the method of stationary phase (see eg. Appendix A. 2 in the script on quantum mechanics). This leads to
On the other hand, for the time average \langle A\rangle_{\Psi, \text{time}} of the expectation value \left\langle\Psi_{t} \mid A \Psi_{t}\right\rangle in some given state \Psi, we find
\begin{equation*}
\langle A\rangle_{\Psi, \text{time}}=\sum_{n}\left|\gamma_{n}\right|^{2} \alpha\left(E_{n}\right)+O\left(\mathrm{e}^{-S / 2}\right) . \tag{A.39}
\end{equation*}
For a state of a macroscopic system that could realistically be prepared in a lab, the uncertainty of \hat{H} in the state \Psi typically satisfies
so it is small for large N. Thus, expanding \alpha in (A.39) about \langle\hat{H}\rangle_{\Psi, \text{time}} to second order, we obtain the approximation
Combining this with the approximation for the ensemble average of A given by (A.37) and choosing T so that \langle\hat{H}\rangle_{\text{ens}}=\langle\hat{H}\rangle_{\Psi}, we finally obtain
\langle A\rangle_{\Psi, \text{time}}=\langle A\rangle_{\mathrm{ens}}+O\left(\Delta_{\Psi}^{2}(\hat{H})\right)+O\left(N^{-1}\right)+O\left(\mathrm{e}^{-S / 2}\right) .
As a result, the time average of the expectation value of A in a realistically preparable state is approximately equal to the ensemble average of A at the appropriate temperature. Therefore, no matter what (realistic) state the system is prepared in, it will always equilibrate.
Appendix B
Exercises
B.1 Exercises for chapter 2
Problem B.1 (Random walk). Let w(x) d x be an arbitrary probability distribution describing the probability of finding a real random variable X in the 'interval' [x, x+d x]. Let the mean and spread be defined as usual by
\mu=\int x w(x) d x, \quad \sigma=\left(\int(x-\mu)^{2} w(x) d x\right)^{\frac{1}{2}} .
Consider now a random walk on the real axis \mathbb{R} with increment/decrement X_{i} at step i, and let Y=\frac{1}{N}\left(X_{1}+\cdots+X_{N}\right) be the mean increment after N steps. The aim is to show that, for large N, Y has approximately a Gaussian probability distribution, with mean \mu and spread \sigma / \sqrt{N}.
a) Introduce the variable Z=\sum_{i}\left(X_{i}-\mu\right) / \sqrt{N} and demonstrate that its probability distribution w_{Z}(z) d z is given by
w_{Z}(z)=\int \frac{d k}{2 \pi} e^{i k z} \int d x_{1} \ldots d x_{N} \prod_{i} w\left(x_{i}\right) \exp \left(-i k\left(x_{1}+\cdots+x_{N}\right) / \sqrt{N}+i k \mu \sqrt{N}\right)
b) Introduce the 'characteristic function'
\chi(q)=\tilde{w}(q)=\int d x w(x) e^{-i q x},
where \tilde{w}(q) is the Fourier transform of w(x), and write w_{Z}(z) in terms of it.
c) Show that the first terms in the expansion \log \chi(q)=\sum_{n}\left\langle x^{n}\right\rangle_{c}(-i q)^{n} / n! are given by the cumulants \langle x\rangle_{c}=\mu,\left\langle x^{2}\right\rangle_{c}=\sigma^{2}. Substitute this into b) and show that the result may be written as
w_{Z}(z)=\int \frac{d k}{2 \pi} e^{i k z-\frac{1}{2}(\sigma k)^{2}+\cdots}
where the dots stand for terms going to zero as 1 / \sqrt{N} or faster for large N.
d) Deduce that for large N one has w_{Z}(z) \rightarrow \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-z^{2} /\left(2 \sigma^{2}\right)\right). Relating this to the distribution of Y, obtain the desired result.
e) What is the wider significance of the result beyond a random walk on \mathbb{R}?
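As a quick numerical illustration of this central-limit behavior (our own sketch, not part of the original exercise): for increments X_{i} uniform on [0,1], where \mu=1/2 and \sigma^{2}=1/12, one can sample Z=\sum_{i}\left(X_{i}-\mu\right) / \sqrt{N} and check that its mean and variance approach 0 and \sigma^{2}.

```python
import math
import random

def sample_Z(N, m, seed=1):
    """Draw m samples of Z = sum_i (X_i - mu)/sqrt(N) for X_i uniform
    on [0, 1], i.e. mu = 1/2 and sigma^2 = 1/12."""
    rng = random.Random(seed)
    mu = 0.5
    return [sum(rng.random() - mu for _ in range(N)) / math.sqrt(N)
            for _ in range(m)]

zs = sample_Z(N=50, m=20000)
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
sigma2 = 1.0 / 12.0  # variance of the uniform distribution on [0, 1]
# mean is close to 0 and var close to sigma2, as part d) predicts
```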
Problem B.2 (Entropy maximization 1). A system has states n=0,1, \ldots, N, occupied with probabilities p_{n}. The n-th state has energy E_{n}=n. The average energy, U=\sum_{n} E_{n} p_{n}, is assumed to be given.
a) Show that the entropy is maximized by a distribution of the form p_{n}=e^{-\beta n} / Z.
b) In the case N=2, work out the explicit form of \beta, Z in terms of U.
c) Suppose now that the standard deviation,
is also known. What is the form of the distribution maximizing the entropy in this case for general N?
Your answers should indicate why the distribution is an actual maximum of the entropy and not just a stationary point.
Problem B.3 (Entropy maximization 2). Repeatedly throwing an ideal dice will evidently yield the average result of 3.5. However, it is found for an (obviously manipulated) dice that the average is instead 4. In the absence of further information, what is the probability distribution, to within 3 significant figures, assigned to this dice?
Hint: Maximize the information entropy. You may use a computer (e.g. MATHEMATICA) to help you with any equation that you need to solve numerically.
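The numerical step in the hint can be sketched as follows (the function name and the use of bisection are our choices): maximizing the entropy subject to \langle n\rangle=4 gives p_{n}=e^{-\beta n} / Z for n=1, \ldots, 6, and \beta follows from a one-dimensional root search.

```python
import math

def maxent_die(target=4.0):
    """Maximum-entropy distribution p_n = e^{-beta n}/Z on n = 1..6 with
    prescribed mean; beta is found by bisection (a sketch of the hint)."""
    def mean(beta):
        w = [math.exp(-beta * n) for n in range(1, 7)]
        return sum(n * w[n - 1] for n in range(1, 7)) / sum(w)
    lo, hi = -5.0, 5.0          # mean(beta) decreases monotonically in beta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid            # mean too large -> need larger beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    w = [math.exp(-beta * n) for n in range(1, 7)]
    Z = sum(w)
    return [wi / Z for wi in w]

# since the prescribed mean 4 exceeds the unbiased value 3.5, the maximizing
# beta is negative and the probabilities increase with n
p = maxent_die(4.0)
```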
Problem B.4 (Information entropy). This problem motivates the definition of information entropy. Consider an experiment with N possible outcomes that occur randomly with probabilities p_{1}, \ldots, p_{N}. (Think of throwing a dice, where N=6, p_{i}=\frac{1}{6}.) To determine which outcome O has occurred, we allow ourselves yes/no questions of the type: Is O \in S? where S is some subset of \{1, \ldots, N\}. For example, for S=\{i\}, the corresponding question would be: Is O \in\{i\}, i.e. has event i occurred? Or, if S=\{1,3\}, the question is: Has event 1 or event 3 occurred?
a) Consider the following question strategy to find out what O was: We first ask: Has outcome 1 occurred? If yes, we are done; if no, we ask: Has outcome 2 occurred? And so on. What is the maximum number of questions needed to determine O in this strategy? What is the average number of questions needed in this strategy?
b) Let I=-\sum_{i} p_{i} \log _{2} p_{i} be the information entropy. Show that in any strategy, the average number of questions needed to determine O is \geqslant I.
c) Verify that this is indeed the case for strategy a), applied to a dice. For a dice, suggest a strategy which requires fewer questions on average than a).
d)* Show that there always exists a strategy such that the average number of questions needed is \leqslant I+1. Parts b) and d) show that I is an estimate of the average number of questions needed to find out what the outcome was.
Problem B.5 (Ising spin chain). We consider the 1-dimensional Ising spin chain with periodic boundary conditions. In this model, we have n spins \sigma_{1}, \ldots, \sigma_{n} \in\{ \pm 1\}, and the energy of a configuration is given by
H\left(\left\{\sigma_{j}\right\}\right)=-J \sum_{j=1}^{n} \sigma_{j} \sigma_{j+1}, \quad \sigma_{n+1} \equiv \sigma_{1} .
a) Draw a picture of the spin chain for a configuration minimizing/maximizing H.
b) Show that Z=\sum_{\left\{\sigma_{j}\right\}} \exp \left(-\beta H\left(\left\{\sigma_{j}\right\}\right)\right).
c) Show that Z can be written alternatively as
Z=\operatorname{tr} T^{n}, \quad T=\left(\begin{array}{cc} e^{\beta J} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta J} \end{array}\right) .
Hint: write out H and the multiple sums in Z=\sum_{\left\{\sigma_{j}\right\}} \exp \left(-\beta H\left(\left\{\sigma_{j}\right\}\right)\right).
d) Show that T has eigenvalues \lambda_{1}=2 \cosh (J \beta), \lambda_{2}=2 \sinh (J \beta). (This means that we can diagonalize T, i.e. we have T=U D U^{\dagger}, where D=\operatorname{diag}\left(\lambda_{1}, \lambda_{2}\right) and U is a unitary matrix.) Use this and Z=\operatorname{tr} T^{n} to show that
Z=\lambda_{1}^{n}+\lambda_{2}^{n} .
Note that you do not need to compute U (explain why!).
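The transfer-matrix result can be cross-checked numerically (our own sketch, assuming the standard periodic-chain energy H=-J \sum_{j} \sigma_{j} \sigma_{j+1} with \sigma_{n+1}=\sigma_{1}): the brute-force sum over all 2^{n} configurations agrees with \lambda_{1}^{n}+\lambda_{2}^{n}.

```python
import math
from itertools import product

def Z_brute(n, beta, J=1.0):
    """Partition function of the periodic Ising chain by direct summation
    over all 2^n spin configurations."""
    Z = 0.0
    for s in product((-1, 1), repeat=n):
        E = -J * sum(s[j] * s[(j + 1) % n] for j in range(n))
        Z += math.exp(-beta * E)
    return Z

def Z_transfer(n, beta, J=1.0):
    """Z = tr T^n = lambda_1^n + lambda_2^n with lambda_1 = 2 cosh(beta J),
    lambda_2 = 2 sinh(beta J). The unitary U is never needed, because the
    trace depends only on the eigenvalues."""
    return (2 * math.cosh(beta * J)) ** n + (2 * math.sinh(beta * J)) ** n
```

For instance, Z_brute(6, 0.7) and Z_transfer(6, 0.7) agree to machine precision, while the brute-force cost grows like 2^n and the eigenvalue formula is O(1).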
Problem B.6 (Initial conditions). 1 \mathrm{~mm}^{3} of a gas at normal pressure and temperature contains about 10^{15} particles. Considering the particles as point-like and classical, provide a rough, conservative estimate of how many hard drives would be necessary to store the initial conditions of all gas particles. (As of 2013, a normal hard drive can store about 5 TB of data.)
Problem B.7 (Time evolution of ensemble averages).
a) Let (P, Q) \equiv\left(\vec{p}_{1}, \vec{q}_{1}, \ldots, \vec{p}_{N}, \vec{q}_{N}\right) \in \mathbb{R}^{6 N} be a point in the phase space of N particles, and let the dynamical law be given through a time-independent Hamiltonian H(P, Q), which defines the trajectories (P(t), Q(t)) via Hamilton's equations. Let \Phi_{t}(P, Q):=(P(t), Q(t)) with initial condition (P(0), Q(0))=(P, Q). Show that any observable O(P, Q) and the 'time-translated' observable O_{t}(P, Q)=O\left[\Phi_{t}(P, Q)\right] have the same expectation value, \langle O\rangle=\left\langle O_{t}\right\rangle for all t, provided the probability distribution describing the ensemble has a constant value on each energy surface.
b) Is the phase space for N particles always of the form \mathbb{R}^{6 N}? Hint: Think e.g. of a gas consisting of molecules.
Problem B.8 (Phase space density). The purpose of this problem is to explain why the phase space density for an equilibrium ensemble must generically be a function of E alone.
a) Show that the "classical trace" of a rapidly decaying function f(P, Q) on the phase space \mathbb{R}^{6 N}, defined by
\operatorname{tr}(f)=\int f(P, Q) d^{3 N} Q d^{3 N} P,
is invariant under the Hamiltonian flow, i.e. \operatorname{tr}\left(f \circ \Phi_{t}\right)=\operatorname{tr}(f) for all t.
Hint: Use the results of problem B.7.
b) Let \rho(P, Q) be a classical phase space distribution, O(P, Q) an observable, and \langle O\rangle=\operatorname{tr}(O \rho), with the "classical trace" defined in a). Show that if \rho=\rho\left(I_{0}, \ldots, I_{N}\right) is a function of conserved quantities I_{i} of the system (which include at least I_{0}=H), then the ensemble defined by \rho is stationary, i.e. \left\langle O_{t}\right\rangle=\langle O\rangle for all t and all O. Is the converse statement also true? What are the conserved quantities for a generic Hamiltonian of the form
(note that W can be used to describe the "walls" of a box)? What if W=0? What if W\left(\vec{x}_{i}\right)=w\left(\left|\vec{x}_{i}\right|\right), i.e. a spherically symmetric external potential? What if V=W=0?
Problem B.9 (Density matrices).
a) Verify the following elementary properties of the trace of matrices:
b) Let \rho, \sigma be density matrices. Show that for any nonnegative real numbers p, q satisfying p+q=1, the matrix p \rho+q \sigma has the properties of a density matrix.
Problem B.10 (Entanglement entropy). Ignoring all degrees of freedom other than spin, the Hilbert space of a single neutron is \mathscr{H}=\mathbb{C}^{2}, with spin-operators
a) The neutron is aligned with probability p in the +z direction and with probability 1-p in the +x direction. What is the density matrix describing this ensemble? What are its eigenvalues?
b) Now consider two neutrons in the normalized state |\psi\rangle=\alpha|\uparrow \uparrow\rangle+\beta|\downarrow \downarrow\rangle, where |\uparrow\rangle resp. |\downarrow\rangle are the normalized eigenstates for the +z resp. -z direction and |\uparrow \uparrow\rangle=|\uparrow\rangle \otimes|\uparrow\rangle etc. What is the reduced density matrix for the first neutron? When is the corresponding entanglement entropy maximal/minimal?
c) Assume now that the two-neutron system is in an eigenstate |\chi\rangle with zero total spin in the z-direction. What is the reduced density matrix for the first neutron? Show that if |\chi\rangle is either symmetric or anti-symmetric, then the entanglement entropy is maximized.
d) Can an arbitrary density matrix \rho_{1} for the first neutron arise as the reduced density matrix of a suitable (pure) state of the system with two neutrons?
B.2 Exercises for chapter 3
Problem B.11 (Boltzmann's H-theorem). Let f(\vec{v}, \vec{x}, t) be the 1-particle probability distribution, expressed in terms of the velocity \vec{v}=\vec{p} / m. Define a function H(t) by
H(t)=-\int d^{3} v d^{3} x f(\vec{v}, \vec{x}, t) \log f(\vec{v}, \vec{x}, t),
which is similar in nature to the von Neumann (\propto information-) entropy. The aim of this problem is to derive the important consequence \dot{H} \geqslant 0 of the Boltzmann equation.
a) Explain why the Boltzmann equation can alternatively be written as
for some W>0. Hint: "Undo" the momentum- and energy-conservation rule which has already been included in the Boltzmann equation by introducing \delta^{3}\left(\vec{v}+\vec{v}_{1}-\vec{v}_{3}-\vec{v}_{4}\right) and new integrations etc.
b) Argue physically why the following relations should hold:
Hint: What is the physical meaning of these equations for the collision?
c) Let H(\vec{x}, t) be defined as H(t) but without the d^{3} x-integration. Show that
using the shorthand f_{1}=f\left(\vec{x}, \vec{v}_{1}, t\right), f_{2}=f\left(\vec{x}, \vec{v}_{2}, t\right), etc.
d) Using b), show that I can be written as
and conclude that I \geqslant 0. Using c), show that \dot{H} \geqslant 0.
e) What is the physical significance of this result?
Problem B.12 (Master equation). We consider a time-dependent probability distribution \left\{p_{i}(t)\right\} described by the master equation:
a) Show that the time-independent distribution p_{i}=e^{-\beta E_{i}} / Z is a (stationary) solution to the master equation.
b) Show that the master equation implies \sum_{i} p_{i}(t)=1 for all t if this is true at t=0. (Hint: differentiate this sum and use the master equation.) Give an argument why each p_{i}(t) has to remain positive for all times if this holds initially. (Hint: consider a t_{0} such that p_{i}\left(t_{0}\right)=0 and use the master equation.)
c) Consider a population of bacteria. Let n be the number of bacteria, M the mortality rate, and R the reproduction rate. p_{n}(t) is the probability that the population consists of n bacteria. We consider the evolution equation
\dot{p}_{n}(t)=R(n-1) p_{n-1}(t)+M(n+1) p_{n+1}(t)-(M+R) n p_{n}(t)
for n>0, and \dot{p}_{0}(t)=M p_{1}(t) for n=0. Show that this equation has the form of a master equation. Derive the possible equilibrium state(s) of this system.
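The birth-death equation of part c) can be explored numerically. Below is a minimal Euler-integration sketch (our own, with arbitrary parameter values and a truncated range n = 0..nmax, which leaks a negligible amount of probability when the upper states stay essentially unoccupied); with mortality exceeding reproduction, the probability flows into the absorbing state n = 0, illustrating extinction as the equilibrium.

```python
def evolve(p, R, M, dt, steps):
    """Euler integration of the birth-death master equation of part c):
    pdot_n = R(n-1) p_{n-1} + M(n+1) p_{n+1} - (M+R) n p_n  for n >= 1,
    pdot_0 = M p_1, on the truncated range 0..nmax."""
    nmax = len(p) - 1
    for _ in range(steps):
        pdot = [0.0] * (nmax + 1)
        pdot[0] = M * p[1]
        for n in range(1, nmax):
            pdot[n] = (R * (n - 1) * p[n - 1] + M * (n + 1) * p[n + 1]
                       - (M + R) * n * p[n])
        # last bin: no inflow from n+1 on the truncated range
        pdot[nmax] = R * (nmax - 1) * p[nmax - 1] - (M + R) * nmax * p[nmax]
        p = [pi + dt * di for pi, di in zip(p, pdot)]
    return p

# start with exactly 5 bacteria; mortality M exceeds reproduction R,
# so the population dies out almost surely
p0 = [0.0] * 21
p0[5] = 1.0
p = evolve(p0, R=0.5, M=1.0, dt=0.001, steps=20000)
```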
B.3 Exercises for chapter 4
Problem B.13 (1-dimensional classical Ising model). The d-dimensional Ising model is exactly solvable in d=1,2, but not beyond. Here we look at the (easier) case d=1. Consider a 1-dimensional lattice of N+1 atoms, each of which is assumed to carry a spin \sigma_{i}= \pm 1, i=0, \ldots, N. The energy of the state described by \left\{\sigma_{i}\right\}=\left(\sigma_{0}, \ldots, \sigma_{N}\right) is assumed to be
where J is a constant which determines the strength of the interaction.
a) Neighboring spins can have equal or opposite signs, in which case they are called "parallel" resp. "anti-parallel". Let \nu be the number of anti-parallel pairs in \left\{\sigma_{i}\right\}. Express the energy in terms of \nu. Count the number of states with \nu anti-parallel pairs. Hence, what is the number W(E) of configurations \left\{\sigma_{i}\right\} having H\left(\left\{\sigma_{i}\right\}\right)=E?
b) Using the result of a), calculate the canonical partition function.
Hint: Rewrite the sum as a sum over \nu.
c) Calculate the free energy per spin F / N and the entropy per spin S / N in the canonical and micro-canonical ensembles for large N.
Problem B.14 (Heat capacity of a crystal). We study a simplified microscopic model to understand the heat capacity of a crystal. We suppose that the crystal consists of N atoms (or ions) arranged in some sort of lattice. The equilibrium position of an atom is at some lattice site, about which it can oscillate. We assume that the oscillations are small and can be described by a harmonic oscillator, and that the individual oscillators are independent, i.e. do not interact with one another (to what extent is the last assumption realistic in practice?). The total Hamiltonian is hence the sum of the Hamiltonians for the individual oscillators:
where \vec{p}_{i} is the momentum, \vec{x}_{i} is the position relative to the equilibrium position, and m is the mass of the atom.
a) Einstein model, canonical approach.
i) Describe the eigenstates and eigenvalues (energy levels) of the crystal in terms of those of a single 1-dimensional harmonic oscillator.
ii) Give the quantum canonical partition function Z_{N} as a function of the temperature.
iii) Deduce the mean energy U and the specific heat C of the system. Compare this to the specific heat of a paramagnetic (non-interacting) spin chain. What is the behavior of C for high/low temperatures? What is the numerical value of C at high temperature for a crystal containing N_{A}=6.022 \times 10^{23} atoms? What is the characteristic temperature T_{0}=\hbar \omega / k_{B} for a value \hbar \omega=0.1 \mathrm{eV}?
iv) Evaluate also the classical canonical partition function Z_{N}^{\text{class}} of N distinguishable classical oscillators. Demonstrate that it is comparable to the quantum partition function for T \gg T_{0}. Comment?
b) Micro-canonical approach. Here, one fixes the total energy, E=\left(M+\frac{3}{2} N\right) \hbar \omega, where M \gg 1.
i) What is the number W(E) of accessible micro-states of the system?
ii) Deduce the entropy S(E) of the system.
iii) Express E as a function of the temperature.
iv) Compute C and compare the result to that found in part a).
c) Modified Einstein model. The prediction for the heat capacity as a function of T is qualitatively in accord with experiments for high temperatures, but not for low temperatures, where experiments show the behavior C \sim T^{3}. In order to have a model which is in accord with that behavior for low T, suppose that we have instead N oscillators with variable frequencies \omega_{i}, i=1, \ldots, N between 0 and some maximum \omega_{\max}. The heat capacity is now given by a sum. Approximate this sum by an integral in terms of the frequency distribution function D(\omega), defined in such a way that D(\omega) d \omega is the number of atoms with frequency in the range \omega \ldots \omega+d \omega. Assuming that D(\omega) behaves as D(\omega) \sim A \omega^{\nu} for \omega \ll 1, find the correct value of \nu to reproduce the behavior C \sim T^{3} for low T.
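The two temperature regimes discussed in this problem can be made concrete with the standard Einstein-model heat capacity C /\left(N k_{B}\right)=3 x^{2} e^{x} /\left(e^{x}-1\right)^{2}, x=T_{0} / T (the factor 3 counting the three oscillation directions per atom); the following is our own evaluation sketch, not part of the exercise.

```python
import math

def c_einstein(t):
    """Heat capacity per atom, in units of k_B, of the Einstein model:
    C/(N k_B) = 3 x^2 e^x / (e^x - 1)^2 with x = T_0/T = 1/t,
    where T_0 = hbar*omega/k_B and t = T/T_0."""
    x = 1.0 / t
    ex = math.exp(x)
    return 3.0 * x * x * ex / (ex - 1.0) ** 2

# high temperature: the Dulong-Petit value C -> 3 N k_B
high = c_einstein(50.0)
# low temperature: exponential suppression ~ e^{-T_0/T}, i.e. much faster
# than the experimentally observed T^3 behavior that part c) addresses
low = c_einstein(0.05)
```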
Problem B.15 (Paramagnetism). Consider a system of a large number N of spins. Let s_{i}= \pm 1 be the value of the i-th spin in the z-direction.
a) In the absence of a magnetic field, all spin configurations \left(s_{1}, \ldots, s_{N}\right) are equally probable.
i) What is the probability of a fixed configuration of spins?
ii) What is the probability of finding a configuration with N_{+} positive spins (and N_{-}=N-N_{+} negative spins)?
iii) Generalize the probability law in ii) to the case when a positive spin occurs with probability p and a negative spin with probability 1-p. Calculate, in this case, the mean value \left\langle N_{+}\right\rangle and the spread \Delta N_{+}. What is the dependence on N for a large system?
b) Now suppose the spins are associated with the electrons sitting at distinct lattice sites of a crystal. The magnetic moment associated with the i-th spin is
where \vec{\sigma}=\left(\sigma_{x}, \sigma_{y}, \sigma_{z}\right) are the Pauli matrices. What is the Hamiltonian in the presence of a constant magnetic field \vec{B} in the z-direction? What are the maximum and minimum values E_{\max / \min} of the energy of the system interacting with the external magnetic field? Express the degeneracy of a given energy level as a function of N_{+}, N_{-}.
c) Entropy:
i) Calculate the entropy of the system given the information that the energy is at the fixed value
Express the entropy in terms of N and the variable \epsilon=\frac{E}{N \mu B} (energy per spin in units of \mu B). Sketch S=S(N, \epsilon). Verify that the entropy is a concave function of the energy, meaning that
(You may recall Stirling's formula, which states \log n!=n \log n-n+O(\log n) for large n.) Why is it plausible that the entropy should be concave?
ii) Suppose the energy of the system is not known exactly, but only up to \Delta U, where we assume \Delta U \ll N \mu B. What is the entropy of the system? Is the result significantly different from that in i)?
d) Temperature: Recall that the absolute temperature of the system is defined by
T=\frac{1}{\partial S / \partial E} .
i) Express T(E) as a function of energy for a spin system with N spins. Sketch this function, and comment on the behavior of T(E) for E>0!
ii) Invert the relation between temperature and energy to obtain the energy as a function E=E(T, N) of T, N.
iii) Consider a spin system with positive energy. The spin system is put in thermal contact with an ideal monoatomic gas at temperature T_{\mathrm{g}}. The energy of the gas is, as usual, E_{\mathrm{g}}=\frac{3}{2} N_{\mathrm{g}} k_{B} T_{\mathrm{g}}. Once thermal equilibrium is reached, what can be said about the final temperature T_{\mathrm{f}}? What are its limits for N / N_{\mathrm{g}} \rightarrow \infty resp. \rightarrow 0?
e) Curie's law: The magnetization \vec{M}=\left(M_{x}, M_{y}, M_{z}\right) is defined as the average magnetic moment in the spin system. The magnetic susceptibility per volume V is defined in the small-field limit as
A substance having \chi>0 is called paramagnetic, while a substance having \chi<0 is called diamagnetic. For paramagnetic substances, one finds experimentally that, to a good precision, \chi is inversely proportional to the absolute temperature. This behavior is called 'Curie's law'. Suppose the spin system is in thermal equilibrium at temperature T. Give the magnetic moment M \equiv M_{z} as a function of \beta, B \equiv B_{z}, N.
Deduce the susceptibility and verify Curie's law. Calculate also the heat capacity C=\partial E / \partial T and sketch C as a function of B / T and T / B.
Problem B.16 (Mean field theory and ferromagnetism). A ferromagnetic material has a spontaneous magnetization below a critical temperature T_{c}, even in the absence of an external magnetic field B. Above T_{c}, the spontaneous magnetization is zero, and the material behaves like a paramagnet. To understand this effect, we study the famous Ising model. In this model, one considers independent spins \sigma_{i}= \pm 1 on the sites i of a hypercubic lattice \mathbb{Z}^{d} in d spatial dimensions. The energy of a configuration of spins \left\{\sigma_{i}\right\} is taken to be
The first sum is over all lattice bonds, i.e. pairs (i, j) with i<j such that spin i and spin j are nearest neighbors. The second sum is over all lattice sites. b is related to the background field by b=\mu B, where \mu is of the order of the Bohr magneton, \sim 9.3 \times 10^{-24} \mathrm{~J} / \mathrm{T}. J is the ferromagnetic coupling between the spins. The probability distribution for the spin configurations is
with the partition function Z=Z(\beta, J, b).
a) Write down a formula for the partition function.
b) Write down a formula for \rho\left(\sigma_{1}, \ldots, \sigma_{i}=+1, \ldots\right) / \rho\left(\sigma_{1}, \ldots, \sigma_{i}=-1, \ldots\right) in terms of \beta and h_{i}. Let p_{\pm} be the probabilities that the i-th spin is \pm 1, respectively. Show that the mean magnetization defined as m=\left\langle\sigma_{i}\right\rangle is independent of i and can be written as
(The mean value is defined wrt. the probability distribution \rho given above.)
c) In the mean field approximation, one thinks of each individual spin as being subject to an "effective" magnetic field
and assumes that it is consistent to replace h_{i} with its mean value h:=\left\langle h_{i}\right\rangle. Assuming the mean field approximation, derive, using b), the "self-consistency" relation
m=\tanh \beta(J v m+b),
where v is the number of nearest neighbors in the lattice, i.e. v=2 in 1d, v=4 in 2d, v=6 in 3d, etc.
d) The free energy is given as usual by F=-k_{B} T \log Z, where \beta^{-1}=k_{B} T. Verify that
\begin{equation*}
N m=-\frac{\partial F}{\partial b} \tag{B.2}
\end{equation*}
Here N is the total number of lattice sites, which we assume to be finite in this part (box). To calculate Z, write \sigma_{i}=\left\langle\sigma_{i}\right\rangle+\delta \sigma_{i}, with \delta \sigma_{i}=\sigma_{i}-\left\langle\sigma_{i}\right\rangle. Substitute this into the formula for H, and neglect terms that are quadratic in \delta \sigma_{i}. Calculate Z and F in this approximation, and verify that
F=-\beta^{-1} N \log [2 \cosh \beta(v J m+b)]+\frac{1}{2} N v J m^{2} .
Verify that the self-consistency relation of c) is consistent with eq. (B.2) in this approximation.
e) Now let b=0 (no external field). Show that the self-consistency equation for m in c) has m=0 as its only solution if T>T_{c}, where T_{c}:=v J / k_{B}. Whence, in this case there is no spontaneous magnetization. Show that, for T<T_{c}, the self-consistency equation has two nonzero solutions. Thus, below the critical temperature, there is spontaneous magnetization. For T_{c}-T>0 and small, solve the self-consistency equation by expanding the tanh around m=0. Show that the solution m(T) behaves as
m(T) \sim \mathrm{const} \cdot\left(T_{c}-T\right)^{1 / 2}
as T \rightarrow T_{c}. This behavior is characteristic in the theory of phase transitions. The exponent 1/2 is called the critical exponent.
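The behavior of the self-consistency relation m=\tanh \beta(J v m+b) can be seen directly by fixed-point iteration. Below is our own sketch in dimensionless variables (\beta J v = T_{c}/T and a rescaled field h = \beta b; the function name is our choice); above T_{c} the iteration collapses to m=0, below T_{c} it converges to the spontaneous magnetization.

```python
import math

def solve_m(T_over_Tc, h=0.0, tol=1e-12):
    """Solve the mean-field self-consistency relation
    m = tanh(m * Tc/T + h) by fixed-point iteration (h = beta*b is the
    dimensionless external field)."""
    m = 1.0  # start from full polarization to select the nonzero branch
    for _ in range(10000):
        m_new = math.tanh(m / T_over_Tc + h)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# above Tc the only solution at h = 0 is m = 0; below Tc the iteration
# converges to a nonzero m, which vanishes like (Tc - T)^(1/2) as T -> Tc
m_above = solve_m(1.5)   # essentially 0
m_below = solve_m(0.9)   # nonzero spontaneous magnetization
```

The iteration converges because the slope of \tanh at the relevant fixed point is below 1; very close to T_{c} the convergence becomes slow, reflecting the flatness of the self-consistency curve at the transition.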
Problem B.17 (Directed polymer). A polymer consists of atoms i=0,1,2, \ldots, N at positions \left(x_{i}, y_{i}\right) \in \mathbb{Z}^{2} of a square lattice. The atom at the origin is fixed at the position x_{0}=y_{0}=0, and the other atoms are chained together such that x_{i}-x_{i-1}=1 and \left|y_{i}-y_{i-1}\right|=1. This polymer is hence oriented in the x-direction and does not self-intersect.
a) Determine the total number of micro-states of the polymer.
b) Determine the number of micro-states W(N, y) having the property that y_{N}=y.
c) Determine, more generally, the number of micro-states W(i, y) having the property that y_{i}=y. Also calculate y / W(i, y).
d) Determine the number of micro-states W\left(i, y, N, y^{\prime}\right) having the property that y_{i}=y, y_{N}=y^{\prime}. Also calculate y y^{\prime} / W\left(i, y, N, y^{\prime}\right).
e) Finally, calculate the typical deflection of the chain end,
Problem B. 18 (Entropy budget of the Earth). It is estimated that the mass of carbon bound in newly generated biomass on earth is about 10^(11)-10^(12)10^{11}-10^{12} tons per year. Carbon is mostly taken out of the atmosphere by converting CO_(2)\mathrm{CO}_{2} gas and water vapor into organic material via photosynthesis. Organic material consists of highly organized structures and consequently should have a much lower entropy than water vapor and CO_(2)\mathrm{CO}_{2} gas. In order to reconcile this with the principle that the entropy of a system cannot decrease, one notes that the earth is not an isolated system, but receives high energy photons from the sun and emits heat in the form of low energy photons back into space. Through this process, the entropy of the photons is increased. The aim of this question is to estimate this gain and to show that it can account for the entropy decrease through newly generated biomass.
a) Most photons arriving from the sun have a wavelength of ∼520nm\sim 520 \mathrm{~nm}. Using the Einstein relation E=h nuE=h \nu for the energy of a single photon, and using the value 1400((J))/((s)*m^(2))1400 \frac{\mathrm{~J}}{\mathrm{~s} \cdot \mathrm{~m}^{2}} for the energy of solar radiation per area per unit of time, estimate the number of photons arriving on earth from the sun per year. Of the photons arriving on earth, only about 50%50 \% are absorbed on the surface, whereas the rest is reflected or absorbed by clouds etc. Of these, only about 0.1%0.1 \% participate in the actual photosynthesis that results in a net gain of glucose (and then biomass). Hence, what is the number of photons per year participating in the creation of new biomass?
b) The average temperature on the surface of the earth is about T=280KT=280 \mathrm{~K}. Assuming that the intensity of low energy photons emitted from the earth back into space
follows a black body distribution,
I(nu)=(2pi hnu^(3))/(c^(2))(1)/(exp((h nu)/(k_(B)T))-1)I(\nu)=\frac{2 \pi h \nu^{3}}{c^{2}} \frac{1}{\exp \left(\frac{h \nu}{k_{B} T}\right)-1}
what is the most probable frequency nu\nu of photons emitted from earth, i.e. that maximizing II (do this first for general TT )? The total energy of photons absorbed by earth is approximately equal to that of the photons emitted back into space (this follows from energy conservation; we can ignore the chemical energy stored in new biomass). Hence, what is the ratio of photons received on earth to that emitted?
c) The entropy of a gas of $N_\gamma$ photons in thermal equilibrium is $S_\gamma \sim 0.9\,k_B N_\gamma$. Hence, what is the gain in entropy coming from those photons participating in the creation of new biomass (you can leave $k_B$ in the formula)?
d) Now estimate the entropy decrease through the creation of new biomass. In an extremely simplified description of this process, we can say that $\mathrm{CO}_2$ gets converted to C, which is bound in organic material, and $\mathrm{O}_2$, which is released back into the atmosphere. The atmosphere is treated as an ideal gas. According to the formula for the entropy of an ideal gas in the lecture, the entropy contributions are, respectively
The entropy of bound carbon in organic material is neglected. Using the atomic masses $12\,u$ for C and $16\,u$ for O, what is the decrease in entropy due to the conversion of $\mathrm{CO}_2$ into $\mathrm{O}_2$ via the creation of new biomass per year (you can leave $k_B$ in the formula)? Compare your answer to c). Comment?
Problem B.19 (Atmospheric pressure). Consider a gas of $N$ classical, non-interacting particles enclosed in an infinitely high cylinder whose base occupies an area $S$. The cylinder is placed upright in a gravitational field, i.e. $\vec{F} = m\vec{g}$, with the axis of the cylinder parallel to $\vec{g}$.
a) What is the Hamiltonian of the system? What is the density function $\rho$ for the canonical ensemble?
b) What is the average number of particles above height $h$?
c) Derive from b) a formula for the pressure as a function of $h$.
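From the canonical density with potential energy $mgz$ one expects the barometric profile $n(z) \propto e^{-mgz/(k_B T)}$, so the fraction of particles above height $h$ should be $e^{-mgh/(k_B T)}$. A sketch checking this against direct numerical integration, with illustrative air-like numbers that are not part of the problem:

```python
import math

# Illustrative assumptions (not given in the problem):
m = 29 * 1.661e-27     # kg, mean molecular mass of air
g = 9.81               # m/s^2
k_B = 1.381e-23        # J/K
T = 280.0              # K
beta_mg = m * g / (k_B * T)   # inverse decay length, ~1/(8.2 km)

def fraction_above(h, z_max=1e6, n=200_000):
    """Fraction of the Boltzmann weight exp(-beta_mg*z) above height h."""
    dz = z_max / n
    above = sum(math.exp(-beta_mg * (h + i * dz)) for i in range(n)) * dz
    total = sum(math.exp(-beta_mg * i * dz) for i in range(n)) * dz
    return above / total

h = 8000.0   # m, roughly the height of Mt. Everest
frac = fraction_above(h)
print(f"fraction above {h:.0f} m: {frac:.4f}")                     # ~0.38
print(f"closed form exp(-m g h/(k_B T)): {math.exp(-beta_mg*h):.4f}")
```

Since the pressure at height $h$ is carried by the weight of the gas above it, the same exponential governs $P(h)$ in part c).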
Problem B.20 (Relativistic classical ideal gas). For a relativistic particle, the energy-momentum relation is $\epsilon(\underline{p}) = \sqrt{m^2 c^4 + c^2 p^2}$, where $p = |\underline{p}|$. We first consider a classical gas of $N$ indistinguishable massless particles enclosed in a box of volume $V$.
a) Show that the partition function (canonical ensemble) is given by
b) By analogy with the non-relativistic case, where $\lambda_{\mathrm{cl}} = h/\sqrt{m k_B T}$, we see that the thermal de Broglie wave length is now $\lambda_{\mathrm{rel}} = hc/(k_B T)$. Using Stirling's formula, find the expression
for the free energy. Use the standard relation $P = -\partial F/\partial V|_T$ to derive the equation of state for the relativistic gas. Compare your result to the non-relativistic case. Do the same for the internal energy $E = F - T\,\partial F/\partial T|_V$.
c) Show that the relativistic and non-relativistic de Broglie wave lengths are related by
so that, for non-relativistic particles with $k_B T \ll mc^2$, we have $\lambda_{\mathrm{rel}} \gg \lambda_{\mathrm{cl}}$. We can consider $d = (V/N)^{1/3}$ as the mean distance between the particles. Quantum effects should become important when $d$ becomes less than the de Broglie wave length. Increasing $N$, where should quantum effects show up first, in the non-relativistic or the relativistic system?
d) At what temperature is the de Broglie wavelength comparable to the wave length of photons in the visible part of the spectrum, say $500\,\mathrm{nm}$? What wave length do photons have if their wave length is equal to the de Broglie wave length at room temperature?
e) Repeat the derivation in a) and b) for a massive relativistic particle, working to first non-trivial order in $mc^2/(k_B T)$.
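For part d), a back-of-the-envelope evaluation. It takes "room temperature" to mean $300\,\mathrm{K}$ (an assumption) and reads the question as referring to the relativistic wavelength $\lambda_{\mathrm{rel}} = hc/(k_B T)$:

```python
# lambda_rel = h c/(k_B T): setting it to 500 nm gives the temperature at
# which the thermal wavelength matches visible light; evaluating it at an
# assumed room temperature of 300 K gives the photon wavelength asked for.
h = 6.626e-34      # J*s
c = 2.998e8        # m/s
k_B = 1.381e-23    # J/K

T_500nm = h * c / (k_B * 500e-9)      # ~2.9e4 K
lam_room = h * c / (k_B * 300.0)      # ~48 micrometres (far infrared)

print(f"T for lambda_rel = 500 nm: {T_500nm:.0f} K")
print(f"lambda_rel at 300 K: {lam_room * 1e6:.1f} um")
```

So visible-light thermal wavelengths require temperatures of order $3 \times 10^4\,\mathrm{K}$, while at room temperature the matching photons lie deep in the infrared.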
B.5 Exercises for chapter 6
Problems
Problem B.21 (First law of thermodynamics). Consider the first law of thermodynamics:
Hint: rewrite the first law in terms of the differentials $dE, dV, dS$.
b) Introduce the free energy by $F = E - TS$, viewed as a function of $T, N, V$. Write the first law in terms of $F$ instead of $E$.
c) Write the first law as $dS = \ldots$. Applying the exterior differential $d$ to the resulting equation and using $d(dS) = 0$, derive the relation
Hint: Keep in mind that $dE\,dV = -dV\,dE$.
Problem B.22 (Idealized Otto engine). An idealized Otto engine is described by the following cycle:
$I \rightarrow II$: Adiabatic compression of air: piston moves up.
$II \rightarrow III$: Constant-volume heat transfer: ignition and burning of fuel.
$III \rightarrow IV$: Adiabatic expansion: power stroke, piston moves down.
$IV \rightarrow I$: Constant-volume cooling.
a) Draw the cycle in a $(P, V)$-diagram. Identify those processes in the diagram where work is performed by/on the system, and where heat is injected/given off by the system.
b) Treating the fluid as an ideal gas, compute the net work $\Delta W$ performed by the system in one cycle, and the heat $\Delta Q_{\mathrm{in}}$ injected into the system. (Give each of these quantities in terms of the temperatures $T_I, \ldots, T_{IV}$.) Compute the efficiency $\eta$ of the idealized Otto cycle in terms of the temperatures.
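The cycle can be checked numerically. The sketch below uses a monatomic ideal gas ($\gamma = 5/3$, consistent with $E = \tfrac{3}{2}PV$ used elsewhere in these notes; a real Otto engine runs on air with $\gamma \approx 1.4$), and the compression ratio and temperatures are illustrative assumptions. The efficiency should reproduce the standard result $\eta = 1 - r^{1-\gamma}$:

```python
# Otto cycle for a monatomic ideal gas (C_V = 3/2 N k_B, gamma = 5/3).
# Compression ratio and temperatures below are illustrative assumptions.
gamma = 5.0 / 3.0
r = 8.0                      # compression ratio V_I / V_II
T_I = 300.0                  # K, start of compression
T_III = 1800.0               # K, after ignition (constant volume)

# Adiabats obey T V^(gamma-1) = const.
T_II = T_I * r**(gamma - 1.0)
T_IV = T_III * r**(1.0 - gamma)

# Heats at constant volume, per N k_B (C_V = 3/2 N k_B):
cv = 1.5
Q_in = cv * (T_III - T_II)   # II -> III, ignition
Q_out = cv * (T_IV - T_I)    # IV -> I, cooling
W = Q_in - Q_out             # net work per cycle, by energy balance

eta = W / Q_in
eta_formula = 1.0 - r**(1.0 - gamma)
print(f"eta = {eta:.4f}, 1 - r^(1-gamma) = {eta_formula:.4f}")
```

Note that the efficiency depends only on the compression ratio, not on how hot the ignition stroke runs.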
Problem B.23 (Cyclic process). Consider the following cyclic process:
$I \rightarrow II$: Adiabatic (constant $S$) expansion
$II \rightarrow III$: Isochoric (constant $V$) cooling
$III \rightarrow IV$: Adiabatic (constant $S$) compression
$IV \rightarrow I$: Isothermal (constant $T$) expansion
Throughout, it is assumed that the particle number $N$ remains constant (so that $dN = 0$ in the entire process), and we assume that the equations of state of an ideal gas hold:
$$PV = N k_B T, \qquad E = \frac{3}{2} PV.$$
a) Show that $PV = \mathrm{const.}$ on isotherms and $PV^{5/3} = \mathrm{const.}$ on adiabatics, using the equation(s) of state and the first law $T\,dS = dE + P\,dV$. (If you cannot do this, carry on with b)-e) assuming these results.)
b) Sketch the process in a $(P, V)$-diagram, and identify where heat is injected/given off by the system.
c) What is the work $\Delta W$ performed by the system in one cycle?
d) What is the heat $\Delta Q_{\mathrm{in}}$ injected into the system in one cycle?
e) What is the efficiency $\eta = \frac{\Delta W}{\Delta Q_{\mathrm{in}}}$?
State your answers in c)-e) in terms of $N, T_I, V_I, V_{II}\,(= V_{III}), V_{IV}$.
Problem B.24 (Gibbs-Duhem relation). Consider a system in equilibrium characterized by a fixed energy $E$, volume $V$, and particle number $N_i$ for the $i$-th species of particle. We argued in chapter 4 that the entropy $S(E, V, \{N_i\})$ of such an equilibrium state is extensive in the sense that, for each $\nu$, we have
(Use the definitions of $T, P, \mu_i$ in terms of $S$ given in the lectures.)
b) Write this relation as $H = \sum_i \mu_i N_i$ in terms of the free enthalpy $H$, and derive the relationship
$$dP = s\,dT + \sum_i n_i\,d\mu_i$$
for the pressure, where $s = S/V$ and $n_i = N_i/V$ are the entropy and number densities. Derive the identities
c) Consider two copies of a system characterized by the variables $z^{(1)} = (E^{(1)}, V^{(1)}, N^{(1)})$ and $z^{(2)} = (E^{(2)}, V^{(2)}, N^{(2)})$, which are separately in equilibrium but not necessarily with each other. Since the entropy of the composite system is maximal in equilibrium, we should normally have
Argue that the entropy must be a concave function. [Recall that a function $f(x)$ of $n$ variables $x = (x_1, \ldots, x_n)$ is called concave iff $f(\lambda x + (1-\lambda)y) \geqslant \lambda f(x) + (1-\lambda) f(y)$ for all $0 \leqslant \lambda \leqslant 1$.] In particular, show that
Problem B.25 (Charged gas). We consider a gas of particles of unit charge $\pm q$. The eigenstates of the charge operator $\hat{Q}$ and the Hamiltonian $\hat{H}$ are $|n_+, n_-\rangle$ with
where $n_+, n_- \geqslant 0$ are integers that have the interpretation of the number of positively resp. negatively charged particles in the state. We consider a density matrix of the form
The information entropy is defined by $S(\rho) = -k_B \operatorname{tr} \rho \log \rho$, the mean energy by $E = \langle\hat{H}\rangle$, and the mean charge by $Q = \langle\hat{Q}\rangle$.
a) Using the method of Lagrange multipliers, show that the density matrix which maximizes $S(\rho)$ for fixed $E, Q$ is of the form
(Here $\beta$ and $\Phi$ are constants.)
b) Define $G = -k_B T \log Y(T, \Phi)$, where $\beta^{-1} = k_B T$. Show that
where $S, Q$ are defined as above.
c) For a charged gas at fixed volume, the first law of thermodynamics is $T\,dS = dE - \Phi\,dQ$. What is the physical meaning of $\Phi$? Show that if we define $G = E - TS - \Phi Q$, then $G = G(T, \Phi)$ satisfies $dG = -S\,dT - Q\,d\Phi$.
d) Verify the relations (B.3) using $dG = -S\,dT - Q\,d\Phi$.
Problem B.26 (Virial expansion and van der Waals equation of state). The aim of this exercise is to use the linked cluster expansion in order to derive an equation of state for a realistic monoatomic gas. Recall that the cluster expansion for a classical monoatomic non-relativistic gas is
where $Y$ is the grand canonical partition function, $z = e^{\beta\mu}$ is the fugacity, and $\lambda$ is the thermal de Broglie wavelength.
a) Using the Gibbs-Duhem relation and expressing the grand potential $G$ in terms of $Y$ according to (6.70), show that
where $n = N/V$ is the particle density.
c) We next want to eliminate $z$ in favor of $n$ in a). For this we write $z = \lambda^3 n + a_2 (\lambda^3 n)^2 + a_3 (\lambda^3 n)^3 + \ldots$ and substitute this in b) in order to determine $a_2, a_3$. Show that $a_2 = -2b_2$ and $a_3 = 8b_2^2 - 3b_3$.
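The claimed coefficients can be verified with a short computer algebra sketch. It assumes the standard cluster-expansion form of the density, $\lambda^3 n = z + 2 b_2 z^2 + 3 b_3 z^3 + \ldots$ (with $b_1 = 1$); substituting the proposed inversion should reproduce $\lambda^3 n$ identically:

```python
# Symbolic check of the inversion z(n) up to third order.
import sympy as sp

w, b2, b3 = sp.symbols('w b2 b3')   # w stands for lambda^3 * n
# Proposed inversion with a2 = -2*b2 and a3 = 8*b2**2 - 3*b3:
z = w - 2*b2*w**2 + (8*b2**2 - 3*b3)*w**3

# Assumed density relation from the cluster expansion: lambda^3 n in z.
lhs = z + 2*b2*z**2 + 3*b3*z**3
series = sp.expand(lhs)
# Keep terms up to w^3; everything above that order is beyond our ansatz.
truncated = sum(series.coeff(w, k) * w**k for k in range(4))
print(sp.simplify(truncated - w))   # -> 0, confirming a2 and a3
```

The same bookkeeping by hand amounts to matching the $w^2$ and $w^3$ coefficients to zero, which fixes $a_2$ and $a_3$ uniquely.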
d) Using the result obtained in c) in a), derive the virial expansion
where $B_2 = -b_2 \lambda^3$, $B_3 = (4b_2^2 - 3b_3)\lambda^6$.
e) Let us now study the virial coefficient $B_2$ for a typical gas. We use the following approximation:
$$v(r) = \begin{cases} +\infty & \text{for } r < r_0 \\ -u_0 (r_0/r)^6 & \text{for } r > r_0 \end{cases}$$
Sketch this potential. Show that
$$2\lambda^3 b_2 = -\frac{4\pi r_0^3}{3} + 4\pi \int_{r_0}^{\infty} \left[e^{u_0 (r_0/r)^6/(k_B T)} - 1\right] r^2\, dr$$
Approximating the integrand by $\approx u_0 (r_0/r)^6/(k_B T)$ in the high-temperature limit $u_0/(k_B T) \ll 1$, show that
where $V_a = \frac{4\pi r_0^3}{3}$ is the effective volume of one atom.
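The high-temperature approximation in e) can be checked numerically. The sketch below works in units where $r_0 = 1$ and picks an illustrative value $u_0/(k_B T) = 0.1$; the leading-order result for the integral is then $V_a\, u_0/(k_B T)$:

```python
import math

eps = 0.1                        # illustrative u0/(k_B T), assumed small
V_a = 4.0 * math.pi / 3.0        # effective volume of one atom (r0 = 1)

# 4*pi * integral_1^inf [exp(eps*(1/r)^6) - 1] r^2 dr, midpoint rule;
# the integrand decays like eps*r^(-4), so truncating at r = 50 is safe.
dr = 1e-3
integral = 0.0
r = 1.0 + dr / 2.0
while r < 50.0:
    integral += (math.exp(eps * r**-6) - 1.0) * r**2 * dr
    r += dr
integral *= 4.0 * math.pi

# Leading order: 4*pi*eps*integral_1^inf r^(-4) dr = 4*pi*eps/3 = V_a*eps
approx = V_a * eps
print(f"numeric: {integral:.4f}, high-T approx: {approx:.4f}")
```

The two values agree to within a couple of percent at this temperature; the numeric result is slightly larger because $e^x - 1 > x$.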
f) Using the results of e), we can write the virial expansion as
neglecting orders higher than $n^2$ (i.e., assuming low density). Show that this can be written as
$$\left(P + a(N/V)^2\right)(V - bN) \approx N k_B T,$$
which is known as the van der Waals equation. Identify the van der Waals parameters $a, b$ with the microscopic parameters of the system.
g) Plot the isotherms of the van der Waals equation using a computer programme such as Mathematica. It is sensible to plot $p = P/P_c$, with $P_c = a/(27b^2)$, against $v = V/(3bN)$, for several isotherms around $T_c = 8a/(27b)$ in the range $0 < P/P_c < 2$ and $0 < v < 10$. You should see a distinctive qualitative change of the isotherms above and below $T_c$. Compare this to the isotherms of the ideal gas.
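In the reduced variables $p = P/P_c$, $v = V/(3bN)$, $t = T/T_c$, the van der Waals equation takes the parameter-free form $p = 8t/(3v - 1) - 3/v^2$, so the qualitative change at $T_c$ can also be checked without plotting software. A sketch testing for the loss of monotonicity below $t = 1$ (the van der Waals loop):

```python
# Reduced van der Waals isotherms: p = 8t/(3v - 1) - 3/v^2, independent
# of the parameters a and b.  Above T_c the isotherms decrease
# monotonically in v; below T_c a non-monotone "loop" appears.
def p_reduced(v, t):
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2

vs = [0.5 + 0.01 * i for i in range(451)]     # v from 0.5 to 5.0

def monotone_decreasing(t):
    ps = [p_reduced(v, t) for v in vs]
    return all(p1 >= p2 for p1, p2 in zip(ps, ps[1:]))

print("t = 1.10 monotone:", monotone_decreasing(1.10))   # True: no loop
print("t = 0.85 monotone:", monotone_decreasing(0.85))   # False: vdW loop
print("critical point p(1, 1) =", p_reduced(1.0, 1.0))   # 1.0
```

The non-monotone stretch below $T_c$ has $\partial P/\partial V > 0$ and is mechanically unstable; it is replaced by the Maxwell construction in the phase-coexistence discussion of section 6.7.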
Acknowledgements
These lecture notes are based on lectures given by Prof. Dr. Stefan Hollands at the University of Leipzig.
${}^{1}$ Of course this theory turned out to be incorrect. Nevertheless, we nowadays know that heat can be radiated away by particles which we call "photons". This shows that, in science, even a wrong idea can contain a germ of truth.
${}^{2}$ It seems that Lavoisier's foresight in political matters did not match his superb scientific insight. He became very wealthy owing to his position as a tax collector during the "Ancien Régime", but got in trouble for this lucrative but highly unpopular job during the French Revolution and was eventually sentenced to death by a revolutionary tribunal. After his execution, one onlooker famously remarked: "It takes one second to chop off a head like this, but centuries to grow a similar one."
${}^{1}$ This description is not always appropriate, as the example of a rigid body shows. Here the phase space coordinates take values in the co-tangent space of the space of all orthogonal frames describing the configuration of the body, i.e. $\Omega \cong T^* SO(3)$, with $SO(3)$ the group of orientation-preserving rotations.
${}^{2}$ A general self-adjoint operator on a Hilbert space will have a spectral decomposition $A = \int_{-\infty}^{\infty} a\, dE_A(a)$. The spectral measure does not have to be atomic, as suggested by the formula (2.58). The corresponding probability measure is in general $d\mu(a) = \langle \Psi \mid dE_A(a)\, \Psi \rangle$.
^(1){ }^{1} This equation can be viewed as a discretized analog of the Boltzmann equation in the present context. See the Appendix for further discussion of this equation.
${}^{1}$ The quantity $W^{\mathrm{cl}}$ is for this reason often defined by
Also, one often includes further combinatorial factors to account for the distinction between distinguishable and indistinguishable particles, cf. (4.49).
${}^{2}$ For distinguishable particles, this would be $\mathcal{H}_N = L^2(\mathbb{R}^N)$. However, in real life, quantum mechanical particles are either bosons or fermions, and the corresponding definition of the $N$-particle Hilbert space has to take this into account; see Ch. 5.
${}^{3}$ The proof of the linked cluster theorem is very similar to that of the formula (2.10) for the cumulants $\langle x^n \rangle_c$; see section 2.1.
${}^{1}$ Here, we make use of the Riemann zeta function, which is defined by
${}^{1}$ This is how one could actually mathematically implement the idea of "thermal contact".
${}^{2}$ Mathematically, the differentials $\mathrm{d}X_i$ are the generators of a Grassmann algebra of dimension $N$.
${}^{3}$ Here we quote the formula for indistinguishable particles, which means that we should include the factor $\frac{1}{N!}$ in the definition of the microcanonical partition function $W(E, N, V)$ for indistinguishable particles, cf. section 4.2.3.
${}^{4}$ One also uses the enthalpy, defined as $E + PV$. Its natural variables are $S, P, N$, which is more useful for processes at constant pressure.
${}^{5}$ Here we assume implicitly that $[H, \hat{N}_1] = 0$, so that $H$ maps each subspace of $N_1$ particles to itself.
${}^{1}$ More precisely, $\chi_{ij} = \lim \frac{M_i}{V \cdot B_j}$ is a tensor. Here we only look at the $zz$-component of this tensor, which is relevant in our situation.