Variational Monte Carlo

This is the fourth part in a short series of blog posts about quantum Monte Carlo (QMC). The series is derived from an introductory lecture I gave on the subject at the University of Guelph.

Part 1 – calculating Pi with Monte Carlo

Part 2 – Galton’s peg board and the central limit theorem

Part 3 – Markov Chain Monte Carlo

Introduction to QMC – Part 4: High dimensional calculations with VMC

In the previous post, Markov Chains were introduced along with the Metropolis algorithm. We then looked at a Markov Chain Monte Carlo (MCMC) Python script for sampling probability distributions. Today we’ll use a modified version of that script to perform calculations inspired by quantum mechanics. This will involve sampling high dimensional probability distributions for systems with multiple particles.

We are heading into complicated territory, so I set up the first section of this post in question-answer format.


Why is it called VMC and not MCMC?

As far as I can tell, variational Monte Carlo (VMC) is the same as MCMC. The name was probably inspired by the variational theorem from quantum mechanics, but we’ll get to this later. For now let’s simply pose the problem:

Given a probability distribution \mathbf{P(R)}, where \mathbf{R=(r_1, r_2, ...)} is the configuration vector for the positions of \mathbf{N} particles in a box, calculate the average energy \mathbf{\langle E \rangle} of the system.

Why do we need to worry about probability distributions? Because in the quantum world things to not always have well defined positions, but instead only have well defined probabilities of being in specific positions. Hence the existence of probability distributions.


What function are we calculating?

We’ll calculate the average value of the local energy E(\mathbf{R}), which is defined as the sum of kinetic and potential energies:

E(\mathbf{R}) = T(\mathbf{R}) + V(\mathbf{R}).

It’s not unreasonable to plug in some \mathbf{R} and calculate E(\mathbf{R}) by hand or with a computer, but to calculate the average we’ll need to sample the probability distribution with Monte Carlo.

Kinetic term T

The kinetic energy will depend on the second derivative of the wave function, which is relatively difficult and computationally time consuming to calculate. To avoid dealing with this beast we’ll use a made-up function

T(\mathbf{R}) = \Big[\sum_i^{N/2}|\, x_i \cdot y_i \, |\Big] \cdot \Big[\sum_{j'}^{N/2} | \, x_{j'} \cdot y_{j'} \, |\Big].

It could be interpreted as the particles having more kinetic energy (i.e., larger T) depending on how far they are from the center of the box (as we’ll see the box is centered about r=(0,0)). This doesn’t make sense physically of course.

Potential term V

This is the cumulative potential energy of the particles, and unlike T it will be calculated “for real” (i.e., how it actually can be done in practice). We’ll assume the system is split into equal sized groups of different “species” particles, denoted i and j' [1], and only consider interactions between particles of different species. In this case we have

V(\mathbf{R}) = \sum_{i<j'}^{N} v(r_{i, j'}) ,

where v(r_{i, j'}) is the two-body potential and r_{ij'} is the distance between particles i and j'. The i<j' notation ensures that we only count each interaction once. We'll take v(r_{i, j'}) to be a shifted Gaussian:

from scipy import stats
mu = 0.5
sig = 0.1
y = lambda r: -stats.norm.pdf((r-mu)/(sig))


We’ll focus on the case where N=4, for which the potential becomes

V(\mathbf{R}) = v(r_{1, 1'}) + v(r_{1, 2'}) + v(r_{2, 1'}) + v(r_{2, 2'}).


What probability distribution will we sample?

From elementary quantum mechanics, the probability density is given by the square of the wave function. So, for our many-body wave function \Psi_V, we have that

P(\mathbf{R}) = |\Psi_V^2(\mathbf{R})|.

The absolute value is meaningful when the wave function has imaginary components, which is not the case for us today. Our first \Psi_V will be a linear combination of single-particle wave functions \psi_1(\mathbf{r}) and \psi_2(\mathbf{r}) [2]:

\Psi_V(\mathbf{R}) = \sum_i^N \Big[\psi_1(\mathbf{r}_i)+\psi_2(\mathbf{r}_i)\Big].

def prob_density(R, N):
    ''' The square of the many body wave function
        Psi_V(R). '''

    # e.g. for N=4:
    # psi_v = psi(r_1) + psi(r_2) + psi(r_3) + psi(r_4)
    psi_v = sum([psi_1(R[n][0], R[n][1]) for n in range(N)]) + \
            sum([psi_2(R[n][0], R[n][1]) for n in range(N)])

    # Setting anything outside the box equal to zero
    # This will keep particles inside
    for coordinate in R.ravel():
        if abs(coordinate) >= 1:
            psi_v = 0

    return np.float64(psi_v**2)

Our configuration space is two dimensional (so we can visualize it nicely) and so particle positions consist of just x and y coordinates, i.e. \mathbf{r}=(x, y). Lets take \psi_1(\mathbf{r}) to be a positive Gaussian and \psi_2(\mathbf{r}) to be a negative Gaussian.

def psi_1(x, y):
    ''' A single-particle wave function. '''
    g1 = lambda x, y: mlab.bivariate_normal(x, y, 0.2, 0.2, -0.25, -0.25, 0)
    return g1(x, y)

def psi_2(x, y):
    ''' A single-particle wave function. '''
    g2 = lambda x, y: -mlab.bivariate_normal(x, y, 0.2, 0.2, 0.25, 0.25, 0)
    return g2(x, y)

The summation of \psi_1(\mathbf{r}) and \psi_2(\mathbf{r}) looks like:


If we take this summation to be the wave function of a system with just one particle then the associated probability distribution for the particle location would look like:


So what about a plot of the many-body wave function \Psi_V(\mathbf{R})? We would require a higher dimensional space for this. Take the example of N=4 particles. In this case we have \mathbf{R}=(\mathbf{r_1}, \mathbf{r_2}, \mathbf{r_3}, \mathbf{r_4}) and would need to include an axes for not just “x and y” as done above, but for x_1, x_2, x_3, x_4, y_1, y_2, y_3, and y_4.


Calculating the average energy

We’ll focus on the N=4 particle system where 2 are species i and 2 are species j'. The wave function in this case is

\Psi_V(\mathbf{R}) = \psi_1(\mathbf{r}_1)+\psi_2(\mathbf{r}_1)+ \psi_1(\mathbf{r}_2)+\psi_2(\mathbf{r}_2)+
\,\,\,\,\, \psi_1(\mathbf{r}_{1'})+\psi_2(\mathbf{r}_{1'})+ \psi_1(\mathbf{r}_{2'})+\psi_2(\mathbf{r}_{2'}).

Running a quick simulation with 1000 walkers (i.e., 1000 samples per step) for 40 steps, the samples look like this:


The top left panel shows the initial state where walkers are distributed randomly. As the system equilibrates we see them drifting into areas where the probability density P(\mathbf{R}) is large. Species i is plotted in blue and j' in red.

To calculate \langle E \rangle we average over many values of E(\mathbf{R}) for equilibrated samples. An example using 200 walkers is shown below.


Each point is an average over all walkers at the given step. In this case we calculate an average energy of -0.211 \pm 0.017, where the first 200 steps have been excluded so we only average over the equilibrated system. The error, as shown by the red band around the average, is calculated as the sample standard deviation of \langle E \rangle [3]:

E_{\text{error}} = \sqrt{\frac{1}{N} \sum_{i=0}^N (E_i - \langle E \rangle)^2}

Running a calculation with 2000 walkers, we can see a dramatically reduced error:


Now we calculate \langle E \rangle = -0.202 \pm 0.005, which agrees within error to the previous calculation.


Equilibration times

In the calculations above, the system appeared to equilibrate almost immediately. This is because the initial configurations were randomly distributed about the box. If instead we force the particles to start near the corners of the box (far away from the areas where the probability distribution is large) we can clearly see E(\mathbf{R}) decreasing as equilibration occurs. Below we plot this along with the particle density from various regions of the calculation (as marked in yellow in the right panel).



Variational Theorem

The usefulness of VMC for quantum mechanical problems has to do with the variational theorem. In words, this theorem says that the energy expectation value (denoted \langle \hat{H} \rangle for short) is minimized by the true ground state wave function of the system. This can be written mathematically as follows:

\langle \hat{H} \rangle \equiv \langle \Psi_V~|~\hat{H}~|~\Psi_V \rangle \geq E_0,

where E_0 is the ground state energy of the system. A proof can be found at the bottom of page 60 of my masters thesis. The triple-bar equals sign simply means “is defined as”, what’s important is the other part of the equation.

If the name of the game is to find the ground state energy, which is often the case, then a good estimate can be achieved using VMC. A \Psi_V can include variable parameters to optimize, which is done by calculating \langle \hat{H} \rangle for a particular set of parameters (the same way we’ve calcualted \langle E \rangle in this post) and repeating for different parameters until the lowest value is found. The resulting energy estimate will be an upper bound to the true ground state energy of the system.


Calculating average energy with a different many-body wave function

To show how changing \Psi_V can impact the calculation of \langle E \rangle we’ll adjust the wave function:

\Psi_V(\mathbf{R}) = \sum_i^{N/2} \psi_1(\mathbf{r}_i) + \sum_{j'}^{N/2} \psi_2(\mathbf{r}_{j'}),

where the two species i and j' now have different single-particle wave functions. As shown above, we define \psi_1(\mathbf{r}) as the left Gaussian (looking back to the first figure) and \psi_2(\mathbf{r}) as the right one. For N=4 we now have:

\Psi_V(\mathbf{R}) = \psi_1(\mathbf{r}_1)+\psi_1(\mathbf{r}_2)+\psi_2(\mathbf{r}_{1'})+\psi_2(\mathbf{r}_{2'}),

The new probability density will be:

def prob_density(R, N):
    ''' The square of the many body wave function
        Psi_V(R). '''

    psi_v = sum([psi_1(R[n][0], R[n][1]) for n in range(int(N/2))]) + \
            sum([psi_2(R[n][0], R[n][1]) for n in range(int(N/2),N)])

    # Setting anything outside the box equal to zero
    # This will keep particles inside
    for coordinate in R.ravel():
        if abs(coordinate) >= 1:
            psi_v = 0

    return np.float64(psi_v**2)

This should have the effect of separating the two species of particles. Do you think \langle E \rangle will become larger or smaller as a result?

Plotting some of the samples, we can see how the system now equilibrates according to the new probability density:


Comparing a calculation with the new \Psi_V (red) to the old one (blue), we see an increase in \langle E \rangle:



Thanks for reading! You can find the entire ipython notebook document here. If you would like to discuss any of the plots or have any questions or corrections, please write a comment. You are also welcome to email me at or tweet me @agalea91



[1] – For example we could have a system of cold atoms with two species that are identical except for the total spin. Where one species is spin-up and the other is spin-down.

[2] – I made the many-body wave function a sum of single-particle wavefunctions \psi but it would have been more realistic to make it a product of \psi‘s as can be seen here (for example).

[3] – In practice, the error on Monte Carlo calculations is often given by the standard deviation divided by \sqrt{N}. Otherwise the error does not decrease as N increases, which should intuitively be the case because we are building confidence in the calculation as the simulation is run longer. For more information see this article – specifically the first few equations and Figure 1.

5 thoughts on “Variational Monte Carlo

  1. Hi, I think you may have a mistake, at the bottom of the part “WHAT FUNCTION ARE WE CALCULATING?” ,the formula V(R) should be V(R)=v(r1,1′)+v(r1,2′)+v(r2,2′)+v(2,1′), I don’t understand why you didn’t write the last term , or maybe I was wrong. So, if you know why, please help and explain to me. thanks.


  2. I also don’t understand why you write the many body wave function as the sum of single particle wave function , I think this is not correct.


    1. What I have done should not be done in practice (and I regret doing it here). A common way to combine single particles wave functions is in a product (see my newly added point [2]). All I am doing is “guessing” a wave function (which is of course allowed) and performing VMC with that wavefunction. Then when I “guess” a different one we see a change in the VMC result.

      Thanks so much for your comments!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s