30 min read. Published 2025-06-25. Last edited 2025-07-14.
Hello! In this article, we will explore the topic of random fields, which are continuous random functions from some metric space to the real numbers. In trying to model them, we will come up with some properties we want them to have (called desiderata) and find an explicit formula that describes their statistical properties. This formula can be used to test whether a process for generating random fields satisfies our desiderata, and it will also suggest an algorithm that is guaranteed to satisfy them.
This article is intended to be read by people with at least basic college-level math experience. Be prepared to take the time to parse and understand some potentially lengthy mathematical expressions to get the most out of it.
I love theoretical physics. I've been subscribed to PBS Spacetime for about 8 years now and get excited when I see new things from Quanta Magazine. Theoretical physics is, in part, the search for a so-called Theory of Everything. Such a Theory of Everything would be able to explain every phenomenon and predict every outcome, at least in theory. It might take a very big computer and some clever approximations to actually calculate what the theory says, but at least you know what the rules of the universe are at their most fundamental level.
However, right now theoretical physicists are stuck with two successful but incompatible theories: Quantum Field Theory (QFT) and General Relativity (GR). QFT is best at explaining things that happen around the scale of atoms. GR is best at explaining gravity, which happens around heavy things like planets. By making informed simplifications, the two theories can also be extended to explain things that happen in our everyday life. When water drips from the faucet, the surface tension holding each drop together is ultimately a quantum property, while the drop falling into the sink is due to gravity.
And yet despite their success, QFT and GR remain separate theories. This is frustrating for theoretical physicists, because it means there are some phenomena that even in principle we don't know how to predict. Most of these have to do with black holes, like what happens inside a black hole, or on the surface of a black hole.
A theory that would successfully unify QFT and GR would be a theory of Quantum Gravity. This is currently considered the Holy Grail of theoretical physics. I often wonder what a theory of Quantum Gravity could look like, and I once heard that part of the answer could be that spacetime itself is randomly configured. But what could that even mean? What would that look like? I needed to get my hands on some math, so I decided to do this project.
I decided to focus on trying to model random fields, which are randomly-fluctuating variables defined on an existing static space, like a 2D or 3D space. It just so happens that there are a lot of tools for describing randomly-fluctuating variables in time, under the name Stochastic Process Theory.
The most basic example of a stochastic process is Brownian motion. At each timestep $dt$, the Brownian motion value randomly increases or decreases a little bit; in the limit as $dt\to 0$, the result is a random function from $\R\to\R$. This could be used to model a stock price going up and down with time, or a continuous random walk in one dimension. In the case of stock prices, the value of the Brownian motion has units of dollars at each point in time, but in line with the physics intuition we will say in general that the Brownian motion has a certain field strength, and we will represent this field strength with $\phi$.
Brownian motion has many interesting properties.
One of the fundamental properties of Brownian motion is the probability relation between field strengths at two points. Specifically, for a function $\phi(t)$ representing Brownian motion, we have the following Probability Density Function (PDF), denoted $\mathcal{P}$, in $\phi_1$:
$$\mathcal{P}\left(\phi(t_1)=\phi_1 \mid \phi(t_0)=\phi_0\right) \sim \exp\left(-\tfrac{1}{2\nu}\frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)$$
Most of the math in this article will look like the quantities in the above relation, so please take your time to understand it. This is a normal distribution for $\phi_1$ with mean $\phi_0$ and variance $\nu |t_1-t_0|$. Here, $\nu$ is a proportionality constant with units of $\phi^2 / t$: the larger the value of $\nu$, the faster the resulting Brownian motion changes field strength. Probability density functions also carry a normalization constant so that they integrate to one, but we will be omitting it; the important thing is the exponential and its contents. As such, we only say that the PDF is proportional to the exponential expression, denoted by the $\sim$. This proportionality constant is not a function of $\phi_0$ or $\phi_1$, though it may be a function of $\nu$, $t_0$, and $t_1$. From now on, $\sim$ means equality up to a proportionality constant that does not depend on any $\phi_i$.
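To make this concrete, here is a minimal sketch in Python (using NumPy, with illustrative values of $\nu$, $t_0$, $t_1$, and $\phi_0$) that draws samples of $\phi_1$ from this distribution and checks that their mean and variance come out to $\phi_0$ and $\nu|t_1-t_0|$:

```python
import numpy as np

rng = np.random.default_rng(0)

nu = 0.5           # illustrative proportionality constant, units of phi^2 / t
t0, t1 = 0.0, 2.0  # illustrative times
phi0 = 1.0         # the given field strength at t0

# The normalized PDF above is a normal distribution in phi1
# with mean phi0 and variance nu * |t1 - t0|.
phi1 = phi0 + np.sqrt(nu * abs(t1 - t0)) * rng.standard_normal(100_000)

print(phi1.mean())  # close to phi0 = 1.0
print(phi1.var())   # close to nu * |t1 - t0| = 1.0
```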
For reference, we will name the above exponential expression the "correlation function" for this configuration. Note that the actual PDF can be recovered from a correlation function by integrating the correlation function over all inputs and then dividing the correlation function by that result. The new quantity then integrates to $1$ over all inputs, which is the defining requirement of a Probability Density Function.
So we have the correlation function for the field strength of $\phi(t_1)$ given knowledge of $\phi(t_0)$. We can use this correlation function to begin to generate a sample of Brownian motion. Starting with some $t_0$ where $\phi(t_0)$ equals some $\phi_0$, we can then sample a field strength for $\phi(t_1)$ using this correlation function. If we could find the joint correlation function for $\phi(t_1)$ and $\phi(t_2)$ given $\phi(t_0)$, then by using Bayes' theorem we could find the probability distribution for $\phi(t_2)$ given $\phi(t_0)$ and $\phi(t_1)$. Specifically, $\mathcal{P}\left(\phi(t_2)=\phi_2\mid \phi(t_0)=\phi_0 \land \phi(t_1)=\phi_1\right)=\frac{\mathcal{P}\left(\phi(t_1)=\phi_1\land \phi(t_2)=\phi_2\mid \phi(t_0)=\phi_0\right)}{\mathcal{P}\left(\phi(t_1)=\phi_1\mid \phi(t_0)=\phi_0\right)}$, which would be proportional to the correlation function for $\phi(t_1)$ and $\phi(t_2)$ given $\phi(t_0)$ divided by the correlation function for $\phi(t_1)$ given $\phi(t_0)$. Based on this distribution, we could sample a field strength for $\phi(t_2)$ given field strengths for $\phi(t_0)$ and $\phi(t_1)$. If we could then compute the joint correlation function for $\phi(t_1)$, $\phi(t_2)$, and $\phi(t_3)$ given $\phi(t_0)$, we'd be able to sample a field strength for $\phi(t_3)$ using the same method, and so on. This gives us an algorithm to generate as many points of Brownian motion as we want, and in the limit we can generate a sample for the entire function.
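Here is a minimal sketch of that algorithm specialized to Brownian motion (Python with NumPy; the time grid, $\nu$, and starting value are illustrative). As the next section discusses, the Markov property means each conditional distribution ends up depending only on the most recently sampled point:

```python
import numpy as np

rng = np.random.default_rng(0)

nu = 0.5
ts = np.linspace(0.0, 10.0, 1001)  # sample times t0 < t1 < ... < tn
phi = np.empty_like(ts)
phi[0] = 0.0                       # the given starting value phi(t0)

# Sample each new point from its conditional distribution given the points
# already generated; for Brownian motion this is a normal distribution
# centered on the previous point with variance nu * (t_i - t_{i-1}).
for i in range(1, len(ts)):
    dt = ts[i] - ts[i - 1]
    phi[i] = phi[i - 1] + np.sqrt(nu * dt) * rng.standard_normal()

# phi now holds one sample path of Brownian motion on the grid ts.
print(phi[:5])
```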
Let's then try to directly compute the joint correlation function for $\phi(t_1)$ and $\phi(t_2)$ given $\phi(t_0)$, assuming that $t_0<t_1<t_2$. This means computing $\mathcal{P}\left(\phi(t_1)=\phi_1 \land \phi(t_2)=\phi_2 \mid \phi(t_0)=\phi_0\right)$. By Bayes' theorem, this equals $\mathcal{P}\left(\phi(t_2)=\phi_2 \mid \phi(t_0)=\phi_0 \land \phi(t_1)=\phi_1\right)\mathcal{P}\left(\phi(t_1)=\phi_1 \mid \phi(t_0)=\phi_0\right)$. The Markov property says that the distribution for $\phi(t_2)$ only depends on $\phi(t_1)$, because $\phi(t_1)$ effectively "cuts off" dependence on $\phi(t_0)$. So applying the Markov property, we simplify to $\mathcal{P}\left(\phi(t_2)=\phi_2 \mid \phi(t_1)=\phi_1\right)\mathcal{P}\left(\phi(t_1)=\phi_1 \mid \phi(t_0)=\phi_0\right)$. Because this is a product of PDFs, the correlation function for this will just be the product of each individual correlation function. As a result, the correlation function for the joint distribution is $\exp\left(-\tfrac{1}{2\nu}\left(\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)\right)$. Notice how within the exponential expression, we simply add the two fractions associated with each individual correlation function. Not only that, it can be expressed as a quadratic polynomial in the $\phi_i$'s, and in general this form looks like it has a lot of structure.
This is very nice! This result is a sign that this kind of problem might continue to yield to symbolic methods.
Ok, so we know what the correlation function for $\phi(t_1)$ and $\phi(t_2)$ is given $\phi(t_0)$; it's $\exp\left(-\tfrac{1}{2\nu}\left(\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)\right)$. What is the correlation function for $\phi(t_0)$ and $\phi(t_1)$ given $\phi(t_2)$? Well, by the above argument the correlation function should be the product of the correlation function for $\phi(t_0)$ given $\phi(t_1)$ times the correlation function for $\phi(t_1)$ given $\phi(t_2)$, which is also $\exp\left(-\tfrac{1}{2\nu}\left(\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)\right)$. And directly by the Markov property, the correlation function for $\phi(t_0)$ and $\phi(t_2)$ given $\phi(t_1)$ is the product of those two individual correlation functions, meaning the joint correlation function is also $\exp\left(-\tfrac{1}{2\nu}\left(\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)\right)$.
So... the correlation function for $\phi(t_0)$ and $\phi(t_1)$ given $\phi(t_2)$ equals the correlation function for $\phi(t_1)$ and $\phi(t_2)$ given $\phi(t_0)$ equals the correlation function for $\phi(t_0)$ and $\phi(t_2)$ given $\phi(t_1)$. No matter which $\phi(t_i)$ is the "given", they are all the same. In fact, this is a general property of correlation functions. We will prove this in a moment, but first we will introduce a shorthand as otherwise the terms involved would be lengthy. Instead of writing something like $\mathcal{P}\left(\phi(t_{i_0})=\phi_{i_0}\land\phi(t_{i_1})=\phi_{i_1}\land\ldots\land\phi(t_{i_m})=\phi_{i_m}\mid\phi(t_{i_{m+1}})=\phi_{i_{m+1}}\land\ldots\land\phi(t_{i_{m'}})=\phi_{i_{m'}}\right)$, I will instead just write out the $\phi_i$ terms, like $\mathcal{P}\left(\phi_{i_0},\phi_{i_1},\ldots,\phi_{i_m}\mid \phi_{i_{m+1}},\ldots,\phi_{i_{m'}}\right)$.
On to the proof. Let $C_i(\phi_0, \ldots, \phi_n)$ be the correlation function for all $\phi(t_j)$ except $\phi(t_i)$, with $\phi(t_i)$ instead being given. We want to prove that $C_i(\phi_0, \ldots, \phi_n)=C_j(\phi_0, \ldots, \phi_n)$ for all $i$ and $j$. By Bayes' theorem, $\mathcal{P}\left(\phi_0,\ldots,\phi_{i-1},\phi_{i+1},\ldots,\phi_n\mid \phi_i\right)=\mathcal{P}\left(\phi_0,\ldots,\phi_{j-1},\phi_{j+1},\ldots,\phi_n\mid \phi_j\right)\frac{\mathcal{P}\left(\phi_j\mid \phi_i\right)}{\mathcal{P}\left(\phi_i\mid \phi_j\right)}$. So as long as $\mathcal{P}\left(\phi_j\mid \phi_i\right)\sim\mathcal{P}\left(\phi_i\mid \phi_j\right)$, then $\mathcal{P}\left(\phi_0,\ldots,\phi_{i-1},\phi_{i+1},\ldots,\phi_n\mid \phi_i\right)\sim\mathcal{P}\left(\phi_0,\ldots,\phi_{j-1},\phi_{j+1},\ldots,\phi_n\mid \phi_j\right)$. From this we can prove that $C_i(\phi_0, \ldots, \phi_n)\sim C_j(\phi_0, \ldots, \phi_n)$, because each is proportional to its corresponding PDF, but we would also like to prove that they are exactly equal. The key observation is the constraint we placed on proportionality constants: if each correlation function is the exponential of a polynomial whose terms are all exactly degree 2 in $\phi_0,\ldots,\phi_n$ (like the above formula), then any kind of "proportionality constant" would look like an extra term of degree 0 in the polynomial, which if nonzero is disallowed by construction. Therefore, the proportionality constant must always be $1$, which means that $C_i(\phi_0, \ldots, \phi_n)=C_j(\phi_0, \ldots, \phi_n)$ in these cases.
What this means is that as long as each of the individual correlation functions $C_i(\phi_i,\phi_j)$ and $C_j(\phi_i,\phi_j)$ are equal, then by proportionality $\mathcal{P}\left(\phi_j\mid \phi_i\right)\sim\mathcal{P}\left(\phi_i\mid \phi_j\right)$, and it is always true that $C_i(\phi_0,\ldots,\phi_n)=C_j(\phi_0,\ldots,\phi_n)$. In other words, we could completely drop the subscript on $C$ and just write $C(\phi_0,\ldots,\phi_n)$, because they are all the same. This is completely true in the case of Brownian motion, where the individual correlation functions in both cases are just $\exp\left(-\tfrac{1}{2\nu}\frac{\left(\phi_j-\phi_i\right)^2}{|t_j-t_i|}\right)$.
These correlation functions, then, somehow capture the actual level of mutual likelihood for any configuration of field strengths $\phi(t_0),\ldots,\phi(t_n)$, regardless of which one is being treated as "given". As such, they encode the level of "correlation" between different points in the configuration. They may seem simple right now because of the Markov property making them easily factorable, but they will grow more complex in the later sections, where they will serve as invaluable tools in our analysis.
Now that we have an example of a joint correlation function, let's walk through a certain operation on joint correlation functions that will help us later. We have the joint correlation function for $\phi(t_0)$, $\phi(t_1)$, and $\phi(t_2)$. Because this encodes information about the distribution of $\phi(t_1)$ and $\phi(t_2)$ given $\phi(t_0)$, we'd like to prove that if we are somehow able to ignore the information we get about $\phi(t_1)$, then we recover the distribution for $\phi(t_2)$ given $\phi(t_0)$, represented by the correlation function for $\phi(t_0)$ and $\phi(t_2)$. This would ensure that our joint correlation function consistently generalizes the individual correlation functions.
We will do this by integrating the correlation function over all possible field strengths of $\phi_1$. This is equivalent to adding up the probability that $\phi_1$ takes on any field strength, which is to say no constraint on $\phi_1$ at all. Formally speaking, this looks like $\mathcal{P}\left(\phi_2\mid \phi_0\right)=\int_\R\mathcal{P}\left(\phi_1, \phi_2\mid \phi_0\right)d\phi_1$. (Note the equality because this is written using PDFs.) But because we are working with correlation functions, we instead say $\mathcal{P}\left(\phi_2\mid \phi_0\right)\sim\int_\R \exp\left(-\tfrac{1}{2\nu}\left(\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}\right)\right) d\phi_1$. Now, it turns out that given constants $v_j$ for $j\in\{0,\ldots,n\}$ with $v_i\neq 0$, $\int_\R\exp\left(-\tfrac{1}{2\nu}\left(\sum_{j=0}^nv_j\phi_j\right)^2\right)d\phi_i\sim 1$. However, if instead the correlation function looks like $\exp\left(-\tfrac{1}{2\nu}\left(\left(\sum_{j=0}^nv_j\phi_j\right)^2+\eta\right)\right)$ where $\eta$ is just some term that doesn't depend on $\phi_i$, then integrating over $\phi_i$ will just result in $\exp\left(-\tfrac{1}{2\nu}\eta\right)$. The intuition is that we have isolated all the behavior associated with $\phi_i$ into a self-contained Gaussian distribution for $\phi_i$, represented by a perfect square in the exponent of the correlation function, at which point integrating over it eliminates it cleanly.
Ok wow, this seems like an elegant tool. All we have to do is put our correlation function into a form like this, and then we immediately get what the correlation function would be if we integrate over $\phi_1$! It turns out that $\frac{\left(\phi_2-\phi_1\right)^2}{|t_2-t_1|} + \frac{\left(\phi_1-\phi_0\right)^2}{|t_1-t_0|}=\left(\frac{1}{\left|t_2-t_1\right|}+\frac{1}{\left|t_1-t_0\right|}\right)^{-1}\left(\frac{\phi_1-\phi_2}{\left|t_2-t_1\right|}+\frac{\phi_1-\phi_0}{\left|t_1-t_0\right|}\right)^{2}+\frac{\left(\phi_2-\phi_0\right)^{2}}{\left|t_2-t_0\right|}$, where we use the fact that $\left|t_2-t_0\right|=\left|t_2-t_1\right|+\left|t_1-t_0\right|$ since $t_0<t_1<t_2$. This might look pretty nasty, but the upshot is that the first term in the sum is exactly the perfect square isolating $\phi_1$ we were looking for. The second term would then be what the correlation function reduces to when we integrate over $\phi_1$, that is to say $\exp\left(-\tfrac{1}{2\nu}\frac{\left(\phi_2-\phi_0\right)^{2}}{\left|t_2-t_0\right|}\right)$. And hey, look! It's exactly what we would expect the correlation function for $\phi(t_2)$ to be given knowledge of $\phi(t_0)$.
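If you don't feel like expanding that identity by hand, here is a quick symbolic check (a sketch in Python with SymPy, writing $a=|t_1-t_0|$ and $b=|t_2-t_1|$ so that $|t_2-t_0|=a+b$):

```python
import sympy as sp

# a = |t1 - t0|, b = |t2 - t1|; for t0 < t1 < t2 we have |t2 - t0| = a + b.
phi0, phi1, phi2 = sp.symbols('phi0 phi1 phi2', real=True)
a, b = sp.symbols('a b', positive=True)

lhs = (phi2 - phi1)**2 / b + (phi1 - phi0)**2 / a
rhs = (1/b + 1/a)**-1 * ((phi1 - phi2)/b + (phi1 - phi0)/a)**2 \
      + (phi2 - phi0)**2 / (a + b)

print(sp.simplify(lhs - rhs))  # prints 0, so the identity holds
```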
At this point we should have a pretty good handle on Brownian motion, and we are equipped with several tools to study the joint probability distributions of configurations of points in a Brownian motion process. We are ready for the next step.
Consider that our $\phi$ so far has been a random function $\R \to \R$ that obeys certain statistical properties. But we want to generalize this! The argument to $\phi$ thus far has meant time, but we are by no means limited to such semantics. We can instead imagine that $\phi(t)$ is a static field that fluctuates on a 1-dimensional space parametrized by $t$. But what if our $\phi$ were instead $\R^2 \to \R$ or $\R^3 \to \R$? What about more complex spaces, like the surface of a sphere or a hyperbolic space? How far can we push this thing? This is an interesting idea, but we still need some kind of guiding principle or intuition to constrain the space of possibilities and give us some interesting mathematical structure that we can analyze.
Before we continue, I need to introduce the idea of metric spaces and geodesics. A metric space is just a set of points $\mathcal{M}$ and a distance function $d:\mathcal{M}\times\mathcal{M}\to \R_{\geq 0}$ that obeys the standard properties of a distance function. This includes the triangle inequality, which says that $d(x_0, x_1) + d(x_1, x_2) \geq d(x_0, x_2)$, where $x_0, x_1, x_2$ are all points in $\mathcal{M}$. A geodesic, then, is the shortest path between two points in $\mathcal{M}$ according to $d$, parametrized like $\Gamma(t)$ for distance parameter $t$. It generalizes the concept of a "straight line" in Euclidean spaces like $\R^n$, and in Euclidean spaces the geodesics are just the straight lines. If we imagine a particle able to move in $\mathcal{M}$, a geodesic is a trajectory the particle would take if unaffected by external forces. Walking along a line of constant longitude is a geodesic on the surface of a sphere, but aside from the equator, walking along a line of constant latitude is not because that trajectory is curved. (Do you see why?)
Our field $\phi$ is now $\mathcal{M}\to\R$, and is written $\phi(x)$. What if we require that our random fields $\phi(x)$ obey the following property: along any geodesic $\Gamma(t)$, the function $\phi(\Gamma(t))$ looks like a regular instance of 1D Brownian motion. We can even simplify this to the following condition: the correlation function for two points $\phi(x_0)$ and $\phi(x_1)$ with field strengths $\phi_0$ and $\phi_1$ is $\exp\left(-\tfrac{1}{2\nu}\frac{\left(\phi_1-\phi_0\right)^2}{d(x_0, x_1)}\right)$. The geodesic formulation immediately implies the two-point formulation, but the converse does not hold; in fact, the geodesic formulation is sometimes unsatisfiable, because there are special cases where $\Gamma(t)$ self-intersects in ways that violate the assumptions of Brownian motion. This is fine, actually, because the weaker two-point formulation is all we need and is also easier to use.
Given this desideratum, we want to derive the joint correlation functions for arbitrary configurations of points. This will allow us to test if a process for generating random fields actually satisfies our desideratum, and will also allow us to directly generate samples from the random field by the clever algorithm described earlier that uses Bayes' theorem.
Before we derive the solution in the generalized case of $n$ points, let's start with the simplest nontrivial generalization: the correlation function for three points in an arbitrary metric space.
The solution for the three-point correlation function looks like this: $\exp\left(-\tfrac{1}{2\nu}\vec{\phi}^T D^{-1}\vec{\phi}\right)$. In this formula, $\vec{\phi}=\left<\phi_1 - \phi_0, \phi_2 - \phi_0\right>$ and $D$ is a $2\times 2$ matrix. I claim that $D = \begin{bmatrix}d(x_0, x_1) & \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2} \\ \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2} & d(x_0, x_2)\end{bmatrix}$, but we will actually derive this together! I admit it is a bit inelegant because we're treating $\phi_0$ inherently differently from all the other field strengths, which means there will always be some awkwardness in this formulation. However, I'm unaware of how to express this in a more elegant way, so it will have to do for now.
Note that if we assume the correlation function is some kind of Gaussian over all the $\phi$'s, and that it is invariant under adding or subtracting the same amount to all field strengths, then the solution must look like the above for some value of $D^{-1}$. We can require that $D$ is symmetric because any asymmetry will get cancelled out when we dot with $\vec{\phi}$. We will prove that integrating this over either $\phi_0$, $\phi_1$, or $\phi_2$ will yield the corresponding two-point correlation function for the other two points. Note that $D$ inherently has three degrees of freedom (because it is a symmetric $2\times 2$ matrix), and we get one constraint from each of the three integrals, which will end up exactly constraining $D$ to the above form. Our only condition is that none of the points $x_i$ overlap, because otherwise the correlation function is not well-defined anyway.
What do these constraints look like? There are effectively two types: an integral over $\phi_1$ or $\phi_2$, and an integral over $\phi_0$. The first constraint is simpler, so we'll start with it.
Suppose that we want to integrate over $\phi_2$. First, deconstruct $D=\begin{bmatrix}A & \vec{b} \\ \vec{b}^T & c\end{bmatrix}$ for $1\times 1$ matrix $A$, length 1 vector $\vec{b}$, and scalar $c$. Note that $c$ has its row and column index within $D$ equal to the index containing $\phi_2$ in $\vec{\phi}$. What, then, is $D^{-1}$? In general, the inverse of a matrix in this form looks like $D^{-1}=\begin{bmatrix} A^{-1}+\frac{A^{-1}\vec{b}\vec{b}^TA^{-1}}{c-\vec{b}^TA^{-1}\vec{b}} & -\frac{A^{-1}\vec{b}}{c-\vec{b}^TA^{-1}\vec{b}}\\ -\frac{\vec{b}^TA^{-1}}{c-\vec{b}^TA^{-1}\vec{b}} & \left(c-\vec{b}^TA^{-1}\vec{b}\right)^{-1} \end{bmatrix}$. Note that if we define $\vec{v}=\left(c-\vec{b}^TA^{-1}\vec{b}\right)^{-1/2}\left<-A^{-1}\vec{b}, 1\right>$, then $D^{-1}=\vec{v}\vec{v}^T+\begin{bmatrix}A^{-1}&\vec{0}\\ \vec{0}^T&0\end{bmatrix}$ (you can check this by multiplying out $\vec{v}\vec{v}^T$ and comparing against the blocks above). Plugging this into our correlation function, we get $\exp\left(-\tfrac{1}{2\nu}\vec{\phi}^T\left(\vec{v}\vec{v}^T+\begin{bmatrix}A^{-1}&\vec{0}\\ \vec{0}^T&0\end{bmatrix}\right)\vec{\phi}\right)=\exp\left(-\tfrac{1}{2\nu}\left(\left(\vec{v}\cdot\vec{\phi}\right)^2+\vec{\phi}^T\begin{bmatrix}A^{-1}&\vec{0}\\ \vec{0}^T&0\end{bmatrix}\vec{\phi}\right)\right)$. Letting $\vec{\psi}=\left<\phi_1-\phi_0\right>$ be a length 1 vector that does not depend on $\phi_2$, this can be further simplified to $\exp\left(-\tfrac{1}{2\nu}\left(\left(\vec{v}\cdot\vec{\phi}\right)^2+\vec{\psi}^TA^{-1}\vec{\psi}\right)\right)$. Hey, wow, would you look at that. It's in exactly the form we need to integrate over $\phi_2$. Doing this, we get that the correlation function for $\phi(x_0)$ and $\phi(x_1)$ is $\exp\left(-\tfrac{1}{2\nu}\vec{\psi}^TA^{-1}\vec{\psi}\right)$. Now of course in this case, $\vec{\psi}$ is just a single value, $\phi_1-\phi_0$, which means that to be consistent with our two-point correlation function, $A^{-1}$ must equal $\begin{bmatrix}\frac{1}{d(x_0, x_1)}\end{bmatrix}$. And because $A$ is just a $1\times 1$ matrix, then $A=\begin{bmatrix}d(x_0, x_1)\end{bmatrix}$.
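Here is a quick numerical spot check of that rank-one decomposition (a sketch in Python with NumPy, using an arbitrary symmetric positive-definite matrix in place of $D$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Any symmetric positive-definite matrix will do for the check.
n = 4
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)

A = D[:-1, :-1]   # the (n-1) x (n-1) block
b = D[:-1, -1]    # the last column, minus the corner
c = D[-1, -1]     # the bottom-right scalar

Ainv = np.linalg.inv(A)
s = c - b @ Ainv @ b                            # c - b^T A^{-1} b
v = np.concatenate([-Ainv @ b, [1.0]]) / np.sqrt(s)

reconstruction = np.outer(v, v)                 # v v^T ...
reconstruction[:-1, :-1] += Ainv                # ... plus A^{-1} in the corner

print(np.allclose(reconstruction, np.linalg.inv(D)))  # True
```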
At this point, we know that $D$ must look something like $\begin{bmatrix}d(x_0, x_1) & ?\\? & ?\end{bmatrix}$. Repeating the argument with integrating over $\phi_1$ will give us the bottom right corner, at which point $D=\begin{bmatrix}d(x_0, x_1) & ?\\? & d(x_0,x_2)\end{bmatrix}$. Still, those $?$'s are as yet undetermined, and for that we will need to figure out how to integrate over $\phi_0$.
Integrating over $\phi_0$ is difficult because every component of the $\vec{\phi}$ vector depends on $\phi_0$, and so it's initially unclear how we might get something looking like $\exp\left(-\tfrac{1}{2\nu}\vec{\psi}^TA^{-1}\vec{\psi}\right)$ on the other side. We need to rewrite our correlation function so that $\phi_0$ is treated like any of the other $\phi$'s, except now some $\phi_k$ is taking its place. For now, let $k=2$, so we are swapping $\phi_0$ with $\phi_2$. Let $\vec{\phi}'=\left<\phi_1-\phi_2,\phi_0-\phi_2\right>$, and we now define $D'$ such that $\exp\left(-\tfrac{1}{2\nu}\vec{\phi}^TD^{-1}\vec{\phi}\right)=\exp\left(-\tfrac{1}{2\nu}\vec{\phi}'^TD'^{-1}\vec{\phi}'\right)$. For the moment we will postpone the exact relationship between $D$ and $D'$. Like before, we will deconstruct $D'=\begin{bmatrix}A'&\vec{b}'\\ \vec{b}'^T&c'\end{bmatrix}$. Note that again $c'$ has its row and column index equal to the index containing $\phi_0$ in $\vec{\phi}'$. And again, $D'^{-1}=\vec{v}'\vec{v}'^T+\begin{bmatrix}A'^{-1}&\vec{0}\\ \vec{0}^T&0\end{bmatrix}$ where $\vec{v}'=\left(c'-\vec{b}'^TA'^{-1}\vec{b}'\right)^{-1/2}\left<-A'^{-1}\vec{b}', 1\right>$. Continuing, $\vec{\psi}'=\left<\phi_1-\phi_2\right>$, and integrating over $\phi_0$ results in the correlation function $\exp\left(-\tfrac{1}{2\nu}\vec{\psi}'^TA'^{-1}\vec{\psi}'\right)$. By the same logic as above, $A'=\begin{bmatrix}d(x_1,x_2)\end{bmatrix}$.
We will now figure out how $D$ and $D'$ are related. $\vec{\phi}'$ should just be a linear recombination of the components of $\vec{\phi}$, so there exists some $\Lambda$ such that $\vec{\phi}'=\Lambda \vec{\phi}$. In this case, $\Lambda=\begin{bmatrix}1&-1\\0&-1\end{bmatrix}$. Take a moment to convince yourself this is true by looking at the formula for $\vec{\phi}'$ and figuring out how to make it given the components of $\vec{\phi}$. By the definition of $D'$ above, $\vec{\phi}^TD^{-1}\vec{\phi}=\vec{\phi}'^TD'^{-1}\vec{\phi}'=\vec{\phi}^T \Lambda^T D'^{-1} \Lambda \vec{\phi}$. Because this is true for all $\vec{\phi}$, we know that $D^{-1}=\Lambda^T D'^{-1} \Lambda$. Doing some linear algebra, $D'=\Lambda D\Lambda^T$. We can write $D=\begin{bmatrix}d(x_0, x_1)& D_{12}\\ D_{12}&d(x_0, x_2)\end{bmatrix}$ for some unknown $D_{12}$ that we will solve for. Crunching the symbols and solving for $A'$, we get $A'=\begin{bmatrix}d(x_0,x_1)+d(x_0,x_2)-2D_{12}\end{bmatrix}$. Using the fact that $A'=\begin{bmatrix}d(x_1,x_2)\end{bmatrix}$ and solving for $D_{12}$, we get $D_{12}=\frac{d(x_0,x_1)+d(x_0,x_2)-d(x_1,x_2)}{2}$.
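Here is that symbol crunching done symbolically (a sketch in Python with SymPy, writing $d_{ij}$ for $d(x_i, x_j)$):

```python
import sympy as sp

d01, d02, d12, D12 = sp.symbols('d01 d02 d12 D12', positive=True)

Lam = sp.Matrix([[1, -1], [0, -1]])       # phi' = Lam * phi
D = sp.Matrix([[d01, D12], [D12, d02]])
Dprime = Lam * D * Lam.T                  # D' = Lam D Lam^T

Aprime = Dprime[0, 0]                     # top-left entry of D'
print(sp.expand(Aprime))                  # d01 + d02 - 2*D12
print(sp.solve(sp.Eq(Aprime, d12), D12))  # [d01/2 + d02/2 - d12/2]
```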
We have now completely proven that $D=\begin{bmatrix}d(x_0, x_1)&\frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}\\ \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&d(x_0, x_2)\end{bmatrix}$! When I first found this, I thought it was very satisfying. Who would've thought that it would be this simple?
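As a sanity check, here is a Monte Carlo sketch in Python with NumPy (the pairwise distances, $\nu$, and $\phi_0$ are illustrative, and the distances satisfy the triangle inequality) verifying that under this three-point distribution, every pair of points individually behaves like Brownian motion over its distance:

```python
import numpy as np

rng = np.random.default_rng(0)

nu = 0.5
phi0 = 0.0
d01, d02, d12 = 1.0, 2.0, 2.5   # illustrative pairwise distances

D = np.array([
    [d01,                   (d01 + d02 - d12) / 2],
    [(d01 + d02 - d12) / 2, d02                  ],
])

# Draw (phi1 - phi0, phi2 - phi0) jointly; the correlation function says this
# pair is a zero-mean Gaussian with covariance nu * D.
deltas = rng.multivariate_normal(np.zeros(2), nu * D, size=200_000)
phi1, phi2 = phi0 + deltas[:, 0], phi0 + deltas[:, 1]

print(np.var(phi1 - phi0) / nu)  # ~ d01
print(np.var(phi2 - phi0) / nu)  # ~ d02
print(np.var(phi2 - phi1) / nu)  # ~ d12
```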
To solve for the correlation function of general $(n+1)$-point configurations, we will start with effectively the same proof structure as the subsection "Integrating over $\phi_2$". First, we write our correlation function in the form $\exp\left(-\tfrac{1}{2\nu}\vec{\phi}^T D^{-1}\vec{\phi}\right)$ where $\vec{\phi}=\left<\phi_1-\phi_0,\ldots,\phi_n-\phi_0\right>$ and $D$ is an $n\times n$ matrix. Although the following argument works for all field strengths except $\phi_0$, for illustrative purposes, we will act as if we are integrating over the last index, $\phi_n$. Deconstruct $D=\begin{bmatrix}A&\vec{b}\\ \vec{b}^T&c\end{bmatrix}$ for $(n-1)\times (n-1)$ matrix $A$, length $(n-1)$ vector $\vec{b}$, and scalar $c$. Again, $D^{-1}=\vec{v}\vec{v}^T+\begin{bmatrix}A^{-1}&\vec{0}\\ \vec{0}^T&0\end{bmatrix}$ for $\vec{v}=\left(c-\vec{b}^TA^{-1}\vec{b}\right)^{-1/2}\left<-A^{-1}\vec{b}, 1\right>$, so integrating over $\phi_n$ in this case just results in a correlation function of $\exp\left(-\tfrac{1}{2\nu}\vec{\psi}_{(n)}^TA^{-1}\vec{\psi}_{(n)}\right)$ where $\vec{\psi}_{(n)}=\left<\phi_1-\phi_0,\ldots,\phi_{n-1}-\phi_0\right>$.
You may be able to see where this is going, but this actually totally defines $D$ by induction. Let's walk through solving for the 4-point correlation function as an example. In this case, $D$ is a $3\times 3$ matrix. Immediately from the above argument applied to integrating over $\phi_3$ and the prior result, we know $D=\begin{bmatrix}d(x_0, x_1)&\frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&?\\ \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&d(x_0, x_2)&?\\?&?&?\end{bmatrix}$.
What if we apply the above argument to integrating over $\phi_2$ instead? How would we extend that argument? There is a concept in linear algebra called the minor matrix that is helpful here. When we write $D=\begin{bmatrix}A & \vec{b} \\ \vec{b}^T & c\end{bmatrix}$ for our $3\times 3$ matrix, $A$ is the $2\times 2$ minor matrix where row 3 and column 3 are removed. This could be written $D_{(3,3)}$ to represent this fact, and so in the above argument $A=D_{(3,3)}$. (In other literature the minor matrix is written slightly differently but I'm trying to make my notation not-confusing internally.) If we want to integrate over $\phi_2$ instead, then we end up with a correlation function like $\exp\left(-\tfrac{1}{2\nu}\vec{\psi}_{(2)}^TD_{(2,2)}^{-1}\vec{\psi}_{(2)}\right)$ where $\vec{\psi}_{(2)}=\left<\phi_1-\phi_0,\phi_3-\phi_0\right>$. By induction, what is $D_{(2,2)}$? Looking at our prior result and making the appropriate substitutions, we see that it must be $D_{(2,2)}=\begin{bmatrix}d(x_0, x_1)&\frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}\\ \frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}&d(x_0, x_3)\end{bmatrix}$. Notice that the $d(x_0,x_1)$ value is consistent with the other result, but now the other values can be filled in to our $D$ matrix. So at this point, $D=\begin{bmatrix}d(x_0, x_1)&\frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&\frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}\\ \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&d(x_0, x_2)&?\\ \frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}&?&d(x_0,x_3)\end{bmatrix}$.
We can repeat this process with the $D_{(1,1)}$ minor matrix as well, but I'll skip the details and just tell you we end up with $D=\begin{bmatrix}d(x_0, x_1)&\frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&\frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}\\ \frac{d(x_0, x_1) + d(x_0, x_2) - d(x_1, x_2)}{2}&d(x_0, x_2)&\frac{d(x_0, x_2) + d(x_0, x_3) - d(x_2, x_3)}{2}\\ \frac{d(x_0, x_1) + d(x_0, x_3) - d(x_1, x_3)}{2}&\frac{d(x_0, x_2) + d(x_0, x_3) - d(x_2, x_3)}{2}&d(x_0,x_3)\end{bmatrix}$ as our final answer. Do you see the pattern?
This argument keeps working for all $n$, and in general the correlation function is $\exp\left(-\tfrac{1}{2\nu}\vec{\phi}^T D^{-1}\vec{\phi}\right)$ where $\vec{\phi}=\left<\phi_1-\phi_0,\ldots,\phi_n-\phi_0\right>$ and $D$ is an $n\times n$ matrix with entries $D_{ij}=\frac{d(x_0, x_i) + d(x_0, x_j) - d(x_i, x_j)}{2}$. On the diagonal this reduces to $D_{ii}=d(x_0, x_i)$, since $d(x_i, x_i)=0$.
The correlation function can then be used to compute the relevant conditional probability distributions, or to generate new points one by one, giving an algorithm for sampling the random field.
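To close the loop, here is a minimal sketch of that sampling idea in Python with NumPy. Everything concrete in it is an illustrative assumption: the points live in the Euclidean plane, the values of $\nu$ and $\phi_0$ are arbitrary, and we rely on the resulting $D$ being positive semidefinite so that it can serve as a covariance matrix (which is the case for Euclidean distances):

```python
import numpy as np

rng = np.random.default_rng(0)

nu = 0.5     # illustrative diffusion constant
phi0 = 0.0   # given field strength at the reference point x0

# Hypothetical configuration: x0 at the origin plus n points scattered in the
# Euclidean plane, with d(x, y) the ordinary Euclidean distance.
n = 200
x0 = np.zeros(2)
xs = rng.uniform(-1.0, 1.0, size=(n, 2))

def d(a, b):
    return np.linalg.norm(a - b)

# D[i][j] = (d(x0, xi) + d(x0, xj) - d(xi, xj)) / 2; the diagonal is d(x0, xi).
d0 = np.array([d(x0, x) for x in xs])
D = np.array([[(d0[i] + d0[j] - d(xs[i], xs[j])) / 2 for j in range(n)]
              for i in range(n)])

# The correlation function says (phi_1 - phi_0, ..., phi_n - phi_0) is a
# zero-mean Gaussian with covariance nu * D, so we can sample all field
# strengths in one shot, conditioned on phi(x0) = phi0.
phi = phi0 + rng.multivariate_normal(np.zeros(n), nu * D)
print(phi[:5])
```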
At this point, if there is a random field over a metric space, we now have an elegant, computable formula that can be used to answer questions about the joint probabilities of the field strengths on any collection of points. We invented the concept of correlation functions as a key analytical tool, and exploited a certain structure in inverse matrices.
Why does this matter? Aside from the development of potentially useful mathematical tools, this also offers a quantitative framework through which to understand and evaluate claims about randomly-varying properties in general. The simplicity of the solution here also hints at potentially deeper mathematical structures that could be fruitful to explore in the future.
And there is much more to explore! I have below a list of possible future directions this project could go. Feel free to read them over and pursue any that interest you!
That's all for now. Thank you for reading!