Implementing a kriging algorithm for modeling spatial data is not difficult, unless you choose to code every step yourself. Once the data is prepared with correct spatial attributes, the legwork is done by software packages available in all major programming languages (here is an example with Baltimore crime data implemented in R). Sometimes it helps to know a bit of what goes on inside these packages. What follows is a very shallow dive into the theory of kriging; see here and here for more details. I am using crime data as an example, but any spatial variable, such as house prices in a city, or zinc concentration in soil, works equally well.
Mathematically, if \( n_i \) is the observed number of crimes at a location \( i \), the predicted number \( \hat n_k \) at a new location \( k \) is given by a linear combination of all such observations:
$$ \hat n_k ~~ = ~ \sum_{i=1}^m w_{ik} \: n_i \, , \tag{1} $$
where \( m \) is the size of the dataset, and \(w_{ik}\) is the weight of influence of \(n_i\) on location \( k \). For the kriging model, this weight is determined so as to minimize the variance of the prediction error \(\hat n_k \, – \, n_k\), where \(n_k\) is the (unknown) true number at \(k\). For other interpolation models such as IDW, \(w_{ik}\) decreases with the distance \(d\) between \(i\) and \(k\), typically as some power \(p\) of \(d\).
Variogram
The prediction variance is determined by the spatial correlation between the unknown value \(n_k\) and all observed values \(n_i ,\) which themselves are spatially correlated. Studying these correlations using variogram is an integral part of building kriging models.
Variogram is defined as the variance of the difference between paired observations \(n_i\) and \(n_j\). Variance cannot be computed from a single pair, but if we assume stationarity and isotropy (independence of location and direction) of the processes that generate these values, then the variance depends only on the distance between pairs. The empirical variogram over all pairs \(n_i, n_j\) separated by distance \(d\) can then be expressed as
$$ 2\gamma(d) ~ = ~ \frac{1}{N(d)} \sum_{N(d)} (n_i ~ – ~ n_j)^2 \, , \tag{2} $$
where \(N(d)\) is the number of unique pairs distance \(d\) apart. Semivariance, or \(\gamma(d)\), is half of this variance.
Equation \((2)\) can only give variograms for the locations \(i, j\) where observed data are available, whereas kriging requires weight \(w_{ik}\) at an unobserved location \(k\). So we need a variogram model that best fits the empirical variogram. Three most commonly used models are defined by the exponential, spherical, and Gaussian functions. They all share the property that the function rises initially with distance and then levels off, much like the variance.
Kriging
Empirical variogram \((2)\) can be computed because it depends on observed data, but the prediction variance requires unobserved data \(n_k\). In order to compute it and hence the weights \(w_{ik}\), we need to make some assumptions about \(n_k\).
In general, \(n_k\) and all observed values \(n_i\) are particular realizations, one of infinitely many at each location, of a random variable \(Z(x)\) generated by spatially correlated stochastic processes; the location index \(k\) corresponds to a particular location \(x_k\) of the position variable \(x\). All assumptions about stationarity etc apply to the variable \(Z(x)\) and not on its realizations. \(Z(x)\) can be thought to be made up of a mean \(\mu(x)\) (which represents large-scale processes that generate the spatial structure) and a residual \(\epsilon(x)\) (representing random small-scale processes that are generally unknown):
$$ Z(x) ~ = ~ \mu(x) + \epsilon(x)\,. \tag{3} $$
Here different levels of stationarity are assumed to simplify the complex dependence of \(\mu\) and \(\epsilon\) on \(x\). Strict stationarity means all statistical properties (or “moments”) of \(Z\) are independent of the position \(x\), which is not realistic. Instead, since the linear estimations such as equation \((1)\) only involve the first two statistical moments (mean and variance), two weaker forms of stationarity are often adopted.
Ordinary kriging assumes first order stationarity, under which the mean \(\mu(x) = \mu\) is constant over the entire domain. Additionally an intrinsic (weaker than second order) stationarity is assumed, which states that the variance of the residual difference, \(\epsilon(x + d) ~ – ~ \epsilon(x)\), depends only on the separation \(d\) and not position \(x\). The same intrinsic stationarity is invoked to get the empirical variogram \((2)\).
Simple kriging is same as the ordinary kriging, except that the mean \(\mu\) is not only constant but also known.
Universal kriging, by contrast, allows for a slowly varying \(\mu(x)\) with \(x\), and is usually modeled as a global trend:
$$ \mu(x) ~~ = ~ \sum_{j = 1}^p \beta_j R_j(x) \,, \tag{4} $$
where \(R_j(x)\) are a set of spatial regressor variables and \(\beta_j\) are linear coefficients. Spatial dependence of \(\mu(x)\) precludes first order stationarity, but intrinsic stationarity is still assumed for the residual \(\epsilon\) as in the ordinary kriging.
Hi there! I am Roy, founder of Quantiux and author of everything you see here. I am a data scientist, with a deep interest in playing with data of all colors, shapes and sizes. I also enjoy coding anything that catches my fancy.
Leave a comment below if this article interests you, or contact me for any general comment/question.