# Spatial Autocorrelation

## Spatial Autocorrelation

Spatial autocorrelation is the instantiation of Tobler’s first law of geography: “Everything is related to everything else, but near things are more related than distant things.” It can be described in several equivalent ways:

- The correlation of a variable with itself through space.
- The correlation between an observation’s value on a variable and the values of close-by observations on the same variable.
- The degree to which characteristics at one location are similar (or dissimilar) to those nearby.
- A measure of the extent to which the occurrence of an event in an areal unit constrains, or makes more probable, the occurrence of a similar event in a neighboring areal unit.

Several measures are available:

- Join Count Statistic
- Moran’s I
- Geary’s C ratio
- General (Getis-Ord) G
- Anselin’s Local Indicators of Spatial Association (LISA)

Positive spatial autocorrelation:

- high values surrounded by nearby high values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby low values

Negative spatial autocorrelation:

- high values surrounded by nearby low values
- intermediate values surrounded by nearby intermediate values
- low values surrounded by nearby high values

### Why Spatial Autocorrelation Matters

Spatial autocorrelation is of interest in its own right because it suggests the operation of a spatial process.
Additionally, most statistical analyses assume that the observations in a sample are independent of one another.
Positive spatial autocorrelation violates this assumption, because observations taken from nearby areas are related to each other and are therefore not independent.

## Moran’s I

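Moran’s I is defined as

$$
I = \frac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}
$$

where $N$ is the number of cases, $\bar{x}$ is the mean of the variable, $x_i$ is the variable value at a particular location, $x_j$ is the variable value at another location, and $w_{ij}$ is a weight indexing the location of $i$ relative to $j$.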

- Applied to a continuous variable for polygons or points.
- Similar to a correlation coefficient: usually varies between -1.0 and +1.0.
  - A value of 0 or close to 0 indicates no spatial autocorrelation (a random pattern).
  - Values close to +1 or -1 indicate strong autocorrelation.
  - A positive value indicates positive autocorrelation: clustered data.
  - A negative value indicates negative autocorrelation: dispersed / uniform data.
- Differences from the correlation coefficient:
  - Involves one variable only, not two.
  - Incorporates weights ($w_{ij}$) which index relative location.
  - Think of it as “the correlation between neighboring values on a variable”.
  - More precisely, it is the correlation between a variable, X, and the “spatial lag” of X, formed by averaging the values of X over the neighboring polygons.
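As a minimal sketch of how Moran’s I and the spatial lag can be computed, the NumPy example below uses a small 4 x 4 grid with binary rook-contiguity weights (cells sharing an edge are neighbors). The grid, the weighting scheme, and the helper names `rook_weights`, `morans_i`, and `spatial_lag` are assumptions made for illustration; real analyses would normally build the weights from the actual polygon or point layout.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (illustrative helper)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:              # neighbor below
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < ncols:              # neighbor to the right
                j = r * ncols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(x, w):
    """Global Moran's I for values x (length N) and an N x N weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                       # deviations from the mean
    num = (w * np.outer(z, z)).sum()       # sum_i sum_j w_ij z_i z_j
    den = (z ** 2).sum()                   # sum_i z_i^2
    return (x.size / w.sum()) * num / den

def spatial_lag(x, w):
    """Average of x over each location's neighbors (row-standardized weights)."""
    w = np.asarray(w, dtype=float)
    w_row = w / w.sum(axis=1, keepdims=True)
    return w_row @ np.asarray(x, dtype=float)

w = rook_weights(4, 4)

# Left half of the grid high, right half low: clustered pattern.
clustered = np.tile([1.0, 1.0, 0.0, 0.0], 4)
# Checkerboard: every cell differs from all of its rook neighbors.
checker = (np.indices((4, 4)).sum(axis=0) % 2).ravel().astype(float)

i_clustered = morans_i(clustered, w)       # about 0.667 (positive autocorrelation)
i_checker = morans_i(checker, w)           # -1.0 (negative autocorrelation)
print(i_clustered, i_checker)
```

With row-standardized weights the weights sum to N, so Moran’s I reduces to the slope of the spatial lag of the mean-centered variable regressed on the mean-centered variable itself, which is the “correlation with the spatial lag” reading given above.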

## Interpolation

Interpolation is the process of using points with known values or sample points to
estimate values at other unknown points. It can be used to predict unknown values
for any geographic point data, such as elevation, rainfall, chemical concentrations, noise levels, and so on.
It predicts values for cells in a raster from a limited number of sample data points.

Interpolation is based on the assumption that spatially distributed objects are spatially correlated; in other words, things that are close together tend to have similar characteristics.

Why interpolate?
Visiting every location in a study area to take measurements is usually difficult, time-consuming, and costly. Instead, measurements can be taken at a sample of input points and then used to predict the values at all other locations. Input points can be randomly, strategically, or regularly spaced.

Point based
Given a number of points whose locations and values are known, determine the values of other points; e.g. weather station readings, spot heights, oil well readings, porosity measurements

Lines to points
Given line data, determine the values at points; e.g. converting contours to elevation grids

Areal interpolation
Given a set of data mapped on one set of source zones, determine the values of the data for a different set of target zones; e.g. given population counts for census tracts, estimate populations for electoral districts
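
One simple approach to areal interpolation is area weighting, sketched below under the assumption that the variable (here a population count) is spread evenly within each source zone. The function name `areal_weighting`, the example counts, and the overlap areas are all made up for illustration; in practice the overlap areas would come from a GIS overlay of the two zone layers. Each source count is split among the target zones in proportion to the overlap area, so the overall total is preserved.

```python
import numpy as np

def areal_weighting(source_values, overlap_area):
    """Reallocate source-zone counts to target zones by area of overlap.

    overlap_area[s, t] is the area shared by source zone s and target zone t.
    Assumes uniform density within each source zone and that the target
    zones together cover every source zone completely.
    """
    source_values = np.asarray(source_values, dtype=float)
    overlap_area = np.asarray(overlap_area, dtype=float)
    source_area = overlap_area.sum(axis=1)            # total area of each source zone
    share = overlap_area / source_area[:, None]       # fraction of s falling in t
    return source_values @ share

# Two census tracts (sources) and two electoral districts (targets).
tract_population = [1000, 2000]
overlap = [[3.0, 1.0],    # tract 0: 3 area units in district 0, 1 in district 1
           [0.0, 4.0]]    # tract 1: entirely inside district 1

district_population = areal_weighting(tract_population, overlap)
print(district_population)  # [ 750. 2250.]
```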
### Types

Spatial interpolation methods can be categorized in several ways.

First, they can be grouped into global and local methods.
1. Global interpolation: maps across the whole region and uses every available known point to estimate an unknown value. It produces a smoother surface with less abrupt variation; e.g. trend surface, regression models.
2. Local interpolation: is applied repeatedly to small portions of the whole region and uses a sample of known points to estimate an unknown value. It is designed to capture local or short-range variation; e.g. IDW, Thiessen polygons, spline.
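
To make the global case concrete, here is a minimal sketch of a first-order trend surface: a plane z = b0 + b1·x + b2·y fitted by least squares to all sample points at once. The five sample points and the function names `fit_trend_surface` and `predict_trend_surface` are invented for illustration, and a first-order polynomial is only one possible choice of trend model.

```python
import numpy as np

def fit_trend_surface(x, y, z):
    """Least-squares fit of the plane z = b0 + b1*x + b2*y to all samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(x), x, y])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef                                   # [b0, b1, b2]

def predict_trend_surface(coef, x, y):
    """Evaluate the fitted plane at the given coordinates."""
    return coef[0] + coef[1] * np.asarray(x, dtype=float) + coef[2] * np.asarray(y, dtype=float)

# Illustrative samples: four corners of a square plus its center.
xs = [0, 10, 0, 10, 5]
ys = [0, 0, 10, 10, 5]
zs = [100, 120, 110, 130, 125]

coef = fit_trend_surface(xs, ys, zs)
center_estimate = predict_trend_surface(coef, 5, 5)
print(coef)             # approximately [102, 2, 1]
print(center_estimate)  # approximately 117, although the sample there is 125
```

Because every sample contributes to a single smooth surface, the local high at the center is averaged away: the fitted plane predicts about 117 where 125 was measured.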

Second, spatial interpolation methods can be grouped into exact and inexact interpolation.
1. An exact interpolation predicts a value at a sampled location that is the same as its known value; it honors the input data points, so the surface passes through all of them; e.g. Kriging.
2. An inexact (or approximate) interpolation predicts a value at a sampled location that may differ from its known value. It is used when there is some uncertainty about the surface, on the view that many data sets contain global trends that vary slowly, overlain by local fluctuations.
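
Inverse distance weighting (IDW) is a convenient way to see what “exact” means, since in its basic form it returns the known value at every sampled location. The sketch below is a bare-bones version under stated assumptions: Euclidean distance, a power of 2, and an optional `k` nearest-neighbor limit to make it local. The function name `idw`, its parameters, and the sample data are illustrative, not a reference implementation.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0, k=None):
    """Inverse distance weighted estimates at the query points.

    If k is given, only the k nearest samples are used for each query.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    out = np.empty(len(xy_query))
    for q, point in enumerate(xy_query):
        dist = np.sqrt(((xy_known - point) ** 2).sum(axis=1))
        idx = np.argsort(dist)[:k] if k is not None else np.arange(len(dist))
        d, z = dist[idx], z_known[idx]
        if np.any(d == 0):
            out[q] = z[d == 0][0]          # query sits on a sample: honor its value
        else:
            weights = 1.0 / d ** power     # nearer samples get larger weights
            out[q] = (weights * z).sum() / weights.sum()
    return out

# Illustrative samples: four corners of a square plus its center.
samples = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
values = [100, 120, 110, 130, 125]

at_samples = idw(samples, values, samples)       # reproduces the known values
between = idw(samples, values, [(5, 0)])[0]      # weighted average of the samples
print(at_samples, between)
```

At the five sampled locations the estimates equal the measured values, while the estimate at the unsampled point (5, 0) is a distance-weighted average that always lies between the smallest and largest sample values.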

Third, spatial interpolation methods may be deterministic or stochastic.

1. Deterministic models use a mathematical function to predict unknown values and result in hard classification of the value of features.
2. Stochastic (statistical) techniques produce confidence limits on the accuracy of a prediction, but are more difficult to execute since more parameters need to be set.

Deterministic models:
1. Trend surface analysis / polynomial
2. Minimum curvature spline
3. Inverse distance weighted
4. Natural neighbourhood
Rectangular