# Spatial Accuracy Assessment and Integration of Global Land Cover Datasets

## Abstract

## 1. Introduction

## 2. Data

#### 2.1. Global Land Cover Maps

GLC Map | Globcover | LC-CCI | MODIS | Globeland30 |
---|---|---|---|---|
Spatial resolution at the Equator | 300 m | 300 m | 500 m | 30 m |
Input data | MERIS: bi-monthly from 10-day composites | MERIS global SR composite, SPOT-VGT time series (for updating) | MODIS: monthly EVI, LST and 7 bands from 8-day composites | Landsat TM, ETM+ and HJ-1 multispectral images |
Time of data collection | 2009 | 2008–2012 | 2010 | 2010 ± 1 year |
Classification method | (Un)supervised spatio-temporal clustering; expert-based labeling | Unsupervised spatio-temporal clustering; machine learning classification | Supervised decision tree boosting | Integration of pixel- and object-based classification and knowledge-based interactive verification |
Classification scheme | LCCS-based: 22 classes | LCCS-based: 22 classes | 5 different legends, including IGBP (17 classes) | 10 classes |
Reference | [26] | [27] | [28] | [3] |

Code | Land Cover Class | Globcover | LC-CCI | IGBP (MODIS, STEP and VIIRS) | GLC2000 | Geo-Wiki | GLCNMO |
---|---|---|---|---|---|---|---|
1 | Forest | 40–110, 160, 170 | 50–100, 160, 170 | 1–5, 8, 9 | 1–10 | 1 | 1–5 |
2 | Shrubland | 130 | 120 | 6, 7 | 11, 12 | 2 | 7 |
3 | Grassland | 120, 140 | 110, 130, 140 | 10 | 13 | 3 | 8, 9 |
4 | Cropland (incl. mixtures) | 11–30 | 10–40 | 12, 14 | 16–18 | 4 | 11, 12, 13 |
5 | Wetland vegetation | 180 | 180 | 11 | 15 | 6 | 15 |
6 | Urban/built up | 190 | 190 | 13 | 22 | 7 | - |
7 | Bare/sparse vegetation | 150, 200 | 150, 200 | 16 | 14, 19 | 9 | 10, 16, 17 |
8 | Water and Snow/Ice | 210, 220 | 210, 220 | 15, 17 | 20, 21 | 8, 10 | - |
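
The legend translation works as a lookup from each product's native codes to the eight common classes. The Python sketch below encodes only the Globcover column. It assumes that a range such as 40–110 covers every native code between the bounds, inclusive. The dictionary and function names are hypothetical and are not part of the original processing chain.

```python
# Hypothetical lookup: Globcover native codes -> harmonized classes 1-8,
# transcribed from the Globcover column of the legend table (ranges inclusive).
GLOBCOVER_TO_COMMON = {
    1: [(40, 110), (160, 160), (170, 170)],  # Forest
    2: [(130, 130)],                         # Shrubland
    3: [(120, 120), (140, 140)],             # Grassland
    4: [(11, 30)],                           # Cropland (incl. mixtures)
    5: [(180, 180)],                         # Wetland vegetation
    6: [(190, 190)],                         # Urban/built up
    7: [(150, 150), (200, 200)],             # Bare/sparse vegetation
    8: [(210, 210), (220, 220)],             # Water and snow/ice
}


def translate(code, table=GLOBCOVER_TO_COMMON):
    """Return the harmonized class (1-8) for a native code, or None if unmapped."""
    for common_class, ranges in table.items():
        if any(lo <= code <= hi for lo, hi in ranges):
            return common_class
    return None


print([translate(c) for c in [14, 60, 130, 210, 230]])
```

Each of the other columns (LC-CCI, IGBP, GLC2000 and so on) would become a further dictionary with the same structure. Codes that are not listed, such as no-data values, come back as `None` so that they can be masked before comparison.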

#### 2.2. Reference Datasets

## 3. Method

#### 3.1. Spatial Correspondence Assessment

Semivariogram models were fitted by weighted least squares using $N_j/h_j^2$ as weights, where $N_j$ denotes the number of point pairs in the $j$-th lag and $h_j$ is the corresponding lag distance.
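
The fitting criterion can be illustrated in plain Python. The sketch has two parts:

- It computes an empirical semivariogram from point data, for instance a 0/1 indicator of whether a map agrees with reference data. This coding of correspondence is an assumption.
- It fits a single spherical model by weighted least squares with weights $N_j/h_j^2$.

The spherical model, the grid search over the range and all function names are illustrative assumptions. The models fitted in the study may differ, for example by using nested structures with several partial sills.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Bin point pairs by distance and return (h_j, N_j, gamma_j) per non-empty lag.

    h_j is the mean pair distance in lag j, N_j the number of pairs and
    gamma_j = sum((z_a - z_b)**2) / (2 * N_j).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    dist_sums = [0.0] * n_lags
    for a in range(len(values)):
        for b in range(a + 1, len(values)):
            d = math.dist(coords[a], coords[b])
            j = int(d // lag_width)
            if j < n_lags:
                sums[j] += (values[a] - values[b]) ** 2
                counts[j] += 1
                dist_sums[j] += d
    return [(dist_sums[j] / counts[j], counts[j], sums[j] / (2 * counts[j]))
            for j in range(n_lags) if counts[j] > 0]


def spherical(h, nugget, psill, rng):
    """Spherical variogram model gamma(h) for h > 0."""
    r = min(h / rng, 1.0)
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical_wls(lags, candidate_ranges):
    """Weighted least squares fit with weights N_j / h_j**2.

    For a fixed range the model is linear in nugget (c0) and partial sill (c1),
    so each candidate range is solved through the 2x2 normal equations and the
    range with the smallest weighted squared error is kept.
    """
    lags = [(h, n, g) for h, n, g in lags if h > 0]
    best = None
    for rng in candidate_ranges:
        sw = sws = sss = swg = swsg = 0.0
        for h, n_pairs, g in lags:
            w = n_pairs / h ** 2
            s = spherical(h, 0.0, 1.0, rng)  # unit-sill model shape at lag h
            sw += w
            sws += w * s
            sss += w * s * s
            swg += w * g
            swsg += w * s * g
        det = sw * sss - sws * sws
        if det <= 1e-12 * sw * sss:  # all lags beyond the range: shape is flat
            continue
        c0 = (sss * swg - sws * swsg) / det
        c1 = (sw * swsg - sws * swg) / det
        if c0 < 0 or c1 <= 0:  # simple feasibility check
            continue
        sse = sum(n / h ** 2 * (g - spherical(h, c0, c1, rng)) ** 2
                  for h, n, g in lags)
        if best is None or sse < best[0]:
            best = (sse, c0, c1, rng)
    return None if best is None else best[1:]
```

The weights grow with $N_j$ and shrink with $h_j^2$. Well-populated short lags therefore dominate the fit, and those lags also govern the near-origin behaviour that matters most for kriging predictions.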

**Figure 3.** Semivariograms and fitted models for spatial correspondence of the Globcover (**a**); LC-CCI (**b**); MODIS (**c**); and Globeland30 (**d**) maps (model parameters: partial sills, range and nugget).

#### 3.2. GLC Dataset Integration

#### 3.2.1. Voting

#### 3.2.2. Spatial Correspondence (SC)

#### 3.2.3. Weighted Voting (WeVo)

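Let $SC_i(x)$ denote the spatial correspondence of the $i$-th GLC map ($i = 1, …, 4$) at location $x$. $W_i(x)$, the weight assigned to map $i$ at location $x$, is then:

$$W_i(x) = \frac{SC_i(x)}{\sum_{l=1}^{4} SC_l(x)}$$

The weights of all maps that assign LC class $k$ at location $x$ are summed:

$$W_k(x) = \sum_{i=1}^{4} W_{i,k}(x)$$

where $W_k(x)$ is the total weight of LC class $k$ at location $x$, and $W_{i,k}(x)$ is the class weight of GLC map $i$, i.e., $W_i(x)$ if map $i$ assigns class $k$ at $x$ and 0 otherwise. The LC class with the highest total weight $W_k(x)$ at a location was then selected for this method.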
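
Weighted voting at a single location can be sketched as follows. The function name and inputs are hypothetical. The weights are assumed to be the spatial correspondences normalized over the four maps. Normalization does not change which class wins, because every class total is divided by the same positive sum.

```python
from collections import defaultdict


def weighted_vote(map_classes, correspondences):
    """Weighted voting at one location x.

    map_classes     -- harmonized LC class assigned by each GLC map at x
    correspondences -- spatial correspondence of each map at x (same order)
    Returns (winning class, {class k: total weight W_k(x)}).
    """
    total = sum(correspondences)
    if total > 0:
        weights = [sc / total for sc in correspondences]  # W_i(x)
    else:
        # No correspondence information: fall back to plain majority voting
        weights = [1.0 / len(correspondences)] * len(correspondences)
    class_weight = defaultdict(float)
    for k, w in zip(map_classes, weights):
        class_weight[k] += w  # accumulate W_k(x)
    # Ties are broken in favour of the class encountered first
    winner = max(class_weight, key=class_weight.get)
    return winner, dict(class_weight)


# A simple majority would pick class 3 (two of four maps), but the single map
# with the highest correspondence outweighs them.
winner, totals = weighted_vote([3, 3, 1, 2], [0.4, 0.3, 0.9, 0.2])
print(winner)  # 1
```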

#### 3.2.4. Regression Kriging (RK)

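For each LC class $k$, the presence probability is modelled as a trend plus a residual:

$$P_k(x) = \pi_k(x) + \varepsilon_k(x)$$

where $P_k(x)$ denotes the presence probability of LC class $k$ at location $x$, $\pi_k(x)$ is the predicted probability trend of the LC class obtained by MNL regression [39], and $\varepsilon_k(x)$ is the indicator residual for that class. The residual was obtained by simple kriging. MNL regression also uses indicator values of the LC classes, with an indicator variable for all but one (reference) class [39]; here class 1 is the reference. The MNL regression estimated a separate binary logistic regression model for each of these indicator variables. For each indicator variable ($k = 2, …, 8$), the log odds function for the predicted probability is:

$$g_k(x) = \ln\left(\frac{\pi_k(x)}{\pi_1(x)}\right) = \beta_{0k} + \sum_{j=1}^{4} \beta_{jk} X_j(x)$$

where $X_j$ ($j = 1, …, 4$) are the explanatory variables (the LC classes of the four GLC maps at the sample locations), $\beta_{1k}, …, \beta_{4k}$ are the regression coefficients and $\beta_{0k}$ is the intercept. To ensure that all probabilities lie in the interval [0, 1] and sum to 1, Equations (6) and (7) were used [39]:

$$\pi_k(x) = \frac{\exp(g_k(x))}{1 + \sum_{l=2}^{8} \exp(g_l(x))}, \quad k = 2, \dots, 8 \tag{6}$$

$$\pi_1(x) = \frac{1}{1 + \sum_{l=2}^{8} \exp(g_l(x))} \tag{7}$$

where $\exp(g_k(x))$ denotes the odds of class $k$ relative to the reference class at location $x$. This was implemented using the nnet package in R [40].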
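
Equations (6) and (7) amount to a softmax with class 1 as the reference category. The minimal Python sketch below assumes that the explanatory variables have already been turned into a numeric vector, for example by dummy coding. The function names and example coefficients are hypothetical; the study itself fitted the model with the nnet package in R.

```python
import math


def log_odds(intercepts, coefs, x):
    """g_k(x) = beta_0k + sum_j beta_jk * x_j for each non-reference class k."""
    return {k: intercepts[k] + sum(b * xj for b, xj in zip(coefs[k], x))
            for k in intercepts}


def mnl_probabilities(g):
    """Equations (6) and (7): class probabilities from log odds g_k(x).

    Class 1 is the reference category (its log odds are 0). The largest log
    odds is subtracted before exponentiating to avoid overflow; this rescales
    numerator and denominator alike, so the probabilities are unchanged.
    """
    m = max([0.0, *g.values()])
    denom = math.exp(-m) + sum(math.exp(v - m) for v in g.values())
    probs = {1: math.exp(-m) / denom}  # Equation (7)
    for k, v in g.items():
        probs[k] = math.exp(v - m) / denom  # Equation (6)
    return probs


# Illustrative (made-up) coefficients for two non-reference classes and a
# dummy-coded covariate vector; these are not fitted values from the study.
g = log_odds({2: 0.5, 3: -1.0}, {2: [1.0, -1.0], 3: [2.0, 0.0]}, [1.0, 0.0])
probs = mnl_probabilities(g)
```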

Simple kriging was used to predict the residuals ($\varepsilon_k(x)$) at unsampled locations for all classes except water. For the water class, the experimental semivariogram showed no spatial correlation in the regression residuals. Semivariograms were fitted using the same method as described in Section 3.1. Figure 5 shows the semivariograms of the regression residuals for the LC classes and the fitted variogram models used for kriging.
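
The kriging step can be sketched compactly in plain Python:

- The residual covariance is derived from a fitted spherical variogram as $C(h) = \text{sill} - \gamma(h)$.
- Simple kriging with a known zero mean predicts $\varepsilon_k$ at an unsampled location.
- The prediction is added to the MNL trend.

The function names, the spherical model and the final clipping to [0, 1] are assumptions made for illustration. They are not details taken from the study.

```python
import math


def spherical_cov(h, nugget, psill, rng):
    """Covariance C(h) = sill - gamma(h) implied by a spherical variogram."""
    if h == 0:
        return nugget + psill
    r = min(h / rng, 1.0)
    return psill * (1.0 - (1.5 * r - 0.5 * r ** 3))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def simple_krige(coords, residuals, target, model):
    """Simple kriging prediction of a zero-mean residual at `target`.

    model is (nugget, partial sill, range); duplicate sample coordinates with
    a zero nugget would make the system singular.
    """
    n = len(coords)
    cov = [[spherical_cov(math.dist(coords[i], coords[j]), *model)
            for j in range(n)] for i in range(n)]
    c_target = [spherical_cov(math.dist(coords[i], target), *model)
                for i in range(n)]
    weights = solve(cov, c_target)
    return sum(w * r for w, r in zip(weights, residuals))


def rk_probability(trend, residual):
    """P_k(x) = pi_k(x) + eps_k(x), clipped to [0, 1] (clipping is an assumption)."""
    return min(1.0, max(0.0, trend + residual))


# Hypothetical residuals at three sample points and a made-up variogram model
coords = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
resid = [0.3, -0.1, 0.05]
eps = simple_krige(coords, resid, (1.0, 1.0), (0.05, 0.2, 10.0))
p = rk_probability(0.4, eps)
```

Clipping keeps each class probability within [0, 1], but it does not guarantee that the probabilities of all classes at a location still sum to 1. Any renormalization would be a separate design choice.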

#### 3.2.5. Indicator Kriging (IK)

#### 3.2.6. Cross-Validation

## 4. Results and Discussions

#### 4.1. Spatial Correspondence of GLC Maps in Africa

**Figure 7.** Spatial correspondence of the GLC maps (**a**–**d**), maximum correspondence (**e**) and the map with the highest correspondence (**f**).

#### 4.2. GLC Dataset Integration Methods

#### 4.3. Integrated LC and LC Probability Maps of Africa

**Table 3.** Class-specific correspondences (%) of RK integration and the input GLC maps with reference data.

Land Cover Class | Globcover | LC-CCI | MODIS | Globeland30 | RK |
---|---|---|---|---|---|
Forest | 71.1 | 67.3 | 90.2 | 63.7 | 84.9 |
Shrubland | 11.9 | 21.3 | 26.9 | 17.3 | 70.8 |
Grassland | 18.4 | 18.9 | 27.1 | 70.4 | 41.1 |
Cropland | 57.7 | 79.2 | 66.7 | 76.0 | 75.0 |
Wetland | 25.0 | 31.5 | 59.8 | 52.2 | 67.0 |
Built-up | 74.5 | 91.5 | 78.7 | 91.5 | 89.4 |
Bare/sparse vegetation | 76.0 | 78.5 | 75.0 | 72.0 | 87.6 |
Water and snow/ice | 80.0 | 80.0 | 70.0 | 78.0 | 86.7 |
Total | 50.7 | 55.4 | 62.8 | 57.1 | 76.3 |

**Figure 11.** RGB image of the class probabilities of shrubland, grassland and cropland. Dark shades represent areas where all three classes have near-zero presence probabilities.

#### 4.4. On the Use of Available Reference Datasets for Integration

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

