1. Introduction
In today’s world, precise and comprehensive land cover (LC) mapping is becoming increasingly crucial for sustainable development and well-informed decision-making. Beyond its relevance in climate studies [1], LC information finds utility in other fields as well. For instance, in ecology, LC data aids in estimating habitat fragmentation and predicting International Union for Conservation of Nature (IUCN) Red List categories for species [2]. Additionally, LC serves as a crucial variable in hydrological investigations, as exemplified by studies conducted in the upper Crepori river basin in Brazil and the Gumara catchment in Ethiopia [3,4].
The applications of LC data extend to monitoring various phenomena across different regions. Examples include monitoring the desertification process in the Qubqi desert in China [5], tracking urbanization progress in Abuja, Nigeria [6], and observing agricultural expansion in the Mato Grosso state of Brazil [7]. These cases illustrate the diverse range of uses for LC data in monitoring and understanding our changing environment.
The adoption of open data policies by some providers of satellite imagery has undeniably accelerated the progress of LC mapping [8,9,10]. This favorable development, along with advancements in computing capabilities and satellite technologies, has made significant contributions to the field. Nevertheless, there are still persistent challenges in the domain of LC mapping.
To fully harness the potential of satellite Earth observation resources for land cover mapping, it is crucial to address a significant challenge: the availability of appropriate training data. Specifically, the effectiveness of machine learning (ML) algorithms used to generate land cover maps relies heavily on the quality and relevance of the training data [11,12].
In the case of extensive classification tasks such as global high-resolution land cover (HRLC) mapping, the requirements for training datasets become even more demanding. This is because the training data need to encompass vast geographical areas and offer representative samples with a high level of detail that can capture the diverse landscape characteristics worldwide. Furthermore, deep learning techniques, which are the current state of the art for LC classification, typically require larger training datasets than classical ML techniques [13,14,15].
Dimitrovski et al. [16] summarized 22 open-access training datasets used for deep learning approaches. The datasets comprise image chips of different dimensions primarily obtained from aerial imagery, supplemented by a limited number sourced from satellite imagery. The biggest of the reviewed datasets is Big Earth Net, whose samples cover approximately 750,000 km² and are located only in Europe [17]. There are also datasets with global coverage, such as Resisc45 [18] and MLRSNet [19], but they cover smaller areas than Big Earth Net: 470,000 km² and 182,000 km², respectively.
The practice of global HRLC producers to obtain training data includes photo-interpretation, utilization of existing LC data at various resolutions, and sometimes a combination of the two [
20]. The DynamicWorld project, the first project for near-real-time global LC mapping, generated its own training dataset of 5 billion 10 m pixels and released it publicly [21]. The dataset was derived by photo-interpretation of Sentinel-2 images, performed predominantly by non-expert annotators. The same training dataset was reused by Esri LC [
22]. The collection of training data was based on a photo interpretation for datasets such as Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC) [
23,
24,
25], World Settlement Footprint (WSF) [
26,
27], Global Surface Water (GSW) [
28], and Forest Non-Forest (FNF) [
29]. Various HRLC production projects utilized existing LC data in different ways to support their training dataset collection. For instance, the GlobeLand30 (GL30) dataset allowed photo-interpreters to refer to existing LC datasets during their work [
30]. In the case of the initial version of the Global Human Settlement Built-up (GHS BU) datasets, a combination of low-resolution LC (LRLC) and HRLC datasets was employed, with a weighted voting system favouring the HRLC data [
31]. HRLC and medium-resolution LC (MRLC) data, along with photo-interpretation, were utilized to derive the Tree Canopy Cover Dataset [
32]. The Global Mangrove Watch (GMW) dataset combined both HRLC and LRLC datasets [
33,
34]. The European Space Agency’s (ESA) World Cover dataset used existing MRLC and HRLC data to extract training data, although the specific method employed remains unclear [
35]. Initially, the Global Cropland dataset relied on photo-interpreted samples [
36]. The generated LC data was then sampled to obtain training data for subsequent iterations until satisfactory accuracy was achieved. The Global Land Cover with a Fine Classification System at a 30-m resolution (GLC_FCS30) dataset utilized refined MRLC data, obtained through a specific procedure that considers only homogeneous samples [
37].
It is apparent that HRLC producers were aiming to incorporate existing LC data into their training data extraction process, likely due to the high cost associated with global data collection. However, they did not always consider the reliability of training samples derived from existing HRLCs, as observed in the case of GMW.
Among the listed global existing HRLCs, the highest overall accuracy (OA), equal to 86%, was achieved for GL30 and the first release of Esri LC [
38,
39]. The details of GL30 accuracy are not published, while the second release of Esri LC merged the Grass and Scrub classes into a single class, Rangeland, to compensate for the low accuracy of these classes in the first release. Although achieving an accuracy of 86% is a noteworthy advancement for HRLC products, it is evident that there is room for further enhancements, especially in specific classes [
40].
In this paper, we present the training benchmark dataset that was generated by borrowing two concepts of training sample generation techniques: reuse of existing data [
31,
32,
33,
35,
36,
37] and consensus among multiple annotators in the case of photo interpretation [
23,
41]. During the human labeling of training samples, human error is often mitigated by having multiple annotators. If there is no consensus among them, or at least among the majority, the sample is rejected.
Adhering to the above principles, we reuse existing HRLC datasets, but only those portions in which there is full consensus among multiple datasets. From a practical standpoint, multiple HRLCs are combined using the intersection method to retain only the areas where all datasets agree on LC classes while disregarding areas of disagreement. Accordingly, the dataset obtained is named Map Of Land Cover Agreement (MOLCA). The main purpose of MOLCA is to serve as a reference training dataset from which to extract samples for the creation of new HRLC maps. MOLCA was designed to provide training samples mainly for large-scale HRLC mapping using ML and deep learning techniques, which typically demand extensive training data for satellite imagery classification. This dataset was produced within the Climate Change Initiative HRLC (CCI HRLC) project of the ESA. MOLCA has 117 billion 10 m pixels (11.7 million km²) distributed over an area of 19 million km². The classes included in the MOLCA legend are Bareland, Built-up, Cropland, Forest, Grassland, Shrubland, Water, Wetland, and Permanent ice and snow, which depict LC in the period 2016–2020. The accuracy estimate of MOLCA shows an OA of 96%. MOLCA offers distinct advantages over alternative methods of training data collection, including a substantially larger number of available pixels and coverage for regions that are frequently underrepresented in existing benchmark training datasets, such as Africa and Siberia [
17,
42,
43,
44,
45,
46,
47,
48,
49]. Based on the analyzed literature, MOLCA outperforms other existing open-access training datasets in terms of spatial coverage and of the precautions taken to ensure a high level of accuracy, because multiple HRLC maps, rather than an individual one, are considered when generating training samples. These key features are expected to foster the use of MOLCA in future HRLC map production. Furthermore, the availability of MOLCA as open data enhances its potential for widespread use.
The structure of this paper is as follows: Section 2 outlines the region considered for MOLCA generation, input datasets, data generation concepts and methodology, and the validation approach. Section 3 presents statistical information and the accuracy evaluation of the generated dataset. The analysis and interpretation of the results are discussed in Section 4, while the concluding remarks are provided in Section 5.
2. Materials and Methods
MOLCA was produced in the context of the CCI HRLC project of the ESA. The region of interest for the project encompasses three macro-regions of the world: Amazon, Siberia, and Sub-Saharan Africa (see Figure 1).
The region of interest extends over 19,163,868 km²: 4,526,839 km² in the Siberia macro-region, 6,203,824 km² in the Amazon macro-region, and 8,433,205 km² in the Sub-Saharan Africa macro-region. The objective of the CCI HRLC project was to determine the impact of the increased spatial resolution of land cover data on climate models. Besides being only partially represented by existing LC training datasets, the selected regions are also landmarks for climate change. For these reasons, they were selected for the first implementation of MOLCA.
2.1. Input Datasets
In the derivation of MOLCA, multiple global HRLCs were used as common input across all three regions of interest, and within each region an additional regional HRLC was incorporated into the MOLCA computation. The global datasets employed included two general-purpose HRLCs, namely FROM-GLC and GL30, along with two thematic HRLCs specific to the built-up class (WSF and the GHS BU Sentinel-1 product, GHS BU S1NODSM), one thematic HRLC for water (GSW), and one thematic HRLC for forests (FNF), as indicated in Table 1. As for the regional HRLCs, MapBiomas was used for the Amazon region, CCI Africa Prototype for Africa, and ESA DUE (Data User Element) GlobPermafrost for Siberia. All regional datasets fall under the general type.
The baseline years of these datasets range from 2016 to 2020, and the spatial resolution varies from 10 m to 30 m. The most used CRS is WGS84, while a few datasets are supplied in UTM or Web Mercator (see Table 1).
In this work, we utilized the 2017 map from FROM-GLC, which is a collection of irregular time series of general-purpose land cover (LC) maps developed by Tsinghua University [
23,
24,
25]. The map has a resolution of 10 m and consists of 10 classes in its legend. It is provided in World Geodetic System 1984 (WGS84) Coordinate Reference System (CRS) in the form of 10° × 10° tiles (
http://data.ess.tsinghua.edu.cn, accessed on 28 June 2023). The reported OA of this map is 73%.
As for the GL30 dataset, it is a regular time series of general-purpose LC maps at a resolution of 30 m, developed by the National Geomatics Center of China (NGCC) [
30]. The legend of GL30 consists of 10 classes. For this work, we used the 2020 product version. The reported OA for this map is 86%, as mentioned on the GL30 website (
http://globeland30.org, accessed on 28 June 2023). The distribution of the GL30 product is based on the Universal Transverse Mercator (UTM) projection. The tile size of GL30 varies depending on the location, with most tiles (between 60°N and 60°S) being 5° × 6° in size, although some tiles can be 5° × 12° or even larger.
A comprehensive set of thematic maps called the Global Human Settlement Built-up (GHS BU) distinguishes between built-up and non-built-up surfaces [
50,
51]. These maps were developed by the Joint Research Centre (JRC) of the European Commission. Various GHS BU products exist, each one differing in terms of input imagery, baseline year, and production method. In this study, the GHS BU S1NODSM product, which is based on Sentinel-1 imagery from 2016, was employed. It consists of two classes: Built-up and non-built-up. The product is distributed as a compressed file folder containing 2° × 2° tiles that cover the entire globe (
https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL, accessed on 28 June 2023). The original CRS of the tiles is Web Mercator projection (EPSG:3857). The accuracy of GHS BU S1NODSM is described qualitatively in comparison to another LC dataset [
50].
Another thematic LC product specifically focused on built-up areas is the WSF from the German Aerospace Center—DLR [
27]. It encompasses two classes: Settlements and non-settlements. The product includes two maps with a spatial resolution of 10 m, representing 2015 and 2019. The 2019 map was utilized in this research. WSF is available as 2° × 2° tiles in the WGS84 CRS (
https://download.geoservice.dlr.de/WSF2019, accessed on 28 June 2023). The WSF map for 2019 exhibits an OA of 84% and a Kappa value of 0.65, although information regarding User’s Accuracy (UA) and Producer’s Accuracy (PA) is currently unavailable [
52].
The GSW family comprises a collection of multi-temporal thematic LC maps that focus on inland water bodies [
28]. Produced by JRC, these annual maps span 37 years, from 1984 to 2021. The GSW product offerings include various aspects such as monthly water history, seasonality, yearly history, water occurrence, change intensity, recurrence, transitions, maximum water extent, monthly recurrence, and metadata. They are available for download at
https://global-surface-water.appspot.com/download, accessed on 28 June 2023. For this study, the yearly history for 2019 was employed. The yearly history combines two water classes, seasonal and permanent. The UA and PA for the entire time series, including the 2019 map, exceed 95%.
The FNF map is a thematic LC map that classifies forested regions worldwide [
29]. Developed by the Japan Aerospace Exploration Agency (JAXA), it provides a multi-temporal representation of forest areas with irregular time intervals. The map covers the periods from 2007 to 2010 and from 2015 to 2020, categorizing areas as forest, non-forest, or water, with an approximate resolution of 25 m. The FNF map for 2019 was used in this research. The accuracy of this specific product is not specified. The product is distributed in two tile sizes, 1° × 1° or 5° × 5°, from
https://www.eorc.jaxa.jp/ALOS/index_e.htm, accessed on 28 June 2023.
The MapBiomas project focuses on generating maps for six Brazilian biomes, namely the Amazon, Atlantic Forest, Cerrado, Caatinga, Pampa, and Pantanal [
53]. These maps provide a general overview of LC types at 30 m of spatial resolution annually, dating back to 1985. MapBiomas utilizes a hierarchical legend with three levels of classification. The first level consists of six broad classes, which are further subdivided into more specific classes at the second and third levels. Since its inception in 2016, the project has undergone several collections with different data processing methodologies. The MOLCA creation specifically used the map from Collection 7 for the year 2019, which achieved an OA of 89% [
54]. MapBiomas is licensed under the Creative Commons CC-BY-SA license, which means it is freely available and can be accessed through various means, including Google Earth Engine (GEE), the GEE Toolkit app, the MapBiomas dashboard, the QGIS plugin, or direct download in GeoTiff format via a link provided on
https://mapbiomas.org/en/colecoes-mapbiomas-1?cama_set_language=en, accessed on 28 June 2023. The default CRS used is WGS84.
The CCI Africa Prototype is a general-purpose LC map with a resolution of 20 m, produced by the ESA CCI LC team, representing the LC state in Africa for the year 2016. The legend of the CCI Africa Prototype consists of Tree-covered areas, Shrub-covered areas, Grassland, Cropland, Vegetation aquatic or regularly flooded, Lichen and mosses/sparse vegetation, Bare areas, Built up areas, and Snow and/or ice and open water. The product can be downloaded as a single GeoTiff file in WGS84 CRS for the entire African continent from
https://2016africalandcover20m.esrin.esa.int, accessed on 28 June 2023. Accuracy assessments of the CCI Africa Prototype were conducted for four countries: Kenya, Gabon, Ivory Coast, and South Africa [
55]. The OA was found to be 44% for South Africa, 47% for Ivory Coast, 56% for Kenya, and 91% for Gabon.
The ESA DUE GlobPermafrost map describes the LC of permafrost regions, including Western Siberia (Russia), Barrow (Alaska), Teshekpuk (Alaska), Mackenzie Delta (Canada), Umiuaq (Canada), Kytalyk (Russia), Lena Delta (Russia), Seward Peninsula (Alaska), and Yukon Delta (Alaska) [
56]. The legend of ESA DUE GlobPermafrost is very detailed on polar LC types (21 in total). Each permafrost region has a corresponding GeoTiff file, which can be downloaded from the PANGAEA data publisher under the Creative Commons Attribution 4.0 International license [
57]. The CRS used for ESA DUE GlobPermafrost is the UTM projection, and the OA of the map is estimated to be 83%.
2.2. MOLCA Methodology Concepts
The creation of MOLCA involves intersecting multiple HRLC datasets to determine areas of agreement. Only the areas where all the HRLCs agree are retained, while pixels showing LC class discrepancies among the intersected HRLCs are designated as null. From a theoretical standpoint, MOLCA is likely to have high accuracy, because a manyfold agreement increases the odds of pixels being accurate [58]. Pixels that are accurately classified have a high likelihood of being found in corresponding positions across different datasets, as correct classification is a primary objective during the classification process. Conversely, errors in the LC derivation result from undesired factors associated with the classification process. These errors can be influenced by various factors, such as the quality and quantity of training data, the suitability of the classification algorithm, the accuracy and quality of satellite imagery, the complexity of the LC types being classified, and the presence of atmospheric phenomena such as clouds. Since different agencies and procedures are responsible for producing most of the existing HRLCs, it is expected that errors in different datasets are independent and not replicated across them. Thus, the primary benefit of the MOLCA methodology arises from its use of multiple HRLC maps to create the training dataset. This approach is anticipated to improve the classification accuracy compared to relying solely on training samples extracted from individual existing HRLC maps.
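The independence argument above can be made concrete with a small calculation. The sketch below is an idealized illustration, not the authors' analysis: it assumes that each map has a single overall accuracy, that errors are fully independent between maps, and that a wrong label is equally likely to be any of the other classes. The function name and the model itself are assumptions introduced here; real HRLC errors are partly correlated, so the result should be read as an optimistic bound.

```python
from math import prod


def agreement_precision(accuracies, n_classes):
    """Probability that a pixel is correct, given that all maps agree on its class.

    Idealized model (an assumption of this sketch): independent errors, and a
    wrong label drawn uniformly from the remaining (n_classes - 1) classes.
    """
    # All maps assign the true class.
    all_correct = prod(accuracies)
    # All maps are wrong and happen to pick the same wrong class.
    same_wrong = (n_classes - 1) * prod(
        (1 - p) / (n_classes - 1) for p in accuracies
    )
    return all_correct / (all_correct + same_wrong)


# Reported OAs of FROM-GLC (73%), GL30 (86%) and MapBiomas (89%), nine classes.
p_agree = agreement_precision([0.73, 0.86, 0.89], n_classes=9)
print(f"Probability that an agreeing pixel is correct: {p_agree:.4f}")
```

Under these assumptions, the probability for agreeing pixels exceeds the accuracy of every individual input map, which is the intuition behind retaining only areas of agreement.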
2.3. MOLCA Generation Procedure
A schema of the MOLCA generation procedure is shown in
Figure 2. Different parts of the procedure are grouped into preparation, data harmonization, and MOLCA generation.
The procedure of creating MOLCA started by downloading the identified HRLC products (see Table 1) for the regions of interest. This was done automatically with Python [59] when feasible and manually otherwise. The legends of these datasets were carefully compared to determine common classes across them. The classes that consistently appeared across multiple datasets were chosen as the target classes for MOLCA. Details on the MOLCA legend are included in Table A1 of Appendix A. To align the legends of the existing datasets with the target legend of MOLCA, a correspondence table was created for each dataset and stored in a text file. Based on the correspondence tables, text files with reclassification rules were created for use in later steps. The reclassification rules contain the original raster value, the target raster value, and the target class label. The legend harmonization was performed manually because a single class might have a different name and code in different HRLCs.
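As an illustration of how a correspondence table could be turned into a reclassification rules file, the following minimal sketch writes one rule per line in the style accepted by the GRASS GIS r.reclass module (original value, target value, label). The raster values and the helper name are hypothetical and are not taken from the actual MOLCA correspondence tables; the exact file layout used by the authors is not described in the text.

```python
def build_reclass_rules(correspondence):
    """Build reclassification rules text from a correspondence table.

    correspondence: list of (original_value, target_value, target_label).
    Every original value that is not listed is mapped to NULL.
    """
    lines = [
        f"{original} = {target} {label}"
        for original, target, label in correspondence
    ]
    lines.append("* = NULL")  # anything without a target class becomes null
    return "\n".join(lines) + "\n"


# Hypothetical codes, for illustration only.
example_table = [
    (10, 3, "Cropland"),
    (20, 4, "Forest"),
    (60, 7, "Water"),
]
print(build_reclass_rules(example_table))
```

Keeping the table as data and generating the rules text from it means the manual harmonization work is recorded once per dataset and can be reapplied to every tile.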
Since different datasets used different tiling systems, it was necessary to find matching tiles across the datasets. This matching process was automated using the Python libraries pandas, shapely, rasterio, and geopandas. The rest of the procedure was automated using a combination of GRASS GIS and Python. Principal GRASS GIS modules used for MOLCA generation included
,
,
,
,
,
, and
[
60]. These modules were run through the
library of Python. Some Python libraries independent of GRASS GIS were also used, such as
.
A reference dataset was selected to serve as a guide for the CRS, extent, and spatial resolution of MOLCA tiles. A tile from the ESA CCI HRLC product was chosen, which had a size of 100 km × 100 km, a resolution of 10 m, and used the WGS84 CRS. Information about which tiles of the non-reference datasets spatially match each tile of the reference dataset was stored in a CSV file.
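The tile matching step can be sketched with plain bounding boxes. The example below uses only the Python standard library as a simplified stand-in for the geometric libraries mentioned above; the tile identifiers, coordinates, and CSV column names are hypothetical, and all boxes are assumed to be expressed in the same CRS.

```python
import csv
import io


def overlaps(a, b):
    """True if two (west, south, east, north) boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def match_tiles(reference_tiles, other_tiles):
    """Pair every reference tile with the non-reference tiles that overlap it."""
    return [
        (ref_id, tile_id)
        for ref_id, ref_box in reference_tiles.items()
        for tile_id, box in other_tiles.items()
        if overlaps(ref_box, box)
    ]


def to_csv(rows):
    """Serialize the matches as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["reference_tile", "matching_tile"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical tiles: one reference tile and two 2-degree tiles of another dataset.
reference = {"ref_A": (10.0, 0.0, 10.9, 0.9)}
others = {"tile_10_0": (10.0, 0.0, 12.0, 2.0), "tile_12_0": (12.0, 0.0, 14.0, 2.0)}
print(to_csv(match_tiles(reference, others)))
```

Tiles that only touch along an edge are not counted as matches here, since they contribute no pixels to the reference tile.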
Prior to importing data into the GRASS GIS database, its default CRS was set to WGS84. Then, the datasets were imported and automatically reprojected if their source CRS was not WGS84. The extent and resolution of the imported non-reference data tiles were adjusted to those of the reference tiles. These non-reference tiles were clipped or merged to match the extent of the reference tile. Furthermore, the non-reference tiles were reclassified in accordance with the MOLCA target legend, following the reclassification rules. Resampling to the target resolution was performed on the fly when processing operations were executed.
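To show what bringing a coarser categorical layer onto the 10 m reference grid involves, here is a minimal pure-Python sketch of nearest-neighbour resampling. Nearest-neighbour is assumed here because class codes are categorical and must not be averaged; the text does not state which method was applied. The function is hypothetical and assumes that both grids share the same origin and extent.

```python
def resample_nearest(grid, src_res, dst_res):
    """Resample a 2D categorical grid from src_res to dst_res (same units).

    Each target cell takes the value of the source cell containing its centre.
    """
    n_rows = round(len(grid) * src_res / dst_res)
    n_cols = round(len(grid[0]) * src_res / dst_res)
    out = []
    for r in range(n_rows):
        # Centre of the target row, expressed as a source row index.
        src_r = min(int((r + 0.5) * dst_res / src_res), len(grid) - 1)
        row = []
        for c in range(n_cols):
            src_c = min(int((c + 0.5) * dst_res / src_res), len(grid[0]) - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


# A 2 x 2 grid at 30 m becomes a 6 x 6 grid at 10 m.
coarse = [[1, 2], [3, 4]]
fine = resample_nearest(coarse, src_res=30, dst_res=10)
for row in fine:
    print(row)
```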
The next step was to extract areas of agreement. A cross-product was computed from the non-reference datasets, which generated a raster map whose values represent the combinations of class values found within the input layers. The cross-product labels were analyzed to identify agreement labels, defined as labels in which the same class appears across all input HRLCs, or across at least two of them while the remaining HRLCs carry no class; all other labels were considered null. The agreement labels and their associated values were converted into reclassification rules. Finally, the cross-product was reclassified into MOLCA.
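The agreement rule can be sketched per pixel as follows. This simplified example works directly on small in-memory grids and skips the intermediate cross-product raster and reclassification files described above; the function names and the use of None for null are choices made for this illustration.

```python
NULL = None


def agreement(labels, min_votes=2):
    """Return the agreed class for one pixel, or NULL.

    A class is kept only if at least min_votes layers assign it and every
    other layer either assigns the same class or has no class at that pixel.
    """
    votes = [label for label in labels if label is not NULL]
    if len(votes) >= min_votes and len(set(votes)) == 1:
        return votes[0]
    return NULL


def intersect(layers, min_votes=2):
    """Apply the agreement rule to every pixel of equally sized 2D grids."""
    return [
        [agreement(pixel, min_votes) for pixel in zip(*rows)]
        for rows in zip(*layers)
    ]


# Three hypothetical harmonized layers covering the same 2 x 2 tile.
layer_a = [["Forest", "Water"], ["Cropland", NULL]]
layer_b = [["Forest", "Water"], ["Grassland", NULL]]
layer_c = [[NULL, "Water"], ["Cropland", "Built-up"]]

for row in intersect([layer_a, layer_b, layer_c]):
    print(row)
```

In this example the lower-left pixel is rejected because the layers disagree, and the lower-right pixel is rejected because only one layer provides a class.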
2.4. MOLCA Validation
The accuracy of MOLCA was evaluated against photo-interpreted samples collected by the authors in one part of the African region. The accuracy metrics were determined using a conventional error matrix [
61] which was filled with classes derived from photo-interpretation, along with their corresponding MOLCA classes found at the same sampling locations. The number of samples was estimated based on Cochran’s equation [
62]. The sample count was determined to be 1068; an additional 130 samples were included as a precaution against the chance of discarding some samples due to photo-interpretation uncertainties.
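For reference, Cochran's formula for the sample size of a proportion is n = z² p (1 − p) / e². The parameter values used by the authors are not stated; the sketch below shows that one common choice (95% confidence, z = 1.96; maximum variability, p = 0.5; margin of error e = 3%) yields the same count after rounding up. These parameter values are therefore an assumption, not a reported fact.

```python
import math


def cochran_sample_size(z, p, e):
    """Cochran's sample size for estimating a proportion, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# Assumed parameters: 95% confidence, p = 0.5, 3% margin of error.
n = cochran_sample_size(z=1.96, p=0.5, e=0.03)
print(n)  # 1068
```

With the 130 precautionary samples, 1068 + 130 = 1198 samples were drawn in total.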
Each class within MOLCA was considered a distinct stratum. An equal number of samples was selected in each stratum, with the exception of Bareland and Wetland, because their pixel count in MOLCA was low. Consequently, the number of samples for these classes was set to the total number of their pixels present in MOLCA, specifically 22 for Bareland and 6 for Wetland. The samples within each of the other strata were randomly selected, while all pixels belonging to Bareland and Wetland were converted into samples.
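The allocation described above, equal random samples per stratum with small strata taken in full, can be sketched as follows. The per-class sample count, the pixel identifiers, and the function name are hypothetical; only the Bareland and Wetland counts (22 and 6) come from the text.

```python
import random


def stratified_sample(pixels_by_class, per_class, seed=0):
    """Draw per_class random pixels from each stratum.

    Strata with no more than per_class pixels are taken in full.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for cls, pixels in pixels_by_class.items():
        if len(pixels) <= per_class:
            sample[cls] = list(pixels)
        else:
            sample[cls] = rng.sample(pixels, per_class)
    return sample


# Hypothetical strata: pixel identifiers are plain integers here.
strata = {
    "Forest": list(range(100000)),
    "Bareland": list(range(22)),
    "Wetland": list(range(6)),
}
drawn = stratified_sample(strata, per_class=150)
print({cls: len(pixels) for cls, pixels in drawn.items()})
```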
The sampling survey was designed in the Open Foris Collect platform [63], while photo-interpretation was conducted in the Open Foris Collect Earth software [63], where the photo-interpreter could use either Google imagery or temporal profiles of vegetation indices from Landsat 7/8, Sentinel-2, and MODIS imagery to assign a class label to a sample. A total of 148 samples were discarded due to low confidence in the photo-interpretation arising from poor image quality, clouds, or a high degree of similarity with other classes. The remaining 1050 samples were used for MOLCA validation.
4. Discussion
The MOLCA dataset has billions of LC pixels for the three regions of interest. Its legend and temporal representativeness are determined by the characteristics of the input HRLCs. To be included in MOLCA, a class must appear in at least two input HRLCs. Additionally, the other input HRLCs should either have the same class or no class at all. If these conditions are not met, the class will not be part of MOLCA. Moreover, the intersection procedure eliminates small differences in legends between input HRLC datasets. When the definition of a specific class varies between HRLCs, the MOLCA derivation procedure ensures that only the common characteristics are retained. To illustrate, if one HRLC defines Forest as an area of trees at least 2 m high, while another HRLC sets the threshold at 5 m, MOLCA will exclude any Forest patches with trees shorter than 5 m during the intersection procedure. This exclusion occurs because the second dataset does not represent areas with trees below 5 m as Forest, so the two datasets disagree there. If classes with the same name differ significantly in their definition, they may not produce any agreement during MOLCA derivation, and they will therefore be eliminated.
Similarly, if the baseline years of the input HRLCs differ and a land cover change happened between these years, the HRLC with the more recent baseline year may capture the change. This causes disagreement among the HRLCs for the pixels affected by the change, and consequently such pixels will not be present in the MOLCA dataset. Hence, MOLCA’s temporal representativeness falls between the least recent and the most recent baseline year of the input HRLCs.
By combining FROM-GLC, GL30, WSF, GSW, FNF, GHS BU S1NODSM, MapBiomas (Amazon only), CCI Africa Prototype (Africa only), and ESA DUE GlobPermafrost (Siberia only), MOLCA’s legend resulted in the Bareland, Built-up, Cropland, Forest, Grassland, Shrubland, Water, and Wetland classes in all regions, plus the Permanent ice and snow class in Siberia. The MOLCA legend and its correspondence to the Food and Agriculture Organization (FAO) Land Cover Classification System (LCCS) are displayed in Table 4. The MOLCA legend aligns with the second of the three levels of the dichotomous phase of the FAO LCCS. In the first level of FAO LCCS, classes are distinguished based on the presence of vegetation, categorized as (A) primarily vegetated and (B) primarily non-vegetated. The second level further discriminates based on the presence of water, distinguishing between (1) terrestrial and (2) aquatic. The third level considers the artificiality of LC. In MOLCA, the vegetation classes are not fully differentiated as per the third level of FAO LCCS, i.e., most classes are not discriminated by artificiality. This drawback hampers the effectiveness of MOLCA, as FAO LCCS is currently the only available system that facilitates the interoperability of legends of different LC datasets through a hierarchical approach. However, the inherent nature of MOLCA restricts the control over the legend. Despite this, it is worth noting that the legend remains compatible with that of the majority of existing HRLCs, which constitutes a de facto standard legend.
Since not all classes are derived from the same HRLCs (e.g., Water is present in five input HRLCs, while Grassland is present in three of them), the temporal representativeness of each class varies. Nonetheless, MOLCA provides an approximate but reliable representation of land cover during the 2016–2020 timeframe, as explained above. Details about the temporal representativeness of each class in each region are included in Appendix B (see Table A2, Table A3 and Table A4).
MOLCA statistics (see Table 2) show that Forest is the most abundant class within MOLCA. Among the HRLCs employed, the Built-up and Water classes have the highest representation with five HRLCs, followed by the Forest class with four HRLCs. Other classes primarily rely on three HRLCs, except for Cropland and Permanent ice and snow in Siberia.
Accuracy results (see Table 3) indicate generally high accuracy, given that the OA of MOLCA is 96%, the Kappa index is 95%, and the FDR is 4%. Regarding the classes, UA and PA scores exceed 85%, except for the Cropland class. Cropland has a UA of 73%; thus, it exhibits moderate overestimation. Unfortunately, no confident samples were available for the Wetland class, making it impossible to estimate its accuracy. It should be noted that the Bareland class had only three samples, which may not accurately reflect its classification accuracy. The F1 score is very high for the majority of classes (>90%), which indicates high accuracy of the classes and a good balance between UA and PA. Being derived from UA and PA, the F1 score is slightly lower for the Cropland class (i.e., 80%) than for other classes, and equal to 0% in the case of the Bareland class, for the above-mentioned reasons.
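For readers who want to reproduce such metrics from an error matrix, the sketch below computes OA, the Kappa index, and per-class UA, PA, and F1 using their standard definitions. The two-class matrix is invented for illustration and is not the MOLCA error matrix; rows are taken to be the mapped classes and columns the photo-interpreted reference classes, which is an assumption about layout.

```python
def accuracy_metrics(matrix, classes):
    """Compute OA, Kappa, and per-class UA, PA, F1 from a square error matrix.

    matrix[i][j]: number of samples mapped as class i whose reference is class j.
    """
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(len(classes))]
    row_totals = [sum(row) for row in matrix]        # mapped totals
    col_totals = [sum(col) for col in zip(*matrix)]  # reference totals

    oa = sum(diagonal) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (oa - expected) / (1 - expected)

    per_class = {}
    for i, cls in enumerate(classes):
        ua = diagonal[i] / row_totals[i] if row_totals[i] else 0.0
        pa = diagonal[i] / col_totals[i] if col_totals[i] else 0.0
        f1 = 2 * ua * pa / (ua + pa) if ua + pa else 0.0
        per_class[cls] = {"UA": ua, "PA": pa, "F1": f1}
    return oa, kappa, per_class


# Invented example: 2 of the 10 samples mapped as Cropland are actually Forest.
oa, kappa, per_class = accuracy_metrics([[8, 2], [0, 10]], ["Cropland", "Forest"])
print(round(oa, 2), round(kappa, 2))
print(per_class["Cropland"])
```

In the invented example, Cropland has a UA lower than its PA, which is the same pattern of commission error (overestimation) discussed above for the Cropland class. A class with no correctly mapped samples gets an F1 score of 0.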
One limitation of MOLCA is the lack of Permanent ice and snow pixels in Africa. This class is present on high mountain peaks, but it is not identified in MOLCA. There are two possible reasons for this issue. Firstly, the area of the class is extremely small in the African region of interest, which significantly reduces the possibility of existing HRLCs reaching a consensus on such a narrow area. Secondly, the class may not be detected by one of the input HRLCs, in which case consensus among HRLCs is not possible.
5. Conclusions and Outlook
The overall objective of MOLCA is to establish a benchmark framework for deriving training datasets from existing HRLC datasets. The HRLCs are combined by the intersection method, which ensures that only regions where all datasets agree in terms of classes are retained, while conflicting areas are eliminated. Such a manyfold agreement increases the probability that MOLCA retains only correct portions of the input HRLCs. Currently, MOLCA covers three macro-regions of the world, two of which, Siberia and Africa, are rarely included in existing training benchmark datasets [
17,
42,
43,
44,
45,
46,
47,
48,
49]. Another advantage of MOLCA is that it provides 117 billion 10 m pixels, or 43% coverage of the region of interest, which is, to the best of our knowledge, significantly more than any other existing training benchmark dataset. Such a large number of pixels is suitable for supporting deep learning techniques, which are gaining popularity and require extensive training datasets. Nevertheless, it can also support traditional ML approaches.
The results of the accuracy evaluation demonstrated an OA of 96%, a Kappa index of 95%, and a low FDR (4%), all of which indicate very high accuracy. Among the seven categories evaluated, four exhibited an accuracy surpassing 90% for both UA and PA. The Grassland category demonstrated UA and PA values nearing 90%. On the other hand, the UA for the Cropland category implies a potential overestimation of the Cropland class. The F1 score for each class was in line with the UA and PA results. While MOLCA may not achieve perfect accuracy for certain classes, a study by Rolnick et al. [
64] shows that the noisy training samples do not significantly affect the performance of deep neural networks as long as the training dataset is sufficiently large. Moreover, some of the currently available HRLCs were based on other LC products, and in some cases without taking into account that each LC product contains some degree of error. Therefore, we argue that MOLCA might be more suitable for training data than an individual HRLC dataset.
The MOLCA legend is similar to most of the worldwide HRLCs and consists of different types of LC such as Bareland, Built-up, Cropland, Forest, Grassland, Shrubland, Water, Wetland, and Permanent ice and snow (in Siberia only). Although it does not fully align with FAO LCCS, it shows promise in aiding HRLC production if the current legend trend continues.
As a future development of this work, we plan to incorporate other recently published global HRLCs into the MOLCA derivation procedure, since they were not available at the time of generation of this version. On one hand, this would be useful for further refining MOLCA and increasing its accuracy, and on the other hand, it would allow the exploration of a suitable combination of existing HRLCs to ensure the representation of extremely small classes in MOLCA that currently are an issue (e.g., Permanent ice and snow in Africa). Furthermore, we also plan to expand MOLCA availability to other regions of the world.